
Site Reliability Engineer

at Bank of America

Plano, TX

About us:

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day.

One of the keys to driving Responsible Growth is being a great place to work for our teammates around the world. We're devoted to being a diverse and inclusive workplace for everyone. We hire individuals with a broad range of backgrounds and experiences and invest heavily in our teammates and their families by offering competitive benefits to support their physical, emotional, and financial well-being.

Bank of America believes both in the importance of working together and in offering flexibility to our employees. We use a multi-faceted approach to flexibility, depending on the various roles in our organization.

Working at Bank of America will give you a great career with opportunities to learn, grow and make an impact, along with the power to make a difference. Join us!

Job Description:

This job is responsible for partnering with engineering and technology teams to implement measures as prescribed by lead/senior site reliability engineers. Key responsibilities include ensuring appropriate instrumentation, tooling, ticketing, alerting, and on-call routines are in place for key services, identifying root causes of issues through production triage efforts, and suggesting code enhancements to technology teams to automate services and improve reliability and efficiency. Job expectations include using software development skills to improve efficiency and to address gaps in reliability.

Overview:

This position is for a Site Reliability Engineer (SRE) who provides 24x7 application support for CrowdStrike Falcon on Linux and Windows operating systems. The candidate should also have experience diagnosing performance-related issues and escalating them to a third-party vendor for review and remediation. Experience in a large corporation and 5+ years of experience supporting enterprise-level applications are preferred. This role also requires working with other enterprise-level business and administrative groups and communicating effectively, both in speech and in writing.

Responsibilities:

  • Develops and maintains reliability scripts, tools, and libraries, and uses them both for common instrumentation, automation, and operational needs and for mentoring other Site Reliability Engineers (SREs) on reliability practices and established tools/capabilities
  • Collaborates with Development and Infrastructure teams to understand technical solutions and implement monitoring capabilities outlined in the application and system monitoring designs put forward by the SRE Lead
  • Partners to implement code changes to make use of common reliability libraries and tools and helps Application Production Services and Application Development teammates understand how to use them
  • Identifies vulnerabilities and opportunities for reliability improvement, such as investigating low-level error rates and 'noise' in monitoring, and defines solutions to reduce manual support effort and/or improve system reliability
  • Engages as a subject matter expert in major incident triage and failure scenario modelling, and works with the Problem Manager to diagnose root causes in major incident/problem management investigations
  • Participates regularly in an on-call rotation with Production Support teammates to learn more about reliability issues affecting their portfolio

Required Qualifications:

  • 5+ years of experience in supporting CrowdStrike Falcon and other enterprise security scanning and patching solutions
  • Experience with enterprise monitoring and reporting tools and providing 24x7x365 support
  • Ability to work with product managers and development leads to design, build and maintain enterprise solutions
  • Proficient in Linux & Windows

Desired Qualifications:

  • Development languages (Java, Python)
  • Additional security scanning and patching solutions (BladeLogic, Microsoft SCCM, Tanium, BMC Atrium Orchestrator)
  • Remedy ITSM
  • Ansible Tower, BladeLogic, BMC Atrium Orchestrator
  • Monitoring Tools (Tivoli ITM, SiteScope, Dynatrace)

Skills:

  • Analytical Thinking
  • Automation
  • Collaboration
  • Production Support
  • Result Orientation
  • Application Development
  • Architecture
  • Influence
  • Project Management
  • Solution Design
  • Adaptability
  • DevOps Practices
  • Risk Management
  • Solution Delivery Process
  • Stakeholder Management

Shift:
1st shift (United States of America)

Hours Per Week:
40

Client-provided location(s): Plano, TX, USA
Job ID: BankOfAmerica-JR-24040332
Employment Type: Full Time

Perks and Benefits

  • Health and Wellness

    • FSA
    • HSA
    • On-Site Gym
    • Health Insurance
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
  • Parental Benefits

    • Non-Birth Parent or Paternity Leave
    • Birth Parent or Maternity Leave
  • Vacation and Time Off

    • Leave of Absence
    • Personal/Sick Days
    • Paid Holidays
    • Paid Vacation
    • Sabbatical
  • Financial and Retirement

    • Performance Bonus
    • Company Equity
    • 401(K) With Company Matching
  • Professional Development

    • Promote From Within
    • Mentor Program
    • Access to Online Courses
    • Lunch and Learns
    • Tuition Reimbursement
  • Diversity and Inclusion

    • Diversity, Equity, and Inclusion Program