NetApp

Mgr, Site Reliability Engineer

United States

Job Summary

The Site Reliability Engineering (SRE) Manager will lead a dynamic team responsible for ensuring our critical systems' reliability, performance, and efficiency. This role involves a strategic blend of engineering and operations and requires a strong background in software development, systems engineering, and leadership. This is a pivotal role in our operations, demanding a dedicated individual who excels in a fast-paced and collaborative environment. We invite you to apply if you are driven by system reliability and ready to lead a high-performing team.

Job Responsibilities

Lead and mentor a team of SREs, fostering a culture of continuous improvement and innovation.

Collaborate with product and engineering teams to design and implement scalable solutions.

Develop and maintain a reliable monitoring and alerting system to detect and mitigate issues proactively.

Drive incident management processes and conduct post-mortem analyses to prevent future outages.

Manage priorities, projects, and the overall workflow of the SRE team.

Ensure compliance with security best practices and company policies.

Stay ahead of industry trends and emerging technologies to continuously improve system reliability and performance.

Job Requirements

Bachelor's degree in Computer Science, Engineering, or a related field; Master's preferred.

Minimum of 7 years of experience in SRE, DevOps, or similar roles, with at least 3 years in a leadership position.

Experience leading geographically dispersed teams.

Proficiency in programming languages such as Python, Go, or Java.

Extensive experience with cloud services (AWS, GCP, Azure) and with containers and container orchestration (Docker, Kubernetes).

Solid understanding of CI/CD pipelines and automation tools (Jenkins, Ansible, Terraform).

Exceptional knowledge of observability tools and experience designing architecture for proactive monitoring of the product.

Proven track record of designing and implementing scalable, high-availability systems.

Exceptional problem-solving skills and the ability to work under pressure.

Excellent communication and team-building skills.

Job Segment: Cloud, Developer, Java, Computer Science, Systems Engineer, Technology, Engineering

Client-provided location(s): United States
Job ID: netapp-1174313900
Employment Type: Other