Acquia empowers the world's most ambitious brands to create digital customer experiences that matter. With open source Drupal at its core, the Acquia Digital Experience Platform (DXP) enables marketers, developers, and IT operations teams at thousands of global organizations to rapidly compose and deploy digital products and services that engage customers, enhance conversions, and help businesses stand out.

Headquartered in the U.S., Acquia is positioned as a market leader by the analyst community and is listed as one of the world's top software companies by The Software Report. We are Acquia. We are a global company with employees located in more than 30 countries, and we're building for the future. We want you to be a part of it!

About the role:

As a Staff Site Reliability Engineer, you will be a key player in designing, implementing, and maintaining our CI/CD pipelines, cloud infrastructure, and monitoring solutions. Your expertise in tools like ArgoCD, Kubernetes, and cloud-native architecture will help us achieve operational excellence at scale. You will work closely with engineering teams to ensure they have the right infrastructure in place to deploy rapidly, safely, and reliably.

This is a hands-on role for someone who thrives in an environment where automation is the goal, reliability is the baseline, and scalability is second nature. You won't just be maintaining systems; you'll be innovating, designing new ways to make our infrastructure smarter and our development faster.

Job Responsibilities:

CI/CD Pipeline Mastery: Design, build, and optimize continuous integration and continuous deployment (CI/CD) pipelines using ArgoCD, Jenkins, or similar tools. Ensure zero-downtime, fully automated deployment pipelines.
Infrastructure as Code (IaC): Build and manage scalable, reliable infrastructure using Terraform, Kubernetes, and other IaC tools. Ensure everything is automated, from deployments to monitoring, so that infrastructure becomes a self-service platform.
Cloud Expertise: Architect and manage cloud environments (AWS, GCP, or Azure), focusing on cost optimization, scalability, and performance. Implement disaster recovery, fault tolerance, and high availability strategies.
Monitoring and Alerting: Implement comprehensive monitoring solutions using Prometheus, Grafana, ELK, and Datadog to detect and resolve performance bottlenecks before they impact customers. Design and implement automated alerts for proactive system health monitoring.
DevOps Advocacy: Champion the culture of DevOps across teams: promote best practices, encourage adoption of new technologies, and drive a continuous learning mindset within the engineering teams. Be the go-to person for CI/CD, infrastructure scaling, and deployment automation.
SRE Mindset: Focus on building systems that are resilient by design, automating processes that improve reliability, and implementing Service Level Objectives (SLOs) to align engineering efforts with operational goals.
Security-First Approach: Collaborate with security teams to implement robust security practices, from container security to infrastructure hardening. Automate security checks within the pipeline for compliance and vulnerability management.
Collaboration with Engineering Teams: Work hand-in-hand with product development teams to understand their needs, integrate CI/CD practices into their workflows, and provide a fast, reliable, and secure path from code to production.

Skills:

BS in Computer Science or a comparable field of study, or equivalent practical experience.
Experience working with one or more of: Go, Python, Ruby, PHP, Java, or JavaScript.
Experience with Unix/Linux systems administration using the CLI.
Fundamental understanding of TCP/UDP networking concepts.
Solid oral and written communication skills.
CI/CD Expertise: Extensive hands-on experience with CI/CD tools such as ArgoCD, Jenkins, CircleCI, or GitLab CI. Ability to design and implement pipelines that ensure rapid, reliable deployments.
Kubernetes Guru: Strong understanding and experience with Kubernetes, Helm, and container orchestration. Ability to scale and manage microservices in production.
Cloud Mastery: Proficient in at least one major cloud provider (AWS, GCP, or Azure). Experience with multi-cloud or hybrid-cloud architecture is a plus.
IaC Champion: Proficiency in Terraform, Ansible, or CloudFormation to manage infrastructure as code. Familiarity with GitOps workflows and version-controlled infrastructure.
Monitoring & Observability: Strong experience with monitoring tools like Prometheus, Grafana, Datadog, ELK, or New Relic. Ability to build custom dashboards and alerting systems.
Security-Focused: Deep understanding of security best practices in DevOps, including container security, CI/CD pipeline security, and cloud infrastructure hardening.
Problem Solver: Excellent troubleshooting skills with the ability to diagnose issues across a variety of environments, from code to infrastructure.
Collaboration Skills: Ability to work effectively in cross-functional teams, influencing peers and driving adoption of best practices across the organization.

Preferred Qualifications:

8-13 years of hands-on experience as a DevOps Engineer, SRE, or related role in a cloud-native environment.
Proven experience mentoring junior team members.
Deep knowledge of CI/CD pipelines, especially using ArgoCD or similar tools.
Proven expertise in cloud platforms (AWS, GCP, Azure), with experience building and managing scalable, reliable infrastructure.
Strong coding skills in Python, Go, or Ruby.
Experience with service mesh architectures like Istio or Linkerd is a plus.
SRE Certification (or equivalent experience) is a bonus.
Certified Kubernetes Administrator (CKA) is preferred.
A passion for automation, observability, and reliability.

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.
