Acquia empowers the world's most ambitious brands to create digital customer experiences that matter. With open source Drupal at its core, the Acquia Digital Experience Platform (DXP) enables marketers, developers, and IT operations teams at thousands of global organizations to rapidly compose and deploy digital products and services that engage customers, enhance conversions, and help businesses stand out.
Headquartered in the U.S., Acquia is a Great Place to Work-Certified™ company in India, is listed as one of the world's top software companies by The Software Report, and is positioned as a market leader by the analyst community. We are Acquia. We are building for the future and we want you to be a part of it!
The Opportunity
The Senior Site Reliability Engineer is responsible for designing and delivering secure and highly available solutions. You will be a critical part of a team focused on ensuring our services are ready and stress tested. You should be comfortable taking on new challenges, defining potential solutions, and implementing designs in a team environment. You will be working on a tech stack composed of Linux, Kubernetes, Ruby, Go, Python, PostgreSQL, MySQL, Redis, Jenkins, GitHub, and GCP.
You'll Spend Time:
- Partnering closely with Engineering and Support.
- Taking responsibility for the deployment and continuous operation of the Monsido platform.
- Making sure we automate as many tasks as possible to make diagnostics, scaling, healing and deployments a breeze.
- Working on a team responsible for a blend of architecture, automation, development, and application administration.
- Developing and deploying solutions from the infrastructure to the network and application layers on public cloud platforms.
- Ensuring our SaaS platform is available and performing well, and that we notice problems before our customers do.
- Collaborating with Support and Engineering on customer issues, as needed.
- Working with distributed data infrastructure, including containerization and virtualization tools, to enable unified engineering and production environments.
- Developing dashboards, monitors, and alerts to increase situational awareness of production issues, SLAs, and security incidents.
- Independently conceiving and implementing ways to improve development efficiency, code reliability, and test fidelity.
- Participating in the on-call rotation.
You'll be Successful if You:
- Are proficient with Unix/Linux OS administration (5-8 years).
- Are proficient with computer network setup and debugging.
- Are proficient with at least one scripting language (Shell, Python, ...).
- Are competent with deploying, tuning, and maintaining Linux-based, highly available, fault-tolerant platforms on public cloud providers such as GCP, AWS, or Azure.
- Are competent with Kubernetes, including configuration management, running deployments, and debugging.
- Are competent with application containerization.
- Have a basic understanding of SQL and relational database administration (PostgreSQL, MySQL).
- Have a basic understanding of configuration management tools such as Terraform or SaltStack.
- Are flexible about working in a rotational on-call schedule.
Requirements & Suggested Years of Experience:
- Build & release experience including delivery: 3+ years
- Software Configuration Management tools like Puppet, SaltStack, Chef, Ansible: 2+ years
- Application monitoring tools: 2+ years
- Experience with Kubernetes and containerization: 1+ year
Extra credit:
- Best practices in infosec.
- The ability to dig deep into infrastructure and code to solve problems.
- The drive to solve traditional operations problems through automation.
- High attention to detail.
Individuals seeking employment at Acquia are considered without regard to race, color, religion, caste, creed, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation. Whatever you answer will not be considered in the hiring process or thereafter.