We are seeking a highly skilled Site Reliability Engineer with 3 years of experience to join our dynamic team. The ideal candidate will have a strong background in cloud technologies, with a focus on designing, implementing, and managing cloud-based solutions. As a Site Reliability Engineer, you will play a key role in ensuring the availability, performance, and security of our cloud infrastructure.
In this role you will:
Lead the day-to-day technical operations, providing the highest levels of availability, reliability, and scalability of the services.
Implement best practices for cloud security, including identity and access management, encryption, and network security.
Provide technical expertise to handle customer escalations and ensure stability in customer environments.
Conduct performance analysis and lead monitoring initiatives on multiple hosted products/platforms.
Maintain operational run book procedures for all production systems and document the knowledge base.
Administer incident management activities (detection, recording, classification, and closure) and provide timely escalations and notifications as required by procedure.
Participate in on-call rotation to respond to cloud-related incidents and emergencies.
Troubleshoot and resolve complex technical issues in a timely manner.
Monitor and optimize cloud infrastructure for performance, cost, and security.
Collaborate with cross-functional teams to troubleshoot and resolve complex cloud-related issues.
Mentor junior team members and provide technical guidance and support.
You've got what it takes if you have:
A bachelor's degree or higher in computer science, engineering, or a related field, or equivalent experience.
2-4+ years of experience in cloud operations.
Comprehensive understanding of cloud computing principles and architectures.
Extensive experience in Linux/Unix environments.
Proficiency in containerization technologies like Docker and Kubernetes.
Proficiency in Terraform and Helm.
Strong scripting skills in Python or Bash.
Proficiency in debugging and optimizing AWS services.
Hands-on experience in managing and optimizing AWS managed services (ElastiCache, IAM roles, S3 buckets, CloudFront, Lambda, etc.).
Experience with monitoring and logging tools such as ELK stack, Prometheus, Grafana.
Sound knowledge of networking concepts, including TCP/IP, DNS, and VPN.
Proficiency in automation and configuration management tools like Ansible, Jenkins, and Bitbucket.
Strong communication and collaboration skills.
Excellent troubleshooting and problem-solving skills.
Client-provided location(s): Dublin, CA 94568, USA
Job ID: CornerstoneOnDemand-req10265
Employment Type: Other