EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
We are seeking a highly skilled and experienced Lead Site Reliability Engineer with a focus on Azure environments to join our team.
In this crucial role, you will leverage your expertise to enhance the reliability and scalability of our cloud-based platforms, ensuring efficient operation and optimal performance. This position involves collaborating closely with cross-functional teams to migrate existing services to the OpenShift platform and make our infrastructure cloud-agnostic. As a leader, you'll guide your team in creating resilient systems and processes that support both internal and external customers who rely on our desktop applications and services.
Responsibilities
- Oversee the migration of services to OpenShift and work toward making our infrastructure cloud-agnostic
- Run pipelines using Azure DevOps for environment configuration and application deployment
- Leverage Python, Bash, and PowerShell to automate routine and complex tasks
- Implement and manage Kubernetes and container-based environments
- Monitor cloud resources and improve system performance against defined SLIs
- Debug and resolve operational issues swiftly and effectively
- Collaborate with development and operations teams to ensure system reliability and security
- Mentor team members and lead by example in maintaining best practices for site reliability
- Continuously assess, improve, and optimize existing system architecture and applications
- Stay up to date with technological advancements and integrate innovative tools and techniques
Requirements
- 5+ years of experience as a Systems Engineer with a development background
- 1+ years of relevant leadership experience
- Proficiency in Linux and Docker with hands-on experience in Kubernetes
- Ability to use at least one of the following scripting languages: Python, Bash, PowerShell
- Background in infrastructure management including networking and operating systems
- Familiarity with monitoring tools in cloud environments and understanding of SLI concepts
- Familiarity with Azure and/or GCP as cloud service providers
- Experience working with Windows
- Knowledge of CI/CD pipelines, particularly Azure DevOps
- Understanding of Istio and GitOps tools such as Argo CD
We offer
- Opportunity to work on technical challenges that may have an impact across geographies
- Vast opportunities for self-development: an online university, global knowledge sharing, and learning through external certifications
- Opportunity to share your ideas on international platforms
- Sponsored Tech Talks & Hackathons
- Unlimited access to LinkedIn Learning
- Possibility to relocate to any EPAM office for short- and long-term projects
- Focused individual development
- Benefit package:
  - Health benefits
  - Retirement benefits
  - Paid time off
  - Flexible benefits
- Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)