We are looking for a Senior Cloud & Reliability Engineer to drive the scalability, reliability, and automation of our cloud infrastructure. This role blends DevOps and SRE practices, ensuring high availability, performance, and security of large-scale cloud-based systems.
Responsibilities
- Design, implement, and maintain cloud infrastructure on Microsoft Azure
- Develop Infrastructure as Code (IaC) using Terraform for scalable and repeatable deployments
- Build and optimize Kubernetes-based containerized services, ensuring both development and operational excellence
- Configure and manage databases (MongoDB, PostgreSQL, Cassandra) and optimize their performance
- Manage message bus systems such as RabbitMQ and Kafka to support a reliable event-driven architecture
- Implement and maintain observability tools (Elastic, Grafana, Prometheus, Loki, OpenTelemetry) for monitoring, logging, and tracing
- Enhance system resilience, automation, and scalability using DevOps & SRE best practices
- Secure, tune, and troubleshoot cloud-based services
- Collaborate with development teams to improve CI/CD pipelines, automation, and infrastructure reliability
Requirements
- 6+ years of hands-on experience in cloud infrastructure and large-scale cloud-based system operations (Azure preferred)
- Strong Kubernetes expertise in both development and operations
- Expertise in Terraform for infrastructure automation
- Experience in database configuration and maintenance, especially MongoDB, PostgreSQL, and Cassandra
- Experience with messaging systems such as RabbitMQ and Kafka
- Familiarity with gateways such as Kong and Nginx
- Experience with observability solutions (Elastic, Grafana, Prometheus, Loki, OpenTelemetry)
- Solid software development background with at least one object-oriented programming language
- Strong problem-solving skills, proactive mindset, and ability to work under pressure
- Good English communication skills to collaborate with global teams
We offer
- By choosing EPAM, you're getting a job at one of the most loved workplaces according to Newsweek in 2021, 2022, and 2023
- Employee ideas are the main driver of our business. We have a very supportive environment where your voice matters
- You will be challenged while working side-by-side with the best talent globally. We work with top-notch technologies, constantly seeking new industry trends and best practices
- We offer a transparent career path and an individual roadmap to engineer your future & accelerate your journey
- At EPAM, you can find vast opportunities for self-development: online courses and libraries, mentoring programs, partial grants for certification, and experience exchange with colleagues around the world. You will learn, contribute, and grow with us
- EPAM is a leader in the fastest-growing segment (product development/digital platform engineering) of the IT industry. We acquired Just-BI in 2021 to reinforce our leading position as a global Business Intelligence services provider and have been growing rapidly. With a talented multinational team, we provide data and analytics expertise
- We are currently involved in end-to-end BI design and implementation projects for major national and international companies. We are proud of our entrepreneurial start-up culture and invest in our people by creating continuous learning and development opportunities, so that they can deliver engineering excellence for our clients