EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
We are seeking an experienced Cloud AIOps Architect to lead the design and implementation of advanced AI-driven operational systems across multi-cloud and hybrid cloud environments. This role demands a blend of technical expertise, innovation, and leadership to develop scalable solutions for complex IT systems with a focus on automation, machine learning, and operational efficiency.
Responsibilities
- Architect and design the AIOps solution leveraging AWS, Azure, and cloud-agnostic services, ensuring portability and scalability
- Develop an end-to-end automated machine learning (ML) pipeline spanning data ingestion, DataOps, model training, and inference across multi-cloud environments
- Design hybrid architectures leveraging cloud-native services like Amazon SageMaker, Azure Machine Learning, and Kubernetes for development, model deployment, and orchestration
- Design and implement ChatOps integration, allowing users to interface with the platform through Slack, Microsoft Teams, or similar communication platforms
- Leverage Jupyter Notebooks in AWS SageMaker, Azure Machine Learning Studio, or cloud-agnostic environments to create model prototypes and experiment with datasets
- Lead the design of classification models and other ML models using AWS SageMaker training jobs, Azure ML training jobs, or open-source tools in a Kubernetes container
- Implement automated rule management systems using Python in containers deployed to AWS ECS/EKS, Azure AKS, or Kubernetes for cloud-agnostic solutions
- Architect the integration of ChatOps backend services using Python containers running in AWS ECS/EKS, Azure AKS, or Kubernetes for real-time interactions and updates
- Oversee the continuous deployment and retraining of models based on updated data and feedback loops, ensuring models remain efficient and adaptive
- Design platform-agnostic solutions to ensure that the system can be ported across different cloud environments or run in hybrid clouds (on-premises and cloud)
Requirements
- 13+ years of overall experience, including 7+ years in AIOps, Cloud Architecture, or DevOps roles
- Hands-on experience with AWS services such as SageMaker, S3, Glue, Kinesis, ECS, EKS
- Strong experience with Azure services such as Azure Machine Learning, Blob Storage, Azure Event Hubs, Azure AKS
- Hands-on experience working on the design, development, and deployment of contact centre solutions at scale
- Proficiency in container orchestration (e.g., Kubernetes) and experience with multi-cloud environments
- Experience with machine learning model training, deployment, and data management across cloud-native and cloud-agnostic environments
- Expertise in implementing ChatOps solutions on platforms like Microsoft Teams and Slack, and in integrating them with AIOps automation
- Familiarity with data lake architectures, data pipelines, and inference pipelines using event-driven architectures
- Strong programming skills in Python for rule management, automation, and integration with cloud services
- Experience with Kafka, Azure DevOps, and AWS DevOps for CI/CD pipelines
We offer
- Opportunity to work on technical challenges that may have an impact across geographies
- Vast opportunities for self-development: online university, global knowledge sharing, and learning through external certifications
- Opportunity to share your ideas on international platforms
- Sponsored Tech Talks & Hackathons
- Unlimited access to LinkedIn learning solutions
- Possibility to relocate to any EPAM office for short and long-term projects
- Focused individual development
- Benefit package:
  - Health benefits
  - Retirement benefits
  - Paid time off
  - Flexible benefits
- Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)