As an ML Ops Engineer, you will be instrumental in operationalizing and managing machine learning models. You'll focus on optimizing model deployment, scaling, and managing the lifecycle of machine learning models in production environments.
In this role, you will have the opportunity to work on cutting-edge projects that impact various industries, driving innovation and operational efficiency. Join EPAM and help shape the future of machine learning applications in real-world settings.
Unlock the potential of remote work in Kazakhstan, giving you the flexibility to work from home or access our offices in Astana, Almaty or Karaganda.
Responsibilities
- Develop and maintain automation tools for continuous integration and deployment (CI/CD) of machine learning models
- Ensure robust monitoring and logging systems for models in production to quickly identify and address any issues
- Collaborate closely with data scientists and data engineers to enhance model architecture and performance
- Manage the infrastructure and resources needed for deploying models in production, including servers, data storage, and computational resources
- Implement version control and change management for machine learning models
- Implement and maintain robust monitoring, logging, and alerting systems to ensure the operational health and performance of ML models in production environments
- Proactively identify, diagnose, and resolve performance bottlenecks and anomalies in deployed models
- Enforce data security best practices and ensure compliance with regulatory requirements
- Stay up to date with the latest advancements in machine learning technologies, tools, and industry standards
Requirements
- A degree in Computer Science, Engineering, Statistics, or a related field
- Experience with tools like Docker, Kubernetes, Jenkins, or similar for managing containerized applications and automating workflows
- Proficiency in programming languages commonly used in machine learning, such as Python or R
- Strong understanding of machine learning frameworks (e.g., TensorFlow, PyTorch) and model management
- Experience with cloud platforms (AWS, Azure, Google Cloud) and understanding of scalable architectures
- Strong experience with monitoring tools and best practices, including setting up automated alerts, dashboards, and logging pipelines for production ML systems
- Familiarity with observability stacks like Prometheus, Grafana, or similar solutions
- Excellent problem-solving skills and the ability to work as part of a multi-disciplinary team
- Strong communication skills, with the ability to articulate complex technical details to non-technical stakeholders
Nice to have
- Certification in relevant cloud technologies or machine learning
- Prior experience in a similar role, specifically within a tech-driven enterprise
- We connect like-minded people:
- Delivering innovative solutions to industry leaders, making a global impact
- Enjoyable working environment, whether it is the vibrant office or the comfort of your own home
- Opportunity to work abroad for up to two months per year
- Relocation opportunities within our offices in 55+ countries
- Corporate and social events
- We invest in your growth:
- Leadership development, career advising, soft skills and well-being programs
- Certifications, including GCP, Azure and AWS
- Unlimited access to LinkedIn Learning, getAbstract, and O'Reilly
- Free English classes with certified teachers
- Discounts at local language schools, including online courses for the Kazakh language
- We cover it all:
- Participation in the Employee Stock Purchase Plan
- Monetary bonuses for engaging in the referral program
- Medical & family care package
- Six trust days per year (sick leave without a medical certificate)
- Coverage of psychology sessions of your choice
- Benefits package (sports activities, a variety of stores and services)