AI DevOps / SRE Engineer

at EPAM Systems

Kayseri, Turkey

As an AI DevOps/SRE Engineer, you will be pivotal in deploying, maintaining, and scaling our AI solutions, including large language models (LLMs) and retrieval-augmented generation (RAG) systems. You will work closely with data scientists and software developers to ensure seamless integration and operational efficiency of our AI deployments. Your role will combine classic DevOps tasks with innovative approaches to MLOps, ensuring high availability and optimal performance of our systems.

Responsibilities

  • Implement and maintain CI/CD pipelines for AI and machine learning projects, ensuring robust deployment strategies
  • Monitor and ensure the reliability, availability, and performance of AI applications, particularly those involving LLMs and RAG
  • Collaborate with AI research teams to operationalize machine learning models and systems efficiently
  • Develop and enforce best practices for version control, configuration management, and testing of AI-driven software solutions
  • Utilize MLOps tools such as Kubeflow, MLflow, or TensorFlow Extended (TFX) to streamline the machine learning lifecycle from experimentation to production
  • Implement monitoring solutions that track both system metrics and model performance to facilitate proactive issue resolution
  • Participate in on-call rotations to support the operational health of critical systems, employing SRE principles to meet service-level objectives (SLOs) and reduce downtime
Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • Proven experience as a DevOps Engineer or SRE, with a strong background in software development and automation
  • Experience with deployment and management of LLMs, including techniques such as RAG
  • Proficient in CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI) and infrastructure as code (e.g., Terraform, Ansible)
  • Knowledge of containerization and container orchestration technologies (e.g., Docker, Kubernetes)
  • Familiarity with MLOps tools and practices to support machine learning lifecycle management
  • Strong problem-solving skills and ability to work in a dynamic, fast-paced environment
Nice to have
  • Experience with cloud services (AWS, GCP, Azure), particularly in AI/ML deployments
  • Background in monitoring tools like Prometheus, Grafana, and the ELK stack
  • Knowledge of Python, particularly in data science and machine learning contexts
  • Certification in Kubernetes, AWS/GCP/Azure, or similar technologies
We Offer
  • Build a global career with international projects and clients
  • Stay ahead in your career by working with diverse and cutting-edge technologies
  • Competitive compensation in USD, regular assessments, and salary reviews
  • Private Health Insurance: Unlimited usage with 80% coverage
  • Meal Allowance
  • Extensive Annual Leave Policy, including extra working days of annual leave granted in the first year
  • Referral program: cash bonus for each successful recommendation
  • Relocation opportunities within our offices in 50+ countries
  • Amazing learning and development opportunities: internal hard and soft skills training courses, mentoring programs, and free access to over 2,500 in-house courses and over 18,000 on LinkedIn Learning

Client-provided location(s): Türkiye
Job ID: EPAM-96484
Employment Type: Other