We are actively searching for a skilled and motivated Senior Site Reliability Engineer specializing in system software, infrastructure management, and performance optimization.
As a member of our team, you will be responsible for ensuring the stability, scalability, and reliability of our services and products.
This position offers a hybrid setup with the flexibility to work from any location in Latvia, whether that is your home or our office in Riga.
Responsibilities
- Design, develop, and implement system software enhancing the stability, scalability, availability, and robustness of our services and products
- Develop reusable automation and instrumentation patterns across teams and products
- Prioritize automation over manual fixing of operational issues
- Develop effective and proactive system monitoring strategies
- Provide senior technical leadership on Major Incident calls and drive service outage recoveries
- Manage cross-functional technical resources after Major Incidents to ensure in-depth understanding and documentation of root causes and service protection measures
- Participate in an on-call rotation, including weekend or after-hours coverage
Requirements
- Skills in public cloud infrastructure (e.g., AWS, Azure) and related technologies (e.g., Docker, Kubernetes, CloudFormation)
- Solid understanding of storage, database systems, caching, queuing, and networking
- Experience in designing, analyzing, and troubleshooting distributed systems
- Ability to debug, optimize code, and automate routine operational tasks
- Proficiency in Linux or Windows administration and troubleshooting
- Familiarity with Prometheus, Grafana, Kibana, and Elasticsearch is a plus
- Understanding of Service Level Agreements & Objectives and Service Management practices (ITIL)
- Proficiency in JavaScript or TypeScript; experience with ReactJS, NextJS, or React Native is an added advantage
- Background in building KPI dashboards for proactive monitoring
- English proficiency at B2 level or higher
We offer
- Engineering Heritage: Best-in-class experts sharing a culture of engineering excellence and tackling complex engineering challenges for over 30 years
- Advanced Tech Stack: Innovative projects where you can apply or enhance your expertise in Cloud, Data, AI, and other emerging technologies
- World-Class Clients: Work closely with 295+ of the Forbes Global 2000 on creating disruptive solutions that make a global impact
- Professional Growth: Exceptional support for career development with comprehensive resources for upskilling or reskilling in pioneering practices
- GenAI Community: Strong AI competencies with 600+ experts across 55+ locations driving GenAI-enabled transformation journeys
- Entrepreneurial Culture: If you're passionate and dedicated to improving business transformation, we provide the support you need to bring your ideas to life
- Hybrid Setup: The flexibility to work from any location in Latvia, whether it's your home or our office in Riga
- Other Benefits: Additional vacation and trust days, private health insurance, Employee Stock Purchase Plan and more
About EPAM
EPAM is a leading global provider of digital platform engineering and development services. For over 30 years, our team has helped leading brands navigate the waves of digital transformation, building solutions that help them stay competitive through constant market disruption.
With offices in 55+ countries, EPAM has grown in Latvia to more than 150 talented innovators in 3 years. We foster creativity and unconventional ways of doing things, welcoming like-minded professionals to join us.