We are seeking a Reliability Engineer to join our remote team. In this role, you will ensure the stability, integrity, and efficiency of the information systems that support core organizational functions. You will also be instrumental in identifying and resolving issues that affect the reliability of our systems and services. A successful candidate will thrive in a fast-paced environment and be committed to proactive service optimization and issue prevention.
Responsibilities
- Monitor system performance and reliability, identifying and resolving issues before they impact users
- Develop and implement maintenance procedures to reduce system downtime and increase overall efficiency
- Collaborate with development teams to enhance system design and architecture with a focus on reliability and scalability
- Conduct root cause analysis on incidents to prevent recurrence
- Optimize system configurations and settings for improved performance and reliability
- Implement and manage monitoring tools and software to provide critical operational metrics and insights
Requirements
- Minimum of 2 years of experience as a Reliability Engineer
- Proven scripting skills in Python and PowerShell to automate tasks and processes
- Strong knowledge of cloud platforms, specifically Azure and GCP
- Experience with Azure DevOps pipelines for continuous integration and deployment
- Proficiency in debugging and troubleshooting complex software and hardware issues
- Familiarity with monitoring tools such as GCP Cloud Logging, Grafana, and Azure Logs
- Solid understanding of Site Reliability Engineering (SRE) principles and practices
- Fluent English communication skills at a B2 level or higher
- Experience with Kubernetes and container technologies
- Ability to lead cross-functional projects and initiatives to improve system reliability
- Experience in implementing disaster recovery plans and failover mechanisms to ensure high availability and business continuity
Benefits
- Career plan and real growth opportunities
- Unlimited access to LinkedIn Learning solutions
- International Mobility Plan within 25 countries
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (Algorithms club, Toastmasters, Agile club and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Hired directly by EPAM & 100% under payroll
- Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life, and major medical expenses with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days plus 4 floating days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th and 31st)
- Monthly non-taxable amount for the electricity and internet bills
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.