We are looking for an experienced Senior DevOps Engineer to join our team, focusing on incident and request management, with proficiency in tools such as Dynatrace, Grafana, and Splunk.
This role requires expertise in monitoring setup and tool administration, along with the ability to manage medium-complexity break/fix tickets. If you are a strategic thinker with a knack for maintaining high availability and fault tolerance in systems, we encourage you to apply.
Responsibilities
- Develop and maintain documentation that explains best practices for logging and monitoring
- Conduct regular audits to ensure compliance with policies and industry standards
- Engage in cross-functional discussions to promote logging and monitoring best practices across the company
- Manage and oversee monitoring, alerting, operability, and observability using Dynatrace, Splunk, and Grafana
- Triage, update, and assess the urgency of tickets
- Evaluate documentation to escalate tickets that surpass Level 2 troubleshooting capabilities
- Create and leverage documentation for standard incidents and requests
- Establish average time to complete tickets and create SLOs for each product request type
- Document and review metrics and escalated tickets regularly to optimize the support process
- Handle incidents and requests for monitoring setup and tool administration using JIRA
- Be available for off-hours monitoring and escalation, and carry pager duty for emergencies
Requirements
- Over 3 years of experience in DevOps or SRE roles
- Bachelor's degree in computer science or a related field and/or equivalent work experience
- Strong knowledge of observability including monitoring, logging, and tracing
- Hands-on experience with Dynatrace, Splunk, and Grafana
- Background in Azure logging and monitoring tools such as Log Analytics, Azure Monitor, and App Insights
- Capability to work both independently and as part of a team
- Strong analytical and problem-solving skills, with proficiency in troubleshooting under pressure
- Strategic thinker with excellent organizational and interpersonal skills
- Flexibility to adapt quickly to new technologies
- Exceptional communication skills and fluency in English
- Experience developing and promoting a culture of operational maturity
- Proven track record of managing high-availability, fault-tolerant, scalable systems in a production environment
- Expertise in managing a diverse team and fostering collaboration
Benefits
- Career plan and real growth opportunities
- Unlimited access to LinkedIn learning solutions
- International Mobility Plan within 25 countries
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (Algorithms club, Toastmasters, Agile club and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Hired directly by EPAM & 100% under payroll
- Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days plus 4 floating days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th and 31st)
- Relocation bonus: transportation, 2 weeks of accommodation for you and your family and more
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.