We are actively seeking a Lead DevOps Engineer skilled in incident and request management, using tools such as Dynatrace, Grafana, and Splunk.
The role includes critical duties such as setting up monitoring systems, managing tools, and handling medium-complexity break-fix activities. The selected candidate will provide weekday and weekend on-call support.
Responsibilities
- Create and maintain a documentation framework that outlines logging and monitoring best practices
- Perform regular evaluations to verify logging and monitoring adherence to corporate and industry specifications
- Lead cross-functional groups to endorse logging and monitoring best practices
- Administer monitoring, alerting, and observability tools such as Dynatrace, Splunk, and Grafana
- Categorize tickets, update specifics, and determine urgency
- Examine documentation and escalate tickets that are beyond Level 2 troubleshooting
- Supply detailed handover notes for escalated tickets
- Use and create documentation for standard incidents and requests
- Calculate the average time per ticket and set SLOs for each type of product request
- Consistently assess and share updates regarding metrics and escalated tickets to refine the support process
- Oversee incidents and requests concerning the setup and management of monitoring tools
- Remain reachable for off-hours monitoring, escalation, and emergency pager duties
Requirements
- Bachelor's degree in computer science or a related field, or equivalent work experience
- 5+ years of experience on DevOps or SRE teams
- 5+ years of experience managing high-availability, fault-tolerant, scalable distributed software in live environments
- 1+ years in a relevant leadership role
- Proficiency in observability techniques including monitoring, logging, and tracing
- Familiarity with Dynatrace, Splunk, and Grafana
- Background in Azure tools such as Log Analytics, Azure Monitor, and App Insights
- Capability to work autonomously and in team settings
- Strong skills in analytics, strategic thinking, and complex problem resolution
- Proven troubleshooting abilities during high-pressure situations
- Strong organizational and interpersonal skills that promote operational excellence
- Flexibility to quickly adapt to new technologies
- Excellent communication skills and proficiency in English
We offer
- Career plan and real growth opportunities
- Unlimited access to LinkedIn Learning solutions
- International Mobility Plan within 25 countries
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (Algorithms club, Toastmasters, Agile club and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Direct hire by EPAM, 100% on payroll
- Benefits required by law (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life and major medical expenses with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days plus 4 floating days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th and 31st)
- Monthly non-taxable amount for the electricity and internet bills
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.