We are seeking an experienced Senior Observability DevOps Engineer to join our dynamic team.
In this role, you will be responsible for managing our AWS infrastructure, enhancing our observability services, and automating operations while focusing on efficiency and scalability. Ideal candidates will demonstrate a robust understanding of DevOps principles and observability tools, and have experience automating infrastructure and handling large-scale environments.
Responsibilities
- Manage AWS infrastructure using Terraform and CloudFormation, including tasks like EKS version upgrades, blue/green deployments, and scaling
- Set up, tune, and modernize various observability services including Cortex/Mimir, Loki, Tempo, OpenTelemetry, Grafana, and Alertmanager
- Automate operations programmatically using Python or Golang and GitLab CI, and develop custom self-service solutions based on AWS Service Catalog
- Build Docker images for multiple architectures including arm64 and amd64
- Troubleshoot issues related to microservices in Kubernetes, AWS connectivity, service performance, Lambda functions, and Kafka
- Participate in hypercare events and on-call shifts
Requirements
- Proficiency in version control using Git, GitHub, and GitLab alongside CI/CD pipelines
- Strong experience with Infrastructure as Code (IaC) tools like Terraform and CloudFormation for automation
- Expertise in Grafana across logs, traces, and metrics, plus familiarity with Tempo, Mimir (Prometheus), Loki, Datadog, and New Relic
- Competency in AWS cloud services including S3, IAM, tagging, load balancers, Lambda, and EKS (Kubernetes)
- Programming skills in Python
- Background in ITIL processes covering knowledge, incident, and problem management
- Knowledge of observability concepts and of strategies for signal ingestion and billing reduction
Nice to have
- Familiarity with Cortex and Tempo
- Understanding of Promtail/Fluent Bit and Elasticsearch
- Flexibility to use Kafka and Golang when needed
We offer
- Career plan and real growth opportunities
- Unlimited access to LinkedIn Learning solutions
- International Mobility Plan within 25 countries
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (algorithms club, Toastmasters, agile club and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Hired directly by EPAM & 100% under payroll
- Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days plus 4 floating days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th & 31st)
- Monthly non-taxable amount for the electricity and internet bills
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.