We are seeking a Senior Cloud Engineer to join our Observability team.
The chosen candidate will play a crucial role in managing and optimizing our AWS cloud infrastructure using a range of technologies and methodologies. This position involves daily tasks such as managing AWS infrastructure, setting up observability services, automating operations, building Docker images, and troubleshooting multiple service issues.
Responsibilities
- Manage the AWS infrastructure through Terraform and CloudFormation, including tasks like EKS version upgrades, blue/green deployments, scaling, and right-sizing
- Deploy and optimize a variety of observability services such as Cortex/Mimir, Loki, Tempo, OpenTelemetry, Grafana, and Alertmanager
- Automate operations programmatically using Python or Golang and CI tools like GitLab CI
- Construct Docker images compatible with multiple architectures like arm64 and amd64
- Diagnose issues concerning microservices in Kubernetes, AWS connectivity, performance of services, Lambda functions, and Kafka
- Participate actively in hypercare events and on-call shifts
Requirements
- Competency in version control with Git, GitHub, and GitLab, alongside CI/CD pipelines
- Proficiency in Infrastructure as Code for automation using Terraform and CloudFormation
- Background in managing large teams
- Strong understanding of observability tools including Datadog, New Relic, and Grafana, and of their billing and usage calculations
- Expertise in cloud tracking and cost management strategies
- Familiarity with ITIL process methodologies including knowledge, incident, and problem management
- Comprehensive understanding of Grafana, Tempo, Mimir (Prometheus), and Loki
- Proficiency in managing AWS Cloud essentials such as IAM, tagging, load balancers, S3, Lambda, and EKS (Kubernetes)
- Knowledge of Python programming language
- Background in Cortex, Tempo, Promtail/FluentBit, Kafka, Elasticsearch, and Golang
- Career plan and real growth opportunities
- Unlimited access to LinkedIn Learning solutions
- International Mobility Plan within 25 countries
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (Algorithms club, Toastmasters, Agile club and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Hired directly by EPAM & 100% under payroll
- Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life and major medical expenses with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days plus 4 floating days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th & 31st)
- Monthly non-taxable amount for the electricity and internet bills
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.