We are seeking a Senior Cloud Engineer to join our Observability team.
The successful candidate will play a key role in managing and optimizing our AWS cloud infrastructure. Day-to-day tasks include managing AWS resources, setting up observability services, automating operations, building Docker images, and troubleshooting issues across multiple services.
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Responsibilities
- Manage the AWS infrastructure through Terraform and CloudFormation, including tasks like EKS version upgrades, blue/green deployments, scaling, and right-sizing
- Deploy and optimize a variety of observability services such as Cortex/Mimir, Loki, Tempo, OpenTelemetry, Grafana, and Alertmanager
- Automate operations with Python or Golang and CI tools such as GitLab CI
- Build Docker images that support multiple architectures, such as arm64 and amd64
- Diagnose issues involving Kubernetes microservices, AWS connectivity, service performance, Lambda functions, and Kafka
- Participate actively in hypercare events and on-call shifts
Requirements
- Competency with version control (Git, GitHub, GitLab) and CI/CD pipelines
- Proficiency in Infrastructure as Code automation using Terraform and CloudFormation
- Background in managing large teams
- Strong understanding of observability tools, including Datadog, New Relic, and Grafana, and of their billing and usage calculations
- Expertise in cloud cost tracking and management strategies
- Familiarity with ITIL processes, including knowledge, incident, and problem management
- Comprehensive understanding of Grafana, Tempo, Mimir (Prometheus), and Loki
- Proficiency in managing AWS Cloud essentials such as IAM, tagging, load balancers, S3, Lambda, and EKS (Kubernetes)
- Knowledge of Python
- Background in Cortex, Tempo, Promtail/Fluent Bit, Kafka, Elasticsearch, and Golang
We offer
- Connectivity Bonus (15,000 ARS paid at the end of each month on the salary receipt as a non-wage item)
- Private Health Insurance (Medicina Prepaga; covers the employee and their immediate family)
- Paternity Leave (two days in addition to the statutory leave, for a total of 4 days)
- Discount card
- English Training (English lessons twice a week)
- Training Program (access to multiple training plans customized to the needs of each role within the company)
- Marriage Bonus (the company doubles the statutory allowance paid by ANSES)
- Referral Program (a bonus is paid when a candidate referred by an employee joins the company)
- External Agreements and Discounts
- Vacation (14 calendar days per year)