Introduction
The IBM Cloud development team is dedicated to keeping IBM Cloud at the forefront of cloud technology, from API design to application architecture to flexible infrastructure services. We run IBM's current-generation cloud platform to deliver performance and predictability for our customers' most demanding workloads, at global scale and with leading efficiency, resiliency, and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients.

The Site Reliability Engineer must have a deep understanding of the infrastructure and software domain, cloud infrastructure and application resiliency concepts, and general operations in a production environment. The key requirement is a passion for maintaining a high-quality, highly available service.

Your Role and Responsibilities
We have a "You build it, you run it" culture. As a Site Reliability Engineer, you will join our follow-the-sun rotation, where you will be the primary responder for automated system alerts. You will follow runbooks to resolve these alerts and use your troubleshooting and analytical skills to diagnose platform or application issues. You are also responsible for developing the automation and tools required to efficiently maintain, operate, and troubleshoot a complex cloud environment.

Required Technical and Professional Expertise

5+ years of experience developing or operating complex cloud-scale application or infrastructure environments
4+ years of hands-on experience with operating systems: RHEL, CentOS Linux, and Windows Server
Hands-on experience with container technologies: Kubernetes, Docker, etc.
Working knowledge of one or more virtualization technologies: Citrix Hypervisor, VMware vSphere, Ubuntu KVM, etc.
Hands-on experience building automation: Bash, PowerShell, Python, or Go
Working knowledge of one or more key infrastructure tools/products: Active Directory, Ansible, Chef, etc.
Working knowledge of monitoring technologies: Zabbix, Splunk, etc.
Working knowledge of network and storage technologies
Working knowledge of ServiceNow, JIRA, Confluent, and GitHub

Preferred Technical and Professional Expertise

Experience with message queues, PostgreSQL/MySQL databases, and NoSQL databases
Willingness to work in shifts

Site Reliability Engineering Professional
