Introduction
The IBM Cloud development team is dedicated to keeping IBM Cloud at the forefront of cloud technology, from API design to application architecture to flexible infrastructure services. We run IBM's current-generation cloud platform to deliver performance and predictability for our customers' most demanding workloads, at global scale and with leading efficiency, resiliency, and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients.
The Site Reliability Engineer must have a deep understanding of the infrastructure and software domains, cloud infrastructure and application resiliency concepts, and general operations in a production environment. The key requirement is a passion for maintaining a high-quality, highly available service.
Your Role and Responsibilities
We have a "You build it, you run it" culture. As a Site Reliability Engineer, you will join our follow-the-sun rotation as the primary responder for automated system alerts. You will follow runbooks to resolve these alerts and use your troubleshooting and analytical skills to diagnose platform or application issues. You are also responsible for developing the automation and tools needed to efficiently maintain, operate, and troubleshoot a complex cloud environment.
Required Technical and Professional Expertise
- 5+ years of experience developing or operating complex, cloud-scale application or infrastructure environments
- 4+ years of hands-on experience with operating systems: RHEL, CentOS Linux, and Windows Server
- Hands-on experience with Container technologies: Kubernetes, Docker, etc.
- Working knowledge of one or more virtualization technologies: Citrix Hypervisor, VMware vSphere, Ubuntu KVM, etc.
- Hands-on experience building automation in Bash, PowerShell, Python, or Go
- Working knowledge of one or more key infrastructure tools or products: Active Directory, Ansible, Chef, etc.
- Working knowledge of monitoring technologies: Zabbix, Splunk, etc.
- Working knowledge of network and storage technologies
- Working knowledge of ServiceNow, JIRA, Confluent, and GitHub
Preferred Technical and Professional Expertise
- Experience with message queues, PostgreSQL/MySQL databases, and NoSQL databases
- Willingness to work in shifts