Introduction
At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let's talk.
Your Role and Responsibilities
The Site Reliability Engineering (SRE) team ensures the service is highly available and fully optimized in a 24/7 environment. As an SRE, you will play a crucial role in ensuring the reliability and resiliency of our systems. If you are passionate about optimizing, building automation, solving problems, and testing, deploying and managing highly scalable environments, this is the perfect opportunity for you.
In this role, you will be part of a global SRE team that works closely with our development and product teams to increase the quality and reliability of our products and services, and that deploys and manages Kubernetes clusters on IBM Cloud and other cloud platforms (AWS, Azure). As an SRE, you must be willing to work in a fast-paced Cloud environment, share rotational on-call duty coverage with the global Ops team, and support the back-end Cloud infrastructure components.
Key Responsibilities:
- Maintain highly available products and services in the cloud
- Identify issues and drive them towards a resolution while keeping downtime to a minimum
- Automate repetitive tasks using scripts and tools to reduce manual intervention
- Collaborate with development teams to roll out new services and ensure their stability and reliability
- Improve operational practices, ensure efficiency and innovation
- Share knowledge, ideas and solutions with the global team
Required Technical and Professional Expertise
- Understanding of containerization technologies
- Experience with maintaining and scaling Kubernetes-based applications on cloud infrastructure
- Familiarity with scripting and automation (Bash, Python, Go, Jenkins, Ansible)
- Familiarity with the usage of Cloud Platforms (IBM Cloud, Amazon Web Services, Microsoft Azure)
- Strong debugging and problem-solving skills
- Passion for building and maintaining reliable and resilient systems
- Basic understanding of networking
Preferred Technical and Professional Expertise
- Go/Python development skills
- Understanding of cloud storage and networking
- Experience with Infrastructure as Code
- Experience with any source version control system
- Experience with observability (e.g., Prometheus, Grafana, Sysdig)