Introduction
As a Site Reliability Engineer (SRE) in the IBM Cloud Infrastructure organization, you will be responsible for ensuring the reliability, scalability, and operational efficiency of IBM Cloud's storage services. You will work closely with development teams, SRE peers and engineering managers to automate infrastructure management, optimize system performance, and enhance monitoring capabilities. This role involves writing code, building automation, troubleshooting production issues, and improving overall service reliability.
Your role and responsibilities
Reliability & Scalability
• Design, build, and maintain highly available, distributed storage services with a focus on reliability, scalability, and security.
• Implement auto-scaling, load balancing, and failover strategies to ensure seamless service availability.
• Analyze performance bottlenecks, optimize system efficiency, and contribute to capacity planning efforts.
Automation & Infrastructure as Code
• Develop infrastructure automation using PHP, Go, Kubernetes, and other cloud-native technologies.
• Implement self-healing mechanisms and automated remediation processes to minimize manual intervention.
Incident Management & Monitoring
• Respond to production incidents, participate in root cause analyses (RCA), and implement long-term fixes to improve system resilience.
• Collaborate on observability solutions, including monitoring, logging, and alerting, using tools like Prometheus, Grafana, Splunk, and IBM Cloud Monitoring.
Security & Compliance
• Ensure compliance with security best practices and regulatory requirements.
• Implement secret management, encryption, and access control for sensitive infrastructure components.
• Participate in security audits, vulnerability assessments, and compliance automation efforts.
Cross-Team Collaboration & DevOps Culture
• Work closely with development, operations, and security teams to design and implement resilient architectures.
• Advocate for DevOps/SRE best practices, including blameless postmortems, incident retrospectives, and operational readiness reviews.
Required education
Bachelor's Degree
Preferred education
Master's Degree
Required technical and professional expertise
- 2+ years of experience in SRE, DevOps, or Software Engineering roles.
- An understanding of cloud infrastructure and operations is a must.
- Knows their way around a Unix/Linux shell, can write shell scripts, and understands Linux internals.
- Experience with the Software Development Life Cycle, Test-Driven Development, Continuous Integration, and Continuous Delivery.
- Experience with containers and container platforms, such as Docker, Kubernetes, and OpenShift.
- Familiarity with Linux systems administration, networking, and distributed systems.
- Experience with troubleshooting production incidents and implementing permanent fixes.
- Ability to write clean, maintainable, and efficient automation code.
- Familiarity with Ansible, Bash, core Python development, and deployments in production environments.
Preferred technical and professional experience
- Familiarity with one of C, C++, Go, Python, or Java
- PHP and Perl development experience
- Experience in monitoring applications such as Grafana, ELK stack, Prometheus, Nagios, and Sysdig
- Familiarity with cloud deployment tooling
ABOUT BUSINESS UNIT
IBM Systems helps IT leaders think differently about their infrastructure. IBM servers and storage are no longer inanimate - they can understand, reason, and learn so our clients can innovate while avoiding IT issues. Our systems power the world's most important industries and our clients are the architects of the future. Join us to help build our leading-edge technology portfolio designed for cognitive business and optimized for cloud computing.
YOUR LIFE @ IBM
In a world where technology never stands still, we understand that dedication to our clients' success, innovation that matters, and trust and personal responsibility in all our relationships live in what we do as IBMers as we strive to be the catalyst that makes the world work better.
Being an IBMer means you'll be able to learn and develop yourself and your career. You'll be encouraged to be courageous and experiment every day, all whilst having continuous trust and support in an environment where everyone can thrive, whatever their personal or professional background.
Our IBMers are growth-minded, always staying curious, open to feedback, and learning new information and skills to constantly transform themselves and our company. They are trusted to provide ongoing feedback to help other IBMers grow, and to collaborate with colleagues using a team-focused approach that includes different perspectives to drive exceptional outcomes for our customers. The courage our IBMers have to make critical decisions every day is essential to IBM becoming the catalyst for progress, always embracing challenges with the resources they have to hand, a can-do attitude, and an outcome-focused approach in everything that they do.
Are you ready to be an IBMer?
ABOUT IBM
IBM's greatest invention is the IBMer. We believe that through the application of intelligence, reason and science, we can improve business, society and the human condition, bringing the power of an open hybrid cloud and AI strategy to life for our clients and partners around the world.
Restlessly reinventing since 1911, we are not only one of the largest corporate organizations in the world, we're also one of the biggest technology and consulting employers, with many of the Fortune 50 companies relying on the IBM Cloud to run their business.
At IBM, we pride ourselves on being an early adopter of artificial intelligence, quantum computing and blockchain. Now it's time for you to join us on our journey to being a responsible technology innovator and a force for good in the world.
OTHER RELEVANT JOB DETAILS
IBM offers a competitive and comprehensive benefits program. Eligible employees may have access to:
- Healthcare benefits including medical & prescription drug coverage, dental, vision, and mental health & well-being
- Financial programs such as 401(k), the IBM Employee Stock Purchase Plan, financial counseling, life insurance, short- & long-term disability coverage, and opportunities for performance-based salary incentive programs
- Generous paid time off including 12 holidays, minimum 56 hours sick time, 120 hours vacation, 12 weeks parental bonding leave in accordance with IBM Policy, and other Paid Care Leave programs. IBM also offers paid family leave benefits to eligible employees where required by applicable law
- Training and educational resources on our personalized, AI-driven learning platform where IBMers can grow skills and obtain industry-recognized certifications to achieve their career goals
- Diverse and inclusive employee resource groups, giving & volunteer opportunities, and discounts on retail products, services & experiences
We consider qualified applicants with criminal histories, consistent with applicable law.
This position was posted on the date cited in the key job details section and is anticipated to remain posted for 21 days from this date or less if not needed to fill the role.
The compensation range and benefits for this position are based on a full-time schedule for a full calendar year. The salary will vary depending on your job-related skills, experience and location. Pay increment and frequency of pay will be in accordance with employment classification and applicable laws. For part time roles, your compensation and benefits will be adjusted to reflect your hours. Benefits may be pro-rated for those who start working during the calendar year.