Site Reliability Engineer

IBM

Dublin, Ireland

Introduction
At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let's talk.

Your Role and Responsibilities

Site Reliability Engineers at IBM are the backbone of our strategic initiatives to design, code, test, and provide industry-leading solutions that make the world run today. At IBM, you will use the latest software development tools, techniques and approaches and work with leading minds in the industry to build solutions you can be proud of.

Are you passionate about technology? Do you love building new things? Do you want to develop the future of IBM's Cloud offerings? If you answered YES, then we have the right opportunity for you!

The shift toward the consumption of IT as a service, i.e., the cloud, is one of the most important changes to happen to our industry in decades. At IBM, we are driven to shift our technology to an as-a-service model and to help our clients transform themselves to take full advantage of the cloud. With industry leadership in analytics, security, commerce, and cognitive computing and with unmatched hardware and software design and enterprise reach, no other company is as well positioned to address the full opportunity of cloud computing.

We are looking for a dynamic Site Reliability Engineer to join our Cloud IaaS Operations Team in Dublin, Ireland: someone who is responsive to market needs and can deliver value to our clients in a fast-changing cloud landscape. The SRE team is dedicated to ensuring that the IBM Cloud is at the forefront of cloud technology, from data center design, storage and network architecture, and compute clusters to flexible infrastructure services. We are building IBM's next-generation cloud platform to deliver performance and predictability for our customers' most demanding workloads, at global scale and with industry-leading efficiency, resiliency and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients. This role is on a shift rotation: you will work Sunday to Thursday or Tuesday to Saturday, 4 pm to 12 am.

Primary Roles & Responsibilities:
In this Site Reliability Engineer role, you will work closely with several data centers, the entire Cloud organization and IBM vendors to support, maintain and operationally improve the IBM Cloud infrastructure. You will focus on the following key responsibilities:

  • Monitor the health of production and test systems 24x7
  • Respond promptly to production issues and alerts 24x7
  • Execute changes in the production environment through automation
  • Partner with other SRE teams and program managers to deliver mission-critical services to the market
  • Support development of new and existing capabilities for our compute, storage and network infrastructure services
  • Implement and automate infrastructure solutions that support IBM Cloud products and infrastructure
  • Support the compliance and security integrity of the environment
  • Work with Engineering to:
    • Provide an initial assessment of production issues and possible workarounds
    • Troubleshoot and resolve production issues
  • Work with Support and Development teams to:
    • Identify and resolve issues
    • Discuss and plan integration tasks
  • Provide technical escalation support for other Infrastructure Operations teams

Required Technical and Professional Expertise

  • Excellent written and verbal communication skills
  • Hands-on experience in production administration of large systems and environments
  • Experience establishing and improving procedures within a mission-critical environment
  • Must be proficient in writing and debugging scripts
  • Must be extremely comfortable using and navigating within a Linux environment
  • Ability to do low level debugging and problem analysis by examining logs and running Unix commands
  • Knowledge of monitoring technologies, virtualization technologies, and automation/configuration management
    • Monitoring technologies: Zabbix (preferred), Grafana, Nagios, ELK, Splunk, etc. (at least one)
    • Virtualization technologies: Citrix Xen Hypervisor (preferred), KVM (also preferred), libvirt, VMware vSphere, etc. (at least one)
    • Automation and configuration management tools/solutions: Ansible, Salt, Chef, Python, Bash, Puppet, Rundeck, etc. (at least one)
  • Working knowledge of ServiceNow, JIRA, Confluence, and GitHub
  • Working knowledge of container technologies: Kubernetes (preferred), Docker, etc.

Preferred Technical and Professional Expertise

  • 2+ years of experience with GitHub, Perl and Python
  • 2+ years of experience in virtualization environments such as AWS, SoftLayer, Xen, or VMware

Client-provided location(s): Coolmine, Mulhuddart, Co. Dublin, Ireland
Job ID: IBM-20872869
Employment Type: Full Time
