
Site Reliability Engineer - Storage Operations

AT IBM

Alajuela, Costa Rica

Introduction

IBM is a global technology and innovation company. It is the largest technology and consulting employer in the world, with a presence in 170 countries. The diversity and breadth of the entire IBM portfolio of research, consulting, solutions, services, systems and software uniquely distinguishes IBM from other companies in the industry.

Over the past 100 years, a lot has changed at IBM. In this new era of Cognitive Business, IBM is helping to reshape industries as diverse as healthcare, retail, banking, travel, manufacturing, and many more, by bringing together our expertise in Cloud, Analytics, Security, Mobile, and the Internet of Things. We like to say, "be essential." We are changing how we craft. How we collaborate. How we analyze. How we engage.

Join the next generation of innovators, inventors and entrepreneurs who are crafting the very way the world works. We want the brightest minds doing work that encourages them, in an environment where growth is supported. IBMers get to discover their potential, so they're inspired to build breakthroughs that help our clients succeed. We're building teams with dynamic strengths, made up of people who want their ideas to matter. Join us - you'll be proud to call yourself an IBMer.

Business Unit Introduction

IBM Cloud Computing is a one-stop shop that provides all the cloud solutions and tools that industries need. The IBM Cloud portfolio includes infrastructure as a service (IaaS), software as a service (SaaS) and platform as a service (PaaS) offered through public, private and hybrid cloud delivery models, in addition to the components that make up those clouds.

IBM Cloud ensures seamless integration into public and private cloud environments. The infrastructure is secure, scalable, and flexible, providing customized enterprise solutions that have made IBM Cloud the Hybrid Cloud Market leader with our market-leading IaaS and PaaS platforms. The IBM Cloud platform is the public cloud offering from IBM, providing services to global enterprises. IBM Cloud is the Cloud for Smarter Business, built on Open Technology with Developer Tools, and it supports solutions by Industry. We run the services and workloads from Watson, Blockchain, Services, Security, and IoT.

Ready to help drive IBM's success in the Cloud market? This is your chance to research and learn new Cloud-related technology products and services, as well as to design and implement quick Cloud-based prototypes while advancing your career in leading-edge technology.

Your role and responsibilities

As a Site Reliability Engineer (SRE) and DevOps Engineer in Storage, you will ensure that the designed solution meets non-functional requirements such as reliability, availability, performance, security, and maintainability. You will work closely with the development team and other related Release and L2 teams.

As a storage operations administrator, you will ensure that the storage fleet maintains reliability, availability, performance, and security. You will work closely with vendors, development teams, datacenter staff, and support staff to keep the storage environment stable and growing. This includes performing expansions and upgrades, and assisting vendors with the installation of new hardware.

  • Maintain the storage environment with high availability and resiliency
  • Administer NetApp ONTAP storage clusters
  • Diagnose and troubleshoot technical issues related to NetApp and other storage hardware and software
  • Create and manage scripts to automate administrative tasks and provide system-level reports
  • Recommend and oversee projects that expand, change, or improve the systems and related infrastructure
  • You will bring an engineering focus to operations, focusing your energy on preventing incidents and on improving observability, automation frameworks, self-service infrastructure, logging and metrics, and operational reports.
  • You will be expected to use tools for logging, monitoring, event management, notification, Runbook Automation, ChatOps, and Root Cause Analysis.
  • You will work with automation engineers and QA engineers to ensure seamless delivery of our service offerings.

Responsibilities:

  • Diagnosing and troubleshooting technical issues related to NetApp storage hardware and software, and providing timely solutions to customers through phone, email, and remote sessions. This includes acting as a primary point of contact for resolving complex technical problems and collaborating with other teams to deliver optimal customer support. The work requires knowledge of NetApp's ONTAP operating system, RAID concepts, and Ethernet, FC, and iSCSI protocols, as well as familiarity with NetApp hardware such as FAS and AFF arrays.
  • Keeping the site or service up and running or getting it back up and running quickly when failure occurs
  • Working closely with internal partners and teams to ensure that our infrastructure meets security, SLA, and performance requirements
  • Writing, updating, and using documentation, including runbooks and playbooks
  • Assisting in debugging complex problems across an entire stack and creating solid solutions
  • Partnering with security engineers and developing plans and automation to aggressively and safely respond to new risks and vulnerabilities.
  • Developing, communicating, and monitoring standard processes to promote the long-term health and sustainability of operational development tasks.

Required education

Bachelor's Degree

Preferred education

Bachelor's Degree

Required technical and professional expertise

  • 3+ years of total experience
  • Experience with administering NetApp ONTAP storage clusters
  • A solid understanding of cloud infrastructure/operations is a must
  • Can use a Unix/Linux shell, write shell scripts, and understand Linux internals
  • Experience debugging complex problems
  • Experience with DevOps engineering or SRE
  • Has hands-on experience using source control and feature branching strategies
  • Strong communication skills
  • Understands networking and messaging, especially between services

Preferred technical and professional experience

  • Experience with standard industry tools for monitoring and observability
  • A strong understanding of diverse infrastructure platforms and infrastructure concepts
  • Experience in infrastructure operations automation and IT service management, with hands-on exposure to data center administration, configuration, incident management and support
  • Experience automating infrastructure, configuration management, testing, and deployments using tools like Ansible and Chef, and the ability to explain the Infrastructure as Code paradigm
  • Experience designing, building, and operating large-scale production systems

ABOUT BUSINESS UNIT

IBM Systems helps IT leaders think differently about their infrastructure. IBM servers and storage are no longer inanimate - they can understand, reason, and learn so our clients can innovate while avoiding IT issues. Our systems power the world's most important industries and our clients are the architects of the future. Join us to help build our leading-edge technology portfolio designed for cognitive business and optimized for cloud computing.

YOUR LIFE @ IBM

In a world where technology never stands still, we understand that dedication to our clients' success, innovation that matters, and trust and personal responsibility in all our relationships live in what we do as IBMers as we strive to be the catalyst that makes the world work better.

Being an IBMer means you'll be able to learn and develop yourself and your career. You'll be encouraged to be courageous and experiment every day, all whilst having continuous trust and support in an environment where everyone can thrive, whatever their personal or professional background.

Our IBMers are growth-minded, always staying curious, open to feedback, and learning new information and skills to constantly transform themselves and our company. They are trusted to provide ongoing feedback to help other IBMers grow, as well as to collaborate with colleagues, keeping in mind a team-focused approach that includes different perspectives to drive exceptional outcomes for our customers. The courage our IBMers have to make critical decisions every day is essential to IBM becoming the catalyst for progress, always embracing challenges with the resources they have to hand, a can-do attitude, and an outcome-focused approach in everything they do.

Are you ready to be an IBMer?

ABOUT IBM

IBM's greatest invention is the IBMer. We believe that through the application of intelligence, reason and science, we can improve business, society and the human condition, bringing the power of an open hybrid cloud and AI strategy to life for our clients and partners around the world.

Restlessly reinventing since 1911, we are not only one of the largest corporate organizations in the world, we're also one of the biggest technology and consulting employers, with many of the Fortune 50 companies relying on the IBM Cloud to run their business.

At IBM, we pride ourselves on being an early adopter of artificial intelligence, quantum computing and blockchain. Now it's time for you to join us on our journey to being a responsible technology innovator and a force for good in the world.

OTHER RELEVANT JOB DETAILS

For additional information about location requirements, please discuss with the recruiter following submission of your application.

Client-provided location(s): Heredia Province, Heredia, Costa Rica
Job ID: IBM-19210
Employment Type: Other
