Service Reliability Analyst II - Live Operations

Riot Games

Los Angeles, CA

The Process & Analytics team uses operational data to understand the player experience and make it visible across Riot. The team collects, audits, and uses data to improve our games’ operational health, empowering game leadership to make data-informed decisions that improve stability.

As a Service Reliability Analyst, you will work with teams across Riot to build and execute effective ITIL processes, measurements of service health, and a highly contextual picture of the player experience. Your tenacity and drive for continuous improvement will help you uncover problematic trends and push for their resolution, improving the quality of the player experience. You will be a craft master in operational process and in telling compelling visual stories with data. Live Ops can look to you to improve ITIL processes, answer tough operational questions through data, and uncover previously unknown anti-patterns that harm the player experience.

Responsibilities:

  • Lead and facilitate weekly technical discussions on service reliability with key product teams, ensuring alignment on operational goals and performance metrics
  • Conduct thorough audits of incident data in collaboration with service owners to validate accuracy and ensure comprehensive reporting and analysis
  • Collect, synthesize, and report on system health metrics for Riot's diverse infrastructure, utilizing advanced data collection methods and monitoring tools
  • Perform in-depth analysis of operational data trends to identify and address systemic issues and optimize service performance
  • Participate in on-call rotations to provide critical support and ensure rapid response to incidents, minimizing downtime and service disruptions
  • Assist in tracking and coordinating corrective actions for root cause analysis, ensuring thorough resolution of underlying issues and continuous improvement of operational processes
  • Develop and maintain dashboards and reports that provide insights into key operational performance metrics, assisting leaders with making data-driven decisions

Required Qualifications:

  • 2-4 years of hands-on experience in IT service management, data analysis, or technical operations, with a focus on maintaining and optimizing IT infrastructure
  • Strong proficiency in incident, problem, change, and release management, with the ability to design and implement process flows using industry-standard methodologies
  • Solid understanding of software development life cycles (SDLC) and how various components interact within larger ecosystems, ensuring seamless operation and scalability
  • Clear awareness of system and service ownership within a multi-team environment, including the effective use of APIs/SDKs and adherence to SLAs
  • Deep enthusiasm for operations and technology, with a proactive approach to continuous improvement in system reliability and performance
  • Experience with the following tools and technologies:
    • Data Visualization Tools: Advanced skills in Tableau, Datawrapper, and Excel for creating actionable insights from complex datasets
    • Query Languages: Proficient in JQL, SQL, and XQuery for querying and manipulating data across various platforms
    • Monitoring Solutions: Expertise in setting up and managing monitoring frameworks using tools like Datadog and New Relic to ensure system health and performance
    • Event Management Tools: Skilled in event correlation to improve incident response with tools such as Datadog, BigPanda, or PagerDuty
    • ITIL-based Ticketing Systems: In-depth experience with ServiceNow, Jira, or similar platforms for tracking and managing IT service processes

Desired Qualifications:

  • 2+ years of specialized experience in Service Reliability Engineering (SRE) or equivalent roles such as Technical Release Manager, Process Owner, Live Operations Engineer, or Network Administrator
  • Bachelor’s degree in Computer Science, IT Systems, Information Technology, or a closely related field, or equivalent professional experience
  • Advanced data analysis and data insights proficiency, with the ability to derive actionable intelligence from large datasets
  • Relevant certifications such as AWS Certified Solutions Architect, CompTIA Linux+, or CompTIA Network+, or equivalent credentials, are highly valued
  • Demonstrated expertise in deploying and managing monitoring solutions such as Datadog and New Relic to ensure system health and performance within complex environments

For this role, you'll find success through craft expertise, a collaborative spirit, and decision-making that prioritizes your fellow Rioters, who are the customers of your work. Being a dedicated fan of games is not necessary for this position!

Our Perks:

Riot focuses on work/life balance, shown by our open paid time off policy and other perks such as flexible work schedules. We offer medical, dental, and life insurance; parental leave for you, your spouse/domestic partner, and children; and a 401(k) with company match. Check out our benefits pages for more information.

Riot Games fosters a player and workplace experience that values teamwork embodied by the Summoner's Code and Community Code. Our culture embraces differences as a strength, and our values are the guiding principles for how we approach work. We are committed to putting diversity and inclusion (D&I) at the center of everything we do, and promoting a fair and collaborative culture where Rioters treat one another with dignity and respect. We encourage you to read more about our value of thriving together and our ongoing work to build the most inclusive company in Gaming.

 

It’s our policy to provide equal employment opportunity for all applicants and members of Riot Games, Inc. Riot Games makes reasonable accommodations for handicapped and disabled Rioters and does not unlawfully discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, handicap, veteran status, marital status, criminal history, or any other category protected by applicable federal and state law. We consider for employment all qualified applicants, including those with criminal histories, in a manner consistent with applicable federal, state and local law, including the California Fair Chance Act, the City of Los Angeles Fair Chance Initiative for Hiring Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, the San Francisco Fair Chance Ordinance, and the Washington Fair Chance Act.

Per the Los Angeles County Fair Chance Ordinance, the following core duties may create a basis for disqualifying candidates with relevant criminal histories:

  • Safeguarding confidential and sensitive Company data
  • Communication with others, including Rioters and third parties such as vendors, and/or players, including minors
  • Accessing Company assets, secure digital systems, and networks
  • Ensuring a safe interactive environment for players and other Rioters

These duties are directly related to essential operations, safety, trust, and compliance obligations within our organization. Please note that job duties may evolve based on business needs and additional responsibilities may be assigned as necessary to maintain operational efficiency and security. 

Client-provided location(s): Los Angeles, CA, USA
Job ID: 6233243
Employment Type: Other