
Senior Site Reliability Engineer

AT NVIDIA

Gurgaon, India

NVIDIA's Infrastructure, Planning and Processes (IPP) organization is seeking a hard-working and experienced Site Reliability/DevOps Engineer with a strong background in infrastructure management, monitoring, automation, and system administration to join our Sanity Operations Team in Pune. The IPP organization provides infrastructure, products, and services for multiple software teams, including the GPU, Mobile, and Automotive divisions, working on NVIDIA's extraordinary products and services.

The team is responsible for hosting, enabling, and running the large-scale private cloud systems and services behind our in-house testing CI framework. The cloud hosts a heterogeneous mix of machines and devices running various operating systems (Windows, Linux, Android, etc.) on NVIDIA GPUs and Tegra processors.

What you'll be doing:

  • Create resilient, scalable, and efficient test and deployment pipelines.
  • Design and implement complex automation platforms to identify & resolve operational inefficiencies.
  • Triage software, hardware, and infrastructure issues, and maintain high availability for our infrastructure and services.
  • Deploy and monitor critical, high-performance, large-scale services running on geo-distributed systems.
  • Continuously strive for efficient utilization and management of the infrastructure.
  • Automate processes for enabling developers to adopt self-service practices, while ensuring compliance with security standards.
  • Work with architects and engineers across the teams to review the designs & solutions during development and deployment phases.
  • Collaborate with our other engineering teams to deliver reliable, robust, and high-performance capability of the underlying infra.
  • Mine & analyze data from multiple sources for identifying scaling & optimization opportunities.

What we need to see:

  • Bachelor's or Master's degree in Computer Science, Software Engineering, or equivalent experience, with 8+ years of experience in a DevOps environment.
  • Strong hands-on experience configuring, maintaining, and building upon deployments of industry-standard tools (e.g. Kubernetes, Jenkins, Docker, CMake, GitLab, Jira).
  • Working experience in monitoring and maintaining large-scale infrastructure applications running in a microservice-based architecture.
  • Proficiency in virtualization architecture, with strong experience in Kubernetes, VMs, and Docker.
  • Experience with continuous integration and continuous delivery systems and practices such as GitLab, GitOps, Jenkins, Packer, and Terraform.
  • Strong Python scripting skills, with a proven background in using and writing JSON/REST APIs.
  • Fluency in writing queries for MySQL or equivalent NoSQL databases.
  • Solid understanding of configuration management tools such as Chef, Puppet, and Ansible.
  • Working experience with Perforce, Git, or another version control system is required.
  • Experience with telemetry and alerting systems such as Kibana, Elasticsearch, Grafana, and Prometheus to create rich visualizations of system health over time.
  • Ability to self-manage, show leadership, mentor others and communicate well.

Ways to stand out from the crowd:

  • Understanding of networking concepts like TCP/IP and firewall management.
  • Exposure to web apps/dashboards on frameworks like Django, AngularJS, VueJS, etc.
  • High-level understanding of build and test systems.
  • Experience building regression detection systems that analyze real-time production data and emphasize important metrics.
  • Innovating with industry-standard tools and collaborating with the open source community.
  • Outstanding interpersonal and communication skills.

Client-provided location(s): Gurugram, Haryana, India; Bengaluru, Karnataka, India; Pune, Maharashtra, India
Job ID: NVIDIA-JR1988676
Employment Type: Full Time