The NVIDIA DGX Cloud organization is looking for software engineering talent to build NVIDIA's accelerated compute infrastructure. This includes software to assist in the rapid bring-up, operation, configuration, and troubleshooting of compute hardware and networking equipment. As a software engineer, you will work with other software engineers, product architects, and product managers as a collaborative team to deliver and support end-to-end software solutions that manage complex cloud infrastructure deployments. You will write services and software that align with the broad architectural vision for the NVIDIA Cloud Platform, working with other teams to develop a robust and scalable system. You own your code, from development to commit to test to production, including operational support. We expect you to be passionate about code quality, testing, deployment efficiency and simplicity, and bringing amazing products to market.
What you will be doing:
- Work with NVIDIA internal customers.
- Design and build scalable software systems to manage NVIDIA's cloud infrastructure.
- Participate in responses to real-time operational events.
- Build network and systems automation software for managing a multi-tenant cloud infrastructure.
- Participate in open-source communities of software we leverage and build.
- Present roadmaps, vision, and demos to internal stakeholders and NVIDIA leadership.
What we need to see:
- 12+ years of experience designing and building distributed software systems.
- Track record of directly supporting systems with external customers or demanding internal customers.
- BS/MS degree in Computer Science or a related area (or equivalent experience).
- Demonstrated ability to write code in a mainstream systems programming language such as C, C++, Golang, or Rust.
- Demonstrated ability to design and implement maintainable APIs for consumers.
- Practical experience with asynchronous programming, type safety, threading models, state machines, and data structures.
- Background in data persistence (SQL or similar).
- Understanding of secure communication protocols (mutual-TLS, IPsec, or similar).
- Knowledge of SRE principles (observability, SLOs, logging, etc.).
Ways to stand out from the crowd:
- Experience at a hyperscale cloud service provider (public-facing or not).
- Understanding of networking protocols such as IP, IPv6, BGP, HTTP, ICMP, and tunneling protocols (VXLAN, Geneve, FoU, GRE).
- Familiarity with InfiniBand networking.
- Background with host management systems (DHCP, Redfish, UEFI) and host security services such as TPM, TXT, and Secure Boot.
- Experience with Kubernetes and/or distributed task scheduling.
NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence. NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and talented people in the world working for us. If you're creative and passionate about developing cloud services, we want to hear from you!