NVIDIA has continuously reinvented itself. Our invention of the GPU sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. Today, research in artificial intelligence is booming worldwide, which calls for the highly scalable and massively parallel computation horsepower at which NVIDIA GPUs excel.
We are looking for architects who will be responsible for various aspects of architecture performance for NVIDIA's next-generation AI server systems. This role requires familiarity with hardware architecture, workloads (AI training, AI inference, and/or HPC), and the software stack. The candidate should be comfortable working on silicon as well as with high-level models and simulators. The role is not expected to have a significant modeling (functional or performance) component, but some small, focused modeling tasks may come up infrequently.
What you'll be doing:
- Analyze workloads of interest on existing silicon (NVIDIA server platforms). Workloads of interest include any server workload, with emphasis on distributed training, inference, and HPC applications.
- Collaborate with cross-functional teams to define performance metrics and benchmarks for various applications.
- Conduct thorough performance evaluations, identify bottlenecks, and propose effective solutions using the appropriate platforms and tools.
- Use the insights derived from these analyses to motivate architecture features that help the system architecture meet performance and scalability goals.
- Work closely with software and hardware teams to influence design decisions that impact system performance.
- Act as a subject matter expert on system performance issues, providing guidance and support to the broader engineering team.
What we need to see:
- BE/BTech or MS/MTech in the relevant area, PhD is a plus.
- 10+ years of relevant hardware architecture experience in one or more domains such as CPU, GPU, caches, memory subsystems, PCIe, etc.
- Experience in one or more application domains such as deep learning training and/or inference, high-performance computing, cloud computing, etc.
- Familiarity with performance tools and methodologies. Experience in developing and/or working with performance simulators will be a big plus.
- Programming experience in C/C++/Python.
- Ability to work with potentially large and unfamiliar software repositories.
- Experience in high-performance networking solutions such as InfiniBand, RoCE, etc. Familiarity with the workings of communication libraries such as MPI, UCX, etc. will be a big plus.
- Familiarity with CUDA and the GPU programming model.
NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, autonomous, and love a challenge, we want to hear from you. Come join us and help build the scalable, cost-effective computing platforms driving our success in this exciting and quickly growing field.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.