We are seeking senior engineers with a passion for performance analysis and optimization to join our team in advancing groundbreaking technologies for deep learning compilers and automated kernel generation. At NVIDIA, you will collaborate across the full hardware/software stack, from GPU architecture to deep learning frameworks, to push the boundaries of AI performance. This role provides an outstanding opportunity to craft both hardware and software roadmaps at a company that is at the forefront of the AI revolution. You will work alongside world-class engineers to implement innovative deep learning models and optimize end-to-end performance for NVIDIA's DL software and hardware ecosystem. You'll have the chance to work on powerful, enterprise-grade GPU clusters delivering hundreds of PetaFLOPS, and gain access to unreleased hardware that is shaping the future of AI.
What you'll be doing:
- Profile, analyze, and optimize the performance of deep learning models and workloads on groundbreaking hardware and software platforms.
- Develop tooling for profiling and microbenchmarking DL workloads that run compiled models, uncovering optimization opportunities.
- Collaborate with teams across NVIDIA to provide performance insights and recommendations that improve the design and efficiency of DL frameworks and workloads.
- Own the development and implementation of standard methodologies for compiling, testing, and deploying high-performance deep learning models.
- Conduct performance benchmarking on enterprise-grade GPU clusters and pre-release hardware, driving improvements to NVIDIA's DL software stack and hardware roadmap.
What we need to see:
- 5+ years of experience in deep learning model implementation, software development, and performance optimization.
- BSc, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Mathematics, Physics, or a related technical field, or equivalent practical experience.
- Proficiency in Python, with extensive hands-on experience using at least one major deep learning framework (e.g., PyTorch, TensorFlow, JAX).
- Strong problem-solving and analytical skills, with a proven track record in debugging, performance tuning, and workload optimization.
- Experience with deep learning compilers (e.g., PyTorch's torch.compile, XLA, or similar technologies).
Ways to stand out from the crowd:
- Experience running large-scale workloads on HPC clusters.
- Knowledge of and passion for DevOps/MLOps practices for developing deep learning-based products.
- Solid understanding of Linux environments and containerization technologies such as Docker.
- Familiarity with GPU programming or parallel computing.
NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most hard-working and forward-thinking people in the world working for us. If you're creative and autonomous, we want to hear from you! We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
#deeplearning