NVIDIA's Deep Learning Libraries Team is looking for excellent interns to help build the next wave of NVIDIA's highest-performing deep learning libraries, such as cuDNN, cuBLAS, and TensorRT. The mission is to design and develop scalable, modular software products that enable breakthroughs in problems ranging from image classification to speech recognition, natural language processing, and artificial intelligence. Join the team that is building the underlying software used across the world to power the revolution in artificial intelligence! We're always striving for peak GPU efficiency on current and future-generation GPUs. To get a sense of the code we write, check out our CUTLASS open-source project, which showcases performant matrix multiplication on NVIDIA's Tensor Cores with CUDA. This specific position primarily deals with code lower in the deep learning software stack, right down to the GPU hardware.

What you'll be doing:

In this role, you will be responsible for developing and delivering highly optimized deep learning products. The scope of these efforts ranges from defining public APIs to performance tuning and analysis, from building developer infrastructure to automating testing, and from joining architecture discussions to learning the latest technologies from the research community. During your internship, you will work on one or more of the activities below:

Writing highly tuned compute kernels, mostly in CUDA C++, to perform core deep learning operations (e.g. matrix multiplications, convolutions, normalizations)
Collaborating with teams across NVIDIA:
  CUDA compiler team on generating optimal assembly code
  Deep learning training and inference performance teams on which layers require optimization
  Hardware and architecture teams on the programming model for new deep learning hardware features
Developing robust, scalable GPU-accelerated deep learning libraries using C++ and object-oriented design
Building scalable automation for build, test, integration, and release processes for publicly distributed deep learning libraries
Maintaining and testing environments for new hardware, operating systems, and platforms using industry-standard tools (e.g. Kubernetes, Jenkins, Docker, CMake, GitLab, Jira)
Participating in a high-energy, dynamic company culture to develop state-of-the-art software and hardware products and to practice hardware-software co-design

What we need to see:

Pursuing a BS, MS, or PhD in Computer Science, Computer Engineering, or a similar field
Demonstrated strong C++ programming and software design skills, including debugging, problem solving, performance analysis, and test design
Experience with performance-oriented parallel programming, even if it's not on GPUs (e.g. with OpenMP or pthreads)
Alternatively, experience with source control management (SCM) tools (e.g. Git, Perforce) and build systems (e.g. Make, CMake, Bazel)
Passion for "it just works" automation and enabling team members

Ways to stand out from the crowd:

Experience optimizing or tuning BLAS or deep learning library kernel code
Knowledge of CUDA/OpenCL GPU programming
Background in numerical methods and linear algebra
Familiarity with LLVM, TVM tensor expressions, or TensorFlow MLIR
Experience with code coverage and static code analysis tools

This is an opportunity to have a wide impact at NVIDIA by improving development velocity across our many compute software projects. Are you creative, driven, and autonomous? Do you love a challenge? If so, we want to hear from you.

Deep Learning Software Engineering Intern, AI - 2025