
Deep Learning Software Engineering Intern, AI - 2025

NVIDIA

Munich, Germany

NVIDIA's Deep Learning Libraries Team is looking for excellent interns to enable the next wave of NVIDIA's highest-performing deep learning libraries, such as cuDNN, cuBLAS, and TensorRT. The mission is to design and develop scalable, modular software products that enable breakthroughs in problems ranging from image classification to speech recognition to natural language processing and artificial intelligence. Join the team that is building the underlying software used across the world to power the revolution in artificial intelligence! We're always striving for peak efficiency on current and future-generation GPUs. To get a sense of the code we write, check out our CUTLASS open-source project, which showcases performant matrix multiplication on NVIDIA's Tensor Cores with CUDA. This specific position primarily deals with code lower in the deep learning software stack, right down to the GPU hardware.


What you'll be doing:

In this role, you will be responsible for developing and delivering highly optimized deep learning products. The scope of these efforts ranges from defining public APIs to performance tuning and analysis, from building developer infrastructure to test automation, and from joining architecture discussions to learning the latest technologies from the research community. During your internship, you will take on one or more of the activities below:

  • Writing highly tuned compute kernels, mostly in C++ CUDA, to perform core deep learning operations (e.g. matrix multiplies, convolutions, normalizations)
  • Collaborating with teams across NVIDIA:
      • CUDA compiler team on generating optimal assembly code
      • Deep learning training and inference performance teams on which layers require optimization
      • Hardware and architecture teams on the programming model for new deep learning hardware features
  • Developing robust and scalable GPU-accelerated deep learning libraries, using C++ and object-oriented design
  • Building scalable automation for build, test, integration, and release processes for publicly distributed deep learning libraries
  • Maintaining and testing environments for new hardware, new OSes, and platforms using industry-standard tools (e.g. Kubernetes, Jenkins, Docker, CMake, GitLab, Jira)
  • Participating in a high-energy and dynamic company culture to develop state-of-the-art software and hardware products and practice hardware-software co-design

What we need to see:

  • Pursuing a BS, MS, or PhD in Computer Science, Computer Engineering, or a similar field
  • Demonstrated strong C++ programming and software design skills, including debugging, problem solving, performance analysis, and test design
  • Experience with performance-oriented parallel programming, even if it's not on GPUs (e.g. with OpenMP or pthreads)
  • Alternatively, experience with source control management (e.g. Git, Perforce) and build systems (e.g. Make, CMake, Bazel)
  • Passion for "it just works" automation and enabling team members

Ways to stand out from the crowd:

  • Experience in optimizing/tuning BLAS or deep learning library kernel code
  • Knowledge of CUDA/OpenCL GPU programming
  • Numerical methods and linear algebra
  • LLVM, TVM tensor expressions, or TensorFlow MLIR
  • Experience with code coverage and static code analysis tools

This is an opportunity to have a wide impact at NVIDIA by improving development velocity across our many compute software projects. Are you creative, driven, and autonomous? Do you love a challenge? If so, we want to hear from you.

Client-provided location(s): Munich, Germany
Job ID: NVIDIA-JR1991038
Employment Type: Intern