Introduction
Are you interested in multimodal large language models (LLMs) that combine vision, speech, and language? Are you passionate about developing systems that make a real-world impact? Would you enjoy publishing your work in the most prestigious AI conferences in the world, and making your code open source? If you answered yes to these questions, then you should apply to our summer internship position at IBM. We are seeking highly motivated students with a background in multimodal LLMs to join our team.
Your Role and Responsibilities
This is a 2025 summer internship that runs from May to August, or from June to September for students at quarter-system schools.
As an intern, you will be responsible for conducting cutting-edge research and development on large language and multimodal models for exciting enterprise use cases. In this role, you are expected to develop high-quality software to support novel AI model architectures and new techniques for cross-modal synthetic data generation, push the frontiers of vision and/or speech understanding, and develop novel approaches for aligning modalities to large language models, among other possible projects.
Required Technical and Professional Expertise
- Candidates must be enrolled in a Master's or a PhD program
- Hands-on experience with multimodal LLMs, computer vision, or speech processing
- Solid knowledge of statistical inference and of self-attention models such as Transformers and Conformers
- Strong programming skills and knowledge of deep learning libraries
- Great problem-solving skills, with a strong desire for quality and engineering excellence
Preferred Technical and Professional Expertise
- Publications in top-tier AI conferences