Responsibilities:
- Optimize large-model training and inference services, covering performance, acceleration, dynamic scaling, and fault tolerance and stability
- Research how to implement known algorithms, principles, requirements, and solutions, and engineer them into production
- Track and improve open-source solutions used in the industry
- Work with internal teams to implement solutions on the planned schedule
- Support or collaborate with other teams to complete deployment to the online environment
- Support the production environment, including fault analysis and resolution
Qualifications
- Proficient in Python and PyTorch programming; Linux shell and Go are a plus
- Experienced with Linux, Docker, and Kubernetes
- Experience with machine learning, neural networks, deep learning, and model training
- Understanding of the Transformer model architecture and LLMs
- Experienced with distributed model training frameworks and the Hugging Face training library
- NLP experience is a plus
- Experience with NVIDIA GPUs, CUDA, NCCL, TensorRT, RDMA, RoCE, and high-throughput GPU clusters
- Experienced in cloud-native development, AWS, EKS, and EC2
- Understanding of acceleration techniques for LLM training and inference on GPUs and distributed GPU clusters, such as FlashAttention, PagedAttention, and continuous batching
- Experienced with inference engine solutions: prefill, decoding, quantization, speculative decoding, etc.
- Experienced in inference cluster management and dynamic scaling on Kubernetes
Ways of Working
Our structured hybrid approach is centered around our offices and remote work environments. The work style of each role (Hybrid, Remote, or In-Person) is indicated in the job description/posting.
Benefits
As part of our award-winning workplace culture and commitment to delivering happiness, our benefits program offers a variety of perks, benefits, and options to help employees maintain their physical, mental, emotional, and financial health; support work-life balance; and contribute to their community in meaningful ways. Click Learn for more information.
About Us
Zoomies help people stay connected so they can get more done together. We set out to build the best collaboration platform for the enterprise, and today help people communicate better with products like Zoom Contact Center, Zoom Phone, Zoom Events, Zoom Apps, Zoom Rooms, and Zoom Webinars.
We're problem-solvers, working at a fast pace to design solutions with our customers and users in mind. Here, you'll work across teams to deliver impactful projects that are changing the way people communicate and enjoy opportunities to advance your career in a diverse, inclusive environment.
Our Commitment
We believe that the unique contributions of all Zoomies are the driver of our success. To make sure that our products and culture continue to incorporate everyone's perspectives and experience, we never discriminate on the basis of race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status. Zoom is proud to be an equal opportunity workplace and is an affirmative action employer. All your information will be kept confidential according to EEO guidelines.
We welcome people of different backgrounds, experiences, abilities, and perspectives, including qualified applicants with arrest and conviction records and any qualified applicants requiring reasonable accommodations in accordance with the law.
If you need assistance navigating the interview process due to a medical disability, please submit an Accommodations Request Form and someone from our team will reach out soon. This form is solely for applicants who require an accommodation due to a qualifying medical disability. Non-accommodation-related requests, such as application follow-ups or technical issues, will not be addressed.