
AI Infra Advisory Researcher

AT Lenovo

Beijing, China

Why Work at Lenovo

We are Lenovo. We do what we say. We own what we do. We WOW our customers.

Lenovo is a US$57 billion revenue global technology powerhouse, ranked #248 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world's largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo's continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).

To find out more, visit www.lenovo.com and read about the latest news via our StoryHub.

Description and Requirements

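Responsibilities:

1. Design a highly available fault-tolerance system for large-model training that supports pre-training of models at the hundred-billion-parameter scale.

2. Optimize fault-tolerance checkpointing for large-model training, improving checkpoint read/write and recovery performance.

3. Develop an elastic training framework for large models.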

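Requirements:

1. Full-time master's degree or above in computer science and technology, artificial intelligence, or a related field.

2. Proficient in C++/Python, data structures, and computer architecture, with experience in AI model performance tuning and strong engineering implementation skills.

3. Familiar with common distributed training techniques in AI, including but not limited to data parallelism, pipeline parallelism, and tensor parallelism, with relevant project experience.

4. Familiar with at least one AI framework (PyTorch/TensorFlow/Paddle/DeepSpeed, etc.) and able to use and debug it proficiently.

5. Familiar with GPU hardware architecture and CUDA computing principles, with experience developing and debugging CUDA operators and some understanding of NCCL/cuDNN and similar libraries.

6. Good understanding of large-scale pre-trained models, including the architectures, training methods, and optimization techniques of common pre-trained models (such as GPT and BERT).

7. Excellent problem-solving skills and innovative thinking; able to analyze and solve complex training problems and propose improvements and optimizations.

8. Strong teamwork skills; able to collaborate closely with cross-functional teams to drive projects to success.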

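Preferred qualifications:

1. Experience in large-model R&D and distributed training.

2. Familiarity with Kubernetes architecture and fault-tolerance systems for large-model training.

3. High-quality publications in the AI or HPC field.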

Additional Locations:
* China - Beijing - 北京(Beijing)

Client-provided location(s): Beijing, China
Job ID: Lenovo-WD00067560
Employment Type: Full Time