World Wide Technology

HPC Engineer

Gurgaon, India

Position Overview- We are looking for a skilled High-Performance Computing (HPC) Infrastructure Engineer to join our dynamic team. As an HPC Infrastructure Engineer, you will be responsible for designing, implementing, and maintaining HPC-based infrastructure. You will play a crucial role in ensuring the optimal performance, reliability, and scalability of our deep learning and AI infrastructure.

Key Responsibilities-

  • Designing, deploying, and managing HPC systems and related infrastructure.
  • Configuring and optimizing DGX clusters for performance, reliability, and scalability.
  • Collaborating with data scientists, AI engineers, and IT teams to integrate HPC systems into our overall AI and deep learning workflows.
  • Monitoring system performance and implementing proactive measures to maintain optimal operation.
  • Troubleshooting and resolving issues related to HPC systems, including hardware, software, and network components.
  • Implementing security measures and best practices to ensure the integrity and confidentiality of DGX-based data and workflows.
  • Documenting infrastructure configurations, processes, and procedures.
  • Providing technical guidance and training to team members on HPC-related technologies and best practices.
  • Staying current with HPC hardware and software advancements from OEMs and recommending upgrades or enhancements as needed.

Qualifications-

  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field. (Master's degree preferred)
  • 5 to 8 years of experience in the HPC domain.
  • Proven experience in designing, deploying, and managing NVIDIA DGX and/or HPC systems in production environments.
  • Strong understanding of AI and deep learning frameworks and their integration with HPC systems.
  • Proficiency in scripting languages such as Python for automation and configuration management.
  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes) in conjunction with HPC systems.
  • Knowledge of storage solutions (e.g., NFS, Ceph) and their integration with HPC clusters.
  • Familiarity with networking principles, protocols, and configurations related to HPC infrastructure.
  • Excellent troubleshooting and problem-solving skills.
  • Ability to work independently and collaboratively in a team environment.
  • Effective communication skills, both verbal and written.

Client-provided location(s): Gurugram, Haryana, India; Bengaluru, Karnataka, India; Hyderabad, Telangana, India; Mumbai, Maharashtra, India; Pune, Maharashtra, India
Job ID: world_wide_technology-5001064591500
Employment Type: Other