Key Responsibilities-
- Working on-site at the client office in Bangalore, five days a week.
- Designing, deploying, and managing HPC systems and related infrastructure.
- Configuring and optimizing DGX clusters for performance, reliability, and scalability.
- Collaborating with data scientists, AI engineers, and IT teams to integrate HPC systems into our overall AI and deep learning workflows.
- Monitoring system performance and implementing proactive measures to maintain optimal operation.
- Troubleshooting and resolving issues related to HPC systems, including hardware, software, and network components.
- Implementing security measures and best practices to ensure the integrity and confidentiality of DGX-based data and workflows.
- Documenting infrastructure configurations, processes, and procedures.
- Providing technical guidance and training to team members on HPC-related technologies and best practices.
- Staying current with hardware and software advancements from HPC OEMs and recommending upgrades or enhancements as needed.
Qualifications-
- Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field (Master's degree preferred).
- 5 to 8 years of experience in the HPC domain.
- Proven experience in designing, deploying, and managing NVIDIA DGX and/or HPC systems in production environments.
- Strong understanding of AI and deep learning frameworks and their integration with HPC systems.
- Proficiency in scripting languages such as Python for automation and configuration management.
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes) in conjunction with HPC systems.
- Knowledge of storage solutions (e.g., NFS, Ceph) and their integration with HPC clusters.
- Familiarity with networking principles, protocols, and configurations related to HPC infrastructure.
- Excellent troubleshooting and problem-solving skills.
- Ability to work independently and collaboratively in a team environment.
- Effective communication skills, both verbal and written.