Lead HPC Engineer

AT EPAM Systems

Barra do Garças, Brazil

We are looking for a Lead HPC Engineer to manage daily operations and engineering tasks within our HPC environment.
The ideal candidate is a proficient engineer with extensive experience in setting up and improving HPC infrastructure. The role involves working with our Level-3 HPC infrastructure engineering team to help our scientific research team make use of an HPC cluster. Candidates from India are preferred, but the role is open to candidates from any location.

Responsibilities

  • Maintenance and support of the HPC infrastructure
  • Employing Infrastructure as Code (IaC) for infrastructure automation
  • Incident resolution and involvement in software and hardware upgrades
  • Administration of job scheduling and resource management using HPC job schedulers
  • Installation and configuration of Bright Cluster Manager
  • Optimization and maintenance of GPFS/Lustre file systems
  • Supervision of configurations for InfiniBand/Omni-Path network interconnects

Requirements

  • Minimum of 7 years as an HPC technical expert
  • Knowledge of engineering or HPC system development
  • Expertise in supporting and setting up HPC infrastructure
  • Proficiency in Linux (any RPM-based distribution), including compiling kernel modules and using debugging tools such as strace, core dump analysis, and tcpdump
  • Background in managing HPC job schedulers such as IBM LSF and Slurm
  • Qualifications in configuring and implementing Bright Cluster Manager
  • Understanding of both GPFS and Lustre file systems
  • Familiarity with InfiniBand and Omni-Path network interconnect technologies

Nice to have

  • Proficiency in hardware diagnostics, upgrades, and tuning, including InfiniBand HCAs and disk arrays from Lustre, VAST, and IBM
  • Capability to use infrastructure monitoring tools like Zabbix, Splunk, or Grafana
  • Understanding of EasyBuild
  • Background in working within a GxP environment
  • Familiarity with Jira and ServiceNow

Client-provided location(s): Brazil
Job ID: EPAM-epamgdo_blt3a19056b828fc795_en-us_Other_Brazil
Employment Type: Other