Chief HPC Engineer

AT EPAM Systems

Barra do Garças, Brazil

We are actively seeking a seasoned Chief HPC Engineer to oversee daily operations and engineering tasks within our HPC framework.
The ideal candidate has strong engineering skills and extensive experience building and improving HPC infrastructure. You will work alongside our L3 HPC infrastructure engineering team to support use of the HPC cluster by our scientific research department. Preference will be given to candidates based in India, but the role is open to applicants from all locations.

Responsibilities

  • Maintain and support the HPC infrastructure
  • Implement infrastructure automation using Infrastructure as Code (IaC)
  • Engage in both software and hardware enhancements and address incidents
  • Oversee job scheduling and the allocation of resources using HPC job schedulers
  • Set up and configure Bright Cluster Manager
  • Maintain and optimize GPFS/Lustre file systems
  • Manage the configuration of InfiniBand/Omni-Path network interconnects
Requirements

  • Over 10 years as a technical specialist in HPC
  • Engineering background or development experience in HPC systems
  • Capability to set up and support HPC infrastructures
  • Proficiency with Linux (rpm-based), including compiling kernel modules, analyzing core dumps, and using debugging tools like strace and tcpdump
  • Expertise in management of HPC job schedulers like IBM LSF and Slurm
  • Experience installing and configuring Bright Cluster Manager
  • Demonstrated knowledge of GPFS and Lustre file systems
  • Understanding of InfiniBand and Omni-Path network interconnects
Nice to have
  • Understanding of hardware diagnostics, upgrades, and performance tuning, including InfiniBand HCAs and storage systems such as Lustre, VAST, and IBM
  • Expertise in infrastructure monitoring through tools like Zabbix, Splunk, or Grafana
  • Familiarity with EasyBuild
  • Background in a GxP regulated environment
  • Willingness to work with Jira and ServiceNow

Client-provided location(s): Brazil
Job ID: EPAM-epamgdo_blt68854751e4fc23e5_en-us_Other_Brazil
Employment Type: Other