At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology.
Responsibilities:
- Responsible for assisting with all projects and repairs throughout the data center.
- Participate in an on-call rotation and provide hands-on coverage during maintenance.
- Direct and perform tasks related to solving operational issues within the data center
- Analyze and design operations that will improve workflow, handle equipment layout, and help ensure accident prevention
- Support operations, including the physical layout of equipment.
- Support customer deployments and ensure on-time bring-up of GPU servers.
- Perform InfiniBand fabric bring-up, configuration, and subnet management on the IB switch.
- Document existing operational processes and equipment.
- Utilize a framework for monitoring tools, escalating key issues, and ensuring timely service implementation.
- Diagnose, troubleshoot, install, and repair all software, hardware, and components.
- Install, perform basic configuration of, and troubleshoot networking equipment (routers and switches).
- Good understanding of the OSI Model and TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP, SMTP, FTP, TFTP)
- Configure Terminal Servers for out-of-band management
- Manage daily issues, including daily health checks of servers and processes, working closely with end-users, development teams, and Infrastructure teams to prioritize, resolve, and mitigate outages.
- Server installation and maintenance (rack and stack, label, HDD, memory, CPU, RAID batteries, NICs, etc.)
- Review design documentation and validate equipment deployment according to plans.
- Network installation and maintenance (rack and stack, label, cabling, parts replacement, etc.)
- Carry out site builds and refreshes while meeting current quality standards.
- Interact with onsite staff and vendors for hardware replacement, delivery, and diagnostics.
- Perform operational tasks associated with data center implementation, migration, deployments, cabling, and rack and stack.
Requirements:
- Experience with cluster bring-up, including driver loading.
- Experience with GPU end-to-end testing in a cluster with InfiniBand.
- Experience with setup of GPU servers in a cluster.
- Experience in Linux environments and proficiency in tasks such as shell scripting.
- Excellent data center organization skills and meticulous attention to detail.
- Familiarity with fiber and copper network cabling, including IP and SAN deployments.
- Responsible for maintaining acceptable ticket loads and incident SLAs.
- Follow documented escalation procedures.
- Sync with global teams on various tasks and upcoming initiatives.
- Understand and adhere to documented policies, processes, and procedures
- Assist with process improvement initiatives and documentation of policies, processes, and procedures, including runbooks.
- Able to move 50+ pounds
We're doing work that matters. Help us solve what others can't.