Introduction
At IBM Research, we invent things that matter to the world. Today, we are pioneering the most promising and disruptive technologies that will transform industries and society, including the future of AI, Hybrid Cloud, and Quantum Computing. We are driven to discover. With more than 3,000 researchers in 12 labs located across six continents, IBM Research is one of the world's largest and most influential corporate research labs.
Your Role and Responsibilities
We are seeking a candidate with proven interest and experience in implementing innovative solutions for resilient and robust computing environments, with a focus on IBM's AI technology initiatives. Keeping this AI infrastructure running smoothly and efficiently requires continuous monitoring of its performance and health, because effective monitoring helps identify and address issues before they impact operations.
Required Technical and Professional Expertise
- 3-5 years of experience with Prometheus, Grafana, Thanos, and the various exporters that feed a comprehensive set of monitoring data about the environment.
- Experience optimizing and integrating monitoring tools to enhance the performance and reliability of AI infrastructure.
- Excellent problem-solving skills, with a strong commitment to quality and engineering excellence.
Preferred Technical and Professional Expertise
- Strong team player with excellent verbal and written communication skills.