
Staff Software Engineer, Machine Learning Infrastructure, Google Cloud

Google

Bangalore, India

Minimum qualifications:

  • Bachelor's degree in Computer Science or a related technical field or equivalent practical experience.
  • 8 years of experience with software development in one or more programming languages, and with data structures/algorithms.
  • 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
  • 5 years of experience with machine learning algorithms and tools (e.g., TensorFlow), artificial intelligence, deep learning or natural language processing.

Preferred qualifications:

  • Experience with Kubernetes, Google Kubernetes Engine, GPU Programming, TensorFlow, and Cloud.
  • Experience analyzing ML model performance, or with prompting, training, or developing LLMs.
  • Experience with and knowledge of CPU/GPU architecture or hardware accelerators.
  • Ability to quickly adapt to new tools, frameworks, and languages.


About the job

Google Cloud's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google Cloud's needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. You will anticipate our customers' needs and be empowered to act like an owner, take action and innovate. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

In this role, you will join a team that's part of Google's Core ML organization, focused on optimizing Google's Machine Learning resources. You will help develop monitoring tools and dashboards to track the performance and efficiency of TPUs and GPUs, which are used across all Google products. This data helps to improve resource allocation, identify areas for improvement, and drive efficiency gains across Google's products.

Google Cloud accelerates every organization's ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google's cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Responsibilities

  • Design, implement and advance the telemetry capabilities needed for monitoring and evaluating the fleet-wide efficiency of ML resources (TPUs and GPUs). This includes identifying the right underlying signals, devising the right high-level metrics of interest, and creating common dashboards for highlighting fleet-wide performance and efficiency.
  • Identify opportunities to improve the efficiency of the ML fleet, and build the solutions and capabilities needed to realize them.
  • Build reporting and analytic solutions with key partners, and provide in-depth analysis of the metrics to improve the operation and utilization of ML resources.
  • Drive collaboration with teams across different product areas (PAs) as needed to accomplish the efficiency improvement goals.
  • Lead junior software engineers (SWEs) in delivering project goals.

Client-provided location(s): Bengaluru, Karnataka, India
Job ID: Google-136796127029535430
Employment Type: Other

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • FSA
    • HSA
    • Fitness Subsidies
    • On-Site Gym
    • Mental Health Benefits
    • Health Reimbursement Account
    • HSA With Employer Contribution
  • Parental Benefits

    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • Fertility Benefits
    • Adoption Assistance Program
    • Family Support Resources
    • Adoption Leave
  • Work Flexibility

    • Hybrid Work Opportunities
  • Office Life and Perks

    • Commuter Benefits Program
    • Casual Dress
    • Pet-friendly Office
    • Snacks
    • Some Meals Provided
    • On-Site Cafeteria
  • Vacation and Time Off

    • Paid Vacation
    • Paid Holidays
    • Personal/Sick Days
    • Leave of Absence
    • Volunteer Time Off
  • Financial and Retirement

    • 401(K) With Company Matching
    • Company Equity
    • Performance Bonus
    • Financial Counseling
  • Professional Development

    • Tuition Reimbursement
    • Internship Program
    • Learning and Development Stipend
  • Diversity and Inclusion

    • Employee Resource Groups (ERG)
