Staff Site Reliability Engineer

at The Hartford

Charlotte, NC

Staff Reliability Engineer - IE07KE

We're determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals and to help others accomplish theirs, too. Join our team as we help shape the future.

The Hartford's CARE - RE&A Organization is seeking an experienced and highly motivated Staff Reliability Engineer to lead infrastructure engineering initiatives, drive AI-powered automation, and integrate Generative AI (GenAI) into reliability engineering.

This role will have end-to-end accountability for the reliability of IT services within a defined application portfolio and for building scalable, self-healing infrastructure by leveraging cloud-native architectures, predictive analytics, and AI-driven automation. The engineer will design and implement AI-powered observability solutions, intelligent incident response, and automated remediation strategies to proactively prevent failures and enhance service resilience.

Successful candidates will have expertise in infrastructure engineering, software reliability, and AI-driven automation while demonstrating strong problem-solving skills and leadership in cross-functional, AI-powered site reliability engineering (SRE) initiatives.

Responsibilities:

Guide the use of best-in-class software engineering standards and design practices for instrumenting the code and application technology stack so that it generates relevant metrics on overall technology health: availability, performance, quality, currency, and resiliency. Serve as the key liaison between the architecture and software engineering teams to influence the organization's technical strategy, keeping in mind cross-functional impacts, integration across the organization, and architecture rationalization.

Function as the go-to technical expert for the supported applications, with depth and breadth of knowledge across technologies, applications, integrations, interfaces, and the business domain.

IT Ops Responsibilities:

  • Ensure operational excellence. Independently drive triage and service restoration for all high-impact incidents to minimize the mean time to restore service and the impact to the business. Demonstrate end-to-end ownership.
  • Partner with infrastructure teams to design and implement intelligent incident routing, enhanced monitoring and alerting capabilities, and automated service restoration processes. Take proactive measures to prevent high-impact incidents.
  • Architect, build, and maintain highly available, scalable, and fault-tolerant infrastructure in cloud environments (AWS, GCP, Azure).
  • Implement observability solutions using tools such as Splunk, Dynatrace, CloudWatch, Prometheus, Grafana, and OpenTelemetry to enhance visibility into system health.
  • Lead capacity planning, performance tuning, and incident response processes across distributed cloud-native architectures.
  • Develop self-healing mechanisms using AI/ML models to predict and mitigate infrastructure failures before they impact production.

DevSecOps Solution Responsibilities:

  • Develop effective tooling, alerts, and response mechanisms to identify and address reliability risks leveraging automation to support problem prevention, detection, mitigation, and resolution.
  • Progressively implement preventative controls and drive increased automation and self-healing capabilities. Continue to improve cost efficiency baselines.
  • Design and develop infrastructure as code (IaC) solutions using Terraform, CloudFormation, and CDK.
  • Implement CI/CD pipelines and enforce DevSecOps best practices for secure, compliant, and scalable deployments.
  • Promote and implement innovative solutions.
  • Automate infrastructure provisioning, configuration management, and remediation workflows using Python or Bash scripting.

Generative AI & Intelligent Operations:

  • Integrate Generative AI models into infrastructure operations to enhance incident detection, root cause analysis, and automated remediation.
  • Develop AI-powered chatbots or copilots to assist with troubleshooting, log analysis, and predictive maintenance.
  • Use large language models (LLMs) and vector databases for intelligent automation in site reliability workflows.
  • Research and implement AI-driven anomaly detection to proactively identify risks and performance bottlenecks.

Qualifications:

  • 10+ years of experience in Infrastructure Engineering, Site Reliability Engineering (SRE), or DevOps.
  • Bachelor's degree in Computer Science, Information Technology Management, or a related field, or equivalent work experience
  • Ability to interact with diverse technical and non-technical groups in a matrix organization
  • Solid understanding of SAFe Agile methodologies
  • Familiarity with programming languages (Python, Java or JavaScript/Node.js)
  • Expertise with cloud platforms such as AWS and with microservices architecture
  • Hands-on experience with observability tools such as Dynatrace, Splunk, CloudWatch, and CloudTrail
  • Hands-on experience with continuous integration and DevOps methodologies and tools such as GitHub, Jenkins, and Nexus
  • Hands-on application development and production support experience is a plus
  • Hands-on experience with AI/ML frameworks, including Generative AI models for infrastructure automation.
  • Experience with AI-driven reliability engineering solutions is a strong plus.
  • Ability to develop, manage, and communicate frameworks, e.g., those of the Cloud Security Alliance
  • Solid understanding of technologies that support the services offered for cloud applications
  • Excellent analytical and problem-solving skills
  • Must have exceptional communication skills (written, oral, presentation and facilitation)

This role has a hybrid work schedule, with the expectation of working in an office (Columbus, OH; Chicago, IL; Hartford, CT; or Charlotte, NC) 3 days a week (Tuesday through Thursday).

Candidates must be authorized to work in the US without company sponsorship. The company will not support the STEM OPT I-983 Training Plan endorsement for this position.

Compensation

The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford's total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. The annualized base pay range for this role is:

$126,160 - $189,240

Equal Opportunity Employer/Females/Minorities/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age

Client-provided location(s): Charlotte, NC, USA
Job ID: hartford-R2520221_Charlotte
Employment Type: Full Time

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Health Reimbursement Account
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • On-Site Gym
    • Mental Health Benefits
    • Virtual Fitness Classes
    • Fitness Subsidies
    • FSA
    • HSA
  • Parental Benefits

    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • Fertility Benefits
    • Adoption Assistance Program
    • Family Support Resources
    • Adoption Leave
  • Work Flexibility

    • Hybrid Work Opportunities
    • Remote Work Opportunities
    • Flexible Work Hours
  • Office Life and Perks

    • Commuter Benefits Program
    • Casual Dress
    • On-Site Cafeteria
    • Company Outings
    • Holiday Events
  • Vacation and Time Off

    • Paid Vacation
    • Paid Holidays
    • Volunteer Time Off
    • Personal/Sick Days
  • Financial and Retirement

    • 401(K) With Company Matching
    • Stock Purchase Program
    • Performance Bonus
    • Relocation Assistance
    • Financial Counseling
    • Profit Sharing
  • Professional Development

    • Internship Program
    • Leadership Training Program
    • Associate or Rotational Training Program
    • Tuition Reimbursement
    • Promote From Within
    • Mentor Program
    • Shadowing Opportunities
    • Access to Online Courses
    • Lunch and Learns
    • Learning and Development Stipend
  • Diversity and Inclusion

    • Employee Resource Groups (ERG)
    • Diversity, Equity, and Inclusion Program