CO Salary Range: USD 138,000.00 per year

About the Team

We live and breathe big data. On a daily basis, we ingest and extract useful information from hundreds of live TV channels, and we collect, analyze and report on information from millions of TVs. Today, we operate at a massive scale with over 23 million devices, leveraging modern architecture, design and technologies. As with any organization that has grown organically and significantly over time, there is a lot to manage and an appetite for tech modernization. This means you will have the opportunity to propose, design and influence our current stack while helping to reimagine how we approach specific operational challenges such as administration, monitoring, logging, configuration management, and automation.

What You Will Do

We are actively seeking an experienced and highly skilled Staff Site Reliability Engineer (SRE) to join our dynamic team. As a key player on the Operations team, the ideal candidate will demonstrate senior-level proficiency in leveraging cloud technologies to ensure the reliability, scalability and performance of our platform. In this role, you will play a crucial part in designing, building and reviewing key SRE metrics, managing on-call responsibilities and exceeding expectations for platform availability and incident response. You are graceful under pressure and ready to jump in when needed.
You should have a foundational understanding of end-to-end architecture, including modern microservice-based architectures and pipelines. You've been in the trenches designing, building, instrumenting and observing highly scalable and resilient modern applications, from code to backend systems. Leveraging the golden signals as key indicators of system health and performance, you will establish and drive SLOs/SLIs across cross-functional teams and all levels of the organization.
You'll draw on your ability to influence others, bringing site-reliability best practices to cross-functional teams and gaining their buy-in and support. Your responsibility will be to help anticipate every postmortem question about "whose job was that?" or "why don't we have this operational capability?"
You should exhibit senior-level expertise in estimating, building and deploying large-scale systems in a cloud environment, showing a deep understanding of cloud architecture and infrastructure. This requires extensive experience in designing, implementing and managing complex AWS solutions, leveraging services such as Route53, SQS, EC2, S3, and EKS.

About You

You have excellent judgment and have made many critical decisions by collecting information, asking questions, and weighing various trade-offs.

Exceptional communication skills are a must, as the role involves conveying insights effectively to ensure clear understanding and alignment with both team members and stakeholders. You are an excellent active listener with humility and empathy. You naturally build relationships with your team and peers. You often debate the other side of issues and try on other people's perspectives so that you can learn from their experiences and values.

You are comfortable coaching others, including remote teams. You embrace asynchronous written communication and work hard to help your team achieve an ideal flow with deep and focused work. You lean into difficult conversations to resolve a situation and do what is right. You understand the value of direct, transparent, and respectful feedback. You have a history of fostering positive relationships across departments, enhancing teamwork and driving project objectives.

Above all else, you take ownership of the systems under your care. You strive to understand their inner workings and how they come together as a whole to deliver a business result. You take equal responsibility for understanding things both old and new, because our customers don't care whether a system is old or new. You strive to move constantly forward in an incremental fashion towards a future state, and you work to make it possible every day.

Cloud Platforms:

5+ years of proficiency in AWS or GCP and modern pipelining technologies and approaches.

Containerization and Orchestration:

3+ years of experience designing, deploying and monitoring containerization technologies like Docker and container orchestration tools such as Kubernetes.

Systems / Infrastructure as Code (IaC):

2+ years of hands-on experience with IaC tools, such as Terraform or CloudFormation.

Monitoring and Logging:

4+ years of expertise in implementing and managing observability platforms and monitoring tools (New Relic, Grafana, Prometheus) that feed into SLOs/SLIs, and logging solutions like ELK (Elasticsearch, Logstash, Kibana) or Splunk.

Automation:

3+ years of hands-on experience with scripting languages such as Python or Bash and configuration management tools like Salt, Ansible or Chef.

CI/CD:

1+ years of hands-on experience with CI/CD pipelines and tools like Jenkins.

Reliability and Performance:

3+ years of designing and implementing highly reliable, scalable and available systems, with attention to system optimization, performance and resource utilization.

Incident Response:

3+ years of primary incident management and on-call support, using incident response procedures, tools such as PagerDuty and related best practices.

Collaboration and Communication:

You possess a knack for fostering professional growth and knowledge-sharing, with a proven ability to guide and empower team members, contributing to a collaborative and skill-enhancing work environment.

Documentation:

Proficient in creating and maintaining clear and comprehensive documentation.

Problem-Solving:

You strive to understand the problem you are trying to solve before deciding on the solution, and you are thoughtful and methodical in its implementation rather than jumping to the next tool.
Ability to troubleshoot complex issues in distributed systems.

About VIZIO

We are Beautifully Simple.

Headquartered in Irvine, California, VIZIO is a leading HDTV brand in America and the #1 Sound Bar Brand in America. VIZIO's mission is to deliver high-performance, smarter products with the latest innovations at significant savings that we can pass along to our consumers. Our loyal following and industry-wide praise continue to grow as we redefine what it means to be smart.

VIZIO, Inc. is an Equal Opportunity Employer committed to diversity in the workplace. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, gender expression, national origin, protected veteran status, or any other basis protected by applicable law, and will not be discriminated against on the basis of disability.

We do not accept unsolicited agency resumes. We will not pay fees to any third-party agency, outside recruiter or firm without a mutually agreed-upon contract and will not be responsible for any agency fees associated with unsolicited resumes. Unsolicited resumes will be considered our property and will be processed accordingly.

For Colorado-based employment: The minimum salary for this position is $138,000.00/year. In addition to base salary, the compensation package also includes eligibility for an annual bonus, as well as equity and a range of medical, dental, vision and other benefits.

Staff Site Reliability Engineer
