Responsibilities
The Global E-commerce SRE team of US Tech Services works with engineering and product teams to build and run large-scale, globally distributed, observable, fault-tolerant systems. As an SRE, you will deliver on production ownership and be responsible for observability and automation across complex, large-scale service mesh architectures.
- Own the service level of a critical, revenue-generating E-commerce platform as well as all supporting infrastructure and services. This role will focus on service reliability, highly scalable design, and release management in a cloud-native environment.
- Define service level indicators and data-driven objectives to uphold and improve uptime, latency, and system health of a core TikTok production platform.
- Collaborate across teams with engineering and product to ensure that key activities (such as capacity planning and launch reviews) are performed to enable transparent service delivery to customers.
- Build automation geared towards infrastructure-as-code, scalability, and service resiliency.
- Implement SRE practices around incident management and post-mortems while participating in on-call rotations.
Qualifications
Minimum Qualifications
- Good understanding of Unix/Linux operating system internals and networking
- Experience writing code in Java, Go, Python or a similar language
- Experience with algorithms, data structures, complexity analysis and software design
- Experience developing tools and APIs to reduce manual interaction with systems and applications using a variety of coding and scripting standards
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive
Preferred Qualifications
- Experience with running production-grade web services at scale in a cloud-native environment.
- Experience with implementing observability solutions such as monitoring, logging and tracing in complex service meshes.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems