Software Engineer, Site Reliability Engineering - Observability

29 Sep 2024

Washington, Uswa

Vacancy expired!

Job Description

The Observability team ingests and serves petabytes of data from all services and systems across Twitter’s entire infrastructure. This data is mission-critical for Twitter’s production services and includes system- and service-level metrics, which is where you’ll focus. As a Site Reliability Engineer embedded on the Observability team, you’ll bring the SRE discipline and perspective to the priorities and challenges we face.

What you’ll be doing:

- Build tooling to automate operations and reduce toil. This includes automatic failure remediation, application and systems deployment, capacity planning, and fleet management.

- Troubleshoot complex distributed systems handling millions of queries per second and petabytes of data.

- Collaborate with Software Engineering teams. Bring the SRE mindset to the Availability, Reliability, Scalability, Disaster Recovery, Problem/Incident Management, and Performance of production services.

- Help bring our service to more data centers and cloud environments faster with reliable automation, Docker + Kubernetes, and other ideas you’ve got!

- Identify and contribute to solutions for reducing service outages, reducing alert noise, improving monitoring, and helping our services reach Service Level Objectives (SLOs).

- Participate in the team’s Scrums and on-call rotation.

- Work with highly distributed and diverse hardware, software, and networking teams throughout the company.

Qualifications

- 3+ years of experience developing or managing services in a distributed, internet-scale production environment.

- Practical knowledge of at least one programming language (Python, Go, Java, Ruby, C, Scala).

- Demonstrable knowledge of Linux operating system internals, TCP/IP, filesystems, and disk/storage technologies.

- Experience with state configuration tools (Puppet, Chef, etc.).

- Experience setting up capacity plans for physical and/or virtual infrastructure.

- Ability to prioritize tasks and work independently. A self-starter.

- Good written and oral communication skills, to help create clarity when working across multiple services and stakeholders.

- Bonus: Hands-on experience with Observability systems, including metrics generation, monitoring, alerting, and dashboards for viewing/managing this data.

Additional Information

All of your information will be kept confidential according to EEO guidelines.

Job Details

ID: JC4923922
State: Washington
City: Uswa
Job type: Full-time
Salary: N/A
Hiring Company: Twitter
Date: 2020-09-29
Deadline: 2020-11-28
Category: Et cetera