Software Engineer, Site Reliability Engineering - Observability

29 Sep 2024
Washington, USA

Vacancy expired!

Job Description

The Observability team ingests and serves petabytes of data from all services and systems across Twitter’s entire infrastructure. This data is mission-critical for Twitter’s production services and includes system-level and service-level metrics, which is where you’ll focus. As a Site Reliability Engineer embedded on the Observability team, you’ll bring the SRE discipline and perspective to the priorities and challenges we face.

What you’ll be doing:

- Build tooling to automate operations and reduce toil. This includes automatic failure remediation, application and systems deployment, capacity planning, and fleet management.

- Troubleshoot complex distributed systems handling millions of queries per second and petabytes of data.

- Collaborate with Software Engineering teams. Bring the SRE mindset for Availability, Reliability, Scalability, Disaster Recovery, Problem/Incident Management, and Performance of production services.

- Help bring our service to more data centers and cloud environments faster with reliable automation, Docker + Kubernetes, and other ideas you’ve got!

- Identify and contribute to solutions for reducing service outages, reducing alert noise, improving monitoring, and helping our services reach Service Level Objectives (SLOs).

- Participate in the team’s Scrums and on-call rotation.

- Work with highly distributed and diverse hardware, software, and networking teams throughout the company.

Qualifications

- 3+ years of developing or managing services in a distributed, internet-scale, production environment.

- Practical knowledge of at least one programming language (Python, Go, Java, Ruby, C, Scala).

- Demonstrable knowledge of Linux operating system internals, TCP/IP, filesystems, and disk/storage technologies.

- Experience with state configuration tools (Puppet, Chef, etc.).

- Experience setting up capacity plans for physical and/or virtual infrastructure.

- Ability to prioritize tasks and work independently. A self-starter.

- Good written and oral communication skills, to help create clarity when working across multiple services and stakeholders.

- Bonus: Hands-on experience with Observability systems, including metrics generation, monitoring, alerting, and dashboards for viewing and managing this data.

Additional Information

All of your information will be kept confidential according to EEO guidelines.

Job Details

  • ID
    JC4923922
  • State
  • City
  • Job type
    Full-time
  • Salary
    N/A
  • Hiring Company
    Twitter
  • Date
    2020-09-29
  • Deadline
    2020-11-28
  • Category
