Senior site reliability engineer coordination and service discovery infrastructure job offer

Senior Site Reliability Engineer, Coordination and Service Discovery Infrastructure

29 Sep 2024

Washington, Uswa


Vacancy expired!

Job Description

The Coordination team develops and operates highly-available foundational services that are used by almost every engineer at Twitter. Our vision is to provide robust control plane infrastructure for Twitter that serves use cases such as distributed coordination, service discovery, topology management, and configuration management. We manage one of the world’s largest ZooKeeper deployments and are actively involved with the open source community! As a Site Reliability Engineer embedded on the Coordination & Service Discovery Infrastructure team, you’ll bring the SRE discipline and perspective to the priorities and challenges we face.

What you’ll be doing:

- Build tooling that improves operational automation and reduces toil. This includes automatic failure remediation, application and systems deployment, capacity planning, and fleet management.

- Troubleshoot mission-critical distributed systems that have some of the highest availability and lowest latency objectives within Twitter.

- Collaborate with software engineering teams, bringing the SRE mindset to the availability, reliability, scalability, disaster recovery, problem/incident management, and performance of production services.

- Help bring our service to more data centers and cloud environments faster with reliable automation, Docker + Kubernetes, and other ideas you’ve got!

- Identify and contribute to solutions for reducing service downtime, reducing alert noise, improving monitoring, and helping our services reach Service Level Objectives (SLOs).

- Participate in the team's Scrums and on-call rotation.

- Work with highly distributed and diverse hardware, software, and networking teams throughout the company.

Qualifications

- 5+ years of improving the reliability of data-intensive applications, storage engines, and distributed systems in an internet-scale production environment.

- Practical knowledge of at least one programming language (Python, Java, Ruby, C, C++, Scala, or any other modern systems language).

- Demonstrable knowledge of Linux operating system internals, TCP/IP, filesystems, and disk/storage technologies.

- Experience with state configuration tools (Puppet, Chef, etc.).

- Experience setting up capacity plans for physical and/or virtual infrastructure.

- Ability to prioritize tasks and work independently. A self-starter.

- Good written and verbal communication skills, to help create clarity when working across multiple services and stakeholders.

- Bonus: Hands-on experience with Finagle, ZooKeeper, or service discovery systems. Hands-on experience with Mesos and Aurora.

Additional Information

All of your information will be kept confidential according to EEO guidelines.

Job Details

ID: JC4923941
State: Washington
City: Uswa
Job type: Full-time
Salary: N/A
Hiring Company: Twitter
Date: 2020-09-29
Deadline: 2020-11-28
Category: Et cetera