SREs ensure that our Cloud services meet the reliability and uptime requirements of our demanding enterprise customers. They achieve this proactively, through sound engineering practices and resilient design from day 0, and reactively, through a well-defined and effective on-call rotation that runs 24x7.
SREs engineer our production systems to run at scale, so that manual and repetitive work is eliminated. They follow blameless postmortem practices so that all incidents are well understood and problems are fixed at their root. Over time, they make our systems more robust, fault-tolerant and able to self-heal during the worst outages and in the most unexpected circumstances.
SREs are experts in troubleshooting complex problems and can dig very deep into why systems break in production. To do that, they rely on observability practices like centralized logging, distributed tracing and anomaly detection. They shorten detection times (MTTD) and recovery times (MTTR) by improving the accuracy of alarms and the speed of troubleshooting.
SREs leverage the latest infrastructure automation best practices and the toolsets offered by Cloud Providers to multiply their effectiveness and achieve greater outcomes.
Key Responsibilities and Skills: