Site Reliability Engineer

02 Aug 2024
San Francisco, California 94101, USA

Vacancy expired!

Site Reliability Engineer (3 Positions), San Francisco, CA (remote during COVID)

What You'll Do:

    • Gain deep knowledge of our complex applications.
    • Serve as a primary point of contact responsible for the overall health, performance, and capacity of the Tempo platform and applications.
    • Design, develop, and support tools and libraries as part of Infrastructure Tooling & Automation.
    • Develop automation tools to support growing infrastructure and provide reporting and APIs for various applications.
    • Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale UNIX environment.
    • Troubleshoot and resolve issues with core infrastructure services.
    • Incubate new ideas that can bring operational efficiency and support scaling of services.
    • Lead internal working groups to evaluate, adopt, and deploy new technology.
    • Audit software for potential security and performance problems.
    • Architect and develop configuration management policies.
    • Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth.
    • Work closely with development teams to ensure that platforms are designed with "operability" in mind.
    • Function well in a fast-paced, rapidly-changing environment.
    • Participate in a 24x7 rotation for escalations.
About You:
    • 7+ years of professional software experience in Operations and Reliability Engineering.
    • Educational background in Management Information Systems (MIS), Computer Information Systems (CIS), Computer Science (CS), or Mathematics preferred.
    • Experience with public cloud solutions such as AWS or Google Cloud Platform, at the application setup level and beyond.
    • Experience working with Python, Flask, SQLAlchemy, and other frameworks.
    • Experience working at scale with thousands of systems in a DevOps/SRE role.
    • Experience with configuration management tools (Terraform, CloudFormation, etc.).
    • Python experience, specifically for systems automation.
    • Familiar with system hardening and server security best practices.
    • Knowledge of most of the following: data structures, relational and non-relational databases, networking, Linux internals, filesystems, web architecture, APIs, and related topics.
    • Expertise automating system administration tasks with scripting tools (Python or shell preferred).
    • Experience with monitoring and automation tools such as Datadog, Sentry, Splunk, Ansible, and Terraform.
    • Aptitude for analyzing and troubleshooting operating system, networking, configuration and performance problems.
    • Fundamental understanding of Internet networking protocols such as TCP/IP, TLS, DNS, and HTTP.
    • Ability to install, configure, and maintain Linux hosts and popular open source applications such as Nginx and Apache HTTPd.
    • Strong interpersonal communication skills and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
    • Strong desire to work in a fast-paced, start-up environment with short release cycles.