Vacancy expired!
Hi all,
Title: Site Reliability Engineer
Location: San Francisco, CA
Duration: Long Term
Major skills: AWS or Azure, Docker, Kubernetes, Terraform
A Day in the Life
Responsible for infrastructure maintenance, availability, performance & cost reduction.
Dive deep to resolve problems at their root and troubleshoot services related to the big data stack in our AWS/Linux infrastructure.
Develop software tools to give insights into costs & utilization patterns.
Enhance and maintain our monitoring infrastructure.
Develop automation tools for managing our cloud infrastructure.
Improve engineering standards, tooling, and processes.
Take part in an on-call rotation alongside the engineers who build our production backends.
What You Need
You have 5+ years of experience managing & troubleshooting large-scale distributed systems, with a start-up mentality.
Familiarity with infrastructure tooling such as Docker, Kubernetes, Ansible, Chef, CloudFormation & Terraform.
Excellent Linux and troubleshooting skills.
You have a passion for solving problems using open-source software.
You are an expert in Python and Bash, and proficient in Linux.
Familiarity with the big data stack: HDFS, HBase, YARN clusters, and Elasticsearch.
Strong experience working in AWS environments and with other server virtualization technologies.
Experience with monitoring stacks such as Sensu.
Bachelor’s degree in computer science.
Knowledge of SQL, AWS Redshift & AWS EMR.
Regards,
Chandrasekhar K