Senior DevOps / AWS Heavy / Terraform

02 Aug 2024
Los Angeles, California 90001, USA

Vacancy expired!

We are seeking a senior Service Reliability Engineer (SRE) who will be responsible for improving and maintaining software development, test, and live infrastructure and services. The ideal candidate will be self-motivated, articulate, experienced with Linux and other Unix-like systems, and comfortable working in a fast-paced software development environment. Your primary mission as an SRE is to work closely with the Development, Technical Operations, Quality Assurance, and Product Management teams to ensure the uptime and performance of our platforms.

Responsibilities:
  • Support our Big Data platform, a mission-critical platform for collecting, storing, processing, and analyzing terabytes of data, in production and development environments
  • Identify and drive improvements in infrastructure and system reliability, performance, monitoring, and overall stability of the platform
  • Perform capacity planning and demand forecasting to meet system demand, identify performance bottlenecks, and devise tuning improvements
  • Build tools and automation that eliminate repetitive tasks and prevent incidents
  • Create and maintain operational runbooks and documentation
  • Participate in 24x7 operational support and on-call rotation shifts

Qualifications:
  • B.S. in Computer Science or equivalent experience
  • Minimum of 5 years supporting production applications and systems, and at least 2 years in a DataOps role
  • Proficiency working with Amazon Web Services (AWS) such as EMR, Glue, Lambda, EC2, EBS, ELB, S3, Route 53, RDS, and Redshift in a highly available and scalable production environment
  • Experience with Big Data open source technologies (Hadoop, Scala, Spark, Kafka, HBase, ZooKeeper, Oozie)
  • Experience with SQL (MySQL, PostgreSQL)
  • Experience with continuous integration and deployment automation tools such as Jenkins, Rundeck, AWS CloudFormation, Terraform
  • Experience supporting, analyzing, and troubleshooting large-scale, distributed, mission-critical systems
  • Systematic problem-solving approach and strong sense of ownership to drive problems to resolution
  • Strong knowledge of Linux systems administration and architecture
  • Experience with configuring, managing and supporting AWS environments
  • Network knowledge (TCP/IP, UDP, DNS, load balancing); prior network administration experience is a plus
  • Scripting experience with Shell, Python or Ruby
  • Experience documenting processes, systems, environments and runbook procedures
  • Experience with source control tools such as Git, GitHub, and GitLab
