Site Reliability Engineer

02 Sep 2022
McLean, VA 22067, USA

McLean, VA
Long Term
The client's Enterprise Data Machine Learning (EDML) team employs innovative minds like you to design and develop software systems that can meet the demands of our ever-growing customer base. Like a startup inside an enterprise, EDML takes a customer-centric approach to building our product, enabling data-driven conversations with our customers.
As a Site Reliability Engineer, you'll work closely with customers, product management, and other subject matter experts in the technology industry to drive forward solutions that have an immediate impact on the ability of data scientists and machine learning engineers to productionize their models. You'll do this by iteratively improving how we operate and scale our cloud-based containerized service.
What You'll Do

  • Develop, deploy, and operate our secure infrastructure built on cloud services (AWS, Kubernetes, etc.).
  • Ensure the high availability, resiliency, performance, business continuity and compliance capabilities of our cloud services.
  • Define SLA standards for SaaS solutions that are used by several groups within the company.
  • Work with our engineering teams to deploy and operate cloud services and to scale our development, QA, and production environments.
  • Build solutions for developer productivity. Develop and operate our build automation and continuous delivery systems.
  • Participate in an on-call rotation, drive incident resolution, and improve platform resiliency.
Basic Qualifications
  • Experience with container management technologies including Docker and Kubernetes.
  • Experience with AWS including EKS, ECS, IAM, S3, RDS, Security Groups, Route53, VPC Flow Logs, etc.
  • Experience with automation/configuration management using Terraform or similar solutions.
  • Experience with CI tools such as Jenkins.
  • Experience with operational monitoring tools such as Datadog, New Relic, and Splunk.
  • Proficiency with Linux tools and shell scripting or other Linux automation.
  • An interest in designing, analyzing and troubleshooting large-scale distributed systems.
  • Well-versed in the entire software development lifecycle, DevOps, and SRE practices.
Preferred Qualifications
  • Experience with automated unit and integration testing of infrastructure code.
  • Experience with container security and vulnerability management.
  • Experience with one or more languages such as Python or Go.
  • Certified Kubernetes Administrator (CKA)

Job Details
  • Hiring Company: Maintec Technologies Inc