Site Reliability Engineer - Lead

21 Jun 2024
Atlanta, Georgia 30301, USA

Vacancy expired!

Client is looking for an SRE Lead Engineer. The primary role is to lead the 24x7 support team for the customer's production, automation, and monitoring environments and infrastructure for the Tableau and Looker applications operating in the AWS and Google Cloud Platform clouds. This includes, but is not limited to, managing SLAs, SLOs, and SLIs; monitoring and alerting; provisioning environments; and diagnosing performance issues. The role also includes cross-training teams on the OBIEE, BIP, and Spotfire tools, repairing CI/CD pipelines, and implementing IaC in the AWS and Google Cloud Platform environments.
Familiarity with DataOps is required to support data pipelines and the continuous deployment of those pipelines to environments. Some on-prem infrastructure support will be required as well, along with an understanding of how to move data securely from on-prem to the AWS and Google Cloud Platform clouds.

  • Experience in an Agile/Scrum and DevOps environment, including a high-volume or critical production service environment.
  • 3 to 6 years of experience with AWS or Google Cloud Platform, CloudOps, and infrastructure management.
  • 3 to 6 years managing and leading teams.
  • 3+ years of experience with AWS/Google Cloud Platform SaaS and IaC (Terraform) tools.
  • Expertise in Windows server management and configuration management.
  • 3+ years of experience with AWS and/or Google Cloud Platform cloud infrastructure and services.
  • Proficient Git skills.
  • Experience implementing end-to-end DevOps.

Desirable
  • Practical experience with Agile Scrum / Kanban and the OBIEE, Spotfire, and BIP tools.
  • Experience with the Tableau and Looker BI applications from a support and management perspective.
  • Agile certified engineer.
  • DevOps certified engineer; other AWS/Google Cloud Platform certifications.


Responsibilities
  • Support and contribute improvements to the availability, scalability, latency, and efficiency of the Tableau and Looker BI visualization and analytics applications and the OBIEE, Spotfire, and BIP tools and environments.
  • Support client setup of the Service Management reference architecture (Service Strategy, Design, Transition, and Improvement) and operational process management (Incident, Problem, Change, Release, Capacity, etc.).
  • Support the client's development and operations teams in defining each stakeholder's service-level indicators (SLIs) that reflect reliability (e.g., availability, response time, latency, throughput).
  • Support the client's SRE team resource building (acquire, develop, manage) through the forming, storming, norming, performing, and adjourning stages.
  • Use, administer, configure, and contribute to deployment and automation tools, as well as the platform, to more efficiently detect and address problems and prevent them from recurring.
  • Define and measure production availability, accounting for known downtime and service-level outages.
  • Debug problems at scale for our mission-critical services, and help our platform and service teams implement lasting fixes to recurring issues.
  • Set up and manage the DataOps engineering platform, tools, and cloud environment applications within the customer's portfolio in the AWS and Google Cloud Platform clouds.
  • Provision environments using infrastructure as code provided by the automation team.
  • Execute, debug, and configure CI/CD pipelines.
  • Onboard new projects/applications onto the platform using automation.
  • Act as acceptance testers for automation developed by an automation team and use the automation to satisfy service requests in a more timely fashion.
  • Deploy and manage automation developed by the automation feature team.

Job Details

  • ID
    JC15694037
  • Job type
    Contract
  • Salary
    Depends on Experience
  • Hiring Company
    Info Dinamica Inc
  • Date
    2021-06-15
  • Deadline
    2021-08-14
