Sr Site Reliability Engineer (SRE)

20 May 2024
Austin, Texas 73301, USA

  • 100% Remote
  • Direct Hire or Contract to Hire
  • Open to H1B (sponsorship available)

We are currently looking for a Senior Site Reliability Engineer to join our team.
About the Team
This Senior Site Reliability Engineer will be part of the Consumer Core Site Reliability Engineering (SRE) team. The Consumer Core SRE team is an innovative team devoted to providing automated solutions and services to measure, evaluate and plan for visible, reliable application delivery and maintenance.

About the Position
As a member of the SRE team, you will bring a collaborative style to leading efforts that raise the maturity of engineering practices across all agile teams delivering our products. The tools and use cases are diverse, and our challenge is to increase development velocity and application stability by optimizing various parts of the pipeline. Much of our software development focuses on optimizing existing systems by measuring elasticity and saturation, building infrastructure through Infrastructure as Code (IaC), and eliminating or reducing toil through automation. We also look to instill core SRE practices in the engineering teams, including measuring SLIs/SLOs, increasing visibility and observability through monitoring tools, guiding chaos engineering efforts to improve overall resiliency, and leading Gameday/Production Readiness reviews across all engineering disciplines. We are looking for engineers who are passionate about automation and about owning best practices, grounded in SRE principles, for building scalable and highly reliable applications.
If you love to figure out how all the pieces are put together and if automation and building tools to monitor and manage your applications sounds interesting to you, we want to talk to you.

As a Senior Site Reliability Engineer, you will:
• Have a natural tendency to avoid toil and want to automate it away
• Automate anything and everything! (testing, deploying, monitoring, etc.)
• Take complex and possibly ill-defined problems and come up with technically reasonable solutions
• Take ownership of processes or solutions that can be shared across teams globally
• Build and roll out solutions to be consumed by multiple teams
• Have innate curiosity about how things work
• Design and assist in the authoring of software tools that reliably manage application delivery & performance
• Design and assist in the setup and maintenance of application monitoring and alerting
• Engage with product/capability engineering teams to ensure best practices are implemented
• Improve predictability and reliability of software releases, workflows, and operating software
• Reduce application deployment windows by leading engineering teams towards a Continuous Deployment environment
• Reduce mean time to recovery (MTTR) by helping to troubleshoot, monitor, alert, and automate recovery
• Facilitate Gamedays and Production Readiness reviews to continue increasing resiliency in our applications
Qualifications:
• Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
• Ability to debug, optimize code, and automate routine tasks
• Systematic problem-solving approach, coupled with effective communication skills and a sense of drive
• Understanding of Linux/Windows operating systems
• Experience with Python or PowerShell or related scripting languages
• Experience with configuration management systems (Spinnaker, Chef, Puppet, or Ansible)
• Experience rolling out highly available, mission-critical applications
• Experience with version control systems (Git or SVN) and branching strategies
• Experience with Cloud Computing platforms (Amazon AWS, Kubernetes, Heroku, etc.)
• Experience with continuous integration tools (Jenkins, GitHub Actions, CircleCI, TeamCity, etc.) and artifact repositories (Artifactory or Nexus)
• Experience with Database Server infrastructure (RDS, Aurora, DynamoDB, MySQL, Postgres, etc.)
• Experience with agile development, continuous integration and automated testing
• Experience with Infrastructure as Code (Terraform or CloudFormation)
• Excellent written communication, problem-solving, and process management skills
• Desire to work in a fast-paced, evolving, growing, dynamic environment

EEO Employer

Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department.
