Site Reliability Engineer Architect

15 Mar 2024
Raleigh / Durham / CH, North Carolina 27601, USA

Eaton’s Digital Services Operations team is currently seeking a Site Reliability Engineering (SRE) Architect to join our team! The position is eligible to be a remote role based in the United States.

The expected annual salary range for this role is $112,500 - $175,000 a year. Please note the salary information shown above is a general guideline only. Salaries are based upon candidate skills, experience, and qualifications, as well as market and business considerations.

The application window for this position is anticipated to close on Monday, April 1, 2024.

Making what matters work at Eaton takes the passion of every employee around the world. We create an environment where creativity, invention and discovery become reality, each day. It’s where bold, bright professionals like you can reach your full potential, and where you can help us reach ours. Join our growing Digital organization focused on providing innovative digital solutions to our diverse customers and supporting their digitalization journey.

You’ll fit in:

Our teams are:
- Ethical: We play by the rules and act with integrity. We are proud of our actions.
- Passionate: We care deeply about what we do. We set high expectations and we perform.
- Accountable: We seek responsibility and take ownership. We do what we say.
- Efficient: We value speed and simplicity.
- Transparent: We say what we think. We make it okay to disagree.
- Learn: We are curious, adaptable, and willing to teach what we know.

These values enable us to tackle some of the most important challenges on the planet, never losing sight of what matters. As a team, we have the power to make a difference.

Benefits & Perks:

You give us your best, we’ll reward you well. Eaton strives to provide industry competitive, employee well-being focused benefits and programs globally. The items below represent common programs globally, but program availabilities may vary by site. In many cases, programs may be provided by government resources in compliance with local regulations, instead of directly by Eaton.
- Healthcare/retirement savings programs to support you now and as you plan for the future
- Wellness programs and resources to support the wellbeing of you and your family
- Tuition assistance or financial help for ongoing learning and development
- Paid time off with vacation, sick days, and holiday observance
- Flexible work options to help balance work/life demands (at participating Eaton sites)
- Donation matching (US, Canada, Puerto Rico)
- Recognition programs for a wide range of achievements
- Referral program to reward you for helping us find the right candidate
- Competitive compensation packages to reward skills and performance
- Paid parental leave for birthing and non-birthing parents
- Fitness reimbursement to support your healthy lifestyle
- Casual dress policy that allows jeans in the office

What you’ll do:

The SRE and SaaS Ops Mission and team:

The SRE and SaaS Ops mission is to reduce business impact due to any outage or change events in Eaton production Brightlayer cloud hosted software and prevent their recurrence. As a part of the Digital Services Operations organization, team members will work across the Brightlayer offerings business including software development, product management, technical support, and operations to execute and deliver best practices, consultation, and support for this production suite of offerings. They will address requirements, measurement, and compliance for Service Level Agreements, uptime, and business continuity to meet customer commitments and contribute to excellent customer experiences.

In this function you will:
- Provide technical leadership of the SRE & SaaS Ops mission focused on providing production support across the Brightlayer product software offerings. You will deliver architectural guidance, best practices, and automation to ensure reliability, resiliency, and availability of the software. You will deliver SRE & SaaS Ops expertise, implement capability, and drive synergy, standardization, and execution in collaboration with product team SMEs.
- Collaborate with IT, product development, technical support, and offering teams to support requirements associated with customer SLAs, capacity planning and management, business continuity, and disaster recovery.
- Strive to automate eligible manual activities and design/develop supporting software tools that would improve the system and operations. Some examples could include deployment automation improvements with QA, canary style deployments, chaos testing, rollback handling, 3rd party tool integrations, change management, release reporting, monitoring as code, event and incident management, and metrics reporting.
- Partner with IT on common implementation and instrumentation of event management, monitoring, and APM to product teams.
- Participate in new product launch activities including architecture, implementation assessments, and launch readiness deliverables.
- As required, respond to monitoring alerts and mitigate any production support escalation issues by restoring normal service operations and conducting postmortems.
- Work closely with service management, software engineers, and quality to manage, measure, and report system availability, performance, and reliability.
- Enable methods to centralize dashboards and automate metrics collection demonstrating service levels, system health, and continuous improvement.
- Participate as required in on-call and support rotation processes driving SLA adherence and quality system availability and reliability.
- Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it.
- Help build a Site Reliability Engineering culture across the organization by sharing your best practices, approaches, documentation, training and code with other engineering teams.

Qualifications:

Required (Basic) Qualifications:
- Bachelor’s degree from an accredited institution.
- Minimum of 5 years of experience in the software industry developing OR supporting enterprise scalable cloud-based applications and/or distributed systems
- Legally authorized to work in the United States without company sponsorship

Preferred Qualifications:
- Experience with Agile methodologies and concepts
- Experience developing, deploying, configuring, and monitoring infrastructure, applications, and services in Microsoft Azure or AWS public cloud environments.
- Experience working with DevOps teams and production environments
- Experience with data pipelines, DataOps, and AI/ML
- Experience with monitoring and observability tools such as Prometheus, Grafana, and Dynatrace
- Experience with troubleshooting and debugging complex problems in distributed systems
- Experience with incident management, including the ability to triage and resolve issues that may affect system reliability and performance

Skills:

Technical Skills:
- Familiarity with continuous integration and continuous delivery (CI/CD) best practices and tools, such as Jenkins, GitHub Actions, and Travis CI
- Familiarity with scripting languages such as Python, Go, and Bash
- Knowledge of operating systems, networking, relational and non-relational databases, and computer systems architecture
- Strong understanding of software engineering principles and best practices, including design patterns, testing, and debugging
- Solid understanding of automation concepts, automatic provisioning of software and infrastructure, logging, and visualization of statistics.
- Working knowledge of other languages such as C#, Java, C, or Go
- Solid understanding of concepts such as latency, performance, high availability, efficiency, change management, monitoring, and incident management
- Familiarity with service level management and related tools.

Position Criteria:
- Teamwork, communication, strong interpersonal skills across cultural and organizational boundaries
- Good judgment, time management, collaboration, and decision-making skills
- Ability to stay calm under pressure and passionate to drive continuous improvement
- Ability to travel up to 10%.
- Strong problem-solving and analytical skills, including the ability to identify and prioritize issues and develop effective solutions
- Strong customer service skills, including the ability to respond to and resolve customer issues in a timely and effective manner

We are committed to ensuring equal employment opportunities for job applicants and employees. Our recruitment processes use balanced selection criteria and avoid unlawful discrimination against applicants on the basis of their age, colour, disability, marital status, national origin, gender, gender identity, genetic information, race or racial origin, religion, sexual orientation or any other status protected or required by law.
