Site Reliability Engineer

15 Mar 2024
Baltimore, Maryland 21240, USA

Eaton’s Digital Services Operations team is currently seeking a Site Reliability Engineer (SRE) to join our team! The position is eligible to be a remote role based in the United States. #LI-Remote #Remote #LI-SG1

The expected annual salary range for this role is $112,500 - $175,000 a year. Please note the salary information shown above is a general guideline only. Salaries are based upon candidate skills, experience, and qualifications, as well as market and business considerations.

The application window for this position is anticipated to close on Monday, April 1, 2024.

Making what matters work at Eaton takes the passion of every employee around the world. We create an environment where creativity, invention and discovery become reality, each day. It’s where bold, bright professionals like you can reach your full potential, and where you can help us reach ours. Join our growing Digital organization focused on providing innovative digital solutions to our diverse customers and supporting their digitalization journey.

You’ll fit in:

Our teams are:

Ethical: We play by the rules and act with integrity. We are proud of our actions.

Passionate: We care deeply about what we do. We set high expectations and we perform.

Accountable: We seek responsibility and take ownership. We do what we say.

Efficient: We value speed and simplicity.

Transparent: We say what we think. We make it okay to disagree.

Learn: We are curious, adaptable, and willing to teach what we know.

These values enable us to tackle some of the most important challenges on the planet, never losing sight of what matters. As a team, we have the power to make a difference.

Benefits & Perks:

You give us your best, we’ll reward you well. Eaton strives to provide industry competitive, employee well-being focused benefits and programs globally. The items below represent common programs globally, but program availabilities may vary by site. In many cases, programs may be provided by government resources in compliance with local regulations, instead of directly by Eaton.

Healthcare/retirement savings programs to support you now and as you plan for the future

Wellness programs and resources to support the wellbeing of you and your family

Tuition assistance or financial help for ongoing learning and development

Paid time off with vacation, sick days, and holiday observance

Flexible work options to help balance work/life demands (at participating Eaton sites)

Donation matching (US, Canada, Puerto Rico)

Recognition programs for a wide range of achievements

Referral program to reward you for helping us find the right candidate

Competitive compensation packages to reward skills and performance

Paid parental leave for birthing and non-birthing parents

Fitness reimbursement to support your healthy lifestyle

Casual dress policy that allows jeans in the office

What you’ll do:

The SRE and SaaS Ops mission is to reduce business impact due to any outage or change events in Eaton production Brightlayer cloud hosted software and prevent their recurrence. As a part of the Digital Services Operations organization, team members will work across the businesses developing, delivering, and supporting Brightlayer IoT offerings. Collaboration primarily involves the following organizations: software development, product and offer management, technical support, and operations to execute and deliver best practices, consultation, and support for this production suite of offerings. The team will address requirements, measurement, and compliance for Service Level Agreements, reliability, uptime, other core SRE tenets, and business continuity to meet customer commitments and contribute to excellent customer experiences.

In this function you will:

Work together with the centralized SRE & SaaS Ops mission focused on providing production support across the Brightlayer product software offerings. You may align more closely with certain offerings. You will work together with the SRE/SaaS Ops architect and regional SRE leader to provide guidance, best practices, and automation to ensure reliability, resiliency, and availability of the software.

You will deliver SRE & SaaS Ops expertise, implement capability, and drive synergy, standardization, and execution in collaboration with product development team SMEs.

Strive to automate eligible manual activities and design/develop supporting software tools that would improve the system and operations. The target for your effort split is 50/50 across value-add development efforts and operational support.

Participate in new product launch activities including architecture, implementation assessments, and launch readiness deliverable reviews.

As required, respond to monitoring alerts and mitigate any production support issues by restoring normal service operations and conducting post-incident reviews (often referred to as post-mortems), always seeking methods for continuous improvement.

Work closely with service management, software engineers, and quality to manage, measure, and report system availability, performance, and reliability.

Participate as required in on-call and support rotation processes driving SLA adherence and quality system availability and reliability.

Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it.

Help build a Site Reliability Engineering culture across the organization by sharing your best practices, approaches, documentation, training and code with other engineering teams.

Qualifications:

Required (Basic) Qualifications:

Bachelor’s degree from an accredited institution.

Minimum of 5 years of experience in the software industry developing or supporting enterprise scalable cloud-based applications and/or distributed systems.

Legally authorized to work in the United States without company sponsorship.

Preferred Qualifications:

Experience with incident management, including the ability to triage and resolve issues that may affect system reliability and performance.

Experience with Agile methodologies and concepts.

Experience evaluating cost and capacity for application and solution architectures and reliability considerations.

Experience with service-level management and related tools.

Experience developing, deploying, configuring, and monitoring infrastructure, applications, and/or services in Microsoft Azure or similar public cloud environments.

Experience working with hybrid cloud, IoT, edge architectures, and mobile application integration beneficial.

Experience using or supporting container-based applications, Docker, Kubernetes (k8s).

Experience with monitoring, logging, and observability tools such as Prometheus, Grafana, Dynatrace, ELK, Azure Monitor/Insights.

Skills:

Technical Skills:

Ability to measure and report on KPIs associated with application and system performance, and operations.

Understanding of software engineering principles and best practices, including design patterns, testing, and debugging.

Knowledge of DevOps practices and experience with production environments beneficial.

Problem-solving and analytical skills, including the ability to identify and prioritize issues.

Familiarity with continuous integration and continuous delivery (CI/CD) best practices and tools, such as Jenkins, GitHub Actions, Azure DevOps, Opsera.

Knowledge of operating systems, networking, relational and non-relational databases, and computer systems architecture.

Position Criteria:

Teamwork, communication, strong interpersonal skills across cultural and organizational boundaries.

Good judgment, time management, and decision-making skills.

Ability to stay calm under pressure and passionate to drive continuous improvement.

Solid understanding and passion to drive automation concepts, automatic provisioning of software, infrastructure, logging, and visualization of data.

Working knowledge of other languages such as C#, Java, C, .NET, or Go.

Understanding of concepts such as latency, performance, high availability, efficiency, change management, monitoring, and incident management.

Ability to respond to and resolve customer issues in a timely and effective manner.

Awareness and experience with continuity, compliance, and disaster recovery considerations.

We are committed to ensuring equal employment opportunities for job applicants and employees. Our recruitment processes use balanced selection criteria and avoid unlawful discrimination against applicants on the basis of their age, colour, disability, marital status, national origin, gender, gender identity, genetic information, race or racial origin, religion, sexual orientation or any other status protected or required by law.
