Site Reliability Engineer

17 Feb 2025
Seattle-Tacoma, Washington 98101, USA

Application window is expected to close on 03/20/2025. However, the job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.

The successful applicant will be performing work in FedRAMP High or IL-5 environments, and therefore, must be a U.S. Person (i.e. U.S. citizen, U.S. national, lawful permanent resident, asylee, or refugee). This position may also perform work that the U.S. government has specified can only be performed by a U.S. citizen on U.S. soil.

Meet The Team

The Site Reliability Engineering (SRE) team at Duo, a part of Cisco, plays a crucial role in maintaining the reliability, availability, and performance of Duo's security services. They are responsible for ensuring service reliability by implementing robust monitoring and alerting systems to proactively detect and address issues. The team leads incident management efforts to resolve service outages and degradations swiftly. They focus on developing automation tools to streamline operations and improve efficiency. The SRE team continuously optimizes service performance and collaborates with development teams to ensure new features are designed with scalability and reliability in mind. Additionally, they conduct post-incident reviews to identify root causes and implement preventive measures, ensuring Duo's solutions remain dependable, secure at every layer and high-performing to meet user expectations.

Your Impact

As a Site Reliability Engineer on our Site Reliability Engineering team, you will develop software and tools to empower Duo's product development teams to run and maintain their services in production. You will collaborate with a wide range of internal partners to engineer automated solutions in an effort to remove toil and enhance stability for a variety of infrastructure, with an emphasis on scalability. You will face challenges that require an engineering mindset and a desire to automate everything possible.

Skills you have:

You have designed components in cloud based services including infrastructure

You can contribute to a meeting where an outcome is a technical decision made

You have a history of writing performant, maintainable, testable code

You enjoy learning and elevating your team by contributing to code reviews

You are passionate about automation and reducing toil

You are committed to quality and experienced with modern software testing practices

You care about contributing to an amazing work culture and environment

Minimum Qualifications

3+ years in Site Reliability Engineering (SRE) or a related IT field.

3+ years of experience with, and proficiency in, Python.

3+ years of experience with AWS and SaaS solutions.

Previous experience with automated configuration tools, specifically Terraform and Ansible.

Preferred Qualifications

Experience with Container Orchestration including Kubernetes and Docker

Design and own technical solutions for broad or complex requirements with insightful and strategic approaches

Able to write performant, maintainable, testable code

Prior experience deploying Cloud Services, Monitoring, Alerting, and Handling Escalations

Experience supporting a High-Availability SaaS environment

Charting new DevOps practices without a well-defined roadmap

#WeAreCisco

#WeAreCisco where every individual brings their unique skills and perspectives together to pursue our purpose of powering an inclusive future for all.

Our passion is connection: we celebrate our employees' diverse set of backgrounds and focus on unlocking potential. Cisconians often experience one company, many careers where learning and development are encouraged and supported at every stage. Our technology, tools, and culture pioneered hybrid work trends, allowing all to not only give their best, but be their best.

We understand our outstanding opportunity to bring communities together and at the heart of that is our people. One-third of Cisconians collaborate in our 30 employee resource organizations, called Inclusive Communities, to connect, foster belonging, learn to be informed allies, and make a difference. Dedicated paid time off to volunteer (80 hours each year) allows us to give back to causes we are passionate about, and nearly 86% do!

Our purpose, driven by our people, is what makes us the worldwide leader in technology that powers the internet. Helping our customers reimagine their applications, secure their enterprise, transform their infrastructure, and meet their sustainability goals is what we do best. We ensure that every step we take is a step towards a more inclusive future for all. Take your next step and be you, with us!

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.

Cisco will consider for employment, on a case by case basis, qualified applicants with arrest and conviction records.

