Principal Site Reliability Engineering

14 Sep 2025
Southeast Alaska, Alaska 99801, USA

Vacancy expired!

Job Description

Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop strategies, architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning, demand forecasting, software performance analysis, and system tuning.

We seek a Site Reliability Engineer to join the Service Operations Excellence Engineering organization. The ideal candidate is technically strong and able to persevere through complexity and ambiguity. They’ve worked directly on highly available, scalable, and redundant services. Automation is a core tenet of everything they do. They understand that simple systems are easier to operate and troubleshoot. They can balance speed with iteration and incremental improvements. They’ve made life easier for other developers and motivated their teams to create process and service improvements. If you are passionate about owning significant technical challenges and producing software solutions with broad, significant impacts, join our team!

Candidates should have broad working knowledge across multiple domains, but we also love to see specialization. We expect the basics: Networking, Linux Systems Engineering, Software Engineering/Automation, Database Services (big data technologies), and Distributed Systems.

Responsibilities

Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Be responsible for designing and delivering the mission-critical stack, focusing on security, resiliency, scale, and performance. Serve as the authority for end-to-end performance and operability.

Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that are not yet documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.

As a Site Reliability Engineer (SRE) within the SOEE team, you will assist in designing and maintaining hosting, processing, transforming, and analyzing operational processes. Your first mission will be to work closely with our software developers and Cloud architects to define a sustainable operating model for Oracle Cerner engineering services. This includes mechanisms to scale the systems through easy-to-use tooling and automation. You will work in concert with developers to evolve systems/products for better scalability and reliability and enable developer velocity. You will also author and maintain operational runbooks to help reduce the mean Time of Incidents (TOI) and be responsible for managing and triaging operational tickets about the data platform services. Emphasis on driving prioritization and execution of work based on business impact is necessary.

Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence.

Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services.

Develop designs, architectures, standards, and methods for large-scale distributed systems.

Facilitate service capacity planning, demand forecasting, software performance analysis, and system tuning.

Work with other HCGBU – Infra and Ops team engineers on the shared full-stack ownership of a collection of services and technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.

Articulate technical characteristics of services and technology areas and guide development teams to engineer and add capabilities to internal Oracle services.

Act as the ultimate escalation point for complex or critical issues that are not yet documented as Standard Operating Procedures (SOPs).

Use a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations.

Understand and explain the effect of product architecture decisions on distributed systems.

Serve as part of a 24x7 on-call rotation in support of HCGBU – Infra and Ops.

Professional curiosity and a desire to develop a deep understanding of services and technologies.

Mandatory Qualifications:

Bachelor’s or Master’s degree in Computer Science or equivalent related field experience

Experience with Python, Ruby, Bash, and other scripting languages

Experience working with fault-tolerant, highly available, high throughput, distributed, scalable systems

Aptitude to be a good team player and the desire to learn and implement new Cloud technologies as needed

Excellent organizational, verbal, and written communication skills

Preferred Qualifications:

5+ years of experience in two or more of the following:

Software development/operations

Developing/operating large-scale distributed services/applications

System Administration, including Linux internals, TCP/IP, DNS, Load balancing technologies

Container administration and development utilizing Kubernetes, Docker, Mesos, or similar

Infrastructure automation through Terraform, Chef, Ansible, Puppet or similar

Big Data Infrastructure, including Hadoop, Spark, NoSQL, Object Storage, or similar

Experience with TCP/IP and socket programming

Knowledge of cloud computing technologies, network monitoring, data processing, and analytics

Experience with CI/CD pipelines

Proficiency in working with git

Disclaimer:

Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting is specific to the United States only.

Hiring Range: from $97,400 to $199,500 per annum. May be eligible for bonus and equity.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle’s differing products, industries and lines of business.

Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle offers a comprehensive benefits package which includes the following:

Medical, dental, and vision insurance, including expert medical opinion

Short term disability and long term disability

Life insurance and AD&D

Supplemental life insurance (Employee/Spouse/Child)

Health care and dependent care Flexible Spending Accounts

Pre-tax commuter and parking benefits

401(k) Savings and Investment Plan with company match

Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.

11 paid holidays

Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.

Paid parental leave

Adoption assistance

Employee Stock Purchase Plan

Financial planning and group legal

Voluntary benefits including auto, homeowner and pet insurance

About Us

An Oracle career can span industries, roles, countries, and cultures, giving you the opportunity to flourish in new roles and innovate while blending work and life. Oracle has thrived through 40+ years of change by innovating and operating with integrity while delivering for the top companies in almost every industry.

In order to nurture the talent that makes this happen, we are committed to an inclusive culture that celebrates and values diverse insights and perspectives, and a workforce that inspires thought leadership and innovation.

Oracle offers a highly competitive suite of Employee Benefits designed on the principles of parity, consistency, and affordability. The overall package includes certain core elements such as Medical, Life Insurance, access to Retirement Planning, and much more. We also encourage our employees to engage in the culture of giving back to the communities where we live and do business.

At Oracle, we believe that innovation starts with diversity and inclusion, and that to create the future we need talent from various backgrounds, perspectives, and abilities. We ensure that individuals with disabilities are provided reasonable accommodation to successfully participate in the job application, interview process, and in potential roles to perform crucial job functions.

That’s why we’re committed to creating a workforce where all individuals can do their best work. It’s when everyone’s voice is heard and valued that we’re inspired to go beyond what’s been done before.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law. Oracle is also a United States Affirmative Action Employer.
