Site reliability engineer - remote job offer

Site Reliability Engineer - Remote

06 Oct 2025

California, Santa Clara, 95050

As our Site Reliability Engineer, you will design, build, and maintain the systems and infrastructure that power our applications, ensuring their reliability, scalability, and performance. You will bring a software engineering approach to operations, automating processes, and continuously improving the infrastructure and tools to support our business needs.

What you’ll do:

Infrastructure Management: Design, implement, and maintain scalable and resilient infrastructure using Terraform for infrastructure as code, ensuring high availability and performance.

Kubernetes and Containers: Deploy, manage, and optimize Kubernetes clusters and containerized applications using Docker. Implement best practices for container orchestration and management.

Systems and Application Monitoring/Observability: Develop and maintain comprehensive monitoring and observability solutions using Datadog. Ensure detailed visibility into system performance and application health.

SLOs and SLA Management: Define, monitor, and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to ensure reliable and consistent service delivery.

Incident Response and Troubleshooting: Respond to incidents, perform root cause analysis, and implement solutions to prevent recurrence. Participate in post-incident reviews and contribute to blameless postmortems.

Reliability and Production Environment Management: Ensure the reliability and stability of our production environments. Continuously assess and improve system reliability, identifying and addressing potential points of failure.

Automation and Scripting: Develop automation scripts and tools to reduce manual intervention and improve system reliability using Python, Bash, or Go. Implement and improve CI/CD pipelines.

CI/CD Pipeline Management: Enhance and maintain continuous integration and continuous deployment pipelines using GitLab CI. Ensure seamless and reliable deployment processes.

Capacity Planning and Scaling: Assist in capacity planning and ensure that systems are scalable to meet future demands. Implement auto-scaling strategies where applicable.

Security and Compliance: Implement security best practices and ensure compliance with industry standards. Regularly review and update security policies and procedures.

Collaboration and Support: Work closely with development teams to ensure reliability and scalability of new features and services. Provide technical support and guidance on infrastructure-related issues.

Software Engineering for Operations: Develop and maintain internal tools and services that enhance the efficiency and reliability of our operations.

On-Call Rotation: Participate in an on-call rotation to address production issues and collaborate in incident response efforts.

Job Details

ID: JC54615951
State: California
City: Santa Clara
Job type: Full-time
Salary: N/A
Hiring Company: PayNearMe
Date: 2025-10-06
Deadline: 2025-12-05
Category: Et cetera