The Observability Platform Engineer designs, builds, and maintains the observability tools and frameworks that development and operations teams use to monitor and improve the performance, availability, and reliability of systems. The role involves designing and implementing systems that monitor and analyze the performance and health of software applications and infrastructure, ensuring high availability and reliability. The engineer will collaborate closely with development, site reliability engineering, DevOps, and infrastructure teams to deliver a seamless observability ecosystem.

Key responsibilities include architecting observability platforms, integrating monitoring tools into software pipelines, ensuring system health visibility, reducing mean time to detection (MTTD), and promoting a culture of proactive monitoring and reliability engineering.

What you will own:

- Design, build, and maintain observability platforms with reusability across services in mind.
- Develop scalable, automated pipelines for ingesting, transforming, and visualizing telemetry data.
- Integrate observability tools (e.g., Dynatrace, Splunk, Prometheus, Grafana, Datadog, New Relic, OpenTelemetry) with existing infrastructure and applications.
- Enable root cause analysis through correlation of metrics, logs, and traces.
- Analyze telemetry data to identify performance bottlenecks and optimize resource allocation for improved efficiency.
- Define SLIs, SLOs, and error budgets with stakeholders for critical services.
- Improve incident response by enhancing monitoring dashboards, alerts, and automated notifications.