The Director, Platform SRO is a senior, hands-on technical leader responsible for ensuring the stability, resilience, and operational readiness of mission-critical broadcast linear, live event, and digital media platforms. Operating in high-pressure, real-time environments, this consultant leads major incident response efforts, supports on-air and live-event continuity, and partners closely with engineering, broadcast operations, production, and vendor teams to minimize service disruption and audience impact. The role requires deep practical experience with media workflows, rapid troubleshooting during live events, and the ability to make sound technical decisions under tight time constraints.

Beyond reactive incident response, the Director plays a strategic role in improving long-term system reliability and operational maturity. By applying SRO/SRE principles adapted for media environments, the consultant identifies systemic risks, drives root cause analysis, strengthens monitoring and observability, and improves operational processes across broadcast and digital ecosystems. This role balances immediate hands-on execution with advisory leadership, helping organizations build more resilient architectures, clearer incident processes, and greater confidence in their ability to support live, always-on media operations.
Responsibilities

- Lead and coordinate high-severity incident response for broadcast linear channels, live events, and digital media platforms, serving as incident commander when required
- Rapidly triage and troubleshoot issues across media workflows, including playout, live production, contribution/distribution, and OTT delivery
- Establish, refine, and execute incident management processes, including escalation models, on-call coordination, communications, and severity classification
- Produce post-incident reviews, root cause analyses, and corrective action plans to prevent recurrence and reduce operational risk
- Assess system reliability, fault tolerance, and operational readiness across on-prem, hybrid, and cloud-based media architectures
- Identify single points of failure and recommend architectural, workflow, and operational improvements to enhance availability and resilience
- Define and improve monitoring, alerting, and observability strategies tailored to real-time broadcast and live event environments
- Support disaster recovery, failover planning, and live-event readiness reviews, including testing and validation
- Develop and maintain operational runbooks, standard operating procedures, and incident documentation
- Partner with engineering, broadcast operations, production teams, and vendors to align reliability practices with on-air and live-event requirements
- Mentor teams on incident response best practices, reliability engineering concepts, and continuous improvement
- Advise leadership on operational risk, system health, and reliability priorities for critical media platforms