End-to-end responsibility for the management, communication, escalation, investigation, and resolution of incidents, ensuring that Business and Customer updates are timely and of sufficient quality, and arranging discussions and updates as required.
Ensure that all modes of communication are effectively used throughout the incident life cycle.
Acting as the focal point for Incident escalation, identifying and resolving conflicts and bottlenecks.
Creating agreed action plans with named actions and deadlines, and being accountable for the delivery of those plans.
Documenting post-incident recovery steps to establish Root Cause, aid Process improvements, identify deviations, and enable creation of a Knowledge Base.
Driving, developing and managing the major incident process and associated procedures / systems
Taking all necessary preventive actions to minimize service and business impact when resolution time is expected to be long.
Conducting a thorough analysis and preparing the Major Incident Report ("MIR") for every Major Incident after it is closed.
Ensuring that all resolution procedures are updated in the knowledge database / work log
Be an Evangelist for the Incident Management Process
Expertise in ITIL processes
Experience as an Incident Manager in Cloud and DevOps environments
Experience in the Incident Manager role at technology companies
Nice to Have:
· Aspiring Site Reliability Engineer
· Scripting knowledge
· Proactively recognize issues by interpreting Monitoring data
· Growth mindset