24x7 Event/Alert/Incident Monitoring support for in-scope infra, Apps and Cloud Management
Capture Alerts or Situations, then raise incident tickets. Perform SOP-based support and escalate
to the respective Technology Teams
Provide environmental support and handle escalations
Perform end to end Incident Management for event-based incidents
Monitor Batch Job (Job Scheduling) Alerts and Handle Batch Job Requests
Provide phone support 24x7x365
Initiate, coordinate and collaborate with the Tools, Technology, Service Desk and Vendor teams
Reduce the workload of the technology tracks by performing instruction-based (SOP) Level 1.5
troubleshooting and trying to resolve Incident tickets at the Command Center level.
Assist in High Severity Incidents: Initiate Critical Bridges and work closely with CIM/MIM
o Act as a Situation Manager and assist the dedicated CIM/MIM Team (impact analysis, initiate
bridge, inform MIM/CIM, inform Service Desk, inform Business, inform on-call person,
send out hourly report, send out closure report, check whether the issue is recurring or has
been reported in the past, run a business bridge if required, etc.)
Work in rotational shifts to provide 24/7 monitoring support for IT infrastructure.
Monitor in-scope infra, Apps and Cloud Management with various monitoring tools, for example:
o Monitoring Tool : Moogsoft, Splunk, iTOM, BigPanda, SolarWinds, SCOM, Dynatrace,
AppDynamics, Netcool, Tivoli, HP NNM, HP OVO, LogicMonitor, Grafana, ScienceLogic,
Nagios, Nimsoft, Zabbix, ManageEngine, Datadog, VMware, WhatsUp Gold, New Relic
o ITSM Tool : ServiceNow, Cherwell, Remedy, HPSC, HPSM, Salesforce, Service Desk Plus
o Batch Job Scheduler : Control-M, AutoSys, Redwood, Dollar Universe (DU), TWS, Tidal,
IBM Workload Automation
Analyze, acknowledge & record every Alert / Event / Situation in the monitoring tools &
create incidents according to their impact (Severity)
Escalate to & notify the relevant teams & stakeholders to ensure SLA compliance &
minimal impact on the business.
Strictly adhere to the response & resolution timelines specified in the SLA. (Resolution
applies where Level 1.5 troubleshooting is in the team's scope.)
Act as a trigger for the critical incident management process by involving the technical & critical
incident management teams.
Coordinate with all the technical teams to assist in providing accurate & timely updates to the
Technical Team and customer counterparts until the issue is resolved.
Coordinate all faulty hardware replacement, capacity expansion, server
installation/decommissioning & other project management initiatives with vendors,
partners & internal teams.
Train & absorb the level 1.5 troubleshooting and other operational tasks from the various
Assist the team lead in updating the run book and other technical and process documents for
the benefit of the entire team.
Escalate any inconsistencies in the monitoring environment with respect to the monitoring tool
configuration, alert thresholds, alert message enrichment & false alerts.
Hand over any incomplete tasks, open alerts, incidents and outage reports to the next shift.
Discuss operational challenges and constraints in team meetings and with the management to
ensure timely resolution.
Coordinate with the Hands and Feet (H&F) support team for faulty hardware replacement. Escalate
Environment Monitoring Alerts to the H&F team and coordinate for resolution.
EXPERIENCE & SKILL
3-4 Years of University education post High school (B.Sc. or BCA or Diploma)
1-2 Years of working experience in Information Technology
Certification in ITIL/MCSE/MCSA/CCNA or RHCE preferred.
Preferably 1-2 Years of alert monitoring/management experience.
Should be aware of ITIL's Event, Incident, Problem and Change management modules.
Should have worked in high-pressure work environments and be able to multitask.
Basic understanding of L1.5 support
Experience with Windows/Unix Servers, AD, Network Devices, Database, Storage & Backup, Job
Scheduling or Cloud computing.
Excellent verbal and written communication skills.
Hands-on experience with the following:
Monitor in-scope infra, Apps and Cloud Management with various monitoring tools
o Monitoring Tool : Moogsoft, Splunk, iTOM, BigPanda, SolarWinds, SCOM,
Dynatrace, AppDynamics, Netcool, Tivoli, HP NNM, HP OVO, LogicMonitor,
Grafana, ScienceLogic, Nagios, Nimsoft, Zabbix, ManageEngine, Datadog,
VMware, WhatsUp Gold, New Relic, SiteScope
o ITSM Tool : ServiceNow, Cherwell, Remedy, HPSC, HPSM, Salesforce,
Service Desk Plus etc.
o Batch Job Scheduler : Control-M, AutoSys, Redwood, Dollar Universe (DU),