EXL

Site Reliability Engineer (SRE)

British Indian Ocean Territory

Accepting Applications Full-time On-site LinkedIn
Posted 5 days, 4 hours ago
Job Description
**Role Overview:** We are hiring an experienced SRE leader to manage a global Incident Management team and drive operational excellence for this engagement. The role involves leading teams, handling complex incidents, and improving overall incident response strategy.

**Key Responsibilities:**

* Lead and mentor a **team of SRE engineers (Level 7 ICs)**
* Own **end-to-end incident management operations** across regions
* Establish and drive **incident response processes and governance**
* Ensure effective **16x7 delivery model across geographies**
* Act as escalation point for **critical incidents and stakeholder communication**
* Drive continuous improvements in **MTTM and operational efficiency**
* Lead **process enhancements, SOP creation, and knowledge transfer planning**

**Required Skills:**

* Strong experience in **SRE / Incident Management leadership**
* Proven ability to manage **high-impact, complex incidents**
* Excellent **communication, stakeholder management, and leadership skills**
* Ability to **drive alignment and influence cross-functional teams**

**Technical Skills:**

* Ability to proactively identify risks using monitoring tools such as DataDog and Grafana dashboards
* Experience in incident response with capability to quickly restore services (restart, patch, or remediate live issues)
* Strong focus on minimizing service downtime across environments
* Hands-on experience supporting both on-premise (Linux environments) and cloud platforms (primarily Azure, with some exposure to GCP)
* Solid understanding of networking concepts and system architecture