Engineering Team On-Call Playbook

Create an on-call playbook that minimizes incident impact and engineer burnout.

🛠️ Engineering · intermediate · SRE Manager · ✓ Free

The Prompt

Create a comprehensive on-call playbook for [Team/Service]. Include:
1) On-call structure: rotation schedule (weekly, follow-the-sun), primary and secondary on-call, escalation chain, manager escalation criteria.
2) Alert design: alert severity levels (P1-P4) with response time SLAs, alert routing rules, noise reduction strategy, alert fatigue prevention.
3) Incident response: for each severity level, define who to notify, communication channel, status page update cadence, customer communication template.
4) Runbook library: template for common incidents (service down, high latency, data inconsistency, security breach, dependency failure). Each runbook: symptoms, diagnosis steps, remediation steps, rollback procedure.
5) Incident management process: incident commander role, communication lead, subject matter experts, real-time documentation requirements.
6) Post-incident: blameless postmortem template (timeline, impact, root cause, contributing factors, action items), SLA for completing postmortems, action item tracking.
7) On-call wellness: compensation policy, handoff process, quiet hours expectations, burnout indicators and intervention.
8) Metrics: MTTA, MTTR, incidents per on-call shift, pages per shift, false positive rate, action item completion rate.

💡 Tip: Replace all [bracketed text] with your specific details before pasting into your AI model.
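
The prompt's metrics item names MTTA, MTTR, and false positive rate. As a rough illustration of how these might be computed from incident records, here is a minimal Python sketch. The `Incident` record shape, its field names, and the sample timestamps are all assumptions made for this example, not part of any particular paging tool. MTTR is measured here from trigger to resolution; teams define it differently (time to repair, recover, or resolve), so adjust to match your own definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape; field names are assumptions for this sketch.
    triggered_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime
    false_positive: bool = False


def mtta_minutes(incidents):
    """Mean time to acknowledge: average of (acknowledged - triggered), in minutes."""
    return mean(
        (i.acknowledged_at - i.triggered_at).total_seconds() / 60 for i in incidents
    )


def mttr_minutes(incidents):
    """Mean time to resolve, measured here from trigger to resolution, in minutes."""
    return mean(
        (i.resolved_at - i.triggered_at).total_seconds() / 60 for i in incidents
    )


def false_positive_rate(incidents):
    """Fraction of incidents flagged as false positives."""
    return sum(i.false_positive for i in incidents) / len(incidents)


# Made-up sample data, purely illustrative.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=45)),
    Incident(t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=75), false_positive=True),
]

print(mtta_minutes(sample))         # 10.0
print(mttr_minutes(sample))         # 60.0
print(false_positive_rate(sample))  # 0.5
```

Tracking these per on-call shift, rather than only in aggregate, makes it easier to spot the shifts where paging load is highest.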

AI Model Compatibility

ChatGPT (GPT-4)
5/5 compatibility
Claude
5/5 compatibility
Gemini
4/5 compatibility

Tags

on-call, incident response, SRE, reliability