Monitoring and Observability Strategy
Design a comprehensive monitoring and observability stack.
🛠️ Engineering · Advanced · SRE · ✓ Free
The Prompt
You are an observability architect. Design a monitoring strategy.

System: [DESCRIBE ARCHITECTURE]
Scale: [REQUESTS/SEC, SERVICES COUNT]
Current monitoring: [DESCRIBE]
SLA target: [UPTIME %]
Team: [SIZE]

1. Three Pillars:
   - Metrics: what to measure (RED method: Rate, Errors, Duration)
   - Logs: structured logging, log levels, aggregation
   - Traces: distributed tracing, span design, sampling
2. Infrastructure Monitoring:
   - Server/container: CPU, memory, disk, network
   - Database: connections, queries, replication lag, lock waits
   - Cache: hit rate, memory, evictions
   - Queue: depth, processing rate, consumer lag
3. Application Monitoring:
   - API endpoints: latency (p50, p95, p99), error rate, throughput
   - Business metrics: signups, transactions, revenue
   - Dependencies: third-party API health, circuit breaker states
4. Alerting Strategy:
   - Alert levels: critical (page), warning (notify), info (log)
   - Alert design: actionable, not noisy, runbook-linked
   - On-call: rotation, escalation, fatigue prevention
   - SLO-based alerting: error budgets, burn rate
5. Dashboards: service overview, golden signals, business metrics, on-call
6. Tool Stack: Datadog, Grafana, PagerDuty, ELK comparison
7. Incident Correlation: connecting metrics, logs, traces for fast debugging
8. Cost Management: data retention, sampling, aggregation
💡 Tip: Replace all [bracketed text] with your specific details before pasting into your AI model.
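The prompt asks for SLO-based alerting built on error budgets and burn rate, and maps alert levels to critical (page), warning (notify), and info (log). As a rough illustration of what a good answer might contain, here is a minimal sketch of that calculation. Everything in it is an assumption made for illustration: the `WindowStats` shape, the function names, and the default thresholds (14.4 and 6.0 are commonly cited multi-window burn-rate values for a 30-day SLO, applied here in simplified form to a single pair of windows). Tune all of them to your own SLA target and traffic.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts observed over one alerting window (hypothetical shape)."""
    total: int
    errors: int


def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. an SLO of 0.999 allows 0.001."""
    return 1.0 - slo_target


def burn_rate(stats: WindowStats, slo_target: float) -> float:
    """Speed of budget consumption; 1.0 means errors arrive exactly at budget."""
    if stats.total == 0:
        return 0.0  # no traffic, so nothing is being burned
    observed_error_ratio = stats.errors / stats.total
    return observed_error_ratio / error_budget(slo_target)


def alert_level(
    long_window: WindowStats,
    short_window: WindowStats,
    slo_target: float,
    page_threshold: float = 14.4,  # assumed default, not a universal constant
    warn_threshold: float = 6.0,   # assumed default, not a universal constant
) -> str:
    """Map burn rates to the prompt's alert levels.

    Both windows must exceed a threshold before escalating: the long window
    shows the burn is significant, the short window shows it is still ongoing.
    """
    long_burn = burn_rate(long_window, slo_target)
    short_burn = burn_rate(short_window, slo_target)
    if long_burn >= page_threshold and short_burn >= page_threshold:
        return "critical"  # page the on-call engineer
    if long_burn >= warn_threshold and short_burn >= warn_threshold:
        return "warning"   # notify, no page
    return "info"          # log only
```

Requiring both windows to agree is one way to keep alerts actionable and not noisy: a brief spike that has already recovered fails the short-window check and stays at the info level.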
AI Model Compatibility
ChatGPT (GPT-4)
5/5 compatibility
Claude
5/5 compatibility
Gemini
4/5 compatibility
Tags
monitoring, observability, sre, alerting, infrastructure