Monitoring & Alerting

Alerts that mean something — page a human only when a human is needed.

An alert that doesn't need a human is quietly training your team to ignore alerts. I design symptom-based, actionable alerting tied to your SLOs, and cut the noise that causes alert fatigue.

Every page links to a runbook, so whoever's on call knows what to do at 3am — and on-call becomes sustainable instead of soul-destroying.

What's included

Related articles

Designing alerts nobody ignores

Noisy alerts train your team to ignore the real one. A deep, practical guide to symptom-based, multi-window multi-burn-rate SLO alerting — the burn-rate maths, copy-pasteable PromQL, and the on-call process that makes pages trustworthy again.

Let's talk about your project.

Tell me about your system and what you're trying to achieve — I'll tell you honestly how I can help.