What I do

Three closely related disciplines, owned end to end, not thrown over the fence between teams.

Notes from the field

Practical writing on reliability, architecture and operating real systems — no hype, no thought-leadership theatre.

SLOs that don't lie: measuring what users actually feel

Most SLOs are green while users suffer, because they measure the system, not the person. How to build SLIs from real user journeys, give each journey the target it deserves, turn the gap between that target and 100% into a team-owned error budget, and wire alerts that drill straight to the cause.

Designing alerts nobody ignores

Noisy alerts train your team to ignore the real one. A deep, practical guide to symptom-based, multi-window multi-burn-rate SLO alerting — the burn-rate maths, copy-pasteable PromQL, and the on-call process that makes pages trustworthy again.

Terraform modules that scale with your team, not against it

Reusable modules only scale if you treat them like products: small, reviewed, tested and versioned. A practical guide to building, releasing and consuming Terraform modules straight from GitHub — pinned to a tag or, when it matters, an immutable commit hash — with a Terragrunt layout that mirrors your estate.

Got a system that has to stay up?

Whether it's an architecture review, an SRE engagement, or a reliability fire you need help putting out — let's talk.

Get in touch
