Mitigate

Apply immediate fixes to stop the bleeding: rollbacks, feature flags, scaling.

Hats: 2
Review: Ask
Unit Types: Hotfix, Rollback, Workaround
Inputs: Investigate
Dependencies: Investigate (root-cause)

Hat Sequence

1. Mitigator

Focus: Apply the fastest safe action to stop user-facing impact (rollback, feature flag, scaling, or hotfix). Speed matters, but not at the cost of making things worse. Every action must be reversible.

Produces: Mitigation log documenting exactly what was done, when, and how to reverse it.

Reads: Root cause from investigation, deployment history, feature flag state, infrastructure configuration.

Anti-patterns:

  • Applying a fix without a rollback plan for the fix itself
  • Choosing a permanent fix when a faster temporary mitigation exists
  • Not documenting the exact commands or config changes applied
  • Making multiple changes simultaneously, so it is impossible to tell which one helped
  • Skipping communication — stakeholders need to know what's being done
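
One possible shape for the mitigation log described above is sketched here in Python. This is a minimal illustration, not a prescribed format: the names `MitigationEntry`, `MitigationLog`, and `ACTION_TYPES` are hypothetical, and the set of action types is an assumption drawn from the actions this stage mentions. The sketch rejects any entry that lacks an exact command or a documented way to reverse it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumption: action types taken from the actions this stage mentions.
ACTION_TYPES = {"rollback", "feature_flag", "scaling", "hotfix", "workaround"}


@dataclass(frozen=True)
class MitigationEntry:
    """One mitigation action: what was done, when, and how to reverse it."""
    action_type: str
    command: str          # exact command or config change applied
    reverse_command: str  # exact step that undoes this action
    applied_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self):
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action type: {self.action_type!r}")
        if not self.command.strip():
            raise ValueError("record the exact command or config change")
        if not self.reverse_command.strip():
            raise ValueError("every action needs a documented way to reverse it")


class MitigationLog:
    """Append-only log; actions are recorded one at a time, in order."""

    def __init__(self):
        self._entries = []

    def record(self, entry):
        self._entries.append(entry)
        return entry

    def entries(self):
        return tuple(self._entries)

    def reversal_plan(self):
        """Reverse commands, newest action first."""
        return [e.reverse_command for e in reversed(self._entries)]


# Usage: the command strings below are placeholders, not a real CLI.
log = MitigationLog()
log.record(MitigationEntry("feature_flag",
                           "flag new-checkout off",
                           "flag new-checkout on"))
print(log.reversal_plan())
```

Recording one entry per action, each with its own timestamp, also makes it easier to attribute an improvement to a specific change.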

2. Verifier

Focus: Confirm the mitigation actually stopped the user-facing impact. Use the same signals that detected the incident — if error rates triggered the alert, error rates should confirm the fix. Trust metrics, not assumptions.

Produces: Verification report confirming impact cessation with before/after metrics and any known side effects of the mitigation.

Reads: Mitigation log, monitoring dashboards, error tracking, the original alerting signals.

Anti-patterns:

  • Declaring "fixed" based on a single data point or gut feeling
  • Using different metrics to verify than the ones that detected the problem
  • Not waiting long enough for metrics to stabilize before confirming
  • Ignoring partial mitigation — impact reduced but not eliminated
  • Not checking for side effects introduced by the mitigation itself
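
The verification rules above can be sketched as a small check. This is an illustrative Python sketch under stated assumptions: the function name `verify_mitigation`, the default of five post-mitigation samples, and the four verdict strings are all hypothetical choices, and the samples are assumed to come from the same metric that triggered the alert (for example, error rate). Side effects of the mitigation still need a separate check.

```python
from statistics import mean


def verify_mitigation(before, after, threshold, min_samples=5):
    """Classify the outcome from samples of the SAME metric that raised the alert.

    before / after: metric samples taken before and after the mitigation.
    threshold: the alerting threshold for that metric.
    min_samples: post-mitigation samples required before giving a verdict.
    """
    if not before:
        raise ValueError("need pre-mitigation samples for a before/after comparison")
    if len(after) < min_samples:
        # Too early to call it: wait for the metric to stabilize.
        return "insufficient_data"
    recent = after[-min_samples:]
    if all(sample < threshold for sample in recent):
        return "mitigated"
    if mean(recent) < mean(before):
        # Impact reduced but not eliminated.
        return "partial"
    return "not_mitigated"


# Usage with made-up error-rate samples and a 2% alert threshold.
before = [0.12, 0.15, 0.14]
after = [0.06, 0.05, 0.06, 0.05, 0.06]
print(verify_mitigation(before, after, threshold=0.02))
```

Returning a distinct verdict for partial mitigation keeps a reduced but ongoing impact from being reported as fixed.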

Criteria Guidance

Good criteria examples:

  • "Mitigation action is documented with exact commands or config changes applied"
  • "Verification confirms user-facing impact has stopped, measured by the same metrics that triggered the incident"
  • "Rollback plan exists in case the mitigation itself causes regression"

Bad criteria examples:

  • "Issue is mitigated"
  • "Fix is applied"
  • "Things are back to normal"

Completion Signal

Mitigation log documents exactly what was done — rollback version, feature flag toggled, scaling action, or hotfix applied — with timestamps. Verifier has confirmed the user-facing impact has stopped using the same signals that detected the incident. A rollback plan for the mitigation itself is documented. Any known side effects of the mitigation are called out.
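
As a rough illustration, the four conditions in the completion signal could be tracked as an explicit checklist. The sketch below is hypothetical: `MitigationStatus`, its field names, and `missing_for_completion` are invented for this example and are not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class MitigationStatus:
    # Each field mirrors one condition from the completion signal.
    actions_logged_with_timestamps: bool
    impact_stopped_on_detecting_signals: bool
    rollback_plan_documented: bool
    side_effects_called_out: bool


def missing_for_completion(status):
    """Return the names of the completion conditions that are not yet met."""
    return [name for name, met in vars(status).items() if not met]


# Usage: the stage is complete only when nothing is missing.
status = MitigationStatus(True, True, False, True)
print(missing_for_completion(status))
```

Listing the unmet conditions by name, rather than returning a single yes or no, avoids the vague "issue is mitigated" style of sign-off.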