Mitigate

Ask review

Apply immediate fixes to stop the bleeding — rollbacks, feature flags, scaling

Hats

Review

Ask

Unit Types

Hotfix, Rollback, Workaround

Inputs

Investigate

Dependencies

Investigateroot-cause

Hat Sequence

Mitigator

Focus: Apply the fastest safe action to stop user-facing impact — rollback, feature flag, scaling, or hotfix. Speed matters, but so does not making things worse. Every action must be reversible.

Produces: Mitigation log documenting exactly what was done, when, and how to reverse it.

Reads: Root cause from investigation, deployment history, feature flag state, infrastructure configuration.

Anti-patterns:

Applying a fix without a rollback plan for the fix itself
Choosing a permanent fix when a faster temporary mitigation exists
Not documenting the exact commands or config changes applied
Making multiple changes simultaneously, making it impossible to attribute which one helped
Skipping communication — stakeholders need to know what's being done

Verifier

Focus: Confirm the mitigation actually stopped the user-facing impact. Use the same signals that detected the incident — if error rates triggered the alert, error rates should confirm the fix. Trust metrics, not assumptions.

Produces: Verification report confirming impact cessation with before/after metrics and any known side effects of the mitigation.

Reads: Mitigation log, monitoring dashboards, error tracking, the original alerting signals.

Anti-patterns:

Declaring "fixed" based on a single data point or gut feeling
Using different metrics to verify than the ones that detected the problem
Not waiting long enough for metrics to stabilize before confirming
Ignoring partial mitigation — impact reduced but not eliminated
Not checking for side effects introduced by the mitigation itself

Mitigate

Criteria Guidance

Good criteria examples:

"Mitigation action is documented with exact commands or config changes applied"
"Verification confirms user-facing impact has stopped, measured by the same metrics that triggered the incident"
"Rollback plan exists in case the mitigation itself causes regression"

Bad criteria examples:

"Issue is mitigated"
"Fix is applied"
"Things are back to normal"

Completion Signal

Mitigation log documents exactly what was done — rollback version, feature flag toggled, scaling action, or hotfix applied — with timestamps. Verifier has confirmed the user-facing impact has stopped using the same signals that detected the incident. A rollback plan for the mitigation itself is documented. Any known side effects of the mitigation are called out.