Triage

Assess severity, identify blast radius, and assign ownership

Hats: 2
Review: Auto
Unit Types: Triage, Communication
Inputs: None

Hat Sequence

1. First Responder

Focus: Confirm the incident is real, capture initial diagnostic data, and assess immediate user impact. The first responder provides ground truth: what is actually happening, not what the dashboards suggest might be happening.

Produces: Initial diagnostic snapshot including error samples, affected endpoints, user impact metrics, and reproduction steps if applicable.

Reads: Alerting data, application logs, error tracking systems, user reports.
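
As a rough illustration of the snapshot described under "Produces", the sketch below models it as a small record with a check for required parts that are still empty. The class name, field names, and sample values are hypothetical and chosen only for this example; they are not defined anywhere in this process.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DiagnosticSnapshot:
    """Hypothetical record of what the first responder observed."""

    error_samples: list[str]        # raw error lines, copied before logs rotate out
    affected_endpoints: list[str]
    user_impact: dict[str, float]   # measured impact, not just symptoms
    reproduction_steps: list[str] = field(default_factory=list)  # only if applicable
    captured_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def missing_fields(self) -> list[str]:
        """Return the names of required parts that are still empty."""
        required = {
            "error_samples": self.error_samples,
            "affected_endpoints": self.affected_endpoints,
            "user_impact": self.user_impact,
        }
        return [name for name, value in required.items() if not value]


# Illustrative values only: symptoms were recorded, but impact was never measured.
snapshot = DiagnosticSnapshot(
    error_samples=["HTTP 502 from upstream"],
    affected_endpoints=["/checkout"],
    user_impact={},
)
print(snapshot.missing_fields())  # ['user_impact']
```

Treating reproduction steps as optional mirrors the "if applicable" wording, while an empty impact measurement is flagged because reporting symptoms without measured impact is listed as an anti-pattern.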

Anti-patterns:

  • Assuming the alert is a false positive without verifying
  • Starting a fix before documenting what's broken
  • Not capturing ephemeral diagnostic data (logs, metrics) that may rotate out
  • Reporting symptoms without measuring actual user impact
  • Working in isolation without feeding findings back to the incident commander

2. Incident Commander

Focus: Take ownership of the incident, classify severity, assess blast radius, and coordinate the response. The incident commander is the single point of authority: decisions flow through them to avoid confusion during high-pressure situations.

Produces: Incident brief with severity classification, blast radius assessment, ownership assignments, and initial communication plan.

Reads: Alerting data, monitoring dashboards, initial reports from on-call or support.
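
One way to picture the incident brief is as a small data structure, sketched below. All class and field names are hypothetical, and the convention that SEV1 is the most severe level is an assumption, since this document does not define the scale. The `downgrade` method is an illustrative guard that refuses to lower severity unless evidence is recorded.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    # Assumption: lower number means more severe (SEV1 is the worst).
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


@dataclass
class BlastRadius:
    services: list[str]
    regions: list[str]
    customer_segments: list[str]


@dataclass
class IncidentBrief:
    """Hypothetical shape of the brief the incident commander produces."""

    severity: Severity
    severity_justification: str   # should be based on user impact
    blast_radius: BlastRadius
    owners: dict[str, str]        # workstream -> owner
    notified: dict[str, str] = field(default_factory=dict)  # stakeholder -> channel

    def downgrade(self, new_severity: Severity, evidence: str) -> None:
        """Lower severity only when evidence of containment is recorded."""
        if new_severity <= self.severity:
            raise ValueError("downgrade() expects a less severe level")
        if not evidence.strip():
            raise ValueError("a downgrade needs evidence that impact is contained")
        self.severity = new_severity
        self.severity_justification = evidence


# Illustrative values only.
brief = IncidentBrief(
    severity=Severity.SEV1,
    severity_justification="Checkout failing for all users in one region",
    blast_radius=BlastRadius(["checkout-api"], ["eu-west"], ["all"]),
    owners={"investigation": "alice", "mitigation": "bob"},
)
try:
    brief.downgrade(Severity.SEV3, evidence="")
except ValueError as err:
    print(err)  # a downgrade needs evidence that impact is contained
```

Keeping owners as a mapping from workstream to a named person makes a missing assignment visible at a glance.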

Anti-patterns:

  • Jumping to root cause analysis before establishing severity and blast radius
  • Failing to assign clear ownership for investigation and mitigation
  • Not communicating status to stakeholders early and often
  • Downgrading severity without evidence that impact is contained
  • Attempting to fix the issue instead of coordinating the response

Triage

Criteria Guidance

Good criteria examples:

  • "Incident brief includes severity level (SEV1-4) with justification based on user impact"
  • "Blast radius assessment identifies all affected services, regions, and customer segments"
  • "Communication plan specifies who has been notified and through which channels"

Bad criteria examples:

  • "Severity is assessed"
  • "People are notified"
  • "Incident is triaged"

Completion Signal

An incident brief exists with severity classification, blast radius assessment, and ownership assignment. Affected systems and user impact are documented. Initial communication has been sent to stakeholders. The first responder has confirmed that the incident is real and has captured initial diagnostic data.
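
The completion signal can be read as a checklist. The sketch below is one possible way to encode it, assuming the brief and the first responder's findings are available as plain dictionaries. The function name and every dictionary key are hypothetical, as are the sample values.

```python
def unmet_completion_items(brief: dict, findings: dict, stakeholders_notified: bool) -> list[str]:
    """Return the unmet parts of the completion signal; an empty list means done."""
    unmet = []
    # The brief must carry severity, blast radius, and ownership.
    for key in ("severity", "blast_radius", "owners"):
        if not brief.get(key):
            unmet.append(f"incident brief is missing {key}")
    if not findings.get("affected_systems") or not findings.get("user_impact"):
        unmet.append("affected systems and user impact are not documented")
    if not stakeholders_notified:
        unmet.append("initial communication has not been sent")
    if not findings.get("incident_confirmed"):
        unmet.append("first responder has not confirmed the incident")
    if not findings.get("error_samples"):
        unmet.append("initial diagnostic data has not been captured")
    return unmet


# Illustrative values only: everything is in place except captured diagnostics.
brief = {
    "severity": "SEV2",
    "blast_radius": ["checkout-api"],
    "owners": {"mitigation": "alice"},
}
findings = {
    "affected_systems": ["checkout-api"],
    "user_impact": "4% of checkouts failing",
    "incident_confirmed": True,
    "error_samples": [],
}
print(unmet_completion_items(brief, findings, stakeholders_notified=True))
# ['initial diagnostic data has not been captured']
```

Returning the list of unmet items, rather than a single boolean, shows what still blocks completion.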