Investigate
Auto review
Root cause analysis, log analysis, and timeline reconstruction
Dependencies
Hat Sequence
Investigator
Focus: Reconstruct the incident timeline, form and test root cause hypotheses, and distinguish the root cause from contributing factors. Follow the evidence — resist the urge to blame the most recent deploy without proof.
Produces: Root cause analysis with timeline, hypothesis testing results, and contributing factor assessment.
Reads: Incident brief from triage, application logs, deployment history, configuration changes, metrics.
Anti-patterns:
- Assuming the most recent change is the cause without evidence
- Stopping at the first plausible explanation without testing alternatives
- Confusing correlation with causation (e.g., "it broke after the deploy" is not proof the deploy caused it)
- Not documenting ruled-out hypotheses and the evidence that eliminated them
- Investigating in isolation without sharing findings with the Log Analyst
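One way to avoid losing ruled-out hypotheses is to keep them in a small structured log. The following is a minimal sketch, not a prescribed format: the `Hypothesis` class, the `Status` values, and the sample statements are all hypothetical and only illustrate recording each hypothesis together with the evidence that eliminated it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    SUPPORTED = "supported"
    RULED_OUT = "ruled_out"


@dataclass
class Hypothesis:
    """One candidate explanation and the evidence gathered for or against it."""
    statement: str
    status: Status = Status.OPEN
    evidence: list[str] = field(default_factory=list)

    def rule_out(self, evidence: str) -> None:
        """Mark the hypothesis as eliminated; citing evidence is mandatory."""
        if not evidence.strip():
            raise ValueError("a ruled-out hypothesis must cite the evidence that eliminated it")
        self.evidence.append(evidence)
        self.status = Status.RULED_OUT


def ruled_out(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Return the hypotheses that were eliminated and have evidence attached."""
    return [h for h in hypotheses if h.status is Status.RULED_OUT and h.evidence]


# Illustrative entries only; real statements come from the incident at hand.
log = [
    Hypothesis("Latest deploy introduced the regression"),
    Hypothesis("Connection pool exhausted by upstream retries"),
]
log[0].rule_out("Error rate began rising before the deploy finished rolling out")
```

Requiring evidence in `rule_out` means a hypothesis cannot be dropped from consideration silently, and the same log can be shared with the Log Analyst as the list of hypotheses to test.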
Log Analyst
Focus: Deep-dive into logs, metrics, and traces to find concrete evidence supporting or refuting root cause hypotheses. The log analyst turns raw observability data into structured evidence.
Produces: Evidence report with timestamped log entries, metric correlations, and trace analysis supporting the root cause determination.
Reads: Incident brief from triage, investigator's hypotheses, application logs, APM traces, infrastructure metrics.
Anti-patterns:
- Searching logs without a hypothesis to test — fishing expeditions waste time during incidents
- Presenting raw log output without synthesis or interpretation
- Ignoring logs from adjacent systems that may reveal upstream causes
- Not correlating timestamps across different data sources
- Treating absence of error logs as evidence of no problem
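Correlating timestamps across data sources usually starts with normalizing everything to one time zone before merging. The sketch below assumes each source can be reduced to ISO 8601 timestamps; the `Event` type, the helper names, and the sample entries are hypothetical, and the assumption that naive timestamps are UTC must be checked for each real source.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Event(NamedTuple):
    at: datetime    # always timezone-aware, normalized to UTC
    source: str     # which system reported the event
    message: str


def to_utc(raw: str, assume_offset_hours: int = 0) -> datetime:
    """Parse an ISO 8601 timestamp; naive values get the assumed UTC offset."""
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone(timedelta(hours=assume_offset_hours)))
    return parsed.astimezone(timezone.utc)


def build_timeline(*sources: list[Event]) -> list[Event]:
    """Merge events from every source into one list ordered by UTC time."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event.at)


# Illustrative entries only: three sources logging in three different offsets.
app_logs = [Event(to_utc("2024-01-01T12:05:00+02:00"), "app", "first 5xx response")]
deploys = [Event(to_utc("2024-01-01T10:07:00"), "deploy", "rollout finished")]
metrics = [Event(to_utc("2024-01-01T05:03:00-05:00"), "metrics", "p99 latency above threshold")]

timeline = build_timeline(app_logs, deploys, metrics)
independent_sources = {event.source for event in timeline}
```

In this made-up data the local clock times suggest a different order than the normalized ones: once converted to UTC, the metrics anomaly precedes both the application errors and the end of the rollout.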
Criteria Guidance
Good criteria examples:
- "Timeline reconstructs the incident from first anomaly to detection with timestamps from at least 2 independent sources"
- "Root cause hypothesis is supported by log evidence with specific entries cited"
- "Contributing factors are distinguished from the root cause with evidence for each"
Bad criteria examples:
- "Root cause is found"
- "Logs are analyzed"
- "Investigation is thorough"
Completion Signal
Root cause document exists with a reconstructed timeline from first anomaly through detection and escalation. Root cause hypothesis is stated with supporting evidence from logs, metrics, or code. Contributing factors are identified separately. Investigator has ruled out at least 2 alternative hypotheses with evidence. The root cause is specific enough to inform a targeted mitigation.
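The mechanical parts of this signal can be expressed as a checklist. The sketch below is an illustration under assumptions: the dictionary layout (`timeline`, `root_cause`, `contributing_factors`, `ruled_out`) and the phase names are hypothetical, and whether the root cause is specific enough to inform a mitigation still needs human judgment.

```python
def unmet_conditions(doc: dict) -> list[str]:
    """Return the completion conditions that the root cause document does not yet meet."""
    unmet = []

    # Timeline must run from first anomaly through detection and escalation.
    phases = {entry["phase"] for entry in doc.get("timeline", [])}
    missing = {"first_anomaly", "detection", "escalation"} - phases
    if missing:
        unmet.append("timeline missing phases: " + ", ".join(sorted(missing)))

    # Root cause must be stated and backed by evidence.
    root = doc.get("root_cause", {})
    if not root.get("statement") or not root.get("evidence"):
        unmet.append("root cause lacks a statement or supporting evidence")

    # Contributing factors are tracked apart from the root cause.
    if not doc.get("contributing_factors"):
        unmet.append("contributing factors not listed separately")

    # At least 2 alternatives must be ruled out, each with evidence.
    eliminated = [h for h in doc.get("ruled_out", []) if h.get("evidence")]
    if len(eliminated) < 2:
        unmet.append("fewer than 2 alternative hypotheses ruled out with evidence")

    return unmet
```

An empty return value means only the judgment call about specificity remains.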