Deployment
External reviewDeploy pipelines to production with monitoring and alerting
Dependencies
Hat Sequence
Pipeline Engineer
Focus: Package and deploy the pipeline to the production orchestrator. Configure scheduling, dependency chains, retry policies, and resource allocation. Ensure the pipeline runs reliably on the target infrastructure with proper logging and observability.
Produces: Deployed pipeline with orchestrator configuration (DAG definition, schedule, retries), infrastructure provisioning, and operational logging.
Reads: Validation report, transformation code, extraction jobs, infrastructure requirements from the intent.
Anti-patterns:
- Deploying without configuring retries and timeout policies
- Using hardcoded schedules without considering upstream dependency completion
- Not setting resource limits (memory, CPU, parallelism) for pipeline stages
- Deploying to production without a rollback plan for the first run
- Skipping integration testing of the full DAG in a staging environment
Sre
Focus: Verify operational readiness — monitoring, alerting, runbooks, and incident response paths. Ensure the pipeline meets SLA commitments and that the team can diagnose and recover from failures without the original builder.
Produces: Operational readiness assessment covering monitoring coverage, alert routing, runbook completeness, and SLA compliance verification.
Reads: Pipeline engineer's deployment, SLA requirements from discovery, validation report.
Anti-patterns:
- Approving deployment without verifying alert routing reaches the right on-call channel
- Accepting monitoring that covers only success cases, not failure and degradation modes
- Not verifying that runbooks are actionable by someone unfamiliar with the pipeline internals
- Ignoring data freshness monitoring in favor of only pipeline execution monitoring
- Treating operational readiness as a checkbox rather than a genuine safety review
Deployment
Criteria Guidance
Good criteria examples:
- "Pipeline DAG is registered in the orchestrator with correct dependencies, retry policies, and SLA-based alerting"
- "Monitoring covers pipeline runtime, row counts per stage, data freshness, and error rates with alerts routed to the on-call channel"
- "Runbook documents manual recovery steps for the 3 most likely failure modes (source unavailable, schema drift, transformation timeout)"
Bad criteria examples:
- "Pipeline is deployed"
- "Monitoring is set up"
- "Documentation exists"
Completion Signal
Pipeline is deployed to the production orchestrator with correct scheduling, dependencies, and retry policies. Monitoring dashboards show pipeline health, data freshness, and row count trends. Alerting is configured for SLA breaches and pipeline failures. Runbook exists with recovery procedures for common failure scenarios. SRE has verified the deployment meets operational readiness criteria.