Engineering

Data Pipeline Studio

Data engineering lifecycle for ETL pipelines, data warehouses, and analytics workflows

5 stages · 10 hats · Persistence: git · Delivery: pull-request

Stage Pipeline

Stage Details

Discovery (Auto review)

Understand data sources, schemas, volumes, and SLAs

Hats

Data Architect

Map the data landscape — sources, targets, volumes, latency requirements, and system constraints. Define the high-level data flow architecture and identify integration patterns (batch, streaming, CDC) appropriate for each source-target pair.

Schema Analyst

Profile source schemas in detail — column types, nullability, cardinality, encoding, and semantic meaning. Identify type conflicts, naming inconsistencies, and data quality issues that will affect downstream transformation.
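The profiling described above can be partially automated. A minimal sketch using only the Python standard library, assuming source rows arrive as dicts; the column names and the `profile_rows` helper are illustrative, not part of the studio:

```python
from collections import Counter

def profile_rows(rows):
    """Profile row dicts: per-column observed types, null rate, cardinality."""
    profile = {}
    for col in rows[0].keys():
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        profile[col] = {
            # More than one type name here flags a type conflict to resolve.
            "types": sorted({type(v).__name__ for v in non_null}),
            "null_rate": 1 - len(non_null) / len(values),
            "cardinality": len(set(non_null)),
            "top_values": Counter(non_null).most_common(3),
        }
    return profile

rows = [
    {"id": 1, "country": "US", "age": 34},
    {"id": 2, "country": "US", "age": None},
    {"id": 3, "country": "DE", "age": "41"},  # type conflict: str vs int
]
report = profile_rows(rows)
```

Even a profile this small surfaces exactly the issues named above: the `age` column mixes `int` and `str` and has a nonzero null rate, both of which would break a naive downstream cast.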

Extraction (Ask review)

Design and implement data extraction from sources

Hats

Connector Reviewer

Review extraction implementations for reliability, idempotency, and operational safety. Verify that connectors handle schema drift, network failures, and partial extractions without data loss or duplication.

Extractor

Implement extraction logic that reliably moves data from sources to the staging area. Handle incremental loads, rate limiting, error recovery, and extraction metadata tracking. Prioritize correctness and idempotency over speed.
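One way to sketch the incremental, idempotent pattern described above: a watermark persisted in a state file, and a staging file name derived deterministically from that watermark so re-runs overwrite rather than duplicate. The `fetch_since` callback, file layout, and watermark format are all assumptions for illustration:

```python
import json
import pathlib
import tempfile

def extract_incremental(fetch_since, state_path, staging_dir):
    """Watermark-based incremental extract.

    fetch_since(watermark) -> (rows, new_watermark). The staging file name
    is a pure function of the current watermark, so re-running the same
    window replaces the same file instead of appending duplicates.
    """
    state_path = pathlib.Path(state_path)
    watermark = (json.loads(state_path.read_text())["watermark"]
                 if state_path.exists() else None)
    rows, new_watermark = fetch_since(watermark)
    out = pathlib.Path(staging_dir) / f"batch_{watermark or 'initial'}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(rows))
    # Advance the watermark only after the staging write succeeds, so a
    # crash mid-run re-extracts the window rather than silently skipping it.
    state_path.write_text(json.dumps({"watermark": new_watermark}))
    return out

# Hypothetical usage with a stubbed source:
tmp = pathlib.Path(tempfile.mkdtemp())
def fetch_since(watermark):
    return [{"id": 1, "updated_at": "2024-01-02"}], "2024-01-02"

out = extract_incremental(fetch_since, tmp / "state.json", tmp / "staging")
```

Writing the data before advancing the watermark is the key ordering choice: it trades a possible re-extraction for a guarantee of no silent gaps, which matches the "correctness over speed" priority above.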

Requires: source-catalog from Discovery

Transformation (Ask review)

Transform and model data for the target schema

Hats

Data Modeler

Design and validate the target data model — grain definitions, entity relationships, surrogate key strategies, and slowly changing dimension (SCD) types. Ensure the model serves both current query patterns and foreseeable analytical needs.
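A Type 2 SCD merge, one of the strategies named above, can be sketched roughly as follows; the `scd2_merge` helper, column names, and snapshot shape are hypothetical, and a real warehouse would do this in SQL with surrogate keys:

```python
from datetime import date

def scd2_merge(dimension, incoming, key, tracked, today=None):
    """Apply incoming snapshot rows to a Type 2 slowly changing dimension.

    dimension rows carry valid_from / valid_to (None = current version).
    A change in any tracked attribute closes the current row and opens
    a new one, preserving full history.
    """
    today = today or date.today().isoformat()
    current = {row[key]: row for row in dimension if row["valid_to"] is None}
    for rec in incoming:
        live = current.get(rec[key])
        if live and all(live[c] == rec[c] for c in tracked):
            continue  # no tracked attribute changed; keep the current row
        if live:
            live["valid_to"] = today  # close the superseded version
        new_row = dict(rec, valid_from=today, valid_to=None)
        dimension.append(new_row)
        current[rec[key]] = new_row
    return dimension

dim = [{"customer_id": 1, "tier": "bronze",
        "valid_from": "2024-01-01", "valid_to": None}]
dim = scd2_merge(dim, [{"customer_id": 1, "tier": "gold"}],
                 key="customer_id", tracked=["tier"], today="2024-06-01")
```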

Transformer

Implement transformation logic that converts raw staged data into the target schema. Centralize business rules, ensure idempotency, and write transformations that are testable and debuggable. Substance over cleverness — readable SQL/code beats terse one-liners.
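One way to keep transformations idempotent and testable, as described above, is to express each one as a pure function of its staged input, with business rules named and centralized rather than scattered. A sketch under an invented order schema (the rules and column names are illustrative):

```python
CENTS_PER_DOLLAR = 100  # business rule lives in one named place

def transform_orders(staged_rows):
    """Pure transformation: staged order rows -> target fact rows.

    A pure function of its input re-runs to the same output (idempotent)
    and is trivially unit-testable without a database.
    """
    out = []
    for row in staged_rows:
        if row.get("status") == "cancelled":  # rule: drop cancelled orders
            continue
        out.append({
            "order_id": row["id"],
            "amount_usd": row["amount_cents"] / CENTS_PER_DOLLAR,
            "country": (row.get("country") or "UNKNOWN").upper(),
        })
    return out

staged = [
    {"id": 1, "amount_cents": 1999, "country": "us", "status": "paid"},
    {"id": 2, "amount_cents": 500, "country": None, "status": "cancelled"},
]
fact_rows = transform_orders(staged)
```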

Requires: staged-data from Extraction

Validation (Ask review)

Validate data quality, schema compliance, and business rules

Hats

Data Quality Reviewer

Review the validation suite for coverage completeness and assertion quality. Verify that tests cover all critical data paths, that thresholds are appropriately tight, and that failure modes produce actionable diagnostics rather than opaque errors.

Validator

Build and run data quality checks that verify schema compliance, referential integrity, uniqueness, accepted value ranges, row count reconciliation, and business rule correctness. Every assertion should be specific, automated, and produce a clear pass/fail/warning result.
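A minimal shape for such an assertion suite, where every check yields a named, machine-readable result rather than an opaque error (the check names, thresholds, and `validate` helper are illustrative):

```python
def check(name, passed, detail=""):
    """One specific, automated assertion with a clear pass/fail result."""
    return {"check": name, "status": "pass" if passed else "fail",
            "detail": detail}

def validate(rows, source_count):
    ids = [r["order_id"] for r in rows]
    return [
        check("not_null:order_id", all(i is not None for i in ids)),
        check("unique:order_id", len(ids) == len(set(ids))),
        check("accepted_range:amount_usd",
              all(0 <= r["amount_usd"] < 100_000 for r in rows)),
        check("row_count_reconciliation",
              len(rows) <= source_count,
              f"target={len(rows)} source={source_count}"),
    ]

results = validate([{"order_id": 1, "amount_usd": 19.99}], source_count=2)
```

The `detail` field is what turns a failure into an actionable diagnostic: a row-count mismatch report that states both counts points directly at the reconciliation gap.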

Requires: modeled-data from Transformation

Deployment (External review)

Deploy pipelines to production with monitoring and alerting

Hats

Pipeline Engineer

Package and deploy the pipeline to the production orchestrator. Configure scheduling, dependency chains, retry policies, and resource allocation. Ensure the pipeline runs reliably on the target infrastructure with proper logging and observability.
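The retry-policy piece of this can be sketched independently of any particular orchestrator. A hedged example of exponential-backoff retries around a task callable; the `run_with_retries` helper and its delays are assumptions, not a real orchestrator API:

```python
import time

def run_with_retries(task, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run a pipeline task, retrying transient failures with exponential backoff.

    Each failed attempt is logged before the retry so the run history is
    observable; the final failure propagates for the orchestrator to alert on.
    """
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_retries:
                raise  # exhausted: surface the failure to alerting
            delay = base_delay * 2 ** attempt
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)

# Hypothetical flaky task that succeeds on the third attempt:
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient network error")
    return "loaded"

result = run_with_retries(flaky, max_retries=3, sleep=lambda s: None)
```

Injecting `sleep` keeps the policy testable without real waits, the same property the SRE hat below checks for when verifying that failures can be diagnosed and reproduced.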

SRE

Verify operational readiness — monitoring, alerting, runbooks, and incident response paths. Ensure the pipeline meets SLA commitments and that the team can diagnose and recover from failures without the original builder.

Requires: validation-report from Validation

Data Pipeline Studio

Lifecycle for data engineering work: ETL/ELT pipelines, data warehouse modeling, analytics workflows, and data integration projects. Use this studio when the intent involves moving, reshaping, or validating data across systems.

Appropriate for:

  • Building new ETL/ELT pipelines from source to target
  • Data warehouse or lakehouse schema design and implementation
  • Analytics pipeline development (batch or streaming)
  • Data migration between systems or schema versions
  • Data quality framework implementation