Transformation
Transform and model data for the target schema
Hat Sequence
Data Modeler
Focus: Design and validate the target data model — grain definitions, entity relationships, surrogate key strategies, and slowly changing dimension (SCD) types. Ensure the model serves both current query patterns and foreseeable analytical needs.
Produces: Data model documentation with entity-relationship diagrams, grain definitions per table, SCD type decisions, and join path documentation.
Reads: Transformer's implementation, schema analysis from discovery, analytical requirements from the intent.
Anti-patterns:
- Defining tables without explicitly stating the grain (one row per what?)
- Using natural keys as primary keys without considering change scenarios
- Over-normalizing for OLTP patterns when the target is analytical (OLAP)
- Not documenting SCD strategy per dimension (Type 1 overwrite vs Type 2 history)
- Designing the model without understanding the primary query access patterns
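The Type 1 versus Type 2 decision and the surrogate key point can be illustrated with a small sketch. This is only an illustration, not a prescribed implementation: the `CustomerDimRow` class, the `apply_scd2` function, and the tracked `city` attribute are hypothetical names chosen for the example. The grain here is one row per customer version.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerDimRow:
    customer_sk: int            # surrogate key, owned by the warehouse
    customer_id: str            # natural key from the source system
    city: str                   # attribute tracked with Type 2 history
    valid_from: date
    valid_to: Optional[date]    # None marks the current version


def apply_scd2(dim: List[CustomerDimRow], customer_id: str, city: str, as_of: date) -> List[CustomerDimRow]:
    """Apply one source record to the dimension, keeping history (Type 2)."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None and current.city == city:
        return dim  # nothing changed, so re-applying the record adds no row
    if current is not None:
        current.valid_to = as_of  # close the previous version
    next_sk = max((r.customer_sk for r in dim), default=0) + 1
    dim.append(CustomerDimRow(next_sk, customer_id, city, as_of, None))
    return dim


dim: List[CustomerDimRow] = []
apply_scd2(dim, "C-1", "Berlin", date(2024, 1, 1))
apply_scd2(dim, "C-1", "Berlin", date(2024, 2, 1))  # unchanged: no new row
apply_scd2(dim, "C-1", "Munich", date(2024, 3, 1))  # changed: new version
```

After these calls the dimension holds two versions of customer `C-1`, each with its own surrogate key. A Type 1 dimension would instead overwrite `city` in place and keep a single row.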
Transformer
Focus: Implement transformation logic that converts raw staged data into the target schema. Centralize business rules, ensure idempotency, and write transformations that are testable and debuggable. Substance over cleverness — readable SQL/code beats terse one-liners.
Produces: Transformation code (SQL, dbt models, Spark jobs, etc.) that converts staged data to the target schema with centralized business logic and clear data lineage.
Reads: Staged data from extraction, schema analysis and source catalog from discovery, target schema requirements from the intent.
Anti-patterns:
- Scattering business logic across multiple transformations instead of centralizing
- Writing non-idempotent transformations that produce duplicates on re-run
- Using opaque column aliases without documenting semantic meaning
- Performing implicit type coercions without explicit CAST statements
- Building deeply nested subqueries instead of named CTEs or intermediate models
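Several of these points can be shown together in one short sketch that uses Python's built-in `sqlite3` module: named CTEs instead of nested subqueries, explicit `CAST`, a status mapping kept in a single place, and a delete-then-insert load that is safe to re-run. The table names (`stg_orders`, `fct_orders`), the columns, and the status codes are assumptions made up for the example. A real pipeline would more likely use a merge or partition overwrite than a full delete.

```python
import sqlite3

# Named CTEs: "typed" makes type conversions explicit, "mapped" holds the
# status mapping in one place so other queries do not re-implement it.
TRANSFORM_SQL = """
WITH typed AS (
    SELECT
        order_id,
        CAST(amount_cents AS INTEGER) AS amount_cents,
        UPPER(TRIM(status_code)) AS status_code
    FROM stg_orders
),
mapped AS (
    SELECT
        order_id,
        amount_cents,
        CASE status_code
            WHEN 'P' THEN 'paid'
            WHEN 'R' THEN 'refunded'
            ELSE 'unknown'
        END AS order_status
    FROM typed
)
INSERT INTO fct_orders (order_id, amount_cents, order_status)
SELECT order_id, amount_cents, order_status FROM mapped
"""


def run_transform(conn: sqlite3.Connection) -> None:
    """Rebuild fct_orders from staging; re-running yields the same rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM fct_orders")
        conn.execute(TRANSFORM_SQL)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_orders (order_id TEXT, amount_cents TEXT, status_code TEXT)")
conn.execute(
    "CREATE TABLE fct_orders (order_id TEXT PRIMARY KEY, amount_cents INTEGER, order_status TEXT)"
)
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?)",
    [("o1", "1250", " p "), ("o2", "300", "R")],
)

run_transform(conn)
run_transform(conn)  # second run must not create duplicates
rows = conn.execute(
    "SELECT order_id, amount_cents, order_status FROM fct_orders ORDER BY order_id"
).fetchall()
```

Running the transformation twice leaves `fct_orders` with the same two rows, which is the property the idempotency anti-pattern is about.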
Criteria Guidance
Good criteria examples:
- "Transformation SQL is idempotent — re-running produces the same result without duplicates"
- "Data model follows the agreed dimensional modeling pattern with surrogate keys and SCD type documented per dimension"
- "All business logic (e.g., revenue recognition rules, status mappings) is centralized in named CTEs or macros, not scattered across queries"
Bad criteria examples:
- "Transformations are complete"
- "Data model looks good"
- "Business logic is implemented"
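One way to tell a good criterion from a bad one is whether it can be checked mechanically. As a rough sketch, the criterion about surrogate keys and SCD type being documented per dimension could be checked against model documentation as follows. The dictionary layout, the field names, and the `undocumented_dimensions` helper are hypothetical, and only the two SCD types named in this document are accepted.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("grain", "surrogate_key", "scd_type")
ALLOWED_SCD_TYPES = {"type_1", "type_2"}  # assumption: only these two are in use


def undocumented_dimensions(model_docs: Dict[str, dict]) -> Dict[str, List[str]]:
    """Return {table: [problems]} for dimensions that fail the criterion."""
    problems: Dict[str, List[str]] = {}
    for table, doc in model_docs.items():
        issues = [f"missing {field}" for field in REQUIRED_FIELDS if not doc.get(field)]
        scd = doc.get("scd_type")
        if scd and scd not in ALLOWED_SCD_TYPES:
            issues.append(f"unknown scd_type {scd!r}")
        if issues:
            problems[table] = issues
    return problems


model_docs = {
    "dim_customer": {
        "grain": "one row per customer version",
        "surrogate_key": "customer_sk",
        "scd_type": "type_2",
    },
    "dim_product": {
        "grain": "one row per product",
        "surrogate_key": "product_sk",
        # scd_type deliberately left out to show a failure
    },
}
problems = undocumented_dimensions(model_docs)
```

A criterion such as "Data model looks good" offers nothing comparable to check, which is what makes it a bad example.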
Completion Signal
The transformation layer converts staged raw data into the target schema. All business rules are implemented and centralized. The data model is documented with entity relationships, grain definitions, and SCD strategies. Transformations are idempotent and produce deterministic output. The data modeler has verified grain consistency and join correctness.
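Grain consistency and join correctness can be verified in more than one way. The sketch below shows one simple approach with Python's built-in `sqlite3` module: look for duplicate grain keys, then compare the row count before and after a join to detect fan-out. The helper names and the `fct_order_lines` and `dim_customer` tables are hypothetical. Table and column names are interpolated into the SQL, so they are assumed to come from trusted model metadata and not from user input.

```python
import sqlite3
from typing import List, Sequence


def grain_violations(conn: sqlite3.Connection, table: str, grain_cols: Sequence[str]) -> List[tuple]:
    """Return grain keys that occur more than once (an empty list means the grain holds)."""
    cols = ", ".join(grain_cols)
    sql = f"SELECT {cols}, COUNT(*) FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1"
    return conn.execute(sql).fetchall()


def join_fans_out(conn: sqlite3.Connection, fact: str, dim: str, key: str) -> bool:
    """True if joining the fact to the dimension changes the fact's row count."""
    before = conn.execute(f"SELECT COUNT(*) FROM {fact}").fetchone()[0]
    after = conn.execute(
        f"SELECT COUNT(*) FROM {fact} f LEFT JOIN {dim} d ON f.{key} = d.{key}"
    ).fetchone()[0]
    return after != before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_order_lines (order_id TEXT, line_no INTEGER, customer_sk INTEGER)")
conn.execute("CREATE TABLE dim_customer (customer_sk INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO fct_order_lines VALUES (?, ?, ?)",
    [("o1", 1, 10), ("o1", 2, 10), ("o2", 1, 11)],
)
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(10, "Ada"), (11, "Grace")])

clean_grain = grain_violations(conn, "fct_order_lines", ["order_id", "line_no"])
clean_join = join_fans_out(conn, "fct_order_lines", "dim_customer", "customer_sk")

# A duplicated dimension key makes the join multiply fact rows.
conn.execute("INSERT INTO dim_customer VALUES (10, 'Ada L.')")
broken_join = join_fans_out(conn, "fct_order_lines", "dim_customer", "customer_sk")
```

With the clean data both checks pass. Once the dimension contains a duplicate `customer_sk`, the join returns more rows than the fact table holds and the second check reports fan-out.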