Refine the Model

After generating the draft model with the wizard, the next step is to refine and adjust the data warehouse structure. The generated model provides a complete baseline, but it must be validated and adapted to match business logic, data quality, and performance requirements.

This step focuses on defining keys, adjusting transformations, handling historization, and ensuring that the generated joins and structures reflect the intended analytical model.

Purpose

Validate and adjust the generated data warehouse model to ensure correct business logic, data relationships, and performance behavior.

Design Principle

AnalyticsCreator generates a structurally complete model, but correctness is achieved through refinement.

  • Automation provides the structure
  • Manual refinement ensures semantic accuracy

Developers work on metadata definitions, not directly on SQL, and all changes are reflected in generated code during synchronization.

Inputs / Outputs

Inputs

  • Generated draft model (STG, CORE, DM)
  • Source metadata and relationships
  • Business requirements and logic

Outputs

  • Refined transformations
  • Defined business keys and surrogate keys
  • Adjusted joins and relationships
  • Configured historization behavior

Internal Mechanics

1. Column selection and cleanup

Generated transformations often include all available columns. Unnecessary attributes should be removed to reduce model complexity and improve performance.

2. Business key definition

Business keys must be validated or defined explicitly. These keys determine:

  • Uniqueness of entities
  • Join conditions between tables
  • Basis for historization

3. Surrogate key generation

AnalyticsCreator generates surrogate keys automatically. Depending on the modeling approach:

  • Identity-based keys (e.g. integer)
  • Hash-based keys (for Data Vault or hybrid models)

Hash keys are typically generated in the staging layer as calculated and persisted columns.

4. Relationship validation

Automatically generated joins should be reviewed. This includes:

  • Correct join paths
  • Cardinality assumptions
  • Inclusion of required tables

5. Historization configuration

Historization is applied in persistent staging and CORE layers. Typical behavior includes:

  • Valid-from and valid-to columns
  • Tracking changes over time

The historization strategy should be verified for correctness and performance impact.

6. Macro usage

Reusable SQL logic is implemented using macros. For example:

  • Hash key generation
  • Standard transformations

Macros allow centralized control of repeated logic without modifying generated SQL directly.

7. Dimension and fact adjustments

Fact tables and dimensions generated by the wizard should be refined:

  • Remove unnecessary joins
  • Add required attributes
  • Ensure correct grain of fact tables

8. Calendar and date handling

Date columns should typically be replaced by references to a calendar dimension. This is often done using predefined macros.

Types / Variants

Key strategies

  • Business keys only
  • Surrogate keys (identity)
  • Hash-based keys

Historization strategies

  • SCD2 (valid-from / valid-to)
  • Snapshot-based access
  • Current-state only

Transformation styles

  • Fully generated
  • Adjusted via metadata
  • Extended with custom SQL logic

Example

A generated fact table includes all columns from multiple related tables.

Refinement steps:

  • Remove unnecessary attributes
  • Validate join between Orders and Customers
  • Define surrogate key for dimension tables
  • Replace date columns with calendar dimension references

Example adjustment:

-- Before refinement
SELECT *
FROM stg_orders o
JOIN stg_customer c ON o.customer_id = c.customer_id;

-- After refinement (conceptual)
SELECT 
    o.order_id,
    c.customer_key,
    o.order_date_key,
    o.amount
FROM core_orders o
JOIN dim_customer c ON o.customer_key = c.customer_key;

When to Use / When NOT to Use

Use when

  • After running the wizard
  • Validating generated model structures
  • Aligning model with business logic

Do NOT skip when

  • Working with complex source systems
  • Data quality issues exist
  • Performance requirements are strict

Performance & Design Considerations

  • Reducing column count improves performance
  • Incorrect joins can cause data duplication
  • Historization increases storage and processing cost
  • Hash keys improve scalability but add computation overhead

Design trade-off:

  • Automation speed vs model accuracy
  • Flexibility vs standardization

Integration with other AnalyticsCreator features

  • Wizard: provides initial model
  • Macros: define reusable SQL logic
  • Synchronization: generates SQL from refined metadata
  • Deployment: uses finalized model for artifact creation

Common Pitfalls

  • Leaving generated joins unvalidated
  • Using incorrect business keys
  • Overloading fact tables with unnecessary attributes
  • Ignoring historization impact on performance
  • Mixing business logic directly into SQL instead of metadata

Key Takeaway

The generated model must be refined to ensure correct business logic, keys, and performance before SQL generation and deployment.