Historization with AnalyticsCreator

Historization in AnalyticsCreator is applied after source import and before downstream analytical modeling. Source data is first loaded into staging, then written into a persistent staging or historization layer, and only then used in CORE and datamart transformations.

Historization in AnalyticsCreator

Table historization

Stores changing records with validity periods and surrogate keys. This page focuses on that pattern.

Column historization

Individual columns can use full history, equal-only behavior, or no change tracking.

Transformation historization

Downstream transformations can consume historized data as current-state, snapshot-based, or full historical output.

Join historization

Historized joins can apply different validity rules when combining time-dependent structures.

Purpose

Explain how AnalyticsCreator stores and processes changing data over time so that previous states remain available for analysis and downstream processing.

Design Principle

AnalyticsCreator treats historization as a warehouse-layer concern, not a reporting-layer concern.

  • Source data is imported first
  • Historization is applied in persistent staging
  • Downstream transformations consume historized data

This design separates source extraction from change tracking and allows historical states to be reused across multiple downstream transformations.

Inputs / Outputs

Inputs

  • Imported source table or source query in staging
  • Key definition used to identify records across loads
  • Column-level historization settings
  • Missing-source behavior
  • Optional filters and variables

Outputs

  • Historized table in persistent staging
  • Valid-from column
  • Valid-to column
  • Generated surrogate key
  • Generated historization stored procedure
  • Optional snapshot-aware downstream transformations

Internal Mechanics

1. Import first, historize second

AnalyticsCreator first imports source data into a staging table. Historization is executed after the import step, not during initial extraction.

2. Historized table structure

A historized table typically contains:

  • Business key or source key
  • Tracked attributes
  • Valid-from field
  • Valid-to field
  • Surrogate key field

This is the standard SCD Type 2 structure used to preserve previous states of a record.

3. Column-level historization behavior

Historization is configurable per column. AnalyticsCreator supports different behaviors per attribute:

  • Full history – changes create new historical versions
  • Equal only / SCD1-style – current value is updated without creating a history row
  • None – the column is ignored for change tracking

This allows mixed historization strategies within the same object.

4. Missing-source behavior

AnalyticsCreator allows explicit handling of records that disappear from the source:

  • Close the current record by setting the end of validity
  • Leave the current record open
  • Optionally insert an empty record to avoid timeline gaps

This is important when rows can temporarily disappear and reappear later.

5. Generated historization procedure

Historization logic is generated as a stored procedure. The procedure is specific to the object configuration and can be reviewed or extended if required.

6. Downstream consumption

Downstream transformations can consume historized data in different ways:

  • Actual only – only the currently valid row is used
  • Snapshot – rows are selected based on a snapshot date between valid-from and valid-to
  • Full historical – all historical states remain available

Types / Variants

Column-level change tracking

  • Full history
  • Equal only
  • None

Missing-source behavior

  • Close validity
  • Keep open
  • Add empty record

Consumption behavior

  • Current-state only
  • Snapshot-based
  • Full historical access

Example

Assume a staging table contains customer data:

stg_customer
(
    customer_id,
    name,
    city
)

A historized table generated from it may look like this:

pst_customer_history
(
    sats_id       bigint,
    customer_id   int,
    name          nvarchar(100),
    city          nvarchar(100),
    date_from     datetime,
    date_to       datetime
)

A representative SCD Type 2 pattern is:

-- close current row when tracked attributes changed
UPDATE tgt
SET date_to = @load_ts
FROM pst_customer_history tgt
JOIN stg_customer src
  ON tgt.customer_id = src.customer_id
 AND tgt.date_to IS NULL
WHERE
    ISNULL(tgt.name, '') <> ISNULL(src.name, '')
 OR ISNULL(tgt.city, '') <> ISNULL(src.city, '');

-- insert new current row
INSERT INTO pst_customer_history
(
    customer_id,
    name,
    city,
    date_from,
    date_to
)
SELECT
    src.customer_id,
    src.name,
    src.city,
    @load_ts,
    NULL
FROM stg_customer src
LEFT JOIN pst_customer_history tgt
  ON tgt.customer_id = src.customer_id
 AND tgt.date_to IS NULL
WHERE
    tgt.customer_id IS NULL
 OR ISNULL(tgt.name, '') <> ISNULL(src.name, '')
 OR ISNULL(tgt.city, '') <> ISNULL(src.city, '');

This pattern shows the core behavior: current rows are closed when tracked attributes change, and a new current row is inserted.

When to Use / When NOT to Use

Use when

  • Attribute changes must remain traceable over time
  • Source systems overwrite current values
  • Point-in-time analysis is required
  • Previous business states must remain queryable

Do NOT use when

  • Only the latest state matters
  • The source already provides reliable historized data and duplicate historization is unnecessary
  • Storage growth from history rows is not acceptable

Performance & Design Considerations

  • Historization increases write volume and storage usage
  • Over-tracking noisy columns creates unnecessary row churn
  • Wrong missing-source settings can produce incorrect history timelines
  • Snapshot-based downstream joins are more expensive than current-state joins

Design trade-off:

  • Full history gives maximum traceability
  • Selective tracking improves storage and runtime efficiency

Integration with other AnalyticsCreator features

  • Import packages and pipelines provide the staging input for historization
  • Transformation historization types define how historized rows are consumed downstream
  • Macros can support key generation in hybrid and Data Vault scenarios
  • Persisting can materialize downstream historized views for performance
  • Datamarts and semantic models can consume current-state or snapshot-based outputs

Common Pitfalls

  • Tracking all columns as SCD Type 2 even when some should be SCD1 or ignored
  • Not defining missing-source behavior
  • Using current-state-only logic when point-in-time analysis is required
  • Overusing snapshot logic in scenarios where current-state output is sufficient
  • Changing generated SQL directly instead of fixing metadata configuration

Metadata Representation

Historization metadata in AnalyticsCreator typically includes:

  • Source object reference
  • Key definition
  • Per-column historization mode
  • Missing-source behavior
  • Optional filters and variables
  • Downstream transformation historization mode

These settings are stored in the repository and used to generate historization logic and dependent transformations.

Deployment Behavior

Build

  • Reads metadata from the repository
  • Generates historized tables
  • Generates historization procedures
  • Builds downstream transformations

Deploy

  • Deploys historized table structures
  • Deploys generated procedures
  • Deploys orchestration assets

Refresh / Execute

  • Imports source data into staging
  • Executes historization logic
  • Refreshes downstream CORE and datamart structures

Key Design Principle

  • What is generated? Historized tables, surrogate keys, validity columns, and historization procedures
  • When is it executed? After source import and before downstream transformation processing
  • Where is it stored? In the persistent staging or historization layer
  • How does it scale? Through selective column tracking, explicit missing-source handling, filtered loads, and downstream consumption control

Key Takeaway

AnalyticsCreator implements SCD Type 2 historization as a configurable warehouse-layer service that preserves previous states through validity windows, surrogate keys, and generated historization procedures.