AnalyticsCreator | Blog and Insights

Native Regex in SQL Server 2025 and How It Flows Straight Through AnalyticsCreator

Written by Gustavo Leo | Aug 14, 2025 6:15:58 AM

Executive Summary

SQL Server 2025 (public preview) introduces native regular expression (regex) functions in T-SQL: REGEXP_LIKE, REGEXP_REPLACE, and REGEXP_SUBSTR. Since AnalyticsCreator generates standard T-SQL scripts for transformations, these functions work transparently—define a column expression with a regex function, compile to your target database, and run.

What’s New in SQL Server 2025

  • REGEXP_LIKE(string, pattern [, flags]) — Boolean match test
  • REGEXP_REPLACE(string, pattern [, replacement [, start [, occurrence [, flags]]]]]) — Replace matches
  • REGEXP_SUBSTR(string, pattern [, start [, occurrence [, flags [, group]]]]]) — Extract substrings

Supported flags:

  • i — Case-insensitive
  • m — Multi-line mode
  • s — Dot matches newline
  • c — Case-sensitive (default)

Why This Matters for AnalyticsCreator Users

AnalyticsCreator compiles calculated columns, manual SQL, and scripted transformations to native SQL. On SQL Server 2025 or compatible Azure SQL previews, regex works immediately—no special configuration needed.

Example use cases:

  • Data cleansing in regular transformations
  • Validation filters with REGEXP_LIKE
  • Parsing product codes or attributes before Data Product creation

Minimal Working Example (AdventureWorksLT)

-- Remove dashes from ProductNumber
REGEXP_REPLACE([T1].[ProductNumber], '-', '')

Steps in AnalyticsCreator:

  1. Add the table to a regular transformation.
  2. Create a new column (e.g., ProductNumber_NoDash).
  3. Enter the expression above in the Statement field.
  4. Compile and create — the generated view will use the regex function.

Additional Test Scenarios

  1. Email validation

    SELECT EmailAddress
    FROM SalesLT.Customer
    WHERE NOT REGEXP_LIKE(
        EmailAddress, 
        '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$', 
        'i'
    );
    
  2. Extract first numeric token

    SELECT ProductNumber,
           REGEXP_SUBSTR(ProductNumber, '[0-9]+') AS FirstNumberToken
    FROM SalesLT.Product;
    
  3. Normalize whitespace

    SELECT AddressLine1,
           REGEXP_REPLACE(AddressLine1, '\s+', ' ', 1, 0, 'm') AS AddressLine1Normalized
    FROM SalesLT.Address;
    
  4. Case sensitivity flags

    -- Case-insensitive
    SELECT Name AS CategoryName
    FROM SalesLT.ProductCategory
    WHERE REGEXP_LIKE(Name, '^hardware$', 'i');
    
    -- Case-sensitive
    SELECT Name AS CategoryName
    FROM SalesLT.ProductCategory
    WHERE REGEXP_LIKE(Name, '^Hardware$', 'c');
    
  5. Replace second dash only

    SELECT ProductNumber,
           REGEXP_REPLACE(ProductNumber, '-', '/', 1, 2) AS ReplaceSecondDash
    FROM SalesLT.Product;
    
  6. Capture group extraction

    SELECT ProductNumber,
           REGEXP_SUBSTR(ProductNumber, '^([A-Z0-9]+)-([A-Z0-9]+)-([A-Z0-9]+)', 1, 1, '', 2) AS ColorToken
    FROM SalesLT.Product;
    

Environment Notes

  • Requires SQL Server 2025 (preview) or compatible Azure SQL instance.
  • Default regex is case-sensitive—use i for case-insensitive mode.
  • Patterns follow PCRE-like syntax per Microsoft documentation.

Performance Guidance

  • Regex can be slower than LIKE or PATINDEX for simple patterns.
  • Consider persisting results for large datasets.
  • Index parsed/cleaned columns if used in joins or filters.

Suggested End-to-End AnalyticsCreator Test Plan

  1. Test regex functions on SQL Server 2019, 2022, 2025 preview, and Azure SQL.
  2. Create AnalyticsCreator transformations for each regex function.
  3. Persist transformations and validate data integrity.
  4. Test historization with normalized keys.
  5. Benchmark performance on large datasets.

Closing Thoughts

Native regex support in SQL Server 2025 enables AnalyticsCreator users to integrate powerful pattern matching, cleansing, and parsing directly into transformation pipelines—without additional setup. This opens new opportunities for data quality enforcement and feature extraction directly in the data warehouse layer.