Native Regex in SQL Server 2025 and How It Flows Straight Through AnalyticsCreator

Executive Summary
SQL Server 2025 (public preview) introduces native regular expression (regex) functions in T-SQL: REGEXP_LIKE
, REGEXP_REPLACE
, and REGEXP_SUBSTR
. Since AnalyticsCreator generates standard T-SQL scripts for transformations, these functions work transparently—define a column expression with a regex function, compile to your target database, and run.
What’s New in SQL Server 2025
REGEXP_LIKE(string, pattern [, flags])
— Boolean match testREGEXP_REPLACE(string, pattern [, replacement [, start [, occurrence [, flags]]]]])
— Replace matchesREGEXP_SUBSTR(string, pattern [, start [, occurrence [, flags [, group]]]]])
— Extract substrings
Supported flags:
i
— Case-insensitivem
— Multi-line modes
— Dot matches newlinec
— Case-sensitive (default)
Why This Matters for AnalyticsCreator Users
AnalyticsCreator compiles calculated columns, manual SQL, and scripted transformations to native SQL. On SQL Server 2025 or compatible Azure SQL previews, regex works immediately—no special configuration needed.
Example use cases:
- Data cleansing in regular transformations
- Validation filters with
REGEXP_LIKE
- Parsing product codes or attributes before Data Product creation
Minimal Working Example (AdventureWorksLT)
-- Remove dashes from ProductNumber
REGEXP_REPLACE([T1].[ProductNumber], '-', '')
Steps in AnalyticsCreator:
- Add the table to a regular transformation.
- Create a new column (e.g., ProductNumber_NoDash).
- Enter the expression above in the Statement field.
- Compile and create — the generated view will use the regex function.
Additional Test Scenarios
-
Email validation
SELECT EmailAddress FROM SalesLT.Customer WHERE NOT REGEXP_LIKE( EmailAddress, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$', 'i' );
-
Extract first numeric token
SELECT ProductNumber, REGEXP_SUBSTR(ProductNumber, '[0-9]+') AS FirstNumberToken FROM SalesLT.Product;
-
Normalize whitespace
SELECT AddressLine1, REGEXP_REPLACE(AddressLine1, '\s+', ' ', 1, 0, 'm') AS AddressLine1Normalized FROM SalesLT.Address;
-
Case sensitivity flags
-- Case-insensitive SELECT Name AS CategoryName FROM SalesLT.ProductCategory WHERE REGEXP_LIKE(Name, '^hardware$', 'i'); -- Case-sensitive SELECT Name AS CategoryName FROM SalesLT.ProductCategory WHERE REGEXP_LIKE(Name, '^Hardware$', 'c');
-
Replace second dash only
SELECT ProductNumber, REGEXP_REPLACE(ProductNumber, '-', '/', 1, 2) AS ReplaceSecondDash FROM SalesLT.Product;
-
Capture group extraction
SELECT ProductNumber, REGEXP_SUBSTR(ProductNumber, '^([A-Z0-9]+)-([A-Z0-9]+)-([A-Z0-9]+)', 1, 1, '', 2) AS ColorToken FROM SalesLT.Product;
Environment Notes
- Requires SQL Server 2025 (preview) or compatible Azure SQL instance.
- Default regex is case-sensitive—use
i
for case-insensitive mode. - Patterns follow PCRE-like syntax per Microsoft documentation.
Performance Guidance
- Regex can be slower than
LIKE
orPATINDEX
for simple patterns. - Consider persisting results for large datasets.
- Index parsed/cleaned columns if used in joins or filters.
Suggested End-to-End AnalyticsCreator Test Plan
- Test regex functions on SQL Server 2019, 2022, 2025 preview, and Azure SQL.
- Create AnalyticsCreator transformations for each regex function.
- Persist transformations and validate data integrity.
- Test historization with normalized keys.
- Benchmark performance on large datasets.
Closing Thoughts
Native regex support in SQL Server 2025 enables AnalyticsCreator users to integrate powerful pattern matching, cleansing, and parsing directly into transformation pipelines—without additional setup. This opens new opportunities for data quality enforcement and feature extraction directly in the data warehouse layer.