English
AnalyticsCreator Congress 2022 | Modernization of a Data Warehouse | by Kulmbacher Brauerei AG
This webinar shows how Kulmbacher’s data warehouse team generates AnalyticsCreator repository objects automatically from existing SQL Server metadata. The session demonstrates how they create connectors, source layers, staging, persisted staging, core layers, raw SAP views, friendly names, row-level security logic, and deployment packages through SQL scripts instead of manually designing thousands of repository objects.
Questions
- How can AnalyticsCreator repository objects be generated automatically?
- How can SQL Server metadata be used to create AnalyticsCreator objects?
- How does Kulmbacher handle SAP metadata in its data warehouse?
- How are staging and persisted staging layers generated?
- How can friendly names and row-level security be added automatically?
- How are SSIS packages and deployments generated from metadata?
Key Takeaways
- Kulmbacher uses SQL scripts to generate AnalyticsCreator repository objects from existing metadata.
- The approach avoids manually designing thousands of repository objects.
- The team has a SQL Server-based data warehouse with many SAP ECC source tables.
- Around 80 percent of tables in the data warehouse are automatically generated from SAP metadata.
- The existing SQL and SAP mirror solution provides raw SAP data and historization.
- AnalyticsCreator can be extended through direct repository generation.
- The generated repository includes connectors, sources, staging, persisted staging, core layers, and deployment definitions.
- Friendly names are used to standardize SAP column naming for business users.
- Row-level security expressions can be generated from metadata.
- Persisting procedures and SSIS package structures can be generated automatically.
Transcript
We are from Kulmbacher Brewery. I am here with Alexander Zeidler, and our goal is to avoid designing thousands of AnalyticsCreator repository objects manually.
To do this, we developed SQL scripts that generate objects directly in the AnalyticsCreator repository database, using our own data warehouse metadata.
Our IT data warehouse team is small. It consists of myself, one colleague, and external support from specialists such as Alexander.
A lot of knowledge sits outside IT, especially in the business departments. Many colleagues understand the data very well, can design cubes, and can write queries. Our challenge is to translate those business needs into a technically sound and maintainable data warehouse structure.
My background is in computer science, with around 20 years of experience in Microsoft SQL Server, C#, SQL, backend technologies, and SSIS. Over the last 12 years, I have mainly worked with SQL Server Integration Services. We also use our own templating framework, called DSQLT, and have developed software to support communication between SQL Server and SAP ECC.
I am Alexander Zeidler, a developer at Oberpfalz Business Intelligence, a small BI consultancy in Upper Franconia.
We specialise in the Microsoft BI stack, so we are familiar with the backend side of AnalyticsCreator. I work as a senior developer and consultant, mainly with Microsoft BI solutions, T-SQL, stored procedures, tables, and small frameworks for different customers.
Our data warehouse consists of two main parts.
The first is a relational SQL Server database that provides ERP data, mostly SAP ERP data, to non-IT departments for reporting and to other systems for programmatic use.
Around 80 percent of the tables in this database are generated automatically from SAP metadata. This includes staging, basic, and core layers, as well as the procedures that move data through the layers. More complex logic is still written manually, but it is executed automatically within the process.
SAP contains many tables and columns, but our data warehouse only includes the columns that are actually needed.
We also use friendly names to make the data easier to understand. For example, the same business concept may have different column names in SAP, but in the data warehouse it should always have one consistent name, such as “item number”.
The second main part is our SQL and SAP mirror.
This is a self-developed SQL Server-based solution that mirrors SAP data and provides raw SAP data to other systems. It includes historization by detecting changes through timestamps and identifying deleted records. We also use queries that provide only the latest changed data incrementally.
To load data from SAP into the mirror, we use several approaches. The main approach is automatically generated SSIS packages using the Theobald connector.
For some large transactional tables, such as BSEG and BKPF, we also use ABAP programs. This is necessary because the timestamp required to detect changes is stored in the header table, so the join is more efficient inside SAP than outside it.
We want to evolve our data warehouse because complexity is increasing, our own solution requires a lot of development time, and we do not have a user interface for configuration and design tasks.
After evaluating several tools, we found AnalyticsCreator to be the best fit. It can run on-premise, is cloud-ready if we decide to move later, avoids vendor lock-in, and fits well with our Microsoft BI skills.
We are still in the trial phase, but we believe we are ready to move our development to AnalyticsCreator.
We identified two main evaluation scenarios.
The first is generating most of the AnalyticsCreator metadata repository automatically through programming. The second is potentially replacing our SAP mirror or existing data warehouse database solution.
This presentation focuses on the first scenario.
For the second scenario, we would need to load more than 1,000 SAP tables each night.
Today, the mirror handles this. In the future, AnalyticsCreator could potentially take over this process once we have more experience with SAP extraction performance.
One advantage is the AnalyticsCreator team’s strong knowledge of the Theobald connector, as well as the flexibility to configure SSIS packages and potentially integrate our own ABAP programs into generated workflows.
One remaining challenge is what we call “speed tasks”.
We have many small SAP tables with fewer than 100 records. SSIS introduces overhead when opening a connection, loading a small amount of data, and closing the connection.
In one case, loading 40,000 records from around 1,000 tables took three hours. Our C# program loaded the same data in about two minutes. This is important, but not the focus of today’s demo.
The focus today is how we use our own metadata to generate AnalyticsCreator repository objects automatically.
We already have metadata in our own database, and we transformed it to fit the AnalyticsCreator repository requirements.
The generation logic is written in T-SQL and stored in a database called AC Repository Generator. It contains schemas for different tasks, including repository generation, metadata access, system logging, configuration, and DSQLT templating. Connections to other databases are encapsulated through synonyms.
For the demo, we use only a subset of the full data warehouse because generating everything would take around 10 minutes. We selected all tables related to the SAP material data model, resulting in around 100 tables.
The metadata includes SAP tables such as MARA, descriptions, source groups, staging import tables, SSIS package folders, package variables, transfer grouping, persisting modes, friendly names, row-level security expressions, filters, AnalyticsCreator macros, and workflow grouping for nightly processing.
At column level, we store data types, descriptions, friendly names, generation flags, and joins.
The process starts by creating an empty AnalyticsCreator repository database, which initially contains only predefined macros.
We then back up the metadata database and restore it with a new name, giving us a new repository that AnalyticsCreator can connect to.
Next, we create the macros and schemas needed to organise the model. We then generate the first layer, including connectors, sources, columns, references between source tables, and the columns needed for those references.
We then generate the staging layer, including tables, columns, transformations, packages, variables, and references between tables and sources.
After that, we create the persisting layer. This requires synchronising the repository database with a structural copy of the data warehouse so AnalyticsCreator can collect additional information such as source column types and dependencies.
Once the persisting layer is created, we define the transformations from staging to persisting and set the incremental mode.
For Active Directory data, for example, we use a swap-full approach. For SAP tables such as MARA, we use incremental insert-only because our SAP mirror can provide only new, changed, or deleted data.
For each persisting table, we generate a pre-script that is executed before the persisting procedure.
AnalyticsCreator supports historization, but we chose to use our own persistence mechanism because our SAP mirror already provides historical data. This means we can use AnalyticsCreator’s generated procedures without being forced to use the built-in historization approach.
Next, we create layers for user access.
The raw layer exposes the SAP structure as it is, which is useful for power users and business consultants who know SAP well.
The core layer uses friendly names, such as “material number”, and can also include row-level security based on organisational structures such as house organisation or sales organisation.
Finally, we create the deployment definition. This defines which packages are called by which workflow.
The main deployment generates all packages, while additional deployments can call only selected parts of the process. We also generate scripts to determine the maximum transfer ID for incremental loading and set package variables in the SSIS configuration tables.
The generated packages cover data import, persistence, and transformations.
What we show here is only a small part of our data warehouse because we restricted the demo to tables related to material data. However, the same approach applies to the full environment.
This approach allows us to create an AnalyticsCreator repository using code instead of manually creating thousands of objects. For a repository with thousands of tables and many related objects, manual creation would take weeks.
With this method, we can generate the repository much faster, more consistently, and in a repeatable way.