English
AG Automated Azure Big Data Analysis using Data Vault 2.0 by Exxeta
This webinar shows how Exxeta used AnalyticsCreator to implement a Data Vault-based Azure analytics architecture for the New York Taxi dataset. The session covers Azure Blob Storage, Azure Data Factory, Azure SQL Database, Power BI, Data Vault modelling, generated ETL pipelines, tabular model creation, and dashboard visualization.
Questions
- How can AnalyticsCreator automate Data Vault modelling on Azure?
- What Azure services are used in the demo architecture?
- How does Data Vault support large analytical datasets?
- Can AnalyticsCreator generate Azure Data Factory pipelines automatically?
- How is a Power BI tabular model created from AnalyticsCreator?
- How is a Power BI tabular model created from AnalyticsCreator?
Key Takeaways
- AnalyticsCreator supports automated Data Vault modelling.
- The demo uses Azure Blob Storage as the source layer.
- Source files include CSV and Parquet files.
- Azure Data Factory pipelines are generated automatically.
- Azure SQL Database stores the generated data warehouse structures.
- Power BI Services hosts the tabular model and dashboard.
- Data Vault enables scalable and parallel loading for larger datasets.
- The raw vault stores business keys, relationships, and descriptive attributes.
- Data marts and star schemas are still used for BI and reporting.
- AnalyticsCreator generates import tables, views, hash keys, dimensions, facts, and tabular model structures.
- Manual adjustments include defining keys, hiding technical columns, and creating measures.
- Measures can use predefined aggregations or custom DAX statements.
- The final dashboard visualizes New York taxi trips, revenue, trip count, pickup locations, and revenue per distance.
- Collaboration and version management can be supported through repository export, version control, locking, and partial repository export/import.
Transcript
Hello everyone. I will start with a brief introduction to AnalyticsCreator so you have a clear idea of what it does.
AnalyticsCreator is a metadata-driven design application for data warehouse automation. It is designed for experts, but it can also be used by people without deep expert knowledge. Instead of programming everything manually, AnalyticsCreator generates source code from the model.
It supports the full lifecycle of a data warehouse, data lake, and data mart, including design, development, change management, and deployment. It can replace traditional ETL development approaches and help teams deliver results faster and at lower cost.
There are several common use cases for AnalyticsCreator.
The first is a greenfield project, where nothing is in place yet and a new data warehouse needs to be designed and built from scratch. Another common use case is modernisation, where a company already has a data warehouse but wants to improve performance, redesign the architecture, or meet new business and IT requirements.
AnalyticsCreator can also support partial reverse engineering. It cannot fully automate every reverse engineering scenario, but it can help extract and reuse existing code, especially for KPIs, calculations, and loading procedures.
Many of our customers use SAP as a source system and want to move from the SAP BI world to Microsoft. AnalyticsCreator supports this by helping to extract SAP metadata and build data warehouses quickly, either in Azure or on-premise.
Another important use case is moving from an on-premise data warehouse to Azure. We also see more customers asking for real-time or near-real-time data platforms, especially when systems such as Dynamics 365 are already running in Azure.
For partners, AnalyticsCreator can support managed service scenarios across multiple customers. If a partner manages data warehouse environments for several clients, AnalyticsCreator can help organise modelling, change management, and deployment.
Our vision is customer and partner independence. AnalyticsCreator is a pure design-time application, so no runtime component is required. Once the source code has been generated, the solution runs on the customer side without requiring an AnalyticsCreator licence at runtime.
AnalyticsCreator also uses a holistic metadata model that brings together the business model and the technical data model. This is useful when technology changes, for example when moving from on-premise SQL Server to Azure or from Azure SQL to a data lake architecture.
We focus fully on the Microsoft data stack, including SQL Server, Azure, Power BI, and related technologies. At the same time, the repository is open and stored in Microsoft SQL Server, so customers and partners can build their own add-ons or extensions if needed.
Thank you, Peter. My name is Timo, and I work for adesso. We are a mid-sized consulting company based in Germany, and today my colleagues and I will show what we implemented with AnalyticsCreator.
We built a showcase using Microsoft analytics technologies. The solution uses a Data Vault model in an Azure architecture, and at the end we show a Power BI dashboard.
Adesso was founded in 2005 in Germany. We now have more than 1,000 employees across several locations in Germany, Switzerland, and Slovakia. Our Microsoft platform technology team works in data engineering, data science, software development, .NET development, Microsoft 365, and collaboration solutions.
I will give a short introduction to the showcase, explain which technologies we used, and describe the modelling approach. After that, Natalia will walk through the implementation.
Our data engineering team at adesso is building a demo environment to show customers which Azure technologies we work with and which best practices can be used to build a modern data warehouse. We also use this environment to develop our own team skills.
For this showcase, we wanted to show how a classic data warehouse can be built in Azure. The requirement was to pull structured data from Azure Blob Storage, integrate it through Azure Data Factory, store it in Azure SQL Database, and visualise the result in Power BI.
Because we plan to process larger data volumes in the future, we used Data Vault as the modelling technique. To create the ETL process in Azure Data Factory, the data warehouse in Azure SQL Database, and the tabular model in Power BI, we used AnalyticsCreator.
Data Vault is a scalable data warehousing technique. It allows a data warehouse to be built incrementally while reducing the effort needed to add new data sources.
In a Raw Vault, data is stored without modification. Business logic is moved to later layers, which makes the raw data available to other data marts as well.
The main Raw Vault components are hubs, links, satellites, and reference tables. Hubs store unique business keys that are unlikely to change. Links connect hubs and represent relationships or transactions. Satellites store descriptive attributes and metadata, such as where the data came from and when it was valid. Reference tables are used for frequent lookups, such as business descriptions or codes.
Data Vault is not optimised for direct business intelligence reporting, so a star schema is usually built on top of the Raw Vault. Hubs and satellites can become dimensions, while links and link satellites can become facts.
For our use case, we used the New York taxi dataset. We wanted to analyse taxi trips in New York City, including trip volume, trip duration, peak times, pickup locations, and revenue.
The dataset is large, around one terabyte, so it is a good use case for demonstrating a scalable Azure-based architecture.
We used several Azure components. Azure Blob Storage holds the source data, which consists of CSV and Parquet files. Azure Data Factory is used for the ETL process. The data is stored in Azure SQL Database, and the tabular model is created in Power BI. The dashboard is then built on top of that model.
First, we create a connection to Azure Blob Storage. This requires the storage account name and key. Then we create the sources by defining the connector, the file path, and the column structure.
AnalyticsCreator can generate column names and data types automatically. We only need to define which columns should be used as primary keys.
The staging layer is used to load data quickly from the source system with minimal transformation. The import tables and import views are created automatically by the Data Warehouse Wizard. Since we selected a Data Vault model, AnalyticsCreator also creates hash keys for each table automatically.
In the fact table, hash keys are created for each foreign key as well. This is important for the Data Vault model.
The persisted staging layer is used for historization. Each dataset receives Valid From and Valid To columns, which show whether the dataset is current or historical.
The next layer is the core layer. This is where the business layer begins. At this point, we move from the Data Vault model to a star schema. We create views and predefined dimensions for date and time. The main manual task is defining which keys are used to join tables together.
The data mart layer is the tabular model that will be deployed to Power BI. The objects in this layer are generated automatically, but we made some manual adjustments. For example, from one Location dimension, we created separate pickup and drop-off location dimensions. We also decided which columns should be shown or hidden in the tabular model.
Once the model is ready, deployment takes place. During deployment, the database and its structure are created automatically. All objects from the staging layer to the data mart layer are created in the data warehouse.
At first, only the database structure is created, so the tables are empty. The next important step is the ETL process. We used Azure Data Factory pipelines, and these were generated automatically as well.
The main pipeline imports data from the source files into the import tables. After that, historization runs. The fact and dimension processes can run in parallel, which reduces processing time.
The final deployment step is the Power BI tabular model. The dataset is created in Power BI, and we use it to build the dashboard. In the dashboard, we analyse revenue, trip count, trip revenue, revenue per distance, daily patterns, and popular pickup locations in New York.
My conclusion is that AnalyticsCreator is a powerful application and relatively easy to use. Many predefined functions make the work simpler. The main benefit is that the database, ETL pipelines, and tabular model for analytics are created automatically. In my view, this saves time and money.
In the Q&A, we also discussed collaborative work and CI/CD. AnalyticsCreator provides several options for collaboration. The repository is a database containing the full data warehouse definition. It can be stored as a text file and managed in a version control system. Teams can also lock parts of the repository, export parts of a data warehouse, and import them into another repository.
Thank you to the Adesso team for the presentation. If you are considering a Data Vault project, Adesso is a strong AnalyticsCreator partner with a good approach to Data Vault modelling. Thank you for joining us, and I hope to see you again soon.