Microsoft Fabric: A new way to architect semantic models for all data science use cases

Earlier in the year I wrote a LinkedIn post about semantic models. To start this blog, I thought I'd expand on that post and help those moving from an old-school on-premises setup to a modern data platform such as Fabric. For the purposes of this post I'm going to focus on Microsoft shops, but the concepts discussed are equally relevant to all BI platforms.


Over the years, I've seen a number of clients making decisions that, long term, will increase report and KPI proliferation, ultimately leading to the age-old problem of figures differing across reports and execs losing confidence in what they see. To understand how they got here, let's roll back the clock 10 years and look at the Microsoft stack at the time.

It was a time when most organisations hosted their own data platforms and served a lot of reports using SQL Server Reporting Services (SSRS). At that stage, the semantic model wasn't really a concept that was discussed openly. Instead, people typically hosted a data warehouse, maybe built a set of SQL Server Analysis Services (SSAS) cubes, and plugged SSRS either into the SSAS cubes or directly into their SQL Server warehouses. With all the restrictions this setup presented, we'd quite often end up duplicating KPIs across SSAS cubes or SSRS reports, and typically built multiple reports with minimal differences. Job done, but little did we know the problems we were storing up for ourselves.

Today, the result is that we typically see businesses in a mess of hundreds of reports to maintain and KPIs that give different results (or worse, KPIs named the same thing but with different meanings). It's a challenge compounded by lessons not being learnt and solutions still being architected in similar ways today.

But technology has changed, so surely today's solution has changed too? Actually, very little is different. Typically those SSAS cubes have been replaced with Azure Analysis Services (AAS) or Power BI semantic models running in import mode, but both of those technologies are ultimately SSAS hosted in Azure. For a lot of businesses, the approach they take today is exactly the same as it was 10 years ago - the only difference being they've moved everything to Azure.

Along the way we've seen products appear, such as headless BI tools, that aim to solve KPI proliferation by adding a centralised layer containing measure definitions. They sound great on paper. The problem is that they work as an interoperability layer between your BI tool and the underlying warehouse: typically they take the BI query, convert it into a format the warehouse understands, and fire the request down. The result is that you have to invest a lot of DBA time in optimising the underlying warehouse for reporting purposes, and you typically face a trade-off between load and read performance.

As we can see, neither of these options is great. But Microsoft Fabric may have found a way around these challenges. To understand how, we need to borrow some core concepts from the data mesh approach.

Within the data mesh approach, we often consider planes and how they interact: an operational plane containing our transactional systems, a reporting plane for all our BI, and so on. What happens if we apply this concept to the BI/data analysis space? We get three clear planes:

  1. The data plane: This contains our underlying data structures. It could be a monolithic operational data warehouse, or multiple systems exposing data products for reporting purposes.
  2. The semantic plane: This is the layer that contains our semantic models. In the Microsoft stack this would typically be AAS cubes or Power BI semantic models.
  3. The data science plane: These are the Power BI reports, paginated reports, Excel documents, or data science models that we build over the semantic model.


By splitting things into three distinct planes, we can consider how Microsoft Fabric can solve these challenges.

Starting with the data plane, Fabric provides the tools you need to get the data in for Power BI (and potentially further use cases). Be it hosting your Snowflake Iceberg tables, utilising data stored in existing ADLS storage accounts, replicating data from source systems, or providing the tools needed for ELT (extract, load, transform), Fabric gives you the means to land data in the platform from your source systems.

Next, the semantic plane is addressed through golden semantic models. These could be import mode models that combine the semantic and data planes, DirectQuery models that avoid data movement, or Direct Lake mode models that offer the speed of import without copying the data already sitting in Fabric. By producing a golden model, we begin to remove the risk of KPI proliferation.

Lastly, we have Power BI reports, paginated reports, and semantic model integration with notebooks, providing the tools needed to deliver outcomes across the whole data science maturity curve.
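To make the notebook side of this plane concrete, here's a minimal sketch of querying a golden semantic model from a Fabric notebook using the semantic link (SemPy) library. The model name "Sales Golden Model", the "Total Sales" measure, and the group-by columns are hypothetical placeholders, not anything from a real workspace.

```python
# Minimal sketch: querying a golden semantic model from a Fabric notebook
# via the semantic link (SemPy) library available in the Fabric runtime.
# The model name, measure, and columns below are hypothetical placeholders.
import sempy.fabric as fabric

# Discover the semantic models available in the current workspace
print(fabric.list_datasets())

# Evaluate a measure from the golden model, grouped by a couple of columns,
# returning a pandas DataFrame ready for analysis or ML feature preparation
sales_by_year = fabric.evaluate_measure(
    dataset="Sales Golden Model",                     # hypothetical golden model
    measure="Total Sales",                            # hypothetical measure
    groupby_columns=["Date[Year]", "Product[Category]"],
)
print(sales_by_year.head())
```

Because the measure definition lives in the golden model rather than in the notebook, the figure returned here should match what a Power BI report built over the same model would show.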

Having identified and understood how Fabric supports these three planes, let's move on to how we architect them in Fabric.

  1. For the data layer, we would look to either use native hosting or import into Lakehouses, Warehouses, or Power BI semantic models, depending on the original source. Our goal at this layer is to end up with the data stored in the Delta/parquet format that Power BI needs for Direct Lake mode - and with the right partition strategy in place (see the sketch after this list). Why architect for Direct Lake up front? Because doing so keeps every storage mode open for our semantic model; if we don't, we're taking on debt we may need to pay down later should we want or need Direct Lake mode.
  2. For our semantic layer, we utilise Power BI semantic models running in import, DirectQuery, or Direct Lake mode. For large datasets we begin to run into challenges with import mode and need to lean on the XMLA endpoint, and with DirectQuery we face the same challenges as headless BI, meaning Direct Lake is the right option.
  3. Lastly, we use Power BI, Excel, paginated reports, or notebooks to deliver our outcomes. For notebooks, we can write the results back to OneLake so they can be consumed back into our data layer for further use cases (also shown in the sketch below).
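As a hedged illustration of steps 1 and 3, the sketch below lands source data as a partitioned Delta table in a Lakehouse (keeping Direct Lake open as an option) and then writes notebook output back to OneLake. It assumes a Fabric notebook with a default Lakehouse attached, where the `spark` session is pre-initialised; the paths, table and column names are hypothetical, and the V-Order config key is an assumption that may vary by Fabric runtime version.

```python
# Hedged sketch, assuming a Fabric notebook attached to a Lakehouse where the
# `spark` session is pre-initialised. Paths, table and column names are hypothetical.

# V-Order is the parquet write optimisation that benefits Direct Lake reads;
# this config key is an assumption and may differ across Fabric runtime versions.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Step 1 - data layer: land source data as a partitioned Delta (parquet) table,
# choosing the partition strategy up front so Direct Lake remains an option.
raw = spark.read.format("parquet").load("Files/landing/sales/")
(
    raw.write.format("delta")
       .mode("overwrite")
       .partitionBy("OrderYear")      # hypothetical partition column
       .saveAsTable("sales")          # lands in the Lakehouse Tables area
)

# Step 3 - data science layer: write notebook results back to OneLake as another
# Delta table so the data layer can consume them for further use cases.
scored = raw.limit(100)               # placeholder for real model output
scored.write.format("delta").mode("overwrite").saveAsTable("sales_scored")
```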

And the best bit: one bill from one company, and no interoperability challenges to solve between Power BI and the platform beneath it. As you can see, Fabric provides a great way to manage the data we need for Power BI. If Power BI is your tool of choice, have you started a Fabric PoC yet? If not, now's the time.
