The first in the series looks at the key announcements that came out of Build on the 15th May. This doesn't include anything covered by the monthly announcements (stay tuned for further blogs on those across Fabric and Power BI). All of this and more is covered in the new Fabric Roadmap.
At a high level, I'd say Build overall left me feeling a bit flat in terms of Fabric announcements. Don't get me wrong, a lot of the preview features below are great. But that's the problem: they're only in preview, and honestly I can't recommend using them in production whilst that's the case. We've actually had very few GA announcements. Whilst some might say it at least shows the investment in the platform, my argument is that without knowing when those features will GA, most people don't really care about them.
That's a long-winded way of saying we need Microsoft to provide an SLA between a feature entering preview and going GA. That would make it much easier to get excited about these announcements.
With that off my chest and in no particular order, we start with Cosmos DB.
Cosmos DB (Preview)
As I have long expected, Cosmos DB is now a native Fabric item. Whilst we don't have any visibility on when it will go GA, it does show that Microsoft is committed to moving Fabric from an insight platform to a full operational data platform.
With Cosmos DB in OneLake, integrating operational and insight platforms without moving data becomes possible, making it easier than ever to deliver complex outcomes that improve the customer experience.
Whilst those delivering new requirements today will not want to use this just yet, for long-term roadmaps a migration should definitely be considered at a future date.
Digital twin builder (Preview)
The next big feature announced yesterday is digital twin builder, allowing organisations to create representations of entities like parts, machinery, factories, supply chains, customers, and more.
I'm not going to go into the pros and cons of digital twins in this blog, nor the features it brings (for those, have a look at the announcement blog article or this more detailed blog).
For me, this brings an interesting angle. If this feature makes it easier to build digital twins, will the knowledge graphs it ultimately creates be usable by external systems, such as Copilot? If so, this could give us a simpler approach to building out GraphRAG agents (and anything that helps simplify that process is always a good thing).
Definitely one to keep an eye on, and not just for those with traditional digital twin use cases.
OneLake shortcut enhancements
Not one, but two shortcut enhancements dropped yesterday.
The second of the two is the more interesting. This feature allows you to apply AI transformations developed in AI Foundry to a shortcut, meaning the transformation can be applied at the time of selection rather than having to physically duplicate data. Transformations such as summarisation, translation, and document classification can all be applied with ease. Definitely one to keep an eye on.
Chat with your data (Preview)
Of all the announcements, this is the big one from a business side. I've spent many workshops chatting to senior directors at big international brands who are frustrated by how long it takes to get insight, and who, when shown the demo from Build two years ago, got very excited about the idea of chatting with their data.
Now we've seen the original vision being delivered upon. Combine this with support for Fabric data agents, and it is going to help us ensure the results the business is seeing are trustworthy and accurate.
It's definitely one feature to begin testing and building confidence in the results ahead of it going GA.
Native Spark execution engine (GA)
Our first GA announcement! As of the 19th, the Native Execution Engine is now GA. Built upon Apache Gluten and Velox, switching over to this engine apparently needs no code changes or new libraries (yep, I'm suspicious too). Once enabled, Microsoft are saying queries are roughly 4x faster on 1TB data sets, with 6x performance gains on end-to-end jobs. Ultimately, that means the same job should cost you less than it would without it.
It definitely needs thorough testing, as I can't believe it has no potential negatives (e.g. is the delta write acceleration offset by an increase in read duration?). But it's at least a feature that I can recommend my clients put in their backlogs to investigate.
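If you want to kick the tyres, the docs describe enabling it via the environment's Spark properties (the spark.native.enabled property referenced below is the name documented at the time of writing, so do verify it for your tenant). Here's a minimal notebook sketch for checking whether the engine has actually kicked in; the table name is made up.

```python
# Minimal sketch for a Fabric notebook session that was started with the
# native execution engine turned on (e.g. spark.native.enabled = "true" in
# the environment's Spark properties - verify the property name in the docs).
# "sales" is a hypothetical Delta table in the attached lakehouse.
df = spark.read.table("sales")

# Confirm the property made it into the session config.
print(spark.conf.get("spark.native.enabled", "not set"))

# Inspect the physical plan: with the engine active you should see
# Gluten/Velox operator names in place of the usual Spark ones.
df.groupBy("region").count().explain(mode="formatted")
```

If the plan still shows the vanilla Spark operators, the query has fallen back to the standard engine, which is exactly the sort of thing worth catching during that backlog investigation.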
Mirroring improvements (Preview + GA)
We've had an absolute bucketload of improvements that Microsoft have rolled out when it comes to mirroring. Rather than go through them one by one, if you have a use case do go check out the posts below:
Dataflow Gen2 CI/CD and Git improvements (GA)
Pretty much as it says on the tin: these features are now GA. Personally, I avoid dataflows if I can because of the limitations/challenges and the low/no-code tax.
But they do have their uses (e.g. SharePoint integration is a lot simpler), so it's good to see this box ticked.
Encryption at rest with customer-managed keys (Preview)
Whilst we've had encryption at rest since the early days of the platform, those keys have been managed by Microsoft.
With this feature, we can set the keys the platform uses for encryption at workspace level and, with a 30 minute lag, revoke access by revoking the key. All whilst integrating with RBAC to still define more granular permissions.
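For a feel of the moving parts, the key itself lives in Azure Key Vault; something like the sketch below creates a key you could then reference from the workspace's encryption settings. The vault and key names are made up, and the workspace-side assignment is done through the workspace settings rather than code, so treat this as illustrative only.

```python
# Sketch of the Key Vault side of customer-managed keys. The vault and key
# names are hypothetical; pointing a Fabric workspace at the key is done in
# the workspace encryption settings, which isn't shown here.
from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient

credential = DefaultAzureCredential()
key_client = KeyClient(
    vault_url="https://my-fabric-cmk-vault.vault.azure.net",  # hypothetical vault
    credential=credential,
)

# Create (or rotate) the key the workspace will encrypt with.
key = key_client.create_rsa_key("fabric-workspace-cmk", size=2048)
print(key.id)  # the key identifier referenced from the workspace settings

# Revoking access is then a Key Vault operation - e.g. disabling the key -
# which, per the announcement, takes effect in Fabric within roughly 30 minutes.
# key_client.update_key_properties(key.name, enabled=False)
```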
Warehouse snapshots (Preview)
Pretty much what you'd expect this feature to do: it takes a point-in-time snapshot of the warehouse from any point in the last 30 days (via UI or API). For me, the biggest use case is model development, when you might want the data set to not change whilst you are building and testing your model, or debugging that report when an exec asks, "why has this number changed?".
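As a rough illustration of that debugging scenario (the server, database and table names below are placeholders, and it assumes a snapshot already exists with its own SQL connection), you can run the same aggregate against the live warehouse and the snapshot and diff the two:

```python
# Rough sketch of the debugging use case: run the same aggregate against the
# live warehouse and a snapshot of it, then compare. Server/database names
# are placeholders - use the SQL connection details shown in the Fabric UI.
import pyodbc

QUERY = "SELECT SUM(sales_amount) AS total FROM dbo.fact_sales"  # hypothetical table

def total_for(database: str) -> float:
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<workspace-sql-endpoint>.datawarehouse.fabric.microsoft.com;"
        f"DATABASE={database};"
        "Authentication=ActiveDirectoryInteractive;"
    )
    try:
        return conn.cursor().execute(QUERY).fetchone().total
    finally:
        conn.close()

live = total_for("SalesWarehouse")           # the live warehouse
snapshot = total_for("SalesWarehouse_0501")  # hypothetical snapshot name
print(f"live={live}, snapshot={snapshot}, drift={live - snapshot}")
```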
It's definitely a welcome addition, and one that would have saved me a lot of effort over my career. Just a shame it's in preview again...
Materialised Lake views (Almost, but not quite, in preview)
Despite my sarcasm because it's not even in preview yet, this one is actually a pretty important feature. Anything that reduces the maintenance overhead of a typical medallion architecture is always a good thing.
Before looking at them in more detail, we need to understand what they are. The easiest way to think of them is as an analogy to the historic materialised view in a warehouse, but for a Lakehouse.
They're created through a Lakehouse notebook, and only available when the cell is defined in SQL. Fabric then uses the lineage to work out a schedule and ensure, even when views are chained, that changes flow through the materialised views when source tables are updated.
The big step forward with these is that data quality rules can be added as part of the implementation by declaring them as constraints that are checked on each refresh, with a data quality report being automatically generated.
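Based on what has been shown so far, a definition might look something like the SQL cell below. The syntax is pre-preview and every name here (schema, table, columns, constraint) is made up, so expect the details to change once it actually lands:

```sql
-- Illustrative only: pre-preview syntax, hypothetical names.
-- Defined in a Lakehouse notebook SQL cell.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.customers_cleaned
(
    -- Rows failing the rule are dropped and surfaced in the generated
    -- data quality report.
    CONSTRAINT valid_email CHECK (email IS NOT NULL) ON MISMATCH DROP
)
AS
SELECT customer_id,
       TRIM(LOWER(email)) AS email,
       country
FROM bronze.customers;
```

Because Fabric works out the refresh order from lineage, a chain of these (bronze to silver to gold) should refresh in dependency order without you hand-building the orchestration.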
For me, this is a great feature that plugs a definite gap to date. The sooner we can get hands-on, the better.
E2E network security (Preview)
- Securing inbound traffic via Azure Private Link support for workspaces.
- Securing outbound traffic from Spark pools, ensuring connections can only be made to specific data sources outside Fabric
- The previously mentioned encryption at rest with customer-managed keys
For me, the best one is Private Link support at workspace level. It now means you can hide parts of a solution away from the public-facing access point (e.g. customer data), whilst leaving others (e.g. Power BI workspaces) public facing.
GraphQL mutations (Preview)
I'm not going to go into this one in much detail, but if you are using Fabric databases with the GraphQL API, do make sure you check it out.
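That said, for a quick flavour: a mutation is just a POST to the API's GraphQL endpoint. Everything below (endpoint URL, mutation name, fields) is hypothetical, since the real mutations are generated from whatever objects you expose through the API:

```python
# Illustrative only: endpoint, mutation name and fields are all hypothetical -
# the actual mutations are generated from the objects exposed by your API.
import requests

ENDPOINT = "<your-fabric-graphql-api-endpoint>"  # copied from the API item in Fabric
TOKEN = "<entra-access-token>"                   # e.g. obtained via MSAL / azure-identity

mutation = """
mutation {
  createProduct(item: { name: "Widget", price: 9.99 }) {
    result
  }
}
"""

resp = requests.post(
    ENDPOINT,
    json={"query": mutation},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.json())
```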
Copilot inline code completion in Notebooks (Preview)
To use this feature, you'll need to enable Copilot completions in the notebook using the toggle at the bottom of the screen.
Microsoft are saying it's similar to GitHub Copilot and, being trained on millions of lines of code, the model should return relevant suggestions. Personally, I'm a bit sceptical: my experience with these sorts of technologies is that they return code that works (occasionally), but it isn't as performant as it could be. Again, that means it's probably suitable for junior tasks rather than senior ones - but my concern is how the next generation of programmers gets to the same level of understanding as today's seniors.
Again, just because we can use AI doesn't always mean we should - it needs to be with purpose and to support human tasks rather than replace them.