August '25 - Fabric GA features

Well, it looks like my predictions as to what Microsoft would do didn't happen. What is normally a quiet release period given summer holidays has been pretty full on, and we got an August release drop as well.

Fabric platform

New view in deployment pipelines

Sadly this is about as much as we get in the CI/CD space for Fabric - and it doesn't address what is currently the weakest part of the platform in my opinion (I've been saying this for a while now).

All this one does is add a new view to make it easier to select components across multiple folders.

Great if you are an SME with a small enough team to make deployment pipelines work for you, but it still doesn't address the complexities that come with enterprise-grade deployments.

API specifications

For those using the Fabric APIs, we now have a GitHub repository that contains all the specifications for the available APIs.
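
If you haven't explored the APIs yet, the calls themselves are plain REST. Here's a minimal sketch (not taken from the repository itself - it assumes you've already obtained an Entra ID access token for the Fabric API scope, for example via MSAL) that lists the workspaces you can see:

```python
import requests

# Assumes you already have an Entra ID access token for the Fabric API scope
# (e.g. via MSAL or azure-identity); the value below is a placeholder.
FABRIC_API = "https://api.fabric.microsoft.com/v1"
token = "<your-access-token>"

resp = requests.get(
    f"{FABRIC_API}/workspaces",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()

for ws in resp.json().get("value", []):
    print(ws["id"], ws["displayName"])
```

Having the full specifications in one place should also make it easier to generate typed clients or validate request payloads, rather than hand-crafting calls like the one above.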

Data Engineering

Auto-scale billing for Spark

I know a number of people haven't been happy with the SKU-based licensing approach to data engineering jobs, and would rather accept less predictable billing in exchange for jobs not being capped (despite bursting and smoothing).

For those, this feature going GA now means they have a choice. For Spark workloads, you are now able to move from being included within the SKU licensing to being billed on a PAYG basis. It also means that production Spark engines can be isolated from being impacted by other workloads being run - provided you have the right spend controls set up within your Azure Landing Zone, otherwise you might end up with an unexpectedly large bill.

Job bursting control for data engineering workloads

This one provides more functionality to limit the impact that bursting has. Within the admin portal at the capacity setting level, you can now set bursting to be:

  • Enabled (Default) - Allows Fabric to burst the processing power up to 3x the paid-for CUs (so, for example, an F64's 64 CUs could momentarily draw up to around 192 CUs) and subsequently smooths out the additional consumption to recoup the costs
  • Disabled - CUs are capped at the base capacity, meaning that whilst processes are more likely to be throttled, there's no burst consumption to smooth out when they are cancelled. This means that we should see fewer capacities locked out by heavy users running a single process
  • Auto-scale billing - Pure PAYG billing for Spark compute only
The associated learn page details some use cases and recommendations. These recommendations lean on the idea that we need to think about the type of workload we want to run and the user group that will be running it.

It seems to be suggesting that we should be considering:
  1. An auto-scaling capacity for automated production Lakehouse/Warehouse workloads
  2. A separate capacity without auto-scaling for interactive user workloads such as Power BI, data science, etc.
  3. Using auto-scale billing when a PAYG Lakehouse is the architectural preference
It's not really a surprise, as it aligns with the best practice I've seen of having at least 3 capacities. That covers:

  1. Dev/test/pre-prod. This is a must. I've seen too many people try to get away with a single capacity and then be shocked when dev work takes down production workloads because the capacity has been fully utilised.
  2. A production capacity. This is focused on regularly running workloads (e.g. loading into a Lakehouse, extracts to semantic models, production models, etc) that don't need human interaction. From the notes above, this is the one that will run with Auto-scaling.
  3. An ad-hoc capacity for interactive workloads. From the above, that means things like Data Science notebooks that are being developed, Power BI reports, etc. This capacity will run without auto-scaling to increase concurrency and stop the capacity being taken down for an extended period by a single user.
All in all, a promising, if potentially expensive, start to resolving the lack of guardrails around capacities. What I'd now like to see is the ability to flag production-critical workloads to prioritise capacity resources, and the ability to predict capacity consumption without executing a job - including stats on predicted capacity utilisation % based on current workloads.

I really want to understand if my job could take down a capacity before I run it - kind of the way that estimated execution plans in SQL Server used to help me identify potential bottlenecks before I ran the code. OK, they won't be perfect, but even an indication is better than not knowing, having to kill the capacity, and taking a potentially big hit on billing that I wasn't expecting.

Enhanced monitoring for Spark high concurrency

If you are running your Spark notebooks in high concurrency mode, we now have improved logging at multiple levels. Ensure you check out this blog for more info.

Refresh SQL analytics endpoint metadata REST API

This one dropped at the end of July, but after the July release notes. Personally it's one that I think is helpful to see. 

Sometimes I've found that metadata changes aren't reflected in the SQL endpoint and you have to wait until they get pulled through. Now we can use the API to trigger a SQL analytics endpoint metadata refresh to speed that process up.
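
To give an idea of what that looks like in practice, here's a rough sketch of calling the refresh from Python. The endpoint path follows the announcement, but the IDs and token are placeholders and it's worth checking the API reference for the exact request shape:

```python
import requests

# Placeholders - substitute your workspace ID, SQL analytics endpoint ID,
# and an Entra ID access token for the Fabric API scope.
FABRIC_API = "https://api.fabric.microsoft.com/v1"
workspace_id = "<workspace-guid>"
sql_endpoint_id = "<sql-analytics-endpoint-guid>"
token = "<your-access-token>"

# Ask Fabric to sync the SQL analytics endpoint's metadata with the Lakehouse,
# rather than waiting for the background sync to pick up the changes.
resp = requests.post(
    f"{FABRIC_API}/workspaces/{workspace_id}/sqlEndpoints/{sql_endpoint_id}/refreshMetadata",
    headers={"Authorization": f"Bearer {token}"},
    timeout=60,
)
resp.raise_for_status()

# This runs as a long-running operation: a 202 response carries a Location
# header that can be polled until the refresh completes.
print(resp.status_code, resp.headers.get("Location"))
```

Handy to drop at the end of a load process so downstream SQL queries see new tables straight away.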

Whilst it's great to have these workarounds, I wish that Microsoft would get on and sort out the underlying issues with the endpoint rather than providing us with APIs to work around them.

Personally, given the differences between the default semantic model and building your own semantic model (e.g. DRLS, etc), I tend to avoid the SQL endpoint most of the time.

Audit log (CRUD operations) naming simplification

For those that have built custom solutions over Fabric audit logs, be aware that Microsoft is moving to camelCase operation names instead of friendly names. Given this has already happened, I suspect most have seen these systems break already, but if not, do make sure you go in and make the changes.
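
If your custom solution aggregates on the old friendly names, one low-effort transition approach is a normalisation layer that maps legacy names onto the new camelCase ones. A rough sketch below - the operation names in the mapping are hypothetical placeholders, so substitute the real before/after names for the events you actually consume:

```python
# Sketch of normalising audit log operation names across the rename.
# NOTE: the names below are hypothetical placeholders, not the real Fabric
# operation names - populate this from Microsoft's published mapping.
LEGACY_TO_CAMEL = {
    "Created dataflow": "createDataflow",
    "Deleted warehouse": "deleteWarehouse",
}

def normalise_operation(operation_name: str) -> str:
    """Return the camelCase operation name, mapping legacy friendly names through."""
    return LEGACY_TO_CAMEL.get(operation_name, operation_name)

# Old and new log entries resolve to the same key, so downstream
# aggregations keep working during the cut-over.
for raw in ["Created dataflow", "createDataflow"]:
    print(normalise_operation(raw))
```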

SHOWPLAN_XML SET statement

We can now view the query plan within SQL Server Management Studio. For those that aren't used to the dark arts of query tuning, it may not seem a big deal. But let me tell you, your DBAs will be breathing a sigh of relief.

The old-school part of me that used to do a lot of tuning with these things is personally glad to see it back. Whilst it's disappointing it has taken this long, given how extensive the overhaul of Synapse was (I heard a Microsoft rep call it the equivalent of open-heart surgery), I get why it wasn't in the MVP.
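
For anyone who wants to pull the estimated plan out programmatically rather than through SSMS, the same SET statement can be driven over an ODBC connection. A minimal sketch, assuming pyodbc, a placeholder warehouse connection string, and made-up table names:

```python
import pyodbc

# Placeholder connection details - swap in your warehouse's SQL connection string.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=<your-warehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# With SHOWPLAN_XML ON, the query is not executed - the engine returns the
# estimated plan as a single XML document instead.
cursor.execute("SET SHOWPLAN_XML ON")
cursor.execute(
    "SELECT c.CustomerKey, SUM(s.SalesAmount) AS TotalSales "  # hypothetical tables
    "FROM dbo.Sales AS s "
    "JOIN dbo.Customer AS c ON c.CustomerKey = s.CustomerKey "
    "GROUP BY c.CustomerKey"
)
plan_xml = cursor.fetchone()[0]
cursor.execute("SET SHOWPLAN_XML OFF")

print(plan_xml[:500])  # eyeball the start of the estimated plan
```

Save the output with a .sqlplan extension and SSMS will render it as a graphical plan.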

Activator and RTI updates

From what I can tell in the logs, it seems like we have a bunch of mostly UI changes for Activator. Personally, I get very few RTI requirements - so if this does impact you, please do jump in and check out the release notes for more info on the updates.

The biggest change that has been announced is hot caches for Eventhouse shortcuts, meaning that data in a specific window or within a certain age can now be flagged for additional performance considerations. The result is that this should help improve performance for those ingesting data from an Eventhouse into a Lakehouse.

Easily manage pipeline triggers

Triggers are now managed through a new UI panel - pretty much mirroring how Synapse used to function.

Data pipelines being renamed

If you aren't used to it by now, it's pretty common for Microsoft to release things and then rename them later. This time it's the turn of data pipelines, which are now known simply as pipelines.

Converting existing Dataflow Gen2 to CI/CD enabled Gen2

The functionality to automatically convert existing Gen2 dataflows to gain CI/CD support is now available, meaning you should go in and use the three-dot menu on all your existing dataflows to convert them.

Dataflow Gen2 integrated validation and run history

Now if you are editing a Gen2 Dataflow (with CI/CD support only), you don't have to leave the Dataflow to see the run history. This is a great update, as the UX when debugging was far from ideal before.

Dataflow Gen2 'Get Data' support in Copilot

For those using Dataflows to reduce the technical entry point, this can be made even easier now by using Copilot to connect to a data source, meaning that the Get Data wizard launches a step or two into the process automatically (depending on the prompt provided).





