Fabric Mirroring, replacing the E from ETL

What is Fabric Mirroring

Fabric Mirroring

Fabric Mirroring is a data replication solution that offers an out-of-the-box CDC implementation for various sources and consolidates this in a unified data lake. More concretely, Azure SQL Database, Azure Cosmos DB, and Snowflake can be 'mirrored' to your OneLake datalake with Fabric managing the replication and conversion. Both data and schema are mirrored and stored in Delta format in your OneLake. When initiating, Fabric will create an initial snapshot of your data and keep it nearly real-time up to date using transaction logs. This approach simplifies your ETL/ELT processes by centralizing data and making cross-database queries easier. Data engineers and data scientists can access this unified data through their preferred tools, whether Spark, SQL, or Notebooks.

Furthermore, Mirroring is designed with Power BI in mind, particularly its Direct Lake Mode. As data volumes grow, importing into Power BI can become time-consuming, and using Direct Query can slow down reports. Direct Lake Mode addresses these challenges by loading parquet-formatted files directly from a data lake, eliminating the need for querying or duplicating data into a Power BI model. By using Mirroring, your data from Cosmos DB, Snowflake, or Azure SQL Database resides in OneLake in Delta format, enabling the use of Direct Lake Mode in your reporting.

Image
Image1-DirectLakeMode

Image1-DirectLake

Difference with Fabric Shortcuts

A Fabric Shortcut is a mechanism that provides seamless access to external data sources without the need to replicate or move the data into Fabric's OneLake. Instead of duplicating data, shortcuts create a virtual link that allows Fabric to access and work with data stored in external systems like Azure Data Lake Storage, Dataverse, Google Cloud Storage buckets or Amazon S3 directly within the Fabric environment. This approach differs from Mirroring, which involves copying data into OneLake for use within Fabric.

Mirroring guarantees that data is fully replicated and continuously updated within Fabric, supporting high-performance, low-latency analytics. In contrast, Fabric Shortcuts provide a more efficient, on-demand access method that eliminates the need for data duplication. Consequently, operations on large remote datasets can be slower when accessed through a shortcut. These two methods complement each other within the Fabric ecosystem: Mirroring is ideal for scenarios requiring fast, frequent analytics on replicated data, while Shortcuts are perfect for quickly accessing or analyzing external data without the complexity of replication. Additionally, there is currently no overlap between the sources available for Shortcuts and Mirroring, avoiding any potential conflict.

Image
Table 1

Microsoft Fabric Strategic aim

These two approaches enable you to use Fabric as a virtual lake, creating a centralized hub for data management. By leveraging these tools together, you significantly reduce the need for your data team to venture outside of Fabric for analysis, streamlining workflows and improving efficiency.

In the Microsoft Paper 'Open Lakes, Not Walled Gardens' Raghu Ramakrishnan and Josh Caplan mention that the aim is to support mirroring of all Microsoft data sources—including SQL Server versions 2017 and up (on-prem and in VMs), Azure SQL DB, Cosmos DB, MySQL and PostgresSQL—to OneLake, as well as mirroring of a growing number of external sources, including MongoDB, Oracle, Teradata, BigQuery, RedShift and Snowflake. They are also working on an extensible approach for any DW / DB vendor to add their data warehouse/database as a mirrored source to Fabric. Lastly, it is stated that the goal is to support mirroring of OneLake data for customers who wish to run other engines over this data, giving customers the ability to choose their analytics engine.

How to get started

Enable in your tenant

First, Mirroring requires Power BI Premium, Fabric Capacity, or Trial Capacity licensing.

Mirroring is currently in preview and default disabled in your Fabric Tenant. To enable, your admin tenant will have to go to Admin Portal > Tenant Settings > Microsoft Fabric > Users can create and use Mirrored items > ENABLED

Image
Image2-FabricTenantSettings

Image2-FabricTenantSettings

After this create a workspace with Fabric Capacity or premium workspace, this will allow you to create one of the below items in your workspace.

Image
Image3-FabricMirroringOptions

Image3-FabricMirroringOptions

Azure SQL Database

Image
	Image4-FabricMirroringSqlDatabase

Image4-FabricMirroringSqlDatabase

Requirements

Azure SQL Database settings

Enable System Assigned Managed Identity (SAMI) of your Azure SQL logical server. Go to the Azure Portal and locate the Azure SQL Server resource that you want to mirror.

SQL Server > Security > Identity > System assigned managed Identity > Status > ON

Image
Image5-SqlServerIdentityDemo

Image5-SqlServerIdentityDemo

The source SQL server needs to Allow public network access and Allow Azure services to connect.

Image
Image6-SqlSeverNetworkingDemo

Image6-SqlSeverNetworkingDemo

Image
Image7-SqlServerNameDemo

Image7-SqlServerNameDemo

Image
mage8-SqlServerMirroringFabricSettings

Image8-SqlServerMirroringFabricSetting

Azure SQL Limitations

Currently, outside general mirroring limitations, we note the most important limitations specific to Azure SQL.

  • Database level limitations
    • Fabric Mirroring does not support Free, Basic or Standard tier Single Databases (S0, S1, S2) and databases in Elastic pools with max eDTU < 100 or vCore < 1.
    • Azure SQL Database cannot be mirrored if the database has: enabled Change Data Capture (CDC), Azure Synapse Link for SQL, or the database is already mirrored in another Fabric workspace.
  • Permissions in the source database
    • Row-level security is not currently supported for Azure SQL Database configured for mirroring to Fabric OneLake.
    • Object-level permissions, for example granting permissions to certain columns, are not currently propagated from the source SQL database into Fabric.
    • Dynamic data masking settings are not currently propagated from the source SQL database into Fabric.

Network and connectivity security

Below, we note down the most important network and connectivity requirements to set up mirroring.

  • The source SQL server needs to enable Allow public network access and Allow Azure services to connect.
  • The System Assigned Managed Identity (SAMI) of the Azure SQL logical server needs to be enabled and must be the primary identity.
  • The Azure SQL Database service principal name (SPN) contributor permissions should not be removed from the Fabric mirrored database item.
  • Mirroring across Microsoft Entra tenants is not supported where an Azure SQL Database and the Fabric workspace are in separate tenants. 
  • Microsoft Purview Information Protection/sensitivity labels defined in Azure SQL Database are not cascaded and mirrored to Fabric OneLake.
  • Currently, Mirroring doesn't support Azure SQL Database logical servers behind an Azure Virtual Network or private networking. If you have your Azure SQL logical server behind a private network, you can't enable Azure SQL Database mirroring. Currently, you must do one of the following:
    • Update your Azure SQL logical server firewall rules to Allow public network access. You can perform this change via the Azure portal, Azure PowerShell, and Azure CLI.
    • Enable the Allow Azure services option to connect to your Azure SQL Database logical server. You can make this change in the Networking section of Azure SQL logical server in the Azure portal.

Azure Cosmos DB

Image
	Image9-FabricMirroringCosmosDb

Image9-FabricMirroringCosmosDb

Requirements

  • Supported APIs: API for NoSQL

Currently, only Azure Cosmos DB accounts are supported with API for NoSQL. Hence, if you are creating a new resource in Azure, the below setting should be chosen. This cannot be changed after creation.

Image
Image10-CosmosDbAccountDemo

Image10-CosmosDbAccountDemo

  • continuous backup is enabled: When creating a new Azure Cosmos DB account, in the Backup policy tab, choose continuous mode to enable the point-in-time restore functionality for the new account. With the point-in-time restore, data is restored to a new account, currently, you can't restore to an existing account
Image
Image11-CosmosDbNetworking

Image11-CosmosDbNetworking

Networking options are set to public network access for all networks

For this go to Settings > Networking > Public access > Public network access = All networks

Image
	Image12-CosmosDbNetworking2

Image12-CosmosDbNetworking2

Image
	Image13-CosmosDbAccessKeys

Image13-CosmosDbAccessKeys

Image
Image14-CosmosDbFabricSettings

Image14-CosmosDbFabricSettings

Unlike Azure SQL database, the same Cosmos DB database can be mirrored multiple times within a single workspace. To optimize resource usage, a single database copy should be reused across multiple lake houses, warehouses, or other mirrored databases, eliminating the need for multiple mirrors of the same database. Mirroring the same database across different Fabric workspaces or tenants is also supported. Mirroring in Fabric isn't related to Azure Synapse Link. You can continue to use Azure Synapse Link while using Fabric mirroring.

Network and connectivity security

Connectivity Security

Connections to your source database are based on account keys for your Azure Cosmos DB accounts (connection credentials need to be updated for Fabric mirroring if the account keys are rotated). Managed identities, single sign-on, and role-based access control are not used for authentication in mirroring.

Network security

Currently, mirroring doesn't support private endpoints or customer-managed keys (CMK) on OneLake. Mirroring isn't supported for Azure Cosmos DB accounts with network security configurations less permissive than all networks, using service endpoints, using private endpoints, using IP addresses, or using any other settings that could limit public network access to the account. Azure Cosmos DB accounts should be open to all networks to work with mirroring.

  1. Navigate to your Azure Cosmos DB account in the Azure portal.
  2. Ensure that continuous backup is enabled. If not enabled, follow the guide at migrate an existing Azure Cosmos DB account to continuous backup to enable continuous backup. This feature might not be available in some scenarios. For more information, see database and account limitations.
  3. Ensure that the networking options are set to public network access for all networks. If not, follow the guide at configure network access to an Azure Cosmos DB account.

Snowflake

Image
	Image15-FabricMirroringSnowflake

Image15-FabricMirroringSnowflake

Requirements

Snowflake allows both AWS and Azure-based snowflake databases. Fabric doesn't charge for network data ingress fees into OneLake for Mirroring. There are no mirroring costs when your Snowflake data is being replicated into OneLake. However, there are Snowflake compute and cloud query costs when data is being mirrored: virtual warehouse compute and cloud services compute.

Network and connectivity security

Any granular security established in the source Snowflake database must be re-configured in the mirrored database in Microsoft Fabric.

  • Currently, Mirroring does not support Snowflake instances behind a virtual network or private networking. If your Snowflake instance is behind a private network, you cannot enable Snowflake mirroring.
  • Snowflake authentication only via username/password is supported.
  • Sharing recipients must be added to the workspace. To share a dataset or report, first add access to the workspace with a role of admin, member, reader, or contributor.

Pricing

Power BI Premium, Fabric Capacity, or Trial Capacity licensing is required to use mirroring. However, Microsoft is absorbing the cost of the transfer compute. Additionally, they provide free Mirroring storage for replicas up to a certain limit based on the purchased compute capacity SKU you provision (detailed table below). For example, if you purchase F64, you would get 64 free terabytes worth of storage. OneLake storage is billed only when the free Mirroring storage limit is exceeded, or the provisioned compute capacity is paused. To learn more about Fabric Capacity pricing, you can read the following post: Fabric licensing explained. Nonetheless, there are increased compute and query costs related to Mirroring in your source system. Additionally, if the fabric capacity is paused, you will be charged with the storage cost according to OneLake pricing (see table below).

Image
	Image16-FabricCapacity

Image16-FabricCapacity

Best practices

Capacity

As mentioned above, Microsoft is absorbing the cost of the transfer compute and is giving some storage away for free. Therefore, in general, it is advised to use your existing Fabric capacity to set up mirroring, considering you are already paying for the compute. Having auto-pause enabled on this compute is no issue. If the capacity is paused, mirroring is also paused, once the capacity is restarted the mirroring continues where it left off. Be aware, that pausing mirroring is limited to a certain number of days, which depends on the source, e.g. snowflake has a limit of 7 days. Furthermore, when the capacity is paused you are charged with standard storage cost in Fabric (currently $0.023 per GB).

Some exceptions to using your default capacity could be:

  • If you have a source which has very frequent changes and you require the data to come through 24/7. You would require a compute that stays on. It might be advised to have a lower grade capacity specifically to do mirroring instead of a more expensive compute.
  • If you have a source with small and low-frequency changes, it might be worth having a separate capacity with a scheduled start and pause not to pay for high capacity when not required. Capacity start and pause can be triggered through the Azure Rest APIs.

To conclude the best choice of capacity for your use case will depend on the frequency and volume of the data you want to mirror.