Share via


Self-serve analytics and insights

Self-serve analytics and insights refer to data, tools, and platforms that enable business users to access, analyze, and generate insights from data independently. Microsoft Purview data governance publishes a domain model of metadata into Microsoft Fabric OneLake and Microsoft Azure Data Lake Storage Gen2, so you can analyze and generate insights by bringing your own tools and compute. Self-service analytics of data governance metadata is valuable in driving continuous improvement of your data estate health management and fostering a data-driven culture across your organization by democratizing access to data estate insights and health management.

Key components

  • Data Model: 3NF Model with domains and dimensions details.
  • Metadata: Data governance metadata that includes:
    • Governance domains
    • Data products
    • Data assets
    • Glossary terms
    • Subscription request
    • Data quality rules
    • Dimensions
    • Data quality facts (pass and fail counts)
    • Data health actions (including data quality actions)
    • And many more

Benefits

  • Empowerment: Empowers data practitioners, data product owners, data stewards, and analysts to explore data governance metadata and link metadata from various sources to derive insights.
  • Flexibility and Efficiency: You can create custom reports in addition to out-of-the-box reports in health management.
  • Agility: Allows your organization to respond more swiftly to health management issues and remediation.
  • Cost-effective: Reduces the need for setting up platforms and build tools. All data is available in OneLake and you can use available tools (Fabric semantic model, Microsoft Power BI reporting, data flow, and notebook) in OneLake.

Currently available reports (out-of-the-box)

Here are the available out-of-the-box reports. You can't customize these reports.

  • Classic assets - an overview of assets by type and collection, and their curation status.
  • Classic catalog adoption - to understand at a glance how Unified Catalog is being used. your glossary, providing a snapshot of terms and their status.
  • Classic classifications - an overview of assets classified and the types of classifications.
  • Classic data stewardship - an overview of assets classified and the types of classifications.
  • Classic glossary - health and use of glossary terms.
  • Classic sensitivity labels -an overview of assets that have sensitivity labels applied and the types of labels applied.
  • Data governance - The data governance health report allows your team to track your health progress at a glance, and identify areas that need more work.
  • Data quality health - Data quality dimensions and data quality rules performance reporting.

Data model for self-serve analytics metadata

The 3NF domain model is part of the normalization process in relational database design. It ensures that the database is free of redundancy and update anomalies. A database schema is in the Third Normal Form if it meets the requirements of the First and Second Normal Forms and all its attributes are functionally dependent only on the primary key. Use the 3NF domain model to structure data in a way that minimizes duplication and ensures data integrity. It focuses on breaking down data into smaller, related tables where each piece of information is stored only once.

Characteristics:

  • Elimination of Transitive Dependencies: Nonkey attributes shouldn't depend on other nonkey attributes.
  • Logical Grouping: Data is logically grouped into domains based on its function or meaning.
  • Entity-Relationship Diagrams (ERDs): Commonly used to represent 3NF domain models, showing how entities relate to each other.
Table name What's stored in the table Relationship keys
Access Policy Provisioning State Information about provisioning state ProvisioningStateId
Access Policy Resource Type Access policy resource information ResourceTypeId
Access Policy Set Overall information about access policy, policy use case details, and where the policy applied one AccessPolicySetId (UniqueId), ResourceTypeId (FK), ProvisioningStateId (FK)
Business Domain* Business domain (called "Governance Domain" in Unified Catalog) name, description, status, and ownership details Parent Business Domain ID (FK), Created By User ID (FK), Last Modified By User ID (FK)
Classification Data asset classification information ClassificationId
Critical Data Column Store glossary terms associated to data asset column in this field Critical Data Column Id, Business Domain Id, Asset Id
Critical Data Element Critical data elements of the data assets Critical Data Element Id
Custom Access Use Case Access use case information AccessPolicySetId
Data Asset Data asset name, description, and source information DataAssetId (UniqueId), DataAssetTypeId (FK), CreatedByUserId (FK), LastModifiedByUserId (FK)
Data Asset Column Data asset column name, column description, and references DataAssetId (FK), ColumnId (Unique), DataAssetTypeId (FK), DataTypeId (FK), Created By User Id (FK), Last Modified By User Id (FK)
Data Asset Column Classification Assignment Data classification assignment related reference keys DataAssetId (FK), ColumnId (FK), ClassificationId (FK)
Data Asset Column Critical Data Element Assignment Data asset column associating with critical data element ColumnId
Data Asset Domain Assignment Data asset governance domain assignment related information DataAssetId (FK), BusinessDomainId (FK), AssignedByUserId (FK)
Data Asset Owner Data asset owner information DataAssetOwnerId
Data Asset Owner Assignment Data asset owner assignment information DataAssetId, DataAssetOwnerId
Data Asset Type Data Type Data asset type information DataTypeId (UniqueId), DataAssetTypeId (FK)
Data Product Data product name, description, use cases, status, and other relevant information DataProductId (UniqueId), DataProductTypeId (FK), DataProductStatusId (FK), UpdateFrequencyId (FK), CreatedByUserId (FK), LastUpdatedByUserId (FK)
Data Product Asset Assignment Data product and data asset assignment information DataProductId, DataAssetId
Data Product Business Domain Assignment Data product and governance domain assignment information DataProductId (FK), BusinessDomainId (FK), AssignedByUserId (FK)
Data Product Critical Data Element Assignment Critical data element assignment to data product DataProductId
Data Product Documentation Data product documentation reference information is stored in this table. DataProductId, DocumentationId
Data Product OKR Assignment Assigned objectives and key results to data product Data Product Id, Objective Id, Key Result Id
Data Product Owner Data product owner information DataProductId, DataProductOwnerId
Data Product Status Data product status (like published or draft) related information DataProductStatusId
Data Product Terms Of Use Data product usage terms information DataProductId, TermOfUsedId, DataAssetId
Data Product Type The information about data product types, such as master, reference, or operational DataProductTypeId
Data Product Update Frequency The information about how often this data product's data are updated UpdateFrequencyId
Data Quality Asset Rule Execution Data quality scanning results RuleId (FK), DataAssetId (FK), JobExecutionId (FK)
Data Quality Job Execution Data quality job execution status JobExecutionId (UniqueId)
Data Quality Rule Information about data quality rules RuleId (UniqueId), RuleTypeId (FK), BusinessDomainId (FK), DataProductId (FK), DataAssetId (FK), JobTypeDisplayName (FK), RuleOriginDisplayName (FK), RuleTargetObjectType (FK), CreatedByUserId (FK), LastUpdatedByUserId (FK)
Data Quality Rule Column Execution Information about data quality rules pass and fail count, data quality score in columns level, and data quality job execution details RuleId (FK), DataAssetId (FK), ColumnId (FK), JobExecutionId (FK)
Data Quality Rule Type Data quality rule type and associated dimensions RuleTypeId (UniqueId), DimensionDisplayName (FK)
Data Subscription Request Information about data subscribers, policies applied, subscription request status, and other relevant information SubscriberRequestId (UniqueId), SubscriberIdentityTypeDisplayName (FK), RequestorIdentityTypeDisplayName (FK), RequestorStatusDisplayName (FK)
Glossary Term Information about glossary term, description, and overall status of the glossary term GlossaryTermId (UniqueId), ParentGlossaryTermId (FK), CreatedByUserId (FK), LastModifiedByUserId (FK)
Glossary Term Business Domain Assignment Information about glossary term governance domain assignment and statuses GlossaryTermId (FK), BusinessDomainId (FK), AssignedByUserId (FK), GlossaryTermStatusId (FK), CreatedByUserId (FK), LastUpdatedByUserId (FK)
Glossary Term Critical Data Element Assignment Associating glossary term with critical data element GlossaryTermId
Glossary Term Data Product Assignment Information about glossary term data product assignment GlossaryTermId (FK), DataProductId (FK), AssignedByUserId (FK), GlossaryTermStatusId (FK), CreatedByUserId (FK), LastUpdatedByUserId (FK)
Health Action Information about data governance and data quality actions ActionId, TargetEntityId, CreatedByUserId
Health Action Finding Type Data health actions Finding Types FindingTypeId
Health Action Finding Sub Type Data health actions Finding Sub Types FindingSubTypeId, FindingTypeId
Health Action User Assignment Data health actions User Assignment information ActionId, AssignedToUserId
Key Result Key result of data product objective, including details description Key Result Id, Objective Id
Objective Objective of the data product contains objective Id, objective description, and objective status Objective Id
Policy Set Approver Policy set and the approver information SubscriberRequestId (FK), AccessPolicySetId (FK), ApproverUserId (FK)
Relationship Information about source type and target information. The relationship of glossary terms association with assets and columns is available in this relationship table. AccountId, SourceId, TargetId

* Business Domain was renamed "Governance Domain" in the new Unified Catalog user experience, but it isn't renamed in the physical model in the database. The name change doesn't affect the usage scenario described here for Business Domain.

This diagram shows the entity relationship for the domain model described in the preceding table: Screenshot to browse ERD.

Subscribe Unified Catalog metadata to Fabric OneLake

Subscribe to Microsoft Purview data governance metadata for analytics and get insights by following these steps:

  1. In the Microsoft Purview portal, select Settings, then select Unified Catalog, then select Solution integrations (preview).

  2. Make sure that you have the Data Governance Administrator role (learn more about roles and permissions).

  3. Select Edit.

  4. Add Storage type and Enabled the setup.

  5. Add Location URL (example: https://onelake.dfs.fabric.microsoft.com/workspace name/lakehouse name/Files/purviewmetadata)

  6. Select Properties to copy URL.

    Screenshot that shows how to configure fabric url 1.

  7. Copy the URL from the Properties page.

    Screenshot that shows how to configure fabric url 2.

  8. Add the folder name at the end of the URL; for example, /DEH, as seen in the preceding image.

  9. Grant contributor accesses to Microsoft Purview Manage Service Identity (MSI) to your Fabric workspace.

  10. Test the connection.

  11. Select Save to save the configuration and publish Microsoft Purview metadata to your OneLake workspace.

Enable managed V-NET

To enable and disable managed virtual network, follow these steps:

  1. In the Microsoft Purview portal, select Settings.
  2. Under Solution settings, select Unified Catalog, then select Solution integrations (preview).
  3. On the Configure storage flyout pane, select the Enable managed V-NET toggle to enable managed virtual network for your storage.

Screenshot that shows how to configure vnet.

  1. Enter the Target Resource ID. For example, the target resource ID is: /subscriptions/e2223ea4f73-6833-439d-b40e 615c78e2d8af/resourceGroups/vnettesting/providers/. To copy a resource ID of your storage, select the Json View of your storage and copy the Json string.
  2. Select Update/Refresh MPE status to see the update. The status changes from Provisioning to Pending.
  3. You need to approve the MPE request in the Microsoft Azure portal to complete the setup. Once you approve the request, the MPE status changes from Pending to Approved.

Screenshot that shows how to approve vNet provisioning request.

Disable the virtual network setup by selecting the Enable managed V-NET toggle to turn it to the off position. If you disable it, then you need to change non-virtual network storage endpoint.

Remove a configured virtual network

To remove a managed virtual network setup, select Delete MPE on the Configure storage flyout pane. This action deletes the virtual network configuration and removes it from the Configure storage pane. The Test connection area reappears, allowing you to test a non-virtual network connection. Test connection is only enabled for non-virtual storage setup.

Managed virtual network configuration for Fabric storage

If your Fabric tenant is configured to run on Virtual Network or in Private Link, you need to select Enable managed V-NET on the Configure storage pane, then add a Private Link Resource ID as listed here: /subscriptions/07d669d6-83f2-4f15-8b5b-4a4b31a3432/resourceGroups/pdgbugbashfabricvnet/providers/Microsoft.PowerBI/privateLinkServicesForPowerBI/fabricvnetpl.

To set up compute for virtual networks, refer to Set up data quality for managed virtual networks. You need to set up the private link for the Fabric tenant before configuring data quality virtual network connection and compute allocation.

Helpful information about virtual network setup

  • Make sure that you're using Location URL or Endpoint from virtual network storage when you're enabling managed virtual network to store Microsoft Purview metadata.
  • To avoid using incorrect resource group, use JSN View to copy “Target Resource ID” from your Azure portal.
  • If you deleted configured virtual network, then make sure that you changed the endpoint or Location URL to use non virtual network storage and test connection for non virtual network storage.

Note

To store data quality error records, create separate folder locations for each domain. This approach organizes and stores domain-specific data quality error records in dedicated folders.

Create a semantic model in OneLake

A semantic model in the context of data and analytics refers to a structured representation of data that defines the meaning, relationships, and rules within a specific domain. It provides a layer of abstraction that helps users understand and interact with complex data by making it more intuitive and accessible, especially in the context of business intelligence (BI) and analytics platforms. A semantic model is always required before any reports can be built. Within the warehouse, a user can add warehouse objects - tables or views to their default Power BI semantic model. They can also add other semantic modeling properties, such as hierarchies and descriptions. These properties are then used to create the Power BI semantic model's tables. Users can also remove objects from the default Power BI semantic model.

To create a semantic model from the Microsoft Purview Data Governance metadata domain model:

  1. Open the Lakehouse in your Fabric workspace.

  2. Use shortcut to create a shortcut of the domain model from OneLake and within OneLake.

    1. Select the Tables ellipsis button (...).
    2. Select New Shortcut and select Microsoft OneLake from New Shortcut sources page.
    3. Select the DomainModel table to shortcut.

    Screenshot that shows how to publish to delta table via shortcut.

  3. Create shortcut directly in table level for all files, which eliminates duplicate records.

    Screenshot that shows how to create table level shortcut.

After you publish all files to delta tables via shortcut, you can add the delta tables to semantic model.

  1. Switch to SQL analytics endpoint page from the Lakehouse page.
  2. Select Reporting from top left corner of the SQL analytics endpoint page.
  3. Select Manage default semantic model.
  4. Select the tables from dbo > Tables that you want to add to the semantic model for reporting.

Screenshot that shows how to add table to semantic model manually.

To add objects such as tables or views to the default Power BI semantic model, select Automatically update semantic model.

Screenshot that shows the self-serve analytics semantic model.

Note

  • You can right-click to add related tables to create relationship.

  • If you prefer not to subscribe Microsoft Purview metadata for self-serve analytics, you can manually disable self-serve analytics (metadata subscription): go to Solution settings > Unified Catalog > Solution Integrations > Self-serve analytics and select the toggle to disable it. You need the Data Governance Administrator role to enable and disable this feature.

Subscribe to Purview catalog metadata to Data Lake Storage Gen2 storage

Subscribe to Microsoft Purview's data governance metadata to publish and store it in your AdlsG2 storage for analytics and insights by following these steps:

  1. In the Microsoft Purview portal, open Unified Catalog.

  2. Select Settings on the left navigation, then under Solution settings, select Unified Catalog.

  3. Select Solution integration, then select Edit.

  4. Select Storage type and Enabled the setup.

  5. Add Location URL. This URL needs to be the Data Lake Storage Gen2 path plus the container name, formatted as Data Lake Storage Gen2 + "/(container name)".

    1. Go to portal.azure.com.
      1. Select your Data Lake Storage Gen2 storage (Home > adlsg2).
    2. Go to Settings > Endpoints and select the primary endpoint of your Microsoft Azure Data Lake storage.
  6. Grant Storage Blob Data Contributor access to Microsoft Purview Manage Service Identity (MSI) for your Data Lake Storage Gen2 container.

  7. Test the connection.

    Browse the model

  8. Select Save to save the configuration and publish the domain model to your Data Lake Storage Gen2 storage.

Note

To store data quality error records, create separate folder locations for each domain. This approach organizes and stores domain-specific data quality error records in dedicated folders.

Review published model and data

  1. Open portal.azure.com.

  2. Select your Data Lake Storage Gen2 storage.

  3. Select the container that you added with the Data Lake Storage Gen2 endpoint in Microsoft Purview.

  4. Browse the list of Delta Parquet files published in the container.

  5. Browse published model and metadata (see the following images).

    Screenshot that shows how to configure subscription connection in adlsg2 container.

    Browse the model files

Create a Power BI report

Power BI is natively integrated in the whole Fabric experience. This native integration includes a unique mode, called DirectLake, for accessing the data from the lakehouse to provide the most performant query and reporting experience. DirectLake is a groundbreaking new capability that allows you to analyze large semantic models in Power BI. With DirectLake, you load parquet-formatted files directly from a data lake without needing to query a data warehouse or lakehouse endpoint, and without needing to import or duplicate data into a Power BI semantic model. DirectLake is a fast path to load the data from the data lake straight into the Power BI engine, ready for analysis.

In traditional DirectQuery mode, the Power BI engine directly queries the data from the source for each query execution, and the query performance depends on the data retrieval speed. DirectQuery eliminates the need to copy data, ensuring that any changes in the source are immediately reflected in query results.

For more information, see how to create a Power BI report in Microsoft Fabric.

Set up the schedule

Data Governance Administrators can set up the self-serve analytics metadata refreshing schedule by following these steps:

  1. In the Microsoft Purview portal, open Unified Catalog.
  2. Select Settings on the left navigation, then under Solution settings, select Unified Catalog.
  3. Select Solution integration, then select the scheduler icon.
  4. On Schedule self-serve analytics, turn on the Enabled toggle, and set the start date, frequency, and end date.
  5. Select Save.

Job history

You can browse Job history by clicking Job history icon located in the self-serve analytics page. You can filter job status Complete or Failed for a date range.

Screenshot that shows the list of job status.

Important

  • Default refreshing cycle is every 24 hours.
  • Purview MSI needs contributor access to your Fabric workspace if you're subscribing Microsoft Purview metadata to publish into your Fabric workspace.
  • Purview MSI needs Storage Blob Data Contributor access to your Azure Data Lake Storage Gen2 if you're subscribing purview metadata to publish into your adlsg2 container.
  • Scheduling the data refreshing job isn't yet supported.
  • Virtual network isn't supported yet.
  • We're only publishing governed assets data. Data assets associated with a data product are categorized as governed assets. Data Map assets that aren't governed won't appear in the self-service analytics data asset table.
  • We have RBAC implemented in the Catalog, ensuring that not all users can view all domains or data products. However, for self-service analytics, we publish all data, allowing anyone with access to this data to view the entire catalog. Access control for self-service metadata depends on where the data is stored—either the Fabric workspace owner or the ADLS Gen2 storage owner can manage access.
  • If you prefer not to subscribe Microsoft Purview metadata for self-serve analytics, you can manually disable self-serve analytics (metadata subscription): go to Solution settings > Unified Catalog > Solution Integrations > Self-serve analytics and select the toggle to disable it. You need the Data Governance Administrator role to enable and disable this feature.