Set up data quality for Google BigQuery (preview)

Supported capabilities

When scanning a Google BigQuery source, Microsoft Purview supports:

  • Extracting technical metadata including:
    • Projects and datasets.
    • Tables including the columns.
    • Views including the columns.
  • Fetching static lineage on asset relationships among tables and views.

When setting up a scan, you can choose to scan an entire Google BigQuery project. You can also scope the scan to a subset of datasets matching the given names or name patterns.
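As an illustration of how scoping by names or name patterns narrows a scan, the following minimal sketch filters a list of dataset names. It assumes glob-style wildcards (`*`, `?`), which is an assumption and not the documented Microsoft Purview pattern syntax; `select_datasets` and the dataset names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_datasets(available, patterns):
    """Return the datasets that match at least one name or pattern.

    Assumption: glob-style wildcards. The pattern syntax that Microsoft
    Purview accepts may differ. An empty pattern list selects every
    dataset, which mirrors scanning the entire project.
    """
    if not patterns:
        return list(available)
    return [d for d in available if any(fnmatchcase(d, p) for p in patterns)]


# Hypothetical dataset names, for illustration only.
datasets = ["sales_2023", "sales_2024", "hr_core", "finance"]
print(select_datasets(datasets, ["sales_*", "finance"]))
# prints ['sales_2023', 'sales_2024', 'finance']
```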

Known limitations

  • Currently, Microsoft Purview only supports scanning Google BigQuery datasets in a US multiregional location. If the specified dataset is in another location, such as "us-east1" or "EU," the scan completes but no assets appear in Microsoft Purview.

  • When you delete an object from the data source, the subsequent scan doesn't automatically remove the corresponding asset in Microsoft Purview.
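Because a scan of a dataset outside the US multiregional location completes without producing assets, it can help to check dataset locations before you scan. The sketch below assumes you already have a mapping of dataset name to location string (for example, exported from the Google Cloud console); `split_by_location` and the sample data are hypothetical.

```python
def split_by_location(dataset_locations):
    """Split datasets into (supported, skipped) by location.

    Only the US multiregional location ("US") is treated as supported,
    matching the limitation described above.
    """
    supported, skipped = [], []
    for name, location in dataset_locations.items():
        if location.strip().upper() == "US":
            supported.append(name)
        else:
            skipped.append(name)
    return supported, skipped


# Hypothetical datasets and locations, for illustration only.
locations = {"sales": "US", "logs": "us-east1", "crm": "EU"}
supported, skipped = split_by_location(locations)
print("Will produce assets:", supported)  # ['sales']
print("Scan completes, no assets:", skipped)  # ['logs', 'crm']
```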

Configure Data Map scan to catalog Google BigQuery data in Microsoft Purview

Register a Google BigQuery project

  • In the Microsoft Purview portal, select Data Map, then select Register.
  • At Register sources, select Google BigQuery, then select Continue.
  • Enter a Name for the data source that appears in the catalog.
  • Enter the ProjectID. This value should be a fully qualified project ID. For example, mydomain.com:myProject.
  • Select a collection from the list.
  • Select Register.
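The fully qualified project ID in the example has a domain part and a project part separated by a colon. As a rough sketch, the helper below splits such a value and removes stray whitespace before you paste it into the ProjectID field. `split_project_id` is a hypothetical helper, and it doesn't validate Google's naming rules.

```python
def split_project_id(project_id):
    """Split 'domain:project' into (domain, project).

    Returns (None, project) when the ID has no domain prefix.
    Whitespace is removed because it isn't part of a project ID.
    """
    cleaned = "".join(project_id.split())
    if not cleaned:
        raise ValueError("project ID is empty")
    domain, sep, project = cleaned.partition(":")
    if not sep:
        return None, cleaned
    if not domain or not project:
        raise ValueError(f"incomplete project ID: {project_id!r}")
    return domain, project


print(split_project_id("mydomain.com:myProject"))
# prints ('mydomain.com', 'myProject')
```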

Set up a Data Map scan for Google BigQuery project

  • Make sure a self-hosted integration runtime is set up. If it isn't set up, use the steps listed in Google BigQuery connection prerequisites.

  • Navigate to Sources.

  • Select the registered BigQuery project.

  • Select New scan.

  • Enter these details:

    • Name: The name of the scan.
    • Connect via integration runtime: Select the configured self-hosted integration runtime.
    • Credential: While configuring the BigQuery credential, make sure to:
      • Select Basic Authentication as the authentication method.
      • Provide the email ID of the service account in the User name field. For example, xyz@developer.gserviceaccount.com.
      • Generate a private key by following the steps below, then copy the entire JSON key file and store it as the value of a Key Vault secret. To create a new private key in Google Cloud:
        • In the navigation menu, select IAM & Admin (Identity and Access Management) > Service Accounts, and then select a project.
        • Select the email address of the service account that you want to create a key for.
        • Select the Keys tab.
        • Select the Add key drop-down menu, then select Create new key.
        • Choose JSON format.
    • Specify the path to the JDBC (Java Database Connectivity) driver location on the machine where the self-hosted integration runtime is running. For example: D:\Drivers\GoogleBigQuery.
    • Specify a list of BigQuery datasets to import. For example, dataset1; dataset2. When the list is empty, all available datasets are imported.
    • Maximum memory (in GB) available on your virtual machine to be used by scanning processes: This value depends on the size of the Google BigQuery project to be scanned.
  • Select Test connection.

  • Select Continue.

  • Choose your scan trigger. You can set up a schedule or run the scan once.

  • Review your scan and select Save and Run.
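Before you run the scan, you can sanity-check two of the inputs from the steps above: the JSON key file and the dataset list. The following is a minimal sketch, not part of Microsoft Purview. It assumes the key file contains the `type`, `client_email`, and `private_key` fields that Google includes in service account keys; the function names and sample values are hypothetical.

```python
import json

# Fields this sketch expects in a service account JSON key.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_service_account_key(key_text, user_name):
    """Return a list of problems found in a service account JSON key."""
    try:
        key = json.loads(key_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["key file must contain a JSON object"]
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in key]
    if "type" in key and key["type"] != "service_account":
        problems.append("type is not 'service_account'")
    # The User name field of the credential should be the same account.
    if "client_email" in key and key["client_email"] != user_name:
        problems.append("client_email does not match the User name")
    return problems


def parse_dataset_list(text):
    """Split 'dataset1; dataset2' into names. Empty input means all datasets."""
    return [name.strip() for name in text.split(";") if name.strip()]


# Placeholder key content, for illustration only.
sample_key = json.dumps({
    "type": "service_account",
    "client_email": "xyz@developer.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
})
print(check_service_account_key(sample_key, "xyz@developer.gserviceaccount.com"))
# prints []
print(parse_dataset_list("dataset1; dataset2"))
# prints ['dataset1', 'dataset2']
```

An empty list from `check_service_account_key` only means the file has the expected shape. It doesn't confirm that the key is valid or that the service account has access to the project.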

Once scanned, the data assets in the Google BigQuery project are available in the Unified Catalog search. For more information, see how to connect and manage Google BigQuery in Microsoft Purview.

Important

Deleting your scan doesn't delete catalog assets created from previous scans.

Set up connection to Google BigQuery project for data quality scan

The scanned asset is now ready for cataloging and governance. Associate the scanned assets to the data products in a governance domain to set up a data quality scan.

  1. In Unified Catalog, go to Health management > Data quality. Select a governance domain to open its details page, then select Manage to create a connection.

  2. Set up the connection:

    • Add a connection name and description.
    • Select Google BigQuery as the source type.
    • Add the Project ID, Dataset name, and Table name.
    • Enter the details under Service account private key:
      • Add the Microsoft Azure subscription.
      • Add the Microsoft Azure Key Vault connection.
      • Enter the Secret name.
      • Enter the Secret version.
  3. Test the connection to make sure the data source connection is successfully configured.

    Screenshot that shows how to set up a Google BigQuery connection.

    Screenshot that shows how to configure a connection for Google BigQuery.
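The Secret name and Secret version in the connection refer to a Key Vault secret that holds the entire JSON key file. As one possible way to create that secret programmatically, the sketch below uploads the file through a client object that exposes `set_secret(name, value)`, as the `SecretClient` class in the `azure-keyvault-secrets` package does. The helper name, vault URL, secret name, and file path are hypothetical, and you can also create the secret in the Azure portal.

```python
from pathlib import Path


def store_key_as_secret(client, secret_name, key_file):
    """Upload the whole JSON key file as one secret.

    Returns (secret name, secret version), the two values the
    connection form asks for.
    """
    key_text = Path(key_file).read_text(encoding="utf-8")
    secret = client.set_secret(secret_name, key_text)
    return secret.name, secret.properties.version


# Usage sketch. Assumes the azure-identity and azure-keyvault-secrets
# packages are installed and that you have permission to set secrets.
# The vault URL, secret name, and path below are placeholders.
#
# from azure.identity import DefaultAzureCredential
# from azure.keyvault.secrets import SecretClient
#
# client = SecretClient(
#     vault_url="https://<your-vault-name>.vault.azure.net",
#     credential=DefaultAzureCredential(),
# )
# name, version = store_key_as_secret(client, "bigquery-sa-key", "key.json")
# print(name, version)
```

Passing the client in as a parameter keeps the helper independent of how you authenticate to Azure.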

Important

Data quality stewards need read-only access to Google BigQuery to set up a data quality connection. The data quality scanning service doesn't yet support virtual networks or private endpoints for Google BigQuery data sources.

Profiling and data quality scanning for data in Google BigQuery

After you set up the connection, you can profile your data, create and apply rules, and run a data quality scan for your data in Google BigQuery. Follow the step-by-step guidelines described in these articles:

Resources