แชร์ผ่าน


Set up data quality for Snowflake data

When scanning Snowflake data sources, Microsoft Purview supports extracting technical metadata, including:

  • Server, databases, schemas, and tables, including the columns; views including the columns, foreign keys, and unique constraints.
  • Stored procedures, including the parameter dataset and result set.
  • Functions, including the parameter dataset.
  • Pipes, stages, and streams, including the columns.
  • Tasks and sequences.

Register a Snowflake source in Data Map

To register a new Snowflake source in Microsoft Purview Unified Catalog, follow these steps:

  • Sign in to the Microsoft Purview portal.
  • Select the Data Map solution card. If the Data Map solution card isn't displayed, select View all solutions and then select Data Map from the Core section.
  • Select Register.
  • On Register sources, select Snowflake.

Set up a Data Map scan

  • Select the registered Snowflake source.
  • Select + New scan.
  • Provide the following details:
    • Name: The name of the scan.
    • Connect via integration runtime: Select the Azure autoresolved integration runtime, Managed Virtual Network IR, or SHIR according to your scenario.
    • Host for connection: Choose the endpoint used to establish connection to Snowflake during scan. You can choose from the server URL, or the other hosts that you configured in data source.
    • Credential: Select the credential to connect to your data source. Make sure to:
      • Select Basic Authentication while creating a credential.
      • Provide the user name used to connect to Snowflake in the User name input field.
      • Store the user password used to connect to Snowflake in the secret key.
    • Warehouse: Specify the name of the warehouse instance used to empower scan in capital case. The default role assigned to the user specified in the credential must have USAGE rights on this warehouse.
    • Databases: Specify one or more database instance names to import in capital case. Separate the names in the list with a semicolon (;). For example, DB1;DB2. The default role assigned to the user specified in the credential must have adequate rights on the database objects.
    • Schema: List subset of schemas to import expressed as a semicolon separated list.
  • Select Test connection to validate the settings (available when using Azure Integration Runtime).
  • Select Continue.
  • Select a scan rule set for classification. You can choose between the system default, existing custom rule sets, or create a new rule set inline.
  • Review your scan and select Save and Run.

Once scanned, the data asset in Snowflake is available on the Unified Catalog search. For more information about how to connect and manage Snowflake in Microsoft Purview, see Connect to and manage Snowflake in Microsoft Purview.

Important

When you delete the object from the data source, the subsequent scan doesn't automatically remove the corresponding asset in Microsoft Purview.

Set up connection to Snowflake data source for data quality scan

At this point, the scanned asset is ready for cataloging and governance. Associate the scanned asset to the data product in a Governance Domain Sele. At the Data Quality Tab, add a new Azure SQL Database Connection: Get the Database Name entered manually.

  1. In the Microsoft Purview portal, open Unified Catalog.

  2. Under Health management, select Data quality.

  3. Select a governance domain from the list, then select Connections from the Manage dropdown list.

  4. Configure connection on the Connections page:

    • Add connection name and description.
    • Select source type Snowflake.
    • Add Server name, Warehouse name, Database name, Schema name, and Table name.
    • Select authentication method - Basic authentication.
    • Add User name.
    • Add Credentials:
      • Add Azure subscription
      • Key vault connection
      • Secret name
      • Secret version
    • Select the Enable managed V-Net checkbox if your Snowflake is running on Azure Virtual Network.
    • Select the Azure Region.
    • Add Private Link Resources ID.
    • Add Fully Qualified Domain Name.
  5. Test the connection to make sure it works. If you're using Virtual Network, then the test connection feature isn't supported.

    Screenshot that shows how to set up snowflake connection.

    Screenshot that shows how to configure snowflake connection token.

The resource ID for the target Snowflake private link is of the following format: /subscriptions/(subscription_id)/resourcegroups/az(region)-privatelink/providers/microsoft.network/privatelinkservices/sf-pvlinksvc-az(region).

  • To obtain region-ID and Fully qualified name, run SYSTEM_WHITE_LIST and SYSTEM_WHITE_LIST_PRIVATELINK to obtain the SNOWFLAKE_DEPLOYMENT, SNOWFLAKE_DEPLOYMENT_REGIONLESS, and OCSP_CACHE values for public and allowlist hosts.
  • To obtain Subscription ID, run SYSTEM$GET_SNOWFLAKE_PLATFORM_INFO() as ACCOUNTADMIN to obtain the snowflake-vnet-subnet-ids values. The Subscription ID for the Private Link of the Snowflake’s Azure tenant is obtained from this.

Important

  • Once the request for a private endpoint connection is created, a support ticket needs to be raised with Snowflake support to approve the private endpoint connection. Provide the details of the managed private endpoint resource ID in this support ticket for the Snowflake team to approve.
  • Data quality stewards need read only access to Snowflake to set up the data quality connection.
  • The Snowflake connector doesn't accept https://. Remove https:// when you add the server name to configure data source connection.
  • If public access is disabled, you need to select Allow trusted Microsoft services for Key Vault. This is required only for Key Vault, not for your Snowflake workspace.

Note

Support will be added for Open Authorization (OAuth) and key pair for Snowflake database connection. With OAuth and key pair based authentication mechanisms supported for Snowflake, you can create multiple connections both supporting different authentication mechanisms. In such cases, the logic of picking the last created connection holds up. For example, if you first create a key pair based connection followed by OAuth, then the OAuth one is picked up. If you want the key pair connection to be picked up, you have to delete the OAuth mechanism. Contact your Microsoft Account representative if you want to preview the new Snowflake authentication model for Microsoft Purview Data Quality scan.

Profiling and data quality scanning for data in Snowflake

After you set up the connection, you can profile your data, create and apply rules, and run a data quality scan for your data in Snowflake. Follow the step-by-step guidelines described in the following documents:

Resources