Hey Arjun, it looks like you're having trouble with Microsoft Purview when profiling and running data quality checks on your Spark-partitioned ADLS Gen2 storage. Here are some things to check.
Steps to Resolve Your Issue:
- Check Storage Permissions:
- Ensure that the Microsoft Purview account's managed identity is assigned the Storage Blob Data Reader role on your ADLS Gen2 storage account. Without that role, Purview cannot read the files and so cannot extract the schema.
- Wait for Schema Update:
- Sometimes changes in schema can take time to reflect. After the initial scan, it can take up to 12 hours for the schema to update properly. If it hasn't been long since you scanned, consider waiting before trying to analyze it again.
- Enable Advanced Resource Sets:
- Make sure the Advanced Resource Sets option is turned on in your Microsoft Purview settings. This feature must be enabled for the schema to be accurately captured for partitioned datasets.
- Validate File Format and Structure:
- Check that the data follows the directory structure that Purview supports for profiling and data quality (DQ), and that your file names and folder structures conform to the expected formats. For instance, use paths structured like
https://<storage_account>.dfs.core.windows.net/<container>/path/{Partitioned Files}.
- Consider Data Type Support:
- Ensure none of the fields in your files utilize unsupported data types. If they do, it might affect schema extraction.
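To make the path check above concrete, here is a minimal Python sketch that inspects a file URL locally before you rescan. It assumes your data uses Spark's default partitioned layout (folders named key=value); the function name check_partitioned_path and the account name myaccount are hypothetical, and this is not Purview's own validation logic, just a quick sanity check you could adapt.

```python
import re
from typing import List
from urllib.parse import urlparse

# Assumption: Spark's default partitioned output, i.e. folders named key=value.
PARTITION_FOLDER = re.compile(r"^[^=/]+=[^=/]+$")


def check_partitioned_path(url: str) -> List[str]:
    """Return a list of problems found in an ADLS Gen2 file URL (empty if none)."""
    problems = []
    parsed = urlparse(url)

    # The example path in this answer uses the https scheme and the dfs endpoint.
    if parsed.scheme != "https":
        problems.append("scheme is not https")
    if not parsed.netloc.endswith(".dfs.core.windows.net"):
        problems.append("host is not a dfs.core.windows.net endpoint")

    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 2:
        problems.append("path needs a container and a file name")
        return problems

    # segments[0] is the container and segments[-1] is the file itself.
    folders = segments[1:-1]
    if not any(PARTITION_FOLDER.match(folder) for folder in folders):
        problems.append("no key=value partition folders found")

    return problems


if __name__ == "__main__":
    sample = (
        "https://myaccount.dfs.core.windows.net/"
        "data/sales/year=2024/month=01/part-0000.parquet"
    )
    print(check_partitioned_path(sample))
```

An empty list means the URL matches the assumed layout; any strings returned describe what to look at. If your partitions are laid out differently, adjust the regular expression rather than treating a failure here as definitive.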
Reference Information:
- Asset Schema Issues: Assets - Asset Schema is missing or incorrect
- ADLS Gen2 Data Sources: Connect to Azure Data Lake Storage in Microsoft Purview
- Data Profiling Support Information: Configure and run data profiling for a data asset
Follow-Up Questions:
- Are all the file formats you are using ones that Purview can detect and analyze?
- Are there any specific error messages that you encountered while conducting the profiling or DQ checks?
Hope this helps you get your schema issues sorted out! Let me know if you have any more questions.
If this answers your question, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further questions, do let us know.