Purview scan of Azure Databricks Unity Catalog fails on Kubernetes IR due to internal __materialization_mat_* tables, but Managed IR succeeds. Is this expected?

Piotr Tybulewicz 125 Reputation points
2025-11-24T15:36:40.9233333+00:00

Hi Team,

I am scanning Azure Databricks Unity Catalog using Microsoft Purview. I followed the official documentation: Connect to and manage Azure Databricks Unity Catalog in Microsoft Purview (https://learn.microsoft.com/en-us/purview/register-scan-azure-databricks-unity-catalog).

My configuration:

  • Data source: Azure Databricks (old source)
  • Scope: single catalog only (not entire metastore)
  • Authentication: Service Principal using Databricks PAT
  • IR #1: Managed IR
  • IR #2: Self-hosted Kubernetes IR
  • Lineage extraction enabled
  • Same Databricks warehouse, same identity, same permissions

Observed behavior:

  1. In both cases (Managed IR and Kubernetes IR), Purview issues discovery queries that attempt to SELECT from Databricks internal system tables named _materialization_mat*.
  2. These SELECT statements fail with permission errors on the Databricks side. I believe this is expected, since those internal tables are not readable by user identities (a small sketch for locating these statements via the Query History API follows this list).
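For reference, a minimal sketch of how these statements can be pulled from the Databricks Query History REST API. Assumptions (not from the scan setup above): DATABRICKS_HOST / DATABRICKS_TOKEN environment variables, the /api/2.0/sql/history/queries endpoint, and client-side filtering on the table-name prefix; the exact response fields may differ per API version.

```python
# Sketch: list recent warehouse queries and print the ones that touch the
# internal _materialization_mat* tables, so the permission errors can be inspected.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-xxxx.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # PAT used for the Purview scan

resp = requests.get(
    f"{host}/api/2.0/sql/history/queries",
    headers={"Authorization": f"Bearer {token}"},
    params={"max_results": 100},
)
resp.raise_for_status()

# Assumed response shape: {"res": [{"query_text": ..., "status": ..., "error_message": ...}, ...]}
for q in resp.json().get("res", []):
    text = q.get("query_text") or ""
    if "_materialization_mat" in text:
        print(q.get("status"), "-", text[:80])
        if q.get("error_message"):
            print("   error:", q["error_message"])
```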

The difference is:

  • Using Managed IR, the scan shows as Successful, even though Query History shows the failing SELECT statements.
  • Using Kubernetes IR, the same failures cause the entire scan to be marked as Failed.

So Managed IR seems to suppress or ignore these errors, while Kubernetes IR treats them as fatal.

I also tested this with the Azure Databricks Unity Catalog data source (the newer one) on the Kubernetes IR, with the same result.

Questions:

  1. Is this difference in error handling between Managed IR and Kubernetes IR expected behavior?
  2. Is this a known limitation of using Kubernetes IR with Databricks Unity Catalog scans?
  3. Is there a recommended workaround for Kubernetes IR?
  4. Should customers currently avoid using Kubernetes IR for Unity Catalog scans if internal system tables are present?

Thanks in advance for your support.

Regards,

Piotr

2 answers

  1. VRISHABHANATH PATIL 1,820 Reputation points Microsoft External Staff Moderator
    2025-12-03T05:54:19.79+00:00

    Hi @Piotr Tybulewicz

    Thank you for contacting Microsoft Q&A. Below are a few points and mitigation steps that may help address the query.

    The failures weren’t due to Purview’s handling of internal _materialization_mat* tables. The actual issue was resource constraints on the Kubernetes Integration Runtime (IR). When the node hit its CPU limit, it restarted mid-scan, causing the entire job to fail.

    Why Managed IR Behaved Differently

    Managed IR runs on Microsoft-hosted infrastructure with auto-scaling and sufficient resources, so transient query errors (like those on internal system tables) don’t cause the scan to fail. Kubernetes IR, on the other hand, is self-hosted—so if the node restarts due to resource exhaustion, the scan cannot recover and is marked as failed.

    Approach

    Increase Kubernetes Node Size

    • Ensure the node has enough CPU and memory to handle Purview scan workloads.
    • For Databricks Unity Catalog scans, consider sizing for peak usage rather than minimum specs (a quick capacity check is sketched below).
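    As a quick capacity check, here is a minimal sketch using the official kubernetes Python client. Assumptions: a kubeconfig context pointing at the IR cluster, and a hypothetical "purview-ir" namespace for the IR pods (substitute your own).

    ```python
    # Sketch: compare node allocatable CPU/memory and surface IR pod restart
    # counts. Repeated restarts during a scan point to resource exhaustion
    # rather than a Purview-side error.
    from kubernetes import client, config

    config.load_kube_config()          # or config.load_incluster_config()
    v1 = client.CoreV1Api()

    NAMESPACE = "purview-ir"           # hypothetical namespace for the IR pods

    # Allocatable resources per node
    for node in v1.list_node().items:
        alloc = node.status.allocatable
        print(f"node={node.metadata.name} allocatable cpu={alloc['cpu']} memory={alloc['memory']}")

    # Restart counts on the IR pods
    for pod in v1.list_namespaced_pod(NAMESPACE).items:
        for cs in pod.status.container_statuses or []:
            print(f"pod={pod.metadata.name} container={cs.name} restarts={cs.restart_count}")
    ```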

    Monitor Resource Utilization

    • Use Kubernetes metrics or Azure Monitor to track CPU/memory during scans.
    • Set alerts for resource saturation to prevent unexpected restarts (see the polling sketch below).
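    For ad-hoc monitoring while a scan is running, a minimal sketch that polls live node CPU and memory usage. Assumption: metrics-server is installed so the metrics.k8s.io API is available; Azure Monitor container insights is the managed alternative.

    ```python
    # Sketch: sample node usage from the metrics.k8s.io API every 30 seconds.
    import time
    from kubernetes import client, config

    config.load_kube_config()
    metrics_api = client.CustomObjectsApi()

    for _ in range(10):                          # ~5 minutes of samples
        node_metrics = metrics_api.list_cluster_custom_object(
            group="metrics.k8s.io", version="v1beta1", plural="nodes"
        )
        for item in node_metrics["items"]:
            usage = item["usage"]                # e.g. {'cpu': '1923m', 'memory': '5Gi'}
            print(f"{item['metadata']['name']}: cpu={usage['cpu']} memory={usage['memory']}")
        time.sleep(30)
    ```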

    Optional: Use Managed IR for Stability

    • If scaling Kubernetes IR is not feasible, Managed IR is more resilient for large or complex scans.

    About the Internal Tables

    The _materialization_mat* tables are internal to Databricks and not accessible to user identities. Permission errors on these tables are expected and harmless. Purview ignores them when using Managed IR. With Kubernetes IR, once resource sizing is fixed, these errors will no longer cause the scan to fail.

    Guidance

    Customers do not need to avoid Kubernetes IR for Unity Catalog scans. Just make sure:

    • Nodes are properly sized.
    • Resource monitoring is in place.
    • Permission errors on internal tables are understood to be normal and can be safely ignored.

  2. Piotr Tybulewicz 125 Reputation points
    2025-12-01T13:11:37.7133333+00:00

    Hi

    It turned out that the root cause was completely different. The Kubernetes node was undersized, and once it hit its CPU limit it restarted, which caused the scan failures.

    After increasing the node size, the scan completed successfully, even though the _materialization tables still couldn’t be read due to missing permissions.

    Regards,

    Piotr

