Purview scan of Azure Databricks Unity Catalog fails on Kubernetes IR due to internal __materialization_mat_* tables, but Managed IR succeeds. Is this expected?

Piotr Tybulewicz 125 Reputation points
2025-11-24T15:36:40.9233333+00:00

Hi Team,

I am scanning Azure Databricks Unity Catalog using Microsoft Purview. I followed the official documentation: Connect to and manage Azure Databricks Unity Catalog in Microsoft Purview (https://learn.microsoft.com/en-us/purview/register-scan-azure-databricks-unity-catalog).

My configuration:

  • Data source: Azure Databricks (old source)
  • Scope: single catalog only (not entire metastore)
  • Authentication: Service Principal using Databricks PAT
  • IR #1: Managed IR
  • IR #2: Self-hosted Kubernetes IR
  • Lineage extraction enabled
  • Same Databricks warehouse, same identity, same permissions

Observed behavior:

  1. In both cases (Managed IR and Kubernetes IR), Purview issues discovery queries that attempt to SELECT from Databricks internal system tables named _materialization_mat*.
  2. These SELECT statements fail with permission errors on the Databricks side. I believe this is expected, since those internal tables are not readable by user identities (a small sketch for locating these statements via the Query History API follows this list).
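For reference, a minimal sketch of how these statements can be pulled from the Databricks Query History REST API. Assumptions (not from the scan setup above): DATABRICKS_HOST / DATABRICKS_TOKEN environment variables, the /api/2.0/sql/history/queries endpoint, and client-side filtering on the table-name prefix; the exact response fields may differ per API version.

```python
# Sketch: list recent warehouse queries and print the ones that touch the
# internal _materialization_mat* tables, so the permission errors can be inspected.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-xxxx.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # PAT used for the Purview scan

resp = requests.get(
    f"{host}/api/2.0/sql/history/queries",
    headers={"Authorization": f"Bearer {token}"},
    params={"max_results": 100},
)
resp.raise_for_status()

# Assumed response shape: {"res": [{"query_text": ..., "status": ..., "error_message": ...}, ...]}
for q in resp.json().get("res", []):
    text = q.get("query_text") or ""
    if "_materialization_mat" in text:
        print(q.get("status"), "-", text[:80])
        if q.get("error_message"):
            print("   error:", q["error_message"])
```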

The difference is:

  • Using Managed IR, the scan shows as Successful, even though Query History shows the failing SELECT statements.
  • Using Kubernetes IR, the same failures cause the entire scan to be marked as Failed.

So Managed IR seems to suppress or ignore these errors, while Kubernetes IR treats them as fatal.

I also tested this with the Azure Databricks Unity Catalog data source (the newer one) on the Kubernetes IR, with the same result.

Questions:

  1. Is this difference in error handling between Managed IR and Kubernetes IR expected behavior?
  2. Is this a known limitation of using Kubernetes IR with Databricks Unity Catalog scans?
  3. Is there a recommended workaround for Kubernetes IR?
  4. Should customers currently avoid using Kubernetes IR for Unity Catalog scans if internal system tables are present?

Thanks in advance for your support.

Regards,

Piotr

2 answers

  1. VRISHABHANATH PATIL 1,820 Reputation points Microsoft External Staff Moderator
    2025-12-03T05:54:19.79+00:00

    Hi @Piotr Tybulewicz

    Thank you for contacting Microsoft Q&A. Below are a few points and mitigation steps that may help address the query.

    The failures weren’t due to Purview’s handling of internal _materialization_mat* tables. The actual issue was resource constraints on the Kubernetes Integration Runtime (IR). When the node hit its CPU limit, it restarted mid-scan, causing the entire job to fail.

    Why Managed IR Behaved Differently

    Managed IR runs on Microsoft-hosted infrastructure with auto-scaling and sufficient resources, so transient query errors (like those on internal system tables) don’t cause the scan to fail. Kubernetes IR, on the other hand, is self-hosted—so if the node restarts due to resource exhaustion, the scan cannot recover and is marked as failed.

    Approach

    Increase Kubernetes Node Size

    • Ensure the node has enough CPU and memory to handle Purview scan workloads.
    • For Databricks Unity Catalog scans, consider sizing for peak usage rather than minimum specs (a quick capacity check is sketched below).
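    As a quick capacity check, here is a minimal sketch using the official kubernetes Python client. Assumptions: a kubeconfig context pointing at the IR cluster, and a hypothetical "purview-ir" namespace for the IR pods (substitute your own).

    ```python
    # Sketch: compare node allocatable CPU/memory and surface IR pod restart
    # counts. Repeated restarts during a scan point to resource exhaustion
    # rather than a Purview-side error.
    from kubernetes import client, config

    config.load_kube_config()          # or config.load_incluster_config()
    v1 = client.CoreV1Api()

    NAMESPACE = "purview-ir"           # hypothetical namespace for the IR pods

    # Allocatable resources per node
    for node in v1.list_node().items:
        alloc = node.status.allocatable
        print(f"node={node.metadata.name} allocatable cpu={alloc['cpu']} memory={alloc['memory']}")

    # Restart counts on the IR pods
    for pod in v1.list_namespaced_pod(NAMESPACE).items:
        for cs in pod.status.container_statuses or []:
            print(f"pod={pod.metadata.name} container={cs.name} restarts={cs.restart_count}")
    ```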

    Monitor Resource Utilization

    • Use Kubernetes metrics or Azure Monitor to track CPU/memory during scans.
    • Set alerts for resource saturation to prevent unexpected restarts (see the polling sketch below).
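    For ad-hoc monitoring while a scan is running, a minimal sketch that polls live node CPU and memory usage. Assumption: metrics-server is installed so the metrics.k8s.io API is available; Azure Monitor container insights is the managed alternative.

    ```python
    # Sketch: sample node usage from the metrics.k8s.io API every 30 seconds.
    import time
    from kubernetes import client, config

    config.load_kube_config()
    metrics_api = client.CustomObjectsApi()

    for _ in range(10):                          # ~5 minutes of samples
        node_metrics = metrics_api.list_cluster_custom_object(
            group="metrics.k8s.io", version="v1beta1", plural="nodes"
        )
        for item in node_metrics["items"]:
            usage = item["usage"]                # e.g. {'cpu': '1923m', 'memory': '5Gi'}
            print(f"{item['metadata']['name']}: cpu={usage['cpu']} memory={usage['memory']}")
        time.sleep(30)
    ```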

    Optional: Use Managed IR for Stability

    • If scaling Kubernetes IR is not feasible, Managed IR is more resilient for large or complex scans.

    About the Internal Tables

    The _materialization_mat* tables are internal to Databricks and not accessible to user identities. Permission errors on these tables are expected and harmless. Purview ignores them when using Managed IR. With Kubernetes IR, once resource sizing is fixed, these errors will no longer cause the scan to fail.

    Guidance

    Customers do not need to avoid Kubernetes IR for Unity Catalog scans. Just make sure:

    • Nodes are properly sized.
    • Resource monitoring is in place.
    • Permission errors on internal tables are understood to be normal and can be safely ignored.

  2. Piotr Tybulewicz 125 Reputation points
    2025-12-01T13:11:37.7133333+00:00

    Hi

    It turned out that the root cause was completely different. The Kubernetes node was undersized, and once it hit its CPU limit it restarted, which caused the scan failures.

    After increasing the node size, the scan completed successfully, even though the _materialization tables still couldn’t be read due to missing permissions.

    Regards,

    Piotr

