Azure Data Lake (ADLS) to Delta Lake

azure_learner 920 Reputation points
2025-10-25T11:06:39.5233333+00:00

Hi kind friends and experts, we have a data modernization roadmap. I have discussed similar (but not exactly the same) questions in the threads below:

https://learn.microsoft.com/en-us/answers/questions/2237931/azure-datalake-and-consistent-data

https://learn.microsoft.com/en-us/answers/questions/5526827/schema-in-adls

The current-state, high-level architecture is as follows:

a. All data sources are ingested (no streaming data at the moment) and comprehensively validated (DQ checks) at the silver layer.

b. The data is standardized and joined (wherever applicable).

c. The data is stored in Delta format, partitioned as year=2025/month=10/day=12.

d. The data is ingested incrementally from the landing zone into the bronze and silver layers.

e. We use Auto Loader to handle schema drift, etc. (a minimal ingestion sketch follows this list).

f. We also use Unity Catalog for data governance and Azure Purview for metadata and lineage tracking.
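
For illustration, here is a minimal PySpark sketch of roughly how our incremental landing-to-bronze ingestion with Auto Loader and the year/month/day partitioning looks. The storage account, container, checkpoint path, and the event_date column are placeholders, not our real names:

```python
from pyspark.sql import functions as F

# Placeholder paths - not the real storage account or containers
landing_path = "abfss://landing@mystorageaccount.dfs.core.windows.net/sales/"
bronze_path = "abfss://bronze@mystorageaccount.dfs.core.windows.net/sales/"
checkpoint_path = "abfss://bronze@mystorageaccount.dfs.core.windows.net/_checkpoints/sales/"

# Auto Loader picks up only new files in landing (incremental load) and
# tracks/evolves the schema at the schemaLocation.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")                      # source file format in landing
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")   # handle schema drift
    .load(landing_path)
    .withColumn("year", F.year("event_date"))                    # event_date is a placeholder column
    .withColumn("month", F.month("event_date"))
    .withColumn("day", F.dayofmonth("event_date"))
)

# Write to bronze as Delta, partitioned by year/month/day, one incremental batch per run.
(
    df.writeStream.format("delta")
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)
    .partitionBy("year", "month", "day")
    .start(bronze_path)
)
```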

The proposed future high-level architecture looks like this:

a. KPIs have been identified for some sources; for the other sources, KPI identification is in progress.

b. Once all KPIs are identified and collated, the KPI logic from the silver layer would be materialized into managed Delta tables in the gold layer, such as gold.sales_analysis_kpi, for the required data sources (see the sketch after this list).

c. These tables are refreshed on a defined schedule (daily, weekly, etc.), serve as single sources of truth, and are consumed by downstream BI reporting and AI/ML teams.

d. In keeping with the intended goal of a Delta/Data Lakehouse, data modeling will follow a star schema, snowflake, or hybrid approach, depending on the use cases.

e. We also plan to use Synapse Analytics as our enterprise data warehouse.
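
For example, the gold KPI materialization step would be something along these lines, run as a scheduled (daily/weekly) job. gold.sales_analysis_kpi is the table named above; the silver table, columns, and KPI logic shown here are only placeholders:

```python
# Rebuild the managed gold Delta table from silver on each scheduled run.
# Table and column names below are illustrative placeholders.
spark.sql("""
    CREATE OR REPLACE TABLE gold.sales_analysis_kpi AS
    SELECT
        s.region,
        s.product_category,
        DATE_TRUNC('month', s.order_date)                   AS kpi_month,
        SUM(s.net_amount)                                   AS total_sales,
        COUNT(DISTINCT s.customer_id)                       AS active_customers,
        SUM(s.net_amount) / COUNT(DISTINCT s.order_id)      AS avg_order_value
    FROM silver.sales_orders s
    GROUP BY s.region, s.product_category, DATE_TRUNC('month', s.order_date)
""")
```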

I would appreciate it, and would be incredibly grateful, if experts could confirm that the direction we are heading in is aligned with standard approaches and industry best practices.

Things are still evolving at this point in time; that is, new data sources can be expected to land.

Please help me with these clarifications:

a. How should we treat data sources that might bring a new set of KPIs once our lakehouse data model has already been built, and what could be the potential challenges in integrating new KPIs and tables into the existing model?

b. Do we need to do data modeling twice, once for the lakehouse and once for Synapse?

Your comprehensive guidance would be highly appreciated and valued; I would be immensely thankful for your informed inputs. Thanks in advance.

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.

Answer accepted by question author
  Smaran Thoomu 32,525 Reputation points Microsoft External Staff Moderator
    2025-10-27T13:04:35.6933333+00:00

    Hi azure_learner
    Your proposed approach for building a Lakehouse on Delta Lake and organizing KPIs in Azure is well aligned with industry best practices. You do not need to maintain two separate data models for the Lakehouse and Synapse, but there are a few important considerations as your data sources and KPIs evolve.

    Handling New Data Sources & KPIs

    • When new data sources or KPIs are introduced, update your silver/gold layer Delta tables and pipelines to include the new columns, tables, or logic. Delta Lake supports schema evolution, allowing you to integrate new data with minimal disruption, but it’s important to document changes carefully to avoid downstream issues (a minimal sketch follows this list).
    • Potential challenges: As more sources and KPIs are added, maintaining consistent definitions, preventing duplication, and keeping dependencies synchronized across reporting and ML pipelines can become complex. Strong version control, robust data validation, and keeping Unity Catalog and Purview lineage up to date are essential for long-term stability.
    • Review and update your governance rules, access controls, and refresh schedules whenever new sources or KPIs are introduced.
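
    For instance, a minimal sketch of Delta schema evolution when a new source version adds columns could look like the following; the path, table, and column names are hypothetical:

```python
# Append a batch from a new source version whose files contain extra columns.
# mergeSchema lets Delta add the new columns to the target table's schema.
new_batch_df = spark.read.format("parquet").load(
    "abfss://landing@mystorageaccount.dfs.core.windows.net/sales_v2/"   # hypothetical path
)

(
    new_batch_df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("silver.sales_orders")                                 # hypothetical table
)

# Review the evolved schema and document the change (e.g., in Purview / release notes).
spark.sql("DESCRIBE TABLE silver.sales_orders").show(truncate=False)
```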

    Data Modeling for Lakehouse & Synapse

    | Aspect | Delta Lakehouse | Synapse Analytics DW |
    | --- | --- | --- |
    | Storage | Delta tables in ADLS (bronze/silver/gold) | External tables or imports from Delta tables |
    | Modeling approach | Star, snowflake, or hybrid - optimized for KPI analytics | Star/snowflake - optimized for performance reporting |
    | Model duplication? | Usually not needed (Synapse can read from Delta tables) | Only if DW-specific schema tuning or compliance rules are required |
    • In most cases, organizations design the data model once and use external tables or serverless SQL in Synapse to query the Delta tables directly. This avoids duplication and ensures that the gold layer remains the single source of truth (a minimal query sketch follows these bullets).
    • Some re-modeling may be needed only when specific warehouse optimizations or regulatory compliance requirements apply.
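
    As a rough illustration, Synapse serverless SQL can read the gold Delta folder in place with OPENROWSET, so the gold layer is not copied into a second model. The endpoint, authentication, database, and storage path below are placeholders:

```python
import pyodbc

# Placeholder connection to the Synapse serverless SQL endpoint.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"
    "Database=master;"
    "Authentication=ActiveDirectoryInteractive;"
)

# Query the gold Delta table directly from ADLS (no import into the warehouse).
query = """
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mystorageaccount.dfs.core.windows.net/gold/sales_analysis_kpi/',
        FORMAT = 'DELTA'
    ) AS kpi
"""

for row in conn.cursor().execute(query):
    print(row)
```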

    Recommendations

    Maintain a metadata-driven architecture and document all KPIs in Purview for clear lineage and traceability.

    Enable schema drift monitoring in your pipelines and set up alerts for new columns or data source additions.
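
    A schema drift check can be as small as comparing the current table schema against a stored baseline and raising an alert when new columns appear; the sketch below is only illustrative (table name, baseline columns, and the alerting hook are placeholders):

```python
# Compare the current silver table schema against an expected baseline
# and flag any new columns for review. Names below are placeholders.
expected_columns = {"order_id", "customer_id", "order_date", "net_amount", "region"}

current_columns = {f.name for f in spark.table("silver.sales_orders").schema.fields}

new_columns = current_columns - expected_columns
if new_columns:
    # Replace with a real alert (email, Teams webhook, Azure Monitor, etc.)
    print(f"Schema drift detected in silver.sales_orders: new columns {sorted(new_columns)}")
```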

    Keep a single, well-governed data model across the Lakehouse and Synapse layers unless technical or business needs require otherwise.

    Regularly review performance tuning and governance as your model grows to ensure scalability.

    Your roadmap follows the Azure Lakehouse best practices for scalable analytics, reporting, and AI workloads. As new sources and KPIs are added, focus on automation, metadata management, and governance to remain agile and consistent as your data platform evolves.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further query, do let us know.

    1 person found this answer helpful.