Establishing a Connection between ADLS Gen2, Databricks, and ADF in Microsoft Azure

Rosalina5 201 Reputation points
2025-12-04T14:30:31.1666667+00:00

Hello,

Could someone please help me with establishing a connection between ADLS Gen2, Databricks, and ADF, with full steps if possible? Do I need to route credentials through Key Vault? This is my first time doing this in production.

Could somebody please share detailed steps for implementing this in production?

ADF - Orchestrator

ADLS Gen2 - Storage

Databricks - Processing and transforming data using PySpark.

Thanks a lot

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Answer accepted by question author
  1. Pratyush Vashistha 5,045 Reputation points Microsoft External Staff Moderator
    2025-12-04T19:23:51.94+00:00

    Hey Rosalina5!

    It sounds like you're trying to set up a connection between Azure Data Lake Storage (ADLS) Gen2, Databricks, and Azure Data Factory (ADF) for your data processing needs. Here's a comprehensive step-by-step guide that you can follow:

    1. Set Up ADLS Gen2:

    • Ensure you have created an ADLS Gen2 account. Note down the storage account name and the file system name you will be using.

    2. Configure Permissions:

    • Register a Microsoft Entra application (service principal) and create a client secret. Grant the service principal access to the storage account, for example by assigning it the Storage Blob Data Contributor role on the account or container, or by setting the appropriate ACLs.

    3. Set Up Databricks to Access ADLS Gen2:

    • In your Azure Databricks workspace, create a new cluster if you don't already have one.
    • Configure the Spark environment to access your ADLS Gen2 by setting up the proper configurations.
      • Use the following Spark configuration settings (with your actual values):
         spark.conf.set("fs.azure.account.auth.type.<your-storage-account-name>.dfs.core.windows.net", "OAuth")
         spark.conf.set("fs.azure.account.oauth.provider.type.<your-storage-account-name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
         spark.conf.set("fs.azure.account.oauth2.client.id.<your-storage-account-name>.dfs.core.windows.net", "<your-client-id>")
         spark.conf.set("fs.azure.account.oauth2.client.secret.<your-storage-account-name>.dfs.core.windows.net", "<your-client-secret>")
         spark.conf.set("fs.azure.account.oauth2.client.endpoint.<your-storage-account-name>.dfs.core.windows.net", "https://login.microsoftonline.com/<your-tenant-id>/oauth2/token")
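
    • Once those settings are in place, you can sanity-check the connection directly from the notebook. This is only a minimal sketch; the container name below is a placeholder for your own value:
         # List the root of the container to confirm the OAuth configuration works
         display(dbutils.fs.ls("abfss://<your-container>@<your-storage-account-name>.dfs.core.windows.net/"))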

       
    #### 4. Connecting ADF:
    
    - To set up a connection in ADF, create a new linked service in ADF for your ADLS Gen2 storage, and provide the necessary credentials (Service Principal or Account Key).
    
    - Make sure the account URL (https://<your-storage-account-name>.dfs.core.windows.net) and credentials are set correctly in the linked service properties, and use Test connection to verify before saving.
    
    #### 5. Connecting Databricks to ADF:
    
    - Use the **Azure Databricks** linked service in ADF to orchestrate your data workflows.
    
    #### 6. Integration Testing:
    
    - After setting everything up, run a sample pipeline that reads from ADLS Gen2, transforms data using Databricks notebooks, and writes back to ADLS or another destination.
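
    - As a rough illustration, a test notebook cell might look like the following; the container, folder, and column names are placeholders rather than values from your environment:
         input_path = "abfss://<your-container>@<your-storage-account-name>.dfs.core.windows.net/raw/sales.csv"
         output_path = "abfss://<your-container>@<your-storage-account-name>.dfs.core.windows.net/curated/sales"

         # Read the raw CSV, apply a simple transformation, and write the result back as Parquet
         raw_df = spark.read.option("header", "true").option("inferSchema", "true").csv(input_path)
         curated_df = raw_df.dropDuplicates().filter("amount IS NOT NULL")
         curated_df.write.mode("overwrite").parquet(output_path)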
    
    #### 7. Using Key Vault (if needed):
    
    - If you want to enhance security, you can store your sensitive information such as Client ID and Client Secret in Azure Key Vault.
    
    - In this case, set up a Key Vault linked service in ADF and refer to the secrets in your ADF pipeline.
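
    - On the Databricks side, the usual pattern is a Key Vault-backed secret scope, so the client secret never appears in the notebook. A minimal sketch, assuming a secret scope named kv-scope and a secret named sp-client-secret (both placeholder names you would create yourself):
         # Pull the service principal secret from the Key Vault-backed secret scope
         client_secret = dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")

         spark.conf.set(
             "fs.azure.account.oauth2.client.secret.<your-storage-account-name>.dfs.core.windows.net",
             client_secret
         )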
    
    I hope these steps are helpful! If you run into issues or specific errors, let us know so we can troubleshoot further.
    
    ### References:
    
    - [Connect to Azure Data Lake Storage and Blob Storage](https://learn.microsoft.com/azure/databricks/connect/storage/azure-storage)
    
    - [Tutorial: Connect to Azure Data Lake Storage](https://learn.microsoft.com/azure/databricks/connect/storage/tutorial-azure-storage)
    
    - [How do I set ACLs correctly for a service principal?](https://learn.microsoft.com/azure/storage/blobs/data-lake-storage-access-control#how-do-i-set-acls-correctly-for-a-service-principal)
    
    - [Configure your virtual network to use a Microsoft Entra service endpoint](https://learn.microsoft.com/azure/data-lake-store/data-lake-store-network-security?toc=%2Fazure%2Fvirtual-network%2Ftoc.json#configuration)
    
    - [Create a new application secret](https://docs.microsoft.com/azure/active-directory/develop/howto-create-service-principal-portal#create-a-new-application-secret)
    
    If this answers your query, do click `Accept Answer` and `Yes` for "Was this answer helpful". And if you have any further questions, do let us know.
    
    

1 additional answer

  1. Q&A Assist
    2025-12-04T14:31:48.1466667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    To establish a connection between Azure Data Lake Storage Gen2 (ADLS Gen2), Azure Databricks, and Azure Data Factory (ADF), follow these steps:

    1. Set Up ADLS Gen2

    • Create an ADLS Gen2 account if you haven't already. Ensure you have the necessary permissions to access it.
    • Create a container within your ADLS Gen2 account to store your data.

    2. Configure Azure Data Factory (ADF)

    • Create a new ADF instance in the Azure portal.
    • Set up a Linked Service to connect to your ADLS Gen2:
      • Go to the ADF instance, select Manage > Linked services > New.
      • Choose Azure Data Lake Storage Gen2 as the connector.
      • Fill in the required details, including the URL of your ADLS Gen2 account and authentication method (e.g., Account Key, Service Principal).
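
    • If you prefer to script the linked service rather than create it in the portal, the azure-mgmt-datafactory Python SDK can create it. The snippet below is only a rough sketch, not the exact configuration for your factory; the subscription, resource group, factory name, linked service name, and credential values are all placeholders:
        from azure.identity import DefaultAzureCredential
        from azure.mgmt.datafactory import DataFactoryManagementClient
        from azure.mgmt.datafactory.models import (
            AzureBlobFSLinkedService,
            LinkedServiceResource,
            SecureString,
        )

        # Authenticate with whatever identity is available (CLI login, managed identity, ...)
        adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

        # ADLS Gen2 linked service that authenticates with a service principal
        adls_linked_service = LinkedServiceResource(
            properties=AzureBlobFSLinkedService(
                url="https://<your-storage-account-name>.dfs.core.windows.net",
                service_principal_id="<your-client-id>",
                service_principal_key=SecureString(value="<your-client-secret>"),
                tenant="<your-tenant-id>",
            )
        )

        adf_client.linked_services.create_or_update(
            "<resource-group>", "<data-factory-name>", "ls_adls_gen2", adls_linked_service
        )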

    3. Set Up Azure Databricks

    • Create a Databricks workspace if you don’t have one.
    • Create a new notebook in Databricks:
      • In your Databricks workspace, select Create > Notebook.
      • Name your notebook and choose Python as the language.

    4. Configure Connection in Databricks

    • In the first cell of your notebook, add the following code to configure the connection to your ADLS Gen2 account:
        # Configure ADLS Gen2 connection
        storage_account_name = "your_storage_account_name"
        storage_account_key = "your_storage_account_key"
        container_name = "mydatacontainer"
        
        spark.conf.set(
            f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
            storage_account_key
        )
      
    • Replace your_storage_account_name and your_storage_account_key with your actual credentials.
    • Run the cell to establish the connection.
    • For production environments, it is recommended to use Azure Key Vault to securely manage your credentials instead of hardcoding them in your notebook. You can set up a Key Vault and reference the secrets in your Databricks notebook.
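
    • For example, with a Key Vault-backed secret scope (the scope and secret names below are placeholders you would create), the account key can be pulled at runtime instead of being pasted into the notebook:
        # Read the storage account key from a Key Vault-backed secret scope
        storage_account_key = dbutils.secrets.get(scope="kv-scope", key="adls-account-key")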

    5. Read and Write Data

    • Use Spark to read and write data from/to ADLS Gen2 in your Databricks notebook. For example, to read a CSV file:
        customers_path = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/Customers.csv"
        customers_df = spark.read.format("csv") \
            .option("header", "true") \
            .option("inferSchema", "true") \
            .load(customers_path)
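
    • Writing back is symmetric; for example, to save the DataFrame to a curated folder (the folder name here is just a placeholder):
        output_path = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/curated/customers"
        customers_df.write.mode("overwrite").parquet(output_path)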
      

    6. Orchestrate with ADF

    • In ADF, you can create pipelines that utilize the linked service to read from and write to ADLS Gen2, and also trigger your Databricks notebooks for data processing.
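
    • When ADF triggers a notebook through the Databricks Notebook activity, values supplied in baseParameters appear in the notebook as widgets, and the notebook can return a result to the pipeline. A small sketch, assuming a parameter named run_date (a placeholder, not something defined elsewhere in this thread):
        # Read a parameter passed from the ADF Notebook activity's baseParameters
        run_date = dbutils.widgets.get("run_date")

        # ... transformation logic for that date would go here ...

        # Return a value that ADF can read from the activity's output
        dbutils.notebook.exit(f"processed {run_date}")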

    By following these steps, you can successfully establish a connection between ADLS Gen2, Databricks, and ADF for data processing and orchestration.


