Quickstart: Create a solution to move and transform data

In this quickstart, you learn how dataflows and pipelines work together to create a powerful Data Factory solution. You'll clean data with dataflows and move it with pipelines.

Prerequisites

Before you start, you need:

  • A Microsoft Fabric-enabled workspace.
  • An Azure SQL database loaded with the AdventureWorksLT sample data.
  • An Azure Blob Storage account to receive the copied data.

Compare dataflows and pipelines

Dataflow Gen2 provides a low-code interface with more than 300 data and AI-based transformations, so you can easily clean, prep, and transform data with flexibility. Pipelines offer rich data orchestration capabilities to compose flexible data workflows that meet your enterprise needs.

In a pipeline, you can create logical groupings of activities that perform a task. This might include calling a dataflow to clean and prep your data. While there's some functionality overlap between the two, your choice depends on whether you need the full capabilities of pipelines or can use the simpler capabilities of dataflows. For more information, see the Fabric decision guide.

Transform data with dataflows

Follow these steps to set up your dataflow.

Create a dataflow

  1. Select your Fabric-enabled workspace, then New, and choose Dataflow Gen2.

    Screenshot of starting a Dataflow Gen2.

  2. In the dataflow editor, select Import from SQL Server.

    Screenshot of the dataflow editor.

Get data

  1. In the Connect to data source dialog, enter your Azure SQL database details and select Next. Use the AdventureWorksLT sample database from the prerequisites. (If you want to sanity-check the connection and preview the source tables from code first, see the sketch after these steps.)

    Screenshot of connecting to an Azure SQL database.

  2. Select the data to transform, such as SalesLT.Customer, and use Select related tables to include related tables. Then select Create.

    Screenshot of selecting data to transform.
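
If you'd like to confirm the connection details and preview the source tables outside the dataflow editor, a quick query against the database can help. The following is a minimal sketch, assuming Python with the pyodbc package, a locally installed ODBC Driver 18 for SQL Server, and placeholder server and credential values; none of these are part of the quickstart itself, so substitute your own details.

```python
# Minimal sketch: preview the AdventureWorksLT source data before building the dataflow.
# Assumes pyodbc and "ODBC Driver 18 for SQL Server" are installed; the server name,
# user, and password below are placeholders for your own values.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;"
    "DATABASE=AdventureWorksLT;"
    "UID=<your-user>;PWD=<your-password>;Encrypt=yes;"
)

cursor = conn.cursor()
# Peek at a few customers to confirm the table you'll select in the dataflow editor.
cursor.execute(
    "SELECT TOP 5 CustomerID, FirstName, LastName, CompanyName FROM SalesLT.Customer"
)
for row in cursor.fetchall():
    print(row)
conn.close()
```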

Transform your data

  1. Select Diagram view from the status bar or the View menu in the Power Query editor.

    Screenshot of selecting diagram view.

  2. Right-click your SalesLT Customer query, or select the vertical ellipsis to the right of the query, then select Merge queries.

    Screenshot of the Merge queries option.

  3. Configure the merge with SalesLTOrderHeader as the right table, CustomerID as the join column, and Left outer as the join type. Select OK. (A short code illustration of this join appears after these steps.)

    Screenshot of the Merge configuration screen.

  4. Add a data destination by selecting the database symbol with an arrow. Choose Azure SQL database as the destination type.

    Screenshot of the Add data destination button.

  5. Provide the connection details for the Azure SQL database where you want to publish the merged query. In this example, we use the same AdventureWorksLT database that served as the data source.

    Screenshot of the Connect to data destination dialog.

  6. Choose a database to store the data, and provide a table name, then select Next.

    Screenshot of the Choose destination target window.

  7. Accept the default settings in the Choose destination settings dialog and select Save settings.

    Screenshot of the Choose destination settings dialog.

  8. Select Publish in the dataflow editor to publish the dataflow.

    Screenshot highlighting the Publish button on the dataflow gen2 editor.
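
The merge you configured in step 3 is conceptually a left outer join of the customer query against the order-header query on CustomerID: every customer row is kept, and order columns are empty where no matching order exists. If it helps to see the same operation in code, here is a small illustration using pandas; the tiny in-memory frames and their values are stand-ins for the real tables, not output from the dataflow.

```python
# Illustration only: the same left outer join the dataflow merge performs,
# expressed with pandas. The in-memory frames stand in for SalesLT.Customer
# and the order-header table; the values are made-up placeholders.
import pandas as pd

customers = pd.DataFrame(
    {
        "CustomerID": [1, 2, 3],
        "CompanyName": ["A Bike Store", "Progressive Sports", "Modern Cycles"],
    }
)
order_headers = pd.DataFrame(
    {
        "CustomerID": [1, 1, 3],
        "SalesOrderID": [71774, 71776, 71780],
        "TotalDue": [972.78, 87.09, 42452.65],
    }
)

# Left outer join: every customer is kept; order columns are NaN when no order exists.
merged = customers.merge(order_headers, on="CustomerID", how="left")
print(merged)
```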

Move data with pipelines

Now that you've created a Dataflow Gen2, you can operate on it in a pipeline. In this example, you copy the data generated from the dataflow into text format in an Azure Blob Storage account.

Create a new pipeline

  1. In your workspace, select New, then Pipeline.

    Screenshot of creating a new pipeline.

  2. Name your pipeline and select Create.

    Screenshot showing the new pipeline creation prompt with a sample pipeline name.

Configure your dataflow

  1. Add a dataflow activity to your pipeline by selecting Dataflow in the Activities tab.

    Screenshot of adding a dataflow activity.

  2. Select the dataflow on the pipeline canvas, go to the Settings tab, and choose the dataflow you created earlier.

    Screenshot of selecting a dataflow.

  3. Select Save, then Run to populate the merged query table. (An optional check that the table was populated follows these steps.)

    Screenshot showing where to select Run.
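
After the run completes, you can optionally confirm that the destination table was populated before moving on to the copy activity. This is a minimal sketch, again assuming pyodbc with placeholder connection values; the table name dbo.MergedQuery is also an assumption, so use the schema and table name you chose when you configured the destination.

```python
# Optional check: confirm the dataflow wrote rows to the destination table.
# Assumes pyodbc is installed; connection values are placeholders, and
# "dbo.MergedQuery" stands in for the destination table name you chose.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;"
    "DATABASE=AdventureWorksLT;"
    "UID=<your-user>;PWD=<your-password>;Encrypt=yes;"
)
count = conn.cursor().execute("SELECT COUNT(*) FROM dbo.MergedQuery").fetchone()[0]
print(f"Rows in merged destination table: {count}")
conn.close()
```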

Add a copy activity

  1. Select Copy data on the canvas or use the Copy Assistant from the Activities tab.

    Screenshot showing the two ways to access the copy assistant.

  2. Choose Azure SQL Database as the data source and select Next.

    Screenshot showing where to choose a data source.

  3. Create a connection to your data source by selecting Create new connection. Fill in the required connection information on the panel, and enter AdventureWorksLT for the database, since that's where the dataflow generated the merged query. Then select Next.

    Screenshot showing where to create a new connection.

  4. Select the table you generated in the dataflow step earlier, and then select Next.

    Screenshot showing how to select from available tables.

  5. For your destination, choose Azure Blob Storage and then select Next.

    Screenshot showing the Azure Blob Storage data destination.

  6. Create a connection to your destination by selecting Create new connection. Provide the details for your connection, then select Next.

    Screenshot showing how to create a connection.

  7. Select your Folder path and provide a File name, then select Next.

    Screenshot showing how to select folder path and file name.

  8. Select Next again to accept the default file format, column delimiter, row delimiter, and compression type, optionally including a header.

    Screenshot showing the configuration options for the file in Azure Blob Storage.

  9. Review your copy settings, then select Save + Run to finish the process. (A sketch for spot-checking the output file in Blob Storage follows these steps.)

    Screenshot showing how to review copy data settings.
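
Once the copy finishes, you can spot-check that the text file landed in your storage account. The following is a minimal sketch, assuming the azure-storage-blob Python package and placeholder connection-string, container, and blob-path values; substitute the folder path and file name you chose in the copy activity.

```python
# Optional spot check: confirm the copied text file exists in Blob Storage and peek at it.
# Assumes the azure-storage-blob package; the connection string, container name,
# and blob path are placeholders for the values you chose in the copy activity.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-storage-connection-string>")
container = service.get_container_client("<your-container>")

blob_name = "<your-folder>/<your-file-name>.txt"
data = container.download_blob(blob_name).readall().decode("utf-8")

# Print the header row plus a couple of data rows.
print("\n".join(data.splitlines()[:3]))
```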

Design your pipeline and save to run and load data

  1. To run the Copy activity after the Dataflow activity, drag from Succeeded on the Dataflow activity to the Copy activity. The Copy activity only runs after the Dataflow activity succeeds.

    Screenshot showing how to make the Copy activity run after the Dataflow activity.

  2. Select Save to save your pipeline. Then select Run to run your pipeline and load your data. (If you'd rather trigger the run from code, see the sketch after these steps.)

    Screenshot showing where to select Save and Run.
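
Beyond the Run button, Fabric also exposes a REST job-scheduler endpoint that can start a pipeline on demand, which is handy once the pipeline becomes part of a larger automation. The sketch below is assumption-heavy: it presumes the v1 on-demand job endpoint, a valid Microsoft Entra access token with Fabric API scope, and placeholder workspace and pipeline item IDs. None of these details come from this quickstart, so treat it as a starting point and verify against the Fabric REST API reference.

```python
# Assumption-heavy sketch: trigger the pipeline on demand through the Fabric
# job-scheduler REST endpoint. The workspace ID, pipeline item ID, and access
# token are placeholders; verify the endpoint against the Fabric REST API docs.
import requests

workspace_id = "<your-workspace-id>"
pipeline_item_id = "<your-pipeline-item-id>"
access_token = "<your-entra-access-token>"  # for example, acquired via azure-identity

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{pipeline_item_id}/jobs/instances?jobType=Pipeline"
)
response = requests.post(url, headers={"Authorization": f"Bearer {access_token}"})
response.raise_for_status()

# The request is accepted asynchronously; the Location header points at the
# job instance you can poll for status.
print(response.status_code, response.headers.get("Location"))
```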

Schedule pipeline execution

Once you finish developing and testing your pipeline, you can schedule it to run automatically.

  1. On the Home tab of the pipeline editor window, select Schedule.

    Screenshot of the Schedule button on the menu of the Home tab in the pipeline editor.

  2. Configure the schedule as required. The example here schedules the pipeline to run daily at 8:00 PM until the end of the year.

    Screenshot showing the schedule configuration for a pipeline to run daily at 8:00 PM until the end of the year.

This quickstart showed you how to create and configure a Dataflow Gen2 that builds a merge query and stores it in an Azure SQL database, and then copy data from that database into a text file in Azure Blob Storage. You learned how to:

  • Create a dataflow.
  • Transform data with the dataflow.
  • Create a pipeline using the dataflow.
  • Order the execution of steps in the pipeline.
  • Copy data with the Copy Assistant.
  • Run and schedule your pipeline.

Next, advance to learn more about monitoring your pipeline runs.