Module 1: Create a pipeline with Data Factory

This module takes about 10 minutes to complete. You'll ingest raw data from the source store into a table in the Bronze data layer of a data Lakehouse using the Copy activity in a pipeline.

The high-level steps in module 1 are:

  1. Create a pipeline.
  2. Create a Copy activity in the pipeline to load sample data into a data Lakehouse.
  3. Run and view the results of the Copy activity.

Prerequisites

Before you start this module, complete the prerequisites listed in the tutorial introduction.

Create a pipeline

  1. Sign in to Power BI.

  2. Select the default Power BI icon at the bottom left of the screen, and select Fabric.

  3. Select a workspace from the Workspaces tab (or select My workspace), select + New item, and then search for and choose Pipeline.

    Screenshot of the Data Factory start page with the button to create a new pipeline selected.

  4. Provide a pipeline name. Then select Create.
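If you prefer to automate this step, the same pipeline item can be created with the Fabric REST API. The following is a minimal sketch, assuming you already have a workspace ID and a valid Microsoft Entra access token for the Fabric API; the Items endpoint and the DataPipeline item type follow the public REST reference, so verify them against the current documentation before relying on this.

```python
# Minimal sketch (not the tutorial's required path): create an empty
# pipeline item in a Fabric workspace through the REST Items API.
# WORKSPACE_ID and TOKEN are placeholders you must supply yourself.
import requests

WORKSPACE_ID = "<your-workspace-id>"  # placeholder
TOKEN = "<your-access-token>"         # placeholder: Fabric API bearer token

response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/items",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"displayName": "Tutorial_Pipeline", "type": "DataPipeline"},
)
response.raise_for_status()
print("Created pipeline item:", response.json().get("id"))
```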

Create a Copy activity in the pipeline to load sample data into a data Lakehouse

  1. Select Copy data assistant to open the copy assistant tool.

    Screenshot showing the selection of the Copy data activity from the new pipeline start page.

  2. On the Choose data source page, select Sample data from the options at the top of the dialog, and then select NYC Taxi - Green.

    Screenshot showing the selection of the NYC Taxi - Green data in the copy assistant on the Choose data source tab.

  3. On the Connect to data source page, review the data source preview, and then select Next.

    Screenshot showing the preview data for the NYC Taxi - Green sample dataset.

  4. For the Choose data destination step of the copy assistant, select Lakehouse.

  5. Enter a Lakehouse name, then select Create and connect.

  6. Select Connect.

  7. Select Full copy for the copy job mode.

  8. When mapping to destination, select Tables, select Append as the update method, and edit the table mapping so the destination table is named Bronze. Then select Next.

    Screenshot showing the Connect to data destination tab of the Copy data assistant, on the Select and map to folder path or table step.

  9. On the Review + save page of the copy data assistant, review the configuration and then select Save.

  10. Select the copy job activity on the pipeline canvas, then select the Settings tab below the canvas.

    Screenshot of the pipeline canvas with the copy job activity highlighted and the settings tab highlighted.

  11. Select the Connection drop-down and select Browse all.

    Screenshot of the copy job activity settings list, with browse all highlighted.

  12. Select Copy job under New sources.

  13. On the Connect data source page, select Sign in to authenticate the connection.

    Screenshot of the get data connection credentials page, with the Sign in Option highlighted.

  14. Follow the prompts to sign in to your organizational account.

  15. Select Connect to complete the connection setup.

  16. At the top of the pipeline editor, select Save to save the pipeline.
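Behind the scenes, the assistant stores the Copy activity as JSON in the pipeline definition. The dictionary below is an illustrative sketch of that general shape only; the source and sink property names are placeholders invented for illustration, not the literal JSON Fabric generates, so inspect your own pipeline's definition if you need the exact schema.

```python
# Illustrative sketch of the rough shape of a Copy activity definition.
# Property names here are hypothetical placeholders, not Fabric's exact schema.
copy_activity = {
    "name": "Copy NYC Taxi - Green",
    "type": "Copy",
    "typeProperties": {
        # Source: the built-in NYC Taxi - Green sample data (placeholder type).
        "source": {"type": "SampleDataSource", "dataset": "NYC Taxi - Green"},
        # Sink: the Lakehouse table configured in step 8 above.
        "sink": {
            "type": "LakehouseTableSink",  # placeholder type name
            "tableName": "Bronze",         # destination table from step 8
            "writeBehavior": "Append",     # update method from step 8
        },
    },
}
```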

Run and view the results of your Copy activity

  1. At the top of the pipeline editor, select Run to run the pipeline and copy the data.

    Note

    This copy can take over 30 minutes to complete.

    Screenshot of the pipeline editor with the Run button highlighted.

  2. You can monitor the run and check the results on the Output tab below the pipeline canvas. Select the name of the pipeline to view the run details.

    Screenshot showing the run details button in the pipeline Output tab.

  3. Expand the Duration breakdown section to see the duration of each stage of the Copy activity. After reviewing the copy details, select Close.
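You can also trigger and monitor the same run from a script. The sketch below uses the Fabric job scheduler REST API and is assumption-laden: WORKSPACE_ID, PIPELINE_ID, and TOKEN are placeholders, and the endpoint paths and the Pipeline job type should be checked against the current REST reference.

```python
# Minimal sketch: run the pipeline on demand and poll the job instance
# until it finishes. All IDs and the token are placeholders to supply.
import time

import requests

WORKSPACE_ID = "<your-workspace-id>"     # placeholder
PIPELINE_ID = "<your-pipeline-item-id>"  # placeholder
TOKEN = "<your-access-token>"            # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
BASE = f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"

# Start the run; the service replies 202 Accepted with a Location header
# pointing at the new job instance.
start = requests.post(
    f"{BASE}/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline",
    headers=HEADERS,
)
start.raise_for_status()
job_url = start.headers["Location"]

# Poll until the run reaches a terminal state. The copy itself can take
# over 30 minutes, so a one-minute interval is plenty.
while True:
    status = requests.get(job_url, headers=HEADERS).json().get("status")
    print("Pipeline run status:", status)
    if status in ("Completed", "Failed", "Cancelled"):
        break
    time.sleep(60)
```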

Next step

Once the copy is complete (it can take around half an hour), continue to the next section to create your dataflow.