Parquet format in Data Factory in Microsoft Fabric

This article outlines how to configure Parquet format in the pipeline of Data Factory in Microsoft Fabric.

Supported capabilities

Parquet format is supported for the following activities and connectors as a source and destination.

| Category | Connector/Activity |
| --- | --- |
| Supported connector | Amazon S3<br>Amazon S3 Compatible<br>Azure Blob Storage<br>Azure Data Lake Storage Gen1<br>Azure Data Lake Storage Gen2<br>Azure Files<br>File system<br>FTP<br>Google Cloud Storage<br>HTTP<br>Lakehouse Files<br>Oracle Cloud Storage<br>SFTP |
| Supported activity | Copy activity (source/destination)<br>Lookup activity<br>GetMetadata activity<br>Delete activity |

Parquet format in copy activity

To configure Parquet format, choose your connection in the source or destination of a pipeline copy activity, and then select Parquet in the File format drop-down list. Select Settings to configure this format further.

Screenshot showing file format settings.

Parquet format as source

After you select Settings in the File format section, the following properties are shown in the pop-up File format settings dialog box.

Screenshot showing Parquet file format source.

  • Compression type: Choose the compression codec used to read Parquet files from the drop-down list. You can choose from None, gzip (.gz), snappy, lzo, Brotli (.br), Zstandard, lz4, lz4frame, bzip2 (.bz2), or lz4hadoop.

Parquet format as destination

After you select Settings, the following properties are shown in the pop-up File format settings dialog box.

Screenshot showing Parquet file format destination.

  • Compression type: Choose the compression codec used to write Parquet files from the drop-down list. You can choose from None, gzip (.gz), snappy, lzo, Brotli (.br), Zstandard, lz4, lz4frame, bzip2 (.bz2), or lz4hadoop.

  • Use V-Order: Enables a write-time optimization to the Parquet file format. It's enabled by default. For more information, see Delta Lake table optimization and V-Order.

Under Advanced settings in the Destination tab, the following Parquet format related properties are displayed.

  • Max rows per file: When writing data into a folder, you can choose to write to multiple files by specifying the maximum number of rows to write per file.
  • File name prefix: Applicable when Max rows per file is configured. Specify the file name prefix used when writing data to multiple files, resulting in this pattern: <fileNamePrefix>_00000.<fileExtension>. For example, the prefix data produces files named data_00000.parquet, data_00001.parquet, and so on. If not specified, the file name prefix is auto-generated. This property doesn't apply when the source is a file-based store or a partition-option-enabled data store.

Mapping

For Mapping tab configuration when Parquet isn't your destination format, go to Mapping. When your destination is in Parquet format, you can additionally edit the destination data types, as described in the following section.

Edit destination data types

When copying data to the destination connector in Parquet format, beyond the configuration in the Mapping tab, you can specify certain destination column types after enabling Advanced Parquet type settings. You can also configure the IsNullable option to specify whether each Parquet destination column allows null values. The default value of IsNullable is true.

The following mappings are used from interim service data types that support editing to Parquet data types.

| Interim service data type | Parquet logical type | Parquet physical type |
| --- | --- | --- |
| DateTime | Option 1: null<br>Option 2: TIMESTAMP | Option 1: INT96 (default)<br>Option 2: INT64 (Unit: MILLIS, MICROS, NANOS (default)) |
| DateTimeOffset | Option 1: null<br>Option 2: TIMESTAMP | Option 1: INT96 (default)<br>Option 2: INT64 (Unit: MILLIS, MICROS, NANOS (default)) |
| TimeSpan | TIME | INT32 (Unit: MILLIS)<br>INT64 (Unit: MICROS, NANOS (default)) |
| Decimal | DECIMAL | INT32 (1 <= precision <= 9)<br>INT64 (9 < precision <= 18)<br>FIXED_LEN_BYTE_ARRAY (precision > 18) (default) |
| GUID | Option 1: STRING<br>Option 2: UUID | Option 1: BYTE_ARRAY (default)<br>Option 2: FIXED_LEN_BYTE_ARRAY |
| Byte array | null | BYTE_ARRAY (default) or FIXED_LEN_BYTE_ARRAY |

For example, the decimalData column in the source is converted to the interim service type Decimal. According to the mapping table above, the destination column type is then determined automatically by the specified precision: a precision of 9 or less maps to INT32, a precision above 9 and up to 18 maps to INT64, and a precision above 18 maps to FIXED_LEN_BYTE_ARRAY.
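
If you author the copy activity as JSON rather than in the designer, an edited destination type might be expressed as a schema mapping. The following is a hypothetical sketch only: the TabularTranslator shape follows the general copy activity schema-mapping convention, and the physicalType property and its placement are assumptions about how Advanced Parquet type settings serialize, not a documented schema.

```json
"translator": {
  "type": "TabularTranslator",
  "mappings": [
    {
      "source": { "name": "decimalData", "type": "Decimal" },
      "sink": { "name": "decimalData", "physicalType": "INT64" }
    }
  ]
}
```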

Screenshot of mapping destination column type.

Data type mapping for Parquet

When copying data from the source connector in Parquet format, the following mappings are used from Parquet data types to interim data types used by the service internally.

| Parquet logical type | Parquet physical type | Interim service data type |
| --- | --- | --- |
| null | BOOLEAN | Boolean |
| INT(8, true) | INT32 | SByte |
| INT(8, false) | INT32 | Byte |
| INT(16, true) | INT32 | Int16 |
| INT(16, false) | INT32 | UInt16 |
| INT(32, true) | INT32 | Int32 |
| INT(32, false) | INT32 | UInt32 |
| INT(64, true) | INT64 | Int64 |
| INT(64, false) | INT64 | UInt64 |
| null | FLOAT | Single |
| null | DOUBLE | Double |
| DECIMAL | INT32, INT64, FIXED_LEN_BYTE_ARRAY or BYTE_ARRAY | Decimal |
| DATE | INT32 | Date |
| TIME | INT32 or INT64 | DateTime |
| TIMESTAMP | INT64 | DateTime |
| ENUM | BYTE_ARRAY | String |
| UUID | FIXED_LEN_BYTE_ARRAY | GUID |
| null | BYTE_ARRAY | Byte array |
| STRING | BYTE_ARRAY | String |

When copying data to the destination connector in Parquet format, the following mappings are used from interim data types used by the service internally to Parquet data types.

| Interim service data type | Parquet logical type | Parquet physical type |
| --- | --- | --- |
| Boolean | null | BOOLEAN |
| SByte | INT | INT32 |
| Byte | INT | INT32 |
| Int16 | INT | INT32 |
| UInt16 | INT | INT32 |
| Int32 | INT | INT32 |
| UInt32 | INT | INT32 |
| Int64 | INT | INT64 |
| UInt64 | INT | INT64 |
| Single | null | FLOAT |
| Double | null | DOUBLE |
| DateTime | null | INT96 |
| DateTimeOffset | null | INT96 |
| Date | DATE | INT32 |
| TimeSpan | TIME | INT64 |
| Decimal | DECIMAL | INT32, INT64 or FIXED_LEN_BYTE_ARRAY |
| GUID | STRING | BYTE_ARRAY |
| String | STRING | BYTE_ARRAY |
| Byte array | null | BYTE_ARRAY |

Table summary

Parquet as source

The following properties are supported in the copy activity Source section when using the Parquet format.

| Name | Description | Value | Required | JSON script property |
| --- | --- | --- | --- | --- |
| File format | The file format that you want to use. | Parquet | Yes | type (under datasetSettings):<br>Parquet |
| Compression type | The compression codec used to read Parquet files. | Choose from:<br>None<br>gzip (.gz)<br>snappy<br>lzo<br>Brotli (.br)<br>Zstandard<br>lz4<br>lz4frame<br>bzip2 (.bz2)<br>lz4hadoop | No | compressionCodec:<br>gzip<br>snappy<br>lzo<br>brotli<br>zstd<br>lz4<br>lz4frame<br>bz2<br>lz4hadoop |
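
Putting the table's JSON script properties together, a source-side definition might look like the following minimal sketch. The type and compressionCodec names come from the table above; the ParquetSource wrapper and the typeProperties nesting are assumptions and may differ from the JSON the pipeline editor generates.

```json
"source": {
  "type": "ParquetSource",
  "datasetSettings": {
    "type": "Parquet",
    "typeProperties": {
      "compressionCodec": "snappy"
    }
  }
}
```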

Parquet as destination

The following properties are supported in the copy activity Destination section when using the Parquet format.

| Name | Description | Value | Required | JSON script property |
| --- | --- | --- | --- | --- |
| File format | The file format that you want to use. | Parquet | Yes | type (under datasetSettings):<br>Parquet |
| Use V-Order | A write-time optimization to the Parquet file format. | selected or unselected | No | enableVertiParquet |
| Compression type | The compression codec used to write Parquet files. | Choose from:<br>None<br>gzip (.gz)<br>snappy<br>lzo<br>Brotli (.br)<br>Zstandard<br>lz4<br>lz4frame<br>bzip2 (.bz2)<br>lz4hadoop | No | compressionCodec:<br>gzip<br>snappy<br>lzo<br>brotli<br>zstd<br>lz4<br>lz4frame<br>bz2<br>lz4hadoop |
| Max rows per file | When writing data into a folder, you can choose to write to multiple files by specifying the maximum number of rows to write per file. | `<your max rows per file>` | No | maxRowsPerFile |
| File name prefix | Applicable when Max rows per file is configured. Specify the file name prefix used when writing data to multiple files, resulting in this pattern: `<fileNamePrefix>_00000.<fileExtension>`. If not specified, the file name prefix is auto-generated. This property doesn't apply when the source is a file-based store or a partition-option-enabled data store. | `<your file name prefix>` | No | fileNamePrefix |
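
Similarly, a destination-side definition combining the properties above might look like this minimal sketch. The property names (type, enableVertiParquet, compressionCodec, maxRowsPerFile, fileNamePrefix) come from the table; the ParquetSink wrapper and where each property nests are assumptions.

```json
"sink": {
  "type": "ParquetSink",
  "datasetSettings": {
    "type": "Parquet",
    "typeProperties": {
      "compressionCodec": "snappy"
    }
  },
  "formatSettings": {
    "enableVertiParquet": true,
    "maxRowsPerFile": 1000000,
    "fileNamePrefix": "data"
  }
}
```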