你当前正在访问 Microsoft Azure Global Edition 技术文档网站。 如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站,请访问 https://docs.azure.cn

partition operator

Applies to: ✅Microsoft FabricAzure Data ExplorerAzure MonitorMicrosoft Sentinel

partition 运算符根据键列中的值将其输入表的记录分区为多个子表。 该运算符对每个子表运行一个子查询,并生成单个输出表,该表是所有子查询结果的并集。

当需要仅对属于同一分区键的行的子集执行子查询,而不是整个数据集的查询时,分区运算符非常有用。 These subqueries could include aggregate functions, window functions, top N and others.

partition 运算符支持多种子查询操作策略:

  • Native - use with an implicit data source with thousands of key partition values.
  • Shuffle - use with an implicit source with millions of key partition values.
  • Legacy - use with an implicit or explicit source for 64 or less key partition values.

Syntax

T|partition [ hint.strategy=Strategy ] [ Hints ] byColumn(TransformationSubQuery)

T|partition [ hint.strategy=legacy ] [ Hints ] byColumn{SubQueryWithSource}

Learn more about syntax conventions.

Parameters

Name 类型 Required Description
T string ✔️ 输入表格源。
Strategy string legacyshufflenative。 此提示定义 partition 运算符的执行策略。

如果未指定任何策略,则使用 legacy 策略。 For more information, see Strategies.
Column string ✔️ The name of a column in T whose values determine how to partition the input tabular source.
TransformationSubQuery string ✔️ 表格转换表达式。 The source is implicitly the subtables produced by partitioning the records of T. Each subtable is homogenous on the value of Column.

表达式必须仅提供一个表格结果,并且不应具有其他类型的语句,例如 let 语句。
SubQueryWithSource string ✔️ 包含其自己的表格源(例如表引用)的表格表达式。 This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax toscalar(Column).

表达式必须仅提供一个表格结果,并且不应具有其他类型的语句,例如 let 语句。
Hints string Zero or more space-separated parameters in the form of: HintName=Value that control the behavior of the operator. See the supported hints per strategy type.

Supported hints

Hint name 类型 Strategy Description
hint.shufflekey string shuffle 用于使用 shuffle 策略运行分区运算符的分区键。
hint.materialized bool legacy 如果设置为 true,则具体化 partition 运算符的源。 默认值是 false
hint.concurrency int legacy 决定要并行运行的分区数量。 默认值是 16
hint.spread int legacy 决定如何在群集节点之间分配分区。 默认值是 1

For example, if there are N partitions and the spread hint is set to P, then the N partitions are processed by P different cluster nodes equally, in parallel/sequentially depending on the concurrency hint.

Returns

此运算符返回各个子查询的结果的并集。

Strategies

The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.

Note

nativeshuffle 策略之间的区别使调用方能够指示子查询的基数和执行策略。 此选项可能会影响子查询完成所需的时间,但不会更改最终结果。

Native strategy

当分区键的相异值数量不大(以千计)时,应该应用此策略。

子查询必须为表格转换,它不指定表格源。 源是隐式的,将根据子表分区进行分配。 Only certain supported operators can be used in the subquery. 分区数量没有限制。

若要使用此策略,请指定 hint.strategy=native

Shuffle strategy

当分区键的相异值数量很大(以百万计)时,应该应用此策略。

子查询必须为表格转换,它不指定表格源。 源是隐式的,将根据子表分区进行分配。 Only certain supported operators can be used in the subquery. 分区数量没有限制。

若要使用此策略,请指定 hint.strategy=shuffle。 For more information about shuffle strategy and performance, see shuffle query.

本机和随机策略支持的运算符

以下运算符列表可用于具有本机或随机策略的子查询:

Note

  • 引用子表分区以外的表源的运算符与 nativeshuffle 策略不兼容。 For example, join, union, externaldata, and evaluate (plugins). For such scenarios, resort to the legacy strategy.
  • The fork operator isn't supported for any strategy type, as the subquery must return a single tabular result.
  • The make-series operator is only partially supported within the partition by operator. 仅当同时指定和from参数时to,才支持它。

Legacy strategy

出于历史原因,策略 legacy 是默认策略。 However, we recommend favoring the native or shuffle strategies, as the legacy approach is limited to 64 partitions and is less efficient.

在某些情况下,legacy 策略可能是必需的,因为它支持在子查询中包含表格源。 In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(Column).

如果子查询是没有表格源的表格转换,则源是隐式的,并且基于子表分区。

若要使用此策略,请指定 hint.strategy=legacy 或省略任何其他策略指示。

Note

An error occurs if the partition column, Column, contains more than 64 distinct values.

Examples

本节中的示例演示如何使用语法帮助你入门。

The examples in this article use publicly available tables in the help cluster, such as the StormEvents table in the Samples database.

The examples in this article use publicly available tables, such as the Weather table in the Weather analytics sample gallery. 可能需要修改示例查询中的表名称以匹配工作区中的表。

查找顶部值

在某些情况下,使用 partition 运算符编写查询比使用 top-nested 运算符更高效且更方便。 以下查询针对每个以 summarize 开头的 top 运行计算 StateW 的子查询:"WYOMING"、"WASHINGTON"、"WEST VIRGINIA" 和 "WISCONSIN"。

StormEvents
| where State startswith 'W'
| partition hint.strategy=native by State 
    (
    summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
    | top 3 by Events 
    ) 

Output

EventType State Events Injuries
Hail WYOMING 108 0
High Wind WYOMING 81 5
Winter Storm WYOMING 72 0
Heavy Snow WASHINGTON 82 0
High Wind WASHINGTON 58 13
Wildfire WASHINGTON 29 0
Thunderstorm Wind WEST VIRGINIA 180 1
Hail WEST VIRGINIA 103 0
Winter Weather WEST VIRGINIA 88 0
Thunderstorm Wind WISCONSIN 416 1
Winter Storm WISCONSIN 310 0
Hail WISCONSIN 303 1

Native strategy

以下查询返回按 EventType 的以“W”开头的每个 TotalInjuries 中的前 2 个 State 值:

StormEvents
| where State startswith 'W'
| partition hint.strategy = native by State
    (
    summarize TotalInjueries = sum(InjuriesDirect) by EventType
    | top 2 by TotalInjueries
    )

Output

EventType TotalInjueries
Tornado 4
Hail 1
Thunderstorm Wind 1
Excessive Heat 0
High Wind 13
Lightning 5
High Wind 5
Avalanche 3

Shuffle strategy

以下查询返回前 3 个 DamagedProperty 值 foreach EpisodeId 和列 EpisodeIdState

StormEvents
| partition hint.strategy=shuffle by EpisodeId
    (
    top 3 by DamageProperty
    | project EpisodeId, State, DamageProperty
    )
| count

Output

Count
22345

使用显式源的 Legacy 策略

以下查询运行两个子查询:

  • x == 1 时,查询从 StormEvents 返回所有具有 InjuriesIndirect == 1 的行。
  • x == 2 时,查询从 StormEvents 返回所有具有 InjuriesIndirect == 2 的行。

最终结果是这两个子查询的并集。

range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
| count 

Output

Count
113

Partition reference

The following example shows how to use the as operator to give a "name" to each data partition and then reuse that name within the subquery. 此方法只与 legacy 策略相关。

T
| partition by Dim
(
    as Partition
    | extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))
)