你当前正在访问 Microsoft Azure Global Edition 技术文档网站。 如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站,请访问 https://docs.azure.cn。
Applies to: ✅Microsoft Fabric✅Azure Data Explorer✅Azure Monitor✅Microsoft Sentinel
partition 运算符根据键列中的值将其输入表的记录分区为多个子表。 该运算符对每个子表运行一个子查询,并生成单个输出表,该表是所有子查询结果的并集。
当需要仅对属于同一分区键的行的子集执行子查询,而不是整个数据集的查询时,分区运算符非常有用。 These subqueries could include aggregate functions, window functions, top N and others.
partition 运算符支持多种子查询操作策略:
- Native - use with an implicit data source with thousands of key partition values.
- Shuffle - use with an implicit source with millions of key partition values.
- Legacy - use with an implicit or explicit source for 64 or less key partition values.
Syntax
T|partition [ hint.strategy=Strategy ] [ Hints ] byColumn(TransformationSubQuery)
T|partition [ hint.strategy=legacy ] [ Hints ] byColumn{SubQueryWithSource}
Learn more about syntax conventions.
Parameters
| Name | 类型 | Required | Description |
|---|---|---|---|
| T | string |
✔️ | 输入表格源。 |
| Strategy | string |
值 legacy、shuffle 或 native。 此提示定义 partition 运算符的执行策略。
如果未指定任何策略,则使用 legacy 策略。 For more information, see Strategies. |
|
| Column | string |
✔️ | The name of a column in T whose values determine how to partition the input tabular source. |
| TransformationSubQuery | string |
✔️ | 表格转换表达式。 The source is implicitly the subtables produced by partitioning the records of T. Each subtable is homogenous on the value of Column.
表达式必须仅提供一个表格结果,并且不应具有其他类型的语句,例如 let 语句。 |
| SubQueryWithSource | string |
✔️ | 包含其自己的表格源(例如表引用)的表格表达式。 This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax toscalar(Column).
表达式必须仅提供一个表格结果,并且不应具有其他类型的语句,例如 let 语句。 |
| Hints | string |
Zero or more space-separated parameters in the form of: HintName=Value that control the behavior of the operator. See the supported hints per strategy type. |
Supported hints
| Hint name | 类型 | Strategy | Description |
|---|---|---|---|
hint.shufflekey |
string |
shuffle | 用于使用 shuffle 策略运行分区运算符的分区键。 |
hint.materialized |
bool |
legacy | 如果设置为 true,则具体化 partition 运算符的源。 默认值是 false。 |
hint.concurrency |
int |
legacy | 决定要并行运行的分区数量。 默认值是 16。 |
hint.spread |
int |
legacy | 决定如何在群集节点之间分配分区。 默认值是 1。
For example, if there are N partitions and the spread hint is set to P, then the N partitions are processed by P different cluster nodes equally, in parallel/sequentially depending on the concurrency hint. |
Returns
此运算符返回各个子查询的结果的并集。
Strategies
The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.
Note
native 和 shuffle 策略之间的区别使调用方能够指示子查询的基数和执行策略。 此选项可能会影响子查询完成所需的时间,但不会更改最终结果。
Native strategy
当分区键的相异值数量不大(以千计)时,应该应用此策略。
子查询必须为表格转换,它不指定表格源。 源是隐式的,将根据子表分区进行分配。 Only certain supported operators can be used in the subquery. 分区数量没有限制。
若要使用此策略,请指定 hint.strategy=native。
Shuffle strategy
当分区键的相异值数量很大(以百万计)时,应该应用此策略。
子查询必须为表格转换,它不指定表格源。 源是隐式的,将根据子表分区进行分配。 Only certain supported operators can be used in the subquery. 分区数量没有限制。
若要使用此策略,请指定 hint.strategy=shuffle。 For more information about shuffle strategy and performance, see shuffle query.
本机和随机策略支持的运算符
以下运算符列表可用于具有本机或随机策略的子查询:
- count
- distinct
- extend
- make-series (partially supported, see note)
- mv-apply
- mv-expand
- parse
- parse-where
- project
- project-away
- project-keep
- project-rename
- project-reorder
- reduce
- sample
- sample-distinct
- scan
- search
- serialize
- sort
- summarize
- take
- top
- top-hitters
- top-nested
- where
Note
- 引用子表分区以外的表源的运算符与
native和shuffle策略不兼容。 For example, join, union, externaldata, and evaluate (plugins). For such scenarios, resort to the legacy strategy. - The fork operator isn't supported for any strategy type, as the subquery must return a single tabular result.
- The make-series operator is only partially supported within the
partition byoperator. 仅当同时指定和from参数时to,才支持它。
Legacy strategy
出于历史原因,策略 legacy 是默认策略。 However, we recommend favoring the native or shuffle strategies, as the legacy approach is limited to 64 partitions and is less efficient.
在某些情况下,legacy 策略可能是必需的,因为它支持在子查询中包含表格源。 In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(Column).
如果子查询是没有表格源的表格转换,则源是隐式的,并且基于子表分区。
若要使用此策略,请指定 hint.strategy=legacy 或省略任何其他策略指示。
Note
An error occurs if the partition column, Column, contains more than 64 distinct values.
Examples
本节中的示例演示如何使用语法帮助你入门。
The examples in this article use publicly available tables in the help cluster, such as the
StormEventstable in the Samples database.
The examples in this article use publicly available tables, such as the
Weathertable in the Weather analytics sample gallery. 可能需要修改示例查询中的表名称以匹配工作区中的表。
查找顶部值
在某些情况下,使用 partition 运算符编写查询比使用 top-nested 运算符更高效且更方便。 以下查询针对每个以 summarize 开头的 top 运行计算 State 和 W 的子查询:"WYOMING"、"WASHINGTON"、"WEST VIRGINIA" 和 "WISCONSIN"。
StormEvents
| where State startswith 'W'
| partition hint.strategy=native by State
(
summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
| top 3 by Events
)
Output
| EventType | State | Events | Injuries |
|---|---|---|---|
| Hail | WYOMING | 108 | 0 |
| High Wind | WYOMING | 81 | 5 |
| Winter Storm | WYOMING | 72 | 0 |
| Heavy Snow | WASHINGTON | 82 | 0 |
| High Wind | WASHINGTON | 58 | 13 |
| Wildfire | WASHINGTON | 29 | 0 |
| Thunderstorm Wind | WEST VIRGINIA | 180 | 1 |
| Hail | WEST VIRGINIA | 103 | 0 |
| Winter Weather | WEST VIRGINIA | 88 | 0 |
| Thunderstorm Wind | WISCONSIN | 416 | 1 |
| Winter Storm | WISCONSIN | 310 | 0 |
| Hail | WISCONSIN | 303 | 1 |
Native strategy
以下查询返回按 EventType 的以“W”开头的每个 TotalInjuries 中的前 2 个 State 值:
StormEvents
| where State startswith 'W'
| partition hint.strategy = native by State
(
summarize TotalInjueries = sum(InjuriesDirect) by EventType
| top 2 by TotalInjueries
)
Output
| EventType | TotalInjueries |
|---|---|
| Tornado | 4 |
| Hail | 1 |
| Thunderstorm Wind | 1 |
| Excessive Heat | 0 |
| High Wind | 13 |
| Lightning | 5 |
| High Wind | 5 |
| Avalanche | 3 |
Shuffle strategy
以下查询返回前 3 个 DamagedProperty 值 foreach EpisodeId 和列 EpisodeId 和 State。
StormEvents
| partition hint.strategy=shuffle by EpisodeId
(
top 3 by DamageProperty
| project EpisodeId, State, DamageProperty
)
| count
Output
| Count |
|---|
| 22345 |
使用显式源的 Legacy 策略
以下查询运行两个子查询:
- 当
x == 1时,查询从StormEvents返回所有具有InjuriesIndirect == 1的行。 - 当
x == 2时,查询从StormEvents返回所有具有InjuriesIndirect == 2的行。
最终结果是这两个子查询的并集。
range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
| count
Output
| Count |
|---|
| 113 |
Partition reference
The following example shows how to use the as operator to give a "name" to each data partition and then reuse that name within the subquery. 此方法只与 legacy 策略相关。
T
| partition by Dim
(
as Partition
| extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))
)