Applies to: ✅Microsoft Fabric✅Azure Data Explorer✅Azure Monitor✅Microsoft Sentinel
數據分割運算子會根據索引鍵數據行中的值,將其輸入數據表的記錄分割成多個子數據表。 運算子會在每個子數據表上執行子查詢,併產生單一輸出數據表,該數據表是所有子查詢結果的聯集。
當您只需要對屬於相同數據分割索引鍵的數據列子集執行子查詢,而不是整個數據集的查詢時,數據分割運算子就很有用。 These subqueries could include aggregate functions, window functions, top N and others.
資料分割運算子支援數種子查詢作業策略:
- Native - use with an implicit data source with thousands of key partition values.
- Shuffle - use with an implicit source with millions of key partition values.
- Legacy - use with an implicit or explicit source for 64 or less key partition values.
Syntax
T|partition [ hint.strategy=Strategy ] [ Hints ] byColumn(TransformationSubQuery)
T|partition [ hint.strategy=legacy ] [ Hints ] byColumn{SubQueryWithSource}
Learn more about syntax conventions.
Parameters
| Name | 類型 | Required | Description |
|---|---|---|---|
| T | string |
✔️ | 輸入表格式來源。 |
| Strategy | string |
值 legacy、 shuffle或 native。 此提示會定義數據分割運算符的執行策略。
如果未指定任何策略,則會 legacy 使用策略。 For more information, see Strategies. |
|
| Column | string |
✔️ | The name of a column in T whose values determine how to partition the input tabular source. |
| TransformationSubQuery | string |
✔️ | 表格式轉換表達式。 The source is implicitly the subtables produced by partitioning the records of T. Each subtable is homogenous on the value of Column.
表達式只能提供一個表格式結果,而且不應該有其他類型的語句,例如 let 語句。 |
| SubQueryWithSource | string |
✔️ | 表格式表達式,其中包含自己的表格式來源,例如數據表參考。 This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax toscalar(Column).
表達式只能提供一個表格式結果,而且不應該有其他類型的語句,例如 let 語句。 |
| Hints | string |
Zero or more space-separated parameters in the form of: HintName=Value that control the behavior of the operator. See the supported hints per strategy type. |
Supported hints
| Hint name | 類型 | Strategy | Description |
|---|---|---|---|
hint.shufflekey |
string |
shuffle | 用來使用策略執行數據分割運算符的數據 shuffle 分割索引鍵。 |
hint.materialized |
bool |
legacy | 如果設定為 true,則會具體化 partition 運算符的來源。 預設值是 false。 |
hint.concurrency |
int |
legacy | 決定要平行執行的分割區數目。 預設值是 16。 |
hint.spread |
int |
legacy | 決定如何在叢集節點之間散發分割區。 預設值是 1。
For example, if there are N partitions and the spread hint is set to P, then the N partitions are processed by P different cluster nodes equally, in parallel/sequentially depending on the concurrency hint. |
Returns
運算符會傳回個別子查詢結果的聯集。
Strategies
The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.
Note
和 native 策略之間的差異shuffle可讓呼叫端指出子查詢的基數和執行策略。 此選項可能會影響子查詢完成所花費的時間,但不會變更最終結果。
Native strategy
當分割區索引鍵的相異值數目在數千個左右時,應該套用此策略。
子查詢必須是未指定表格式來源的表格式轉換。 來源是隱含的,而且會根據子數據表分割指派。 Only certain supported operators can be used in the subquery. 分割區數目沒有限制。
若要使用此策略,請指定 hint.strategy=native。
Shuffle strategy
當分割區索引鍵的相異值數目在數百萬時,應該套用此策略。
子查詢必須是未指定表格式來源的表格式轉換。 來源是隱含的,而且會根據子數據表分割指派。 Only certain supported operators can be used in the subquery. 分割區數目沒有限制。
若要使用此策略,請指定 hint.strategy=shuffle。 For more information about shuffle strategy and performance, see shuffle query.
原生和隨機策略的支持運算符
下列運算子清單可用於具有原生或隨機策略的子查詢:
- count
- distinct
- extend
- make-series (partially supported, see note)
- mv-apply
- mv-expand
- parse
- parse-where
- project
- project-away
- project-keep
- project-rename
- project-reorder
- reduce
- sample
- sample-distinct
- scan
- search
- serialize
- sort
- summarize
- take
- top
- top-hitters
- top-nested
- where
Note
- 參考子數據表數據分割以外的數據表來源的運算子與 和
native策略不相容shuffle。 For example, join, union, externaldata, and evaluate (plugins). For such scenarios, resort to the legacy strategy. - The fork operator isn't supported for any strategy type, as the subquery must return a single tabular result.
- The make-series operator is only partially supported within the
partition byoperator. 只有在指定 和from參數時to,才支援它。
Legacy strategy
基於歷史原因,策略 legacy 是預設策略。 However, we recommend favoring the native or shuffle strategies, as the legacy approach is limited to 64 partitions and is less efficient.
在某些情況下,策略可能是必要的, legacy 因為它支援在子查詢中包含表格式來源。 In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(Column).
如果子查詢是不含表格式來源的表格式轉換,則來源是隱含的,而且是以子數據表分割為基礎。
若要使用此策略,請指定 hint.strategy=legacy 或省略任何其他策略指示。
Note
An error occurs if the partition column, Column, contains more than 64 distinct values.
Examples
本節中的範例示範如何使用 語法來協助您開始使用。
The examples in this article use publicly available tables in the help cluster, such as the
StormEventstable in the Samples database.
The examples in this article use publicly available tables, such as the
Weathertable in the Weather analytics sample gallery. 您可能需要修改範例查詢中的資料表名稱,以符合工作區中的資料表。
尋找最上層值
在某些情況下,使用運算符撰寫查詢比使用 partitiontop-nested 運算子更為高效能且更容易。 下列查詢會針對每個summarize查詢執行子查詢,top其State開頭W為 :“WYOMING”、“WASHINGTON”、“WEST 維吉尼亞州”和 “WISCONSIN”。
StormEvents
| where State startswith 'W'
| partition hint.strategy=native by State
(
summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
| top 3 by Events
)
Output
| EventType | State | Events | Injuries |
|---|---|---|---|
| Hail | WYOMING | 108 | 0 |
| High Wind | WYOMING | 81 | 5 |
| Winter Storm | WYOMING | 72 | 0 |
| Heavy Snow | WASHINGTON | 82 | 0 |
| High Wind | WASHINGTON | 58 | 13 |
| Wildfire | WASHINGTON | 29 | 0 |
| Thunderstorm Wind | WEST VIRGINIA | 180 | 1 |
| Hail | WEST VIRGINIA | 103 | 0 |
| Winter Weather | WEST VIRGINIA | 88 | 0 |
| Thunderstorm Wind | WISCONSIN | 416 | 1 |
| Winter Storm | WISCONSIN | 310 | 0 |
| Hail | WISCONSIN | 303 | 1 |
Native strategy
下列查詢會針對開頭為 『W』 的每個 EventType ,傳回的前 2 TotalInjuries 個值State:
StormEvents
| where State startswith 'W'
| partition hint.strategy = native by State
(
summarize TotalInjueries = sum(InjuriesDirect) by EventType
| top 2 by TotalInjueries
)
Output
| EventType | TotalInjueries |
|---|---|
| Tornado | 4 |
| Hail | 1 |
| Thunderstorm Wind | 1 |
| Excessive Heat | 0 |
| High Wind | 13 |
| Lightning | 5 |
| High Wind | 5 |
| Avalanche | 3 |
Shuffle strategy
下列查詢會傳回前 3 DamagedProperty 個 foreach EpisodeId 值,以及 資料行 EpisodeId 和 State。
StormEvents
| partition hint.strategy=shuffle by EpisodeId
(
top 3 by DamageProperty
| project EpisodeId, State, DamageProperty
)
| count
Output
| Count |
|---|
| 22345 |
具有明確來源的舊版策略
下列查詢會執行兩個子查詢:
- 當 時
x == 1,查詢會從StormEvents中傳回具有InjuriesIndirect == 1的所有數據列。 - 當 時
x == 2,查詢會從StormEvents中傳回具有InjuriesIndirect == 2的所有數據列。
最終結果是這兩個子查詢的聯集。
range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
| count
Output
| Count |
|---|
| 113 |
Partition reference
The following example shows how to use the as operator to give a "name" to each data partition and then reuse that name within the subquery. 此方法僅與 legacy 策略相關。
T
| partition by Dim
(
as Partition
| extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))
)