共用方式為


partition operator

Applies to: ✅Microsoft FabricAzure Data ExplorerAzure MonitorMicrosoft Sentinel

數據分割運算子會根據索引鍵數據行中的值,將其輸入數據表的記錄分割成多個子數據表。 運算子會在每個子數據表上執行子查詢,併產生單一輸出數據表,該數據表是所有子查詢結果的聯集。

當您只需要對屬於相同數據分割索引鍵的數據列子集執行子查詢,而不是整個數據集的查詢時,數據分割運算子就很有用。 These subqueries could include aggregate functions, window functions, top N and others.

資料分割運算子支援數種子查詢作業策略:

  • Native - use with an implicit data source with thousands of key partition values.
  • Shuffle - use with an implicit source with millions of key partition values.
  • Legacy - use with an implicit or explicit source for 64 or less key partition values.

Syntax

T|partition [ hint.strategy=Strategy ] [ Hints ] byColumn(TransformationSubQuery)

T|partition [ hint.strategy=legacy ] [ Hints ] byColumn{SubQueryWithSource}

Learn more about syntax conventions.

Parameters

Name 類型 Required Description
T string ✔️ 輸入表格式來源。
Strategy string legacyshufflenative。 此提示會定義數據分割運算符的執行策略。

如果未指定任何策略,則會 legacy 使用策略。 For more information, see Strategies.
Column string ✔️ The name of a column in T whose values determine how to partition the input tabular source.
TransformationSubQuery string ✔️ 表格式轉換表達式。 The source is implicitly the subtables produced by partitioning the records of T. Each subtable is homogenous on the value of Column.

表達式只能提供一個表格式結果,而且不應該有其他類型的語句,例如 let 語句。
SubQueryWithSource string ✔️ 表格式表達式,其中包含自己的表格式來源,例如數據表參考。 This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax toscalar(Column).

表達式只能提供一個表格式結果,而且不應該有其他類型的語句,例如 let 語句。
Hints string Zero or more space-separated parameters in the form of: HintName=Value that control the behavior of the operator. See the supported hints per strategy type.

Supported hints

Hint name 類型 Strategy Description
hint.shufflekey string shuffle 用來使用策略執行數據分割運算符的數據 shuffle 分割索引鍵。
hint.materialized bool legacy 如果設定為 true,則會具體化 partition 運算符的來源。 預設值是 false
hint.concurrency int legacy 決定要平行執行的分割區數目。 預設值是 16
hint.spread int legacy 決定如何在叢集節點之間散發分割區。 預設值是 1

For example, if there are N partitions and the spread hint is set to P, then the N partitions are processed by P different cluster nodes equally, in parallel/sequentially depending on the concurrency hint.

Returns

運算符會傳回個別子查詢結果的聯集。

Strategies

The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.

Note

native 策略之間的差異shuffle可讓呼叫端指出子查詢的基數和執行策略。 此選項可能會影響子查詢完成所花費的時間,但不會變更最終結果。

Native strategy

當分割區索引鍵的相異值數目在數千個左右時,應該套用此策略。

子查詢必須是未指定表格式來源的表格式轉換。 來源是隱含的,而且會根據子數據表分割指派。 Only certain supported operators can be used in the subquery. 分割區數目沒有限制。

若要使用此策略,請指定 hint.strategy=native

Shuffle strategy

當分割區索引鍵的相異值數目在數百萬時,應該套用此策略。

子查詢必須是未指定表格式來源的表格式轉換。 來源是隱含的,而且會根據子數據表分割指派。 Only certain supported operators can be used in the subquery. 分割區數目沒有限制。

若要使用此策略,請指定 hint.strategy=shuffle。 For more information about shuffle strategy and performance, see shuffle query.

原生和隨機策略的支持運算符

下列運算子清單可用於具有原生或隨機策略的子查詢:

Note

  • 參考子數據表數據分割以外的數據表來源的運算子與 和 native 策略不相容shuffle。 For example, join, union, externaldata, and evaluate (plugins). For such scenarios, resort to the legacy strategy.
  • The fork operator isn't supported for any strategy type, as the subquery must return a single tabular result.
  • The make-series operator is only partially supported within the partition by operator. 只有在指定 和 from 參數時to,才支援它。

Legacy strategy

基於歷史原因,策略 legacy 是預設策略。 However, we recommend favoring the native or shuffle strategies, as the legacy approach is limited to 64 partitions and is less efficient.

在某些情況下,策略可能是必要的, legacy 因為它支援在子查詢中包含表格式來源。 In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(Column).

如果子查詢是不含表格式來源的表格式轉換,則來源是隱含的,而且是以子數據表分割為基礎。

若要使用此策略,請指定 hint.strategy=legacy 或省略任何其他策略指示。

Note

An error occurs if the partition column, Column, contains more than 64 distinct values.

Examples

本節中的範例示範如何使用 語法來協助您開始使用。

The examples in this article use publicly available tables in the help cluster, such as the StormEvents table in the Samples database.

The examples in this article use publicly available tables, such as the Weather table in the Weather analytics sample gallery. 您可能需要修改範例查詢中的資料表名稱,以符合工作區中的資料表。

尋找最上層值

在某些情況下,使用運算符撰寫查詢比使用 partitiontop-nested 運算子更為高效能且更容易。 下列查詢會針對每個summarize查詢執行子查詢,topState開頭W為 :“WYOMING”、“WASHINGTON”、“WEST 維吉尼亞州”和 “WISCONSIN”。

StormEvents
| where State startswith 'W'
| partition hint.strategy=native by State 
    (
    summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
    | top 3 by Events 
    ) 

Output

EventType State Events Injuries
Hail WYOMING 108 0
High Wind WYOMING 81 5
Winter Storm WYOMING 72 0
Heavy Snow WASHINGTON 82 0
High Wind WASHINGTON 58 13
Wildfire WASHINGTON 29 0
Thunderstorm Wind WEST VIRGINIA 180 1
Hail WEST VIRGINIA 103 0
Winter Weather WEST VIRGINIA 88 0
Thunderstorm Wind WISCONSIN 416 1
Winter Storm WISCONSIN 310 0
Hail WISCONSIN 303 1

Native strategy

下列查詢會針對開頭為 『W』 的每個 EventType ,傳回的前 2 TotalInjuries 個值State

StormEvents
| where State startswith 'W'
| partition hint.strategy = native by State
    (
    summarize TotalInjueries = sum(InjuriesDirect) by EventType
    | top 2 by TotalInjueries
    )

Output

EventType TotalInjueries
Tornado 4
Hail 1
Thunderstorm Wind 1
Excessive Heat 0
High Wind 13
Lightning 5
High Wind 5
Avalanche 3

Shuffle strategy

下列查詢會傳回前 3 DamagedProperty 個 foreach EpisodeId 值,以及 資料行 EpisodeIdState

StormEvents
| partition hint.strategy=shuffle by EpisodeId
    (
    top 3 by DamageProperty
    | project EpisodeId, State, DamageProperty
    )
| count

Output

Count
22345

具有明確來源的舊版策略

下列查詢會執行兩個子查詢:

  • 當 時 x == 1,查詢會從 StormEvents 中傳回具有 InjuriesIndirect == 1的所有數據列。
  • 當 時 x == 2,查詢會從 StormEvents 中傳回具有 InjuriesIndirect == 2的所有數據列。

最終結果是這兩個子查詢的聯集。

range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
| count 

Output

Count
113

Partition reference

The following example shows how to use the as operator to give a "name" to each data partition and then reuse that name within the subquery. 此方法僅與 legacy 策略相關。

T
| partition by Dim
(
    as Partition
    | extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))
)