reduce operator

Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer ✅ Azure Monitor ✅ Microsoft Sentinel

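Groups a set of strings together based on value similarity.

For each such group, the operator returns a pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard. The count is the number of values in the group, and the representative is one of the original values in the group.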

Syntax

T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters]]

Learn more about syntax conventions.

Parameters

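Name	Type	Required	Description
Expr	`string`	✔️	The value by which to reduce.
Threshold	`real`		A value between 0 and 1 that determines the minimum fraction of rows required to match the grouping criteria in order to trigger a reduction operation. The default value is 0.1. The threshold determines the minimum level of similarity required for values to be grouped together. With a smaller threshold value (closer to 0), more similar values are grouped together, resulting in fewer but more similar groups. A larger threshold value (closer to 1) requires less similarity, resulting in more groups that are less similar. We recommend setting a small threshold value for large inputs. See Examples.
Characters	`string`		A list of characters that separate between terms. The default is every non-alphanumeric character. For examples, see Examples.
ReduceKind	`string`		The only valid value is `source`. If `source` is specified, the operator appends the `Pattern` column to the existing rows in the table instead of aggregating by `Pattern`.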

Returns

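A table with as many rows as there are groups, and columns titled pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard, or a placeholder for an arbitrary insertion string. The count is the number of values in the group, and the representative is one of the original values in the group.

For example, the result of reduce by city might include: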

Pattern	Count	Representative
San *	5182	San Bernard
Saint *	2846	Saint Lucy
Moscow	3726	Moscow
* -on- *	2730	One -on- One
Paris	2716	Paris

Examples

The examples in this section show how to use the syntax to help you get started.

The examples in this article use publicly available tables in the help cluster, such as the StormEvents table in the Samples database.

The examples in this article use publicly available tables, such as the Weather table in the Weather analytics sample gallery. You might need to modify the table name in the example query to match the table in your workspace.

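The following example generates a range of numbers, creates a new column containing a concatenated string and a random integer, and then groups the rows by the new column using specific reduce parameters. The threshold is set to 0.001, which means that the operator groups values that are very similar to each other.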

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.001 , characters = "X"

Output

Pattern	Count	Representative
MachineLearning*	1000	MachineLearningX4

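The following example generates a range of numbers, creates a new column containing a concatenated string and a random integer, and then groups the rows by the new column using specific reduce parameters. The threshold is set to 0.9, which means that the operator is less strict about grouping values together and allows more variance.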

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.9 , characters = "X"

Output

The result includes only those groups where the MyText value appears in at least 90% of the rows.

Pattern	Count	Representative
MachineLearning*	177	MachineLearningX9
MachineLearning*	102	MachineLearningX0
MachineLearning*	106	MachineLearningX1
MachineLearning*	96	MachineLearningX6
MachineLearning*	110	MachineLearningX4
MachineLearning*	100	MachineLearningX3
MachineLearning*	99	MachineLearningX8
MachineLearning*	104	MachineLearningX7
MachineLearning*	106	MachineLearningX2

If the Characters parameter is unspecified, by default the operator treats all non-alphanumeric characters (including spaces and punctuation) as term separators. The following example shows how the reduce operator behaves when the Characters parameter isn't specified.

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str

Output

Pattern	Count	Representative
others	10

However, if you specify that "Z" is a separator, then it's as if each value in str is two terms: foo and tostring(x):

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str with characters="Z"

Output

Pattern	Count	Representative
foo*	10	fooZ1
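
The ReduceKind parameter described under Parameters has no example in this article. The following query is a minimal sketch, not taken from the original article, that reuses the generated input above and adds kind=source. Based on the parameter description, the ten input rows should be kept and a Pattern column appended to each, rather than being aggregated into one row per pattern. No output is shown because the query was not run against a cluster.

```kusto
// Sketch: same generated input as the previous example (fooZ1 ... fooZ10).
range x from 1 to 10 step 1
| project str = strcat("foo", "Z", tostring(x))
// kind=source: append the Pattern column to the existing rows
// instead of aggregating by Pattern.
| reduce kind=source by str with characters="Z"
```

This form can be useful when you want to filter or join the original rows by the pattern they were assigned to.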

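The following example shows how you might apply the reduce operator to a "sanitized" input, in which GUIDs in the column being reduced are replaced before reducing:

Start with a few records from the Trace table. Then reduce the Text column, which includes random GUIDs. Because random GUIDs interfere with the reduce operation, replace them all with the string "GUID". Now perform the reduce operation. In case there are other "quasi-random" identifiers with embedded '-' or '_' characters in them, treat these characters as non-term-breakers.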

Trace
| take 10000
| extend Text = replace(@"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}", "GUID", Text)
| reduce by Text with characters="-_"
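
The Trace table in the query above may not exist in your database. The following self-contained variant is a sketch under stated assumptions: the three log lines and their GUIDs are made up for illustration, and replace_regex() (source, regex, rewrite) is used in place of the older replace() function. No output is shown because the query was not run against a cluster.

```kusto
// Hypothetical sample rows standing in for the Trace table.
datatable(Text: string) [
    "Request 3f2504e0-4f89-11d3-9a0c-0305e82c3301 failed",
    "Request 9b2e1c7a-5d4f-4e6b-8a1c-2f3d4e5f6a7b failed",
    "Request 0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d succeeded"
]
// Replace every GUID with the literal string "GUID" before reducing.
| extend Text = replace_regex(Text, @"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}", "GUID")
// Same reduce call as in the example above.
| reduce by Text with characters="-_"
```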

Related content

autocluster

Note

The implementation of the reduce operator is largely based on the paper A Data Clustering Algorithm for Mining Patterns From Event Logs, by Risto Vaarandi.

Last updated on 2025-07-20
