Transcriptions - Submit

Service: Azure AI Services

API Version: 2025-10-15

Submits a new transcription job.

POST {endpoint}/speechtotext/transcriptions:submit?api-version=2025-10-15

URI Parameters

Name	In	Required	Type	Description
endpoint	path	True	string	Supported Cognitive Services endpoint (protocol and host name, for example: https://westus.api.cognitive.microsoft.com).
api-version	query	True	string	The requested API version.

Request Header

Name	Required	Type	Description
Ocp-Apim-Subscription-Key	True	string	Provide your Cognitive Services account key here.
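The endpoint, the `api-version` query parameter, and the key header above combine into a single POST request. The following Python sketch only builds that request with the standard library and does not send it. The helper name `build_submit_request` is hypothetical, the key is a placeholder, and the `Content-Type: application/json` header is an assumption that is not listed in the tables above.

```python
import json
import urllib.request

API_VERSION = "2025-10-15"

def build_submit_request(endpoint: str, key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a Transcriptions - Submit request."""
    # Strip a trailing slash so the path is not doubled (".../" + "/speechtotext").
    url = (
        f"{endpoint.rstrip('/')}/speechtotext/transcriptions:submit"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,  # account key from the Request Header table
            "Content-Type": "application/json",  # assumption: JSON request body
        },
        method="POST",
    )

req = build_submit_request(
    "https://westus.api.cognitive.microsoft.com/",
    "<your-key>",  # placeholder, not a real key
    {"displayName": "demo", "locale": "en-US", "properties": {}},
)
print(req.get_method(), req.full_url)
```

To actually submit, the request object could be passed to `urllib.request.urlopen(req)`; per the Responses table, a 201 reply carries the new transcription's URL in its `Location` header.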

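Request Body

Name	Required	Type	Description
displayName	True	string minLength: 1	The display name of the object.
locale	True	string minLength: 1	The locale of the contained data. If language identification is used, this locale is used to transcribe speech for which no language could be detected.
properties	True	TranscriptionProperties	TranscriptionProperties
contentContainerUrl		string (uri)	The URL of an Azure Blob container that contains the audio files. The container may be at most 5 GB in size and may contain at most 10,000 blobs. The maximum size of a single blob is 2.5 GB. The container SAS should include "r" (read) and "l" (list) permissions. This property is not returned in responses.
contentUrls		string[] (uri)	A list of content URLs from which the audio files to transcribe are fetched. Up to 1,000 URLs are allowed. This property is not returned in responses.
customProperties		object	Custom properties of this entity. Keys may be at most 64 characters long, values at most 256 characters long, and at most 10 entries are allowed.
dataset		EntityReference	EntityReference
description		string	The description of the object.
model		EntityReference	EntityReference
project		EntityReference	EntityReference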
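Several of the request-body limits above can be checked on the client before submitting. This is a minimal sketch with a hypothetical `validate_submit_body` helper. It assumes that exactly one of `contentUrls`, `contentContainerUrl`, or `dataset` is supplied (suggested by the `OnlyOneOfUrlsOrContainerOrDataset` error code) and that the container SAS permissions appear in the standard `sp` (signed permissions) query parameter of the SAS token.

```python
from urllib.parse import parse_qs, urlsplit

MAX_CONTENT_URLS = 1000

def validate_submit_body(body: dict) -> list[str]:
    """Return human-readable problems found in a Submit request body (empty if none)."""
    problems = []
    for field in ("displayName", "locale"):
        if not isinstance(body.get(field), str) or len(body[field]) < 1:
            problems.append(f"{field} is required and must be a non-empty string")
    if "properties" not in body:
        problems.append("properties is required")
    # Assumption: exactly one audio source (cf. OnlyOneOfUrlsOrContainerOrDataset).
    sources = [n for n in ("contentUrls", "contentContainerUrl", "dataset") if n in body]
    if len(sources) != 1:
        problems.append(f"expected exactly one audio source, got {sources}")
    if len(body.get("contentUrls", [])) > MAX_CONTENT_URLS:
        problems.append("contentUrls allows at most 1000 URLs")
    custom = body.get("customProperties", {})
    if len(custom) > 10:
        problems.append("customProperties allows at most 10 entries")
    for k, v in custom.items():
        if len(k) > 64 or len(str(v)) > 256:
            problems.append(f"customProperties entry {k!r} exceeds the length limits")
    container = body.get("contentContainerUrl")
    if container is not None:
        # 'sp' holds the signed permissions of an Azure SAS token, e.g. "rl".
        perms = parse_qs(urlsplit(container).query).get("sp", [""])[0]
        if not {"r", "l"} <= set(perms):
            problems.append("container SAS should grant 'r' (read) and 'l' (list)")
    return problems

body = {
    "displayName": "Transcription of storage container using default model for en-US",
    "locale": "en-US",
    "properties": {"timeToLiveHours": 48},
    # Placeholder SAS token; sp=rl grants read and list.
    "contentContainerUrl": "https://customspeech-usw.blob.core.windows.net/artifacts?sp=rl&sig=PLACEHOLDER",
}
print(validate_submit_body(body))  # []
```

These checks only mirror the documented limits; the service still decides what it accepts, and size limits such as 5 GB per container cannot be verified from the URL alone.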

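Responses

Name	Type	Description
201 Created	Transcription	The response contains information about the entity as the payload and its location as a header. Headers: Location: string
Other Status Codes	Error	An error occurred.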
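Because a 201 response returns the entity's location in the `Location` header, a client can recover the transcription ID from it. A small sketch, assuming the ID is always the last path segment as in the sample responses; `transcription_id_from_location` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def transcription_id_from_location(location: str) -> str:
    """Extract the transcription ID (last path segment) from a Location header value."""
    path = urlsplit(location).path  # drops the ?api-version=... query string
    return path.rstrip("/").rsplit("/", 1)[-1]

loc = ("https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/"
       "ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2025-10-15")
print(transcription_id_from_location(loc))  # ba7ea6f5-3065-40b7-b49a-a90f48584683
```

With `urllib.request.urlopen`, the header value would be available as `response.headers["Location"]`.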

Security

Ocp-Apim-Subscription-Key

Provide your Cognitive Services account key here.

Type: apiKey
In: header

Examples

Create a transcription for URIs

Create a transcription from blob container

Create a transcription with language identification

Create a transcription with multispeaker diarization

Create a transcription for URIs

Sample request

HTTP

POST {endpoint}/speechtotext/transcriptions:submit?api-version=2025-10-15


{
  "displayName": "Transcription using default model for en-US",
  "locale": "en-US",
  "contentUrls": [
    "https://contoso.com/mystoragelocation",
    "https://contoso.com/myotherstoragelocation"
  ],
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48
  }
}

Sample response

Status code: 201

{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2025-10-15",
  "displayName": "Transcription using adapted model en-US",
  "customProperties": {
    "key": "value"
  },
  "locale": "en-US",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "model": {
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2025-10-15"
  },
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683/files?api-version=2025-10-15"
  },
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "durationMilliseconds": 42000
  },
  "status": "Succeeded"
}

Create a transcription from blob container

Sample request

HTTP

POST {endpoint}/speechtotext/transcriptions:submit?api-version=2025-10-15


{
  "displayName": "Transcription of storage container using default model for en-US",
  "locale": "en-US",
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48
  },
  "contentContainerUrl": "https://customspeech-usw.blob.core.windows.net/artifacts/audiofiles/"
}

Sample response

Status code: 201

Location: https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2025-10-15

{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2025-10-15",
  "displayName": "Transcription using adapted model en-US",
  "customProperties": {
    "key": "value"
  },
  "locale": "en-US",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "model": {
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2025-10-15"
  },
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683/files?api-version=2025-10-15"
  },
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "durationMilliseconds": 42000
  },
  "status": "Succeeded"
}

Create a transcription with language identification

Sample request

HTTP

POST {endpoint}/speechtotext/transcriptions:submit?api-version=2025-10-15


{
  "displayName": "Transcription using language identification with three candidate languages, 'fr-FR' as fallback locale and a custom model for transcribing utterances that were classified as 'nl-NL' locale.",
  "locale": "fr-FR",
  "contentUrls": [
    "https://contoso.com/mystoragelocation"
  ],
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "languageIdentification": {
      "candidateLocales": [
        "fr-FR",
        "nl-NL",
        "el-GR"
      ],
      "speechModelMapping": {
        "nl-NL": {
          "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2025-10-15"
        }
      },
      "mode": "Single"
    }
  }
}

Sample response

Status code: 201

{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2025-10-15",
  "displayName": "Transcription using language identification with three candidate languages, 'fr-FR' as fallback locale and a custom model for transcribing utterances that were classified as 'nl-NL' locale.",
  "customProperties": {
    "key": "value"
  },
  "locale": "fr-FR",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "model": {
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2025-10-15"
  },
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683/files?api-version=2025-10-15"
  },
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "languageIdentification": {
      "candidateLocales": [
        "fr-FR",
        "nl-NL",
        "el-GR"
      ],
      "speechModelMapping": {
        "nl-NL": {
          "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2025-10-15"
        }
      },
      "mode": "Single"
    },
    "durationMilliseconds": 42000
  },
  "status": "Succeeded"
}

Create a transcription with multispeaker diarization

Sample request

HTTP

POST {endpoint}/speechtotext/transcriptions:submit?api-version=2025-10-15


{
  "displayName": "Transcription using diarization for audio that is known to contain speech from up to 5 speakers",
  "locale": "en-US",
  "contentUrls": [
    "https://contoso.com/mystoragelocation"
  ],
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "diarization": {
      "enabled": true,
      "maxSpeakers": 5
    }
  }
}

Sample response

Status code: 201

{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2025-10-15",
  "displayName": "Transcription using diarization for audio that is known to contain speech from up to 5 speakers",
  "customProperties": {
    "key": "value"
  },
  "locale": "en-US",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "model": {
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2025-10-15"
  },
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683/files?api-version=2025-10-15"
  },
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "diarization": {
      "enabled": true,
      "maxSpeakers": 5
    },
    "durationMilliseconds": 42000
  },
  "status": "Succeeded"
}

Definitions

Name	Description
DetailedErrorCode	Detailed error code
DiarizationProperties	DiarizationProperties
EntityError	Entity error
EntityReference	EntityReference
Error	Error
ErrorCode	ErrorCode
InnerError	InnerError
LanguageIdentificationMode	Language identification mode
LanguageIdentificationProperties	LanguageIdentificationProperties
ProfanityFilterMode	ProfanityFilterMode
PunctuationMode	PunctuationMode
Status	Status
Transcription	Transcription
TranscriptionLinks	TranscriptionLinks
TranscriptionProperties	TranscriptionProperties

DetailedErrorCode

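Enumeration

Detailed error code

Value	Description
InvalidParameterValue	Invalid parameter value.
InvalidRequestBodyFormat	Invalid request body format.
EmptyRequest	Empty request.
MissingInputRecords	Missing input records.
InvalidDocument	Invalid document.
ModelVersionIncorrect	Model version incorrect.
InvalidDocumentBatch	Invalid document batch.
UnsupportedLanguageCode	Unsupported language code.
DataImportFailed	Data import failed.
InUseViolation	In-use violation.
InvalidLocale	Invalid locale.
InvalidBaseModel	Invalid base model.
InvalidAdaptationMapping	Invalid adaptation mapping.
InvalidDataset	Invalid dataset.
InvalidTest	Invalid test.
FailedDataset	Failed dataset.
InvalidModel	Invalid model.
InvalidTranscription	Invalid transcription.
InvalidPayload	Invalid payload.
InvalidParameter	Invalid parameter.
EndpointWithoutLogging	Endpoint without logging.
InvalidPermissions	Invalid permissions.
InvalidPrerequisite	Invalid prerequisite.
InvalidProductId	Invalid product ID.
InvalidSubscription	Invalid subscription.
InvalidProject	Invalid project.
InvalidProjectKind	Invalid project kind.
InvalidRecordingsUri	Invalid recordings URI.
OnlyOneOfUrlsOrContainerOrDataset	Only one of URLs, container, or dataset may be specified.
ExceededNumberOfRecordingsUris	Exceeded the number of recordings URIs.
InvalidChannels	Invalid channels.
ModelMismatch	Model mismatch.
ProjectGenderMismatch	Project gender mismatch.
ModelDeprecated	Model deprecated.
ModelExists	Model exists.
ModelNotDeployable	Model not deployable.
EndpointNotUpdatable	Endpoint not updatable.
SingleDefaultEndpoint	Single default endpoint.
EndpointCannotBeDefault	Endpoint cannot be default.
InvalidModelUri	Invalid model URI.
SubscriptionNotFound	Subscription not found.
QuotaViolation	Quota violation.
UnsupportedDelta	Unsupported delta.
UnsupportedFilter	Unsupported filter.
UnsupportedPagination	Unsupported pagination.
UnsupportedDynamicConfiguration	Unsupported dynamic configuration.
UnsupportedOrderBy	Unsupported order by.
NoUtf8WithBom	No UTF-8 with BOM.
ModelDeploymentNotCompleteState	Model deployment not in complete state.
SkuLimitsExist	SKU limits exist.
DeployingFailedModel	Deploying a failed model.
UnsupportedTimeRange	Unsupported time range.
InvalidLogDate	Invalid log date.
InvalidLogId	Invalid log ID.
InvalidLogStartTime	Invalid log start time.
InvalidLogEndTime	Invalid log end time.
InvalidTopForLogs	Invalid top for logs.
InvalidSkipTokenForLogs	Invalid skip token for logs.
DeleteNotAllowed	Delete not allowed.
Forbidden	Forbidden.
DeployNotAllowed	Deploy not allowed.
UnexpectedError	Unexpected error.
InvalidCollection	Invalid collection.
InvalidCallbackUri	Invalid callback URI.
InvalidSasValidityDuration	Invalid SAS validity duration.
InaccessibleCustomerStorage	Inaccessible customer storage.
UnsupportedClassBasedAdaptation	Unsupported class-based adaptation.
InvalidWebHookEventKind	Invalid web hook event kind.
InvalidTimeToLive	Invalid time to live.
InvalidSourceAzureResourceId	Invalid source Azure resource ID.
ModelCopyAuthorizationExpired	ModelCopyAuthorization expired.
EndpointLoggingNotSupported	Endpoint logging is not supported.
NoLanguageIdentified	Language identification did not identify any language.
MultipleLanguagesIdentified	Language identification identified multiple languages; the dominant language could not be determined.
InvalidAudioFormat	The format of the input audio is not supported.
BadChannelConfiguration	There is a mismatch between the audio channels in the data, the configuration, or the application requirements.
InvalidChannelSpecification	The channel selection in the transcription request is not supported (for example, neither channel 0 nor channel 1 is selected).
AudioLengthLimitExceeded	The audio file exceeds the maximum allowed duration.
EmptyAudioFile	The audio file is empty.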

DiarizationProperties

Object

DiarizationProperties

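Name	Type	Description
enabled	boolean	A value indicating whether speaker diarization is enabled.
maxSpeakers	integer (int32) minimum: 2 maximum: 35	A hint for the maximum number of speakers for diarization. Must be greater than 1 and less than 36.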
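The `maxSpeakers` bounds can be enforced when building the `diarization` object. A minimal sketch with a hypothetical `diarization_properties` helper; it always sets `enabled` to `true`, on the assumption that a speaker-count hint is only meaningful when diarization is switched on.

```python
def diarization_properties(max_speakers: int) -> dict:
    """Build a DiarizationProperties object, enforcing the documented 2..35 range."""
    if not 2 <= max_speakers <= 35:
        raise ValueError("maxSpeakers must be between 2 and 35 inclusive")
    return {"enabled": True, "maxSpeakers": max_speakers}

# Mirrors the "multispeaker diarization" sample request (up to 5 speakers).
properties = {
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "diarization": diarization_properties(5),
}
```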

EntityError

Object

Entity error

Name	Type	Description
code	string	The code of this error.
message	string	The message for this error.

EntityReference

Object

EntityReference

Name	Type	Description
self	string (uri)	The location of the referenced entity.

Error

Object

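Error

Name	Type	Description
code	ErrorCode	ErrorCode High-level error code.
details	Error[]	Additional supporting details regarding the error and/or expected policies.
innerError	InnerError	InnerError New inner error format that conforms to the Cognitive Services API guidelines, available at https://microsoft.sharepoint.com/%3Aw%3A/t/CognitiveServicesPMO/EUoytcrjuJdKpeOKIK_QRC8BPtUYQpKBi8JsWyeDMRsWlQ?e=CPq8ow. It contains the required properties ErrorCode and message, and the optional properties target, details (key-value pairs), and inner error (which can be nested).
message	string	High-level error message.
target	string	The source of the error. For example, "documents" or "document id" in case of an invalid document.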

ErrorCode

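Enumeration

ErrorCode

Value	Description
InvalidRequest	Represents the invalid request error code.
InvalidArgument	Represents the invalid argument error code.
InternalServerError	Represents the internal server error code.
ServiceUnavailable	Represents the service unavailable error code.
NotFound	Represents the not found error code.
PipelineError	Represents the pipeline error code.
Conflict	Represents the conflict error code.
InternalCommunicationFailed	Represents the internal communication failed error code.
Forbidden	Represents the forbidden error code.
NotAllowed	Represents the not allowed error code.
Unauthorized	Represents the unauthorized error code.
UnsupportedMediaType	Represents the unsupported media type error code.
TooManyRequests	Represents the too many requests error code.
UnprocessableEntity	Represents the unprocessable entity error code.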
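The reference does not say which `ErrorCode` values are transient. The sketch below assumes, by analogy with common HTTP semantics, that throttling and server-side failures may succeed on retry with exponential backoff, while the remaining codes point to a problem with the request itself. The code set, the helper names, and the delay values are assumptions, not documented behavior.

```python
# Assumption: these top-level codes describe throttling or server-side conditions.
# The reference above does not classify ErrorCode values as retryable.
TRANSIENT_ERROR_CODES = frozenset({
    "TooManyRequests",
    "ServiceUnavailable",
    "InternalServerError",
    "InternalCommunicationFailed",
})

def is_probably_transient(error: dict) -> bool:
    """Guess from the top-level ErrorCode whether retrying the request may help."""
    return error.get("code") in TRANSIENT_ERROR_CODES

def backoff_delays(attempts: int, base_seconds: float = 1.0,
                   cap_seconds: float = 30.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ..., each capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(attempts)]

print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```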

InnerError

Object

InnerError

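Name	Type	Description
code	DetailedErrorCode	DetailedErrorCode Detailed error code enumeration.
details	object	Additional supporting details regarding the error and/or expected policies.
innerError	InnerError	InnerError New inner error format that conforms to the Cognitive Services API guidelines, available at https://microsoft.sharepoint.com/%3Aw%3A/t/CognitiveServicesPMO/EUoytcrjuJdKpeOKIK_QRC8BPtUYQpKBi8JsWyeDMRsWlQ?e=CPq8ow. It contains the required properties ErrorCode and message, and the optional properties target, details (key-value pairs), and inner error (which can be nested).
message	string	High-level error message.
target	string	The source of the error. For example, "documents" or "document id" in case of an invalid document.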
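Because `innerError` can be nested, the most specific `DetailedErrorCode` sits at the end of the chain. A minimal Python sketch that walks it; the helper names and the sample error body (including its messages) are hypothetical, shaped after the Error and InnerError tables above.

```python
def innermost_error(error: dict) -> dict:
    """Follow the innerError chain and return the deepest error object."""
    current = error
    while isinstance(current.get("innerError"), dict):
        current = current["innerError"]
    return current

def describe_error(error: dict) -> str:
    """Combine the top-level ErrorCode with the most detailed code found."""
    deepest = innermost_error(error)
    message = deepest.get("message") or error.get("message", "")
    return f"{error.get('code', 'unknown')}/{deepest.get('code', 'unknown')}: {message}"

# Hypothetical error body; the messages are invented for illustration.
sample_error = {
    "code": "InvalidRequest",
    "message": "The request is invalid.",
    "innerError": {
        "code": "InvalidLocale",
        "message": "The locale is not supported.",
    },
}
print(describe_error(sample_error))  # InvalidRequest/InvalidLocale: The locale is not supported.
```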

LanguageIdentificationMode

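Enumeration

Language identification mode

Value	Description
Continuous	Continuous language identification (default).
Single	Single language identification. If no language can be identified, the error code NoLanguageIdentified is returned to the user. If the language is ambiguous between several candidates, the error code MultipleLanguagesIdentified is returned to the user.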

LanguageIdentificationProperties

Object

LanguageIdentificationProperties

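Name	Type	Default value	Description
candidateLocales	string[]		The candidate locales for language identification (for example ["en-US", "de-DE", "es-ES"]). Continuous mode supports a minimum of 2 and a maximum of 10 candidate locales, including the main locale of the transcription. For single language identification, the number of candidate locales is unlimited.
mode	LanguageIdentificationMode	Continuous	LanguageIdentificationMode The mode used for language identification.
speechModelMapping	<string, EntityReference>		An optional mapping of locales to speech model entities. If no model is given for a locale, the default base model is used. Keys must be locales contained in the candidate locales; values are the entities of the models for the respective locales.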
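The candidate-locale and model-mapping rules above can be checked while building the `languageIdentification` object. A sketch with a hypothetical `language_identification` helper; it checks only the Continuous-mode count of 2 to 10 and that every `speechModelMapping` key is a candidate locale. It does not verify whether the main `locale` must appear among the candidates.

```python
from typing import Dict, List, Optional

def language_identification(candidate_locales: List[str], mode: str = "Continuous",
                            model_mapping: Optional[Dict[str, str]] = None) -> dict:
    """Build LanguageIdentificationProperties, checking the documented constraints."""
    if mode not in ("Continuous", "Single"):
        raise ValueError(f"unknown mode {mode!r}")
    # Continuous mode: 2 to 10 candidates. Single mode: no documented maximum.
    if mode == "Continuous" and not 2 <= len(candidate_locales) <= 10:
        raise ValueError("Continuous mode needs 2 to 10 candidate locales")
    mapping = model_mapping or {}
    unknown = set(mapping) - set(candidate_locales)
    if unknown:
        raise ValueError(f"speechModelMapping keys not in candidateLocales: {sorted(unknown)}")
    result = {"candidateLocales": list(candidate_locales), "mode": mode}
    if mapping:
        # Each value is an EntityReference, i.e. {"self": <model URI>}.
        result["speechModelMapping"] = {loc: {"self": uri} for loc, uri in mapping.items()}
    return result

# Reproduces the languageIdentification block of the sample request above.
lid = language_identification(
    ["fr-FR", "nl-NL", "el-GR"],
    mode="Single",
    model_mapping={
        "nl-NL": "https://westus.api.cognitive.microsoft.com/speechtotext/models/"
                 "827712a5-f942-4997-91c3-7c6cde35600b?api-version=2025-10-15",
    },
)
```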

ProfanityFilterMode

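Enumeration

ProfanityFilterMode

Value	Description
None	Disables profanity filtering.
Removed	Removes profanity.
Tags	Adds "profanity" XML tags around profanity.
Masked	Masks profanity with * except for the first letter, for example f***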

PunctuationMode

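Enumeration

PunctuationMode

Value	Description
None	No punctuation.
Dictated	Dictated punctuation only, that is, explicitly spoken punctuation.
Automatic	Automatic punctuation.
DictatedAndAutomatic	Dictated or automatic punctuation.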

Status

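Enumeration

Status

Value	Description
NotStarted	The long-running operation has not yet started.
Running	The long-running operation is currently being processed.
Succeeded	The long-running operation has completed successfully.
Failed	The long-running operation has failed.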
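Since `NotStarted` and `Running` are transient while `Succeeded` and `Failed` are terminal, a client typically polls until a terminal value appears. The sketch takes the status lookup as a callable. In practice this could be a GET on the transcription's `self` URL that reads its `status` field; that is an assumption, since the Get operation is not described on this page. The 30-second interval and the poll limit are arbitrary choices.

```python
import time
from typing import Callable

TERMINAL_STATES = {"Succeeded", "Failed"}

def wait_for_transcription(get_status: Callable[[], str],
                           interval_seconds: float = 30.0,
                           max_polls: int = 120,
                           sleep: Callable[[float], None] = time.sleep) -> str:
    """Call get_status until it returns a terminal Status value, then return that value."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # NotStarted or Running: wait before asking again.
        sleep(interval_seconds)
    raise TimeoutError(f"transcription still not finished after {max_polls} polls")

# Offline demo: a fake status source that walks through the lifecycle.
states = iter(["NotStarted", "Running", "Succeeded"])
waits = []
print(wait_for_transcription(lambda: next(states), sleep=waits.append))  # Succeeded
```

Injecting `sleep` keeps the loop testable without real delays.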

Transcription

Object

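Transcription

Name	Type	Description
contentContainerUrl	string (uri)	The URL of an Azure Blob container that contains the audio files. The container may be at most 5 GB in size and may contain at most 10,000 blobs. The maximum size of a single blob is 2.5 GB. The container SAS should include "r" (read) and "l" (list) permissions. This property is not returned in responses.
contentUrls	string[] (uri)	A list of content URLs from which the audio files to transcribe are fetched. Up to 1,000 URLs are allowed. This property is not returned in responses.
createdDateTime	string (date-time)	The timestamp when the object was created. The timestamp is encoded in ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ"; see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).
customProperties	object	Custom properties of this entity. Keys may be at most 64 characters long, values at most 256 characters long, and at most 10 entries are allowed.
dataset	EntityReference	EntityReference
description	string	The description of the object.
displayName	string minLength: 1	The display name of the object.
lastActionDateTime	string (date-time)	The timestamp when the current status was entered. The timestamp is encoded in ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ"; see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).
links	TranscriptionLinks	TranscriptionLinks
locale	string minLength: 1	The locale of the contained data. If language identification is used, this locale is used to transcribe speech for which no language could be detected.
model	EntityReference	EntityReference
project	EntityReference	EntityReference
properties	TranscriptionProperties	TranscriptionProperties
self	string (uri)	The location of this entity.
status	Status	Status Describes the current state of the API.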
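The two timestamps use a fixed `YYYY-MM-DDThh:mm:ssZ` layout, so they can be parsed with `datetime.strptime`. This sketch summarizes a Transcription object using values copied from the sample responses; it assumes the timestamps never carry fractional seconds, and `summarize` is a hypothetical helper.

```python
from datetime import datetime, timezone

def parse_timestamp(value: str) -> datetime:
    """Parse a "YYYY-MM-DDThh:mm:ssZ" timestamp as an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def summarize(transcription: dict) -> dict:
    """Report status, time from creation to the last status change, and audio length."""
    created = parse_timestamp(transcription["createdDateTime"])
    last_action = parse_timestamp(transcription["lastActionDateTime"])
    return {
        "status": transcription["status"],
        "secondsSinceCreated": (last_action - created).total_seconds(),
        "audioSeconds": transcription["properties"].get("durationMilliseconds", 0) / 1000,
    }

sample = {  # fields copied from the sample responses above
    "status": "Succeeded",
    "createdDateTime": "2019-01-07T11:34:12Z",
    "lastActionDateTime": "2019-01-07T11:36:07Z",
    "properties": {"durationMilliseconds": 42000},
}
print(summarize(sample))  # {'status': 'Succeeded', 'secondsSinceCreated': 115.0, 'audioSeconds': 42.0}
```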

TranscriptionLinks

Object

TranscriptionLinks

Name	Type	Description
files	string (uri)	The location to get all files of this entity. See the operation "Transcriptions_ListFiles" for more details.

TranscriptionProperties

Object

TranscriptionProperties

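Name	Type	Default value	Description
channels	integer[] (int32)		A collection of the requested channel numbers. By default, channels 0 and 1 are considered.
destinationContainerUrl	string (uri)		The requested destination container. Remarks: When a destination container is used in combination with `timeToLive`, the metadata of a transcription is deleted normally, but the data stored in the destination container, including transcription results, remains untouched, because no delete permissions are required for this container. To support automatic cleanup, either configure blob lifetimes on the container, or use "Bring your own storage (BYOS)" instead of `destinationContainerUrl`, where blobs can be cleaned up.
diarization	DiarizationProperties		DiarizationProperties
displayFormWordLevelTimestampsEnabled	boolean		A value indicating whether word-level timestamps for the display form are requested. The default value is `false`.
durationMilliseconds	integer (int64)	0	The duration of the transcription in milliseconds. Durations larger than 2^53-1 are not supported, to ensure compatibility with JavaScript integers.
error	EntityError		EntityError
languageIdentification	LanguageIdentificationProperties		LanguageIdentificationProperties
profanityFilterMode	ProfanityFilterMode		ProfanityFilterMode The mode of profanity filtering.
punctuationMode	PunctuationMode		PunctuationMode The mode used for punctuation.
timeToLiveHours	integer (int32)		How long the transcription is kept in the system after it has completed. Once the transcription reaches its time to live after completion (successful or failed), it is deleted automatically. Note: When using BYOS (bring your own storage), the result files on the customer-owned storage account are deleted as well. Use destinationContainerUrl to specify a separate container for result files that is not deleted when the time to live expires, or retrieve the result files through the API and store them as needed. The shortest supported duration is 6 hours and the longest is 31 days. 2 days (48 hours) is the recommended default when data is consumed directly.
wordLevelTimestampsEnabled	boolean		A value indicating whether word-level timestamps are requested. The default value is `false`.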
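The `timeToLiveHours` range (6 hours to 31 days, that is 744 hours) and the documented defaults can be captured in a small builder. A sketch with a hypothetical `transcription_properties` helper; the default of 48 hours follows the recommendation above for data consumed directly, and any other property (such as `punctuationMode` or `diarization`) is passed through unchanged.

```python
MIN_TTL_HOURS, MAX_TTL_HOURS = 6, 31 * 24  # 6 hours to 31 days

def transcription_properties(time_to_live_hours: int = 48, channels=(0, 1),
                             word_level_timestamps: bool = False, **extra) -> dict:
    """Build TranscriptionProperties, enforcing the documented timeToLiveHours range."""
    if not MIN_TTL_HOURS <= time_to_live_hours <= MAX_TTL_HOURS:
        raise ValueError(f"timeToLiveHours must be within {MIN_TTL_HOURS}..{MAX_TTL_HOURS}")
    props = {
        "timeToLiveHours": time_to_live_hours,
        "channels": list(channels),  # channels 0 and 1 are considered by default
        "wordLevelTimestampsEnabled": word_level_timestamps,
    }
    props.update(extra)  # e.g. punctuationMode, profanityFilterMode, diarization
    return props

props = transcription_properties(punctuationMode="DictatedAndAutomatic",
                                 profanityFilterMode="Masked")
```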
