Content Analyzers - Create Or Replace
Create a new analyzer asynchronously.
PUT {endpoint}/contentunderstanding/analyzers/{analyzerId}?api-version=2025-11-01
PUT {endpoint}/contentunderstanding/analyzers/{analyzerId}?api-version=2025-11-01&allowReplace={allowReplace}
URI Parameters
| Name | In | Required | Type | Description |
|---|---|---|---|---|
| analyzerId | path | True | string minLength: 1 maxLength: 64 pattern: ^[a-zA-Z0-9._-]{1,64}$ | The unique identifier of the analyzer. |
| endpoint | path | True | string (uri) | Content Understanding service endpoint. |
| api-version | query | True | string minLength: 1 | The API version to use for this operation. |
| allowReplace | query | | boolean | Allow the operation to replace an existing resource. |
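The URI parameters can be combined into a request URL as in the following minimal sketch. The helper name `build_analyzer_url` is hypothetical, and the lowercase `true`/`false` serialization of `allowReplace` is an assumption rather than something this page states.

```python
import re
from urllib.parse import urlencode

# Pattern copied from the analyzerId parameter description.
ANALYZER_ID_PATTERN = re.compile(r"^[a-zA-Z0-9._-]{1,64}$")


def build_analyzer_url(endpoint, analyzer_id, api_version="2025-11-01", allow_replace=None):
    """Return the PUT URL for Create Or Replace (hypothetical helper)."""
    if not ANALYZER_ID_PATTERN.fullmatch(analyzer_id):
        raise ValueError(f"invalid analyzerId: {analyzer_id!r}")
    query = {"api-version": api_version}
    if allow_replace is not None:
        # Assumption: booleans are sent as lowercase literals.
        query["allowReplace"] = "true" if allow_replace else "false"
    base = endpoint.rstrip("/")
    return f"{base}/contentunderstanding/analyzers/{analyzer_id}?{urlencode(query)}"
```

Validating the identifier on the client side avoids a round trip for IDs that cannot match the documented pattern.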
Request Header
| Name | Required | Type | Description |
|---|---|---|---|
| x-ms-client-request-id | | string (uuid) | An opaque, globally-unique, client-generated string identifier for the request. |
Request Body
| Name | Type | Description |
|---|---|---|
| baseAnalyzerId | string minLength: 1 maxLength: 64 pattern: ^[a-zA-Z0-9._-]{1,64}$ | The analyzer to incrementally train from. |
| config | ContentAnalyzerConfig | Analyzer configuration settings. |
| description | string | A description of the analyzer. |
| dynamicFieldSchema | boolean | Indicates whether the result may contain additional fields outside of the defined schema. |
| fieldSchema | ContentFieldSchema | The schema of fields to be extracted. |
| knowledgeSources | KnowledgeSource[] | Additional knowledge sources used to enhance the analyzer. |
| models | object | Mapping of model roles to specific model names. Example: { "completion": "gpt-4.1", "embedding": "text-embedding-3-large" }. |
| processingLocation | ProcessingLocation | The location where the data may be processed. Defaults to global. |
| tags | object | Tags associated with the analyzer. |
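A request body using these properties could be assembled as in the sketch below. The helper `build_analyzer_body` and its snake_case argument names are hypothetical. Leaving out properties that were not supplied is a design choice of this sketch, not a documented requirement.

```python
import json


def build_analyzer_body(description=None, base_analyzer_id=None, config=None,
                        field_schema=None, knowledge_sources=None, models=None,
                        processing_location=None, tags=None,
                        dynamic_field_schema=None):
    """Assemble a request body, omitting properties that were not supplied."""
    candidates = {
        "description": description,
        "baseAnalyzerId": base_analyzer_id,
        "config": config,
        "fieldSchema": field_schema,
        "knowledgeSources": knowledge_sources,
        "models": models,
        "processingLocation": processing_location,
        "tags": tags,
        "dynamicFieldSchema": dynamic_field_schema,
    }
    return {name: value for name, value in candidates.items() if value is not None}


body = build_analyzer_body(
    description="My analyzer",
    base_analyzer_id="prebuilt-document",
    config={"enableFormula": False, "returnDetails": True},
    tags={"createdBy": "John"},
)
payload = json.dumps(body)  # JSON text to send as the PUT request body
```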
Responses
| Name | Type | Description |
|---|---|---|
| 200 OK | ContentAnalyzer | The request has succeeded. |
| 201 Created | ContentAnalyzer | The request has succeeded and a new resource has been created as a result. |
| Other Status Codes | Azure.Core.Foundations.ErrorResponse | An unexpected error response. Headers: x-ms-error-code: string |
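One way a client might tell the success codes apart from error responses is sketched below. The names `AnalyzerRequestError` and `interpret_response` are hypothetical. The sketch assumes the headers arrive as a plain dict keyed by the exact header name and that the error body follows the error response shape defined later on this page.

```python
class AnalyzerRequestError(Exception):
    """Raised for any status code other than 200 or 201 (hypothetical type)."""

    def __init__(self, status_code, error_code, message):
        super().__init__(f"{status_code} {error_code}: {message}")
        self.status_code = status_code
        self.error_code = error_code


def interpret_response(status_code, headers, body):
    """Return the analyzer on success, raise AnalyzerRequestError otherwise."""
    if status_code in (200, 201):
        return {"created": status_code == 201, "analyzer": body}
    error = (body or {}).get("error", {})
    # Prefer the x-ms-error-code header, fall back to error.code in the body.
    code = headers.get("x-ms-error-code") or error.get("code")
    raise AnalyzerRequestError(status_code, code, error.get("message"))
```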
Security
Ocp-Apim-Subscription-Key
Key-based authentication using the access key of the Azure resource.
Type: apiKey
In: header
EntraIdToken
Microsoft Entra ID OAuth2 authentication using an access token.
Type: oauth2
Flow: accessCode
Authorization URL: https://login.microsoftonline.com/common/oauth2/authorize
Token URL: https://login.microsoftonline.com/common/oauth2/token
Scopes
| Name | Description |
|---|---|
| https://cognitiveservices.azure.com/.default | |
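The two security schemes translate into request headers roughly as follows. This is a sketch only: `build_headers` is a hypothetical helper, the `Bearer` prefix on the `Authorization` header is the usual OAuth2 convention and is assumed here, and the `Content-Type` value is likewise an assumption.

```python
import uuid


def build_headers(api_key=None, bearer_token=None):
    """Return request headers for exactly one of the two security schemes."""
    if (api_key is None) == (bearer_token is None):
        raise ValueError("provide exactly one of api_key or bearer_token")
    headers = {
        # Assumption: the JSON request body is declared as application/json.
        "Content-Type": "application/json",
        # Optional client-generated request identifier (uuid).
        "x-ms-client-request-id": str(uuid.uuid4()),
    }
    if api_key is not None:
        headers["Ocp-Apim-Subscription-Key"] = api_key
    else:
        # Assumption: the Entra ID access token is sent as a bearer token.
        headers["Authorization"] = f"Bearer {bearer_token}"
    return headers
```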
Examples
Create or Replace Analyzer
Sample request
PUT {endpoint}/contentunderstanding/analyzers/myAnalyzer?api-version=2025-11-01
{
  "description": "My analyzer",
  "tags": {
    "createdBy": "John"
  },
  "baseAnalyzerId": "prebuilt-document",
  "config": {
    "enableFormula": false,
    "returnDetails": true
  },
  "fieldSchema": {
    "name": "MyForm",
    "description": "My form",
    "fields": {
      "Company": {
        "type": "string",
        "description": "Name of company."
      }
    },
    "definitions": {}
  },
  "knowledgeSources": [
    {
      "kind": "labeledData",
      "containerUrl": "https://myStorageAccount.blob.core.windows.net/myContainer",
      "prefix": "trainingData",
      "fileListPath": "trainingData/fileList.jsonl"
    }
  ]
}
Sample response
Operation-Location: https://myendpoint.cognitiveservices.azure.com/contentunderstanding/analyzers/myAnalyzer/operations/3b31320d-8bab-4f88-b19c-2322a7f11034?api-version=2025-11-01
{
  "analyzerId": "myAnalyzer",
  "description": "My analyzer",
  "tags": {
    "createdBy": "John"
  },
  "status": "creating",
  "createdAt": "2025-05-01T18:46:36.051Z",
  "lastModifiedAt": "2025-05-01T18:46:36.051Z",
  "baseAnalyzerId": "prebuilt-document",
  "config": {
    "locales": null,
    "enableOcr": true,
    "enableLayout": true,
    "enableFormula": false,
    "returnDetails": true
  },
  "fieldSchema": {
    "name": "MyForm",
    "description": "My form",
    "fields": {
      "Company": {
        "type": "string",
        "description": "Name of company."
      }
    },
    "definitions": {}
  },
  "knowledgeSources": [
    {
      "kind": "labeledData",
      "containerUrl": "https://myStorageAccount.blob.core.windows.net/myContainer",
      "prefix": "trainingData",
      "fileListPath": "trainingData/fileList.jsonl"
    }
  ]
}
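The sample request and the Operation-Location header in the sample response could be handled from Python's standard library along these lines. The helper names are hypothetical, and treating the last path segment of the Operation-Location URL as the operation identifier is an assumption based on the sample URL shown above.

```python
import json
import urllib.request
from urllib.parse import urlparse


def make_put_request(url, body, headers):
    """Build (but do not send) the PUT request for Create Or Replace."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


def operation_id_from_location(operation_location):
    """Return the last path segment of an Operation-Location URL."""
    path = urlparse(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]
```

The request object can then be passed to `urllib.request.urlopen`, and the `Operation-Location` header of the response can be read to track the asynchronous creation.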
Definitions
| Name | Description |
|---|---|
| AnnotationFormat | Representation format of annotations in analyze result markdown. |
| Azure.Core.Foundations.Error | The error object. |
| Azure.Core.Foundations.ErrorResponse | A response containing error details. |
| Azure.Core.Foundations.InnerError | An object containing more specific information about the error. As per Azure REST API guidelines - https://aka.ms/AzureRestApiGuidelines#handling-errors. |
| ChartFormat | Representation format of charts in analyze result markdown. |
| ContentAnalyzer | Analyzer that extracts content and fields from multimodal documents. |
| ContentAnalyzerConfig | Configuration settings for an analyzer. |
| ContentAnalyzerStatus | Status of a resource. |
| ContentCategoryDefinition | Content category definition. |
| ContentFieldDefinition | Definition of the field using a JSON Schema like syntax. |
| ContentFieldSchema | Schema of fields to be extracted from documents. |
| ContentFieldType | Semantic data type of the field value. |
| GenerationMethod | Generation method. |
| KnowledgeSourceKind | Knowledge source kind. |
| LabeledDataKnowledgeSource | Labeled data knowledge source. |
| ProcessingLocation | The location where the data may be processed. Defaults to global. |
| SupportedModels | Chat completion and embedding models supported by the analyzer. |
| TableFormat | Representation format of tables in analyze result markdown. |
AnnotationFormat
Representation format of annotations in analyze result markdown.
| Value | Description |
|---|---|
| none | Do not represent annotations. |
| markdown | Represent basic annotation information using markdown formatting. |
Azure.Core.Foundations.Error
The error object.
| Name | Type | Description |
|---|---|---|
| code | string | One of a server-defined set of error codes. |
| details | Azure.Core.Foundations.Error[] | An array of details about specific errors that led to this reported error. |
| innererror | Azure.Core.Foundations.InnerError | An object containing more specific information than the current object about the error. |
| message | string | A human-readable representation of the error. |
| target | string | The target of the error. |
Azure.Core.Foundations.ErrorResponse
A response containing error details.
| Name | Type | Description |
|---|---|---|
| error | Azure.Core.Foundations.Error | The error object. |
Azure.Core.Foundations.InnerError
An object containing more specific information about the error. As per Azure REST API guidelines - https://aka.ms/AzureRestApiGuidelines#handling-errors.
| Name | Type | Description |
|---|---|---|
| code | string | One of a server-defined set of error codes. |
| innererror | Azure.Core.Foundations.InnerError | Inner error. |
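Because `innererror` can nest, a client that wants the most specific code has to walk the chain. A minimal sketch follows. The function name `innermost_error_code` is hypothetical, and the error codes used in the usage example are invented for illustration.

```python
def innermost_error_code(error):
    """Return the most deeply nested code found along the innererror chain."""
    code = error.get("code")
    inner = error.get("innererror")
    while isinstance(inner, dict):
        # Keep the outer code when an inner level carries no code of its own.
        code = inner.get("code", code)
        inner = inner.get("innererror")
    return code


# Hypothetical error payload, shaped like the "error" property of an error response.
example_error = {
    "code": "OuterCode",
    "message": "Something went wrong.",
    "innererror": {"code": "MiddleCode", "innererror": {"code": "InnerCode"}},
}
most_specific = innermost_error_code(example_error)
```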
ChartFormat
Representation format of charts in analyze result markdown.
| Value | Description |
|---|---|
| chartJs | Represent charts as Chart.js code blocks. |
| markdown | Represent charts as markdown tables. |
ContentAnalyzer
Analyzer that extracts content and fields from multimodal documents.
| Name | Type | Default value | Description |
|---|---|---|---|
| analyzerId | string minLength: 1 maxLength: 64 pattern: ^[a-zA-Z0-9._-]{1,64}$ | | The unique identifier of the analyzer. |
| baseAnalyzerId | string minLength: 1 maxLength: 64 pattern: ^[a-zA-Z0-9._-]{1,64}$ | | The analyzer to incrementally train from. |
| config | ContentAnalyzerConfig | | Analyzer configuration settings. |
| createdAt | string (date-time) | | The date and time when the analyzer was created. |
| description | string | | A description of the analyzer. |
| dynamicFieldSchema | boolean | False | Indicates whether the result may contain additional fields outside of the defined schema. |
| fieldSchema | ContentFieldSchema | | The schema of fields to be extracted. |
| knowledgeSources | KnowledgeSource[] | | Additional knowledge sources used to enhance the analyzer. |
| lastModifiedAt | string (date-time) | | The date and time when the analyzer was last modified. |
| models | object | | Mapping of model roles to specific model names. Example: { "completion": "gpt-4.1", "embedding": "text-embedding-3-large" }. |
| processingLocation | ProcessingLocation | global | The location where the data may be processed. Defaults to global. |
| status | ContentAnalyzerStatus | | The status of the analyzer. |
| supportedModels | SupportedModels | | Chat completion and embedding models supported by the analyzer. |
| tags | object | | Tags associated with the analyzer. |
| warnings | | | Warnings encountered while creating the analyzer. |
ContentAnalyzerConfig
Configuration settings for an analyzer.
| Name | Type | Default value | Description |
|---|---|---|---|
| annotationFormat | AnnotationFormat | markdown | Representation format of annotations in analyze result markdown. |
| chartFormat | ChartFormat | chartJs | Representation format of charts in analyze result markdown. |
| contentCategories | <string, ContentCategoryDefinition> | | Map of categories to classify the input content(s) against. |
| disableFaceBlurring | boolean | | Disable the default blurring of faces for privacy while processing the content. |
| enableFigureAnalysis | boolean | | Enable analysis of figures, such as charts and diagrams. |
| enableFigureDescription | boolean | | Enable generation of figure description. |
| enableFormula | boolean | | Enable mathematical formula detection. |
| enableLayout | boolean | | Enable layout analysis. |
| enableOcr | boolean | | Enable optical character recognition (OCR). |
| enableSegment | boolean | | Enable segmentation of the input by contentCategories. |
| estimateFieldSourceAndConfidence | boolean | | Return field grounding source and confidence. |
| locales | string[] | | List of locale hints for speech transcription. |
| omitContent | boolean | | Omit the content for this analyzer from analyze result. Only return content(s) from additional analyzers specified in contentCategories, if any. |
| returnDetails | boolean | | Return all content details. |
| segmentPerPage | boolean | | Force segmentation of document content by page. |
| tableFormat | TableFormat | html | Representation format of tables in analyze result markdown. |
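To see which format settings apply when a config leaves them out, the three default values listed for ContentAnalyzerConfig can be merged on the client side. This sketch only mirrors the documented defaults. The helper name is hypothetical, and it makes no claim about how the service resolves them.

```python
# Default values listed for ContentAnalyzerConfig on this page.
DOCUMENTED_DEFAULTS = {
    "annotationFormat": "markdown",
    "chartFormat": "chartJs",
    "tableFormat": "html",
}


def with_documented_defaults(config):
    """Return a copy of config with the documented format defaults filled in."""
    merged = dict(DOCUMENTED_DEFAULTS)
    merged.update(config)  # explicit settings win over defaults
    return merged


effective = with_documented_defaults({"tableFormat": "markdown", "enableFormula": False})
```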
ContentAnalyzerStatus
Status of a resource.
| Value | Description |
|---|---|
| creating | The resource is being created. |
| ready | The resource is ready. |
| deleting | The resource is being deleted. |
| failed | The resource failed during creation. |
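Since creation is asynchronous, a caller typically polls until the status leaves `creating`. The loop below is a sketch under assumptions: `get_status` is a hypothetical callable that returns the analyzer's current status string, and treating `ready` and `failed` as the terminal states of creation is an interpretation of the status values above, not a documented rule.

```python
import time

# Assumption: these two values end the creation phase.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_until_terminal(get_status, interval_seconds=2.0, max_attempts=30, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next poll
    raise TimeoutError(f"analyzer still not ready after {max_attempts} attempts")
```

The `sleep` parameter is injectable so the loop can be exercised without real delays.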
ContentCategoryDefinition
Content category definition.
| Name | Type | Description |
|---|---|---|
| analyzer | ContentAnalyzer | Optional inline definition of analyzer used to process the content. |
| analyzerId | string | Optional analyzer used to process the content. |
| description | string | The description of the category. |
ContentFieldDefinition
Definition of the field using a JSON Schema like syntax.
| Name | Type | Description |
|---|---|---|
| $ref | string | Reference to another field definition. |
| description | string | Field description. |
| enum | string[] | Enumeration of possible field values. |
| enumDescriptions | object | Descriptions for each enumeration value. |
| estimateSourceAndConfidence | boolean | Return grounding source and confidence. |
| examples | string[] | Examples of field values. |
| items | ContentFieldDefinition | Field type schema of each array element, if type is array. |
| method | GenerationMethod | Generation method. |
| properties | <string, ContentFieldDefinition> | Named sub-fields, if type is object. |
| type | ContentFieldType | Semantic data type of the field value. |
ContentFieldSchema
Schema of fields to be extracted from documents.
| Name | Type | Description |
|---|---|---|
| definitions | <string, ContentFieldDefinition> | Additional definitions referenced by the fields in the schema. |
| description | string | A description of the field schema. |
| fields | <string, ContentFieldDefinition> | The fields defined in the schema. |
| name | string | The name of the field schema. |
ContentFieldType
Semantic data type of the field value.
| Value | Description |
|---|---|
| string | Plain text. |
| date | Date, normalized to ISO 8601 (YYYY-MM-DD) format. |
| time | Time, normalized to ISO 8601 (hh:mm:ss) format. |
| number | Number as double precision floating point. |
| integer | Integer as 64-bit signed integer. |
| boolean | Boolean value. |
| array | List of subfields of the same type. |
| object | Named list of subfields. |
| json | JSON object. |
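A field schema can be checked against these type values before it is sent. The sketch below walks `items` and `properties` recursively and reports the paths of definitions whose `type` is not one of the listed values. The function name and the `[]` path notation for array elements are hypothetical.

```python
# Values listed for ContentFieldType on this page.
FIELD_TYPES = {"string", "date", "time", "number", "integer",
               "boolean", "array", "object", "json"}


def unknown_field_types(fields, path=""):
    """Return paths of field definitions whose type is not a listed value."""
    problems = []
    for name, definition in fields.items():
        here = f"{path}.{name}" if path else name
        field_type = definition.get("type")
        # Definitions without a type (for example $ref-only ones) are skipped.
        if field_type is not None and field_type not in FIELD_TYPES:
            problems.append(here)
        if "items" in definition:
            problems += unknown_field_types({"[]": definition["items"]}, here)
        problems += unknown_field_types(definition.get("properties", {}), here)
    return problems


example_fields = {
    "Company": {"type": "string", "description": "Name of company."},
    "Items": {
        "type": "array",
        "items": {"type": "object", "properties": {"Qty": {"type": "int"}}},
    },
}
bad_paths = unknown_field_types(example_fields)
```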
GenerationMethod
Generation method.
| Value | Description |
|---|---|
| generate | Values are generated freely based on the content. |
| extract | Values are extracted as they appear in the content. |
| classify | Values are classified against a predefined set of categories. |
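A field definition that sets `method` might be built as sketched here. The helper `make_field` is hypothetical, and pairing `classify` with an `enum` list of categories is an assumption drawn from the property descriptions on this page rather than a documented requirement.

```python
# Values listed for GenerationMethod on this page.
GENERATION_METHODS = {"generate", "extract", "classify"}


def make_field(field_type, description, method=None, categories=None):
    """Build a field definition dict, validating the generation method."""
    if method is not None and method not in GENERATION_METHODS:
        raise ValueError(f"unknown generation method: {method!r}")
    field = {"type": field_type, "description": description}
    if method is not None:
        field["method"] = method
    if categories is not None:
        # Assumption: the categories for classify are supplied through enum.
        field["enum"] = list(categories)
    return field


doc_kind = make_field("string", "Kind of document.", method="classify",
                      categories=["invoice", "receipt"])
```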
KnowledgeSourceKind
Knowledge source kind.
| Value | Description |
|---|---|
| labeledData | A labeled data knowledge source. |
LabeledDataKnowledgeSource
Labeled data knowledge source.
| Name | Type | Description |
|---|---|---|
| containerUrl | string (uri) | The URL of the blob container containing labeled data. |
| fileListPath | string | An optional path to a file listing specific blobs to include. |
| kind | string: labeledData | The kind of knowledge source. |
| prefix | string | An optional prefix to filter blobs within the container. |
ProcessingLocation
The location where the data may be processed. Defaults to global.
| Value | Description |
|---|---|
| geography | Data may be processed in the same geography as the resource. |
| dataZone | Data may be processed in the same data zone as the resource. |
| global | Data may be processed in any Azure data center globally. |
SupportedModels
Chat completion and embedding models supported by the analyzer.
| Name | Type | Description |
|---|---|---|
| completion | object | Chat completion models supported by the analyzer. |
| embedding | object | Embedding models supported by the analyzer. |
TableFormat
Representation format of tables in analyze result markdown.
| Value | Description |
|---|---|
| html | Represent tables using HTML table elements: <table>, <th>, <tr>, <td>. |
| markdown | Represent tables using GitHub Flavored Markdown table syntax, which does not support merged cells or rich headers. |