Content Analyzers - Create Or Replace

Create a new analyzer asynchronously, or replace an existing one when allowReplace permits it.

PUT {endpoint}/contentunderstanding/analyzers/{analyzerId}?api-version=2025-11-01
PUT {endpoint}/contentunderstanding/analyzers/{analyzerId}?api-version=2025-11-01&allowReplace={allowReplace}

URI Parameters

Name In Required Type Description
analyzerId
path True

string

minLength: 1
maxLength: 64
pattern: ^[a-zA-Z0-9._-]{1,64}$

The unique identifier of the analyzer.

endpoint
path True

string (uri)

Content Understanding service endpoint.

api-version
query True

string

minLength: 1

The API version to use for this operation.

allowReplace
query

boolean

Allow the operation to replace an existing resource.
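As an illustration of how the URI parameters above fit together, the following Python sketch validates an analyzer ID against the documented pattern and assembles the request URL. The helper name `build_analyzer_url` and the `example.test` endpoint are hypothetical; only the path, the pattern, and the query parameter names come from this page. Sending `allowReplace` as the lowercase strings `true` and `false` is an assumption.

```python
import re
from urllib.parse import urlencode

# Pattern copied from the analyzerId row of the URI Parameters table.
ANALYZER_ID_PATTERN = re.compile(r"^[a-zA-Z0-9._-]{1,64}$")


def build_analyzer_url(endpoint, analyzer_id, api_version="2025-11-01", allow_replace=None):
    """Return the PUT URL for an analyzer (hypothetical helper, not part of any SDK)."""
    if not ANALYZER_ID_PATTERN.fullmatch(analyzer_id):
        raise ValueError(f"analyzerId {analyzer_id!r} does not match the documented pattern")
    query = {"api-version": api_version}
    if allow_replace is not None:
        # Assumption: the boolean is sent as a lowercase literal.
        query["allowReplace"] = "true" if allow_replace else "false"
    base = endpoint.rstrip("/")
    return f"{base}/contentunderstanding/analyzers/{analyzer_id}?{urlencode(query)}"


print(build_analyzer_url("https://example.test", "myAnalyzer", allow_replace=True))
```

Validating the ID on the client avoids a round trip for an ID the service would reject anyway.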

Request Header

Name Required Type Description
x-ms-client-request-id

string (uuid)

An opaque, globally-unique, client-generated string identifier for the request.

Request Body

Name Type Description
baseAnalyzerId

string

minLength: 1
maxLength: 64
pattern: ^[a-zA-Z0-9._-]{1,64}$

The analyzer to incrementally train from.

config

ContentAnalyzerConfig

Analyzer configuration settings.

description

string

A description of the analyzer.

dynamicFieldSchema

boolean

Indicates whether the result may contain additional fields outside of the defined schema.

fieldSchema

ContentFieldSchema

The schema of fields to be extracted.

knowledgeSources

KnowledgeSource[]: LabeledDataKnowledgeSource[]

Additional knowledge sources used to enhance the analyzer.

models

object

Mapping of model roles to specific model names. For example: { "completion": "gpt-4.1", "embedding": "text-embedding-3-large" }.

processingLocation

ProcessingLocation

The location where the data may be processed. Defaults to global.

tags

object

Tags associated with the analyzer.

Responses

Name Type Description
200 OK

ContentAnalyzer

The request has succeeded.

Headers

  • Operation-Location: string
  • x-ms-client-request-id: string

201 Created

ContentAnalyzer

The request has succeeded and a new resource has been created as a result.

Headers

  • Operation-Location: string
  • x-ms-client-request-id: string

Other Status Codes

Azure.Core.Foundations.ErrorResponse

An unexpected error response.

Headers

  • x-ms-error-code: string
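Both success responses carry an Operation-Location header. The sketch below shows one way a client might pull the analyzer ID and operation ID out of that header value. The path layout `.../analyzers/{analyzerId}/operations/{operationId}` is inferred only from the sample response on this page, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def parse_operation_location(value):
    """Split an Operation-Location URL into (analyzer_id, operation_id).

    Assumes the path ends in /analyzers/{analyzerId}/operations/{operationId},
    as in the sample response; raises ValueError for any other layout.
    """
    segments = [s for s in urlsplit(value).path.split("/") if s]
    if len(segments) < 4 or segments[-4] != "analyzers" or segments[-2] != "operations":
        raise ValueError(f"unexpected Operation-Location layout: {value!r}")
    return segments[-3], segments[-1]


location = (
    "https://myendpoint.cognitiveservices.azure.com/contentunderstanding/analyzers/"
    "myAnalyzer/operations/3b31320d-8bab-4f88-b19c-2322a7f11034?api-version=2025-11-01"
)
print(parse_operation_location(location))
```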

Security

Ocp-Apim-Subscription-Key

Key-based authentication using the access key of the Azure resource.

Type: apiKey
In: header

EntraIdToken

Microsoft Entra ID OAuth2 authentication using an access token.

Type: oauth2
Flow: accessCode
Authorization URL: https://login.microsoftonline.com/common/oauth2/authorize
Token URL: https://login.microsoftonline.com/common/oauth2/token

Scopes

Name Description
https://cognitiveservices.azure.com/.default
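The two security schemes translate into different request headers. The following is a minimal sketch, assuming the Entra ID access token is sent as a standard OAuth2 bearer token in the Authorization header and that the body is sent as JSON. The function name `build_headers` is hypothetical, and obtaining the token itself is out of scope here.

```python
import uuid


def build_headers(api_key=None, bearer_token=None, client_request_id=None):
    """Build request headers for exactly one of the two documented auth schemes."""
    if (api_key is None) == (bearer_token is None):
        raise ValueError("provide exactly one of api_key or bearer_token")
    headers = {
        # Assumption: the request body is JSON.
        "Content-Type": "application/json",
        # Optional header from the Request Header table; must be a UUID string.
        "x-ms-client-request-id": client_request_id or str(uuid.uuid4()),
    }
    if api_key is not None:
        headers["Ocp-Apim-Subscription-Key"] = api_key
    else:
        # Assumption: standard OAuth2 bearer usage for the access token.
        headers["Authorization"] = f"Bearer {bearer_token}"
    return headers


print(sorted(build_headers(api_key="<your-key>")))
```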

Examples

Create or Replace Analyzer

Sample request

PUT {endpoint}/contentunderstanding/analyzers/myAnalyzer?api-version=2025-11-01

{
  "description": "My analyzer",
  "tags": {
    "createdBy": "John"
  },
  "baseAnalyzerId": "prebuilt-document",
  "config": {
    "enableFormula": false,
    "returnDetails": true
  },
  "fieldSchema": {
    "name": "MyForm",
    "description": "My form",
    "fields": {
      "Company": {
        "type": "string",
        "description": "Name of company."
      }
    },
    "definitions": {}
  },
  "knowledgeSources": [
    {
      "kind": "labeledData",
      "containerUrl": "https://myStorageAccount.blob.core.windows.net/myContainer",
      "prefix": "trainingData",
      "fileListPath": "trainingData/fileList.jsonl"
    }
  ]
}
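For readers who want to issue the sample request from code, here is a hedged sketch using only the Python standard library. It builds the PUT request but does not send it. The endpoint `https://example.test`, the placeholder key, and the helper name `make_create_request` are assumptions for illustration; key-based authentication is used for brevity.

```python
import json
import urllib.request


def make_create_request(endpoint, analyzer_id, body, api_key):
    """Build (but do not send) the Create Or Replace request."""
    url = (
        f"{endpoint.rstrip('/')}/contentunderstanding/analyzers/"
        f"{analyzer_id}?api-version=2025-11-01"
    )
    headers = {
        "Content-Type": "application/json",  # assumption: JSON body
        "Ocp-Apim-Subscription-Key": api_key,
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


body = {
    "description": "My analyzer",
    "baseAnalyzerId": "prebuilt-document",
    "config": {"enableFormula": False, "returnDetails": True},
    "fieldSchema": {
        "name": "MyForm",
        "fields": {"Company": {"type": "string", "description": "Name of company."}},
    },
}
request = make_create_request("https://example.test", "myAnalyzer", body, "<your-key>")
print(request.get_method(), request.full_url)

# To actually send it (requires a real endpoint and key):
# with urllib.request.urlopen(request) as response:
#     operation_location = response.headers.get("Operation-Location")
#     analyzer = json.load(response)
```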

Sample response

Operation-Location: https://myendpoint.cognitiveservices.azure.com/contentunderstanding/analyzers/myAnalyzer/operations/3b31320d-8bab-4f88-b19c-2322a7f11034?api-version=2025-11-01
{
  "analyzerId": "myAnalyzer",
  "description": "My analyzer",
  "tags": {
    "createdBy": "John"
  },
  "status": "creating",
  "createdAt": "2025-05-01T18:46:36.051Z",
  "lastModifiedAt": "2025-05-01T18:46:36.051Z",
  "baseAnalyzerId": "prebuilt-document",
  "config": {
    "locales": null,
    "enableOcr": true,
    "enableLayout": true,
    "enableFormula": false,
    "returnDetails": true
  },
  "fieldSchema": {
    "name": "MyForm",
    "description": "My form",
    "fields": {
      "Company": {
        "type": "string",
        "description": "Name of company."
      }
    },
    "definitions": {}
  },
  "knowledgeSources": [
    {
      "kind": "labeledData",
      "containerUrl": "https://myStorageAccount.blob.core.windows.net/myContainer",
      "prefix": "trainingData",
      "fileListPath": "trainingData/fileList.jsonl"
    }
  ]
}

Definitions

Name Description
AnnotationFormat

Representation format of annotations in analyze result markdown.

Azure.Core.Foundations.Error

The error object.

Azure.Core.Foundations.ErrorResponse

A response containing error details.

Azure.Core.Foundations.InnerError

An object containing more specific information about the error, as described in the Azure REST API guidelines: https://aka.ms/AzureRestApiGuidelines#handling-errors.

ChartFormat

Representation format of charts in analyze result markdown.

ContentAnalyzer

Analyzer that extracts content and fields from multimodal documents.

ContentAnalyzerConfig

Configuration settings for an analyzer.

ContentAnalyzerStatus

Status of a resource.

ContentCategoryDefinition

Content category definition.

ContentFieldDefinition

Definition of the field using a JSON Schema-like syntax.

ContentFieldSchema

Schema of fields to be extracted from documents.

ContentFieldType

Semantic data type of the field value.

GenerationMethod

Generation method.

KnowledgeSourceKind

Knowledge source kind.

LabeledDataKnowledgeSource

Labeled data knowledge source.

ProcessingLocation

The location where the data may be processed. Defaults to global.

SupportedModels

Chat completion and embedding models supported by the analyzer.

TableFormat

Representation format of tables in analyze result markdown.

AnnotationFormat

Representation format of annotations in analyze result markdown.

Value Description
none

Do not represent annotations.

markdown

Represent basic annotation information using markdown formatting.

Azure.Core.Foundations.Error

The error object.

Name Type Description
code

string

One of a server-defined set of error codes.

details

Azure.Core.Foundations.Error[]

An array of details about specific errors that led to this reported error.

innererror

Azure.Core.Foundations.InnerError

An object containing more specific information than the current object about the error.

message

string

A human-readable representation of the error.

target

string

The target of the error.

Azure.Core.Foundations.ErrorResponse

A response containing error details.

Name Type Description
error

Azure.Core.Foundations.Error

The error object.

Azure.Core.Foundations.InnerError

An object containing more specific information about the error, as described in the Azure REST API guidelines: https://aka.ms/AzureRestApiGuidelines#handling-errors.

Name Type Description
code

string

One of a server-defined set of error codes.

innererror

Azure.Core.Foundations.InnerError

Inner error.
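Because InnerError nests recursively, the most specific code sits at the bottom of the innererror chain. The sketch below walks that chain for a parsed ErrorResponse body. The function name and the error code values in the example payload are hypothetical and are not taken from the service.

```python
def innermost_error_code(error_response):
    """Return the deepest non-empty code in error -> innererror -> innererror ..."""
    error = error_response.get("error") or {}
    code = error.get("code")
    inner = error.get("innererror")
    while isinstance(inner, dict):
        if inner.get("code"):
            code = inner["code"]
        inner = inner.get("innererror")
    return code


# Illustrative payload; the code strings are made up for this example.
payload = {
    "error": {
        "code": "InvalidRequest",
        "message": "The request is invalid.",
        "innererror": {"code": "InvalidFieldSchema", "innererror": {"code": "UnknownFieldType"}},
    }
}
print(innermost_error_code(payload))
```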

ChartFormat

Representation format of charts in analyze result markdown.

Value Description
chartJs

Represent charts as Chart.js code blocks.

markdown

Represent charts as markdown tables.

ContentAnalyzer

Analyzer that extracts content and fields from multimodal documents.

Name Type Default value Description
analyzerId

string

minLength: 1
maxLength: 64
pattern: ^[a-zA-Z0-9._-]{1,64}$

The unique identifier of the analyzer.

baseAnalyzerId

string

minLength: 1
maxLength: 64
pattern: ^[a-zA-Z0-9._-]{1,64}$

The analyzer to incrementally train from.

config

ContentAnalyzerConfig

Analyzer configuration settings.

createdAt

string (date-time)

The date and time when the analyzer was created.

description

string

A description of the analyzer.

dynamicFieldSchema

boolean

False

Indicates whether the result may contain additional fields outside of the defined schema.

fieldSchema

ContentFieldSchema

The schema of fields to be extracted.

knowledgeSources

KnowledgeSource[]: LabeledDataKnowledgeSource[]

Additional knowledge sources used to enhance the analyzer.

lastModifiedAt

string (date-time)

The date and time when the analyzer was last modified.

models

object

Mapping of model roles to specific model names. For example: { "completion": "gpt-4.1", "embedding": "text-embedding-3-large" }.

processingLocation

ProcessingLocation

global

The location where the data may be processed. Defaults to global.

status

ContentAnalyzerStatus

The status of the analyzer.

supportedModels

SupportedModels

Chat completion and embedding models supported by the analyzer.

tags

object

Tags associated with the analyzer.

warnings

Azure.Core.Foundations.Error[]

Warnings encountered while creating the analyzer.

ContentAnalyzerConfig

Configuration settings for an analyzer.

Name Type Default value Description
annotationFormat

AnnotationFormat

markdown

Representation format of annotations in analyze result markdown.

chartFormat

ChartFormat

chartJs

Representation format of charts in analyze result markdown.

contentCategories

<string,  ContentCategoryDefinition>

Map of categories to classify the input content(s) against.

disableFaceBlurring

boolean

Disable the default blurring of faces for privacy while processing the content.

enableFigureAnalysis

boolean

Enable analysis of figures, such as charts and diagrams.

enableFigureDescription

boolean

Enable generation of figure description.

enableFormula

boolean

Enable mathematical formula detection.

enableLayout

boolean

Enable layout analysis.

enableOcr

boolean

Enable optical character recognition (OCR).

enableSegment

boolean

Enable segmentation of the input by contentCategories.

estimateFieldSourceAndConfidence

boolean

Return field grounding source and confidence.

locales

string[]

List of locale hints for speech transcription.

omitContent

boolean

Omit the content for this analyzer from analyze result. Only return content(s) from additional analyzers specified in contentCategories, if any.

returnDetails

boolean

Return all content details.

segmentPerPage

boolean

Force segmentation of document content by page.

tableFormat

TableFormat

html

Representation format of tables in analyze result markdown.
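To make the relationship between enableSegment and contentCategories concrete, here is a small sketch that assembles a config fragment from a mapping of category names. The helper name, the category names, and the analyzer ID `myInvoiceAnalyzer` are hypothetical. That enableSegment should be paired with a non-empty contentCategories map is an assumption drawn from the property descriptions above.

```python
def segmentation_config(categories):
    """Build a ContentAnalyzerConfig fragment.

    categories maps a category name to (description, analyzer_id or None).
    """
    content_categories = {}
    for name, (description, analyzer_id) in categories.items():
        definition = {"description": description}
        if analyzer_id is not None:
            # Optional analyzer used to process content in this category.
            definition["analyzerId"] = analyzer_id
        content_categories[name] = definition
    return {"enableSegment": True, "contentCategories": content_categories}


config = segmentation_config(
    {
        "invoice": ("Invoices and bills.", "myInvoiceAnalyzer"),
        "other": ("Anything that is not an invoice.", None),
    }
)
print(config)
```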

ContentAnalyzerStatus

Status of a resource.

Value Description
creating

The resource is being created.

ready

The resource is ready.

deleting

The resource is being deleted.

failed

The resource failed during creation.
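Since creation is asynchronous, a client typically waits until the status leaves creating. The following sketch polls until the status is ready or failed. How the analyzer is fetched is left to an injected callable, `fetch_analyzer`, which is hypothetical and is expected to return a dict with a status field shaped like ContentAnalyzer. The interval and attempt limits are arbitrary defaults.

```python
import time

TERMINAL_STATUSES = {"ready", "failed"}


def wait_until_terminal(fetch_analyzer, interval_seconds=2.0, max_attempts=30, sleep=time.sleep):
    """Poll fetch_analyzer() until its status is ready or failed."""
    for _ in range(max_attempts):
        analyzer = fetch_analyzer()
        if analyzer.get("status") in TERMINAL_STATUSES:
            return analyzer
        sleep(interval_seconds)
    raise TimeoutError(f"analyzer not ready after {max_attempts} attempts")


# Simulated sequence of responses, so the example runs without a service.
responses = iter([{"status": "creating"}, {"status": "creating"}, {"status": "ready"}])
result = wait_until_terminal(lambda: next(responses), sleep=lambda seconds: None)
print(result["status"])
```

Injecting `sleep` keeps the loop testable without real delays.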

ContentCategoryDefinition

Content category definition.

Name Type Description
analyzer

ContentAnalyzer

Optional inline definition of analyzer used to process the content.

analyzerId

string

Optional analyzer used to process the content.

description

string

The description of the category.

ContentFieldDefinition

Definition of the field using a JSON Schema-like syntax.

Name Type Description
$ref

string

Reference to another field definition.

description

string

Field description.

enum

string[]

Enumeration of possible field values.

enumDescriptions

object

Descriptions for each enumeration value.

estimateSourceAndConfidence

boolean

Return grounding source and confidence.

examples

string[]

Examples of field values.

items

ContentFieldDefinition

Field type schema of each array element, if type is array.

method

GenerationMethod

Generation method.

properties

<string,  ContentFieldDefinition>

Named sub-fields, if type is object.

type

ContentFieldType

Semantic data type of the field value.

ContentFieldSchema

Schema of fields to be extracted from documents.

Name Type Description
definitions

<string,  ContentFieldDefinition>

Additional definitions referenced by the fields in the schema.

description

string

A description of the field schema.

fields

<string,  ContentFieldDefinition>

The fields defined in the schema.

name

string

The name of the field schema.
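As an illustration of how fields, items, and properties nest, the sketch below builds a field schema with a plain string field, a classified field, and an array of objects. The field names are invented for the example. Pairing method classify with an enum list is an assumption based on the property descriptions, not something this page states.

```python
import json

field_schema = {
    "name": "MyForm",
    "description": "My form",
    "fields": {
        "Company": {"type": "string", "method": "extract", "description": "Name of company."},
        # Assumption: classify draws its categories from enum.
        "DocumentKind": {
            "type": "string",
            "method": "classify",
            "enum": ["invoice", "receipt"],
            "description": "Kind of document.",
        },
        # An array whose elements are objects with named sub-fields.
        "Items": {
            "type": "array",
            "description": "Line items.",
            "items": {
                "type": "object",
                "properties": {
                    "Description": {"type": "string"},
                    "Amount": {"type": "number"},
                },
            },
        },
    },
}

print(json.dumps(field_schema, indent=2))
```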

ContentFieldType

Semantic data type of the field value.

Value Description
string

Plain text.

date

Date, normalized to ISO 8601 (YYYY-MM-DD) format.

time

Time, normalized to ISO 8601 (hh:mm:ss) format.

number

Number as double precision floating point.

integer

Integer as 64-bit signed integer.

boolean

Boolean value.

array

List of subfields of the same type.

object

Named list of subfields.

json

JSON object.

GenerationMethod

Generation method.

Value Description
generate

Values are generated freely based on the content.

extract

Values are extracted as they appear in the content.

classify

Values are classified against a predefined set of categories.
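The enumerations for field type and generation method lend themselves to a quick client-side check before a schema is submitted. The following sketch walks a field definition recursively and reports values that are not in the documented lists. It is a hypothetical helper, not service behavior, and it does not attempt to resolve $ref.

```python
# Values copied from the ContentFieldType and GenerationMethod tables.
FIELD_TYPES = {"string", "date", "time", "number", "integer", "boolean", "array", "object", "json"}
GENERATION_METHODS = {"generate", "extract", "classify"}


def validate_field(name, definition):
    """Return a list of problems found in a ContentFieldDefinition dict."""
    problems = []
    field_type = definition.get("type")
    if field_type is not None and field_type not in FIELD_TYPES:
        problems.append(f"{name}: unknown type {field_type!r}")
    method = definition.get("method")
    if method is not None and method not in GENERATION_METHODS:
        problems.append(f"{name}: unknown method {method!r}")
    if field_type == "array" and isinstance(definition.get("items"), dict):
        problems += validate_field(f"{name}[]", definition["items"])
    if field_type == "object":
        for sub_name, sub_definition in (definition.get("properties") or {}).items():
            problems += validate_field(f"{name}.{sub_name}", sub_definition)
    return problems


items = {
    "type": "array",
    "items": {"type": "object", "properties": {"Amount": {"type": "float", "method": "guess"}}},
}
print(validate_field("Items", items))
```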

KnowledgeSourceKind

Knowledge source kind.

Value Description
labeledData

A labeled data knowledge source.

LabeledDataKnowledgeSource

Labeled data knowledge source.

Name Type Description
containerUrl

string (uri)

The URL of the blob container containing labeled data.

fileListPath

string

An optional path to a file listing specific blobs to include.

kind

string: labeledData

The kind of knowledge source.

prefix

string

An optional prefix to filter blobs within the container.

ProcessingLocation

The location where the data may be processed. Defaults to global.

Value Description
geography

Data may be processed in the same geography as the resource.

dataZone

Data may be processed in the same data zone as the resource.

global

Data may be processed in any Azure data center globally.

SupportedModels

Chat completion and embedding models supported by the analyzer.

Name Type Description
completion

object

Chat completion models supported by the analyzer.

embedding

object

Embedding models supported by the analyzer.

TableFormat

Representation format of tables in analyze result markdown.

Value Description
html

Represent tables using HTML table elements: <table>, <th>, <tr>, <td>.

markdown

Represent tables using GitHub Flavored Markdown table syntax, which does not support merged cells or rich headers.