Tutorial: Vectorize from a structured document layout

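Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows you how to build a multimodal indexing pipeline that chunks data based on document structure and uses multimodal embeddings to vectorize text and images from the same document. Cropped images are stored in a knowledge store, and both text and visual content are vectorized and ingested into a searchable index. Chunking is based on the Azure Document Intelligence in Foundry Tools layout model, which recognizes document structure.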

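In this tutorial, you use:

A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
An indexer and skillset to create an indexing pipeline that includes AI enrichment through skills.
The Document Layout skill, which extracts text and normalized images from various documents along with their locationMetadata, such as page number or bounding region.
The Azure Vision multimodal embeddings skill, which vectorizes text and images.
A search index configured to store the extracted text and image content. Some content is vectorized for vector-based similarity search.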

Prerequisites

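A Microsoft Foundry resource. This resource provides access to the Azure Vision multimodal embedding model and the Azure Document Intelligence layout model used by the skills in this tutorial. You must use a Foundry resource for skillset access to these models.
Azure AI Search. Configure your search service for role-based access control and a managed identity. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier.
Azure Storage, used to store the sample data and to create a knowledge store.
Visual Studio Code with a REST client.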

Limitations

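The Document Layout skill has limited regional availability. When you create your Foundry resource, choose a region that provides multimodal embeddings. For a list of supported regions, see Document Layout skill supported regions.
The Azure Vision multimodal embeddings skill also has limited regional availability. For an up-to-date list of regions that provide multimodal embeddings, see the Azure Vision documentation.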

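Prepare data

The following instructions apply to Azure Storage, which provides the sample data and also hosts the knowledge store. The search service identity needs read access to Azure Storage to retrieve the sample data and write access to create the knowledge store. During skillset processing, the search service creates the container for cropped images, using the name you provide in an environment variable.

Download the following sample PDF: sustainable-ai-pdf
In Azure Storage, create a new container named sustainable-ai-pdf.
Upload the sample data file.
Create role assignments and specify a managed identity in a connection string:
1. Assign Storage Blob Data Reader for data retrieval by the indexer. Assign Storage Blob Data Contributor and Storage Table Data Contributor to create and load the knowledge store. You can use either a system-assigned managed identity or a user-assigned managed identity for your search service role assignment.
2. For connections made using a system-assigned managed identity, get a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. The connection string should look similar to the following example: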
```
"credentials" : { 
    "connectionString" : "ResourceId=/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.Storage/storageAccounts/MY-DEMO-STORAGE-ACCOUNT/;" 
}
```
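3. For connections made using a user-assigned managed identity, get a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. Provide an identity using the syntax shown in the following example. Set userAssignedIdentity to the user-assigned managed identity. The connection string should look similar to the following example: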
```
"credentials" : { 
    "connectionString" : "ResourceId=/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.Storage/storageAccounts/MY-DEMO-STORAGE-ACCOUNT/;" 
},
"identity" : { 
    "@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
    "userAssignedIdentity" : "/subscriptions/00000000-0000-0000-0000-00000000/resourcegroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.ManagedIdentity/userAssignedIdentities/MY-DEMO-USER-MANAGED-IDENTITY" 
}
```
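The ResourceId format in the examples above can be assembled from its three parts. The following Python sketch is illustrative only: the function name `build_resource_id_connection_string` is hypothetical, and the output simply mirrors the layout of the sample connection strings, including the trailing `/;`.

```python
def build_resource_id_connection_string(
    subscription_id: str, resource_group: str, account_name: str
) -> str:
    """Return a keyless connection string in the ResourceId layout shown above.

    Hypothetical helper: it only concatenates the three required parts and
    does not check that the storage account actually exists.
    """
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "account_name": account_name,
    }
    for label, value in parts.items():
        # Each part is a single path segment, so it must be non-empty
        # and must not contain a slash.
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty value without '/'")
    return (
        f"ResourceId=/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}/;"
    )


if __name__ == "__main__":
    print(
        build_resource_id_connection_string(
            "00000000-0000-0000-0000-00000000",
            "MY-DEMO-RESOURCE-GROUP",
            "MY-DEMO-STORAGE-ACCOUNT",
        )
    )
```

The string contains no account key or password, because authentication relies on the managed identity and its role assignments.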

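Prepare models

This tutorial assumes you have an existing Foundry resource through which the skill calls the Azure Vision multimodal 4.0 embedding model. The search service connects to the model during skillset processing using its managed identity. This section gives you guidance and links for assigning roles for authorized access.

The same role assignment is also used to access the Azure Document Intelligence layout model through the Foundry resource.

Sign in to the Azure portal (not the Foundry portal) and find the Foundry resource. Make sure it's in a region that provides the multimodal 4.0 API and the Azure Document Intelligence layout model.
Select Access control (IAM).
Select Add, and then select Add role assignment.
Search for Cognitive Services User, and then select it.
Choose Managed identity, and then assign your search service managed identity.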

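Set up your REST file

For this tutorial, your local REST client connection to Azure AI Search requires an endpoint and an API key. You can get these values from the Azure portal. For alternative connection methods, see Connect to a search service.

For authenticated connections that occur during indexer and skillset processing, the search service uses the role assignments you previously defined.

Start Visual Studio Code and create a new file.

Provide values for the variables used in the requests. For @storageConnection, make sure your connection string doesn't have a trailing semicolon or quotation marks. For @imageProjectionContainer, provide a container name that's unique in Blob Storage. Azure AI Search creates this container for you during skill processing.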

@searchUrl = PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE
@searchApiKey = PUT-YOUR-ADMIN-API-KEY-HERE
@storageConnection = PUT-YOUR-STORAGE-CONNECTION-STRING-HERE
@cognitiveServicesUrl = PUT-YOUR-AZURE-AI-FOUNDRY-ENDPOINT-HERE
@modelVersion = 2023-04-15
@imageProjectionContainer = sustainable-ai-pdf-images

Save the file using a .rest or .http file extension. For help with the REST client, see Quickstart: Full-text search using REST.
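Because the @storageConnection value must not carry surrounding quotation marks or a trailing semicolon, a small cleanup step can help when you copy the string from a JSON example. The following Python sketch is one possible approach. The function name `normalize_storage_connection` is hypothetical and isn't part of any Azure SDK.

```python
def normalize_storage_connection(raw: str) -> str:
    """Strip surrounding quotes and trailing semicolons from a connection string.

    Hypothetical helper for preparing the @storageConnection variable.
    """
    value = raw.strip()
    # Remove one pair of matching surrounding quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1].strip()
    # Remove any trailing semicolons left over from the JSON example.
    return value.rstrip(";")


if __name__ == "__main__":
    copied = '"ResourceId=/subscriptions/SUB/resourceGroups/RG/providers/Microsoft.Storage/storageAccounts/ACCT/;"'
    print("@storageConnection = " + normalize_storage_connection(copied))
```

The printed line can be pasted directly into the variables section of the .rest or .http file.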

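To get the Azure AI Search endpoint and API key:

Sign in to the Azure portal, navigate to the search service Overview page, and copy the URL. An example endpoint might look like https://mydemo.search.windows.net.
Under Settings > Keys, copy an admin key. Admin keys are used to add, modify, and delete objects. There are two interchangeable admin keys. Copy either one.

Create a data source

Create Data Source (REST) creates a data source connection that specifies what data to index.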

### Create a data source using system-assigned managed identities
POST {{searchUrl}}/datasources?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}

  {
    "name": "doc-intelligence-multimodal-embedding-ds",
    "description": "A data source to store multimodal documents",
    "type": "azureblob",
    "subtype": null,
    "credentials":{
      "connectionString":"{{storageConnection}}"
    },
    "container": {
      "name": "sustainable-ai-pdf",
      "query": null
    },
    "dataChangeDetectionPolicy": null,
    "dataDeletionDetectionPolicy": null,
    "encryptionKey": null,
    "identity": null
  }

Send the request. The response should look like:

HTTP/1.1 201 Created
Transfer-Encoding: chunked
Content-Type: application/json; odata.metadata=minimal; odata.streaming=true; charset=utf-8
Location: https://<YOUR-SEARCH-SERVICE-NAME>.search.windows.net:443/datasources('doc-intelligence-multimodal-embedding-ds')?api-version=2025-11-01-preview
Server: Microsoft-IIS/10.0
Strict-Transport-Security: max-age=2592000, max-age=15724800; includeSubDomains
Preference-Applied: odata.include-annotations="*"
OData-Version: 4.0
request-id: 4eb8bcc3-27b5-44af-834e-295ed078e8ed
elapsed-time: 346
Date: Sat, 26 Apr 2025 21:25:24 GMT
Connection: close

{
  "name": "doc-intelligence-multimodal-embedding-ds",
  "description": null,
  "type": "azureblob",
  "subtype": null,
  "indexerPermissionOptions": [],
  "credentials": {
    "connectionString": null
  },
  "container": {
    "name": "sustainable-ai-pdf",
    "query": null
  },
  "dataChangeDetectionPolicy": null,
  "dataDeletionDetectionPolicy": null,
  "encryptionKey": null,
  "identity": null
}
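If you prefer a script to the REST client, the same request can be assembled with the Python standard library. The sketch below only builds the request object and doesn't send it. The function name `build_data_source_request` is hypothetical, and the endpoint, key, and connection string values are placeholders you supply.

```python
import json
import urllib.request

API_VERSION = "2025-11-01-preview"


def build_data_source_request(
    search_url: str,
    api_key: str,
    storage_connection: str,
    name: str = "doc-intelligence-multimodal-embedding-ds",
    container: str = "sustainable-ai-pdf",
) -> urllib.request.Request:
    """Build (but don't send) the POST request that creates the data source."""
    body = {
        "name": name,
        "description": "A data source to store multimodal documents",
        "type": "azureblob",
        "credentials": {"connectionString": storage_connection},
        "container": {"name": container},
    }
    return urllib.request.Request(
        url=f"{search_url.rstrip('/')}/datasources?api-version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    req = build_data_source_request(
        "https://mydemo.search.windows.net", "PUT-YOUR-ADMIN-API-KEY-HERE", "ResourceId=..."
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit the request. The sketch omits the properties that the REST example sets to null.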

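Create an index

Create Index (REST) creates a search index on your search service. An index specifies all the parameters and their attributes.

For nested JSON, the index fields must be identical to the source fields. Currently, Azure AI Search doesn't support field mappings to nested JSON, so field names and data types must match completely. The following index aligns to the JSON elements in the raw content.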

### Create an index
POST {{searchUrl}}/indexes?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}

{
    "name": "doc-intelligence-multimodal-embedding-index",
    "fields": [
        {
            "name": "content_id",
            "type": "Edm.String",
            "retrievable": true,
            "key": true,
            "analyzer": "keyword"
        },
        {
            "name": "text_document_id",
            "type": "Edm.String",
            "searchable": false,
            "filterable": true,
            "retrievable": true,
            "stored": true,
            "sortable": false,
            "facetable": false
        },          
        {
            "name": "document_title",
            "type": "Edm.String",
            "searchable": true
        },
        {
            "name": "image_document_id",
            "type": "Edm.String",
            "filterable": true,
            "retrievable": true
        },
        {
            "name": "content_text",
            "type": "Edm.String",
            "searchable": true,
            "retrievable": true
        },
        {
            "name": "content_embedding",
            "type": "Collection(Edm.Single)",
            "dimensions": 1024,
            "searchable": true,
            "retrievable": true,
            "vectorSearchProfile": "hnsw"
        },
        {
            "name": "content_path",
            "type": "Edm.String",
            "searchable": false,
            "retrievable": true
        },
        {
            "name": "offset",
            "type": "Edm.String",
            "searchable": false,
            "retrievable": true
        },
        {
            "name": "location_metadata",
            "type": "Edm.ComplexType",
            "fields": [
                {
                "name": "page_number",
                "type": "Edm.Int32",
                "searchable": false,
                "retrievable": true
                },
                {
                "name": "bounding_polygons",
                "type": "Edm.String",
                "searchable": false,
                "retrievable": true,
                "filterable": false,
                "sortable": false,
                "facetable": false
                }
            ]
        }         
    ],
    "vectorSearch": {
        "profiles": [
            {
                "name": "hnsw",
                "algorithm": "defaulthnsw",
                "vectorizer": "demo-vectorizer"
            }
        ],
        "algorithms": [
            {
                "name": "defaulthnsw",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "metric": "cosine"
                }
            }
        ],
        "vectorizers": [
            {
                "name": "demo-vectorizer",
                "kind": "aiServicesVision",
                "aiServicesVisionParameters": {
                    "resourceUri": "{{cognitiveServicesUrl}}",
                    "authIdentity": null,
                    "modelVersion": "{{modelVersion}}"
                }
            }
        ]     
    },
    "semantic": {
        "defaultConfiguration": "semanticconfig",
        "configurations": [
            {
                "name": "semanticconfig",
                "prioritizedFields": {
                    "titleField": {
                        "fieldName": "document_title"
                    },
                    "prioritizedContentFields": [
                    ],
                    "prioritizedKeywordsFields": []
                }
            }
        ]
    }
}

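Key points:

Text and image embeddings are stored in the content_embedding field, which must be configured with appropriate dimensions, such as 1024, and a vector search profile.
location_metadata captures bounding polygon and page number metadata for each text chunk and normalized image, enabling precise spatial search or UI overlays.
For more information about vector search, see Vectors in Azure AI Search.
For more information about semantic ranking, see Semantic ranking in Azure AI Search.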
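The vector field, profile, algorithm, and vectorizer in the index definition refer to each other by name, so a typo in any of them breaks the chain. The following Python sketch checks those cross-references in an index definition loaded as a dictionary. The function name `check_vector_config` is hypothetical. It's a local sanity check, not the validation that the service itself performs.

```python
from typing import Any, Dict, List


def check_vector_config(index: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the vector-related parts of an index."""
    problems: List[str] = []
    vector_search = index.get("vectorSearch", {})
    algorithms = {a["name"] for a in vector_search.get("algorithms", [])}
    vectorizers = {v["name"] for v in vector_search.get("vectorizers", [])}
    profiles = {p["name"]: p for p in vector_search.get("profiles", [])}

    # Every profile must point at an algorithm (and vectorizer, if set) that exists.
    for name, profile in profiles.items():
        if profile.get("algorithm") not in algorithms:
            problems.append(f"profile '{name}' references an unknown algorithm")
        if "vectorizer" in profile and profile["vectorizer"] not in vectorizers:
            problems.append(f"profile '{name}' references an unknown vectorizer")

    # Every vector field needs positive dimensions and a known profile.
    for field in index.get("fields", []):
        if field.get("type") != "Collection(Edm.Single)":
            continue
        dims = field.get("dimensions")
        if not isinstance(dims, int) or dims <= 0:
            problems.append(f"field '{field['name']}' needs positive integer dimensions")
        if field.get("vectorSearchProfile") not in profiles:
            problems.append(f"field '{field['name']}' references an unknown profile")
    return problems


if __name__ == "__main__":
    sample = {
        "fields": [
            {
                "name": "content_embedding",
                "type": "Collection(Edm.Single)",
                "dimensions": 1024,
                "vectorSearchProfile": "hnsw",
            }
        ],
        "vectorSearch": {
            "profiles": [
                {"name": "hnsw", "algorithm": "defaulthnsw", "vectorizer": "demo-vectorizer"}
            ],
            "algorithms": [{"name": "defaulthnsw", "kind": "hnsw"}],
            "vectorizers": [{"name": "demo-vectorizer", "kind": "aiServicesVision"}],
        },
    }
    print(check_vector_config(sample) or "no problems found")
```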

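Create a skillset

Create Skillset (REST) creates a skillset on your search service. A skillset defines the operations that chunk and embed content prior to indexing. This skillset uses the Document Layout skill to extract text and images, preserving location metadata for both, which is useful for citations in RAG applications. It uses the Azure Vision multimodal embeddings skill to vectorize image and text content.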

### Create a skillset
POST {{searchUrl}}/skillsets?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}

{
  "name": "doc-intelligence-multimodal-embedding-skillset",
  "description": "A sample skillset for multimodal using multimodal embedding",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
      "name": "document-layout-skill",
      "description": "Azure Document Intelligence skill for document cracking",
      "context": "/document",
      "outputMode": "oneToMany",
      "outputFormat": "text",
      "extractionOptions": ["images", "locationMetadata"],
      "chunkingProperties": {     
          "unit": "characters",
          "maximumLength": 2000, 
          "overlapLength": 200
      },
      "inputs": [
        {
          "name": "file_data",
          "source": "/document/file_data"
        }
      ],
      "outputs": [
        { 
          "name": "text_sections", 
          "targetName": "text_sections" 
        }, 
        { 
          "name": "normalized_images", 
          "targetName": "normalized_images" 
        } 
      ]
    },
    { 
      "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill", 
      "name": "text-embedding-skill",
      "description": "Vision Vectorization skill for text",
      "context": "/document/text_sections/*", 
      "modelVersion": "2023-04-15", 
      "inputs": [ 
        { 
          "name": "text", 
          "source": "/document/text_sections/*/content" 
        } 
      ], 
      "outputs": [ 
        { 
          "name": "vector",
          "targetName": "text_vector"
        } 
      ] 
    },    
    { 
      "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill", 
      "name": "image-embedding-skill",
      "description": "Vision Vectorization skill for images",
      "context": "/document/normalized_images/*", 
      "modelVersion": "2023-04-15", 
      "inputs": [ 
        { 
          "name": "image", 
          "source": "/document/normalized_images/*" 
        } 
      ], 
      "outputs": [ 
        { 
          "name": "vector",
          "targetName": "image_vector"
        } 
      ] 
    },
    {
      "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
      "name": "shaper-skill",
      "context": "/document/normalized_images/*",
      "inputs": [
        {
          "name": "normalized_images",
          "source": "/document/normalized_images/*",
          "inputs": []
        },
        {
          "name": "imagePath",
          "source": "='{{imageProjectionContainer}}/'+$(/document/normalized_images/*/imagePath)",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "new_normalized_images"
        }
      ]
    }      
  ], 
   "indexProjections": {
      "selectors": [
        {
          "targetIndexName": "doc-intelligence-multimodal-embedding-index",
          "parentKeyFieldName": "text_document_id",
          "sourceContext": "/document/text_sections/*",
          "mappings": [    
            {
            "name": "content_embedding",
            "source": "/document/text_sections/*/text_vector"
            },                      
            {
              "name": "content_text",
              "source": "/document/text_sections/*/content"
            },
            {
              "name": "location_metadata",
              "source": "/document/text_sections/*/locationMetadata"
            },                
            {
              "name": "document_title",
              "source": "/document/document_title"
            }   
          ]
        },        
        {
          "targetIndexName": "doc-intelligence-multimodal-embedding-index",
          "parentKeyFieldName": "image_document_id",
          "sourceContext": "/document/normalized_images/*",
          "mappings": [    
            {
            "name": "content_embedding",
            "source": "/document/normalized_images/*/image_vector"
            },                                           
            {
              "name": "content_path",
              "source": "/document/normalized_images/*/new_normalized_images/imagePath"
            },                    
            {
              "name": "document_title",
              "source": "/document/document_title"
            },
            {
              "name": "location_metadata",
              "source": "/document/normalized_images/*/locationMetadata"
            }             
          ]
        }
      ],
      "parameters": {
        "projectionMode": "skipIndexingParentDocuments"
      }
  },
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.AIServicesByIdentity",
    "subdomainUrl": "{{cognitiveServicesUrl}}",
    "identity": null
  },
  "knowledgeStore": {
    "storageConnectionString": "{{storageConnection}}",
    "identity": null,
    "projections": [
      {
        "files": [
          {
            "storageContainer": "{{imageProjectionContainer}}",
            "source": "/document/normalized_images/*"
          }
        ]
      }
    ]
  }
}

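This skillset extracts text and images, vectorizes both, and shapes the image metadata for projection into the index.

Key points:

The content_text field is populated with text extracted and chunked using the Document Layout skill.
content_path contains the relative path to the image file within the designated image projection container. This field is generated only for images extracted from documents when extractionOptions is set to ["images", "locationMetadata"] or ["images"], and can be mapped from the enriched document from the source field /document/normalized_images/*/imagePath.
The Azure Vision multimodal embeddings skill enables embedding of both textual and visual data using the same skill type, differentiated by input (text vs. image). For more information, see Azure Vision multimodal embeddings skill.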
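To see how the two index projection selectors turn one enriched document into several search documents, the following Python sketch simulates the mapping locally. It's a simplified model, not the service's implementation. The names `SELECTORS`, `lookup`, and `project` are hypothetical, the `content_id` format is invented for illustration (the service generates its own keys), and `document_title` is read from the parent document as in the skillset mappings.

```python
from typing import Any, Dict, List

# Simplified mirror of the two selectors in the skillset: the collection to
# iterate, the parent key field, and target field -> path relative to each item.
SELECTORS = [
    {
        "parent_key": "text_document_id",
        "collection": "text_sections",
        "mappings": {
            "content_embedding": "text_vector",
            "content_text": "content",
            "location_metadata": "locationMetadata",
        },
    },
    {
        "parent_key": "image_document_id",
        "collection": "normalized_images",
        "mappings": {
            "content_embedding": "image_vector",
            "content_path": "new_normalized_images/imagePath",
            "location_metadata": "locationMetadata",
        },
    },
]


def lookup(node: Any, relative_path: str) -> Any:
    """Follow a slash-separated path through nested dictionaries."""
    for part in relative_path.split("/"):
        if not isinstance(node, dict):
            return None
        node = node.get(part)
    return node


def project(enriched: Dict[str, Any], parent_id: str) -> List[Dict[str, Any]]:
    """Emit one search document per text section and per normalized image."""
    docs: List[Dict[str, Any]] = []
    for selector in SELECTORS:
        for i, item in enumerate(enriched.get(selector["collection"], [])):
            doc = {
                # Hypothetical key format, for illustration only.
                "content_id": f"{parent_id}_{selector['collection']}_{i}",
                selector["parent_key"]: parent_id,
                "document_title": enriched.get("document_title"),
            }
            for target, source in selector["mappings"].items():
                doc[target] = lookup(item, source)
            docs.append(doc)
    return docs


if __name__ == "__main__":
    enriched = {
        "document_title": "sample.pdf",
        "text_sections": [{"content": "hello", "text_vector": [0.1]}],
        "normalized_images": [
            {"image_vector": [0.2], "new_normalized_images": {"imagePath": "container/img.jpg"}}
        ],
    }
    for doc in project(enriched, "doc1"):
        print(doc)
```

Only the child documents are emitted, which matches the intent of projectionMode set to skipIndexingParentDocuments. Text documents carry text_document_id and image documents carry image_document_id, which is why the image-only query can filter on image_document_id.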

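Create and run an indexer

Create Indexer creates an indexer on your search service. An indexer connects to the data source, loads data, runs a skillset, and indexes the enriched data.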

### Create and run an indexer
POST {{searchUrl}}/indexers?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}

{
  "name": "doc-intelligence-multimodal-embedding-indexer",
  "dataSourceName": "doc-intelligence-multimodal-embedding-ds",
  "targetIndexName": "doc-intelligence-multimodal-embedding-index",
  "skillsetName": "doc-intelligence-multimodal-embedding-skillset",
  "parameters": {
    "maxFailedItems": -1,
    "maxFailedItemsPerBatch": 0,
    "batchSize": 1,
    "configuration": {
      "allowSkillsetToReadFileData": true
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_name",
      "targetFieldName": "document_title"
    }
  ],
  "outputFieldMappings": []
}

Run queries

You can start searching as soon as the first document is loaded.

### Query the index
POST {{searchUrl}}/indexes/doc-intelligence-multimodal-embedding-index/docs/search?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}
  
  {
    "search": "*",
    "count": true
  }

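Send the request. This is an unspecified full-text search query that returns all of the fields marked as retrievable in the index, along with a document count. The response should look like: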

{
  "@odata.count": 100,
  "@search.nextPageParameters": {
    "search": "*",
    "count": true,
    "skip": 50
  },
  "value": [
  ],
  "@odata.nextLink": "https://<YOUR-SEARCH-SERVICE-NAME>.search.windows.net/indexes/doc-intelligence-multimodal-embedding-index/docs/search?api-version=2025-11-01-preview"
}

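The query matches 100 documents, as reported by @odata.count. The response is paged, and @search.nextPageParameters shows that the next page starts at skip 50.

For filters, you can also use logical operators (and, or, not) and comparison operators (eq, ne, gt, lt, ge, le). String comparisons are case sensitive. For more information and examples, see Examples of simple search queries.

Note

The $filter parameter only works on fields that were marked filterable during index creation.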

### Query for only images
POST {{searchUrl}}/indexes/doc-intelligence-multimodal-embedding-index/docs/search?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}
  
  {
    "search": "*",
    "count": true,
    "filter": "image_document_id ne null"
  }

### Query for text or images with content related to energy, returning the id, parent document, and text (only populated for text chunks), and the content path where the image is saved in the knowledge store (only populated for images)
POST {{searchUrl}}/indexes/doc-intelligence-multimodal-embedding-index/docs/search?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}
  
  {
    "search": "energy",
    "count": true,
    "select": "content_id, document_title, content_text, content_path"
  }
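Because content_text is only populated for text chunks and content_path is only populated for images, a client can separate the two kinds of hits after the query returns. The following Python sketch shows one way to do that on a parsed JSON response. The function name `split_hits` is hypothetical, and the sample response is made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def split_hits(
    search_response: Dict[str, Any],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split search hits into (text_hits, image_hits) based on content_path."""
    text_hits: List[Dict[str, Any]] = []
    image_hits: List[Dict[str, Any]] = []
    for hit in search_response.get("value", []):
        # content_path is only populated for images saved to the knowledge store.
        if hit.get("content_path"):
            image_hits.append(hit)
        else:
            text_hits.append(hit)
    return text_hits, image_hits


if __name__ == "__main__":
    # Made-up response for illustration, shaped like the select list above.
    response = {
        "value": [
            {"content_id": "a", "content_text": "Energy use of data centers", "content_path": None},
            {"content_id": "b", "content_text": None, "content_path": "sustainable-ai-pdf-images/x/1.jpg"},
        ]
    }
    text_hits, image_hits = split_hits(response)
    print(len(text_hits), "text hit(s),", len(image_hits), "image hit(s)")
```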

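Reset and rerun

Indexers can be reset to clear the execution history, which allows a full rerun. The following requests reset the indexer, rerun it, and check its status.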

### Reset the indexer
POST {{searchUrl}}/indexers/doc-intelligence-multimodal-embedding-indexer/reset?api-version=2025-11-01-preview   HTTP/1.1
  api-key: {{searchApiKey}}

### Run the indexer
POST {{searchUrl}}/indexers/doc-intelligence-multimodal-embedding-indexer/run?api-version=2025-11-01-preview   HTTP/1.1
  api-key: {{searchApiKey}}

### Check indexer status 
GET {{searchUrl}}/indexers/doc-intelligence-multimodal-embedding-indexer/status?api-version=2025-11-01-preview   HTTP/1.1
  api-key: {{searchApiKey}}
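After a rerun, the status request usually has to be repeated until the indexer finishes. The following Python sketch polls a status callback until the last run leaves its in-progress state. It makes several assumptions: the status payload is assumed to expose a `lastResult` object with a `status` value such as `inProgress`, `reset`, or `success`, and the names `wait_for_indexer` and `get_status` are hypothetical. The callback is injected so that the loop can be tried without a live service.

```python
import time
from typing import Any, Callable, Dict


def wait_for_indexer(
    get_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the last indexer run reports a final status, then return it.

    Assumes the payload looks like {"lastResult": {"status": "..."}}.
    """
    for _ in range(max_polls):
        last_result = get_status().get("lastResult") or {}
        state = last_result.get("status")
        # No result yet, a pending reset, or a running job: keep waiting.
        if state not in (None, "inProgress", "reset"):
            return state
        sleep(interval_seconds)
    raise TimeoutError(f"indexer did not finish after {max_polls} polls")


if __name__ == "__main__":
    # Simulated sequence of status payloads instead of real HTTP calls.
    payloads = iter(
        [
            {"lastResult": None},
            {"lastResult": {"status": "inProgress"}},
            {"lastResult": {"status": "success"}},
        ]
    )
    print(wait_for_indexer(lambda: next(payloads), sleep=lambda seconds: None))
```

In a real script, `get_status` would issue the GET request shown above and return the parsed JSON body.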

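Clean up resources

When you're working in your own subscription, it's a good idea to remove the resources you no longer need at the end of a project. Resources left running can cost you money. You can delete resources individually or delete the resource group to delete the entire set of resources.

You can use the Azure portal to delete indexes, indexers, and data sources.

See also

Now that you're familiar with a sample implementation of a multimodal indexing scenario, see: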


Last updated on 2025-11-18