Tutorial: Vectorize images and text

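Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows you how to build a multimodal indexing pipeline in Azure AI Search that chunks data with the built-in Text Split skill and uses multimodal embeddings to vectorize text and images from the same document. Cropped images are stored in a knowledge store, and both text and visual content are vectorized and ingested into a searchable index.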

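In this tutorial, you use:

A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
An indexer and a skillset to create an indexing pipeline that includes AI enrichment through skills.
The Document Extraction skill to extract normalized images and text. The Text Split skill chunks the data.
The Azure Vision multimodal embeddings skill to vectorize text and images.
A search index configured to store the extracted text and image content. Some content is vectorized for vector-based similarity search.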

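This tutorial demonstrates a lower-cost approach to indexing multimodal content by using the Document Extraction skill. It lets you extract and search both text and images from documents pulled from Azure Blob Storage. However, it doesn't include location metadata for text, such as page numbers or bounding regions. For a more comprehensive solution that includes structured text layout and spatial metadata, see Tutorial: Vectorize from a structured document layout.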

Note

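Image extraction by the Document Extraction skill isn't free. Setting imageAction to generateNormalizedImages in the skillset triggers image extraction, which incurs an extra charge. For billing information, see Azure AI Search pricing.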

Prerequisites

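A Microsoft Foundry resource. This resource provides access to the Azure Vision multimodal embedding model used in this tutorial. You must use a Foundry resource for skillset access to this model.
Azure AI Search. Configure your search service for role-based access control and a managed identity for connections to Azure Storage and Azure Vision. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier.
Azure Storage, used for storing the sample data and for creating a knowledge store.
Visual Studio Code with a REST client.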

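Limitations

The Azure Vision multimodal embeddings skill has limited regional availability. When you create a Foundry resource, choose a region that provides multimodal embeddings. For the current list of regions that provide multimodal embeddings, see the Azure Vision documentation.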

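Prepare data

The following instructions apply to Azure Storage, which provides the sample data and also hosts the knowledge store. The search service identity needs read access to Azure Storage to retrieve the sample data, and write access to create the knowledge store. During skillset processing, the search service creates the container for cropped images, using the name you provide in the @imageProjectionContainer variable of your REST file.

Download the following sample PDF: sustainable-ai-pdf
In Azure Storage, create a new container named sustainable-ai-pdf.
Upload the sample data file.
Create role assignments and specify a managed identity in a connection string:
1. Assign Storage Blob Data Reader for data retrieval by the indexer. Assign Storage Blob Data Contributor and Storage Table Data Contributor to create and load the knowledge store. You can use either a system-assigned managed identity or a user-assigned managed identity for your search service role assignment.
2. For connections made using a system-assigned managed identity, get a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. The connection string should look similar to the following example: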
```
"credentials" : { 
    "connectionString" : "ResourceId=/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.Storage/storageAccounts/MY-DEMO-STORAGE-ACCOUNT/;" 
}
```
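3. For connections made using a user-assigned managed identity, get a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. Provide the identity using the syntax shown in the following example, and set userAssignedIdentity to the user-assigned managed identity. The connection string should look similar to the following example: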
```
"credentials" : { 
    "connectionString" : "ResourceId=/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.Storage/storageAccounts/MY-DEMO-STORAGE-ACCOUNT/;" 
},
"identity" : { 
    "@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
    "userAssignedIdentity" : "/subscriptions/00000000-0000-0000-0000-00000000/resourcegroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.ManagedIdentity/userAssignedIdentities/MY-DEMO-USER-MANAGED-IDENTITY" 
}
```
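
The two connection shapes above can be sketched in Python as follows. This is only an illustration: `resource_id_connection_string` and `data_source_credentials` are hypothetical helper names, not part of any Azure SDK, and the IDs are the same placeholders used in the examples above.

```python
import json


def resource_id_connection_string(subscription_id, resource_group, storage_account):
    """Build a key-less connection string that points at a storage account."""
    return (
        f"ResourceId=/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}/;"
    )


def data_source_credentials(connection_string, user_assigned_identity=None):
    """Return the credentials fragment, plus an identity block when one is given.

    Without user_assigned_identity, no identity block is emitted, which matches
    the system-assigned example above.
    """
    fragment = {"credentials": {"connectionString": connection_string}}
    if user_assigned_identity is not None:
        fragment["identity"] = {
            "@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
            "userAssignedIdentity": user_assigned_identity,
        }
    return fragment


if __name__ == "__main__":
    conn = resource_id_connection_string(
        "00000000-0000-0000-0000-00000000",
        "MY-DEMO-RESOURCE-GROUP",
        "MY-DEMO-STORAGE-ACCOUNT",
    )
    print(json.dumps(data_source_credentials(conn), indent=2))
```

Generating the fragment from its three parts makes it harder to mistype the long resource path, but the output should still be compared against the examples above before use.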

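Prepare models

This tutorial assumes you have an existing Foundry resource through which the skill calls the Azure Vision multimodal 4.0 embedding model. The search service connects to the model during skillset processing using its managed identity. This section gives you guidance and links for assigning roles for authorized access.

Sign in to the Azure portal (not the Foundry portal) and find the Foundry resource. Make sure that it's in a region that provides the multimodal 4.0 API.
Select Access control (IAM).
Select Add, and then select Add role assignment.
Search for Cognitive Services User, and then select it.
Choose Managed identity, and then assign your search service managed identity.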

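Set up your REST file

For this tutorial, your local REST client connection to Azure AI Search requires an endpoint and an API key. You can get these values from the Azure portal. For alternative connection methods, see Connect to a search service.

For authenticated connections that occur during indexer and skillset processing, the search service uses the role assignments you previously defined.

Start Visual Studio Code and create a new file.

Provide values for the variables used in the requests. For @storageConnection, make sure your connection string doesn't have a trailing semicolon or quotation marks. For @imageProjectionContainer, provide a container name that's unique in Blob Storage. Azure AI Search creates this container for you during skillset processing.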

 @searchUrl = PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE
 @searchApiKey = PUT-YOUR-ADMIN-API-KEY-HERE
 @storageConnection = PUT-YOUR-STORAGE-CONNECTION-STRING-HERE
 @cognitiveServicesUrl = PUT-YOUR-AZURE-AI-FOUNDRY-ENDPOINT-HERE
 @modelVersion = 2023-04-15
 @imageProjectionContainer=sustainable-ai-pdf-images
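
If you generate the variable block from a script, a small cleanup step can enforce the rule about trailing semicolons and quotation marks. The following Python sketch is one way to do that. The helper names `clean_connection_string` and `render_variables` are hypothetical and exist only for this illustration.

```python
def clean_connection_string(value):
    """Strip surrounding whitespace, quotation marks, and trailing semicolons."""
    cleaned = value.strip().strip("\"'")
    return cleaned.rstrip(";")


def render_variables(values):
    """Render a dict as REST Client variable lines, one '@name = value' per line."""
    return "\n".join(f"@{name} = {value}" for name, value in values.items())


if __name__ == "__main__":
    raw = '"ResourceId=/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.Storage/storageAccounts/MY-DEMO-STORAGE-ACCOUNT/;"'
    print(render_variables({
        "storageConnection": clean_connection_string(raw),
        "imageProjectionContainer": "sustainable-ai-pdf-images",
    }))
```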

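Save the file using a .rest or .http file extension. For help with the REST client, see Quickstart: Full-text search using REST.

To get the Azure AI Search endpoint and API key:

Sign in to the Azure portal, go to the search service Overview page, and copy the URL. An example endpoint looks like https://mydemo.search.windows.net.
Under Settings > Keys, copy an admin key. Admin keys are used to add, modify, and delete objects. There are two interchangeable admin keys. Copy either one.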

Create a data source

Create Data Source (REST) creates a data source connection that specifies what data to index.

### Create a data source
POST {{searchUrl}}/datasources?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}

{
   "name":"doc-extraction-multimodal-embedding-ds",
   "description":null,
   "type":"azureblob",
   "subtype":null,
   "credentials":{
      "connectionString":"{{storageConnection}}"
   },
   "container":{
      "name":"sustainable-ai-pdf",
      "query":null
   },
   "dataChangeDetectionPolicy":null,
   "dataDeletionDetectionPolicy":null,
   "encryptionKey":null,
   "identity":null
}

Send the request. The response should look like:

HTTP/1.1 201 Created
Transfer-Encoding: chunked
Content-Type: application/json; odata.metadata=minimal; odata.streaming=true; charset=utf-8
Location: https://<YOUR-SEARCH-SERVICE-NAME>.search.windows-int.net:443/datasources('doc-extraction-multimodal-embedding-ds')?api-version=2025-11-01-preview
Server: Microsoft-IIS/10.0
Strict-Transport-Security: max-age=2592000, max-age=15724800; includeSubDomains
Preference-Applied: odata.include-annotations="*"
OData-Version: 4.0
request-id: 4eb8bcc3-27b5-44af-834e-295ed078e8ed
elapsed-time: 346
Date: Sat, 26 Apr 2025 21:25:24 GMT
Connection: close

{
  "name": "doc-extraction-multimodal-embedding-ds",
  "description": null,
  "type": "azureblob",
  "subtype": null,
  "indexerPermissionOptions": [],
  "credentials": {
    "connectionString": null
  },
  "container": {
    "name": "sustainable-ai-pdf",
    "query": null
  },
  "dataChangeDetectionPolicy": null,
  "dataDeletionDetectionPolicy": null,
  "encryptionKey": null,
  "identity": null
}
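
For readers who prefer a script to the REST client, the same request can be prepared with Python's standard library. This is a minimal sketch under stated assumptions: `build_create_request` is a hypothetical helper, the endpoint and key are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_VERSION = "2025-11-01-preview"


def build_create_request(search_url, api_key, collection, body):
    """Prepare (but do not send) a POST that creates an object in `collection`."""
    url = f"{search_url.rstrip('/')}/{collection}?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Trimmed version of the data source definition shown above.
data_source = {
    "name": "doc-extraction-multimodal-embedding-ds",
    "type": "azureblob",
    "credentials": {"connectionString": "PUT-YOUR-STORAGE-CONNECTION-STRING-HERE"},
    "container": {"name": "sustainable-ai-pdf"},
}

request = build_create_request(
    "https://mydemo.search.windows.net",
    "PUT-YOUR-ADMIN-API-KEY-HERE",
    "datasources",
    data_source,
)
print(request.get_method(), request.full_url)
# To send it, pass `request` to urllib.request.urlopen and expect 201 Created.
```

The same helper works for the later steps by changing `collection` to `indexes`, `skillsets`, or `indexers`.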

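Create an index

Create Index (REST) creates a search index on your search service. An index specifies all the fields and their attributes.

For nested JSON, the index fields must be identical to the source fields. Currently, Azure AI Search doesn't support field mappings to nested JSON, so field names and data types must match exactly. The following index aligns to the JSON elements in the raw content.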

### Create an index
POST {{searchUrl}}/indexes?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}

{
    "name": "doc-extraction-multimodal-embedding-index",
    "fields": [
        {
            "name": "content_id",
            "type": "Edm.String",
            "retrievable": true,
            "key": true,
            "analyzer": "keyword"
        },
        {
            "name": "text_document_id",
            "type": "Edm.String",
            "searchable": false,
            "filterable": true,
            "retrievable": true,
            "stored": true,
            "sortable": false,
            "facetable": false
        },          
        {
            "name": "document_title",
            "type": "Edm.String",
            "searchable": true
        },
        {
            "name": "image_document_id",
            "type": "Edm.String",
            "filterable": true,
            "retrievable": true
        },
        {
            "name": "content_text",
            "type": "Edm.String",
            "searchable": true,
            "retrievable": true
        },
        {
            "name": "content_embedding",
            "type": "Collection(Edm.Single)",
            "dimensions": 1024,
            "searchable": true,
            "retrievable": true,
            "vectorSearchProfile": "hnsw"
        },
        {
            "name": "content_path",
            "type": "Edm.String",
            "searchable": false,
            "retrievable": true
        },
        {
            "name": "offset",
            "type": "Edm.String",
            "searchable": false,
            "retrievable": true
        },
        {
            "name": "location_metadata",
            "type": "Edm.ComplexType",
            "fields": [
                {
                    "name": "page_number",
                    "type": "Edm.Int32",
                    "searchable": false,
                    "retrievable": true
                },
                {
                    "name": "bounding_polygons",
                    "type": "Edm.String",
                    "searchable": false,
                    "retrievable": true,
                    "filterable": false,
                    "sortable": false,
                    "facetable": false
                }
            ]
        }         
    ],
    "vectorSearch": {
        "profiles": [
            {
                "name": "hnsw",
                "algorithm": "defaulthnsw",
                "vectorizer": "demo-vectorizer"
            }
        ],
        "algorithms": [
            {
                "name": "defaulthnsw",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "metric": "cosine"
                }
            }
        ],
        "vectorizers": [
            {
                "name": "demo-vectorizer",
                "kind": "aiServicesVision",
                "aiServicesVisionParameters": {
                    "resourceUri": "{{cognitiveServicesUrl}}",
                    "authIdentity": null,
                    "modelVersion": "{{modelVersion}}"
                }
            }
        ]     
    },
    "semantic": {
        "defaultConfiguration": "semanticconfig",
        "configurations": [
            {
                "name": "semanticconfig",
                "prioritizedFields": {
                    "titleField": {
                        "fieldName": "document_title"
                    },
                    "prioritizedContentFields": [
                    ],
                    "prioritizedKeywordsFields": []
                }
            }
        ]
    }
}

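Key points:

Text and image embeddings are stored in the content_embedding field, which must be configured with appropriate dimensions, such as 1024, and a vector search profile.
location_metadata captures bounding polygon and page number metadata for each normalized image, enabling precise spatial search or UI overlays. In this scenario, location_metadata exists only for images. If you want to capture location metadata for text, consider using the Document Layout skill. An in-depth tutorial is linked at the bottom of the page.
For more information on vector search, see Vectors in Azure AI Search.
For more information on semantic ranking, see Semantic ranking in Azure AI Search.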
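
Because the vector field, profile, algorithm, and vectorizer refer to each other by name, a typo in any of them causes the request to fail. The following Python sketch checks those cross-references in an index definition before you send it. It is an illustrative consistency check, not a service API: `check_vector_config` is a hypothetical helper, and it assumes every Collection(Edm.Single) field is meant to be a vector field.

```python
def check_vector_config(index):
    """Return a list of problems found in an index definition (empty if none)."""
    vector_search = index.get("vectorSearch", {})
    profiles = {p["name"]: p for p in vector_search.get("profiles", [])}
    algorithms = {a["name"] for a in vector_search.get("algorithms", [])}
    vectorizers = {v["name"] for v in vector_search.get("vectorizers", [])}
    problems = []

    # Each vector field needs positive dimensions and a known profile.
    for field in index.get("fields", []):
        if field.get("type") != "Collection(Edm.Single)":
            continue
        dimensions = field.get("dimensions")
        if not isinstance(dimensions, int) or dimensions <= 0:
            problems.append(f"{field['name']}: missing or invalid dimensions")
        if field.get("vectorSearchProfile") not in profiles:
            problems.append(f"{field['name']}: unknown vectorSearchProfile")

    # Each profile must point at a defined algorithm and, if set, vectorizer.
    for name, profile in profiles.items():
        if profile.get("algorithm") not in algorithms:
            problems.append(f"profile {name}: unknown algorithm")
        vectorizer = profile.get("vectorizer")
        if vectorizer is not None and vectorizer not in vectorizers:
            problems.append(f"profile {name}: unknown vectorizer")
    return problems


# Trimmed version of the index definition shown above.
index = {
    "fields": [
        {"name": "content_id", "type": "Edm.String", "key": True},
        {
            "name": "content_embedding",
            "type": "Collection(Edm.Single)",
            "dimensions": 1024,
            "vectorSearchProfile": "hnsw",
        },
    ],
    "vectorSearch": {
        "profiles": [
            {"name": "hnsw", "algorithm": "defaulthnsw", "vectorizer": "demo-vectorizer"}
        ],
        "algorithms": [{"name": "defaulthnsw", "kind": "hnsw"}],
        "vectorizers": [{"name": "demo-vectorizer", "kind": "aiServicesVision"}],
    },
}
print(check_vector_config(index))
```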

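Create a skillset

Create Skillset (REST) creates a skillset on your search service. A skillset defines the operations that chunk and embed content before indexing. This skillset uses the built-in Document Extraction skill to extract text and images. It uses the Text Split skill to chunk large text. It uses the Azure Vision multimodal embeddings skill to vectorize image and text content.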

### Create a skillset
POST {{searchUrl}}/skillsets?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}

{
  "name": "doc-extraction-multimodal-embedding-skillset",
  "description": "A test skillset",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
      "name": "document-extraction-skill",
      "description": "Document extraction skill to extract text and images from documents",
      "parsingMode": "default",
      "dataToExtract": "contentAndMetadata",
      "configuration": {
          "imageAction": "generateNormalizedImages",
          "normalizedImageMaxWidth": 2000,
          "normalizedImageMaxHeight": 2000
      },
      "context": "/document",
      "inputs": [
        {
          "name": "file_data",
          "source": "/document/file_data"
        }
      ],
      "outputs": [
        {
          "name": "content",
          "targetName": "extracted_content"
        },
        {
          "name": "normalized_images",
          "targetName": "normalized_images"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "split-skill",
      "description": "Split skill to chunk documents",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "pageOverlapLength": 200,
      "unit": "characters",
      "inputs": [
        {
          "name": "text",
          "source": "/document/extracted_content",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ]
    },  
    {
      "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
      "name": "text-embedding-skill",
      "description": "Vision Vectorization skill for text",
      "context": "/document/pages/*",
      "modelVersion": "{{modelVersion}}",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "vector",
          "targetName": "text_vector"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
      "name": "image-embedding-skill",
      "description": "Vision Vectorization skill for images",
      "context": "/document/normalized_images/*",
      "modelVersion": "{{modelVersion}}",
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "vector",
          "targetName": "image_vector"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
      "name": "shaper-skill",
      "description": "Shaper skill to reshape the data to fit the index schema",
      "context": "/document/normalized_images/*",
      "inputs": [
        {
          "name": "normalized_images",
          "source": "/document/normalized_images/*",
          "inputs": []
        },
        {
          "name": "imagePath",
          "source": "='{{imageProjectionContainer}}/'+$(/document/normalized_images/*/imagePath)",
          "inputs": []
        },
        {
          "name": "dataUri",
          "source": "='data:image/jpeg;base64,'+$(/document/normalized_images/*/data)",
          "inputs": []
        },
        {
          "name": "location_metadata",
          "sourceContext": "/document/normalized_images/*",
          "inputs": [
            {
              "name": "page_number",
              "source": "/document/normalized_images/*/pageNumber"
            },
            {
              "name": "bounding_polygons",
              "source": "/document/normalized_images/*/boundingPolygon"
            }              
          ]
        }          
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "new_normalized_images"
        }
      ]
    }  
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.AIServicesByIdentity",
    "subdomainUrl": "{{cognitiveServicesUrl}}",
    "identity": null
  },
  "indexProjections": {
      "selectors": [
        {
          "targetIndexName": "doc-extraction-multimodal-embedding-index",
          "parentKeyFieldName": "text_document_id",
          "sourceContext": "/document/pages/*",
          "mappings": [              
            {
              "name": "content_embedding",
              "source": "/document/pages/*/text_vector"
            },
            {
              "name": "content_text",
              "source": "/document/pages/*"
            },             
            {
              "name": "document_title",
              "source": "/document/document_title"
            }      
          ]
        },
        {
          "targetIndexName": "doc-extraction-multimodal-embedding-index",
          "parentKeyFieldName": "image_document_id",
          "sourceContext": "/document/normalized_images/*",
          "mappings": [                                   
            {
              "name": "content_embedding",
              "source": "/document/normalized_images/*/image_vector"
            },
            {
              "name": "content_path",
              "source": "/document/normalized_images/*/new_normalized_images/imagePath"
            },
            {
              "name": "location_metadata",
              "source": "/document/normalized_images/*/new_normalized_images/location_metadata"
            },                      
            {
              "name": "document_title",
              "source": "/document/document_title"
            }                
          ]
        }
      ],
      "parameters": {
        "projectionMode": "skipIndexingParentDocuments"
      }
  },
  "knowledgeStore": {
    "storageConnectionString": "{{storageConnection}}",
    "identity": null,
    "projections": [
      {
        "files": [
          {
            "storageContainer": "{{imageProjectionContainer}}",
            "source": "/document/normalized_images/*"
          }
        ]
      }
    ]
  }
}

This skillset extracts text and images, vectorizes both, and shapes the image metadata for projection into the index.
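
To make the effect of the two index projection selectors concrete, the following Python sketch flattens one enriched document into the search documents it would roughly produce: one per text chunk and one per image, with no document for the parent itself. This is a simplified model, not the service's implementation. The `project` function and the dictionary shape of `enriched` are assumptions made for illustration, and the sketch omits the generated content_id key and the parent key fields that the service populates.

```python
def project(enriched, container):
    """Flatten one enriched document into text-chunk and image search documents."""
    title = enriched["document_title"]
    docs = []

    # First selector: one search document per text chunk.
    for page in enriched["pages"]:
        docs.append({
            "document_title": title,
            "content_text": page["text"],
            "content_embedding": page["text_vector"],
        })

    # Second selector: one search document per normalized image,
    # using the values reshaped by the Shaper skill.
    for image in enriched["normalized_images"]:
        docs.append({
            "document_title": title,
            "content_path": f"{container}/{image['imagePath']}",
            "content_embedding": image["image_vector"],
            "location_metadata": {
                "page_number": image["pageNumber"],
                "bounding_polygons": image["boundingPolygon"],
            },
        })
    return docs


# Hypothetical enriched document with two chunks and one image.
enriched = {
    "document_title": "sample.pdf",
    "pages": [
        {"text": "first chunk", "text_vector": [0.1, 0.2]},
        {"text": "second chunk", "text_vector": [0.3, 0.4]},
    ],
    "normalized_images": [
        {
            "imagePath": "doc1/normalized_images_0.jpg",
            "image_vector": [0.5, 0.6],
            "pageNumber": 1,
            "boundingPolygon": "[]",
        }
    ],
}
docs = project(enriched, "sustainable-ai-pdf-images")
print(len(docs))
```

Text and image documents share the content_embedding field, which is what lets one vector query rank both kinds of content together.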

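Key points:

The content_text field is populated with text extracted by the Document Extraction skill and chunked by the Text Split skill.
content_path contains the relative path to the image file within the designated image projection container. This field is generated only for images extracted from PDFs when imageAction is set to generateNormalizedImages, and it can be mapped from the enriched document by using the source field /document/normalized_images/*/imagePath.
The Azure Vision multimodal embeddings skill embeds both textual and visual data using the same skill type, differentiated by input (text or image). For more information, see Azure Vision multimodal embeddings skill.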
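
The Text Split settings in this skillset (maximumPageLength of 2000 and pageOverlapLength of 200, measured in characters) can be approximated with a sliding window. The Python sketch below is a deliberately naive model for estimating chunk counts. It assumes fixed character offsets, whereas the actual skill also takes sentence boundaries into account, so real chunks will differ. `split_pages` is a hypothetical name.

```python
def split_pages(text, maximum_page_length=2000, page_overlap_length=200):
    """Split text into fixed-size windows that overlap by a set number of characters."""
    if page_overlap_length >= maximum_page_length:
        raise ValueError("overlap must be smaller than the page length")
    step = maximum_page_length - page_overlap_length
    pages = []
    start = 0
    while start < len(text):
        pages.append(text[start:start + maximum_page_length])
        if start + maximum_page_length >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return pages


if __name__ == "__main__":
    sample = "".join(chr(97 + i % 26) for i in range(5000))
    for number, page in enumerate(split_pages(sample), start=1):
        print(number, len(page))
```

Each chunk becomes one call to the embedding skill and one search document, so the chunk count is a rough guide to both processing cost and index size.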

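Create and run an indexer

Create Indexer creates an indexer on your search service. An indexer connects to the data source, loads data, runs the skillset, and indexes the enriched data.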

### Create and run an indexer
POST {{searchUrl}}/indexers?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}

{
  "name": "doc-extraction-multimodal-embedding-indexer",
  "dataSourceName": "doc-extraction-multimodal-embedding-ds",
  "targetIndexName": "doc-extraction-multimodal-embedding-index",
  "skillsetName": "doc-extraction-multimodal-embedding-skillset",
  "parameters": {
    "maxFailedItems": -1,
    "maxFailedItemsPerBatch": 0,
    "batchSize": 1,
    "configuration": {
      "allowSkillsetToReadFileData": true
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_name",
      "targetFieldName": "document_title"
    }
  ],
  "outputFieldMappings": []
}

Run queries

You can start querying as soon as the first document is loaded.

### Query the index
POST {{searchUrl}}/indexes/doc-extraction-multimodal-embedding-index/docs/search?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}
  
  {
    "search": "*",
    "count": true
  }

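Send the request. This is an unspecified full-text search query (search is set to *) that returns all of the fields marked as retrievable in the index, along with a document count. The response should look like: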

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/json; odata.metadata=minimal; odata.streaming=true; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/10.0
Strict-Transport-Security: max-age=2592000, max-age=15724800; includeSubDomains
Preference-Applied: odata.include-annotations="*"
OData-Version: 4.0
request-id: 712ca003-9493-40f8-a15e-cf719734a805
elapsed-time: 198
Date: Wed, 30 Apr 2025 23:20:53 GMT
Connection: close

{
  "@odata.count": 100,
  "@search.nextPageParameters": {
    "search": "*",
    "count": true,
    "skip": 50
  },
  "value": [
  ],
  "@odata.nextLink": "https://<YOUR-SEARCH-SERVICE-NAME>.search.windows.net/indexes/doc-extraction-multimodal-embedding-index/docs/search?api-version=2025-11-01-preview"
}

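The count shows that 100 documents match the query. The response is paged: @search.nextPageParameters and @odata.nextLink describe the request for the next page, which in this example starts at skip 50.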
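
When a response is paged like this, a client can keep posting the body from @search.nextPageParameters until the service stops returning one. The following Python sketch shows that loop. `collect_all` and `send` are hypothetical names: `send` stands in for whatever function posts a body to the docs/search endpoint and returns the parsed JSON.

```python
def collect_all(first_body, send):
    """Follow continuation bodies until the service stops returning one."""
    results = []
    body = first_body
    while body is not None:
        response = send(body)
        results.extend(response.get("value", []))
        # Absent when there are no further pages.
        body = response.get("@search.nextPageParameters")
    return results


if __name__ == "__main__":
    # Stand-in for the service: two pages keyed by the skip value.
    fake_pages = {
        0: {
            "value": [{"content_id": "a"}, {"content_id": "b"}],
            "@search.nextPageParameters": {"search": "*", "count": True, "skip": 2},
        },
        2: {"value": [{"content_id": "c"}]},
    }
    documents = collect_all(
        {"search": "*", "count": True},
        lambda body: fake_pages[body.get("skip", 0)],
    )
    print([d["content_id"] for d in documents])
```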

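In filters, you can also use logical operators (and, or, not) and comparison operators (eq, ne, gt, lt, ge, le). String comparisons are case sensitive. For more information and examples, see Examples of simple search queries.

Note

The filter parameter ($filter in a GET request) works only on fields that were marked as filterable during index creation.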

### Query for only images
POST {{searchUrl}}/indexes/doc-extraction-multimodal-embedding-index/docs/search?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}
  
  {
    "search": "*",
    "count": true,
    "filter": "image_document_id ne null"
  }

### Query for text or images with content related to energy, returning the id, parent document, and text (only populated for text chunks), and the content path where the image is saved in the knowledge store (only populated for images)
POST {{searchUrl}}/indexes/doc-extraction-multimodal-embedding-index/docs/search?api-version=2025-11-01-preview   HTTP/1.1
  Content-Type: application/json
  api-key: {{searchApiKey}}
  

  {
    "search": "energy",
    "count": true,
    "select": "content_id, document_title, content_text, content_path"
  }
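
The request bodies in this section differ only in their search text, filter, and select list, so they can be generated from one small function. The Python sketch below is a convenience illustration. `build_search_body` and its `kind` parameter are hypothetical, and the filters rely on text_document_id and image_document_id being filterable in the index defined earlier.

```python
def build_search_body(search="*", select=None, kind=None, count=True):
    """Build a docs/search request body.

    kind: None for all content, "image" for image documents only,
    or "text" for text chunks only.
    """
    body = {"search": search, "count": count}
    if select:
        body["select"] = ", ".join(select)
    if kind == "image":
        body["filter"] = "image_document_id ne null"
    elif kind == "text":
        body["filter"] = "text_document_id ne null"
    elif kind is not None:
        raise ValueError("kind must be None, 'image', or 'text'")
    return body


if __name__ == "__main__":
    print(build_search_body(kind="image"))
    print(build_search_body(
        "energy",
        select=["content_id", "document_title", "content_text", "content_path"],
    ))
```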

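Reset and rerun

You can reset an indexer to clear the high-water mark, which allows a full rerun. The following POST requests reset the indexer and then rerun it.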

### Reset the indexer
POST {{searchUrl}}/indexers/doc-extraction-multimodal-embedding-indexer/reset?api-version=2025-11-01-preview   HTTP/1.1
  api-key: {{searchApiKey}}

### Run the indexer
POST {{searchUrl}}/indexers/doc-extraction-multimodal-embedding-indexer/run?api-version=2025-11-01-preview   HTTP/1.1
  api-key: {{searchApiKey}}

### Check indexer status 
GET {{searchUrl}}/indexers/doc-extraction-multimodal-embedding-indexer/status?api-version=2025-11-01-preview   HTTP/1.1
  api-key: {{searchApiKey}}
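
After a rerun, the status request is usually repeated until the run finishes. The following Python sketch summarizes a status response so that a polling loop knows when to stop. It rests on assumptions: `summarize_status` is a hypothetical helper, and the field names (lastResult, status, itemsProcessed, itemsFailed) and the terminal status values reflect my understanding of the Get Indexer Status response, so verify them against the REST API reference.

```python
# Assumed terminal values of lastResult.status; check the REST reference.
TERMINAL_STATES = {"success", "transientFailure", "persistentFailure"}


def summarize_status(status):
    """Reduce an indexer status response to the fields a polling loop needs."""
    last = status.get("lastResult") or {}
    state = last.get("status", "unknown")
    return {
        "state": state,
        "finished": state in TERMINAL_STATES,
        "itemsProcessed": last.get("itemsProcessed", 0),
        "itemsFailed": last.get("itemsFailed", 0),
    }


if __name__ == "__main__":
    # Hypothetical response body for illustration only.
    sample = {
        "status": "running",
        "lastResult": {"status": "inProgress", "itemsProcessed": 0, "itemsFailed": 0},
    }
    print(summarize_status(sample))
```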

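Clean up resources

When you're working in your own subscription, it's a good idea to remove the resources that you no longer need at the end of a project. Resources left running can cost you money. You can delete resources individually, or delete the resource group to delete the entire set of resources.

You can use the Azure portal to delete indexes, indexers, and data sources.

See also

Now that you're familiar with a sample implementation of a multimodal indexing scenario, check out: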

Last updated on 2025-11-18