Use AI analysis in Data Security Investigations

After AI preparation and vectorization complete for your investigation scope, you're ready to review and use AI analytics tools for the data in the investigation. Generative AI processing runs a deep content analysis of selected items and can uncover key security and sensitive data risks within impacted data.

To get started with AI analysis in an investigation, complete the following steps:

Go to the Microsoft Purview portal and sign in with the credentials for a user account assigned Data Security Investigations permissions.
Select the Data Security Investigations solution card, then select Investigations in the left nav.
Select an investigation, then select Analysis on the navigation bar.

Tip

Consider increasing the default item display from 50 to 1,000 items for easier selection of multiple items to exclude from the investigation scope.

Use categorization

Tip

Consider adding context to the investigation to help focus categories on specific areas or issues.

When you first open the Analysis page, items aren't categorized. Configure categorization first, because grouping items by risk is a helpful way to start triage.

Categorization takes time to complete, depending on the data volume, and it consumes AI capacity (compute units). To get an initial understanding of incident severity, use AI to categorize impacted data and narrow the focus to high-risk assets. Data Security Investigations sorts data into default, custom, or AI-generated categories, including by subject matter and risk.

To categorize data items in the investigation scope, complete the following steps:

Important

You must prepare data for AI analysis before configuring categorization.

Go to the Microsoft Purview portal and sign in with the credentials for a user account assigned Data Security Investigations permissions.
Select the Data Security Investigations solution card, then select Investigations in the left nav.
Select an investigation, then select Analysis.
Select Categorize.
In the Categorize with AI dialog, complete the following areas to customize your categories as applicable:
- Default categories: Select one or more default categories.
- Suggested categories: Select one or more AI-suggested categories. Suggested categories are generated from the most recent vector search. If no vector search has been run, no categories are suggested.
- Custom categories: Select Create category and enter a theme or area to include. Select Save for the custom category.
After configuring your categorization settings, select Save.

After categorization processing completes, select categorization areas or individual subject areas within a category to filter data items for review. Examining the categories helps you quickly identify incident severity and scope.

When you select a specific subject area, a summary for the subject area displays with the following information:

Topic name: The name of the subject area in the category.
Topic description: The description of the subject area generated from AI processing.
Topic impact score: The impact score related to potential risk generated from AI processing.
Total documents in sample: The total number of data items that match the subject area in the investigation scope.

For example, you might see the following categories in your investigation:

Credentials (712 items): This category identifies documents or emails containing passwords or API keys.
Operational Information (356 items): This category identifies items that contain app credentials in a SharePoint site used by a team in your organization.
Internal Communications (122 items): This category identifies chat logs or emails that include shared credentials.

From an incident response perspective in this example, exposed credentials are typically the highest risk, so triage them first and then follow up on the other categories.

Categorization and compute units

Using categorization in Data Security Investigations might require a significant number of compute units, even for smaller amounts of data included in an investigation scope. The compute unit requirements are directly proportional to the size of the data categorized, not to the number of categories selected or custom categories created.

The following table estimates the compute units required for different data set sizes when you use categorization with 2 to 20 categories.

Amount of data categorized	Estimated compute units used
100 MB	146
500 MB	734
1 GB	1,470
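Because categorization cost is described as directly proportional to data size, a rough planning estimate is a single per-megabyte rate. The sketch below is a minimal illustration, assuming the rate comes from the 1 GB row and that 1 GB = 1,000 MB; the function name is hypothetical and not part of the product.

```python
# A minimal sketch: estimate categorization compute units from data size.
# Assumption: rate taken from the 1 GB row (1,470 units), with 1 GB = 1,000 MB.
CATEGORIZATION_CU_PER_MB = 1470 / 1000


def estimate_categorization_cu(data_mb: float) -> float:
    """Return an approximate compute-unit cost for categorizing data_mb megabytes.

    The cost scales with data size only; the number of categories (2 to 20)
    and custom categories doesn't change the estimate.
    """
    if data_mb < 0:
        raise ValueError("data size must be non-negative")
    return data_mb * CATEGORIZATION_CU_PER_MB


for size_mb in (100, 500, 1000, 2500):
    print(f"{size_mb} MB: about {estimate_categorization_cu(size_mb):,.0f} compute units")
```

The proportional rate gives 147 and 735 for the 100 MB and 500 MB rows, slightly above the table's 146 and 734, so treat the output as a planning figure rather than a billing quote.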

For more information about compute unit capacity and billing, see Billing models in Data Security Investigations.

Use vector search

Use vector search to describe what you're looking for in the vectorized data items in the investigation scope. Vector-based semantic search enables similarity-based information retrieval and understands user intent beyond literal words. You can query your impacted data to find all assets related to a particular subject, even if keywords are missing. For example, a pharmaceutical company might use vector search to find all emails, documents, Copilot prompts and responses, and Teams messages related to vaccine trials to identify relevant assets that don't mention the words vaccine or trial but remain pertinent to the investigation.
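Similarity-based retrieval like this is commonly implemented by comparing embedding vectors, for example with cosine similarity. The sketch below illustrates that general technique with made-up three-dimensional vectors; it isn't the Data Security Investigations implementation, and the document names and vectors are hypothetical.

```python
import math

# Hypothetical embeddings for illustration only. A real system obtains these
# vectors from an embedding model with many more dimensions.
DOCUMENTS = {
    "dosing-schedule.docx": [0.9, 0.1, 0.2],
    "participant-consent.eml": [0.8, 0.2, 0.3],
    "cafeteria-menu.txt": [0.1, 0.9, 0.1],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embedding of a query such as "vaccine trials". Neither matching
# document name contains those words, but their vectors point the same way.
query_vector = [0.85, 0.15, 0.25]
for name, score in rank(query_vector, DOCUMENTS):
    print(f"{name}: {score:.3f}")
```

Matching on vector direction rather than shared keywords is what lets semantic search surface relevant items that never mention the query terms.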

Data Security Investigations also supports searching and returning results across multiple languages. You can create a vector search in one language, and it can identify items that use equivalent terms in other languages. For example, if you search for shared passwords in English, vector search might identify content in French containing mot de passe or content in Spanish containing contraseña.

You can use natural language to ask a question or enter phrases with a specific focus to narrow down items for review. Vector search queries don't incur additional compute unit capacity costs, because the required processing for these scoped items is already complete.

To create a vector search, complete the following steps:

Important

You must prepare data for AI analysis before using vector search.

In an investigation, select the Analyze card or the Analysis tab.
Select Standard mode.
Describe what you're looking for in the search field or select one of the suggested searches.
Select the search arrow or press Enter.

The vector or suggested search starts and lists data items associated with your query in the items area. Each item includes a search relevance score, which indicates how closely the item matches your search and how confident the connection is. By default, results are sorted from high to low relevance, with the most relevant items listed first. The score applies only to the current vector search; an item's score can change with a different search.

Search relevance scores are as follows:

High: Strong connection signals, highly relevant vector search results.
Medium: Moderate connection signals, likely relevant vector search results.
Low: Weak connection signals, possibly less relevant vector search results.
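If you transcribe or export results for offline triage, the default High, Medium, Low ordering is easy to reproduce. The sketch below assumes a hypothetical list of dictionaries with `item` and `relevance` keys; the product's actual result format isn't described in this article.

```python
# Hypothetical exported results; the "item" and "relevance" field names are assumptions.
RELEVANCE_RANK = {"High": 0, "Medium": 1, "Low": 2}

results = [
    {"item": "team-chat-export.json", "relevance": "Low"},
    {"item": "shared-passwords.xlsx", "relevance": "High"},
    {"item": "it-onboarding.docx", "relevance": "Medium"},
    {"item": "vpn-setup.eml", "relevance": "High"},
]


def sort_by_relevance(items: list[dict]) -> list[dict]:
    """Order items High, then Medium, then Low.

    sorted() is stable, so items keep their original order within a band.
    """
    return sorted(items, key=lambda r: RELEVANCE_RANK[r["relevance"]])


def high_relevance_items(items: list[dict]) -> list[str]:
    """Names of items with a High relevance score."""
    return [r["item"] for r in items if r["relevance"] == "High"]


for r in sort_by_relevance(results):
    print(f'{r["relevance"]:<6} {r["item"]}')
```

Because scores apply only to the search that produced them, keep results from different searches in separate lists rather than merging their bands.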

Important

Search relevance scores are shown only when vector search items are returned.

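Vector search and compute units

Using vector search in Data Security Investigations doesn't require many compute units, even for larger amounts of data in an investigation scope. The following table estimates the compute units required for different data set sizes when you use vector search.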

Amount of data searched	Estimated compute units used
100 MB	0.1
1 GB	0.3
10 GB	3.1
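Vector search estimates aren't strictly proportional at small sizes (0.1 for 100 MB but 0.3 for 1 GB), so a reasonable way to estimate sizes between rows is linear interpolation. The sketch below assumes 1 GB = 1,000 MB and deliberately refuses to extrapolate outside the documented range; the function name is hypothetical.

```python
from bisect import bisect_left

# (data size in MB, estimated compute units) from the vector search table.
# Assumption: 1 GB = 1,000 MB.
VECTOR_SEARCH_TABLE = [(100, 0.1), (1_000, 0.3), (10_000, 3.1)]


def estimate_vector_search_cu(data_mb: float) -> float:
    """Linearly interpolate between documented points.

    Sizes outside 100 MB to 10 GB raise ValueError instead of extrapolating.
    """
    sizes = [size for size, _ in VECTOR_SEARCH_TABLE]
    if not sizes[0] <= data_mb <= sizes[-1]:
        raise ValueError(f"{data_mb} MB is outside the documented 100 MB to 10 GB range")
    i = bisect_left(sizes, data_mb)  # first index whose size is >= data_mb
    if sizes[i] == data_mb:
        return VECTOR_SEARCH_TABLE[i][1]
    (x0, y0), (x1, y1) = VECTOR_SEARCH_TABLE[i - 1], VECTOR_SEARCH_TABLE[i]
    return y0 + (y1 - y0) * (data_mb - x0) / (x1 - x0)


for size_mb in (100, 550, 5_000):
    print(f"{size_mb} MB: about {estimate_vector_search_cu(size_mb):.2f} compute units")
```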

For more information about AI capacity and billing, see Billing models in Data Security Investigations.

Use Search with AI (preview)

Use Search with AI (preview) to ask natural language questions or enter keywords with a specific focus to narrow down items for review. Search with AI (preview) supplements vector search and extends AI capabilities when analyzing your data. Metadata for data items is now included with Search with AI (preview), helping you narrow down relevant items based on item file types, sizes, versions, and more.

In addition to relevant items, search results also include a high-level summary of all results. This summary helps you quickly determine if the search items returned are relevant to your search question or keywords. The summary includes citations to specific items returned by the search and relevance scores for each item.

To use Search with AI (preview), complete the following steps:

Important

You must prepare data for AI analysis before using Search with AI (preview).

In an investigation, select the Analyze card or the Analysis tab.
Select Ask AI (preview) mode.
In the Search with AI (preview) pane, enter your question about the data or enter keywords.
After the search is complete, review the search summary, item results, and item AI search details.

The item detail pane displays all items that match the context of your AI search. Use filters to focus the results by document and by sender or author. Select Document to view a list of the cited items in the results, which automatically filters to the relevant items. Use actions on the command bar to examine, categorize, or add one or more items to your mitigation plan.

Select an item in the results and select the AI summary view to review the relevance categorization score and a snippet with an extracted example that matches the intent of the search.

Use examination tools

Tip

Consider adding context to the investigation to help focus examination results on specific areas or issues.

Use examination to run deep AI content analysis on selected data items and find security risks buried within impacted data. By examining impacted data for security risks, you can find credentials, network risks, or evidence of threat actor discussion. After you identify security risks, you can scan for sensitive data, such as personal, financial, or health information.

In addition to summarizing risks, Data Security Investigations provides mitigation steps and the thought process to explain the assessment. From here, you can add open issues to the mitigation plan, connecting analysis to mitigation. This analysis helps you identify data relevant to your investigation and quickly take action to minimize the impact.

You can choose examination processing for the following focus areas:

Credentials: Credentials processing examines and extracts credentials and access assets included in selected data items.
Risks: Risks processing analyzes and scores selected data items for active risks.
Mitigation: Mitigation processing identifies specific threats and recommends mitigation steps for selected data items.

When the examination process completes for a focus area, select Probing history from the command bar on the right side of the investigation scope page. In the Probing history pane, select View details for a specific examination process.

Examination and compute units

Using examination in Data Security Investigations might require a significant number of compute units, even for smaller amounts of data included in an investigation scope. The compute unit requirements are directly proportional to the size of the data examined and to the number of examination options selected.

The following table estimates the compute units required for different data set sizes when you use a single examination option (credentials, risks, or mitigation).

Amount of data examined	Estimated compute units used
5 MB	13
50 MB	115
500 MB	1,129

For example, if you want to discover credentials for 50 MB of impacted data associated with the data security incident, you use an estimated 115 compute units. If you also include examinations for risks and mitigation insights, you use an estimated 345 compute units in total (3 × 115).
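The arithmetic in this example (per-option cost multiplied by the number of options) can be sketched as follows. This is a minimal illustration, assuming per-option costs interpolate linearly between the table rows and that each option adds the same cost, as the 115 to 345 example shows; the option names and function names are hypothetical.

```python
from bisect import bisect_left

# Per-option (MB, compute units) estimates from the examination table.
EXAMINATION_TABLE = [(5, 13), (50, 115), (500, 1_129)]
OPTIONS = {"credentials", "risks", "mitigation"}


def per_option_cu(data_mb: float) -> float:
    """Interpolate one option's cost between documented sizes (5 MB to 500 MB)."""
    sizes = [size for size, _ in EXAMINATION_TABLE]
    if not sizes[0] <= data_mb <= sizes[-1]:
        raise ValueError(f"{data_mb} MB is outside the documented 5 MB to 500 MB range")
    i = bisect_left(sizes, data_mb)  # first index whose size is >= data_mb
    if sizes[i] == data_mb:
        return EXAMINATION_TABLE[i][1]
    (x0, y0), (x1, y1) = EXAMINATION_TABLE[i - 1], EXAMINATION_TABLE[i]
    return y0 + (y1 - y0) * (data_mb - x0) / (x1 - x0)


def estimate_examination_cu(data_mb: float, options: set[str]) -> float:
    """Multiply the per-option cost by the number of distinct options selected."""
    unknown = set(options) - OPTIONS
    if unknown:
        raise ValueError(f"unknown examination options: {sorted(unknown)}")
    return per_option_cu(data_mb) * len(set(options))


print(estimate_examination_cu(50, {"credentials"}))                          # 115
print(estimate_examination_cu(50, {"credentials", "risks", "mitigation"}))  # 345
```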

For more information about compute unit capacity and billing, see Billing models in Data Security Investigations.

Examination process information

Select Probing history from the far-right command bar to display a list of the examination activities for the investigation scope.

The list shows the following summary information for each examination process:

Name: The name of the examination.
Created by: The user principal name (UPN) of the user who created the examination process.
Probe: The probing area selected for the examination.
Scope: The number of items selected for examination.
Date: The creation date of the examination process.
Status: The process status. Values include In progress, Successful, and Failed.
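If you track examinations outside the portal, for example in a case log, the Probing history columns map naturally to a small record type. The sketch below is a hypothetical representation, not a product API; the sample names, UPNs, and dates are made up, and it assumes only Successful examinations have a report ready to open with View details.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Hypothetical record mirroring the Probing history columns; not a product API.
STATUSES = ("In progress", "Successful", "Failed")


@dataclass
class ExaminationRecord:
    name: str
    created_by: str  # user principal name (UPN)
    probe: str       # probing area, for example "Credentials"
    scope: int       # number of items selected for examination
    created: date
    status: str

    def __post_init__(self) -> None:
        if self.status not in STATUSES:
            raise ValueError(f"unexpected status: {self.status!r}")


def status_counts(records: list[ExaminationRecord]) -> Counter:
    """Count examinations per status."""
    return Counter(r.status for r in records)


def reports_to_review(records: list[ExaminationRecord]) -> list[str]:
    """Names of Successful examinations (assumed to have a report available)."""
    return [r.name for r in records if r.status == "Successful"]


history = [
    ExaminationRecord("Creds - finance share", "analyst@contoso.com", "Credentials", 120, date(2025, 11, 3), "Successful"),
    ExaminationRecord("Risks - finance share", "analyst@contoso.com", "Risks", 120, date(2025, 11, 3), "In progress"),
]
print(status_counts(history))
print(reports_to_review(history))
```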

Select View details when the process completes to view the examination report and recommendations.

Next steps

After the examination process completes, review the recommendation summaries that you selected:

Last updated on 2026-01-06
