Use ai.similarity with pandas

The ai.similarity function uses generative AI to compare two string expressions and calculate a semantic similarity score, all in a single line of code. You can compare the text values in one DataFrame column either with a single common text value or with the pairwise text values in another column.

Note

This article covers using ai.similarity with pandas. To use ai.similarity with PySpark, see this article.
See other AI functions in this overview article.
Learn how to customize the configuration of AI functions.

Overview

The ai.similarity function extends the pandas Series class.

To calculate the semantic similarity between each input row and a single common text value, call the function on a text column of a pandas DataFrame and pass that value. To calculate pairwise similarity instead, pass another column that has the same length as the input column. Each input row is then compared with the corresponding row of that column.

The function returns a pandas Series that contains similarity scores, which can be stored in a new DataFrame column.

df["similarity"] = df["col1"].ai.similarity("value")

df["similarity"] = df["col1"].ai.similarity(df["col2"])
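The `df["col1"].ai.similarity(...)` syntax follows the pandas Series accessor pattern. The following sketch is not the library's implementation. It registers a hypothetical `toysim` accessor with pandas' documented `pd.api.extensions.register_series_accessor`. It then uses a toy token-overlap (Jaccard) score in place of the generative-AI score, only to show how the two forms of `other` are handled. Three things are assumptions: the hypothetical `_jaccard` helper, the position-based pairing of rows, and the length check. Also note that Jaccard scores range from 0 to 1, not -1 to 1.

```python
import pandas as pd


def _jaccard(a: str, b: str) -> float:
    """Hypothetical lexical stand-in for a semantic score: token-set overlap in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


@pd.api.extensions.register_series_accessor("toysim")
class ToySimilarityAccessor:
    """Hypothetical accessor that mimics the call shape of Series.ai.similarity."""

    def __init__(self, series: pd.Series):
        self._s = series

    def similarity(self, other) -> pd.Series:
        if isinstance(other, str):
            # One common value: score every row against the same string.
            scores = [_jaccard(x, other) for x in self._s]
        elif isinstance(other, pd.Series):
            # Pairwise (assumption): row i is compared with row i of `other`, by position.
            if len(other) != len(self._s):
                raise ValueError("other must have the same length as the input Series")
            scores = [_jaccard(x, y) for x, y in zip(self._s, other)]
        else:
            raise TypeError("other must be a str or a pandas Series")
        # Keep the input index so the result can be assigned back as a column.
        return pd.Series(scores, index=self._s.index, name="similarity")


df = pd.DataFrame({
    "col1": ["red apple", "green apple", "blue sky"],
    "col2": ["red apple", "apple pie", "night sky"],
})
df["single"] = df["col1"].toysim.similarity("apple")
df["pairwise"] = df["col1"].toysim.similarity(df["col2"])
print(df)
```

Replacing `_jaccard` with a real semantic scorer would change the numbers but not the dispatch between the two call forms.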

Parameters

Name	Description
`other` (required)	Either of the following:
- A string: a single common text value, which is used to compute a similarity score for each input row.
- A pandas Series with the same length as the input Series: its text values are used to compute a pairwise similarity score for each input row.

Returns

The function returns a pandas Series that contains a similarity score for each input text row. The scores are relative, so they're best used for ranking. Scores can range from -1 (opposite meanings) to 1 (identical meanings). A score of 0 indicates that the values are unrelated in meaning.
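This article doesn't say how ai.similarity computes its score. As an illustration only, cosine similarity between embedding vectors is a common semantic-similarity measure with the same -1 to 1 range. In the following sketch, the 3-dimensional vectors are hypothetical and are not output from ai.embed or ai.similarity. Because the scores are relative, the sketch ranks them rather than reading them as absolute thresholds.

```python
import numpy as np
import pandas as pd


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: 1 = same direction, 0 = orthogonal, -1 = opposite direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical 3-d "embeddings"; real embedding vectors are much longer.
query = np.array([1.0, 0.0, 0.0])
candidates = {
    "same": np.array([2.0, 0.0, 0.0]),       # same direction -> 1.0
    "unrelated": np.array([0.0, 3.0, 0.0]),  # orthogonal -> 0.0
    "opposite": np.array([-1.0, 0.0, 0.0]),  # opposite direction -> -1.0
    "close": np.array([1.0, 1.0, 0.0]),      # 45 degrees apart -> about 0.707
}

scores = pd.Series(
    {name: cosine(query, vec) for name, vec in candidates.items()},
    name="similarity",
)

# Relative scores: sort them to rank candidates, then take the top matches.
ranked = scores.sort_values(ascending=False)
print(ranked)
print(ranked.nlargest(2).index.tolist())
```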

Example

Compare with a single value

# This code uses AI. Always review output for mistakes.
import pandas as pd

df = pd.DataFrame({
    "name": ["Bill Gates", "Satya Nadella", "Joan of Arc"],
})

# Score each name against one common value.
df["similarity"] = df["name"].ai.similarity("Microsoft")
display(df)

This example code cell provides the following output:

Compare with pairwise values

# This code uses AI. Always review output for mistakes.
import pandas as pd

df = pd.DataFrame([
    ("Bill Gates", "Technology"),
    ("Satya Nadella", "Healthcare"),
    ("Joan of Arc", "Agriculture"),
], columns=["names", "industries"])

# Score each name against the industry in the same row.
df["similarity"] = df["names"].ai.similarity(df["industries"])
display(df)

This example code cell provides the following output:

Related content

Use ai.similarity with PySpark.
Detect sentiment with ai.analyze_sentiment.
Categorize text with ai.classify.
Generate vector embeddings with ai.embed.
Extract entities with ai.extract.
Fix grammar with ai.fix_grammar.
Answer custom user prompts with ai.generate_response.
Summarize text with ai.summarize.
Translate text with ai.translate.
Learn more about the full set of AI functions.
Customize the configuration of AI functions.
Did we miss a feature you need? Suggest it on the Fabric Ideas forum.


Last updated on 2025-11-21