Use ai.embed with PySpark

The ai.embed function uses generative AI to convert text into vector embeddings. These vectors let AI understand relationships between texts, so you can search, group, and compare content based on meaning rather than exact wording. With a single line of code, you can generate vector embeddings from a column in a DataFrame.
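To make "compare content based on meaning" concrete, here is a minimal sketch of how embedding vectors are typically compared, using cosine similarity. The three-dimensional vectors, labels, and helper names below are made up for illustration; they are not output from ai.embed, whose vectors have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Return (label, score) pairs sorted from most to least similar to query."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy vectors, for illustration only.
toy_embeddings = {
    "duvet": [0.9, 0.1, 0.0],
    "measuring cups": [0.1, 0.9, 0.1],
    "compact SUV": [0.0, 0.1, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # stands in for the embedding of a bedding-related query

ranking = rank_by_similarity(toy_query, toy_embeddings)
print(ranking[0][0])  # duvet
```

Texts whose vectors point in similar directions score close to 1, and unrelated texts score close to 0, which is what makes search and grouping by meaning possible.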

Note

This article covers using ai.embed with PySpark. To use ai.embed with pandas, see this article.
See other AI functions in this overview article.
Learn how to customize the configuration of AI functions.

Overview

The ai.embed function is available for Spark DataFrames. You must specify the name of an existing input column as a parameter.

The function returns a new DataFrame that includes embeddings for each row of input text, in an output column.

Syntax

df.ai.embed(input_col="col1", output_col="embed")

Parameters

Name	Description
`input_col` Required	A string that contains the name of an existing column with input text values to use for computing embeddings.
`output_col` Optional	A string that contains the name of a new column to store calculated embeddings for each input text row. If you don't set this parameter, a default name is generated for the output column.
`error_col` Optional	A string that contains the name of a new column that stores any OpenAI errors that result from processing each input text row. If you don't set this parameter, a default name is generated for the error column. If an input row has no errors, this column has a `null` value.
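The following sketch shows one way the three parameters might be used together: it names the error column explicitly and then separates rows that were embedded from rows that failed. The helper name embed_and_split and the default column names are hypothetical, and the code assumes a Spark DataFrame in an environment where the ai accessor is available, as in the example later in this article.

```python
def embed_and_split(df, input_col, output_col="embed", error_col="embed_error"):
    """Embed input_col, then separate rows that succeeded from rows that failed.

    Assumes df is a Spark DataFrame that exposes the ai accessor.
    The helper name and default column names are hypothetical.
    """
    result = df.ai.embed(
        input_col=input_col,
        output_col=output_col,
        error_col=error_col,
    )
    # Per the parameter table, the error column is null for rows without errors.
    succeeded = result.filter(f"`{error_col}` IS NULL")
    failed = result.filter(f"`{error_col}` IS NOT NULL")
    return succeeded, failed
```

Calling embed_and_split(df, "descriptions") returns two DataFrames, so failed rows can be inspected or retried without blocking work on the rows that succeeded.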

Returns

The function returns a Spark DataFrame that includes a new column that contains generated embeddings for each input text row. Embeddings are of the type [pyspark.ml.linalg.DenseVector](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.linalg.DenseVector.html#densevector). The number of elements in the DenseVector depends on the embedding model's dimensions, which are configurable in AI functions.
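Because the vector length depends on the configured model, it can be worth confirming that a batch of embeddings shares one dimension before you compare or index them. The sketch below is a hypothetical helper, not part of AI functions; it relies only on len(), which works on DenseVector values as well as on the plain lists used here as stand-ins.

```python
def embedding_dimension(vectors):
    """Return the shared length of the vectors; raise if empty or if lengths differ."""
    lengths = {len(v) for v in vectors}
    if not lengths:
        raise ValueError("no vectors provided")
    if len(lengths) > 1:
        raise ValueError(f"mixed embedding dimensions: {sorted(lengths)}")
    return lengths.pop()

# Plain lists stand in for DenseVector values; these numbers are made up.
sample = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
print(embedding_dimension(sample))  # 4
```

In a notebook, the same helper could be applied to a small collected sample of the output column, assuming the sample is small enough to fit on the driver.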

Example

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",), 
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",), 
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",) 
    ], ["descriptions"])

embed = df.ai.embed(input_col="descriptions", output_col="embed")
display(embed)

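This example code cell returns a DataFrame that contains the original descriptions column and a new embed column that holds one embedding vector for each row.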

Related content

Use ai.embed with pandas.
Detect sentiment with ai.analyze_sentiment.
Categorize text with ai.classify.
Extract entities with ai.extract.
Fix grammar with ai.fix_grammar.
Answer custom user prompts with ai.generate_response.
Calculate similarity with ai.similarity.
Summarize text with ai.summarize.
Translate text with ai.translate.
Learn more about the full set of AI functions.
Customize the configuration of AI functions.
Did we miss a feature you need? Suggest it on the Fabric Ideas forum.

Last updated on 2025-11-21