Use ai.summarize with pandas

The ai.summarize function uses generative AI to summarize input text with a single line of code. It can summarize the values in one column of a DataFrame or the values across all of its columns.

Overview

The ai.summarize function extends the pandas Series class. Call the function on a text column of a pandas DataFrame to summarize each row's value in that column. You can also call it on an entire DataFrame to summarize each row's values across all the columns.

The function returns a pandas Series that contains summaries, which can be stored in a new DataFrame column.
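For readers unfamiliar with how a function can extend the Series class, the following sketch uses the public pandas accessor-registration API to attach a stand-in namespace. It is not the implementation of ai.summarize: the namespace demo_ai and the first-sentence "summarizer" are hypothetical placeholders, and no AI model is called. It only illustrates the shape of the call, with one result per row returned as a Series that lines up with the input index.

```python
import pandas as pd

# Hypothetical namespace "demo_ai"; the real function lives under ".ai".
@pd.api.extensions.register_series_accessor("demo_ai")
class DemoAccessor:
    def __init__(self, series: pd.Series):
        self._series = series

    def summarize(self) -> pd.Series:
        # Placeholder logic: keep only the first sentence. No AI model is called.
        def first_sentence(text):
            if pd.isna(text):
                return None  # null input stays null
            return str(text).split(". ")[0].rstrip(".") + "."

        return self._series.map(first_sentence)


df = pd.DataFrame({"text": ["Fabric unifies analytics. It is a SaaS product.", None]})

# The returned Series shares the DataFrame's index, so it can be
# assigned directly to a new column.
df["summaries"] = df["text"].demo_ai.summarize()
```

Because the result is aligned with the original index, assigning it to a new column keeps each summary next to the text it came from.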

Syntax

df["summaries"] = df["text"].ai.summarize()

Parameters

Name | Description
instructions | Optional. A string that gives the AI model more context, such as the desired output length or tone. More precise instructions yield better results.
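As a sketch of how the instructions parameter might be used, the following cell passes a string that sets both length and tone. The instruction wording and the sample text are illustrative only. The hasattr guard is an assumption added here so that the cell does nothing in an environment where the ai accessor isn't available.

```python
import pandas as pd

df = pd.DataFrame(
    {"text": ["Fabric is an end-to-end analytics platform delivered as SaaS."]}
)

# Illustrative instruction string; any wording that states length and tone would do.
style = "Summarize in one sentence of at most 15 words, in a neutral tone."

# Guard (an assumption of this sketch): skip the call where the .ai accessor
# isn't registered, for example outside a notebook with AI functions set up.
if hasattr(pd.Series, "ai"):
    df["summaries"] = df["text"].ai.summarize(instructions=style)
```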

Returns

The function returns a pandas Series that contains summaries for each input text row. If the input text is null, the result is null.
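Because null inputs produce null summaries, it can help to check how many rows have no text before calling the function. The following sketch uses only standard pandas calls; the column name and sample values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"text": ["First review text.", None, "Third review text."]})

# Rows with null input are expected to come back with null summaries.
null_inputs = df["text"].isna()
print(f"{null_inputs.sum()} of {len(df)} rows have no text to summarize")

# Optionally keep only the rows that have text before calling the function.
has_text = df.loc[~null_inputs]
```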

Example

# This code uses AI. Always review output for mistakes.
# Assumes a notebook session where the ai accessor is already available.

import pandas as pd

df = pd.DataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """)
    ], columns=["product", "release_year", "description"])

df["summaries"] = df["description"].ai.summarize()
display(df)

This example code cell provides the following output:

Screenshot showing a data frame. The 'summaries' column has a summary of the 'description' column only, in the corresponding row.
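To summarize values across all the columns instead, the function is called on the DataFrame itself. The following sketch assumes that the DataFrame call mirrors the Series call shown in the Syntax section; treat the exact call form as an assumption and check it in your environment. The hasattr guard is likewise added here so that the cell does nothing where the ai accessor isn't available.

```python
import pandas as pd

df = pd.DataFrame(
    [("Microsoft Fabric", "2023", "An end-to-end analytics platform delivered as SaaS.")],
    columns=["product", "release_year", "description"],
)

# Assumed call form for the DataFrame-wide variant: each summary would draw on
# the product, release_year, and description values of its row.
if hasattr(pd.DataFrame, "ai"):
    df["summaries"] = df.ai.summarize()
```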