Evaluate the Edge RAG Preview system

Evaluate the system, models, and datasets within Edge RAG Preview, enabled by Azure Arc. There are two types of evaluations: baseline and automatic.

Important

Edge RAG Preview, enabled by Azure Arc is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

Prerequisites

Before you begin:

  • Deploy Edge RAG and complete app registration so that the developer portal is reachable at the domain name provided at deployment.
  • Get developer credentials that have both the EdgeRAGDeveloper and EdgeRAGEndUser roles assigned.

Run baseline check

The baseline check evaluates the functionality of the RAG system to make sure it's working as expected. It runs the following tasks (a rough sketch in code follows the list):

  • Creates an ingestion build in the documents dataset.
  • Runs inference against that build by using a test dataset that includes a set of queries and expected answers.
  • Evaluates the system based on model metrics.
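The following Python sketch makes that flow concrete. Every name in it (create_ingestion_build, run_inference, score_metrics, and the record fields) is a hypothetical stand-in for the real ingestion, inference, and scoring steps; none of it is the Edge RAG API.

```python
# Illustrative sketch of the three baseline-check tasks. All names are
# hypothetical stand-ins; this is not the Edge RAG API.

def create_ingestion_build(documents):
    # Stand-in: the real system chunks, embeds, and indexes the documents.
    return {"index": list(documents)}

def run_inference(build, query):
    # Stand-in: the real system retrieves from the build and generates an answer.
    return f"answer for: {query}"

def score_metrics(results):
    # Stand-in: the real system computes model-based quality metrics.
    matches = sum(r["expected"] == r["actual"] for r in results)
    return {"exact_match_rate": matches / len(results)}

def run_baseline_check(documents, test_dataset):
    build = create_ingestion_build(documents)              # task 1: ingestion build
    results = [
        {
            "query": rec["query"],
            "expected": rec["expected_answer"],
            "actual": run_inference(build, rec["query"]),  # task 2: inference
        }
        for rec in test_dataset
    ]
    return score_metrics(results)                          # task 3: model metrics
```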

To run a baseline check:

  1. Go to the developer portal by using the domain name provided during deployment and app registration. For example: https://arcrag.contoso.com.

  2. Sign in with developer credentials that have both "EdgeRAGDeveloper" and "EdgeRAGEndUser" roles assigned.

  3. Select the Evaluation tab.

    A screenshot showing the Evaluation tab in the developer portal, highlighting options for running checks and managing evaluations.

  4. On the Baseline check tab, select Run a check.

  5. Enter a name for your evaluation.

    A screenshot showing where you enter a name for the baseline check.

  6. Select Run.

  7. Review the evaluation status.

    A screenshot showing the evaluation status page in the developer portal, displaying the progress and details of the baseline check.

  8. When the evaluation completes, select its name to see the results.

    A screenshot showing the evaluation results, including metrics and detailed performance analysis of the RAG system.

Run automatic evaluation

The automatic evaluation measures the quality of the RAG system by using your own documents and dataset.

  1. In the developer portal, select Evaluation > Automatic evaluation.

    Screenshot of the Automatic Evaluation tab in the developer portal with options for creating evaluations.

  2. Select Create an automated evaluation.

  3. Enter a name for your evaluation.

    A screenshot of the basic information tab, with fields for entering the evaluation name and configuration options.

  4. Review parameters such as Temperature, Top-N, Top-P, and System prompt. These parameters are inherited from the Chat playground; to change them, go to the Chat tab and adjust them there. A rough sketch of such a parameter set follows these steps.

  5. Select Next.

  6. Under Test dataset, select Download dataset sample to get familiar with the required structure of the test dataset JSONL format. An illustrative example of the format follows these steps.

    Screenshot of the test dataset tab where you can download a template and update the dataset.

  7. Upload your dataset JSONL file.

  8. Select Next.

  9. Select the Metrics you want to evaluate for your RAG system.

    A screenshot that shows the available metrics to evaluate your system.

  10. Select Next.

  11. Review the configurations and select Create.

    Screenshot of the tab that summarizes your configuration for the automatic evaluation.

  12. Monitor the progress and status of the evaluation.

    Screenshot shows the results of an automatic evaluation, including metrics and evaluation details.

  13. After the evaluation completes, review the results by selecting the evaluation name.

    Screenshot of the evaluation results page in the developer portal, displaying metrics and performance analysis for the RAG system.

  14. Review the evaluation details and metrics.

    Screenshot of the evaluation details page in the developer portal, showing metrics, configurations, and detailed analysis for the RAG system.
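Example parameter set

Step 4 lists the generation parameters that the evaluation inherits from the Chat playground. As a rough illustration only, a parameter set along those lines might look like the following; the key names, values, and comments are assumptions, not an official Edge RAG schema.

```python
# Illustration only: names, values, and meanings are assumed, not an Edge RAG schema.
chat_parameters = {
    "temperature": 0.7,   # randomness of generation; higher means more varied answers
    "top_n": 3,           # commonly the number of retrieved chunks passed to the model
    "top_p": 0.9,         # nucleus-sampling cutoff for token selection
    "system_prompt": "Answer using only the retrieved documents.",
}
```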
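Example test dataset

Step 6 refers to the dataset sample that you can download from the portal; that sample is the authoritative format. As a rough illustration, a JSONL test dataset pairs each query with its expected answer, one JSON object per line. The field names below (query, expected_answer) are assumptions, so check them against the downloaded sample. The snippet writes a small dataset and then verifies that every line parses as standalone JSON before you upload the file.

```python
import json

# Illustration only: field names are assumed, not the official Edge RAG schema.
records = [
    {"query": "What is the warranty period?", "expected_answer": "Two years."},
    {"query": "How do I reset the device?", "expected_answer": "Hold the power button for 10 seconds."},
]

# JSONL: one complete JSON object per line, no trailing commas.
with open("test_dataset.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Sanity check before uploading: every line must parse on its own.
with open("test_dataset.jsonl", encoding="utf-8") as f:
    for line_number, line in enumerate(f, start=1):
        json.loads(line)  # raises json.JSONDecodeError if this line is malformed

print("test_dataset.jsonl is valid JSONL")
```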