Metrics for evaluating the Edge RAG Preview system

This article lists the metrics used to evaluate Edge RAG Preview, enabled by Azure Arc. For more information, see Evaluate the Edge RAG system.

Important

Edge RAG Preview, enabled by Azure Arc, is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.

Generation metrics

The following metrics evaluate the quality of generated responses.

| Metric | Description | Score range |
|---|---|---|
| Correctness | Evaluates the accuracy and factual validity of generated responses against the expected responses (ground truth). | 1-5 |
| Groundedness | Evaluates the degree to which the responses generated by the generative AI application are supported by the information in the retrieved documents. | 1-5 |
| Relevancy | Evaluates the degree to which the responses generated by the generative AI application are appropriate and directly address the provided input. | 1-5 |
| ROUGE-L | Measures the longest common subsequence between the generated text and the reference text. | 0-1 |
| BLEU | Evaluates the quality of generated text by comparing it to expected responses (ground truth), with a penalty for overly short output. | 0-1 |
| METEOR | METEOR (Metric for Evaluation of Translation with Explicit Ordering) evaluates the quality of generated text by comparing it to expected responses (ground truth), with a penalty for fragmented alignment between the generated and expected sentences. | 0-1 |
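
To make the overlap-based metrics more concrete, the following is a minimal sketch of a ROUGE-L score and of the BLEU brevity penalty. It isn't the Edge RAG implementation, which this article doesn't describe. The function names, the whitespace tokenization, and the choice of an F1-style ROUGE-L are assumptions made for illustration.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(generated, reference):
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed variant)."""
    gen_tokens = generated.lower().split()  # naive whitespace tokenization
    ref_tokens = reference.lower().split()
    lcs = lcs_length(gen_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(gen_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def brevity_penalty(generated_len, reference_len):
    """BLEU brevity penalty: 1 for long enough output, exp(1 - r/c) otherwise."""
    if generated_len == 0:
        return 0.0
    if generated_len > reference_len:
        return 1.0
    return math.exp(1 - reference_len / generated_len)


score = rouge_l_f1("the cat sat on the mat", "the cat is on the mat")
penalty = brevity_penalty(generated_len=4, reference_len=6)
```

In this example the two sentences share a five-token subsequence out of six tokens each, so the ROUGE-L score is 5/6. The four-token output measured against a six-token reference is scaled by exp(-0.5), which is how BLEU discourages answers that score well only because they are short.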

Information retrieval metrics

The following metrics evaluate retrieval performance.

| Metric | Description | Score range |
|---|---|---|
| Precision | Measures the proportion of retrieved documents that are relevant. | 0-1 |
| Recall | Measures the proportion of relevant documents that were retrieved. | 0-1 |
| MRR | Mean reciprocal rank (MRR) measures the quality of document ranking based on the position of the first relevant document. | 0-1 |
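
The retrieval metrics follow standard definitions, which can be sketched as shown here. This is an illustrative sketch, not the Edge RAG implementation. The function names and the document IDs are hypothetical, and each query is assumed to come with a ranked list of retrieved IDs and a set of IDs labeled as relevant.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        return 0.0
    return len(retrieved_set & set(relevant)) / len(retrieved_set)


def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved) & relevant_set) / len(relevant_set)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0 if none was retrieved."""
    relevant_set = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant_set:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


# Hypothetical evaluation set: ranked results and labeled relevant IDs per query.
queries = [
    (["d3", "d1", "d7"], {"d1", "d2"}),  # first relevant hit at rank 2
    (["d4", "d5"], {"d4"}),              # first relevant hit at rank 1
    (["d9"], {"d8"}),                    # no relevant document retrieved
]

p = precision(*queries[0])           # 1 of 3 retrieved documents is relevant
r = recall(*queries[0])              # 1 of 2 relevant documents was retrieved
mrr = mean_reciprocal_rank(queries)  # (1/2 + 1 + 0) / 3
```

Precision and recall ignore ordering, while MRR depends only on where the first relevant document appears in the ranking, so the three metrics can move independently for the same query.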

Evaluate the Edge RAG system