This article lists the metrics used when you evaluate the Edge RAG Preview system, enabled by Azure Arc. For more information, see Evaluate the Edge RAG system.
Important
Edge RAG Preview, enabled by Azure Arc, is currently in PREVIEW. See the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.
Generation metrics
Use the following metrics to evaluate the quality of generated responses. The sketch after the table shows how these scores are commonly computed with open-source libraries.
| Metric | Description |
|---|---|
| Correctness | Evaluates the accuracy and factual validity of generated responses against the expected responses (ground truth). Score range: 1-5 |
| Groundedness | Evaluates the degree to which the responses generated by the generative AI application correspond with the information provided in the retrieved documents. Score range: 1-5 |
| Relevancy | Evaluates the degree to which the responses generated by the generative AI application are appropriate and directly correspond to the provided input. Score range: 1-5 |
| ROUGE-L | Measures the longest common subsequence between the generated text and the reference text. Score range: 0-1 |
| BLEU | BLEU (Bilingual Evaluation Understudy) evaluates the quality of generated text by comparing it to expected responses (ground truth) while penalizing brevity. Score range: 0-1 |
| METEOR | METEOR (Metric for Evaluation of Translation with Explicit Ordering) evaluates the quality of generated text by comparing it to expected responses (ground truth) while penalizing misalignment between fragments of the actual and expected sentences. Score range: 0-1 |
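The following sketch scores a single generated answer against a ground-truth answer by using the open-source `rouge-score` and `nltk` packages. These packages, the sample strings, and the scoring options are illustrative assumptions only; they aren't part of Edge RAG and are shown to clarify what the 0-1 text-overlap metrics measure.

```python
# Sketch: score one generated answer against a ground-truth answer with
# common open-source implementations (rouge-score, nltk). These packages
# aren't part of Edge RAG; they only illustrate the 0-1 overlap scores.
from rouge_score import rouge_scorer                    # pip install rouge-score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score    # requires the nltk "wordnet" corpus

ground_truth = "The firewall blocks inbound traffic on port 8080 by default."
generated = "By default, the firewall blocks inbound traffic on port 8080."

# ROUGE-L: longest common subsequence overlap between generated and reference text.
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = rouge.score(ground_truth, generated)["rougeL"].fmeasure

# BLEU: n-gram precision against the reference, with a brevity penalty.
reference_tokens = ground_truth.split()
generated_tokens = generated.split()
bleu = sentence_bleu(
    [reference_tokens],
    generated_tokens,
    smoothing_function=SmoothingFunction().method1,
)

# METEOR: unigram alignment that also penalizes fragmented (out-of-order) matches.
meteor = meteor_score([reference_tokens], generated_tokens)

print(f"ROUGE-L: {rouge_l:.2f}  BLEU: {bleu:.2f}  METEOR: {meteor:.2f}")
```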
Information retrieval metrics
Use the following metrics to evaluate retrieval performance. The sketch after the table shows one way to compute them for a single query.
| Metric | Description |
|---|---|
| Precision | Measures the proportion of correctly retrieved documents among all retrieved documents. Score range: 0-1 |
| Recall | Measures the proportion of relevant documents that were retrieved among all relevant documents. Score range: 0-1 |
| MRR | Mean reciprocal rank (MRR) measures the quality of document ranking based on the position of the first relevant document. Score range: 0-1 |
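The following sketch computes precision, recall, and MRR for a single query from a ranked list of retrieved document IDs and a ground-truth set of relevant document IDs. The IDs and the ranked result list are made-up examples, not Edge RAG output; the sketch only illustrates how the 0-1 scores are derived.

```python
# Sketch: compute retrieval metrics for a single query from document IDs.
# The IDs and ranking below are illustrative, not Edge RAG output.
relevant = {"doc-2", "doc-5", "doc-9"}              # ground-truth relevant documents
retrieved = ["doc-7", "doc-2", "doc-5", "doc-1"]    # ranked results from retrieval

hits = [doc for doc in retrieved if doc in relevant]

# Precision: share of retrieved documents that are relevant.
precision = len(hits) / len(retrieved)              # 2 / 4 = 0.50

# Recall: share of relevant documents that were retrieved.
recall = len(hits) / len(relevant)                  # 2 / 3 ≈ 0.67

# MRR: reciprocal rank of the first relevant document (1-based rank).
mrr = 0.0
for rank, doc in enumerate(retrieved, start=1):
    if doc in relevant:
        mrr = 1.0 / rank                            # first relevant hit at rank 2 -> 0.50
        break

print(f"Precision: {precision:.2f}  Recall: {recall:.2f}  MRR: {mrr:.2f}")
```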