Align judges with humans

Judge alignment teaches an LLM judge to match human evaluation criteria through systematic feedback. This process turns a generic evaluator into a domain-specific expert that understands your unique quality standards, improving agreement with human evaluations by 30-50% compared to baseline judges.

Judge alignment follows a three-step workflow:

  1. Generate initial assessments: create a judge and evaluate traces to establish a baseline
  2. Collect human feedback: domain experts review and correct the judge's assessments
  3. Align and deploy: use the SIMBA optimizer to improve the judge based on the human feedback

The system uses SIMBA as the default optimization strategy, leveraging DSPy's implementation to iteratively refine the evaluation instructions.

Requirements

  • MLflow 3.4.0 or later is required to use the judge alignment features

    %pip install --upgrade "mlflow[databricks]>=3.4.0"
    dbutils.library.restartPython()
    
  • A judge has already been created with make_judge()

  • The human feedback assessment name must exactly match the judge name. For example, if your judge is named product_quality, the human feedback must also use the name product_quality (a minimal sketch follows this list).

  • Alignment applies to judges created with make_judge() using template-based evaluation.
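
As a minimal sketch of the naming requirement (the trace ID below is hypothetical and only illustrates the call), the name passed to mlflow.log_feedback must be the exact judge name so alignment can later pair each human label with the judge's assessment:

import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType

# Hypothetical trace ID used only to illustrate the naming requirement
example_trace_id = "tr-1234567890abcdef"

mlflow.log_feedback(
    trace_id=example_trace_id,
    name="product_quality",  # must match the judge name exactly
    value="good",
    rationale="Accurate description with minor omissions",
    source=AssessmentSource(
        source_type=AssessmentSourceType.HUMAN,
        source_id="reviewer@example.com",
    ),
)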

Step 1: Create the judge and generate traces

Create the initial judge and generate traces with judge assessments to establish a baseline. At least 10 traces are required, but 50-100 traces produce better alignment results.

from mlflow.genai.judges import make_judge
import mlflow

# Create an MLflow experiment for alignment
experiment_id = mlflow.create_experiment("product-quality-alignment")
mlflow.set_experiment(experiment_id=experiment_id)

# Create initial judge with template-based evaluation
initial_judge = make_judge(
    name="product_quality",
    instructions=(
        "Evaluate if the product description in {{ outputs }} "
        "is accurate and helpful for the query in {{ inputs }}. "
        "Rate as: excellent, good, fair, or poor"
    ),
    model="databricks:/databricks-gpt-oss-120b",
)

Generate the traces and run the judge over them.

# Generate traces for alignment (minimum 10, recommended 50+)
traces = []
for i in range(50):
    with mlflow.start_span(f"product_description_{i}") as span:
        # Your application logic here
        query = f"Tell me about product {i}"
        description = generate_product_description(query)  # Replace with your application logic

        # Log inputs and outputs
        span.set_inputs({"query": query})
        span.set_outputs({"description": description})
        traces.append(span.trace_id)

# Run initial judge on all traces
for trace_id in traces:
    trace = mlflow.get_trace(trace_id)
    inputs = trace.data.spans[0].inputs
    outputs = trace.data.spans[0].outputs

    # Generate judge assessment
    judge_result = initial_judge(inputs=inputs, outputs=outputs)

    # Log judge feedback to the trace
    mlflow.log_feedback(
        trace_id=trace_id,
        name="product_quality",
        value=judge_result.value,
        rationale=judge_result.rationale,
    )
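
Before moving on, you can optionally spot-check that the judge assessments were attached to the traces. This small sketch reuses the trace.search_assessments call that also appears in Step 3; the variable name checked_trace is just for illustration:

# Spot-check that the judge feedback was recorded on the first trace
checked_trace = mlflow.get_trace(traces[0])
for assessment in checked_trace.search_assessments(name="product_quality"):
    print(assessment.name, assessment.value, assessment.source.source_type)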

Step 2: Collect human feedback

Collect human feedback to teach the judge your quality standards. Choose one of the following approaches:

Review in the Databricks UI

Collect human feedback in the UI when:

  • You need domain experts to review the outputs
  • You want to refine the feedback criteria iteratively
  • You are working with a smaller dataset (< 100 examples)

Use the MLflow UI to review traces manually and provide feedback:

  1. Navigate to your MLflow experiment in the Databricks workspace
  2. Click the Evaluations tab to view the traces
  3. Review each trace and its judge assessment
  4. Add human feedback using the UI's feedback controls
  5. Make sure the feedback name exactly matches the judge name (product_quality); the sketch after this list shows one way to verify the logged feedback programmatically
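
After reviewing in the UI, you can optionally confirm from a notebook that the human feedback was recorded under the expected name. This sketch relies only on the mlflow.search_traces and search_assessments calls shown later in Step 3:

# Count traces that already carry human feedback named "product_quality"
reviewed_traces = mlflow.search_traces(
    experiment_ids=[experiment_id],
    max_results=100,
    return_type="list",
)

human_labeled = [
    t
    for t in reviewed_traces
    if any(
        f.source.source_type == "HUMAN"
        for f in t.search_assessments(name="product_quality")
    )
]
print(f"{len(human_labeled)} of {len(reviewed_traces)} traces have human feedback")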

Programmatic feedback

Use programmatic feedback when:

  • You have pre-existing ground-truth annotations
  • You are working with a large dataset (100+ examples)
  • You need reproducible feedback collection

If you have existing ground-truth labels, log them programmatically:

from mlflow.entities import AssessmentSource, AssessmentSourceType

# Your ground truth data
ground_truth_data = [
    {"trace_id": traces[0], "label": "excellent", "rationale": "Comprehensive and accurate description"},
    {"trace_id": traces[1], "label": "poor", "rationale": "Missing key product features"},
    {"trace_id": traces[2], "label": "good", "rationale": "Accurate but could be more detailed"},
    # ... more ground truth labels
]

# Log human feedback for each trace
for item in ground_truth_data:
    mlflow.log_feedback(
        trace_id=item["trace_id"],
        name="product_quality",  # Must match judge name
        value=item["label"],
        rationale=item.get("rationale", ""),
        source=AssessmentSource(
            source_type=AssessmentSourceType.HUMAN,
            source_id="ground_truth_dataset"
        ),
    )

Best practices for feedback collection

  • Diverse reviewers: include multiple domain experts to capture a range of perspectives
  • Balanced examples: include at least 30% negative examples (poor/fair ratings); see the balance-check sketch after this list
  • Clear rationales: provide detailed explanations for each rating
  • Representative examples: cover edge cases as well as common scenarios
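
As a quick way to sanity-check the balance guideline before logging, you can count how many of your labels are negative. A minimal sketch assuming the ground_truth_data list from the previous snippet and treating poor and fair as negative:

# Rough balance check: the guideline above suggests at least 30% negative examples
negative_labels = {"poor", "fair"}
negative_count = sum(1 for item in ground_truth_data if item["label"] in negative_labels)
negative_ratio = negative_count / len(ground_truth_data)
print(f"Negative examples: {negative_count}/{len(ground_truth_data)} ({negative_ratio:.0%})")
if negative_ratio < 0.3:
    print("Consider adding more negative examples before aligning the judge")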

Step 3: Align and register the judge

Once you have collected enough human feedback, align the judge:

MLflow provides a default alignment optimizer built on DSPy's SIMBA implementation. When you call align() without specifying an optimizer, the SIMBA optimizer is used automatically; the example below also shows how to construct it explicitly with a Databricks-hosted model:

from mlflow.genai.judges.optimizers import SIMBAAlignmentOptimizer

# Retrieve traces with both judge and human assessments
traces_for_alignment = mlflow.search_traces(
    experiment_ids=[experiment_id],
    max_results=100,
    return_type="list"
)

# Filter for traces with both judge and human feedback
# Only traces with both assessments can be used for alignment
valid_traces = []
for trace in traces_for_alignment:
    feedbacks = trace.search_assessments(name="product_quality")
    has_judge = any(f.source.source_type == "LLM_JUDGE" for f in feedbacks)
    has_human = any(f.source.source_type == "HUMAN" for f in feedbacks)
    if has_judge and has_human:
        valid_traces.append(trace)

if len(valid_traces) >= 10:
    # Create SIMBA optimizer with Databricks model
    optimizer = SIMBAAlignmentOptimizer(
        model="databricks:/databricks-gpt-oss-120b"
    )

    # Align the judge based on human feedback
    aligned_judge = initial_judge.align(optimizer, valid_traces)

    # Register the aligned judge for production use
    aligned_judge.register(
        experiment_id=experiment_id,
        name="product_quality_aligned",
        tags={"alignment_date": "2025-10-23", "num_traces": str(len(valid_traces))}
    )

    print(f"Successfully aligned judge using {len(valid_traces)} traces")
else:
    print(f"Insufficient traces for alignment. Found {len(valid_traces)}, need at least 10")

Explicit optimizer configuration

from mlflow.genai.judges.optimizers import SIMBAAlignmentOptimizer

# Retrieve traces with both judge and human assessments
traces_for_alignment = mlflow.search_traces(
    experiment_ids=[experiment_id], max_results=15, return_type="list"
)

# Align the judge using human corrections (minimum 10 traces recommended)
if len(traces_for_alignment) >= 10:
    # Explicitly specify SIMBA with custom model configuration
    optimizer = SIMBAAlignmentOptimizer(model="databricks:/databricks-gpt-oss-120b")
    aligned_judge = initial_judge.align(optimizer, traces_for_alignment)

    # Register the aligned judge
    aligned_judge.register(experiment_id=experiment_id)
    print("Judge aligned successfully with human feedback")
else:
    print(f"Need at least 10 traces for alignment, have {len(traces_for_alignment)}")

Enable verbose logging

To monitor the alignment process, enable debug logging for the SIMBA optimizer:

import logging

# Enable detailed SIMBA logging
logging.getLogger("mlflow.genai.judges.optimizers.simba").setLevel(logging.DEBUG)

# Run alignment with verbose output
aligned_judge = initial_judge.align(optimizer, valid_traces)

Validate alignment

Verify that alignment improved the judge:


def test_alignment_improvement(
    original_judge, aligned_judge, test_traces: list
) -> dict:
    """Compare judge performance before and after alignment."""

    original_correct = 0
    aligned_correct = 0
    evaluated = 0  # traces that actually carry human ground truth

    for trace in test_traces:
        # Get human ground truth from trace assessments
        feedbacks = trace.search_assessments(type="feedback")
        human_feedback = next(
            (f for f in feedbacks if f.source.source_type == "HUMAN"), None
        )

        if not human_feedback:
            continue

        evaluated += 1

        # Get judge evaluations
        # Judges can evaluate entire traces instead of individual inputs/outputs
        original_eval = original_judge(trace=trace)
        aligned_eval = aligned_judge(trace=trace)

        # Check agreement with human
        if original_eval.value == human_feedback.value:
            original_correct += 1
        if aligned_eval.value == human_feedback.value:
            aligned_correct += 1

    # Use only traces that have human feedback as the denominator
    if evaluated == 0:
        return {"original_accuracy": 0.0, "aligned_accuracy": 0.0, "improvement": 0.0}

    return {
        "original_accuracy": original_correct / evaluated,
        "aligned_accuracy": aligned_correct / evaluated,
        "improvement": (aligned_correct - original_correct) / evaluated,
    }
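
A possible usage of the helper above, assuming the aligned judge and the valid_traces list from Step 3 are still in scope:

# Compare agreement with human labels before and after alignment
results = test_alignment_improvement(initial_judge, aligned_judge, valid_traces)
print(f"Original agreement: {results['original_accuracy']:.1%}")
print(f"Aligned agreement:  {results['aligned_accuracy']:.1%}")
print(f"Improvement:        {results['improvement']:+.1%}")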


Create a custom alignment optimizer

Extend the base class to implement a specialized alignment strategy.

from mlflow.genai.judges.base import AlignmentOptimizer, Judge
from mlflow.entities.trace import Trace

class MyCustomOptimizer(AlignmentOptimizer):
    """Custom optimizer implementation for judge alignment."""

    def __init__(self, model: str = None, **kwargs):
        """Initialize your optimizer with custom parameters."""
        self.model = model
        # Add any custom initialization logic

    def align(self, judge: Judge, traces: list[Trace]) -> Judge:
        """
        Implement your alignment algorithm.

        Args:
            judge: The judge to be optimized
            traces: List of traces containing human feedback

        Returns:
            A new Judge instance with improved alignment
        """
        # Your custom alignment logic here
        # 1. Extract feedback from traces
        # 2. Analyze disagreements between judge and human
        # 3. Generate improved instructions
        # 4. Return new judge with better alignment

        # Example: Return judge with modified instructions
        from mlflow.genai.judges import make_judge

        improved_instructions = self._optimize_instructions(judge.instructions, traces)

        return make_judge(
            name=judge.name,
            instructions=improved_instructions,
            model=judge.model,
        )

    def _optimize_instructions(self, instructions: str, traces: list[Trace]) -> str:
        """Your custom optimization logic."""
        # Implement your optimization strategy
        pass

# Create your custom optimizer
custom_optimizer = MyCustomOptimizer(model="your-model")

# Use it for alignment
# Pass the optimizer first, then the traces that contain human feedback
aligned_judge = initial_judge.align(custom_optimizer, traces_with_feedback)

Limitations

  • Judge alignment is not supported for agent-based or expectations-based evaluation.

Next steps