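Test my app under language model failure conditions

Overview
Goal: Test how your app handles LLM failures, such as hallucinations
Time: 15 minutes
Plugins: LanguageModelFailurePlugin
Prerequisites: Set up Dev Proxy

When you build apps that integrate with large language models (LLMs), you should test how your app handles different LLM failure scenarios. With the LanguageModelFailurePlugin, Dev Proxy lets you simulate realistic language model failures on any LLM API that your app uses.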

Simulate language model failures on any LLM API

To get started, enable the LanguageModelFailurePlugin in your configuration file.

File: devproxyrc.json

{
  "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/rc.schema.json",
  "plugins": [
    {
      "name": "LanguageModelFailurePlugin",
      "enabled": true,
      "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
      "urlsToWatch": [
        "https://api.openai.com/*",
        "http://localhost:11434/*"
      ]
    }
  ]
}

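With this basic configuration, the plugin randomly selects from all available failure types and applies the selected failure to matching language model API requests.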
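As a rough mental model of this behavior, the following Python sketch matches a request URL against the watched patterns and, on a match, picks one failure type at random. It is an illustration only, not the plugin's implementation: `pick_failure`, `URLS_TO_WATCH`, and `KNOWN_FAILURES` are hypothetical names, wildcard matching via `fnmatch` is an assumption, and the list holds only the failure names that appear in this article.

```python
import fnmatch
import random

# URL patterns copied from the basic configuration in this article.
URLS_TO_WATCH = ["https://api.openai.com/*", "http://localhost:11434/*"]

# Failure names that appear in this article; the plugin's built-in list may differ.
KNOWN_FAILURES = [
    "Hallucination", "PlausibleIncorrect", "BiasStereotyping",
    "OutdatedInformation", "ContradictoryInformation", "Overgeneralization",
    "IncorrectFormatStyle", "FailureFollowInstructions", "Misinterpretation",
    "AmbiguityVagueness", "OverSpecification", "CircularReasoning",
    "FailureDisclaimHedge",
]


def pick_failure(url, failures=KNOWN_FAILURES, rng=random):
    """Return a randomly chosen failure name for a watched URL, else None."""
    watched = any(fnmatch.fnmatchcase(url, pattern) for pattern in URLS_TO_WATCH)
    if not watched:
        return None  # in this model, unwatched requests pass through untouched
    return rng.choice(failures)


if __name__ == "__main__":
    print(pick_failure("https://api.openai.com/v1/chat/completions"))
    print(pick_failure("https://example.com/"))  # not watched
```

Because the choice is random, a single run of your app is unlikely to exercise every failure type; plan for repeated runs.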

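Configure specific failure scenarios

To test specific failure scenarios, configure the plugin to use specific failure types:

File: devproxyrc.json (with failure types)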

{
  "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/rc.schema.json",
  "plugins": [
    {
      "name": "LanguageModelFailurePlugin",
      "enabled": true,
      "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
      "configSection": "languageModelFailurePlugin",
      "urlsToWatch": [
        "https://api.openai.com/*",
        "http://localhost:11434/*"
      ]
    }
  ],
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "Hallucination",
      "PlausibleIncorrect",
      "BiasStereotyping"
    ]
  }
}

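This configuration simulates only incorrect (hallucinated) information, plausible but incorrect responses, and biased or stereotyping content.

Test different LLM APIs

You can test different LLM APIs by configuring multiple instances of the plugin, each with its own URL patterns:

File: devproxyrc.json (multiple plugin instances)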

{
  "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/rc.schema.json",
  "plugins": [
    {
      "name": "LanguageModelFailurePlugin",
      "enabled": true,
      "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
      "configSection": "openaiFailures",
      "urlsToWatch": [
        "https://api.openai.com/*"
      ]
    },
    {
      "name": "LanguageModelFailurePlugin",
      "enabled": true,
      "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
      "configSection": "ollamaFailures",
      "urlsToWatch": [
        "http://localhost:11434/*"
      ]
    }
  ],
  "openaiFailures": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
    "failures": ["Hallucination", "OutdatedInformation"]
  },
  "ollamaFailures": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
    "failures": ["Overgeneralization", "IncorrectFormatStyle"]
  }
}

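Tip

Configure different failure scenarios for different LLM providers to test how your app handles provider-specific behavior. Name each configSection after the LLM service you're testing so that the configuration is easier to understand and maintain.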
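If you maintain several providers, one way to keep the plugin instances and their config sections in sync is to generate the file from a single mapping. The Python sketch below is a hedged illustration: `build_config` and `PROVIDERS` are hypothetical names, and the schema URLs, plugin path, and failure names are copied from the examples in this article rather than verified against a specific Dev Proxy release.

```python
import json

SCHEMA_BASE = "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0"

# Hypothetical mapping: provider name -> (URL pattern, failures to simulate).
PROVIDERS = {
    "openai": ("https://api.openai.com/*", ["Hallucination", "OutdatedInformation"]),
    "ollama": ("http://localhost:11434/*", ["Overgeneralization", "IncorrectFormatStyle"]),
}


def build_config(providers):
    """Build a devproxyrc.json-style dict with one plugin instance per provider."""
    config = {"$schema": f"{SCHEMA_BASE}/rc.schema.json", "plugins": []}
    for name, (url_pattern, failures) in providers.items():
        section = f"{name}Failures"  # configSection named after the provider
        config["plugins"].append({
            "name": "LanguageModelFailurePlugin",
            "enabled": True,
            "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
            "configSection": section,
            "urlsToWatch": [url_pattern],
        })
        config[section] = {
            "$schema": f"{SCHEMA_BASE}/languagemodelfailureplugin.schema.json",
            "failures": list(failures),
        }
    return config


if __name__ == "__main__":
    # Prints JSON that can be saved as devproxyrc.json.
    print(json.dumps(build_config(PROVIDERS), indent=2))
```

Adding a provider then means adding one entry to the mapping, and the section name always matches the plugin instance that references it.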

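Common testing scenarios

Here are some recommended failure combinations for different testing scenarios:

Test content accuracy

Test how your app handles incorrect or misleading information:

File: devproxyrc.json (languageModelFailurePlugin section only)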

{
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "Hallucination",
      "PlausibleIncorrect",
      "OutdatedInformation",
      "ContradictoryInformation"
    ]
  }
}

Test bias and fairness

Test how your app responds to biased or stereotyped content:

File: devproxyrc.json (languageModelFailurePlugin section only)

{
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "BiasStereotyping",
      "Overgeneralization"
    ]
  }
}

Test instruction following

Test how your app handles responses that don't follow instructions:

File: devproxyrc.json (languageModelFailurePlugin section only)

{
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "FailureFollowInstructions",
      "Misinterpretation",
      "IncorrectFormatStyle"
    ]
  }
}
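To make this scenario concrete, here is a hedged Python sketch of the kind of app-side guard these failures are meant to exercise. It assumes, purely for illustration, that your app asked the model for a JSON object with `answer` and `sources` keys; `parse_reply` and `REQUIRED_KEYS` are hypothetical names, not part of Dev Proxy.

```python
import json

REQUIRED_KEYS = {"answer", "sources"}  # hypothetical response contract


def parse_reply(raw_text):
    """Return (data, error); exactly one of the two is None."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    missing = REQUIRED_KEYS - set(data)
    if missing:
        return None, f"missing keys: {sorted(missing)}"
    return data, None


if __name__ == "__main__":
    print(parse_reply('{"answer": "42", "sources": []}'))
    print(parse_reply("Sure! Here is your answer in prose instead of JSON."))
```

With a guard like this in place, a simulated IncorrectFormatStyle response should lead to a controlled fallback in your app rather than an unhandled exception.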

Test response quality

Test how your app handles vague or overly complex responses:

File: devproxyrc.json (languageModelFailurePlugin section only)

{
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "AmbiguityVagueness",
      "OverSpecification",
      "CircularReasoning",
      "FailureDisclaimHedge"
    ]
  }
}

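Start Dev Proxy with your configuration file, and then use your app to see how it handles the simulated language model failures. The plugin intercepts responses from the language model API and replaces them with synthetic failure responses that exhibit the configured failure behaviors.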
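When more than one failure type is in play, a single request shows only one of them, so it can help to exercise the same prompt repeatedly and tally what your app did each time. The Python sketch below is one possible harness under stated assumptions: `exercise` is a hypothetical helper, and `call_app` stands for whatever function in your app sends the prompt (with Dev Proxy running) and returns a short label describing the outcome.

```python
from collections import Counter


def exercise(call_app, prompt, runs=20):
    """Call the app `runs` times and count the outcome labels it reports."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            outcomes[call_app(prompt)] += 1
        except Exception as exc:  # an unhandled failure is itself a finding
            outcomes[f"crash:{type(exc).__name__}"] += 1
    return outcomes


if __name__ == "__main__":
    # Stand-in for the real app entry point, used only to show the call shape.
    def fake_app(prompt):
        return "fallback-shown" if "web page" in prompt else "answer-shown"

    print(exercise(fake_app, "How do I create a simple web page?", runs=5))
```

Any `crash:` entry in the tally points at a failure type your app does not yet handle gracefully.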

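Create custom failure scenarios

You can create custom failure scenarios by adding .prompty files to the ~appFolder/prompts directory. For example, to create a "technical jargon overuse" failure:

  1. Create a file named lmfailure_technical-jargon-overuse.prompty

  2. Define the failure behavior in the .prompty file: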

    ---
    name: Technical Jargon Overuse
    model:
      api: chat
    sample:
      scenario: Simulate a response that overuses technical jargon and unnecessarily complex terminology, making simple concepts difficult to understand.
    ---
    
    user:
    How do I create a simple web page?
    
    user:
    You are a language model under evaluation. Your task is to simulate incorrect responses. {{scenario}} Do not try to correct the error. Do not explain or justify the mistakes. The goal is to simulate them as realistically as possible for evaluation purposes.
    
  3. Reference it in your configuration as TechnicalJargonOveruse

    File: devproxyrc.json (languageModelFailurePlugin section only)

    {
      "languageModelFailurePlugin": {
        "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
        "failures": [
          "TechnicalJargonOveruse",
          "Hallucination"
        ]
      }
    }
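The steps above imply a naming convention: the kebab-case part of the file name (technical-jargon-overuse) corresponds to the PascalCase name used in the configuration (TechnicalJargonOveruse). The Python sketch below captures that mapping in both directions. It is inferred from this single example, so treat the exact rule as an assumption; `failure_name_from_file` and `file_from_failure_name` are hypothetical helpers, not Dev Proxy APIs.

```python
import re

PREFIX = "lmfailure_"
SUFFIX = ".prompty"


def failure_name_from_file(filename):
    """lmfailure_technical-jargon-overuse.prompty -> TechnicalJargonOveruse"""
    if not (filename.startswith(PREFIX) and filename.endswith(SUFFIX)):
        raise ValueError(f"unexpected file name: {filename}")
    slug = filename[len(PREFIX):-len(SUFFIX)]
    return "".join(part.capitalize() for part in slug.split("-"))


def file_from_failure_name(name):
    """TechnicalJargonOveruse -> lmfailure_technical-jargon-overuse.prompty"""
    # Insert a hyphen before every uppercase letter except the first one.
    slug = re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()
    return f"{PREFIX}{slug}{SUFFIX}"


if __name__ == "__main__":
    print(failure_name_from_file("lmfailure_technical-jargon-overuse.prompty"))
    print(file_from_failure_name("TechnicalJargonOveruse"))
```

A check like this can catch a mismatch between a custom file name and the name listed under failures before you start Dev Proxy.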
    

Next steps

Learn more about the LanguageModelFailurePlugin