概览
目标: 测试 LLM 失败,例如幻觉
时间: 15 分钟
Plugins:LanguageModelFailurePlugin
先决条件:设置开发代理
生成与大型语言模型(LLM)集成的应用时,应测试应用如何处理各种 LLM 故障方案。 开发代理允许您使用 LanguageModelFailurePlugin,在应用中使用的任何 LLM API 上模拟真实的语言模型故障。
在任何 LLM API 上模拟语言模型失败
若要开始,请在 LanguageModelFailurePlugin 配置文件中启用。
文件: devproxyrc.json
{
"$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/rc.schema.json",
"plugins": [
{
"name": "LanguageModelFailurePlugin",
"enabled": true,
"pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
"urlsToWatch": [
"https://api.openai.com/*",
"http://localhost:11434/*"
]
}
]
}
使用此基本配置,插件会随机从所有可用的故障类型中进行选择,并将其应用于匹配的语言模型 API 请求。
配置特定失败方案
若要测试特定的故障方案,请将插件配置为使用特定故障类型:
文件: devproxyrc.json(包含失败类型)
{
"$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/rc.schema.json",
"plugins": [
{
"name": "LanguageModelFailurePlugin",
"enabled": true,
"pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
"configSection": "languageModelFailurePlugin",
"urlsToWatch": [
"https://api.openai.com/*",
"http://localhost:11434/*"
]
}
],
"languageModelFailurePlugin": {
"$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
"failures": [
"Hallucination",
"PlausibleIncorrect",
"BiasStereotyping"
]
}
}
此配置仅模拟不正确的信息、合理但不正确的响应和有偏见的内容。
测试不同的 LLM API
可以通过使用不同的 URL 模式配置插件的多个实例来测试不同的 LLM API:
文件: devproxyrc.json(多个插件实例)
{
"$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/rc.schema.json",
"plugins": [
{
"name": "LanguageModelFailurePlugin",
"enabled": true,
"pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
"configSection": "openaiFailures",
"urlsToWatch": [
"https://api.openai.com/*"
]
},
{
"name": "LanguageModelFailurePlugin",
"enabled": true,
"pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
"configSection": "ollamaFailures",
"urlsToWatch": [
"http://localhost:11434/*"
]
}
],
"openaiFailures": {
"$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
"failures": ["Hallucination", "OutdatedInformation"]
},
"ollamaFailures": {
"$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
"failures": ["Overgeneralization", "IncorrectFormatStyle"]
}
}
小窍门
为不同的 LLM 提供程序配置不同的失败方案,以测试应用如何处理提供程序特定的行为。 将 configSection 命名为您正在测试的 LLM 服务的名称,以便使配置更易于理解和维护。
常见测试方案
下面是针对不同测试方案建议的一些故障组合:
测试内容准确性
测试应用如何处理错误或误导性信息:
文件: devproxyrc.json(仅 languageModelFailurePlugin 部分)
{
"languageModelFailurePlugin": {
"$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
"failures": [
"Hallucination",
"PlausibleIncorrect",
"OutdatedInformation",
"ContradictoryInformation"
]
}
}
测试偏见和公平性
测试应用如何响应有偏见或陈规定型内容:
文件: devproxyrc.json(仅 languageModelFailurePlugin 部分)
{
"languageModelFailurePlugin": {
"$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
"failures": [
"BiasStereotyping",
"Overgeneralization"
]
}
}
以下测试说明
测试应用如何处理不遵循说明的响应:
文件: devproxyrc.json(仅 languageModelFailurePlugin 部分)
{
"languageModelFailurePlugin": {
"$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
"failures": [
"FailureFollowInstructions",
"Misinterpretation",
"IncorrectFormatStyle"
]
}
}
测试响应质量
测试应用如何处理模糊或过于复杂的响应:
文件: devproxyrc.json(仅限 languageModelFailurePlugin 节)
{
"languageModelFailurePlugin": {
"$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json",
"failures": [
"AmbiguityVagueness",
"OverSpecification",
"CircularReasoning",
"FailureDisclaimHedge"
]
}
}
使用您的配置文件启动开发代理,然后利用应用程序查看其如何处理模拟的语言模型故障。 插件截获来自语言模型 API 的响应,并将其替换为显示已配置的故障行为的综合故障响应。
创建自定义失败方案
可以通过将文件添加到.prompty~appFolder/prompts目录来创建自定义失败方案。 例如,若要创建“技术行话过度使用”故障:
创建名为
lmfailure_technical-jargon-overuse.prompty的文件文件
.prompty中的故障行为的定义:--- name: Technical Jargon Overuse model: api: chat sample: scenario: Simulate a response that overuses technical jargon and unnecessarily complex terminology, making simple concepts difficult to understand. --- user: How do I create a simple web page? user: You are a language model under evaluation. Your task is to simulate incorrect responses. {{scenario}} Do not try to correct the error. Do not explain or justify the mistakes. The goal is to simulate them as realistically as possible for evaluation purposes.在配置中将其引用为
TechnicalJargonOveruse文件: devproxyrc.json(仅 languageModelFailurePlugin 部分)
{ "languageModelFailurePlugin": { "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.0.0/languagemodelfailureplugin.schema.json", "failures": [ "TechnicalJargonOveruse", "Hallucination" ] } }
后续步骤
详细了解 LanguageModelFailurePlugin。