LLM returns value 'medium' not in allowed enum ['low', 'high'] when using JSON mode with constrained decoding
ID: llm/structured-output-enum-violation-in-json-mode
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| openai==1.18.0 | active | — | — | — |
| anthropic==0.32.0 | active | — | — | — |
| gpt-4o-2024-05-13 | active | — | — | — |
| claude-3-haiku-20240307 | active | — | — | — |
| outlines==0.0.34 | active | — | — | — |
| lmql==0.4.0 | active | — | — | — |
Root cause analysis
JSON mode, and schema hints passed through functions or response_format, guarantee syntactically valid JSON. Unless the decoder is actually constrained to the schema, they do not enforce enum constraints on string values, so the LLM can output a value outside the specified set (here 'medium' instead of 'low' or 'high').
Official documentation
https://platform.openai.com/docs/guides/structured-outputs
Solutions
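- Post-process the LLM output to validate the enum value, and reject invalid values or map them to a default (for example: `if value not in ['low', 'high']: value = 'low'`).
- Use a constrained decoding library such as `outlines` or `lmql` to force the LLM to emit only tokens that match a regular expression or grammar for the enum.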
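The post-processing approach can be sketched as follows. This is a minimal illustration, not a definitive implementation: the field name `level`, the helper `coerce_level`, and the `strict` flag are hypothetical names introduced here; only the enum `['low', 'high']` and the `'low'` default come from this entry.

```python
import json

ALLOWED_LEVELS = ("low", "high")  # enum from this entry
DEFAULT_LEVEL = "low"             # default used in the entry's inline snippet


def coerce_level(raw_json, field="level", strict=False):
    """Parse model output and ensure `field` holds an allowed enum value.

    `field` and `strict` are hypothetical; adapt them to the real schema.
    """
    data = json.loads(raw_json)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    value = data.get(field)
    if value not in ALLOWED_LEVELS:
        if strict:
            # Reject: let the caller retry the request or surface the error.
            raise ValueError(f"{field}={value!r} is not one of {ALLOWED_LEVELS}")
        # Map to default: silently replace the invalid value.
        data[field] = DEFAULT_LEVEL
    return data
```

Calling `coerce_level('{"level": "medium"}')` replaces the invalid value with the default, while `strict=True` raises instead. Rejecting is usually the safer choice when a silent default could hide a real classification error.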
Failed attempts
Common but ineffective approaches:
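- 70% failure rate: Adding more examples with correct enum values to the prompt does not guarantee the LLM will follow them, especially for edge-case inputs.
- 85% failure rate: Using a stricter JSON schema with additionalProperties: false does not prevent enum violations. That keyword only forbids undeclared keys, and without constrained decoding the schema's value constraints are not enforced at generation time.
- 60% failure rate: Lowering temperature to 0 does not fix enum violations. It only makes decoding pick the most likely token, and that token can itself be an out-of-enum value such as 'medium'; temperature changes randomness, not the set of tokens the model is allowed to emit.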
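To see why a temperature of 0 does not help while token masking does, consider this toy sketch. It is not the `outlines` or `lmql` API: the scores are made up, and each enum value is treated as a single token for simplicity, whereas real libraries mask sub-word tokens step by step against a regex or grammar.

```python
import math

# Hypothetical next-token scores at the position of the enum value.
# The numbers are invented for illustration; "medium" is the top pick.
LOGITS = {"low": 1.2, "medium": 2.5, "high": 0.7}
ALLOWED = {"low", "high"}


def greedy_pick(logits):
    """Temperature 0 behaves like argmax: always take the top-scoring token."""
    return max(logits, key=logits.get)


def masked_greedy_pick(logits, allowed):
    """Constrained decoding in miniature: mask disallowed tokens, then argmax."""
    masked = {
        token: (score if token in allowed else -math.inf)
        for token, score in logits.items()
    }
    return max(masked, key=masked.get)


print(greedy_pick(LOGITS))                  # medium
print(masked_greedy_pick(LOGITS, ALLOWED))  # low
```

Greedy decoding returns the out-of-enum value because it is the highest-scoring candidate. Masking removes it from consideration before the choice is made, which is the guarantee that prompt examples and temperature settings cannot provide.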