llm data_error ai_generated true

LLM returns value 'medium' not in allowed enum ['low', 'high'] when using JSON mode with constrained decoding

ID: llm/structured-output-enum-violation-in-json-mode

92% Fix Rate

86% Confidence

1 Evidence

2024-05-10 First Seen

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
openai==1.18.0	active	—	—	—
anthropic==0.32.0	active	—	—	—
gpt-4o-2024-05-13	active	—	—	—
claude-3-haiku-20240307	active	—	—	—
outlines==0.0.34	active	—	—	—
lmql==0.4.0	active	—	—	—

Root Cause

JSON mode and schema hints passed via function definitions or response_format constrain the output to valid JSON at best. Unless the schema is strictly enforced during decoding, enum constraints on string values are not applied, so the LLM can output values outside the specified set.

generic

Official Documentation

https://platform.openai.com/docs/guides/structured-outputs

Workarounds

95% success
Post-process the LLM output to validate enum values, then either reject invalid values or map them to a default (e.g., `if value not in ['low', 'high']: value = 'low'`).
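The post-processing workaround could look roughly like the sketch below. It is a minimal illustration, not a prescribed implementation: the function name `validate_enum_field` and the field name `priority` are hypothetical, and the choice between rejecting and defaulting is left to the caller.

```python
import json
from typing import Optional, Sequence

ALLOWED_LEVELS = ("low", "high")  # the enum from this entry


def validate_enum_field(
    raw_json: str,
    field: str,
    allowed: Sequence[str] = ALLOWED_LEVELS,
    default: Optional[str] = None,
) -> dict:
    """Parse LLM output and make sure `field` holds an allowed value."""
    data = json.loads(raw_json)
    value = data.get(field)
    if value not in allowed:
        if default is None:
            # Reject: let the caller retry the request or surface the error.
            raise ValueError(f"{field}={value!r} is not one of {list(allowed)}")
        # Map: replace the invalid value with the chosen default.
        data[field] = default
    return data


# "priority" is a hypothetical field name used only for this example.
print(validate_enum_field('{"priority": "medium"}', "priority", default="low"))
# prints {'priority': 'low'}
```

Silently mapping to a default hides how often the model violates the enum, so rejecting (and logging or retrying) may be the safer choice when the value drives downstream decisions.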
90% success
Use a constrained decoding library like `outlines` or `lmql` that forces the LLM to output only tokens matching the regex or grammar of the enum.
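To show the idea behind constrained decoding, here is a toy sketch in plain Python. It is not the `outlines` or `lmql` API: real libraries mask token logits of an actual model, whereas this example works character by character against a hypothetical scoring function (`toy_score`) and assumes that no enum value is a prefix of another.

```python
from typing import Callable, Sequence


def allowed_next_chars(prefix: str, choices: Sequence[str]) -> list:
    """Characters that keep `prefix` a prefix of at least one allowed choice."""
    return sorted({c[len(prefix)] for c in choices
                   if c.startswith(prefix) and len(c) > len(prefix)})


def constrained_decode(score: Callable[[str, str], float],
                       choices: Sequence[str]) -> str:
    """Greedy decoding where every step is masked to the enum's prefixes.

    Assumes `choices` is non-empty and no choice is a prefix of another.
    """
    out = ""
    while out not in choices:
        candidates = allowed_next_chars(out, choices)  # the mask
        # Pick the best-scoring character among the permitted ones only.
        out += max(candidates, key=lambda ch: score(out, ch))
    return out


def toy_score(prefix: str, ch: str) -> float:
    """Hypothetical model that strongly prefers to start with 'm' (as in 'medium')."""
    if prefix == "":
        return {"m": 0.90, "l": 0.06, "h": 0.04}.get(ch, 0.0)
    return 1.0


print(constrained_decode(toy_score, ["low", "high"]))
# prints low
```

Even though the toy model would start with `m` if left unconstrained, that character is never a candidate, so the result is always a member of the enum.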

Dead Ends

Common approaches that don't work:

70% fail
Adding more examples to the prompt with correct enum values doesn't guarantee the LLM will follow them, especially for edge cases.
85% fail
Using a stricter JSON schema with additionalProperties: false doesn't prevent enum violations. That keyword only rejects unexpected keys and says nothing about the values of declared properties, and the schema is not enforced at generation time in any case.
60% fail
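Lowering temperature to 0 doesn't fix enum violations. Temperature only controls how random the sampling is; it does not restrict which tokens are permitted, so if the model's most likely token lies outside the enum, decoding at temperature 0 simply returns that value every time.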
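The temperature point can be illustrated with a small numerical sketch. The logits below are made-up numbers for illustration only, not measurements from any model: temperature rescales the distribution over all tokens but never removes one, so an out-of-enum token that is already the most likely stays the most likely.

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float],
                             temperature: float) -> Dict[str, float]:
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Made-up logits: the hypothetical model favours the out-of-enum token.
logits = {"medium": 3.0, "low": 2.0, "high": 1.0}

for temperature in (1.0, 0.1):
    probs = softmax_with_temperature(logits, temperature)
    top = max(probs, key=probs.get)
    print(temperature, top, round(probs[top], 3))
```

In both runs the top token is `medium`; lowering the temperature only concentrates more probability on it.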