llm data_error ai_generated partial

ValidationError: 1 validation error for ResponseModel color Input should be 'red', 'green', or 'blue' [type=enum, input_value='purple', input_type=str]

ID: llm/llm-structured-output-enum-violation-streaming

Fix rate: 75%

Confidence: 82%

Evidence: 1

First seen: 2024-04-05

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
openai 1.12.0	active	—	—	—
openai 1.13.0	active	—	—	—
pydantic 2.5.0	active	—	—	—

Root Cause

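When structured output is combined with streaming, the LLM can emit an enum value outside the allowed set (here 'purple' for a field restricted to 'red', 'green', or 'blue'), and Pydantic rejects it when validating ResponseModel. The attributed cause is incomplete constraint enforcement during partial token generation.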

generic
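
A minimal reproduction sketch using only the standard library. The Color enum stands in for the field type behind ResponseModel.color (its definition is assumed, since the model is not shown here), and chunks is a hypothetical sequence of streamed JSON fragments. Validating only after the stream is fully joined rules out spurious errors from half-received values, so a rejection here reflects the value the model actually produced.

```python
import json
from enum import Enum


class Color(str, Enum):
    # Assumed stand-in for the enum behind ResponseModel.color
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def parse_color(stream_chunks):
    """Join streamed JSON fragments, then validate the color against the enum."""
    payload = json.loads("".join(stream_chunks))  # parse only once the stream is complete
    return Color(payload["color"])  # raises ValueError for values outside the enum


# Hypothetical deltas, as a streamed response might deliver them
chunks = ['{"co', 'lor": "pur', 'ple"}']
try:
    parse_color(chunks)
except ValueError as exc:
    print(f"rejected: {exc}")
```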

Official Documentation

https://platform.openai.com/docs/guides/structured-outputs
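
The guide above describes strict JSON-schema response formats, which constrain decoding to the schema, including enum members. A minimal sketch of such a response_format for this entry's color field, assuming a model and SDK version that support Structured Outputs (check the guide for which ones do). The schema name is hypothetical, and whether strict mode fully prevents the violation under streaming is exactly what this entry calls into question.

```python
import json

ALLOWED_COLORS = ["red", "green", "blue"]

# Strict structured outputs require every property to appear in "required"
# and additionalProperties to be false.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "response_model",  # hypothetical schema name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "color": {"type": "string", "enum": ALLOWED_COLORS},
            },
            "required": ["color"],
            "additionalProperties": False,
        },
    },
}

# This dict would be passed as response_format=... to client.chat.completions.create
print(json.dumps(response_format, indent=2))
```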

Workarounds

85% success: Post-process invalid values into a valid enum member. The check must run on the raw parsed JSON before Pydantic validation, because the ValidationError is raised before any output object exists. This snippet substitutes a fixed fallback rather than finding the nearest match.
```
valid_colors = {'red', 'green', 'blue'}
if raw['color'] not in valid_colors:  # raw: dict from json.loads(), not yet validated
    raw['color'] = 'blue'  # fallback
```
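95% success: Switch to non-streaming mode for structured outputs. Here client is an openai.OpenAI() instance and messages is the chat message list, which the call requires. JSON mode ({'type': 'json_object'}) needs a model that supports it and the word "JSON" somewhere in the messages; it guarantees syntactically valid JSON but does not enforce the enum, so the parsed result still needs validation.
```
response = client.chat.completions.create(
    model='gpt-4-turbo',  # JSON mode is not supported by the original gpt-4 snapshots
    messages=messages,
    response_format={'type': 'json_object'},
    stream=False,
)
```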
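
The first workaround only substitutes a fixed fallback. A sketch of a closer "nearest valid enum" mapping, assuming a standard-library Color enum in place of the Pydantic field type. difflib.get_close_matches is one possible similarity measure (a swapped-in choice, not part of the workaround as stated), and anything without a close match falls back to 'blue' as in the original snippet. The repair runs on the raw JSON before strict validation.

```python
import difflib
import json
from enum import Enum


class Color(str, Enum):
    # Assumed stand-in for the enum behind ResponseModel.color
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


VALID_COLORS = [c.value for c in Color]
FALLBACK = Color.BLUE  # same fallback as the post-processing workaround


def coerce_color(value):
    """Map a raw color string to the nearest valid member, else the fallback."""
    if value in VALID_COLORS:
        return Color(value)
    # cutoff=0.6 is difflib's default similarity threshold
    matches = difflib.get_close_matches(str(value).lower(), VALID_COLORS, n=1, cutoff=0.6)
    return Color(matches[0]) if matches else FALLBACK


def parse_response(text):
    """Repair the color in the raw JSON before any strict validation runs."""
    payload = json.loads(text)
    payload["color"] = coerce_color(payload.get("color"))
    return payload


print(parse_response('{"color": "purple"}'))
print(parse_response('{"color": "gren"}'))
```

In a Pydantic v2 model, the same coerce_color logic could live in a @field_validator('color', mode='before') so it runs ahead of the enum check.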

Dead Ends

Common approaches that don't work:

Setting temperature to 0 to reduce randomness (80% fail)
Enum violations come from token-level decoding constraints, not sampling randomness.
Increasing max_tokens in the hope of complete output (90% fail)
A larger token budget does not change constraint enforcement; the model still generates invalid values.