llm data_error ai_generated partial

ValidationError: 1 validation error for ResponseModel color Input should be 'red', 'green', or 'blue' [type=enum, input_value='purple', input_type=str]

ID: llm/llm-structured-output-enum-violation-streaming

Fix Rate: 75%
Confidence: 82%
Evidence: 1
First Seen: 2024-04-05

Version Compatibility

Version          Status   Introduced   Deprecated   Notes
openai 1.12.0    active
openai 1.13.0    active
pydantic 2.5.0   active

Root Cause

When structured output is combined with streaming, the LLM can generate enum values outside the allowed set, because the constraints are not fully enforced during partial token generation.
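
One way to keep partial fragments from reaching the validator is to buffer the stream and check the enum only once the payload is complete. The following is a minimal sketch under stated assumptions, not the library's own mechanism: `collect_and_check` and `ALLOWED_COLORS` are hypothetical names, and the chunks are assumed to be plain text fragments already extracted from the stream.

```python
import json
from typing import Iterable

# Allowed values, taken from the error message above.
ALLOWED_COLORS = {"red", "green", "blue"}


def collect_and_check(chunks: Iterable[str]) -> dict:
    """Join streamed text fragments, then parse and check the enum once."""
    # Individual chunks are not valid JSON on their own, so parse only the whole payload.
    payload = "".join(chunks)
    data = json.loads(payload)
    color = data.get("color")
    if color not in ALLOWED_COLORS:
        raise ValueError(f"color {color!r} is not one of {sorted(ALLOWED_COLORS)}")
    return data
```

This does not stop the model from producing an invalid value; it only ensures the check runs on complete output, so a failure reflects a real enum violation rather than a truncated fragment.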


Official Documentation

https://platform.openai.com/docs/guides/structured-outputs

Workarounds

  1. (85% success) Post-process the output and replace any value outside the allowed set with a valid fallback. Note that the snippet uses a fixed default rather than a true nearest match: valid_colors = {'red','green','blue'}; if output.color not in valid_colors: output.color = 'blue'  # fallback
    Apply this before ResponseModel is validated; otherwise the ValidationError is raised first.
  2. (95% success) Switch to non-streaming mode for structured outputs: response = client.chat.completions.create(model='gpt-4', messages=messages, response_format={'type':'json_object'}, stream=False)
    The model must support JSON mode. JSON mode only guarantees syntactically valid JSON, not enum membership, so the result should still be validated.
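
The fallback in workaround 1 can be sketched as a small pre-validation step. This is an illustration under assumptions, not the entry's reference implementation: `Color`, `FALLBACK`, and `coerce_color` are hypothetical names, the raw response is assumed to be a JSON object with a `color` key, and the fixed `'blue'` default mirrors the snippet above.

```python
import json
from enum import Enum


class Color(str, Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


# Fixed default, as in the workaround; not a true "nearest" match.
FALLBACK = Color.BLUE


def coerce_color(raw_json: str) -> dict:
    """Parse raw model output and force 'color' into the allowed set."""
    data = json.loads(raw_json)
    # Normalize case and whitespace so ' Red ' is accepted as 'red'.
    value = str(data.get("color", "")).strip().lower()
    try:
        data["color"] = Color(value).value
    except ValueError:
        # Anything outside the enum (e.g. 'purple') becomes the fallback.
        data["color"] = FALLBACK.value
    return data
```

The returned dict can then be handed to the model, for example with `ResponseModel.model_validate(data)` in pydantic 2, so validation sees only allowed values. Silently replacing a value hides the model's mistake, so logging each substitution may be worthwhile.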


Dead Ends

Common approaches that don't work:

  1. Setting temperature to 0 to reduce randomness (80% fail)

    Enum violations stem from missing constraint enforcement at decoding time, not from sampling randomness, so a deterministic decode can still emit an invalid value.

  2. Increasing max_tokens in the hope of getting complete output (90% fail)

    A larger token budget does not add constraint enforcement; the model still generates invalid values.