llm data_error ai_generated partial

Validation error: the color field of ResponseModel must be 'red', 'green', or 'blue', but received 'purple'.

ValidationError: 1 validation error for ResponseModel
color
  Input should be 'red', 'green', or 'blue' [type=enum, input_value='purple', input_type=str]

ID: llm/llm-structured-output-enum-violation-streaming

Fix rate: 75%
Confidence: 82%
Evidence count: 1
First seen: 2024-04-05

Version compatibility

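| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| openai 1.12.0 | active | | | |
| openai 1.13.0 | active | | | |
| pydantic 2.5.0 | active | | | |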

Root cause analysis

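The LLM emits an enum value outside the allowed set (here 'purple') when structured output is used with streaming, because the value constraint is not fully enforced while the response is generated token by token. The invalid value only surfaces once the completed object is validated against ResponseModel.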
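Whatever the decoding behaviour, a streaming client can only check the enum once the JSON object is complete, because individual chunks are not valid JSON on their own. A minimal sketch using only the standard library: the `Color` enum stands in for the `color` field of `ResponseModel`, the hypothetical `chunks` list stands in for the `delta.content` fragments of a stream, and `collect_and_check` is an illustrative helper, not part of OpenAI's or Pydantic's API.

```python
import json
from enum import Enum


class Color(str, Enum):
    # Allowed values, mirroring the enum in the error message above.
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def collect_and_check(chunks):
    """Join streamed text fragments, parse once, then check the enum field.

    Returns (color, error): a Color member on success, otherwise None
    plus a message describing the out-of-set value.
    """
    # Fragments such as '{"co' are not valid JSON; parse only the joined text.
    # json.loads still raises if the stream ended before the object closed.
    raw = "".join(chunks)
    data = json.loads(raw)
    value = data.get("color")
    try:
        return Color(value), None
    except ValueError:
        allowed = ", ".join(repr(c.value) for c in Color)
        return None, f"color: expected one of {allowed}, got {value!r}"


# Hypothetical fragments standing in for streamed delta.content values.
chunks = ['{"co', 'lor": "pur', 'ple"}']
print(collect_and_check(chunks))
```

Parsing only after the stream ends keeps the check identical to the non-streaming case; the only difference is that the text arrives in pieces.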

generic

Official documentation

https://platform.openai.com/docs/guides/structured-outputs

Solutions

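  1. Post-process the output to map invalid values to the nearest valid enum value, or to a fallback. Pydantic raises ValidationError before any model object exists, so the fix must run on the parsed dict (raw is the model's JSON string), with validation afterwards:
       data = json.loads(raw)
       if data.get('color') not in {'red', 'green', 'blue'}:
           data['color'] = 'blue'  # fallback
       output = ResponseModel.model_validate(data)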
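  2. Switch to non-streaming mode for structured outputs. JSON mode guarantees syntactically valid JSON but does not restrict field values to the enum, so keep the validation step:
       response = client.chat.completions.create(
           model='gpt-4-turbo',  # JSON mode requires a model that supports response_format
           messages=messages,
           response_format={'type': 'json_object'},
           stream=False,
       )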
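The fixed fallback in solution 1 can be extended to an actual nearest-value lookup. A minimal sketch, assuming "nearest" means closest by string similarity as computed by the standard library's `difflib.get_close_matches`; `normalize_color`, `repair_payload`, and the choice of 'blue' as the fallback are illustrative, not part of OpenAI's or Pydantic's API.

```python
import difflib
import json

VALID_COLORS = ("red", "green", "blue")
FALLBACK_COLOR = "blue"  # same fallback as solution 1; choose one that is safe for your app


def normalize_color(value, cutoff=0.6):
    """Map a raw color value to an allowed one before model validation."""
    if not isinstance(value, str):
        return FALLBACK_COLOR
    cleaned = value.strip().lower()
    if cleaned in VALID_COLORS:
        return cleaned
    # "Nearest" here means closest by difflib string similarity (an assumption).
    matches = difflib.get_close_matches(cleaned, VALID_COLORS, n=1, cutoff=cutoff)
    return matches[0] if matches else FALLBACK_COLOR


def repair_payload(raw_json):
    """Parse the model's JSON and repair the color field before validation."""
    data = json.loads(raw_json)
    data["color"] = normalize_color(data.get("color"))
    return data


print(repair_payload('{"color": "purple"}'))  # {'color': 'blue'}
print(repair_payload('{"color": "Gren"}'))    # {'color': 'green'}
```

The repaired dict can then be validated as usual, for example with Pydantic 2's `ResponseModel.model_validate(data)`. With difflib's default cutoff of 0.6, near-misses such as 'Gren' are recovered, while unrelated values such as 'purple' fall through to the fallback.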

Ineffective attempts

Common approaches that do not work:

  1. Setting temperature to 0 to reduce randomness (80% failure rate)

    Enum violations stem from incomplete constraint enforcement during decoding, not from sampling randomness. Even greedy decoding at temperature 0 can select a value outside the set.

  2. Increasing max_tokens in the hope of getting complete output (90% failure rate)

    A larger token budget adds no constraint enforcement, so the model still generates the invalid value.