llm data_error ai_generated partial

OutputParserException: Parsing LLM output produced by 'StructuredOutputParser' failed — value 'large' not in enum ['small', 'medium']

ID: llm/langchain-output-parser-enum

Fix rate: 80%
Confidence: 86%
Evidence: 1
First seen: 2024-05-05

Version Compatibility

Version                Status   Introduced   Deprecated   Notes
langchain>=0.1.0       active   -            -            -
langchain-core>=0.1.0  active   -            -            -
pydantic>=2.0.0        active   -            -            -

Root Cause

The LLM generated a value outside the allowed enum values while LangChain's structured output parsers were used with Pydantic models, so schema validation rejected the output. This is usually caused by insufficient prompting or model hallucination.
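
Here is a minimal stdlib-only sketch of the failure mode. It assumes the size field is modeled as a Python Enum. The Size class and the parse_size function are hypothetical stand-ins for the Pydantic schema and parser. Even when the JSON itself is valid, an out-of-enum value fails the membership check. Pydantic performs an equivalent check, and the parser reports that failure as OutputParserException.

```python
import json
from enum import Enum


class Size(str, Enum):
    # Hypothetical enum mirroring the schema in the error message
    SMALL = "small"
    MEDIUM = "medium"


def parse_size(llm_output: str) -> Size:
    """Parse the model's JSON and validate the 'size' field against the enum."""
    data = json.loads(llm_output)
    # Size("large") raises ValueError because "large" is not a member
    return Size(data["size"])


print(parse_size('{"size": "medium"}'))  # valid member
try:
    parse_size('{"size": "large"}')
except ValueError as exc:
    print("rejected:", exc)
```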

generic

Official Documentation

https://python.langchain.com/docs/modules/model_io/output_parsers/structured

Workarounds

  1. 85% success Improve the prompt to explicitly list the allowed enum values and instruct the model to output only those exact values, e.g. 'The size must be exactly one of: small, medium. Do not output any other value.'
  2. 90% success Use LangChain's with_structured_output method on chat models that support native structured output (tool calling or JSON mode). Plain JSON mode only guarantees syntactically valid JSON. When the provider enforces the JSON schema itself, as with OpenAI's strict structured outputs, the enum constraint is applied at the API level instead of by parsing the raw text afterward.
  3. 75% success Implement a post-processing fallback that maps out-of-enum values to the closest valid enum member using a similarity heuristic or a manual mapping.
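
For the prompt-based workaround, one option is to generate the instruction sentence from the enum itself. The prompt then stays in sync with the schema if a member is added later. This is a sketch only. The Size enum and the enum_instruction helper are hypothetical, and the resulting text would be appended to your existing prompt alongside the parser's format instructions.

```python
from enum import Enum


class Size(str, Enum):
    # Hypothetical enum mirroring the schema in the error message
    SMALL = "small"
    MEDIUM = "medium"


def enum_instruction(field: str, enum_cls: type[Enum]) -> str:
    """Build an explicit 'exactly one of' instruction from an enum's values."""
    # Iterating an Enum yields members in definition order
    allowed = ", ".join(member.value for member in enum_cls)
    return (
        f"The {field} must be exactly one of: {allowed}. "
        "Do not output any other value."
    )


print(enum_instruction("size", Size))
```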

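The post-processing fallback can be sketched as follows. Case and whitespace are normalized first, then a manual alias table is checked, and finally the standard library's difflib.get_close_matches is used as the similarity heuristic. The Size enum, the MANUAL_ALIASES table, and the coerce_size function are hypothetical. Mapping "large" to "medium" in particular is an assumed business rule, not something the schema implies. Every coercion is logged, and unmappable values still raise an error.

```python
import difflib
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class Size(str, Enum):
    # Hypothetical enum mirroring the schema in the error message
    SMALL = "small"
    MEDIUM = "medium"


# Hypothetical manual mapping; which member "large" maps to is a business decision
MANUAL_ALIASES = {"large": Size.MEDIUM, "med": Size.MEDIUM, "sm": Size.SMALL}


def coerce_size(raw: str) -> Size:
    """Map a raw LLM value to a Size member, or raise if nothing plausible matches."""
    value = raw.strip().lower()
    allowed = [member.value for member in Size]
    if value in allowed:
        return Size(value)
    if value in MANUAL_ALIASES:
        mapped = MANUAL_ALIASES[value]
    else:
        # Similarity heuristic: closest allowed value with ratio >= 0.8
        matches = difflib.get_close_matches(value, allowed, n=1, cutoff=0.8)
        if not matches:
            raise ValueError(f"{raw!r} cannot be mapped to any of {allowed}")
        mapped = Size(matches[0])
    # Log every coercion so the fallback is never silent
    logger.warning("Coerced out-of-enum value %r to %r", raw, mapped.value)
    return mapped
```

Unlike a silent default, this version surfaces each coercion in the logs and raises for values such as "xyz", so an unexpected output still fails loudly.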
Dead Ends

Common approaches that don't work:

  1. 70% fail Expanding the enum to accept the unexpected value (e.g. adding 'large')

    This defeats the purpose of constrained output, and the LLM may still generate values outside the expanded set.

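  2. 50% fail Setting temperature=0

    Lowering temperature reduces randomness but does not restrict output to the enum values. Even at temperature=0, outputs can vary because of floating-point nondeterminism on the serving side or model updates.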

  3. 80% fail Silently falling back to a default enum value on parse failure

    A silent fallback masks errors and can produce incorrect downstream results, leading to hard-to-debug issues.