huggingface
runtime_error
ai_generated
true
Token indices sequence length is longer than the specified maximum sequence length for this model (2048 > 1024). Running this sequence through the model will result in indexing errors
ID: huggingface/tokenizer-decoder-max-length-overflow
Fix rate: 85%
Confidence: 85%
Evidence count: 1
First seen: 2023-11-15
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| transformers>=4.30.0 | active | — | — | — |
| tokenizers>=0.13.0 | active | — | — | — |
| python>=3.8 | active | — | — | — |
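Root cause analysis
The encoded input (2048 tokens) is longer than the tokenizer's model_max_length (1024), which normally mirrors the model's max_position_embeddings. The tokenizer does not truncate unless truncation is requested, so it emits this warning and returns the full-length sequence. Passing that sequence to the model then indexes past the end of the positional embedding table.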
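The comparison behind the warning can be sketched in plain Python. This is an illustrative helper, not library code: the function names and the UNSET_THRESHOLD constant are assumptions made for this example. It guards against tokenizers that report a very large placeholder value when no limit is configured.

```python
# Assumption for this sketch: any limit above this value is treated as "not set".
# Some tokenizers report a huge placeholder for model_max_length in that case.
UNSET_THRESHOLD = 10**12


def effective_max_length(tokenizer_limit, position_limit):
    """Return the longest sequence length that is safe to pass to the model."""
    if tokenizer_limit is None or tokenizer_limit > UNSET_THRESHOLD:
        # No usable tokenizer limit: fall back to the positional embedding size.
        return position_limit
    return min(tokenizer_limit, position_limit)


def is_too_long(num_tokens, tokenizer_limit, position_limit):
    """True when the encoded input would overflow the model's positions."""
    return num_tokens > effective_max_length(tokenizer_limit, position_limit)
```

With a real checkpoint, the two limits would typically be read from tokenizer.model_max_length and model.config.max_position_embeddings, and num_tokens from the length of the encoded input_ids.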
Official documentation
https://huggingface.co/docs/transformers/main/en/llm_tutorial#truncation
Solutions
-
Set truncation=True and max_length=512 when encoding the input; max_length must not exceed the model's limit (1024 in this error). Example: tokenizer(text, truncation=True, max_length=512, return_tensors='pt')
-
Use a model with a larger max_position_embeddings (for example 4096), or switch to a long-context model such as Longformer.
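The truncation fix can be wrapped in a small helper so that the requested max_length never exceeds what the tokenizer reports. This is a minimal sketch: encode_truncated is a hypothetical name, and the tokenizer argument is assumed to behave like a Hugging Face tokenizer (callable with truncation and max_length, and exposing model_max_length).

```python
def encode_truncated(tokenizer, text, max_length=512):
    """Encode text with truncation enabled, capped at the tokenizer's own limit.

    `tokenizer` is assumed to be a Hugging Face-style tokenizer object.
    """
    # Fall back to the requested length if the tokenizer exposes no limit.
    limit = getattr(tokenizer, "model_max_length", max_length)
    safe_length = min(max_length, limit)
    # truncation=True cuts the encoded sequence down to safe_length tokens.
    return tokenizer(text, truncation=True, max_length=safe_length)
```

A typical call would look like encode_truncated(AutoTokenizer.from_pretrained(checkpoint), long_text), where checkpoint is whichever model name is in use. Truncation discards everything past the limit, so it suits tasks where the start of the input carries most of the signal.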
Failed attempts
Common approaches that do not work:
-
60% failure
Setting truncation=False disables truncation entirely, so the over-length input reaches the model and causes a hard crash instead of being handled gracefully.
-
80% failure
The model's learned positional embeddings cover only max_position_embeddings positions; exceeding that limit leads to out-of-range index errors.
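The out-of-range failure can be illustrated with a toy stand-in for a learned positional embedding table. This is plain Python, not the transformers implementation: the table is a list of rows, and build_position_table and lookup_positions are hypothetical names used only for this sketch.

```python
def build_position_table(max_position_embeddings, hidden_size=4):
    """Toy positional table: one row of zeros per supported position."""
    return [[0.0] * hidden_size for _ in range(max_position_embeddings)]


def lookup_positions(table, seq_len):
    """Fetch one row per token position, as an embedding lookup would."""
    return [table[pos] for pos in range(seq_len)]


table = build_position_table(1024)

# A sequence that fits: positions 0..1023 all exist in the table.
ok = lookup_positions(table, 1024)

# A 2048-token sequence asks for positions the table never learned.
try:
    lookup_positions(table, 2048)
    overflowed = False
except IndexError:
    overflowed = True
```

Raising max_length past the table size does not help for the same reason: the rows for the extra positions do not exist.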