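# Token indices sequence length is longer than the specified maximum sequence length for this model (2048 > 1024). Running this sequence through the model will result in indexing errors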

- **ID:** `huggingface/tokenizer-decoder-max-length-overflow`
- **Domain:** huggingface
- **Category:** runtime_error
- **Verification level:** ai_generated
- **Fix rate:** 85%

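## Root cause

The input, once tokenized, is longer than the model's `max_position_embeddings` (here 2048 tokens against a limit of 1024). Because the tokenizer was not asked to truncate, the over-length sequence is returned as is and overflows the model's position range.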

## Version compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| transformers>=4.30.0 | active | — | — |
| tokenizers>=0.13.0 | active | — | — |
| python>=3.8 | active | — | — |

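## Solutions

1. Set `truncation=True` and a `max_length` when encoding the input, for example `tokenizer(text, truncation=True, max_length=512, return_tensors='pt')`. The `max_length` value must not exceed the model's limit (1024 in the error above).
2. Use a model with a larger `max_position_embeddings` (for example 4096), or switch to a long-context model such as Longformer.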
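As a minimal sketch of the first solution, the helper below caps the requested `max_length` at the model's limit before calling the tokenizer. The name `encode_with_truncation` and its `model_limit` parameter are hypothetical, introduced only for illustration. The tokenizer is assumed to be a Hugging Face tokenizer, and `return_tensors="pt"` assumes PyTorch is installed.

```python
from typing import Any, Optional


def encode_with_truncation(
    tokenizer: Any,
    text: str,
    model_limit: int,
    max_length: Optional[int] = None,
) -> Any:
    """Encode `text` so the result never exceeds `model_limit` tokens.

    `model_limit` is passed in explicitly (hypothetical parameter); with a
    real model it would typically be read from
    `model.config.max_position_embeddings`.
    """
    # Never ask for more tokens than the model can address.
    limit = model_limit if max_length is None else min(max_length, model_limit)
    # truncation=True makes the tokenizer cut the sequence at `limit`
    # instead of returning all tokens.
    return tokenizer(text, truncation=True, max_length=limit, return_tensors="pt")


# Assumed usage with a real tokenizer (requires transformers and torch):
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("gpt2")
#   batch = encode_with_truncation(tokenizer, long_text, model_limit=1024)
```

Passing the limit explicitly avoids relying on `tokenizer.model_max_length`, which some tokenizers leave at a very large placeholder value.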

## Ineffective attempts

- `truncation=False` disables truncation entirely, so over-length input causes a hard crash rather than graceful handling. (60% failure rate)
- The model's learned positional embeddings only support up to `max_position_embeddings` positions; exceeding that limit leads to out-of-range errors. (80% failure rate)
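The out-of-range failure can be illustrated without any model. The sketch below is a plain-Python stand-in, not the library's implementation: it treats the positional embeddings as a table with one row per position, and `build_position_table` and `lookup_positions` are hypothetical names.

```python
from typing import List


def build_position_table(max_positions: int, dim: int = 4) -> List[List[float]]:
    """Stand-in for a learned positional embedding table: one row per position."""
    return [[float(pos)] * dim for pos in range(max_positions)]


def lookup_positions(table: List[List[float]], seq_len: int) -> List[List[float]]:
    """Fetch one row per token position; fails once seq_len exceeds the table size."""
    return [table[pos] for pos in range(seq_len)]


table = build_position_table(1024)      # limit of 1024 positions
rows = lookup_positions(table, 1024)    # fits: positions 0..1023 all exist

try:
    lookup_positions(table, 2048)       # position 1024 has no row
except IndexError as exc:
    print(f"out-of-range lookup: {exc}")
```

A real model fails in the same way at its embedding lookup, which is why the sequence has to be shortened before it reaches the model.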