# UserWarning: You are using a decoder-only model with padding_side='right'. This may produce incorrect results. Consider setting padding_side to 'left'.

- **ID:** `huggingface/tokenizer-padding-side-mismatch`
- **Domain:** huggingface
- **Category:** config_error
- **Verification level:** ai_generated
- **Fix rate:** 95%

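## Root cause

Decoder-only models (such as GPT and LLaMA) expect left padding. During generation, the next token is predicted from the last position of each sequence. With right padding, that position holds a pad token for every prompt shorter than the longest one in the batch, so generation continues from padding rather than from the end of the prompt.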
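
The effect can be illustrated without loading a model. The sketch below pads two made-up token-id lists on each side and reads the last column, which is the position batched generation continues from. The `PAD` value, the ids, and both helper names are assumptions chosen for illustration.

```python
PAD = 0  # hypothetical pad token id, for illustration only


def pad_batch(sequences, side, pad_id=PAD):
    """Pad every sequence to the batch's max length on the given side."""
    width = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        fill = [pad_id] * (width - len(seq))
        padded.append(fill + seq if side == "left" else seq + fill)
    return padded


def last_column(batch):
    """Token at the final position of each row, where generation continues."""
    return [row[-1] for row in batch]


prompts = [[11, 12, 13, 14], [21, 22]]  # made-up token ids

right = pad_batch(prompts, side="right")
left = pad_batch(prompts, side="left")

print(last_column(right))  # [14, 0]: the short prompt ends in padding
print(last_column(left))   # [14, 22]: every row ends in a real token
```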

## Version compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| transformers>=4.30.0 | active | — | — |
| tokenizers>=0.14.0 | active | — | — |

## Solutions

1. Set `padding_side` to `'left'` before tokenization:
   ```python
   tokenizer.padding_side = 'left'
   # Only needed when the tokenizer has no pad token of its own (e.g. GPT-2).
   tokenizer.pad_token = tokenizer.eos_token
   inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
   ```
2. Pass `padding_side` to the tokenizer's `__call__`, if your installed transformers release accepts this argument:
   ```python
   inputs = tokenizer(texts, padding=True, truncation=True, padding_side='left', return_tensors='pt')
   ```
3. If using a pipeline, set `padding_side` via the tokenizer:
   ```python
   from transformers import AutoTokenizer, pipeline

   tokenizer = AutoTokenizer.from_pretrained('gpt2')
   pipe = pipeline('text-generation', model='gpt2', tokenizer=tokenizer)
   pipe.tokenizer.padding_side = 'left'
   ```
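
The settings used in these options can be gathered into one small helper. The following is a minimal sketch, not part of transformers: `prepare_for_batched_generation` is a hypothetical name, and it relies only on the `padding_side`, `pad_token`, and `eos_token` attributes. A `SimpleNamespace` stands in for a real tokenizer so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def prepare_for_batched_generation(tokenizer):
    """Configure a tokenizer-like object for left-padded batched generation.

    Hypothetical helper: it only touches the padding_side, pad_token and
    eos_token attributes of whatever object it is given.
    """
    tokenizer.padding_side = "left"
    # Reuse EOS as the pad token only when no pad token is defined,
    # so an existing pad token is left untouched.
    if getattr(tokenizer, "pad_token", None) is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


# Stand-in object with made-up values, used instead of a real tokenizer.
stub = SimpleNamespace(padding_side="right", pad_token=None, eos_token="<eos>")
prepare_for_batched_generation(stub)
print(stub.padding_side, stub.pad_token)  # left <eos>
```

With a real tokenizer, the same call would go right after loading it and before the batch is tokenized.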

## Failed attempts

- **Setting `padding_side='right'` explicitly to suppress the warning**: This does not fix the underlying issue. Shorter prompts in the batch still end in pad tokens, so the model continues generating from padding and produces incorrect output. (90% failure rate)
- **Using a different tokenizer without changing `padding_side`**: The requirement comes from the decoder-only model, not from a particular tokenizer; the warning will persist or outputs will be wrong. (70% failure rate)
- **Adding `attention_mask` manually without changing `padding_side`**: Even with a correct attention mask, right padding leaves pad tokens at the end of shorter sequences, which is where generation continues from, leading to degraded generation. (80% failure rate)
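
To check whether a batch is still right-padded after any of these attempts, one option is to inspect the last column of `input_ids` directly. The sketch below works on plain lists; `ends_with_padding` is a hypothetical helper and the token ids are invented. When the pad token is the EOS token, a prompt that genuinely ends in EOS would also be flagged, so treat the result as a hint.

```python
def ends_with_padding(input_ids, pad_token_id):
    """Return the indices of rows whose final token is the pad id."""
    return [
        i for i, row in enumerate(input_ids)
        if row and row[-1] == pad_token_id
    ]


PAD_ID = 0  # made-up pad token id

right_padded = [[5, 6, 7], [8, PAD_ID, PAD_ID]]
left_padded = [[5, 6, 7], [PAD_ID, PAD_ID, 8]]

print(ends_with_padding(right_padded, PAD_ID))  # [1]
print(ends_with_padding(left_padded, PAD_ID))   # []
```

For a PyTorch batch, the nested list can be obtained with `inputs["input_ids"].tolist()`.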
