huggingface config_error ai_generated true


UserWarning: You are using a decoder-only model with padding_side='right'. This may produce incorrect results. Consider setting padding_side='left'.

ID: huggingface/tokenizer-padding-side-mismatch


Fix rate: 95%

Confidence: 87%

Evidence count: 1

First seen: 2023-08-25

Version compatibility

Version	Status	Introduced	Deprecated	Notes
transformers>=4.30.0	active	—	—	—
tokenizers>=0.14.0	active	—	—	—

Root cause analysis

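Decoder-only models (such as GPT and LLaMA) generate by appending each new token to the end of the sequence and predicting it from the logits at the last position. With right padding, the last position of every shorter prompt in a batch is a pad token, so the first new token is predicted from a pad position and the continuation is separated from the prompt by padding. Left padding moves the real tokens to the end of every row, so the last position is always the final prompt token. The problem is specific to batched generation; a single unpadded prompt is unaffected.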
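A minimal pure-Python sketch makes the difference concrete. It uses made-up token IDs and assumes a pad ID of 0; the helpers `pad_batch` and `last_position_tokens` are hypothetical illustrations, not transformers APIs. It pads the same two prompts on each side and reports the token in the final column, which is where a generation loop that appends to the end takes its next-token prediction from.

```python
from typing import List, Tuple

PAD_ID = 0  # assumed pad token id, for illustration only


def pad_batch(seqs: List[List[int]], side: str) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id lists to equal length; return (input_ids, attention_mask)."""
    width = max(len(s) for s in seqs)
    ids, mask = [], []
    for s in seqs:
        n_pad = width - len(s)
        if side == "left":
            ids.append([PAD_ID] * n_pad + s)
            mask.append([0] * n_pad + [1] * len(s))
        else:  # "right"
            ids.append(s + [PAD_ID] * n_pad)
            mask.append([1] * len(s) + [0] * n_pad)
    return ids, mask


def last_position_tokens(ids: List[List[int]]) -> List[int]:
    """Token in the final column of each row: the position whose logits
    an append-to-the-end generation loop uses to pick the next token."""
    return [row[-1] for row in ids]


prompts = [[11, 12], [21, 22, 23, 24]]  # a short and a long prompt (made-up ids)

right_ids, right_mask = pad_batch(prompts, "right")
left_ids, left_mask = pad_batch(prompts, "left")

print(last_position_tokens(right_ids))  # [0, 24]  -> short prompt ends on a pad
print(last_position_tokens(left_ids))   # [12, 24] -> every row ends on a real token
```

With right padding the short prompt's next token would be predicted from the pad at index 3; with left padding every row's final column holds its last real token.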

generic

Official documentation

https://huggingface.co/docs/transformers/en/pad_truncation#padding-and-truncation

Solutions

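Set `padding_side` to `'left'` before tokenizing. Many decoder-only tokenizers (GPT-2's, for example) ship without a pad token, so reuse the EOS token: `tokenizer.padding_side = 'left'; tokenizer.pad_token = tokenizer.eos_token; inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')`. Alternatively, pass `padding_side='left'` to `AutoTokenizer.from_pretrained`. Pass the whole encoding to `model.generate(**inputs)` so the attention mask is supplied along with the input IDs.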
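The attribute-based fix can be checked without downloading a model. This sketch assumes `transformers` and `tokenizers` are installed and builds a toy word-level tokenizer in memory (the vocabulary is made up); with a real checkpoint you would load the tokenizer with `AutoTokenizer.from_pretrained` instead. Left padding shows up as leading pad IDs and leading zeros in the attention mask.

```python
from tokenizers import Tokenizer, models, pre_tokenizers
from transformers import PreTrainedTokenizerFast

# Toy word-level vocabulary (hypothetical). "<eos>" doubles as the pad token,
# mirroring the tokenizer.pad_token = tokenizer.eos_token step.
vocab = {"<eos>": 0, "hello": 1, "how": 2, "are": 3, "you": 4}
backend = Tokenizer(models.WordLevel(vocab=vocab, unk_token="<eos>"))
backend.pre_tokenizer = pre_tokenizers.Whitespace()

tokenizer = PreTrainedTokenizerFast(tokenizer_object=backend, eos_token="<eos>")
tokenizer.pad_token = tokenizer.eos_token  # no pad token defined by default
tokenizer.padding_side = "left"            # must be set before tokenizing

batch = tokenizer(["hello", "how are you"], padding=True)
print(batch["input_ids"])       # [[0, 0, 1], [2, 3, 4]]
print(batch["attention_mask"])  # [[0, 0, 1], [1, 1, 1]]
```

The shorter prompt `"hello"` now ends in its own token ID (1) rather than a pad, which is what batched generation needs.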

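Newer transformers releases also accept `padding_side` per call: `tokenizer(texts, padding=True, truncation=True, padding_side='left', return_tensors='pt')`. Older versions, including the 4.30 baseline in the table above, do not support this argument, so use the attribute form when you need to support them.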

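If you use a pipeline, configure the tokenizer it wraps: `from transformers import AutoTokenizer, pipeline; tokenizer = AutoTokenizer.from_pretrained('gpt2', padding_side='left'); pipe = pipeline('text-generation', model='gpt2', tokenizer=tokenizer)`. Setting `pipe.tokenizer.padding_side = 'left'` after construction also works. For batched inputs, GPT-2 additionally needs a pad token, for example `pipe.tokenizer.pad_token = pipe.tokenizer.eos_token`.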

Ineffective attempts

Common approaches that do not work:

Setting `padding_side='right'` explicitly to suppress the warning (90% failure rate)
The warning is triggered by the padded batch itself, not by the setting, and the underlying problem remains: shorter prompts still end in pad tokens, so generation starts from a pad position.
Switching to a different tokenizer without changing padding_side (70% failure rate)
The requirement comes from the decoder-only model, not from a particular tokenizer; any tokenizer that pads on the right produces the same warning and the same degraded outputs.
Passing `attention_mask` manually without changing padding_side (80% failure rate)
The attention mask hides the pad tokens from attention, but generation still takes next-token logits from the last position, which is a pad token for every shorter prompt, and appends the new tokens after the padding.
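Changing settings alone cannot silence the warning because the check inspects token IDs: transformers' `generate()` flags right padding when a batch of more than one sequence has the pad token ID in its last column. The pure-Python sketch below mirrors that logic; `looks_right_padded` is a hypothetical helper and the IDs are made up. One consequence: when `pad_token` is the EOS token, a prompt that itself ends in EOS can trigger the warning even with left padding.

```python
from typing import List


def looks_right_padded(input_ids: List[List[int]], pad_id: int) -> bool:
    """Hypothetical helper: True if a batch of 2+ rows has any row ending in pad_id.

    Only the token ids are inspected; the tokenizer's padding_side setting
    is never consulted.
    """
    return len(input_ids) > 1 and any(row[-1] == pad_id for row in input_ids)


PAD_ID = 0  # assumed pad id for the example

right_padded = [[11, 12, 0, 0], [21, 22, 23, 24]]
left_padded = [[0, 0, 11, 12], [21, 22, 23, 24]]

print(looks_right_padded(right_padded, PAD_ID))  # True
print(looks_right_padded(left_padded, PAD_ID))   # False
```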