huggingface
config_error
ai_generated
true
UserWarning: You are using a decoder-only model with padding_side='right'. This may produce incorrect results. Consider setting padding_side='left'.
ID: huggingface/tokenizer-padding-side-mismatch
95% Fix Rate
87% Confidence
1 Evidence
2023-08-25 First Seen
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| transformers>=4.30.0 | active | — | — | — |
| tokenizers>=0.14.0 | active | — | — | — |
Root Cause
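Decoder-only models (such as GPT and LLaMA) generate each new token from the last position of the input sequence. With right padding, every sequence shorter than the longest one in the batch ends in padding tokens, so generation continues from padding instead of from the end of the prompt. With left padding, the final prompt token sits in the last position of every row, and the attention mask hides the padding at the start.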
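The effect of the padding side can be seen without loading a model. The following is a minimal sketch in plain Python; `pad_batch`, the pad id `0`, and the token ids are hypothetical and chosen only for illustration, not taken from any library.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int, side: str
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id lists to a common length and build the attention mask."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = [pad_id] * (max_len - len(seq))
        pad_mask = [0] * len(pad)   # 0 marks padding
        real_mask = [1] * len(seq)  # 1 marks real tokens
        if side == "left":
            input_ids.append(pad + list(seq))
            attention_mask.append(pad_mask + real_mask)
        else:
            input_ids.append(list(seq) + pad)
            attention_mask.append(real_mask + pad_mask)
    return input_ids, attention_mask


PAD = 0  # hypothetical pad id
batch = [[11, 12, 13], [21]]  # two "prompts" of different length

right_ids, right_mask = pad_batch(batch, PAD, "right")
left_ids, left_mask = pad_batch(batch, PAD, "left")

# The last column is where a decoder-only model continues from.
last_right = [row[-1] for row in right_ids]  # second row ends in PAD
last_left = [row[-1] for row in left_ids]    # every row ends in a real token
```

With right padding the shorter row ends in the pad id, while with left padding both rows end in their final prompt token.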
Official Documentation
https://huggingface.co/docs/transformers/en/pad_truncation#padding-and-truncation
Workarounds
- 95% success: Set `padding_side` to `'left'` before tokenization, and reuse the EOS token for padding if the tokenizer defines no pad token (as with GPT-2): `tokenizer.padding_side = 'left'; tokenizer.pad_token = tokenizer.eos_token; inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')`
- 90% success: Pass `padding_side` directly to the tokenizer's `__call__`. Older transformers releases may not accept this keyword, so check your installed version first: `tokenizer(texts, padding=True, truncation=True, padding_side='left', return_tensors='pt')`
- 85% success: If using a pipeline, set `padding_side` on the pipeline's tokenizer: `from transformers import pipeline; pipe = pipeline('text-generation', model='gpt2', tokenizer=tokenizer); pipe.tokenizer.padding_side = 'left'`
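The first workaround can be written out as a small script. This is a sketch under assumptions: `use_left_padding` and `demo` are hypothetical helper names, `gpt2` is used only because it appears in the pipeline example, and `demo` needs `transformers` and `torch` installed plus network access to download the model.

```python
def use_left_padding(tokenizer):
    """Configure a tokenizer for batched generation with a decoder-only model."""
    tokenizer.padding_side = "left"
    # Fall back to the EOS token only when no pad token is defined,
    # so an existing pad token is not overwritten.
    if getattr(tokenizer, "pad_token", None) is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


def demo():
    """Generate from two prompts of different length (downloads gpt2)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = use_left_padding(AutoTokenizer.from_pretrained("gpt2"))
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    texts = ["Hello, my name is", "The weather today"]
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    output_ids = model.generate(
        **inputs,  # passes input_ids and attention_mask
        max_new_tokens=20,
        pad_token_id=tokenizer.pad_token_id,
    )

    # With left padding every prompt ends at the same column,
    # so the newly generated tokens start at that column for all rows.
    prompt_len = inputs["input_ids"].shape[1]
    for row in output_ids:
        print(tokenizer.decode(row[prompt_len:], skip_special_tokens=True))
```

Call `demo()` to run the generation. The helper is kept separate so it can be applied to a pipeline's tokenizer as well, for example `use_left_padding(pipe.tokenizer)`.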
Dead Ends
Common approaches that don't work:
- Setting `padding_side='right'` explicitly to suppress the warning (90% fail): This does not fix the underlying issue. Shorter sequences still end in padding tokens, so the model still generates from padding and the outputs remain incorrect.
- Using a different tokenizer without changing padding_side (70% fail): The requirement comes from the decoder-only model, not from a particular tokenizer. Any tokenizer that pads on the right will keep triggering the warning or produce wrong outputs.
- Adding `attention_mask` manually without changing padding_side (80% fail): The attention mask stops real tokens from attending to padding, but it does not change where generation starts. With right padding, the last position of a shorter sequence is still a padding token, so new tokens are appended after the padding and generation degrades.
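One way to catch the problem before calling `generate` is to inspect the attention mask itself. The sketch below is a hypothetical guard, not part of transformers: `rows_ending_in_padding` works on nested lists, so a tensor mask would need `.tolist()` first.

```python
from typing import List, Sequence


def rows_ending_in_padding(attention_mask: Sequence[Sequence[int]]) -> List[int]:
    """Return indices of rows whose final position is padding (mask value 0)."""
    return [i for i, row in enumerate(attention_mask) if row and row[-1] == 0]


# Masks for the same two prompts, padded on different sides.
right_padded_mask = [[1, 1, 1], [1, 0, 0]]
left_padded_mask = [[1, 1, 1], [0, 0, 1]]

bad_right = rows_ending_in_padding(right_padded_mask)  # row 1 ends in padding
bad_left = rows_ending_in_padding(left_padded_mask)    # no row ends in padding
```

A non-empty result means at least one row would be continued from a padding position, which a correct attention mask alone does not prevent.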