# KeyError: 'tokenizer_vocab_size' not found in fine-tuned model config.

- **ID:** `llm/tokenizer-vocab-mismatch-fine-tune`
- **Domain:** llm
- **Category:** config_error
- **Verification level:** ai_generated
- **Fix rate:** 85%

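## Root cause

When a pre-trained model is loaded for fine-tuning with a custom tokenizer, the tokenizer's vocabulary size does not match the size of the model's embedding layer.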
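
To make the mismatch concrete, the sketch below lists the token ids that would have no row in the embedding table. `out_of_range_ids` is a hypothetical helper written for illustration, not part of any library, and the sizes used are illustrative assumptions.

```python
def out_of_range_ids(token_ids, embedding_rows):
    """Return the distinct token ids that have no row in an embedding table.

    token_ids: iterable of integer ids produced by the tokenizer.
    embedding_rows: number of rows in the model's input embedding matrix.
    """
    # A valid id must satisfy 0 <= id < embedding_rows.
    return sorted({i for i in token_ids if i < 0 or i >= embedding_rows})


# Illustrative assumption: the model has 50257 embedding rows, and the custom
# tokenizer added two tokens that were assigned ids 50257 and 50258.
missing = out_of_range_ids([0, 5, 50257, 50258, 50257], embedding_rows=50257)
print(missing)
```

Any id reported here would be looked up past the end of the embedding matrix, which is the kind of mismatch this entry describes.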

## Version compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| transformers 4.36.0 | active | — | — |
| transformers 4.37.0 | active | — | — |
| PyTorch 2.1.0 | active | — | — |

## Solutions

1. Resize the model's token embeddings to the tokenizer's length before training: `model.resize_token_embeddings(len(tokenizer))`.
2. Use the default tokenizer that ships with the pre-trained model instead of a custom one.
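
The first solution can be sketched as a small helper that resizes only when the sizes differ. `sync_embeddings_to_tokenizer` is a hypothetical name; the sketch assumes `model` exposes `get_input_embeddings()` and `resize_token_embeddings()` as Hugging Face `PreTrainedModel` classes do, and that `len(tokenizer)` includes any added tokens.

```python
def sync_embeddings_to_tokenizer(model, tokenizer):
    """Resize the model's token embeddings to len(tokenizer) if they differ.

    Returns a tuple (old_size, new_size) so the caller can log what changed.
    """
    # Number of rows in the input embedding matrix, one per token id.
    old_size = model.get_input_embeddings().weight.shape[0]
    # Full tokenizer length, including tokens added after pre-training.
    new_size = len(tokenizer)
    if old_size != new_size:
        # Assumption: this is the resize method provided by the model class.
        model.resize_token_embeddings(new_size)
    return old_size, new_size
```

With transformers, one would typically call this once after loading the model and tokenizer (for example via `AutoModelForCausalLM.from_pretrained` and `AutoTokenizer.from_pretrained`) and before training starts. Rows added by a resize are newly initialized, so they only become useful after fine-tuning.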

## Failed attempts

- **Setting `tokenizer_vocab_size` manually in the config to match the tokenizer size:** The embedding weights keep their original shape, so a config change does not resize them; resizing requires the dedicated method `model.resize_token_embeddings`. (95% failure rate)
- **Reinstalling the transformers package:** The error comes from configuration, not from the installation. (80% failure rate)
