llm config_error ai_generated true

KeyError: 'tokenizer_vocab_size' not found in model config for fine-tuning

ID: llm/tokenizer-vocab-mismatch-fine-tune


Fix rate: 85%

Confidence: 88%

Evidence count: 1

First seen: 2024-01-20

Version compatibility

The tokenizer's vocabulary size does not match the size of the model's embedding layer. This happens when a pre-trained model is loaded for fine-tuning with a custom tokenizer.

generic

Resize the model's token embeddings to match the tokenizer before training: model.resize_token_embeddings(len(tokenizer))

Use the default tokenizer that comes with the pre-trained model instead of a custom one
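
The first fix can be sketched as a small helper. This is a minimal sketch, assuming a Hugging Face Transformers style model that exposes get_input_embeddings() and resize_token_embeddings(), and a tokenizer whose len() includes added tokens. The helper name align_embeddings_to_tokenizer and the checkpoint and path strings in the usage comment are hypothetical placeholders, not part of the original entry.

```python
def align_embeddings_to_tokenizer(model, tokenizer):
    """Resize the model's token embeddings if they differ from the tokenizer size.

    Returns a tuple (old_size, new_size) of embedding row counts.
    """
    # len(tokenizer) counts the base vocabulary plus any added tokens.
    target_size = len(tokenizer)

    # Number of rows in the current input embedding matrix.
    old_size = model.get_input_embeddings().num_embeddings

    # Resize only when the sizes disagree, so a matching model is left untouched.
    if old_size != target_size:
        model.resize_token_embeddings(target_size)

    new_size = model.get_input_embeddings().num_embeddings
    return old_size, new_size


# Hypothetical usage (checkpoint name and tokenizer path are placeholders):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("path/to/custom-tokenizer")
# model = AutoModelForCausalLM.from_pretrained("base-checkpoint")
# old_size, new_size = align_embeddings_to_tokenizer(model, tokenizer)
```

Calling the helper right after loading and before building the trainer keeps the check in one place. Rows added by resizing are newly initialized, so they only become meaningful after fine-tuning.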

Common but ineffective approaches:

Setting tokenizer_vocab_size manually in the config to match the tokenizer size (95% failure)
The embedding weight matrix keeps the shape it was loaded with, so editing a config value does not resize it. Resizing requires the dedicated method, model.resize_token_embeddings().
Reinstalling the transformers package (80% failure)
The error comes from the model configuration, not from the installation.