# KeyError: 'tokenizer_vocab_size' not found in model config for fine-tuning

- **ID:** `llm/tokenizer-vocab-mismatch-fine-tune`
- **Domain:** llm
- **Category:** config_error
- **Verification:** ai_generated
- **Fix Rate:** 85%

## Root Cause

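The custom tokenizer's vocabulary size (`len(tokenizer)`) does not match the number of rows in the pre-trained model's input embedding matrix, which most Hugging Face model configs record as `vocab_size`. Standard model configs do not define a `tokenizer_vocab_size` key, so code that looks it up fails with this `KeyError`. Even without that lookup, token IDs at or above the embedding size have no embedding row and fail during the forward pass.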
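The mismatch can be caught before training with a plain size comparison. The sketch below is a minimal pre-flight check under stated assumptions. The helper names `check_vocab_alignment` and `out_of_range_ids` are hypothetical, and the numbers in the usage lines are illustrative, not taken from a specific model.

```python
def check_vocab_alignment(tokenizer_size: int, num_embeddings: int) -> str:
    """Classify how a tokenizer's vocabulary relates to an embedding table.

    Returns:
        "ok"                -- sizes match
        "resize"            -- tokenizer is larger; some token IDs have no embedding row
        "embeddings_larger" -- table has spare rows (often harmless padding)
    """
    if tokenizer_size <= 0 or num_embeddings <= 0:
        raise ValueError("sizes must be positive")
    if tokenizer_size == num_embeddings:
        return "ok"
    if tokenizer_size > num_embeddings:
        return "resize"
    return "embeddings_larger"


def out_of_range_ids(token_ids, num_embeddings):
    """Return the sorted, de-duplicated token IDs that fall outside the table."""
    return sorted({i for i in token_ids if i < 0 or i >= num_embeddings})


# Hypothetical numbers: a 32000-row table and a tokenizer with 5 added tokens
print(check_vocab_alignment(32005, 32000))                # resize
print(out_of_range_ids([1, 31999, 32000, 32004], 32000))  # [32000, 32004]
```

With a real model, the inputs would be `len(tokenizer)` and `model.get_input_embeddings().num_embeddings`. This assumes the input embedding module is a standard `torch.nn.Embedding`, which is the usual case.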

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| transformers 4.36.0 | active | — | — |
| transformers 4.37.0 | active | — | — |
| PyTorch 2.1.0 | active | — | — |

## Workarounds

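1. **Resize the model's token embeddings to the tokenizer's size before training: `model.resize_token_embeddings(len(tokenizer))`** (95% success)
   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer
   tokenizer = AutoTokenizer.from_pretrained("path/to/custom-tokenizer")  # placeholder path
   model = AutoModelForCausalLM.from_pretrained("base-model-name")  # placeholder checkpoint
   model.resize_token_embeddings(len(tokenizer))  # embedding rows now match the tokenizer
   ```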
2. **Use the default tokenizer that comes with the pre-trained model instead of a custom one** (80% success)
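   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer
   model_name = "base-model-name"  # placeholder: the checkpoint you are fine-tuning
   tokenizer = AutoTokenizer.from_pretrained(model_name)  # tokenizer shipped with the checkpoint
   model = AutoModelForCausalLM.from_pretrained(model_name)  # sizes already agree
   ```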

## Dead Ends

- **Setting `tokenizer_vocab_size` manually in the config to match the tokenizer size** — Editing the config does not change the shape of the already-loaded embedding weights; only `model.resize_token_embeddings()` reallocates them. (95% fail)
- **Reinstalling transformers package** — Error is configuration-related, not installation-related. (80% fail)
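Because the config is only metadata, its useful role here is to *report* the embedding size, not to change it. The sketch below reads that size from a parsed `config.json` dict. It assumes the standard Hugging Face key `vocab_size` and fails with a descriptive error instead of a bare `KeyError`. The helper name `vocab_size_from_config` and the sample config contents are hypothetical.

```python
import json


def vocab_size_from_config(config: dict) -> int:
    """Return the embedding vocabulary size from a parsed config.json dict.

    Assumes the standard Hugging Face key "vocab_size". A key such as
    "tokenizer_vocab_size" is not part of the standard config, so indexing
    it directly on a dict that lacks it raises KeyError.
    """
    if "vocab_size" in config:
        return int(config["vocab_size"])
    raise KeyError(f"'vocab_size' not found; config keys: {sorted(config)}")


# Hypothetical config contents for illustration
config = json.loads('{"model_type": "llama", "vocab_size": 32000}')
print(vocab_size_from_config(config))  # 32000
```

Some composite (for example multimodal) configs nest the text model's settings, including `vocab_size`, under a sub-config such as `text_config`. In that case, pass that nested dict instead.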