# KeyError: 'tokenizer_vocab_size' not found in model config

- **ID:** `llm/tokenizer-vocab-mismatch`
- **Domain:** llm
- **Category:** config_error
- **Verification:** ai_generated
- **Fix Rate:** 90%

## Root Cause

This error occurs when fine-tuning or loading a model whose configuration is missing the `tokenizer_vocab_size` key. The usual causes are a tokenizer that does not match the model, or an incomplete model repository on Hugging Face.

## Version Compatibility

| Library / Model | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| transformers==4.35.0 | active | — | — |
| transformers==4.38.0 | active | — | — |
| llama-2-7b-hf | active | — | — |
| mistral-7b-v0.1 | active | — | — |

## Workarounds

1. **Load the tokenizer separately and set the config manually.** (95% success)
   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer

   # 'model_name' is a placeholder for the model ID or local path.
   tokenizer = AutoTokenizer.from_pretrained('model_name')
   # Assumption: a causal LM such as the Llama or Mistral models listed above.
   model = AutoModelForCausalLM.from_pretrained('model_name')
   # Align the config with the tokenizer's actual vocabulary size.
   model.config.vocab_size = len(tokenizer)
   ```
2. **Use a different model variant that includes the tokenizer config (e.g., prefer '-hf' variants from Hugging Face).** (85% success)
3. **Download the full model directory including tokenizer files from Hugging Face instead of using a partial or cached version.** (90% success)
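
Before re-downloading, it can help to confirm what is actually missing from the local copy. The following is a minimal sketch using only the standard library; the function name `diagnose_model_dir` and the list of expected files are assumptions chosen for illustration, based on the file names Hugging Face repositories commonly use (`config.json`, `tokenizer_config.json`, and one of `tokenizer.json`, `tokenizer.model`, or `vocab.json`).

```python
import json
from pathlib import Path

# Assumption: a complete model directory contains at least one of these
# vocabulary files next to config.json and tokenizer_config.json.
VOCAB_FILES = ("tokenizer.json", "tokenizer.model", "vocab.json")


def diagnose_model_dir(model_dir):
    """Return a list of problems found in a local model directory."""
    root = Path(model_dir)
    problems = []

    config_path = root / "config.json"
    if not config_path.is_file():
        problems.append("config.json is missing")
    else:
        config = json.loads(config_path.read_text(encoding="utf-8"))
        if "vocab_size" not in config:
            problems.append("config.json has no 'vocab_size' key")

    if not (root / "tokenizer_config.json").is_file():
        problems.append("tokenizer_config.json is missing")

    if not any((root / name).is_file() for name in VOCAB_FILES):
        problems.append(
            "no vocabulary file found (" + ", ".join(VOCAB_FILES) + ")"
        )

    return problems
```

An empty list suggests the directory is complete; any reported problem points toward workaround 2 or 3 rather than a manual config edit.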

## Dead Ends

- **Hard-coding an arbitrary vocabulary size:** The value must match the actual tokenizer vocabulary size; an arbitrary value will cause embedding dimension mismatches or runtime errors. (90% fail)
- **Reinstalling the library:** The error is a configuration issue with the model, not a library installation problem. (95% fail)
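
To illustrate why an arbitrary vocabulary size fails, here is a toy sketch in plain Python. Token IDs are used as row indices into the embedding table, so every ID must be smaller than the number of rows. The helper `ids_fit_embedding` and the four-token vocabulary are hypothetical; with a real tokenizer, the token-to-ID mapping would come from `tokenizer.get_vocab()`.

```python
def ids_fit_embedding(vocab, embedding_rows):
    """Return True if every token ID can index a table with `embedding_rows` rows.

    `vocab` maps token strings to integer IDs.
    """
    return max(vocab.values()) < embedding_rows


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"<s>": 0, "</s>": 1, "hello": 2, "world": 3}

print(ids_fit_embedding(toy_vocab, len(toy_vocab)))  # size taken from the tokenizer
print(ids_fit_embedding(toy_vocab, 3))               # arbitrary value that is too small
```

A value that is too large passes this particular check, but it would likely still fail when loading a checkpoint, because the saved embedding weights have a different shape than the configured one.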
