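# ValueError: Adding special tokens to a tokenizer that already has special tokens; only use `add_special_tokens=True` when you intend to add new tokens. Received extra_ids=0, but the tokenizer already has 2 special tokens.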

- **ID:** `huggingface/tokenizer-extra-special-tokens-invalid`
- **Domain:** huggingface
- **Category:** config_error
- **Verification level:** ai_generated
- **Fix rate:** 90%

## Root Cause

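The user called `tokenizer.add_special_tokens()` with an empty or redundant special-token dictionary even though the tokenizer already defines those tokens, which triggers a validation error.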
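The condition described above (an empty dictionary, or one whose entries the tokenizer already has) can be detected before the call is made. The following is a minimal sketch, assuming the tokenizer's `special_tokens_map` is a dict mapping attribute names such as `cls_token` to a token string, or to a list of strings for `additional_special_tokens`. The helper names `redundant_keys` and `is_noop_call` are hypothetical and are not part of transformers.

```python
from typing import Dict, List, Union

# Assumed shape of a special_tokens_map value: a single token string,
# or a list of strings for "additional_special_tokens".
TokenValue = Union[str, List[str]]


def _as_list(value: TokenValue) -> List[str]:
    """Normalize a single token or a list of tokens to a list."""
    return [value] if isinstance(value, str) else list(value)


def redundant_keys(existing_map: Dict[str, TokenValue],
                   proposed: Dict[str, TokenValue]) -> List[str]:
    """Return keys of `proposed` whose tokens are all already registered under that key."""
    redundant = []
    for key, value in proposed.items():
        current = _as_list(existing_map.get(key, []))
        if all(token in current for token in _as_list(value)):
            redundant.append(key)
    return redundant


def is_noop_call(existing_map: Dict[str, TokenValue],
                 proposed: Dict[str, TokenValue]) -> bool:
    """True if `proposed` is empty or every entry is redundant (nothing new to add)."""
    return len(redundant_keys(existing_map, proposed)) == len(proposed)
```

With a loaded tokenizer this would be called as `is_noop_call(tokenizer.special_tokens_map, special_tokens_dict)`; when it returns `True`, the `add_special_tokens` call can be skipped.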

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| tokenizers 0.19.1 | active | — | — |
| transformers 4.44.0 | active | — | — |

## Solutions

1. Check existing special tokens before adding: if `tokenizer.special_tokens_map` already contains the tokens, skip the `add_special_tokens` call entirely.
2. Call `tokenizer.add_special_tokens({'additional_special_tokens': ['<new_token>']})` only for genuinely new tokens, never for duplicates.
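The two steps above can be combined into a small guard. This is a sketch, assuming `tokenizer.all_special_tokens` (a flattened list of the tokenizer's special-token strings) and `tokenizer.add_special_tokens(dict)` (which returns the number of tokens added to the vocabulary) behave as in recent transformers releases. The helper names `new_special_tokens` and `add_only_new_special_tokens` are hypothetical.

```python
from typing import Iterable, List


def new_special_tokens(tokenizer, candidates: Iterable[str]) -> List[str]:
    """Return the candidates that are not already special tokens.

    Assumes `tokenizer.all_special_tokens` is a list of token strings.
    Order is preserved and repeated candidates are dropped.
    """
    existing = set(tokenizer.all_special_tokens)
    result: List[str] = []
    for token in candidates:
        if token not in existing and token not in result:
            result.append(token)
    return result


def add_only_new_special_tokens(tokenizer, candidates: Iterable[str]) -> int:
    """Call `add_special_tokens` only for genuinely new tokens.

    Returns the count reported by `add_special_tokens`, or 0 when the call
    is skipped because every candidate is already a special token.
    """
    to_add = new_special_tokens(tokenizer, candidates)
    if not to_add:
        # Step 1: nothing new, so skip the call entirely.
        return 0
    # Step 2: pass only the tokens that are not duplicates.
    return tokenizer.add_special_tokens({"additional_special_tokens": to_add})
```

If the returned count is greater than zero and the tokenizer is paired with a model, the model's embedding matrix usually has to be resized, for example with `model.resize_token_embeddings(len(tokenizer))`. Recent transformers versions may also replace the existing list of additional special tokens when `additional_special_tokens` is passed, unless `replace_additional_special_tokens=False` is given; check the documentation for the installed version.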

## Ineffective Attempts

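- **Toggling the `add_special_tokens` argument:** this argument of `tokenizer(...)` and `tokenizer.encode()` controls whether special tokens such as `[CLS]` and `[SEP]` are inserted into the encoded sequence; it does not check for duplicate tokens, so the error persists. (100% failure rate)
- **Deleting and re-adding built-in special tokens:** removing built-in special tokens (such as `[CLS]` and `[SEP]`) can break tokenizer functionality, and re-adding them may still fail if they are already present in the base tokenizer. (40% failure rate)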
