llm config_error ai_generated true

ValueError: The tokenizer's chat_template is not compatible with the model's expected format. Expected 'llama' format, got 'chatml'.

ID: llm/chat-template-mismatch-hf


Fix Rate: 92%

Confidence: 90%

Evidence: 1

First Seen: 2024-01-10

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
transformers 4.36.0	active	—	—	—
Mistral 7B Instruct v0.2	active	—	—	—
Llama 2 13B Chat	active	—	—	—
Zephyr 7B Beta	active	—	—	—

Root Cause

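The Hugging Face tokenizer's chat_template renders conversations in one prompt format (e.g., ChatML, which wraps each turn in <|im_start|> and <|im_end|> markers), while the model expects another (e.g., Llama 2's [INST] ... [/INST] tags). The model then receives system and user messages delimited by markers it was not trained on, so the prompt is tokenized incorrectly.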
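
To make the mismatch concrete, the sketch below renders the same conversation in both formats using plain Python. The helpers `render_chatml` and `render_llama2` are hypothetical names written for illustration, and the Llama 2 layout is simplified; the template shipped in each model's repository remains authoritative.

```python
from typing import Dict, List

Message = Dict[str, str]


def render_chatml(messages: List[Message]) -> str:
    """Render messages with ChatML-style <|im_start|>/<|im_end|> markers."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    # The trailing header cues the model to produce the assistant turn.
    return "".join(parts) + "<|im_start|>assistant\n"


def render_llama2(messages: List[Message]) -> str:
    """Render messages with Llama 2 style [INST] tags (simplified assumption)."""
    system = ""
    if messages and messages[0]["role"] == "system":
        system = f"<<SYS>>\n{messages[0]['content']}\n<</SYS>>\n\n"
        messages = messages[1:]
    out = []
    for m in messages:
        if m["role"] == "user":
            out.append(f"<s>[INST] {system}{m['content']} [/INST]")
            system = ""  # the system block is attached to the first user turn only
        elif m["role"] == "assistant":
            out.append(f" {m['content']} </s>")
    return "".join(out)


messages = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]
chatml_prompt = render_chatml(messages)
llama2_prompt = render_llama2(messages)
print(chatml_prompt)
print(llama2_prompt)
```

The two strings share no delimiters at all, which is why a model tuned on one format tends to misread prompts rendered in the other.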



Official Documentation

https://huggingface.co/docs/transformers/main/en/chat_templating

Workarounds

95% success: Set the chat_template published in the model's official repository. For Llama 2, a simplified version is: `tokenizer.chat_template = "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% else %}{% set loop_messages = messages %}{% endif %}{% for message in loop_messages %}{% if message['role'] == 'user' %}{{ '<s>[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + message['content'] + ' </s>' }}{% endif %}{% endfor %}"`. Note that this simplified template skips a leading system message; the official Llama 2 format wraps it in `<<SYS>>` tags inside the first `[INST]` block, so prefer the repository's template when system prompts matter.
85% success: Load the tokenizer with `use_fast=True` and pass `model_max_length` explicitly to avoid truncation issues: `AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', use_fast=True, model_max_length=8192)`. These options affect tokenizer loading and truncation; they do not by themselves change the chat_template.
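
Before applying either workaround, it can help to check which format the current template actually produces. The sketch below is a heuristic only: `detect_template_family` and the `FAMILY_MARKERS` table are hypothetical helpers introduced here, not part of transformers, and they simply look for literal marker strings in the Jinja source.

```python
from typing import Optional

# Marker substrings are illustrative assumptions, not an official registry.
FAMILY_MARKERS = {
    "llama": ("[INST]", "[/INST]"),
    "chatml": ("<|im_start|>", "<|im_end|>"),
}


def detect_template_family(chat_template: Optional[str]) -> str:
    """Guess the prompt family of a Jinja chat template from its literal markers."""
    if not chat_template:
        return "unset"
    matches = [
        name
        for name, markers in FAMILY_MARKERS.items()
        if all(marker in chat_template for marker in markers)
    ]
    if len(matches) == 1:
        return matches[0]
    return "unknown"  # no markers found, or markers from more than one family


llama_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}"
    "{{ '<s>[INST] ' + message['content'] + ' [/INST]' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ ' ' + message['content'] + ' </s>' }}"
    "{% endif %}{% endfor %}"
)
print(detect_template_family(llama_template))  # prints "llama"
```

In practice the argument would be the `tokenizer.chat_template` string of a loaded tokenizer. A result of "unknown" or "unset" is a cue to read the template by hand, not proof that it is wrong.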


Dead Ends

Common approaches that don't work:

70% fail: Setting chat_template to None. This falls back to the default template, which may still be incorrect for the model, leading to garbled prompts.
50% fail: Formatting prompts manually. This is error-prone and may omit special tokens (e.g., <s>, </s>) that the model requires, causing poor generation or errors.
80% fail: Swapping in an incompatible tokenizer. This can introduce vocabulary mismatches and unknown token IDs, breaking the model entirely.
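
If manual formatting cannot be avoided, a quick textual check can catch the omissions described above. This is a minimal sketch under stated assumptions: `check_llama2_prompt` is a hypothetical helper, it only inspects the string for Llama 2 style markers, and it says nothing about how the tokenizer will encode them.

```python
from typing import List


def check_llama2_prompt(prompt: str) -> List[str]:
    """Return a list of problems found in a hand-built Llama 2 style prompt."""
    problems = []
    if not prompt.startswith("<s>"):
        problems.append("missing leading <s>")
    opens = prompt.count("[INST]")
    closes = prompt.count("[/INST]")
    if opens == 0:
        problems.append("no [INST] block")
    if opens != closes:
        problems.append("unbalanced [INST] / [/INST]")
    return problems


print(check_llama2_prompt("<s>[INST] Hi [/INST]"))
print(check_llama2_prompt("[INST] Hi"))
```

An empty list means only that the expected markers are present and balanced; rendering through the tokenizer's own template is still the more reliable path.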