huggingface type_error ai_generated true

TypeError: The `eval_dataset` must be a `Dataset` or `IterableDataset` object, but got <class 'list'>

ID: huggingface/trainer-eval-dataloader-type

Fix rate: 93%
Confidence: 87%
Evidence count: 1
First seen: 2024-01-18

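Version compatibility

| Version | Status | Introduced | Deprecated | Notes |
| --- | --- | --- | --- | --- |
| transformers>=4.28.0 | active | | | |
| datasets>=2.0.0 | active | | | |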

Root cause analysis

The Trainer's `evaluate()` method expects a `datasets.Dataset` or `IterableDataset` object, but it received a plain Python list, which lacks the required dataset interface.

generic

Official documentation

https://huggingface.co/docs/transformers/v4.28.0/en/main_classes/trainer#transformers.Trainer.evaluate

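Solutions

  1. Convert the list to a Dataset object: `from datasets import Dataset; eval_dataset = Dataset.from_list(your_list)`. Then pass `eval_dataset` to the Trainer.
  2. If you have lists of tensors, create the dataset from a dictionary of columns: `dataset = Dataset.from_dict({'input_ids': tensor_list, 'labels': label_list})`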

Ineffective attempts

Common approaches that do not work:

  1. Pass a list of dictionaries as `eval_dataset` and expect Trainer to convert it automatically (100% failure rate)

    Trainer does not perform implicit conversion; it strictly checks the type and raises TypeError.

  2. Set `eval_dataset` to `None` to skip evaluation (90% failure rate)

    This avoids the error but prevents evaluation entirely, which may hide issues in model performance.

  3. Wrap the list in `torch.utils.data.TensorDataset` (80% failure rate)

    TensorDataset is not compatible with Trainer's expected interface: it returns tuples of tensors rather than dict-style examples, and it lacks `datasets.Dataset` methods like `map()` and `select()`.