huggingface type_error ai_generated true

TypeError: The `eval_dataset` must be a `Dataset` or `IterableDataset` object, but got <class 'list'>

ID: huggingface/trainer-eval-dataloader-type

Fix Rate: 93%
Confidence: 87%
Evidence: 1
First Seen: 2024-01-18

Version Compatibility

Version                 Status    Introduced    Deprecated    Notes
transformers>=4.28.0    active
datasets>=2.0.0         active

Root Cause

The Trainer's `evaluate()` method expects a datasets.Dataset or IterableDataset object, but a plain Python list was passed, which lacks the required dataset interface.

Official Documentation

https://huggingface.co/docs/transformers/v4.28.0/en/main_classes/trainer#transformers.Trainer.evaluate

Workarounds

  1. Convert the list to a `Dataset` object (95% success)

    Run `from datasets import Dataset; eval_dataset = Dataset.from_list(your_list)`, then pass `eval_dataset` to Trainer.

  2. Create a `Dataset` from a dictionary when the data is a list of tensors (90% success)

    Run `dataset = Dataset.from_dict({'input_ids': tensor_list, 'labels': label_list})`.

Dead Ends

Common approaches that don't work:

  1. Pass a list of dictionaries as `eval_dataset` and expect Trainer to convert it automatically (100% fail)

    Trainer does not perform implicit conversion; it strictly checks the type and raises TypeError.

  2. Set `eval_dataset` to `None` to skip evaluation (90% fail)

    This avoids the error only by skipping evaluation entirely, which can hide problems with model performance.

  3. Wrap the list in `torch.utils.data.TensorDataset` (80% fail)

    TensorDataset does not provide the interface Trainer expects here: it yields plain tuples of tensors rather than named columns, and it lacks `Dataset` methods such as `map()` and `select()`.