huggingface
data_error
ai_generated
true
ValueError: The features of the dataset do not match the expected schema. Missing columns: ['text', 'label']. Extra columns: ['id', 'metadata']
ID: huggingface/dataset-features-mismatch
Fix Rate: 88%
Confidence: 86%
Evidence: 1
First Seen: 2024-01-05
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| datasets>=2.10.0 | active | — | — | — |
| transformers>=4.30.0 | active | — | — | — |
| python>=3.8 | active | — | — | — |
Root Cause
The dataset loaded from Hugging Face Datasets has columns that do not match the expected schema required by the model or training script.
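Before changing anything, it can help to reproduce the "Missing" and "Extra" lists from the error message by comparing column names directly. The following is a minimal sketch: `diff_columns` is a hypothetical helper, not part of the `datasets` library, and the expected schema is assumed from the error message above. With a loaded dataset, the `actual` argument would be `dataset.column_names`.

```python
def diff_columns(actual, expected):
    """Return (missing, extra) column names, preserving input order."""
    # Columns the training script expects but the dataset lacks.
    missing = [name for name in expected if name not in actual]
    # Columns the dataset has but the training script does not expect.
    extra = [name for name in actual if name not in expected]
    return missing, extra


# Column names taken from the error message; with a real dataset,
# pass dataset.column_names as the first argument.
missing, extra = diff_columns(["id", "metadata"], ["text", "label"])
print("Missing columns:", missing)  # ['text', 'label']
print("Extra columns:", extra)      # ['id', 'metadata']
```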
Official Documentation
https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.select_columns
Workarounds
- 90% success: Add any missing column first, for example with a default value: dataset = dataset.add_column('label', [0]*len(dataset)). Then use dataset.select_columns(['text', 'label']) to keep only the required columns. select_columns raises an error when a requested column does not exist, so it cannot run before the missing columns are in place. A constant default label only satisfies the schema check; replace it with real labels before training.
- 85% success: Map extra columns to required ones: dataset = dataset.map(lambda x: {'text': x['metadata'], 'label': 0}). map keeps the original columns unless remove_columns is passed, so drop 'id' and 'metadata' afterwards if the schema check rejects extra columns.
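The workarounds can be combined into one ordered procedure: rename, then add, then select. This is a sketch under stated assumptions, not a definitive fix. `align_columns`, `rename_map`, and `defaults` are hypothetical names introduced for illustration. The choice of `metadata` as the source for `text` follows the second workaround and may not suit your data. The only `datasets.Dataset` members used are `column_names`, `rename_column`, `add_column`, and `select_columns`.

```python
def align_columns(dataset, required, rename_map=None, defaults=None):
    """Return a dataset whose columns are exactly `required`.

    `dataset` is expected to be a datasets.Dataset. `rename_map` and
    `defaults` are hypothetical helper arguments, not library features.
    """
    rename_map = rename_map or {}
    defaults = defaults or {}

    # 1. Rename existing columns that already hold the needed data.
    for old, new in rename_map.items():
        if old in dataset.column_names and new not in dataset.column_names:
            dataset = dataset.rename_column(old, new)

    # 2. Add columns that are still missing, filled with a placeholder value.
    for name in required:
        if name not in dataset.column_names:
            if name not in defaults:
                raise KeyError(f"No source column or default for {name!r}")
            dataset = dataset.add_column(name, [defaults[name]] * len(dataset))

    # 3. Drop the extras last, once every required column exists.
    return dataset.select_columns(list(required))


def demo():
    # Requires the `datasets` package to be installed.
    from datasets import Dataset

    ds = Dataset.from_dict({"id": [1, 2], "metadata": ["first", "second"]})
    fixed = align_columns(
        ds,
        ["text", "label"],
        rename_map={"metadata": "text"},  # assumption: metadata holds the text
        defaults={"label": 0},            # placeholder, not a real label
    )
    return fixed.column_names
```

Raising `KeyError` when a required column has neither a source nor a default is a deliberate choice here: silently inventing values for a column such as `text` would hide the mismatch rather than fix it.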
Dead Ends
Common approaches that don't work:
- 50% fail: Missing columns still need to be added or mapped from existing columns.
- 60% fail: If the column name is misspelled, the error persists.