# ValueError: The features of the dataset do not match the expected schema. Missing columns: ['text', 'label']. Extra columns: ['input', 'target']

- **ID:** `huggingface/dataset-features-column-mismatch`
- **Domain:** huggingface
- **Category:** data_error
- **Verification:** ai_generated
- **Fix Rate:** 90%

## Root Cause

The dataset loaded with Hugging Face Datasets has different column names than the training script or tokenization step expects: here it provides `input` and `target` where `text` and `label` are required.
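
Before changing anything, it can help to reproduce the comparison behind the error message. The following is a minimal sketch in plain Python; `schema_diff` is a hypothetical helper written for this entry, not part of the `datasets` API, and the column lists are copied from the error above.

```python
def schema_diff(actual_columns, expected_columns):
    """Return (missing, extra) column names, each sorted for stable output."""
    actual, expected = set(actual_columns), set(expected_columns)
    missing = sorted(expected - actual)  # expected by the script, absent from the data
    extra = sorted(actual - expected)    # present in the data, unknown to the script
    return missing, extra


# Column names taken from the error message; with a real dataset,
# pass `dataset.column_names` as the first argument instead.
missing, extra = schema_diff(["input", "target"], ["text", "label"])
print("Missing columns:", missing)
print("Extra columns:", extra)
```

For a `DatasetDict`, `column_names` is a mapping from split name to column list, so the check would be run per split, for example on `dataset["train"].column_names`.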

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| datasets>=2.10.0 | active | — | — |
| transformers>=4.25.0 | active | — | — |

## Workarounds

1. **Align columns with `Dataset.rename_columns()` and `Dataset.remove_columns()`.** (95% success)
   ```python
   # 'unused_col' stands for any leftover column the training script does not expect
   dataset = dataset.rename_columns({'input': 'text', 'target': 'label'}).remove_columns(['unused_col'])
   ```
2. **Use `Dataset.map()` with a function that builds only the required columns.** (90% success)
   ```python
   # remove_columns drops every original column, leaving only 'text' and 'label'
   dataset = dataset.map(lambda x: {'text': x['input'], 'label': x['target']}, remove_columns=dataset.column_names)
   ```
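3. **Select columns at load time where the builder supports it, or create a new dataset with the correct schema.** (85% success)
   ```python
   from datasets import Dataset

   # Build a fresh dataset that has exactly the expected column names
   dataset = Dataset.from_dict({'text': list(dataset['input']), 'label': list(dataset['target'])})
   ```
   `load_dataset()` has no general option for renaming columns. Depending on the builder and `datasets` version, a `columns` argument (accepted by the Parquet builder, for example) only selects a subset of columns and does not rename them, so a rename step is still needed afterwards.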
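
The rename-then-drop approach from the workarounds above can be checked before it touches the data. This is a rough sketch under stated assumptions: `plan_alignment` is a hypothetical helper (not a `datasets` function) that works only on lists of column names, and `unused_col` is the same placeholder column used in workaround 1.

```python
def plan_alignment(actual_columns, rename_map, expected_columns):
    """Work out which columns to rename and which to drop.

    Returns (renames, drops). Raises ValueError if a rename source does not
    exist or if an expected column would still be missing after renaming.
    """
    actual = list(actual_columns)

    unknown = [src for src in rename_map if src not in actual]
    if unknown:
        raise ValueError(f"Cannot rename columns that do not exist: {unknown}")

    # Column names as they would look once the renames are applied.
    renamed = [rename_map.get(name, name) for name in actual]

    still_missing = [name for name in expected_columns if name not in renamed]
    if still_missing:
        raise ValueError(f"Columns still missing after renaming: {still_missing}")

    # Anything left over that the script does not expect gets dropped.
    drops = [name for name in renamed if name not in expected_columns]
    return dict(rename_map), drops


renames, drops = plan_alignment(
    actual_columns=["input", "target", "unused_col"],
    rename_map={"input": "text", "target": "label"},
    expected_columns=["text", "label"],
)
print("Rename:", renames)
print("Drop:", drops)

# With a Hugging Face Dataset, the plan would then be applied as:
# dataset = dataset.rename_columns(renames).remove_columns(drops)
```

Computing the plan from `dataset.column_names` first makes a partial rename or a wrongly dropped column fail early with a readable message instead of surfacing later during training.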

## Dead Ends

- **Renaming only one column (e.g. `input` to `text`)**: If there are more mismatches (e.g., `target` vs `label`), the error persists. Renaming may also break downstream code that still expects `input`. (40% fail)
- **Passing `ignore_columns` to `Trainer`**: `Trainer` has no `ignore_columns` argument. Dropping columns with `dataset.remove_columns()` is the correct approach, but users often drop the wrong ones or forget to add the missing columns. (50% fail)
- **Changing the model config to match the dataset**: The model config does not control the dataset schema. This is a data preprocessing issue, not a model architecture issue. (70% fail)
