# ParquetReader: Corrupt footer CRC — file may be truncated or overwritten

- **ID:** `data/parquet-corrupted-footer-crc`
- **Domain:** data
- **Category:** data_error
- **Verification:** ai_generated
- **Fix Rate:** 85%

## Root Cause

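The footer CRC check fails because the Parquet file was not fully written (e.g., a Spark task failed or the disk filled up mid-write) or was partially overwritten by another process. The footer is the last thing written to a Parquet file, so an interrupted write usually loses or damages it first.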
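
One way to tell truncation apart from other damage is to look at the file's first and last bytes. The sketch below assumes the standard unencrypted Parquet layout, where the file starts with the magic bytes `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1` again. The function name `check_parquet_trailer` and its messages are illustrative and not part of any library, and the check does not validate the footer contents.

```python
import os
import struct

MAGIC = b"PAR1"  # magic bytes at the start and end of an unencrypted Parquet file


def check_parquet_trailer(path):
    """Return (ok, message) based only on the file's leading and trailing bytes."""
    size = os.path.getsize(path)
    # Smallest possible file: leading magic + 4-byte footer length + trailing magic.
    if size < 12:
        return False, f"file is only {size} bytes, too small to be Parquet"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    if head != MAGIC:
        return False, "leading magic bytes missing; file was overwritten or is not Parquet"
    if tail[4:] != MAGIC:
        return False, "trailing magic bytes missing; file is likely truncated"
    (footer_len,) = struct.unpack("<I", tail[:4])
    # The footer must fit between the leading magic and the 8-byte trailer.
    if footer_len > size - 12:
        return False, f"footer length {footer_len} does not fit in a {size}-byte file"
    return True, f"trailer looks intact (footer length {footer_len} bytes)"
```

A missing trailing magic points to a truncated write, while a missing leading magic suggests the file was overwritten from the start.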

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| Apache Parquet 1.12.3 | active | — | — |
| Spark 3.4.1 | active | — | — |
| pyarrow 14.0.0 | active | — | — |
| Hive 4.0.0 | active | — | — |

## Workarounds

1. **Recreate the Parquet file from the original data using a reliable write path, for example `df.write.mode('overwrite').parquet('/path')` with checkpointing enabled** (95% success)
   ```
   # df is the DataFrame rebuilt from the original source data
   df.write.mode('overwrite').parquet('/path')
   ```
2. **Attempt a read with parquet-cli or pyarrow, for example `pq.read_table('/path', buffer_size=0, use_pandas_metadata=False)`. These options only disable read buffering and pandas-specific metadata handling; the footer is still parsed, so this helps only when the footer is intact and the damage lies elsewhere** (70% success)
   ```
   import pyarrow.parquet as pq
   table = pq.read_table('/path', buffer_size=0, use_pandas_metadata=False)
   ```
3. **Check the file size with `ls -l` and compare it to the expected size from source logs; if the file is truncated, copy it and pad the copy with `truncate -s <expected_size> repaired.parquet` (last resort: padding adds zero bytes only and does not restore lost data or a lost footer)** (50% success)
   ```
   ls -l truncated.parquet
   cp truncated.parquet repaired.parquet
   truncate -s <expected_size> repaired.parquet
   ```
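
The size comparison in the last workaround can be sketched in Python with only the standard library. The helper names `missing_bytes` and `pad_copy` are hypothetical, and the expected size is assumed to come from your own source logs. The padding is filler only (zero bytes on most platforms), so treat the padded copy as a diagnostic artifact rather than a repaired file.

```python
import os
import shutil


def missing_bytes(path, expected_size):
    """Return how many bytes short of expected_size the file is (0 if not truncated)."""
    return max(0, expected_size - os.path.getsize(path))


def pad_copy(src, dst, expected_size):
    """Copy src to dst and extend dst to expected_size; src is left untouched."""
    shutil.copyfile(src, dst)
    if os.path.getsize(dst) < expected_size:
        with open(dst, "r+b") as f:
            # Extending via truncate() zero-fills the new range on most platforms.
            f.truncate(expected_size)
    return os.path.getsize(dst)
```

Working on a copy keeps the truncated original available for later inspection or comparison.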

## Dead Ends

- **Re-downloading the file:** The source file itself is corrupted; re-downloading the same truncated file does not fix the underlying write failure. (90% fail)
- **Inspecting the file with `parquet-tools meta`:** `parquet-tools meta` also reads the footer and fails with the same CRC error, so it provides no workaround. (85% fail)
- **Disabling footer CRC validation:** Parquet readers (e.g., pyarrow, Spark) always validate the footer CRC; there is no standard option to bypass it. (100% fail)
