# Parquet dictionary page truncated — unexpected end of stream

- **ID:** `data/parquet-dictionary-page-truncated`
- **Domain:** data
- **Category:** data_error
- **Error Code:** `ParquetDecodingException`
- **Verification:** ai_generated
- **Fix Rate:** 80%

## Root Cause

The dictionary page of a column chunk in the Parquet file was not fully written, typically because of an interrupted write or a partial upload. The reader then hits end-of-stream (EOF) before it has read the whole page.
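
A quick way to tell a tail-truncated upload apart from damage inside the file is to check the Parquet envelope. A valid file starts with the 4-byte magic `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1` again. The sketch below uses only the Python standard library, and the function name `check_parquet_envelope` is hypothetical. If the trailing magic is missing, the end of the file (including the footer) was cut off, and readers typically report a missing-magic or not-a-Parquet-file error rather than a dictionary page error. If the envelope passes, the truncation is inside a column chunk, and only decoding the pages will find it.

```python
import os
import struct
from typing import List

MAGIC = b"PAR1"


def check_parquet_envelope(path: str) -> List[str]:
    """Return problems found in the magic bytes and footer length; an empty list means the envelope looks intact."""
    size = os.path.getsize(path)
    if size < 12:  # 4-byte leading magic + 4-byte footer length + 4-byte trailing magic
        return [f"file is only {size} bytes; a Parquet file needs at least 12"]
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)  # last 8 bytes: footer length + trailing magic
        tail = f.read(8)
    problems = []
    if head != MAGIC:
        problems.append("leading PAR1 magic is missing")
    if tail[4:] != MAGIC:
        problems.append("trailing PAR1 magic is missing; the end of the file was probably cut off")
    else:
        footer_len = struct.unpack("<I", tail[:4])[0]  # little-endian length of the file metadata
        if footer_len > size - 12:
            problems.append(f"footer length {footer_len} exceeds the {size - 12} bytes available")
    return problems
```

This only inspects the first 4 and last 8 bytes, so it is cheap enough to run on every file before handing it to a reader.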

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| parquet-mr 1.12.0 | active | — | — |
| pyarrow 14.0.0 | active | — | — |
| spark 3.4.0 | active | — | — |

## Workarounds

1. **Verify file integrity with parquet-tools. `parquet-tools meta corrupted.parquet` only parses the footer, so it can succeed on a file with a broken dictionary page; also run `parquet-tools cat corrupted.parquet`, which decodes every page. If either command fails, re-upload the file from a known good source.** (70% success)
   ```
   parquet-tools meta corrupted.parquet
   parquet-tools cat corrupted.parquet > /dev/null
   ```
2. **Truncate the file to the last valid row group using pyarrow: open it with `pq.ParquetFile('corrupted.parquet')`, read the row groups in order with `read_row_group(i)`, stop at the first one that raises, and write the row groups read so far to `repaired.parquet` with `pq.write_table`.** (80% success)

   A plain `pq.read_table('corrupted.parquet', use_pandas_metadata=False)` decodes every row group, so it fails on the same broken dictionary page instead of skipping it. Salvaging needs an intact footer, and the rows in the failing row group and in all later ones are lost.
3. **If using Spark, set `spark.sql.parquet.enableVectorizedReader=false` to fall back to the non-vectorized reader, which may handle some partial files that the vectorized reader rejects.** (50% success)
   ```
   spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
   ```
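
The row-group salvage in workaround 2 can be sketched as follows, assuming pyarrow is installed and the footer is still readable. The names `readable_prefix` and `salvage_parquet` are hypothetical; the pyarrow calls (`ParquetFile`, `num_row_groups`, `read_row_group`, `concat_tables`, `write_table`) are the library's own. The stop-at-first-failure policy is kept separate from the I/O so it can be exercised without a Parquet file.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def readable_prefix(
    read_one: Callable[[int], T], count: int
) -> Tuple[List[T], Optional[int], Optional[Exception]]:
    """Call read_one(0), read_one(1), ... and stop at the first exception.

    Returns (items read before the failure, index that failed or None, the exception or None).
    """
    items: List[T] = []
    for i in range(count):
        try:
            items.append(read_one(i))
        except Exception as exc:  # broad on purpose: decoders raise OSError, ValueError, etc.
            return items, i, exc
    return items, None, None


def salvage_parquet(src: str, dst: str) -> int:
    """Write the row groups before the first undecodable one to dst; return how many were kept."""
    # Imported here so readable_prefix stays usable without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    pf = pq.ParquetFile(src)  # raises if the footer itself is missing or unreadable
    tables, failed_at, err = readable_prefix(pf.read_row_group, pf.num_row_groups)
    if failed_at is not None:
        print(f"row group {failed_at} of {pf.num_row_groups} failed: {err}")
    if not tables:
        raise RuntimeError(f"no readable row groups in {src}; nothing to salvage")
    pq.write_table(pa.concat_tables(tables), dst)
    return len(tables)
```

`salvage_parquet('corrupted.parquet', 'repaired.parquet')` returns the number of row groups kept. If later row groups might still be intact, a variant that skips only the failing ones (continuing the loop instead of returning) recovers more rows, at the cost of a gap in the middle of the data.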

## Dead Ends

- **Re-download the file from the same source without verifying checksum** — If the copy at the origin is itself truncated, re-downloading returns the same bytes; without a checksum you cannot tell a bad transfer from a bad source. (60% fail)
- **Increase memory allocation for the reader (e.g., spark.executor.memory)** — The error is about truncated data, not memory limits; more memory doesn't reconstruct missing bytes. (90% fail)
- **Use a different Parquet reader library (e.g., fastparquet instead of pyarrow)** — Every reader fails on the same truncated dictionary page: the bytes are missing from the file, so switching decoders changes at most the error message. (95% fail)
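
The first dead end is avoided by checking the downloaded copy against a digest published by the producer. A minimal standard-library sketch, assuming the producer publishes a hex SHA-256 of the file; the function names are hypothetical. An object store's ETag is not always a content hash (for example, S3 ETags for multipart uploads are not the MD5 of the whole object), so it should not be used as the expected value without checking how the store computes it.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large Parquet files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring surrounding whitespace and letter case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

If a fresh download still fails `matches_expected`, the copy at the origin is bad and has to be regenerated by the producer rather than fetched again.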
