# Parquet bloom filter hash mismatch: unexpected hash algorithm ID 0

- **ID:** `data/parquet-bloom-filter-corruption`
- **Domain:** data
- **Category:** data_error
- **Error Code:** `ParquetBloomFilterHashMismatch`
- **Verification:** ai_generated
- **Fix Rate:** 78%

## Root Cause

A Parquet file written by an older version of a library declares a bloom filter hash algorithm that is unsupported or unregistered. Newer readers that strictly validate algorithm IDs fail when they read such a file.

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| Parquet 2.10+ | active | — | — |
| Apache Arrow 12.0+ | active | — | — |
| PyArrow 14.0+ | active | — | — |
| Spark 3.5+ | active | — | — |

## Workarounds

1. **Rewrite the Parquet file using a newer version of the writer library that registers algorithm ID 0 as a known algorithm.** (85% success)
   ```
   # PyArrow: read the file and write a fresh copy
   import pyarrow.parquet as pq

   table = pq.read_table('file.parquet')
   pq.write_table(table, 'file_fixed.parquet')
   ```
2. **Use an older reader that does not validate bloom filter algorithm IDs.** (70% success)
   ```
   For Spark: downgrade to Spark 3.4 or earlier, then read and rewrite the file.
   ```
3. **Locate the bloom filters with parquet-tools, then strip them from the Parquet file with a custom script.** (65% success)
   ```
   # Check whether the file metadata mentions bloom filters
   java -jar parquet-tools-1.12.3.jar meta file.parquet | grep bloom
   # Then use a custom script to remove the bloom filter metadata pages.
   ```

## Dead Ends

- **Updating the reader library** — The hash algorithm ID is an on-disk property of the Parquet file; updating the reader library does not change the file's content. (95% fail)
- **Regenerating the file with the original writer** — If the original writer is still the old library, the regenerated file will have the same unsupported algorithm ID. (90% fail)
- **Setting a flag that disables bloom filters** — This flag only disables bloom filter creation, not reading. The reader still attempts to parse existing bloom filters. (75% fail)
