Avro反序列化失败:使用新模式读取旧记录时出现'字段类型意外'
Avro deserialization fails with 'Unexpected type for field' when reading old records with new schema
ID: data/avro-schema-field-type-mismatch
版本兼容性
| 版本 | 状态 | 引入 | 弃用 | 备注 |
|---|---|---|---|---|
| Apache Avro 1.11+ | active | — | — | — |
| Confluent Schema Registry 7.5+ | active | — | — | — |
| Kafka Avro Deserializer 7.6+ | active | — | — | — |
根因分析
Avro模式演化更改修改了字段类型(例如从'int'改为'long'或从'string'改为'bytes'),没有适当的向后兼容性,导致旧记录反序列化失败。
English
An Avro schema evolution change modified a field's type (e.g., from 'int' to 'long' or from 'string' to 'bytes') without proper backward compatibility, causing deserialization of old records to fail.
官方文档
https://avro.apache.org/docs/1.11.1/spec.html#Schema+Evolution解决方案
-
创建一个使用联合类型以接受新旧两种类型的新模式版本:将`"type": "long"`改为`"type": ["int", "long"]`。将其注册为新版本并更新读取器使用它。
-
使用自定义Avro datum读取器覆盖类型解析逻辑。对于Java:扩展`SpecificDatumReader`并重写`resolveType`以处理旧类型。
-
通过转换器重新处理旧记录,使用旧模式的读取器和新模式的写入器将其转换为新模式。使用Apache Avro工具的示例:`java -jar avro-tools-1.11.1.jar fromjson --schema-file old_schema.avsc old_data.json | java -jar avro-tools-1.11.1.jar tojson --schema-file new_schema.avsc > new_data.json`
无效尝试
常见但无效的做法:
-
95% 失败
This is a destructive operation that causes data loss and is often not feasible in production.
-
85% 失败
This does not fix the existing deserialization failure; old records still have the old type and cannot be read with the new schema.
-
70% 失败
This allows incompatible changes but does not help deserialize existing data; the error occurs at read time, not registration time.