AvroTypeException data schema_error ai_generated true

Avro deserialization fails with 'Unexpected type for field' when reading old records with new schema

ID: data/avro-schema-field-type-mismatch

Fix rate: 80%
Confidence: 87%
Evidence count: 1
First seen: 2024-01-05

Version Compatibility

Version | Status | Introduced | Deprecated | Notes
Apache Avro 1.11+ | active | - | - | -
Confluent Schema Registry 7.5+ | active | - | - | -
Kafka Avro Deserializer 7.6+ | active | - | - | -

Root Cause Analysis

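An Avro schema evolution change modified a field's type to one that old data cannot be resolved against (for example, from 'long' to 'int' or from 'string' to 'int') without preserving backward compatibility, so old records fail to deserialize. Widening changes such as 'int' to 'long' or 'string' to 'bytes' are permitted promotions under the Avro specification's schema-resolution rules and should not, by themselves, cause this failure.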
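The promotion rules in the "Schema Resolution" section of the Avro specification decide which type changes old data survives. The sketch below models them for primitive types only; `PROMOTIONS`, `primitive_matches`, and `can_read` are hypothetical names for illustration, not part of any Avro library, and named or complex types (records, enums, arrays, maps, fixed) are deliberately out of scope. It also covers the reader-side union case from the solutions: when the reader's type is a union and the writer's is not, a branch matching the writer's type is selected, and resolution fails if none matches.

```python
from __future__ import annotations

# Writer type -> reader types it may be promoted to, following the
# "Schema Resolution" rules of the Avro specification (primitives only).
PROMOTIONS: dict[str, set[str]] = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def primitive_matches(writer: str, reader: str) -> bool:
    """True if a value written as `writer` can be read as `reader`."""
    return writer == reader or reader in PROMOTIONS.get(writer, set())


def can_read(writer: str, reader: str | list[str]) -> bool:
    """Check whether old data of primitive type `writer` resolves against `reader`.

    `reader` is a primitive type name, or a list of names for a union.
    """
    if isinstance(reader, list):
        # Reader union, non-union writer: some branch must match the writer's type.
        return any(primitive_matches(writer, branch) for branch in reader)
    return primitive_matches(writer, reader)
```

For instance, `can_read("long", "int")` is `False` (a narrowing change), while `can_read("int", "long")` and `can_read("int", ["int", "long"])` are both `True`.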

generic

Official Documentation

https://avro.apache.org/docs/1.11.1/spec.html#Schema+Evolution

Solutions

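  1. Create a new schema version that uses a union type to accept both the old and the new type: change `"type": "long"` to `"type": ["int", "long"]`. Register it as a new version and update readers to use it. Application code must then handle values of either type.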
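  2. Use a custom Avro datum reader to override the type-resolution logic. For Java: extend `SpecificDatumReader` and override the protected read methods it inherits from `GenericDatumReader` (for example `readField`) so the affected field's old type is handled.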
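  3. Reprocess the old records through a converter: read them with the old (writer) schema, convert the affected field, and write them with the new schema. One way to do this with Apache Avro tools is to dump the old data to JSON with `java -jar avro-tools-1.11.1.jar tojson old_data.avro > old_data.json`, convert the field's values in `old_data.json` to produce `new_data.json`, and then re-encode with `java -jar avro-tools-1.11.1.jar fromjson --schema-file new_schema.avsc new_data.json > new_data.avro`.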
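The conversion step in solution 3 can be a small script that sits between dumping the old data to JSON and re-encoding it with the new schema. The sketch below assumes the JSON dump holds one record per line (the default `tojson` layout without `--pretty`) and that the changed field is a top-level, non-union primitive; a union-typed field appears in Avro's JSON encoding wrapped as `{"type": value}` and would need extra handling. The function names, the field name `amount`, and the string-to-int conversion are hypothetical examples.

```python
import json
from typing import Any, Callable, Iterable, Iterator


def convert_field(record: dict, field: str, convert: Callable[[Any], Any]) -> dict:
    """Return a copy of `record` with `field` converted to its new type."""
    updated = dict(record)  # leave the input record untouched
    if field in updated:
        updated[field] = convert(updated[field])
    return updated


def convert_lines(
    lines: Iterable[str], field: str, convert: Callable[[Any], Any]
) -> Iterator[str]:
    """Convert a stream of JSON records, one per line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield json.dumps(convert_field(record, field, convert))
```

For example, `convert_lines(['{"id": 1, "amount": "42"}'], "amount", int)` yields `'{"id": 1, "amount": 42}'`; writing each output line to a file gives input that `fromjson` can re-encode with the new schema. Converting in a separate pass keeps the old data readable until the new file has been verified.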

Failed Attempts

Common but ineffective approaches:

  1. 95% failure rate

    This is a destructive operation that causes data loss and is often not feasible in production.

  2. 85% failure rate

    This does not fix the existing deserialization failure; old records still have the old type and cannot be read with the new schema.

  3. 70% failure rate

    This allows incompatible changes but does not help deserialize existing data; the error occurs at read time, not registration time.