AvroTypeException data schema_error ai_generated true

Avro deserialization fails with 'Unexpected type for field' when reading old records with new schema

ID: data/avro-schema-field-type-mismatch

Fix rate: 80%
Confidence: 87%
Evidence: 1
First seen: 2024-01-05

Version Compatibility

Version                          Status
Apache Avro 1.11+                active
Confluent Schema Registry 7.5+   active
Kafka Avro Deserializer 7.6+     active

Root Cause

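An Avro schema evolution change modified a field's type in a way that Avro's schema-resolution rules cannot reconcile, so records written with the old (writer) schema fail to deserialize with the new (reader) schema. Changes outside the specification's promotion rules break old data, for example narrowing 'long' to 'int' or replacing 'string' with 'int'. Widening changes that the specification lists as promotions, such as 'int' to 'long' or 'string' to 'bytes', resolve as long as the deserializer has the writer schema available.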
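
To check which type changes old records survive, the promotion rules from the Avro specification can be modeled directly. The sketch below is a stdlib-only Python model of those rules for primitive types and unions, not the Avro library's own resolver; `PROMOTIONS`, `primitive_matches` and `can_resolve` are hypothetical names, and named types (records, enums, fixed), arrays and maps are deliberately left out.

```python
# Stdlib-only model of Avro's schema-resolution rules for primitive types
# and unions, following the promotion table in the Avro specification.
# Hypothetical helper for illustration; it is not part of any Avro library.

# Writer primitive -> reader primitives it may be promoted to (besides itself).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def primitive_matches(writer: str, reader: str) -> bool:
    """True if a value written as `writer` can be read as `reader`."""
    return writer == reader or reader in PROMOTIONS.get(writer, set())


def can_resolve(writer, reader) -> bool:
    """True if every value written with `writer` can be read with `reader`.

    A type is a primitive name (str) or a union given as a list of names.
    """
    writer_branches = writer if isinstance(writer, list) else [writer]
    reader_branches = reader if isinstance(reader, list) else [reader]
    # Each branch the writer may have used needs a matching reader branch;
    # otherwise records that used that branch fail at read time.
    return all(
        any(primitive_matches(w, r) for r in reader_branches)
        for w in writer_branches
    )


if __name__ == "__main__":
    changes = [
        ("int", "long"),
        ("long", "int"),
        ("string", "bytes"),
        ("string", "int"),
        ("int", ["int", "long"]),
    ]
    for old, new in changes:
        status = "ok" if can_resolve(old, new) else "FAILS"
        print(f"{old} -> {new}: {status}")
```

In this model the 'long' to 'int' and 'string' to 'int' changes are flagged as failing, while the promotions and the union reader resolve.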

generic

Official Documentation

https://avro.apache.org/docs/1.11.1/spec.html#Schema+Evolution

Workarounds

  1. (85% success) Create a new schema version that uses a union type to accept both the old and new types: change `"type": "long"` to `"type": ["int", "long"]`. Register this as a new version and update the reader to use it.
  2. (75% success) Use a custom Avro datum reader that handles the old type explicitly. In Java, subclass `SpecificDatumReader` and override its protected read hooks, such as `readField` (check the Javadoc for your Avro version), so that values written with the old type are converted before they are stored in the record.
  3. (80% success) Reprocess old records through a converter that reads them with the old schema, converts the changed field, and writes them with the new schema. The Apache Avro tools `fromjson` and `tojson` commands alone do not do this: `fromjson` encodes JSON records into an Avro data file using the given schema, and `tojson` dumps an Avro data file back to JSON, so neither rewrites field types. A short program built on the Avro library's datum reader and datum writer is the more dependable route.
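
The schema change in workaround 1 can be prepared mechanically. A minimal Python sketch, assuming the schema is available as parsed JSON: it rewrites one field's type into a union and checks that each type the field was historically written with lands on a union branch. `widen_field_to_union`, `union_branch_for` and the `Event`/`count` schema are hypothetical, and registering the result with Schema Registry is not shown.

```python
import copy
import json
from typing import Optional

# Writer primitive -> reader primitives it may be promoted to (Avro spec).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def widen_field_to_union(schema: dict, field_name: str, branches: list) -> dict:
    """Return a copy of a record schema with one field's type replaced by a union.

    Only the schema JSON is edited; registering the result as a new version
    (for example in Schema Registry) is a separate step.
    """
    new_schema = copy.deepcopy(schema)
    for field in new_schema["fields"]:
        if field["name"] == field_name:
            field["type"] = list(branches)
            return new_schema
    raise KeyError(f"field {field_name!r} not found")


def union_branch_for(writer_type: str, union: list) -> Optional[str]:
    """Return the union branch an old writer type resolves to, or None.

    Tries an exact match first, then the first branch the writer type can be
    promoted to. None means records written with that type would still fail.
    """
    if writer_type in union:
        return writer_type
    for branch in union:
        if branch in PROMOTIONS.get(writer_type, set()):
            return branch
    return None


if __name__ == "__main__":
    # Hypothetical schema whose "count" field is currently declared as long.
    old_schema = {
        "type": "record",
        "name": "Event",
        "fields": [
            {"name": "count", "type": "long"},
            {"name": "label", "type": "string"},
        ],
    }
    new_schema = widen_field_to_union(old_schema, "count", ["int", "long"])
    print(json.dumps(new_schema, indent=2))
    # Earlier versions wrote "count" as int or long; both need a branch.
    for writer_type in ("int", "long"):
        print(writer_type, "->", union_branch_for(writer_type, ["int", "long"]))
```

The Avro specification resolves a non-union writer against the first reader-union branch that matches it; the helper tries an exact match before promotions, which gives the same answer for `["int", "long"]`.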

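Whatever library does the decoding and encoding, the core of workaround 3 is a per-record transform from the old shape to the new one. A minimal stdlib-only Python sketch, assuming the old records were already decoded with the old schema into dicts (for example by an Avro library) and that the output is re-encoded with the new schema afterwards; `convert_records`, `check_value`, the `transforms` mapping and the `User` schema are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator

# Value ranges of Avro's 32-bit int and 64-bit long.
AVRO_INT_RANGE = (-(2**31), 2**31 - 1)
AVRO_LONG_RANGE = (-(2**63), 2**63 - 1)


def check_value(value, avro_type):
    """Raise ValueError if `value` cannot be encoded as the Avro primitive `avro_type`.

    Only int, long and string are checked in this sketch; other types pass through.
    """
    if avro_type in ("int", "long"):
        low, high = AVRO_INT_RANGE if avro_type == "int" else AVRO_LONG_RANGE
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{value!r} is not a valid Avro {avro_type}")
    elif avro_type == "string" and not isinstance(value, str):
        raise ValueError(f"{value!r} is not a valid Avro string")
    return value


def convert_records(
    records: Iterable[dict],
    new_schema: dict,
    transforms: Dict[str, Callable[[object], object]],
) -> Iterator[dict]:
    """Reshape records decoded with the old schema so they fit `new_schema`.

    `transforms` maps a field name to a function converting the old value;
    fields missing from an old record fall back to the new schema's default.
    """
    for record in records:
        converted = {}
        for field in new_schema["fields"]:
            name = field["name"]
            if name in record:
                value = record[name]
                if name in transforms:
                    value = transforms[name](value)
            elif "default" in field:
                value = field["default"]
            else:
                raise ValueError(f"record has no value or default for {name!r}")
            converted[name] = check_value(value, field["type"])
        yield converted


if __name__ == "__main__":
    # Hypothetical new schema: user_id changed from "string" to "long",
    # and a "tier" field with a default was added.
    new_schema = {
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "user_id", "type": "long"},
            {"name": "name", "type": "string"},
            {"name": "tier", "type": "string", "default": "free"},
        ],
    }
    old_records = [{"user_id": "42", "name": "ada"}, {"user_id": "7", "name": "lin"}]
    for rec in convert_records(old_records, new_schema, {"user_id": int}):
        print(rec)
```

Records whose old value cannot be converted raise `ValueError` instead of being written with a bad value, so they can be logged and handled separately rather than silently corrupting the reprocessed data.
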

Dead Ends

Common approaches that don't work:

  1. 95% fail

    This is a destructive operation that causes data loss and is often not feasible in production.

  2. 85% fail

    This does not fix the existing deserialization failure; old records still have the old type and cannot be read with the new schema.

  3. 70% fail

    This allows incompatible changes but does not help deserialize existing data; the error occurs at read time, not registration time.