Avro
Posted: February 01, 2025 - Updated: February 03, 2025
What’s that?
Language-agnostic data serialisation system.
Binary format for rich data structures.
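To make "binary format" concrete: according to the Avro specification, int and long values are written with variable-length zig-zag encoding, so numbers of small magnitude (positive or negative) take a single byte. The sketch below illustrates that rule using only the JDK. The class and method names are hypothetical; in practice the Avro library does this internally (e.g. in its BinaryEncoder/BinaryDecoder).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper illustrating Avro's zig-zag varint rule for int/long (JDK only).
final class AvroVarint {

    // Zig-zag: interleave signed values so small magnitudes stay small.
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Inverse of zigZag.
    static long unZigZag(long v) {
        return (v >>> 1) ^ -(v & 1);
    }

    // Emit 7 bits per byte, least significant group first;
    // the high bit of each byte says whether another byte follows.
    static byte[] encodeLong(long n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long v = zigZag(n);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
        return out.toByteArray();
    }

    // Read bytes until one without the continuation bit, then undo zig-zag.
    static long decodeLong(byte[] bytes) {
        long v = 0;
        int shift = 0;
        for (byte b : bytes) {
            v |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return unZigZag(v);
    }

    public static void main(String[] args) {
        for (long n : new long[] {0, -1, 1, -64, 64, 300}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encodeLong(n)) {
                hex.append(String.format("%02x ", b & 0xFF));
            }
            System.out.println(n + " -> " + hex.toString().trim());
        }
    }
}
```

The values 0, -1, 1 and -64 each fit in one byte (00, 01, 02, 7f), and 64 is the first value that needs two (80 01). These match the examples in the Avro specification.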
Note: the playground for this note can be found here
What is a schema
A schema is the format definition for the binary data.
The schema is stored in the header of every Avro data file, so each file can be read on its own.
For message exchange (e.g. RPC), the schema can be sent once during the handshake. After that, only compact binary data is transferred.
Schemas are usually written as JSON. Related file types:
- *.avsc - Avro schema JSON file. Used for schema definitions.
- *.avpr - Avro protocol JSON file. Used for RPC protocol definitions.
- *.avro - Avro data file. Used for data storage.
- *.avdl - Avro IDL file. A more readable, non-JSON format for both schema and protocol definitions.
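The data itself carries no field names, so a reader cannot interpret the bytes without the schema. Below is a minimal sketch that assumes a tiny record with one string field and one int field. It hand-encodes that record following the spec's rules: fields in declaration order, strings as a length followed by UTF-8 bytes, and ints as zig-zag varints. The class, record and schema names are hypothetical and only two primitive types are handled. Real code would use the Avro library, and the sketch assumes Java 17.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, JDK-only illustration of how a record is laid out in Avro binary.
// It mirrors this schema (e.g. User.avsc):
// {"type": "record", "name": "User",
//  "fields": [{"name": "name", "type": "string"},
//             {"name": "age",  "type": "int"}]}
final class RecordLayout {

    // Field name + primitive type, in the order declared by the schema.
    record Field(String name, String type) {}

    static final List<Field> USER_SCHEMA = List.of(
            new Field("name", "string"),
            new Field("age", "int"));

    // Zig-zag varint, used for Avro int/long values and for length prefixes.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = (n << 1) ^ (n >> 63);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Fields are written back to back in schema order; no names or tags are stored.
    static byte[] encode(List<Field> schema, Map<String, Object> value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Field f : schema) {
            Object v = value.get(f.name());
            switch (f.type()) {
                case "int", "long" -> writeLong(out, ((Number) v).longValue());
                case "string" -> {
                    byte[] utf8 = ((String) v).getBytes(StandardCharsets.UTF_8);
                    writeLong(out, utf8.length); // length prefix
                    out.write(utf8, 0, utf8.length);
                }
                default -> throw new IllegalArgumentException("unsupported type: " + f.type());
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "Ann");
        user.put("age", 30);
        byte[] bytes = encode(USER_SCHEMA, user);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b & 0xFF));
        }
        System.out.println(bytes.length + " bytes: " + hex.toString().trim());
    }
}
```

For name = "Ann" and age = 30 this yields 5 bytes (06 41 6e 6e 3c). That is the 3 bytes of text plus one byte each for the string length and the age. The equivalent JSON text {"name":"Ann","age":30} takes 23 bytes.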
Usage
One of Avro's advantages is that it is language-agnostic. Typical uses:
- Data transfer format
- Data storage format
- RPC engine (example with Netty: org.apache.avro:avro-ipc-netty:+)
- Message format for Kafka
- Input/output format for Hadoop MapReduce
Schema evolution
Avro lets you evolve a schema without breaking compatibility. Examples include adding or removing a field (with a default value), promoting a field's type (e.g. int to long, float or double), renaming a field via aliases, and changing a default value.
Note: evolution is not the same as versioning. In short, a new schema must stay compatible with both older and newer schemas. If it is compatible in only one direction, consumers and producers must be upgraded in the correct order. The first piece of advice is to define default values. When a field is added, a reader using the new schema takes the default from its own schema when reading older records. When a field is removed, readers still on the old schema need that field's default to read newer records, which no longer contain it.
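The default-value rules come from how Avro resolves the writer's schema against the reader's schema. Fields are matched by name. A reader field missing from the data takes the reader's default, and a reader field with neither data nor a default is an error. Fields that exist only in the writer's data are ignored. The sketch below models just those rules on already-decoded records, represented as plain Maps. The class and field names are hypothetical, and type promotion, aliases and null defaults are left out (Java 17 assumed).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of Avro's record resolution rules (fields matched by name only).
final class FieldResolution {

    // A reader-side field: its name and an optional (non-null) default from the reader schema.
    record ReaderField(String name, Optional<Object> defaultValue) {}

    // Reshape a record decoded with the writer's schema according to the reader's schema.
    static Map<String, Object> resolve(Map<String, Object> written, List<ReaderField> readerSchema) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField f : readerSchema) {
            if (written.containsKey(f.name())) {
                // Present in both schemas: keep the written value.
                result.put(f.name(), written.get(f.name()));
            } else if (f.defaultValue().isPresent()) {
                // Missing from the data: fall back to the reader's default.
                result.put(f.name(), f.defaultValue().get());
            } else {
                throw new IllegalStateException("no value and no default for field " + f.name());
            }
        }
        // Writer-only fields are never copied, i.e. they are ignored.
        return result;
    }

    public static void main(String[] args) {
        // Old record: written before "email" existed and while "nickname" was still present.
        Map<String, Object> oldRecord = new HashMap<>();
        oldRecord.put("name", "Ann");
        oldRecord.put("nickname", "annie");

        // New reader schema: "nickname" removed, "email" added with a default.
        List<ReaderField> newSchema = List.of(
                new ReaderField("name", Optional.empty()),
                new ReaderField("email", Optional.of("unknown")));

        System.out.println(resolve(oldRecord, newSchema));
    }
}
```

Dropping the default for "email" makes resolve throw for the same old record. This is the failure that defining defaults avoids.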
Schema registry
- Spring Cloud Schema Registry (part of Spring Cloud Stream)
- Confluent Schema Registry (https://docs.confluent.io/platform/current/schema-registry/index.html)
- Azure Schema Registry (https://docs.microsoft.com/en-us/azure/event-hubs/schema-registry-overview), which also provides a client library for Java
- IBM Event Streams (https://cloud.ibm.com/docs/EventStreams?topic=EventStreams-ES_schema_registry)
- AWS Glue Schema Registry (https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html)
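A registry is where "send the schema once" pays off for Kafka. The producer registers the schema once and then prefixes each message with a small ID instead of the full schema. The sketch below shows the framing used by Confluent Schema Registry: magic byte 0, then a 4-byte big-endian schema ID, then the Avro binary data. The other registries in this list use their own framing. In practice Confluent's serializers (e.g. KafkaAvroSerializer) handle this for you. The class and method names here are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for the Confluent Schema Registry wire format:
// [magic byte 0][4-byte big-endian schema ID][Avro binary payload]
final class ConfluentFraming {

    static final byte MAGIC_BYTE = 0;

    // Prefix an already-encoded Avro payload with the magic byte and schema ID.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(1 + 4 + avroPayload.length)
                .put(MAGIC_BYTE)
                .putInt(schemaId) // ByteBuffer writes big-endian by default
                .put(avroPayload)
                .array();
    }

    // Read the schema ID so the consumer can fetch (and cache) the writer schema.
    static int schemaId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("unknown magic byte");
        }
        return buf.getInt();
    }

    // Everything after the 5-byte header is the Avro-encoded record.
    static byte[] payload(byte[] message) {
        return Arrays.copyOfRange(message, 5, message.length);
    }

    public static void main(String[] args) {
        byte[] avro = {0x06, 0x41, 0x6e, 0x6e, 0x3c}; // any Avro-encoded record bytes
        byte[] message = frame(42, avro);
        System.out.println("schema id " + schemaId(message) + ", payload " + payload(message).length + " bytes");
    }
}
```

The per-message overhead is a fixed 5 bytes, regardless of how large the schema is.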
Security
Alternatives
- Binary:
- Text:
  - JSON
  - YAML
  - XML (JAXB, SOAP, XStream)
More topics to discover
- Avro usage in: Kafka, Hadoop, RabbitMQ
- Avro IDL cookbook (it is hard to find one, or even a full list of Avro IDL features; for example, the @javaAnnotation annotation is supported but missing from most tutorials)
Resources
More examples