Data Processing using Transforms and Converters

Reminder

SMT and Converters

Stream vs. Message Processing

Stream Processing

has access to multiple data streams

The factory requires one or more inputs to perform its job, for example:

  • joining two data streams to form a new one.

  • performing per-message processing.

  • aggregating all the values from a data stream into a single value.

  • establishing windowed periods within which all operations are performed.
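
The join, aggregate, and window operations listed above can be sketched in plain Java with no Kafka dependency. This is only an illustration under assumptions: the Reading type, the StreamOpsSketch class, and its method names are hypothetical, and in-memory lists stand in for data streams. A real pipeline would rely on a stream-processing library rather than code like this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical event type: a keyed value with an event timestamp in milliseconds.
final class Reading {
    final String key;
    final long timestampMs;
    final int value;

    Reading(String key, long timestampMs, int value) {
        this.key = key;
        this.timestampMs = timestampMs;
        this.value = value;
    }
}

// In-memory lists stand in for data streams; all names are illustrative.
final class StreamOpsSketch {

    // Join: pair values from two streams that share the same key.
    static List<String> join(List<Reading> left, List<Reading> right) {
        Map<String, Integer> rightByKey = new HashMap<>();
        for (Reading r : right) {
            rightByKey.put(r.key, r.value); // last value per key wins
        }
        List<String> joined = new ArrayList<>();
        for (Reading l : left) {
            Integer match = rightByKey.get(l.key);
            if (match != null) {
                joined.add(l.key + ":" + l.value + "+" + match);
            }
        }
        return joined;
    }

    // Aggregate: fold every value of one stream into a single result.
    static int sum(List<Reading> stream) {
        int total = 0;
        for (Reading r : stream) {
            total += r.value;
        }
        return total;
    }

    // Window: sum values per fixed-size (tumbling) time window,
    // keyed by the start timestamp of each window.
    static Map<Long, Integer> sumPerWindow(List<Reading> stream, long windowMs) {
        Map<Long, Integer> sums = new TreeMap<>();
        for (Reading r : stream) {
            long windowStart = (r.timestampMs / windowMs) * windowMs;
            sums.merge(windowStart, r.value, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Reading> a = List.of(
                new Reading("k1", 0, 1),
                new Reading("k2", 500, 2),
                new Reading("k1", 1200, 3));
        List<Reading> b = List.of(new Reading("k1", 100, 10));
        System.out.println(join(a, b));
        System.out.println(sum(a));
        System.out.println(sumPerWindow(a, 1000));
    }
}
```

Each of these operations needs to see more than one message at a time (a second stream, the whole stream, or every message in a window), which is what separates stream processing from the message processing described next.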

Message Processing

has access to only one data stream

The factory can accept many messages, but each one is handled on its own: one input yields one output.

SMTs (Single Message Transforms) were created with this model in mind: a limited set of operations, each performed at the level of a single message.
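
The one-input, one-output model can be sketched as a function applied to each message independently. This is a plain-Java illustration, not Kafka Connect code: the MessageProcessingSketch class, the map-based message model, and the "source" field are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a message is modelled as a field-name -> value map.
final class MessageProcessingSketch {

    // A message-level transform sees exactly one message and returns one message.
    // This one returns a copy with an extra field (the field name is illustrative).
    static final UnaryOperator<Map<String, Object>> ADD_SOURCE = message -> {
        Map<String, Object> copy = new LinkedHashMap<>(message);
        copy.put("source", "sensor-db");
        return copy;
    };

    // Apply the transform to each message on its own: one input, one output.
    // Nothing is carried over from one message to the next.
    static List<Map<String, Object>> process(List<Map<String, Object>> stream,
                                             UnaryOperator<Map<String, Object>> transform) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> message : stream) {
            out.add(transform.apply(message));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> in = List.of(
                Map.<String, Object>of("id", 1),
                Map.<String, Object>of("id", 2));
        System.out.println(process(in, ADD_SOURCE));
    }
}
```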

Message Processing

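An SMT cannot be used for stream processing: it is applied to one message at a time and is not designed to join, aggregate, or window across messages.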

SMT (Single Message Transform): Implementing Transformation (generic interface of the Kafka Connect library)

An SMT has to implement the methods of the “Transformation” interface (it is an interface, not a class): apply, config, configure and close.
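
The lifecycle of such a transform can be sketched without the Kafka Connect dependency. Everything named here is a stand-in: TransformationLike only mirrors the shape of the real interface (which is generic over ConnectRecord and also declares config(), returning a Kafka ConfigDef), and RedactFieldSketch with its "field" setting is a hypothetical example, not a transform shipped with Kafka.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the shape of Kafka Connect's Transformation interface.
// NOT the real API: the real interface works on ConnectRecord subtypes
// and additionally declares config().
interface TransformationLike<R> {
    void configure(Map<String, ?> configs); // read settings once, at start-up
    R apply(R record);                      // called for every single message
    void close();                           // release resources at shutdown
}

// Hypothetical SMT-style transform: blanks out one configured field.
final class RedactFieldSketch implements TransformationLike<Map<String, Object>> {
    private String fieldName;

    @Override
    public void configure(Map<String, ?> configs) {
        // "field" is an illustrative configuration key.
        Object value = configs.get("field");
        if (value == null) {
            throw new IllegalArgumentException("missing required setting: field");
        }
        fieldName = value.toString();
    }

    @Override
    public Map<String, Object> apply(Map<String, Object> record) {
        if (!record.containsKey(fieldName)) {
            return record; // nothing to redact, pass the message through
        }
        Map<String, Object> copy = new LinkedHashMap<>(record);
        copy.put(fieldName, "****");
        return copy;
    }

    @Override
    public void close() {
        // No resources to release in this sketch.
    }

    public static void main(String[] args) {
        RedactFieldSketch smt = new RedactFieldSketch();
        smt.configure(Map.of("field", "password"));
        System.out.println(smt.apply(Map.<String, Object>of("user", "ada", "password", "secret")));
        smt.close();
    }
}
```

The point of the shape is that apply receives one record and returns one record, so the transform stays within message-level processing.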

Converters

The Kafka broker stores every message as a byte[] (binary) and performs no serialization or deserialization itself. Producers and consumers therefore have to serialize and deserialize the data on their side; in Kafka Connect, this is the job of CONVERTERS.
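
The role of a converter can be sketched as a pair of functions around the byte[] that the broker stores. This is a simplified stand-in under assumptions: ConverterLike and Utf8StringConverterSketch are hypothetical names, and the real Kafka Connect Converter interface also receives the topic name and a schema.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Stand-in for the role a converter plays: value <-> byte[].
// NOT the real Kafka Connect Converter API.
interface ConverterLike<T> {
    byte[] toBytes(T value);   // serialize before the data is sent to the broker
    T fromBytes(byte[] bytes); // deserialize after the data is read from the broker
}

// Hypothetical converter for plain strings, encoded as UTF-8.
final class Utf8StringConverterSketch implements ConverterLike<String> {

    @Override
    public byte[] toBytes(String value) {
        return value == null ? null : value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String fromBytes(byte[] bytes) {
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Utf8StringConverterSketch converter = new Utf8StringConverterSketch();
        byte[] stored = converter.toBytes("hello"); // what the broker would hold
        System.out.println(Arrays.toString(stored));
        System.out.println(converter.fromBytes(stored));
    }
}
```

Both sides must agree on the format: bytes written with one converter are only meaningful when read back with a compatible one.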