Kafka's Architecture and Kafka Connect

Kafka Connect Architecture - High Level

Kafka Connect and Flow of Data

Kafka Connect Worker

Standalone - All work is performed in a single process.

Distributed - Work is shared and balanced across multiple nodes.

Kafka Connect Standalone

In standalone mode, connectors are configured through a configuration file, not through the REST API.

Offsets are managed in a simple file stored on the same machine where the worker process runs.
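A minimal sketch of what the two standalone files might contain, rendered from Python dictionaries. The property keys are standard Kafka Connect settings. The values are assumptions for illustration only: the broker address, the file paths, the connector name, the topic, and the choice of the FileStreamSource example connector.

```python
# Worker settings for standalone mode.
# Keys are standard worker properties; all values are assumed examples.
WORKER_PROPS = {
    "bootstrap.servers": "localhost:9092",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    # Offsets are kept in a local file on the worker's machine.
    "offset.storage.file.filename": "/tmp/connect.offsets",
}

# Connector settings, supplied as a second file rather than through REST.
# The connector name, input file, and topic are hypothetical.
CONNECTOR_PROPS = {
    "name": "demo-file-source",
    "connector.class": "FileStreamSource",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "demo-topic",
}


def to_properties(settings):
    """Render a dict as Java-style key=value properties text."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


if __name__ == "__main__":
    print("# worker.properties")
    print(to_properties(WORKER_PROPS))
    print("# connector.properties")
    print(to_properties(CONNECTOR_PROPS))
```

Saved as two files, these would typically be passed to the standalone launcher, for example bin/connect-standalone.sh worker.properties connector.properties (the file names here are placeholders).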

Kafka Connect Distributed

Kafka Connect Connectors

Kafka Connect Tasks

Each Kafka Connect task runs on its own thread, so there is a one-to-one relationship between tasks and worker threads.
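The one-thread-per-task idea can be illustrated with a small Python analogy. This is not Kafka Connect's Java implementation; the function names and thread names are hypothetical and only show that N tasks end up on N distinct threads.

```python
import threading


def run_task(task_id, results, lock):
    """Stand-in for a task's work loop; records which thread ran it."""
    with lock:
        results[task_id] = threading.current_thread().name


def start_tasks(task_count):
    """Start one dedicated thread per task and wait for all of them."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(
            target=run_task,
            args=(task_id, results, lock),
            name=f"task-thread-{task_id}",
        )
        for task_id in range(task_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for task_id, thread_name in sorted(start_tasks(3).items()):
        print(f"task {task_id} ran on {thread_name}")
```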

From Kafka Connect to Apache Kafka

If we increase the number of tasks to 4, task 4 is placed on Worker 1 and partition-4 on Broker 1. This distribution happens automatically; we do not have to intervene at all.

Work is distributed according to the strategy in use (round robin, etc.). If one of the workers goes down, all tasks running on that worker are automatically redistributed to the other workers (rebalancing).

Bringing the worker back online triggers another rebalance, and the tasks previously assigned to the failed worker are restored.
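The distribution and rebalancing described above can be sketched as a toy round-robin assignment. This is a simplified model under stated assumptions: three workers, four tasks, and a full reassignment on every membership change. The worker and task names are hypothetical, and a real cluster may move fewer tasks during a rebalance.

```python
def assign_round_robin(tasks, workers):
    """Deal tasks out to workers in turn; returns {worker: [tasks]}."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment = {worker: [] for worker in workers}
    for index, task in enumerate(tasks):
        assignment[workers[index % len(workers)]].append(task)
    return assignment


# Assumed cluster: three workers and four tasks.
tasks = ["task-1", "task-2", "task-3", "task-4"]
workers = ["worker-1", "worker-2", "worker-3"]

# Initial distribution: the fourth task wraps around to worker-1.
initial = assign_round_robin(tasks, workers)

# worker-2 goes down: its tasks are spread over the surviving workers.
survivors = [worker for worker in workers if worker != "worker-2"]
after_failure = assign_round_robin(tasks, survivors)

# worker-2 comes back: another rebalance restores the original layout.
after_recovery = assign_round_robin(tasks, workers)

if __name__ == "__main__":
    for label, layout in [
        ("initial", initial),
        ("after failure", after_failure),
        ("after recovery", after_recovery),
    ]:
        print(label, layout)
```

In this model no task is ever left unassigned while at least one worker is alive, which is the property the automatic rebalancing is meant to provide.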