Kafka Connect Architecture - High Level
Kafka Connect and Flow of data
Kafka Connect Worker
Standalone - All work is performed in a single process.
Distributed - Work is shared and balanced across multiple nodes.
Kafka Connect Standalone
Connectors are configured through a configuration file, not through the REST API.
Offsets are managed through a simple file stored on the same machine where the worker process runs.
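As a rough illustration of the two points above, the sketch below renders the kind of properties files a standalone worker is typically started with. The host, file paths, topic, and connector name are illustrative assumptions, not values from these notes. The FileStreamSourceConnector is used only because it ships with Apache Kafka as a demo connector.

```python
# Sketch: build the text of a worker file and a connector file for standalone mode.
# All concrete values (paths, names, host) are assumptions for illustration.

def to_properties(settings: dict[str, str]) -> str:
    """Serialize a dict as Java-style key=value properties text."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"

worker_settings = {
    "bootstrap.servers": "localhost:9092",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    # Standalone mode keeps offsets in a local file on the worker's machine.
    "offset.storage.file.filename": "/tmp/connect.offsets",
}

connector_settings = {
    "name": "demo-file-source",
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/demo-input.txt",
    "topic": "demo-topic",
}

worker_properties = to_properties(worker_settings)
connector_properties = to_properties(connector_settings)

print(worker_properties)
print(connector_properties)
```

With the two texts saved as files, a standalone worker is usually launched by passing both on the command line, for example bin/connect-standalone.sh worker.properties connector.properties (file names assumed here).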
Kafka Connect Distributed
Kafka Connect Connectors
Kafka Connect Tasks
Each Kafka Connect task runs on its own thread, so there is a one-to-one relationship between tasks and worker threads.
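The thread-per-task idea can be pictured with a toy model. This is not how Kafka Connect itself is implemented (it is a Java runtime). The function names and thread names below are hypothetical and exist only to show one thread being started for each task.

```python
import threading

def run_task(task_id: int, results: dict[int, str]) -> None:
    """Stand-in for a task's work: record which thread ran this task."""
    results[task_id] = threading.current_thread().name

def start_tasks(task_count: int) -> dict[int, str]:
    """Start one dedicated thread per task and wait for all of them."""
    results: dict[int, str] = {}
    threads = [
        threading.Thread(target=run_task, args=(task_id, results),
                         name=f"task-thread-{task_id}")
        for task_id in range(task_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

thread_by_task = start_tasks(3)
print(thread_by_task)
```

Each task id maps to a distinct thread name, which is the one-to-one relationship described above.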
From Kafka Connect to Apache Kafka
If we increase the number of tasks to 4, task 4 is placed on Worker 1 and partition-4 on Broker 1. This distribution happens automatically; we do not have to intervene at all.
Work is distributed according to the strategy in use (round robin, etc.). If one of the workers goes down, all tasks running on that worker are automatically redistributed to the other workers (rebalancing).
Simply bringing the worker back online triggers another rebalance, and the tasks previously assigned to the failed worker are restored.
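The distribution and rebalancing behaviour described above can be sketched as a small simulation. It assumes three workers and a plain round-robin strategy, and it recomputes the whole assignment on every membership change. That is a simplification: the real rebalance protocol is more involved, and the worker and task names here are hypothetical.

```python
def assign_round_robin(tasks: list[str], workers: list[str]) -> dict[str, list[str]]:
    """Deal tasks out to the live workers in turn."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: dict[str, list[str]] = {worker: [] for worker in workers}
    for index, task in enumerate(tasks):
        assignment[workers[index % len(workers)]].append(task)
    return assignment

tasks = ["task-1", "task-2", "task-3", "task-4"]
workers = ["worker-1", "worker-2", "worker-3"]

# Initial distribution: with 4 tasks and 3 workers, task-4 wraps around to worker-1.
initial = assign_round_robin(tasks, workers)

# worker-2 goes down: its tasks are redistributed to the remaining workers.
live_workers = [worker for worker in workers if worker != "worker-2"]
after_failure = assign_round_robin(tasks, live_workers)

# worker-2 comes back: another rebalance restores the original layout.
after_recovery = assign_round_robin(tasks, workers)

print(initial)
print(after_failure)
print(after_recovery)
```

In this model no task is lost when a worker fails, and the recovered worker receives the same tasks again because the round-robin assignment is deterministic.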