Kafka's Architecture and Kafka Connect

Kafka Connect Architecture - High Level

KC Architecture

Kafka Connect and Flow of data

KC Data Flow

Kafka Connect Worker

Worker Standalone Vs Distributed in cluster

Standalone - All work is performed in a single process.

Distributed - Work is shared and balanced across multiple nodes.

Kafka Connect Standalone

Worker in standalone mode

In standalone mode, connectors are configured through a configuration file passed to the worker at startup, not through the REST API.

Offsets are managed through a simple file stored on the same machine where the worker process runs.
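As a rough illustration of the two points above, the sketch below renders a worker file and a connector file as Java-style properties text. The property keys are standard Kafka Connect settings; the broker address, file paths, connector name and topic are assumptions made for the example, and the FileStreamSource connector is assumed to be available on the worker's plugin path.

```python
# Sketch only: all values below are illustrative, not taken from these notes.

def render_properties(settings: dict) -> str:
    """Render a dict as .properties text, one key=value pair per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"

# Worker settings for standalone mode.
worker_properties = {
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    # Standalone mode keeps source offsets in a local file on the worker machine.
    "offset.storage.file.filename": "/tmp/connect.offsets",  # assumed path
}

# Connector settings, supplied as a file instead of a REST call.
connector_properties = {
    "name": "local-file-source",  # hypothetical connector name
    "connector.class": "FileStreamSource",
    "tasks.max": 1,
    "file": "/tmp/input.txt",  # hypothetical input file
    "topic": "connect-test",  # hypothetical topic
}

worker_text = render_properties(worker_properties)
connector_text = render_properties(connector_properties)
print(worker_text)
print(connector_text)
```

Saved as two files, these would be passed to the standalone launcher, for example `bin/connect-standalone.sh worker.properties connector.properties`.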

Kafka Connect Distributed

Worker in distributed mode
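In contrast to standalone mode, a distributed worker normally receives connector configuration over its REST API (`POST /connectors`). The sketch below only builds such a request and does not send it. The worker address, connector name, file path and topic are assumptions for illustration.

```python
import json
import urllib.request

# Assumed worker address; 8083 is the default Kafka Connect REST port.
CONNECT_URL = "http://localhost:8083/connectors"


def build_create_request(name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a connector."""
    payload = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        CONNECT_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    "file-source-distributed",  # hypothetical connector name
    {
        "connector.class": "FileStreamSource",
        "tasks.max": "2",
        "file": "/tmp/input.txt",  # hypothetical input file
        "topic": "connect-test",  # hypothetical topic
    },
)
print(request.get_method(), request.full_url)

# Sending requires a running Connect worker, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```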

Kafka Connect Connectors

Connectors and Multiple threads with multiple instances of connector

Connectors: Source and Sink Connectors with Single Message Transforms (SMTs)

Kafka Connect Tasks

Tasks of Kafka Connect

Each Kafka Connect task runs on its own thread, so the number of running tasks and the number of task threads on the workers match one-to-one.
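The thread-per-task idea can be pictured with a small Python analogy. This is not Kafka Connect code (Connect itself runs on the JVM); the function and thread names are hypothetical, and each "task" only records the thread it ran on.

```python
import threading


def run_tasks(task_ids):
    """Start one thread per task and record which thread ran each task."""
    ran_on = {}
    lock = threading.Lock()

    def run(task_id):
        # Stand-in for a task's work loop; it only notes its own thread name.
        with lock:
            ran_on[task_id] = threading.current_thread().name

    threads = [
        threading.Thread(target=run, args=(task_id,), name=f"task-thread-{task_id}")
        for task_id in task_ids
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return ran_on


assignment = run_tasks([0, 1, 2])
print(assignment)
```

Three tasks end up on three distinct threads, which is the one-to-one relationship described above.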

From Kafka Connect to Apache Kafka

For Worker 1 (Task) to Broker 1 (Topic)

If we increase the number of tasks to 4, task 4 is placed on Worker 1 and partition-4 on Broker 1. This distribution happens automatically, with no manual intervention.

Work is distributed according to the strategy in use (round robin, etc.). If one of the workers goes down, all tasks that were running on it are automatically redistributed to the remaining workers (rebalancing).

Bringing the worker back online triggers another rebalance, and the tasks previously assigned to the failed worker are restored to it.

Automatic redistribution and rebalancing
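The redistribution described above can be modelled with a simple round-robin sketch. It assumes three workers and four tasks (names are hypothetical) and recomputes the whole assignment on every change. Kafka Connect's real assignment logic is more involved, so treat this only as a mental model.

```python
def assign_round_robin(tasks, workers):
    """Spread tasks over the given workers in round-robin order."""
    assignment = {worker: [] for worker in workers}
    for index, task in enumerate(tasks):
        assignment[workers[index % len(workers)]].append(task)
    return assignment


tasks = ["task-1", "task-2", "task-3", "task-4"]
workers = ["worker-1", "worker-2", "worker-3"]  # assumed cluster size

# Normal operation: the fourth task wraps around to worker-1.
before = assign_round_robin(tasks, workers)

# worker-2 goes down: its tasks are spread over the remaining workers.
alive = [worker for worker in workers if worker != "worker-2"]
after_failure = assign_round_robin(tasks, alive)

# worker-2 comes back: another rebalance restores the original layout.
after_recovery = assign_round_robin(tasks, workers)

for label, layout in [
    ("before", before),
    ("after failure", after_failure),
    ("after recovery", after_recovery),
]:
    print(label, layout)
```

In this model no task is lost when a worker fails: every task is still assigned to some live worker after each rebalance.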