...

Kafka Connect and Flow of data

...

Kafka Connect Worker

...

Standalone - All work is performed in a single process.

Distributed - Work is shared and balanced across multiple nodes.

Kafka Connect Standalone

...

Info

In standalone mode, connectors are configured with a CONFIGURATION FILE and NOT through the REST API.

Offsets are managed in a simple file stored on the same machine where the worker process runs.
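As a rough illustration of the file-based setup described above, the sketch below renders the two kinds of properties files a standalone worker typically reads: one for the worker and one for a connector. The property keys are standard Kafka Connect settings, but every value (broker address, file paths, connector name, topic) is a placeholder assumed for this example, and `render_properties` is a hypothetical helper, not part of Kafka.

```python
def render_properties(settings):
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Worker settings. All values are placeholders assumed for this sketch.
worker_settings = {
    "bootstrap.servers": "localhost:9092",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    # Standalone mode stores offsets in this local file.
    "offset.storage.file.filename": "/tmp/connect.offsets",
}

# Connector settings: in standalone mode these also live in a file.
connector_settings = {
    "name": "demo-file-source",
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/demo-input.txt",
    "topic": "demo-topic",
}

worker_properties = render_properties(worker_settings)
connector_properties = render_properties(connector_settings)

print(worker_properties)
print(connector_properties)
```

Saved as, say, worker.properties and connector.properties, the two files would typically be passed to the `connect-standalone.sh` script shipped with Kafka, worker file first.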

Kafka Connect Distributed

...

Kafka Connect Connectors

...

Kafka Connect Tasks

...

Each Kafka Connect task runs on its own thread, so there is a one-to-one relationship between tasks and worker threads.
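The one-thread-per-task relationship can be pictured with a small sketch. This is not Kafka Connect code: `run_task` and `start_tasks` are hypothetical stand-ins, and the task body only records which thread executed it.

```python
import threading


def run_task(task_id, results):
    # Stand-in for a task's work loop: record which thread ran this task.
    results[task_id] = threading.current_thread().name


def start_tasks(task_count):
    """Start one dedicated thread per task and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(
            target=run_task,
            args=(task_id, results),
            name=f"task-thread-{task_id}",
        )
        for task_id in range(task_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


thread_by_task = start_tasks(3)
print(thread_by_task)
```

With three tasks, three distinct thread names are recorded, one per task.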

From Kafka Connect to Apache Kafka

...

If we increase the number of tasks to 4, task 4 is placed on Worker 1 and partition-4 on Broker 1. This distribution happens automatically; we do not have to intervene at all.

Work is distributed according to the strategy in use (round robin, etc.). If one of the workers goes down, all tasks it was running are automatically redistributed to the other workers (rebalancing).

Simply bringing the worker back online triggers another rebalance, and the tasks previously assigned to the failed worker are restored.
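The distribution and rebalancing behaviour described above can be sketched with a naive round-robin assignment. This is a simplified model, not Kafka Connect's actual assignor: it assumes three workers, recomputes the whole assignment on every membership change, and `assign_round_robin` is a hypothetical helper.

```python
def assign_round_robin(tasks, workers):
    """Hand out tasks to workers in turn, wrapping around at the end."""
    assignment = {worker: [] for worker in workers}
    for index, task in enumerate(tasks):
        worker = workers[index % len(workers)]
        assignment[worker].append(task)
    return assignment


tasks = ["task-1", "task-2", "task-3", "task-4"]
workers = ["worker-1", "worker-2", "worker-3"]  # assumed cluster size

# Four tasks over three workers: task-4 wraps around to worker-1.
initial = assign_round_robin(tasks, workers)

# worker-2 goes down: its task is spread over the surviving workers.
survivors = [worker for worker in workers if worker != "worker-2"]
after_failure = assign_round_robin(tasks, survivors)

# worker-2 comes back: another rebalance restores the original layout.
after_recovery = assign_round_robin(tasks, workers)

print(initial)
print(after_failure)
print(after_recovery)
```

In this model every task stays assigned to some live worker after each rebalance, and the layout after recovery matches the initial one.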

...