Kafka Streams simplifies application development by building on the Kafka producer and consumer libraries and leveraging the native capabilities of Kafka to offer data parallelism, distributed coordination, fault tolerance, and operational simplicity. In this section, we describe how Kafka Streams works under the hood: https://kafka.apache.org/23/documentation/streams/architecture

  1. Kafka as a Messaging System

  2. Kafka Cluster


  3. Principles of distributed systems (multiple workers or nodes)

    In Kafka, the worker nodes are the brokers. A distributed system assigns different roles and responsibilities to its nodes, arranged in a hierarchy that starts with a controller (or supervisor). The controller is itself a worker node like any other.

    Once the controller is established and the workers are assigned and available, the cluster is formed and work can be distributed across it.


  4. Reliable work distribution
    The work that a cluster of brokers performs is receiving messages, categorizing them into topics, and reliably persisting them for eventual retrieval.
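
The three steps above (receive, categorize by topic, persist for later retrieval) can be illustrated with a small in-memory sketch. This is a toy model, not Kafka's implementation: `ToyBroker`, `publish`, and `read` are hypothetical names chosen for this example, and a Python list stands in for the durable on-disk log.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        # topic name -> list of messages, kept in arrival order
        self._logs = defaultdict(list)

    def publish(self, topic, message):
        """Receive a message, file it under its topic, and return its position."""
        log = self._logs[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset):
        """Retrieve every message stored in the topic from `offset` onward."""
        return list(self._logs.get(topic, [])[offset:])


broker = ToyBroker()
broker.publish("orders", "order-1")
broker.publish("payments", "payment-1")
last_offset = broker.publish("orders", "order-2")

orders_from_start = broker.read("orders", 0)
```

Messages are never removed when they are read, which is what makes "eventual retrieval" possible: a reader can ask for the same topic again later, from any position.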

  5. Distributed consensus with Apache Zookeeper

Info

Apache Zookeeper is software developed by Apache that acts as a centralized service for maintaining robust synchronization in distributed systems. It is used to manage and coordinate Kafka brokers: https://kafka.apache.org/documentation/#ecosystem


Scaling out will increase levels of reliability and availability.
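
To make the coordination role more concrete, here is a toy sketch of how a central service can ensure that exactly one broker acts as controller at a time. It is loosely modeled on the idea that the first broker to claim the controller slot wins and that another broker takes over when the controller disappears. `ToyCoordinator`, `register`, and `fail` are hypothetical names for this example only, and the rule that the longest-registered survivor takes over is a simplifying assumption, not Zookeeper's or Kafka's actual protocol.

```python
class ToyCoordinator:
    """Hypothetical stand-in for the coordination service: at most one controller."""

    def __init__(self):
        self.controller = None
        self.live_brokers = []

    def register(self, broker_id):
        """A broker joins; it becomes controller only if the slot is still empty."""
        self.live_brokers.append(broker_id)
        if self.controller is None:
            self.controller = broker_id
        return self.controller == broker_id

    def fail(self, broker_id):
        """A broker disappears; if it was the controller, a survivor takes over."""
        self.live_brokers.remove(broker_id)
        if self.controller == broker_id:
            # Assumption for this sketch: the longest-registered survivor wins.
            self.controller = self.live_brokers[0] if self.live_brokers else None


coordinator = ToyCoordinator()
first_is_controller = coordinator.register(1)
second_is_controller = coordinator.register(2)
coordinator.register(3)

coordinator.fail(1)
controller_after_failure = coordinator.controller
```

With three brokers registered, losing the controller still leaves a working cluster with a new controller, which is the sense in which scaling out improves availability.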

6. Kafka Versus Queues

Kafka

  • Very scalable.

  • Consumer has to track its position.

  • Order per partition is provided.

  • Each consumer group processes all messages from a topic. We can have multiple consumer groups processing the same topic.


Queue

  • Not very scalable.

  • Queue has to track unprocessed messages.

  • Some queues do not guarantee any order; others guarantee FIFO order.

  • Every message is processed by one consumer only.
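
The difference between the two delivery models can be sketched in a few lines. This is a simplified in-memory illustration, not real client code: `ToyGroup` and `ToyQueue` are hypothetical names, a plain list stands in for a single topic partition, and each group is modeled as a single reader.

```python
from collections import deque

messages = ["m1", "m2", "m3", "m4"]


class ToyGroup:
    """Kafka-style reader: the log is shared, the position belongs to the group."""

    def __init__(self, log):
        self.log = log
        self.position = 0  # tracked by the consumer side, not by the log

    def poll(self):
        if self.position >= len(self.log):
            return None
        message = self.log[self.position]
        self.position += 1
        return message


class ToyQueue:
    """Queue-style: the queue tracks what is left; a taken message is gone."""

    def __init__(self, items):
        self.pending = deque(items)

    def take(self):
        return self.pending.popleft() if self.pending else None


# Two consumer groups read the same log independently and both see everything.
log = list(messages)
billing, audit = ToyGroup(log), ToyGroup(log)
billing_seen = [billing.poll() for _ in messages]
audit_seen = [audit.poll() for _ in messages]

# Two workers share one queue; each message reaches exactly one of them.
queue = ToyQueue(messages)
worker_a, worker_b = [], []
for turn in range(len(messages)):
    (worker_a if turn % 2 == 0 else worker_b).append(queue.take())
```

In the log model, adding another group costs nothing on the storage side because reading does not consume the message. In the queue model, adding workers splits the messages between them instead.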
