Understanding Brokers, Partitions and Topics

  1. Topics

    - A Kafka topic is a named feed or category of messages.
    - A topic is a logical entity.
    - For each topic, the Kafka cluster maintains one or more physical log files.

    Topic through brokers


    Each message represents an event or fact that, from the producer's perspective, should be made available to potential consumers.
    Messages are immutable: once they are written to a topic, they cannot be changed. If a message is invalid, it stays invalid in the topic, and the consumer has to reconcile it with later messages when it reads and processes them.

    Append-Only Messages & Ordered sequence


    Note: Message 3 (in red) is invalid. The only recourse is to follow that invalid message with a new message (Message 5, in green).

    Each message has a timestamp, a referenceable identifier, and a binary payload of data.
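To make the append-only behaviour concrete, here is a minimal in-memory sketch. It is not Kafka's storage format: the `Message` and `TopicLog` names are hypothetical, and using the list index as the referenceable identifier is an assumption made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Message:
    offset: int        # referenceable identifier (here: position in the log)
    timestamp: float   # time at which the message was appended
    payload: bytes     # opaque binary data


class TopicLog:
    """Toy append-only log; a stand-in for a topic, not Kafka's implementation."""

    def __init__(self):
        self._messages = []

    def append(self, payload: bytes) -> Message:
        # New messages only ever go at the end; nothing is updated in place.
        msg = Message(len(self._messages), time.time(), payload)
        self._messages.append(msg)
        return msg

    def read(self, offset: int) -> Message:
        return self._messages[offset]

    def __len__(self):
        return len(self._messages)


log = TopicLog()
log.append(b"price=10")
bad = log.append(b"price=-999")                     # invalid, but it stays in the log
fix = log.append(b"price=12 (corrects offset 1)")   # follow-up message
```

The invalid message at offset 1 is still readable after the correction is appended, so it is up to the consumer to prefer the later message.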

    How do consumers maintain their autonomy when consuming messages from a common topic? Through the message offset, which lets each consumer read messages at its own pace.

    An offset is a placeholder for the last read message position. It is maintained by the Kafka consumer and refers to a message identifier.
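The sketch below shows how two consumers can read the same topic at different speeds because each keeps its own offset. The `Consumer` class and its `poll` method are hypothetical names, not the Kafka client API. As a convenience, the offset is stored as the index of the next message to read, which is one past the last message read.

```python
class Consumer:
    """Toy consumer that tracks its own position in a shared list of messages."""

    def __init__(self, name, log):
        self.name = name
        self._log = log    # shared topic contents; consumers never modify it
        self.offset = 0    # index of the next message this consumer will read

    def poll(self, max_messages=1):
        # Return up to max_messages starting at this consumer's own offset.
        batch = self._log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch


topic = ["m0", "m1", "m2", "m3", "m4"]
fast = Consumer("fast", topic)
slow = Consumer("slow", topic)

fast.poll(4)   # the fast consumer has read m0..m3
slow.poll(1)   # the slow consumer has read only m0
```

Reading does not remove anything from the topic. Each consumer advances only its own offset, so the slow consumer can catch up later without affecting the fast one.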

     

  2. Partitions & Brokers

    A topic (as a logical concept) is represented by one or more physical log files called partitions. The number of partitions in a topic is configurable.
    Partitioning allows a topic to:
    - Scale
    - Provide greater fault tolerance
    - Provide higher throughput
    Each partition is maintained on one or more brokers.

Each topic must have at least one partition, because the partition is the physical representation of the topic as a commit log stored on a broker.

Scenario A: A topic with 1 partition, stored as 1 log file.

Scenario B: A topic split across 3 partitions, stored as 3 log files on 3 machines (brokers).
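The two scenarios can be sketched as a mapping from partition number to the brokers that hold it. The `assign_partitions` helper and the broker names are hypothetical, and the round-robin placement is a simplification for illustration rather than Kafka's actual assignment logic.

```python
def assign_partitions(num_partitions, brokers, replication_factor=1):
    """Map each partition to the list of brokers that keep a copy of it."""
    if replication_factor > len(brokers):
        raise ValueError("cannot place more copies than there are brokers")
    assignment = {}
    for p in range(num_partitions):
        # Start at a different broker for each partition, then wrap around.
        assignment[p] = [
            brokers[(p + r) % len(brokers)] for r in range(replication_factor)
        ]
    return assignment


# Scenario A: 1 partition, 1 log file, 1 broker.
scenario_a = assign_partitions(1, ["broker-1"])

# Scenario B: 3 partitions, 3 log files, 3 brokers.
scenario_b = assign_partitions(3, ["broker-1", "broker-2", "broker-3"])
```

Passing a `replication_factor` greater than 1 places each partition on more than one broker, which is where the extra fault tolerance comes from.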

Partitions are mutually exclusive: each one receives its own unique messages from the Kafka producers writing to the topic. Each partition is itself a time-ordered sequence of events.
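A minimal sketch of how a producer might spread messages over partitions so that every message lands in exactly one of them. The `choose_partition` and `produce` helpers are hypothetical. Kafka's default partitioner also hashes the message key when one is present, but it uses a different hash function; `zlib.crc32` is only a stand-in here.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Same key always maps to the same partition (crc32 is a stand-in hash).
    return zlib.crc32(key) % num_partitions


partitions = [[] for _ in range(3)]   # three independent append-only logs


def produce(key: bytes, value: str):
    p = choose_partition(key, len(partitions))
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1  # (partition, offset within that partition)


events = [
    (b"user-1", "login"),
    (b"user-2", "login"),
    (b"user-1", "click"),
    (b"user-3", "login"),
    (b"user-1", "logout"),
]
placed = [produce(key, value) for key, value in events]
```

Because all `user-1` events hash to the same partition, they keep their relative order there. Events in different partitions have no ordering relative to each other.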