kafka_arch

Apache Kafka is a distributed, scalable, and fault-tolerant platform for real-time data streaming, consisting of core components such as brokers, topics, partitions, producers, consumers, and Zookeeper. It enables high throughput, low latency, and durability through features like message replication and configurable retention policies. Kafka is suitable for event streaming, log aggregation, and real-time analytics, with additional tools like Kafka Connect and Kafka Streams for integration and processing.

Uploaded by Pranshu vashisth

Apache Kafka’s architecture is designed for distributed, scalable, and fault-tolerant real-time data streaming. Below is a concise overview of its key components and structure:
Core Components

Broker:
Kafka runs as a cluster of servers called brokers. Each broker stores data, serves clients, and handles read/write operations. Brokers are identified by unique IDs and coordinate via Zookeeper.

Topic:
A logical channel where data (messages) is published and consumed. Topics are divided into partitions for parallelism and scalability. Each partition is an ordered, immutable log of messages.

Partition:
A topic is split into multiple partitions, distributed across brokers. Partitions enable parallel processing and load balancing. Each partition has a leader (hosted on one broker) and replicas (for fault tolerance) on other brokers.
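For keyed messages, the producer maps the key to a partition deterministically (the Java client uses murmur2 hashing). A minimal sketch of the idea, using Python's built-in hash() as a stand-in for the real partitioner:

```python
import itertools

_round_robin = itertools.count()  # spreads keyless messages across partitions

def choose_partition(key, num_partitions):
    """Pick a partition for a message (sketch of Kafka's default
    partitioner; real clients use murmur2, and newer ones use a
    "sticky" strategy for keyless sends)."""
    if key is None:
        # No key: distribute messages round-robin.
        return next(_round_robin) % num_partitions
    # Same key -> same partition, which preserves per-key ordering.
    return hash(key) % num_partitions
```

Because every message with a given key lands in the same partition, Kafka guarantees ordering per key but not across the whole topic.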

Producer:
Clients that publish messages to Kafka topics. Producers write to the leader replica of a partition; Kafka handles replication to the followers.

Consumer:
Clients that subscribe to topics and read messages. Consumers belong to consumer groups for load balancing; each partition is consumed by one consumer in a group. Consumers track their progress using offsets (their position in the partition).
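Partition-to-consumer assignment within a group can be sketched along the lines of Kafka's range assignor: sort the members, then hand each a contiguous slice of partitions (member names here are illustrative):

```python
def range_assign(consumers, partitions):
    """Divide partitions among group members the way Kafka's
    RangeAssignor does: contiguous slices, with earlier members
    taking the remainder (a sketch, not the broker's actual code)."""
    members = sorted(consumers)
    per, extra = divmod(len(partitions), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment
```

With 3 partitions and 2 consumers, one member ends up reading two partitions, the other one.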

Zookeeper:
A distributed coordination service used by Kafka for:
Managing broker metadata (e.g., which broker is the leader for a partition).
Tracking cluster state and configuration.
Handling leader election and failover.
Kafka is moving toward removing the Zookeeper dependency (KRaft mode in newer versions).
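Leader failover can be pictured as promoting the first surviving replica. This is a deliberately simplified sketch; in real Kafka the controller elects the new leader from the in-sync replica (ISR) list, and the broker names here are illustrative:

```python
def elect_leader(replica_brokers, live_brokers):
    """Promote the first listed replica that is still alive
    (simplified model of leader election on broker failure)."""
    for broker in replica_brokers:
        if broker in live_brokers:
            return broker
    # No replica survives: the partition is offline until one returns.
    return None
```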

Data Flow

Producers send messages to a topic, specifying an optional key that determines the target partition. Messages are appended to the leader partition and replicated to follower replicas for durability. Consumers in a group read from their assigned partitions, pulling messages in order using offsets.
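The read path above can be modeled with a toy append-only log, where an offset is simply an index into the partition:

```python
class PartitionLog:
    """In-memory stand-in for one partition: an append-only,
    ordered log addressed by offset."""

    def __init__(self):
        self._messages = []

    def append(self, msg):
        self._messages.append(msg)
        return len(self._messages) - 1  # offset of the new message

    def read_from(self, offset):
        # Consumers pull everything from their committed offset onward.
        return self._messages[offset:]
```

Because consuming does not delete messages, several consumer groups can read the same partition independently, each tracking its own offset.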
Kafka retains messages for a configurable period (or size limit), allowing replay or late consumption.
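Time-based retention can be sketched as dropping everything older than a cutoff. (Real Kafka deletes whole log segment files on disk rather than individual records; the record format here is illustrative.)

```python
def apply_retention(records, retention_seconds, now):
    """Keep only records inside the retention window.
    `records` is a list of (timestamp, payload) pairs."""
    cutoff = now - retention_seconds
    return [(ts, payload) for ts, payload in records if ts >= cutoff]
```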

Key Architectural Features

Scalability: Add brokers or partitions to handle more data or traffic.
Fault Tolerance: Replicas ensure data availability if a broker fails; leaders are re-elected automatically.
High Throughput: Partitioning and batching enable millions of messages per second.
Durability: Messages are persisted to disk, with configurable retention policies.
Low Latency: Efficient log-based storage and zero-copy I/O.
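The throughput point is largely about batching: producers accumulate messages and ship them in groups, amortizing per-request overhead. A minimal sketch of size-based batching (real clients also flush on a time limit, linger.ms):

```python
def make_batches(messages, max_batch_size):
    """Split a stream of messages into fixed-size batches,
    modeling how a producer groups sends to one partition."""
    return [messages[i:i + max_batch_size]
            for i in range(0, len(messages), max_batch_size)]
```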

Example Workflow

A producer sends a message to TopicA, which has 3 partitions. Based on its key, the message goes to Partition1, hosted on Broker1 (the leader). Broker1 replicates the message to Broker2 and Broker3 (followers). A consumer group with two consumers subscribes to TopicA: Consumer1 reads from Partition1 and Partition2, while Consumer2 reads from Partition3. Zookeeper manages metadata, ensuring brokers and consumers stay coordinated.
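The replication step in this workflow can be sketched with an in-memory model: the leader appends first, the followers copy, and with acks=all the producer is acknowledged only once every replica holds the message. (In real Kafka, followers pull from the leader over the network; the broker objects here are illustrative.)

```python
class Broker:
    """Toy broker holding one replica of a partition log."""
    def __init__(self, name):
        self.name = name
        self.log = []

def produce(leader, followers, msg):
    """Append to the leader, then replicate to followers; return True
    once all replicas hold the message (models acks=all)."""
    leader.log.append(msg)
    for follower in followers:
        follower.log.append(msg)
    return all(b.log[-1] == msg for b in [leader, *followers])
```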

Additional Components

Kafka Connect: For integrating Kafka with external systems (e.g., databases, S3).
Kafka Streams: A library for building real-time stream processing applications.
Schema Registry: Manages data schemas for serialization (often used with Avro).

This architecture makes Kafka ideal for use cases like event streaming, log aggregation, and real-time analytics. For deeper insights, you can explore Kafka’s documentation or tools like Confluent for enterprise setups.
