Cloudurable Kafka Tutorial v1 PDF


Cassandra / Kafka Support in EC2/AWS.

Kafka Training, Kafka Consulting


Cassandra and Kafka Support on AWS/EC2

Cloudurable Support around Cassandra

Introduction to Kafka
and Kafka running in EC2

Kafka Introduction Kafka messaging



What is Kafka?

❖ Distributed Streaming Platform


❖ Publish and Subscribe to streams of records
❖ Fault tolerant storage
❖ Process records as they occur

Kafka Usage
❖ Build real-time streaming data pipelines
❖ Enable in-memory microservices (actors, Akka, Vert.x, QBit)
❖ Build real-time streaming applications that react to streams
❖ Real-time data analytics
❖ Transform, react, aggregate, join real-time data flows

Kafka Use Cases


❖ Metrics / KPIs gathering
❖ Aggregate statistics from many sources
❖ Event Sourcing
❖ Used with microservices (in-memory) and actor systems
❖ Commit Log
❖ External commit log for distributed systems. Replicated data between nodes, re-sync for nodes to restore state
❖ Real-time data analytics, Stream Processing, Log Aggregation, Messaging, Click-stream tracking, Audit trail, etc.

Who uses Kafka?


❖ LinkedIn: Activity data and operational metrics
❖ Twitter: Uses it as part of Storm – stream processing infrastructure
❖ Square: Kafka as bus to move all system events to various Square data centers (logs, custom events, metrics, and so on). Outputs to Splunk, Graphite, Esper-like alerting systems
❖ Spotify, Uber, Tumblr, Goldman Sachs, PayPal, Box, Cisco, CloudFlare, DataDog, LucidWorks, MailChimp, Netflix, etc.

Kafka: Topics, Producers, and Consumers

[Diagram: Producers send records to a Topic in the Kafka Cluster; Consumers read records from the Topic]

Kafka Fundamentals
❖ Records have a key, value and timestamp
❖ Topic: a stream of records (“/orders”, “/user-signups”); a feed name
❖ Log: topic storage on disk
❖ Partition / Segments (parts of Topic Log)
❖ Producer API: produces a stream of records
❖ Consumer API: consumes a stream of records
❖ Broker: a Kafka server. Kafka servers running together form a cluster, which consists of many processes on many servers
❖ ZooKeeper: Does coordination of brokers and consumers. Consistent file system for configuration information and leadership election

Kafka Performance details


❖ Topic is like a feed name (“/shopping-cart-done”, “/user-signups”), which Producers write to and Consumers read from
❖ A Topic is associated with a log, which is a data structure on disk
❖ Producer(s) append Records at the end of the Topic log
❖ Meanwhile, many Consumers read from the Topic log at their own cadence
❖ Each Consumer (Consumer Group) tracks the offset where it left off reading
❖ How can Kafka scale if multiple producers and consumers read/write to the same Kafka Topic log?
❖ Sequential writes to filesystem are fast (700 MB or more per second)
❖ Kafka scales writes and reads by sharding Topic logs into Partitions (parts of a Topic log)
❖ Topic logs can be split into multiple Partitions on different machines/different disks
❖ Multiple Producers can write to different Partitions of the same Topic
❖ Multiple Consumer Groups can read from different partitions efficiently
❖ Partitions can be distributed on different machines in a cluster
❖ High performance with horizontal scalability and failover

Kafka Fundamentals 2
❖ Kafka uses ZooKeeper to form Kafka Brokers into a cluster
❖ Each node in Kafka cluster is called a Kafka Broker
❖ Partitions can be replicated across multiple nodes for failover
❖ One of a partition's replicas (a node/partition pair) is chosen as leader
❖ Leader handles all reads and writes of Records for partition
❖ Writes to partition are replicated to followers (node/partition pair)
❖ A follower that is in-sync is called an ISR (in-sync replica)
❖ If a partition leader fails, one ISR is chosen as new leader
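
The failover rule above can be sketched in plain Java. This is a toy model, not Kafka's controller code: the class `PartitionReplicas` and its methods are hypothetical names, and promoting the first remaining ISR is an assumption made for illustration (the slides only say that one ISR is chosen).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: tracks the leader and the in-sync replicas (ISRs)
// of one partition by broker id.
final class PartitionReplicas {
    private int leader;
    private final List<Integer> inSyncReplicas; // includes the leader

    PartitionReplicas(int leader, List<Integer> inSyncReplicas) {
        this.leader = leader;
        this.inSyncReplicas = new ArrayList<>(inSyncReplicas);
    }

    int leader() { return leader; }

    List<Integer> inSyncReplicas() { return new ArrayList<>(inSyncReplicas); }

    // Called when a broker dies: drop it from the ISR list and, if it was
    // the leader, promote one of the remaining ISRs.
    void brokerFailed(int brokerId) {
        inSyncReplicas.remove(Integer.valueOf(brokerId));
        if (brokerId == leader) {
            if (inSyncReplicas.isEmpty()) {
                throw new IllegalStateException("no in-sync replica left to lead");
            }
            leader = inSyncReplicas.get(0); // assumption: first remaining ISR wins
        }
    }

    public static void main(String[] args) {
        PartitionReplicas partition = new PartitionReplicas(0, List.of(0, 1, 2));
        partition.brokerFailed(0);
        System.out.println("new leader: " + partition.leader()
                + ", ISR: " + partition.inSyncReplicas());
    }
}
```

Which ISR a real cluster promotes is decided by Kafka, so the broker id printed here should not be read as a prediction.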

ZooKeeper does coordination for Kafka Consumer and Kafka Cluster

[Diagram: ZooKeeper, a Kafka Cluster of three Kafka Brokers hosting a Topic, Producers, and Consumers]

Replication of Kafka Partitions 0


Record is considered "committed" when all ISRs for partition wrote to their log (ISR = in-sync replica).
Only committed records are readable from consumer.

[Diagram: Kafka Broker 0, Kafka Broker 1 and Kafka Broker 2 each hold Partitions 0 to 4. Leader shown in red, followers in blue. 1) Client Producer writes record to the partition leader; 2) the record is replicated to the followers on the other two brokers.]

Replication of Kafka Partitions 1


Another partition can be owned by another leader on another Kafka broker.

[Diagram: Kafka Broker 0, Kafka Broker 1 and Kafka Broker 2 each hold Partitions 0 to 4. Leader shown in red, followers in blue. 1) Client Producer writes record to the leader of a different partition, hosted on a different broker; 2) the record is replicated to the followers on the other two brokers.]

Kafka Extensions

❖ Streams API: transform, aggregate and process records from a stream and produce derivative streams
❖ Connector API: reusable producers and consumers (e.g., stream of changes from DynamoDB)

Kafka Connectors and Streams


[Diagram: Kafka Cluster surrounded by Producers (Apps), Consumers (Apps), Connectors (DBs) and Streams (Apps)]

Kafka Polyglot clients / Wire protocol

❖ Kafka clients and servers communicate using a wire protocol over TCP
❖ Protocol versioned
❖ Maintains backwards compatibility
❖ Many languages supported

Topics and Logs


❖ Topic is a stream of records
❖ Topics stored in log
❖ Log broken up into partitions and segments
❖ Topic is a category or stream name
❖ Topics are pub/sub
❖ Can have zero or many consumer groups (subscribers)
❖ Topics are broken up into partitions for speed and size

Topic Partitions
❖ Topics are broken up into partitions
❖ Partitions are decided usually by key of record
❖ Key of record determines which partition
❖ Partitions are used to scale Kafka across many servers
❖ Record sent to correct partition by key
❖ Partitions are used to facilitate parallel consumers
❖ Records are consumed in parallel up to the number of
partitions

Partition Log
❖ Order is maintained only in a single partition
❖ Partition is an ordered, immutable sequence of records that is continually appended to: a structured commit log
❖ Producers write at their own cadence so order of Records cannot be guaranteed across partitions
❖ Producers pick the partition so that related Records/messages go to the same partition based on the data
❖ Example: have all the events of a certain 'employeeId' go to the same partition
❖ If order within a partition is not needed, a 'Round Robin' partition strategy can be used so Records are evenly distributed across partitions
❖ Records in partitions are assigned a sequential id number called the offset
❖ Offset identifies each record within the partition
❖ Topic Partitions allow Kafka log to scale beyond a size that will fit on a single server
❖ Topic partition must fit on servers that host it, but topic can span many partitions hosted by many servers
❖ Topic Partitions are unit of parallelism - each consumer in a consumer group can work on one partition at a time
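
To make the offset idea concrete, here is a minimal in-memory stand-in for one partition's log. It is a sketch under simplifying assumptions (records are plain strings, nothing is written to disk), and the class `PartitionLogSketch` is a hypothetical name, not part of Kafka.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative in-memory stand-in for one partition's append-only log.
final class PartitionLogSketch {
    private final List<String> records = new ArrayList<>();

    // Appends at the end and returns the record's offset (0, 1, 2, ...).
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Existing records are never modified; readers address them by offset.
    String read(long offset) {
        return records.get((int) offset);
    }

    // The offset the next appended record will receive.
    long nextOffset() {
        return records.size();
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        long first = log.append("employee-7 hired");
        long second = log.append("employee-7 promoted");
        System.out.println(first + " -> " + log.read(first));
        System.out.println(second + " -> " + log.read(second));
        System.out.println("next offset: " + log.nextOffset());
    }
}
```

Because both events for the same employee land in the same log, a reader that walks the offsets in order sees them in the order they were written.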

Kafka Topic Partitions Layout

[Diagram: four partitions with offsets running from older (left) to newer (right); writes append at the newer end. Partition 0: offsets 0 to 12; Partition 1: offsets 0 to 9; Partition 2: offsets 0 to 11; Partition 3: offsets 0 to 8.]

Kafka Record retention


❖ Kafka cluster retains all published records
❖ Time based – configurable retention period
❖ Size based
❖ Compaction
❖ Retention policy of three days or two weeks or a month
❖ Records are available for consumption until discarded by time, size or compaction
❖ Consumption speed not impacted by size


Kafka Consumers / Producers


[Diagram: Producers append to Partition 0 (offsets 0 to 12); Consumer Group A and Consumer Group B each read from their own position in the log]

Consumers remember offset where they left off.

Consumer groups each have their own offset.

Kafka Partition Distribution


❖ Each partition has leader server and zero or more follower servers
❖ Leader handles all read and write requests for partition
❖ Followers replicate leader, and take over if leader dies
❖ Used for parallel consumer handling within a group
❖ Partitions of log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of partitions
❖ Each partition can be replicated across a configurable number of Kafka servers
❖ Used for fault tolerance

Kafka Producers
❖ Producers send records to topics
❖ Producer picks which partition to send record to per topic
❖ Can be done in a round-robin fashion
❖ Can be based on priority
❖ Typically based on key of record
❖ Kafka default partitioner for Java uses hash of keys to choose partitions, or a round-robin strategy if no key
❖ Important: Producer picks partition
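
The key-hash and round-robin strategies listed above can be sketched as follows. This is not the Kafka client's partitioner: `PartitionerSketch` is a hypothetical class, and `String.hashCode()` stands in for the hash that the real Java client applies to the serialized key bytes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative partition chooser: hash of key when a key is present,
// round-robin when it is not.
final class PartitionerSketch {
    private final AtomicInteger counter = new AtomicInteger();

    int partitionFor(String key, int numPartitions) {
        if (key == null) {
            // No key: rotate through the partitions.
            return Math.floorMod(counter.getAndIncrement(), numPartitions);
        }
        // Assumption: String.hashCode() replaces the client's real key hash.
        // floorMod keeps the result in [0, numPartitions) for negative hashes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionerSketch partitioner = new PartitionerSketch();
        int first = partitioner.partitionFor("employee-42", 4);
        int again = partitioner.partitionFor("employee-42", 4);
        System.out.println("same key, same partition: " + (first == again));
        for (int i = 0; i < 4; i++) {
            System.out.println("keyless record -> partition " + partitioner.partitionFor(null, 3));
        }
    }
}
```

The property that matters for ordering is that a given key always maps to the same partition while the partition count stays fixed.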

Kafka Consumer Groups


❖ Consumers are grouped into a Consumer Group
❖ Consumer group has a unique id
❖ Each consumer group is a subscriber
❖ Each consumer group maintains its own offset
❖ Multiple subscribers = multiple consumer groups
❖ A Record is delivered to one Consumer in a Consumer Group
❖ Each consumer in a consumer group takes records, and only one consumer in the group gets a given record
❖ Consumers in Consumer Group load balance record consumption
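
A small model may help show why two consumer groups behave like two independent subscribers. The sketch below keeps one offset per group id over a single partition; `GroupOffsetsSketch` and its method names are hypothetical, and it ignores which consumer inside the group owns the partition.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model: one partition's records plus one read position per group.
final class GroupOffsetsSketch {
    private final List<String> partitionLog;
    private final Map<String, Integer> nextOffsetByGroup = new HashMap<>();

    GroupOffsetsSketch(List<String> partitionLog) {
        this.partitionLog = partitionLog;
    }

    // Returns the next record for the group and advances only that group's
    // offset; returns null when the group has caught up with the log.
    String next(String groupId) {
        int offset = nextOffsetByGroup.getOrDefault(groupId, 0);
        if (offset >= partitionLog.size()) {
            return null;
        }
        nextOffsetByGroup.put(groupId, offset + 1);
        return partitionLog.get(offset);
    }

    public static void main(String[] args) {
        GroupOffsetsSketch topic = new GroupOffsetsSketch(List.of("r0", "r1", "r2"));
        System.out.println("group-A reads " + topic.next("group-A"));
        System.out.println("group-A reads " + topic.next("group-A"));
        System.out.println("group-B reads " + topic.next("group-B"));
    }
}
```

Group B starts from its own position regardless of how far group A has read, which is the pub/sub behavior described above.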

Kafka Consumer Groups 2


❖ How does Kafka divide up a topic so multiple Consumers in a consumer group can process it?
❖ Kafka makes you group consumers into a consumer group with a group id
❖ Consumers with the same group id belong to the same Consumer Group
❖ One Kafka broker becomes group coordinator for Consumer Group
❖ assigns partitions when new members arrive (older clients would talk directly to ZooKeeper; now the broker does coordination)
❖ or reassigns partitions when group members leave or topic changes (config / meta-data change)
❖ When Consumer group is created, offset is set according to the consumer's offset reset policy

Kafka Consumer Group 3


❖ If Consumer fails before sending commit offset XXX to Kafka broker,
❖ different Consumer can continue from the last committed offset
❖ some Kafka records could be reprocessed (at least once behavior)
❖ "Log end offset" is offset of last record written to log partition and where Producers write to next
❖ "High watermark" is offset of last record that was successfully replicated to all of the partition's followers
❖ Consumer only reads up to the “high watermark”. Consumer can’t read un-replicated data
❖ Only a single Consumer from the same Consumer Group can access a single Partition
❖ If the number of Consumers in a Consumer Group exceeds the Partition count:
❖ Extra Consumers remain idle; can be used for failover
❖ If there are more Partitions than Consumers in the Consumer Group:
❖ Some Consumers will read from more than one partition
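
The high watermark rule above can be expressed as a short calculation. The sketch assumes each in-sync replica reports the offset it will write next, and treats the high watermark as exclusive (the first offset that is not yet on every replica); that convention and the class `HighWatermarkSketch` are choices made for this example only.

```java
import java.util.Map;

// Illustrative calculation of what a consumer is allowed to read.
final class HighWatermarkSketch {
    // nextOffsets: broker id of each in-sync replica -> offset it writes next.
    // Every offset below the smallest of these exists on all in-sync replicas.
    static long highWatermark(Map<Integer, Long> nextOffsets) {
        return nextOffsets.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(0L);
    }

    static boolean readableByConsumer(long offset, Map<Integer, Long> nextOffsets) {
        return offset < highWatermark(nextOffsets);
    }

    public static void main(String[] args) {
        // Leader (broker 0) is ahead of its two followers.
        Map<Integer, Long> nextOffsets = Map.of(0, 10L, 1, 8L, 2, 9L);
        System.out.println("high watermark: " + highWatermark(nextOffsets));
        System.out.println("offset 7 readable: " + readableByConsumer(7, nextOffsets));
        System.out.println("offset 8 readable: " + readableByConsumer(8, nextOffsets));
    }
}
```

Offsets 8 and 9 exist on the leader in this example but stay invisible to consumers until the slowest in-sync follower has them too.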

2 server Kafka cluster hosting 6 partitions (P0-P5)

[Diagram: Kafka Cluster with Server 1 hosting P2, P3, P4 and Server 2 hosting P0, P1, P5. Consumer Group A and Consumer Group B each contain consumers labeled C0, C1, C3.]

Kafka Consumer Consumption


❖ Kafka Consumer consumption divides partitions over consumer instances
❖ Each Consumer is exclusive consumer of a "fair share" of partitions
❖ Consumer membership in group is handled by the Kafka protocol dynamically
❖ If new Consumers join Consumer group they get share of partitions
❖ If Consumer dies, its partitions are split among remaining live Consumers in group
❖ Order is only guaranteed within a single partition
❖ Since records are typically stored by key into a partition then order per partition is sufficient for most use cases
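
The "fair share" idea can be illustrated by dealing partitions out to the live consumers in turn and repeating the deal whenever membership changes. This is a simplified stand-in, not one of Kafka's actual assignment strategies; `AssignmentSketch` and the consumer names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative partition assignment for one consumer group.
final class AssignmentSketch {
    // Deals partitions 0..numPartitions-1 out to the consumers in turn.
    // Each partition goes to exactly one consumer in the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return result;
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            result.get(owner).add(partition);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("c0", "c1", "c2"), 6));
        // c1 dies: its partitions are split among the survivors.
        System.out.println(assign(List.of("c0", "c2"), 6));
        // More consumers than partitions: the extra consumer stays idle.
        System.out.println(assign(List.of("c0", "c1", "c2"), 2));
    }
}
```

Re-running `assign` with the new member list is the sketch's version of a rebalance; the real protocol is coordinated by a broker, as described on the earlier consumer group slide.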

Kafka vs JMS Messaging


❖ Kafka is a bit like both Queues and Topics in JMS
❖ Kafka acts as a queue for the consumers within a consumer group, so it load balances like a JMS queue
❖ Kafka acts as a topic (pub/sub) by offering Consumer Groups, which act like subscriptions
❖ Broadcast to multiple consumer groups
❖ By design Kafka is better suited for scale due to the partitioned topic log
❖ Also, by moving the location in the log to the client/consumer side of the equation instead of the broker, less tracking is required by the Broker
❖ Handles parallel consumers better

Kafka scalable message storage


❖ Kafka acts as a good storage system for records/messages
❖ Records written to Kafka topics are persisted to disk and replicated to other servers for fault-tolerance
❖ Kafka Producers can wait on acknowledgement
❖ Write not complete until fully replicated
❖ Kafka disk structures scale well
❖ Writing in large streaming batches is fast
❖ Clients/Consumers control read position (offset)
❖ Kafka acts like high-speed file system for commit log storage, replication

Kafka Stream Processing


❖ Kafka for Stream Processing
❖ Kafka enables real-time processing of streams
❖ Kafka supports stream processors
❖ A stream processor takes continual streams of records from input topics, performs some processing, transformation or aggregation on the input, and produces one or more output streams
❖ A video player app might take in input streams of videos watched and videos paused, and output a stream of user preferences, then gear new video recommendations to recent user activity or to the aggregate activity of many users to see which new videos are hot
❖ Kafka Streams API solves hard problems with out-of-order records, aggregating across multiple streams, joining data from multiple streams, allowing for stateful computations, and more
❖ Streams API builds on core Kafka primitives and has a life of its own

Using Kafka Single Node



Run Kafka

❖ Run ZooKeeper
❖ Run Kafka Server/Broker
❖ Create Kafka Topic
❖ Run producer
❖ Run consumer

Run ZooKeeper

Run Kafka Server



Create Kafka Topic



Kafka Producer

Kafka Consumer

Running Kafka Producer and Consumer



Lab 1-A Use Kafka

Use Kafka to send and receive messages
Use single server version of Kafka

Using Kafka Cluster



Running many nodes

❖ Modify properties files
❖ Change port
❖ Change Kafka log location
❖ Start up many Kafka server instances
❖ Create Replicated Topic

Leave everything from before running



Create two new server.properties files

❖ Copy existing server.properties to server-1.properties, server-2.properties
❖ Change server-1.properties to use port 9093, broker id 1, and log.dirs “/tmp/kafka-logs-1”
❖ Change server-2.properties to use port 9094, broker id 2, and log.dirs “/tmp/kafka-logs-2”

server-x.properties

Start second and third servers



Create Kafka replicated topic my-failsafe-topic



Start Kafka consumer and producer



Kafka consumer and producer running



Use Kafka Describe Topic

The leader is broker 0


There is only one partition
There are three in-sync replicas (ISR)

Test Failover by killing 1st server

Use Kafka topic describe to see that a new leader was elected!

NEW LEADER IS 2!

Lab 2-A Use Kafka

Use Kafka to send and receive messages
Use a Kafka Cluster to replicate a Kafka topic log

Kafka Consumer and Producers

Working with producers and consumers
Step by step first example

Objectives Create Producer and Consumer example

❖ Create simple example that creates a Kafka Consumer and a Kafka Producer
❖ Create a new replicated Kafka topic
❖ Create Producer that uses topic to send records
❖ Send records with Kafka Producer
❖ Create Consumer that uses topic to receive messages
❖ Process messages from Kafka with Consumer

Create Replicated Kafka Topic



Build script

Create Kafka Producer to send records

❖ Specify bootstrap servers


❖ Specify client.id
❖ Specify Record Key serializer
❖ Specify Record Value serializer
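
The four settings listed above map onto standard producer configuration keys. The sketch below only builds the `java.util.Properties` object, so it runs without the Kafka client on the classpath. The broker addresses (9092 is assumed as the default port of the first broker), the client id, and the choice of a `Long` key with a `String` value are all assumptions made for illustration; `ProducerConfigSketch` is a hypothetical class name.

```java
import java.util.Properties;

// Builds producer settings using the plain string names of the config keys.
final class ProducerConfigSketch {
    static Properties producerProps() {
        Properties props = new Properties();
        // Assumed addresses: three local brokers on ports 9092, 9093 and 9094.
        props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
        // Hypothetical client id, used to identify this producer to the brokers.
        props.put("client.id", "KafkaExampleProducer");
        // Assumed record types: Long keys and String values.
        props.put("key.serializer", "org.apache.kafka.common.serialization.LongSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        producerProps().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With the Kafka client library available, these properties would typically be passed to `new KafkaProducer<>(props)`.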

Common Kafka imports and constants



Create Kafka Producer to send records



Send sync records with Kafka Producer

The response RecordMetadata has 'partition' where record was written and the 'offset' of the record.

Send async records with Kafka Producer



Create Consumer using Topic to Receive Records

❖ Specify bootstrap servers


❖ Specify client.id
❖ Specify Record Key deserializer
❖ Specify Record Value deserializer
❖ Specify Consumer Group
❖ Subscribe to Topic
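
The consumer settings follow the same pattern as the producer's, with deserializers in place of serializers and a consumer group id added. As before, this sketch only builds the `Properties` object so that it runs without the Kafka client library. The broker addresses, client id, group id and the `Long`/`String` record types are assumptions, and `ConsumerConfigSketch` is a hypothetical class name.

```java
import java.util.Properties;

// Builds consumer settings using the plain string names of the config keys.
final class ConsumerConfigSketch {
    static Properties consumerProps() {
        Properties props = new Properties();
        // Assumed addresses: three local brokers on ports 9092, 9093 and 9094.
        props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
        // Hypothetical client id and consumer group id.
        props.put("client.id", "KafkaExampleConsumer");
        props.put("group.id", "KafkaExampleConsumerGroup");
        // Assumed record types: Long keys and String values.
        props.put("key.deserializer", "org.apache.kafka.common.serialization.LongDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        consumerProps().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With the Kafka client library available, these properties would typically be passed to `new KafkaConsumer<>(props)`, followed by a call to `subscribe` with the list of topic names.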

Create Consumer using Topic to Receive Records



Process messages from Kafka with Consumer



Consumer poll
❖ poll() method returns fetched records based on current partition offset
❖ Blocking method waiting for specified time if no records available
❖ When/If records available, method returns straight away
❖ Control the maximum records returned by the poll() with props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);
❖ poll() is not meant to be called from multiple threads
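
The blocking behavior described above can be imitated with a standard `BlockingQueue`, which makes the semantics easy to try without a broker. This is a simulation, not the Kafka consumer: the class `PollSketch`, its `poll` signature and the queue standing in for fetched records are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Simulates poll(): wait up to a timeout for data, then return at most
// maxRecords records without waiting any further.
final class PollSketch {
    static List<String> poll(BlockingQueue<String> fetched, long timeoutMs, int maxRecords) {
        List<String> batch = new ArrayList<>();
        String first;
        try {
            // Blocks only while the queue is empty, for at most timeoutMs.
            first = fetched.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return batch;
        }
        if (first == null) {
            return batch; // timed out with nothing available
        }
        batch.add(first);
        fetched.drainTo(batch, maxRecords - 1); // take what is already there
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> fetched = new LinkedBlockingQueue<>();
        for (int i = 0; i < 250; i++) {
            fetched.add("record-" + i);
        }
        System.out.println("first poll returned " + poll(fetched, 100, 100).size() + " records");
        System.out.println("records still waiting: " + fetched.size());
    }
}
```

The cap passed as `maxRecords` plays the role of the `MAX_POLL_RECORDS_CONFIG` setting mentioned above.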

Running both Consumer and Producer



Java Kafka simple example recap


❖ Created simple example that creates a Kafka Consumer and a Kafka Producer
❖ Created a new replicated Kafka topic
❖ Created Producer that uses topic to send records
❖ Sent records with Kafka Producer
❖ Created Consumer that uses topic to receive messages
❖ Processed records from Kafka with Consumer

Kafka design Design discussion of Kafka



Kafka Design Motivation


❖ Kafka is a unified platform for handling real-time data feeds/streams
❖ High throughput: supports high-volume event streams like log aggregation
❖ Must support real-time analytics
❖ Real-time processing of streams to create new, derived streams
❖ This inspired the partitioning and consumer model
❖ Handle large data backlogs: periodic data loads from offline systems
❖ Low-latency delivery to handle traditional messaging use cases
❖ Scale writes and reads via partitioned, distributed commit logs
❖ Fault tolerance for machine failures
❖ Kafka design is more like a database transaction log than a traditional messaging system

Persistence: Embrace filesystem


❖ Kafka relies heavily on filesystem for storing and caching messages/records
❖ Hard drive performance for sequential writes is fast
❖ Sequential writes on a JBOD configuration with six 7200rpm SATA RAID-5 array run at about 600MB/sec
❖ Sequential reads and writes are predictable, and are heavily optimized by operating systems
❖ Sequential disk access can be faster than random memory access and SSD
❖ Operating systems use available main memory for disk caching
❖ JVM GC overhead is high for caching objects whilst OS file caches are almost free
❖ Filesystem and relying on page-cache is preferable to maintaining an in-memory cache in the JVM
❖ By relying on the OS page cache Kafka greatly simplifies code for cache coherence
❖ Since Kafka disk usage tends toward sequential reads, the read-ahead cache of the OS is effective at pre-populating its page-cache
Cassandra, Netty, and Varnish use similar techniques.
The above is explained well in the Kafka Documentation.
And there is a more entertaining explanation at the Varnish site.

Long sequential disk access


❖ Like Cassandra, LevelDB, RocksDB, and others, Kafka uses a form of log structured storage and compaction instead of an on-disk mutable BTree
❖ Kafka uses tombstones instead of deleting records right away
❖ Since disks these days have somewhat unlimited space and are very fast, Kafka can provide features not usually found in a messaging system, like holding on to old messages for a really long time
❖ This flexibility allows for interesting applications of Kafka

Kafka compression
❖ Kafka provides end-to-end batch compression
❖ Bottleneck is not always CPU or disk but often network bandwidth
❖ especially in cloud and virtualized environments
❖ especially when talking datacenter to datacenter or over a WAN
❖ Instead of compressing records one at a time…
❖ Kafka enables efficient compression of a whole message batch (message set)
❖ Message batch can be compressed and sent to Kafka broker/server in one go
❖ Message batch will be written in compressed form in the log partition
❖ Records don’t get decompressed until they reach the consumer
❖ GZIP, Snappy and LZ4 compression codecs supported

Read more in the Kafka documentation on end-to-end compression.
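
The benefit of compressing a whole batch rather than each record can be checked with the JDK's own GZIP support. The sketch below is independent of Kafka: the sample records are made up, a newline-joined string stands in for a message batch, and `BatchCompressionSketch` is a hypothetical class name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Compares GZIP output size for per-record and whole-batch compression.
final class BatchCompressionSketch {
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size(); // stream is closed here, so the output is complete
    }

    // Made-up records that share most of their content, as event data often does.
    static List<String> sampleRecords(int count) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            records.add("{\"event\":\"page-view\",\"userId\":" + i + ",\"page\":\"/home\"}");
        }
        return records;
    }

    static int perRecordTotal(List<String> records) {
        int total = 0;
        for (String record : records) {
            total += gzipSize(record.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    static int wholeBatch(List<String> records) {
        return gzipSize(String.join("\n", records).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<String> records = sampleRecords(100);
        System.out.println("compressed one at a time: " + perRecordTotal(records) + " bytes");
        System.out.println("compressed as one batch:  " + wholeBatch(records) + " bytes");
    }
}
```

Records in a batch tend to repeat field names and values, and the compressor can only exploit that repetition when it sees them together.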



Kafka Producer Load Balancing


❖ Producer sends records directly to the Kafka broker that is the partition leader
❖ Producer asks Kafka broker for metadata about which Kafka broker leads which topic partitions, thus no routing layer needed
❖ Producer client controls which partition it publishes messages to
❖ Partitioning can be done by key, round-robin or using a custom semantic partitioner

Kafka Producer Record Batching


❖ Kafka producers support record batching
❖ Batching is good for efficient compression and network IO throughput
❖ Batching can be configured by size of records in bytes in batch
❖ Batches can be auto-flushed based on time
❖ See code example on the next slide
❖ Batching allows accumulation of more bytes to send, which equates to fewer, larger I/O operations on Kafka Brokers and increases compression efficiency
❖ Buffering is configurable and lets you trade additional latency for better throughput
❖ Or in the case of a heavily used system, it could be both better average throughput and
QBit, a microservice library, uses message batching in the same fashion as Kafka to send messages over WebSocket between nodes and from client to QBit server.

More producer settings for performance

For higher throughput, Kafka Producer allows buffering based on time and size.
Multiple records can be sent as batches with fewer network requests.
Speeds up throughput drastically.
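
Buffering by size and time corresponds to the producer settings `batch.size` and `linger.ms`, and batch compression to `compression.type`. The sketch below builds only these properties so it runs without the Kafka client library. The specific values are illustrative assumptions, not recommendations, and `BatchingConfigSketch` is a hypothetical class name.

```java
import java.util.Properties;

// Producer settings that trade a little latency for larger batches.
final class BatchingConfigSketch {
    static Properties batchingProps() {
        Properties props = new Properties();
        // Size-based buffering: target batch size in bytes (assumed value).
        props.put("batch.size", "65536");
        // Time-based buffering: wait up to this many ms to fill a batch (assumed value).
        props.put("linger.ms", "100");
        // Compress each batch as a whole (assumed codec choice).
        props.put("compression.type", "snappy");
        return props;
    }

    public static void main(String[] args) {
        batchingProps().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

These entries would be added to the same `Properties` object that carries the bootstrap servers and serializers before the producer is created.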

Stay tuned

❖ More to come

References
❖ Learning Apache Kafka, Second Edition, Nishant Garg, 2015, ISBN 978-1784393090, Packt Publishing
❖ Apache Kafka Cookbook, 1st Edition, Saurabh Minni, 2015, ISBN 978-1785882449, Packt Publishing
❖ Kafka Streams for Stream processing: A few words about how Kafka works, Serban Balamaci, 2017, Blog: Plain Ol' Java
❖ Kafka official documentation, 2017
