Cloudera Kafka
Important Notice
© 2010-2020 Cloudera, Inc. All rights reserved.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software
Foundation. All other trademarks, registered trademarks, product names and company
names or logos mentioned in this document are the property of their respective owners.
Reference to any products, services, processes or other information, by trade name,
trademark, manufacturer, supplier or otherwise does not constitute or imply
endorsement, sponsorship or recommendation thereof by us.
Complying with all applicable copyright laws is the responsibility of the user. Without
limiting the rights under copyright, no part of this document may be reproduced, stored
in or introduced into a retrieval system, or transmitted in any form or by any means
(electronic, mechanical, photocopying, recording, or otherwise), or for any purpose,
without the express written permission of Cloudera.
The information in this document is subject to change without notice. Cloudera shall
not be liable for any damages resulting from technical errors or omissions which may
be present in this document, or from use of this document.
Cloudera, Inc.
395 Page Mill Road
Palo Alto, CA 94306
US: 1-888-789-1488
Intl: 1-650-362-0488
Release Information
Kafka Setup
  Hardware Requirements
  Kafka Performance Considerations
  Operating System Requirements
    SUSE Linux Enterprise Server (SLES)
    Kernel Limits
Kafka Clients
  Commands for Client Interactions
  Kafka Producers
  Kafka Consumers
    Subscribing to a topic
    Groups and Fetching
    Protocol between Consumer and Broker
    Rebalancing Partitions
    Consumer Configuration Properties
  Kafka Clients and ZooKeeper
Kafka Brokers
  Single Cluster Scenarios
    Leader Positions
    In-Sync Replicas
  Topic Configuration
    Topic Creation
    Topic Properties
  Partition Management
    Partition Reassignment
    Adding Partitions
    Choosing the Number of Partitions
Kafka Integration
  Kafka Security
    Client-Broker Security with TLS
    Using Kafka’s Inter-Broker Security
    Enabling Kerberos Authentication
    Enabling Encryption at Rest
    Topic Authorization with Kerberos and Sentry
  Managing Multiple Kafka Versions
    Kafka Feature Support in Cloudera Manager and CDH
    Client/Broker Compatibility Across Kafka Versions
    Upgrading your Kafka Cluster
  Managing Topics across Multiple Kafka Clusters
    Consumer/Producer Compatibility
    Topic Differences between Clusters
    Optimize Mirror Maker Producer Location
    Destination Cluster Configuration
    Kerberos and Mirror Maker
    Setting up Mirror Maker in Cloudera Manager
  Setting up an End-to-End Data Streaming Pipeline
    Data Streaming Pipeline
    Ingest Using Kafka with Apache Flume
    Using Kafka with Apache Spark Streaming for Stream Processing
  Developing Kafka Clients
    Simple Client Examples
    Moving Kafka Clients to Production
  Kafka Metrics
    Metrics Categories
    Viewing Metrics
    Building Cloudera Manager Charts with Kafka Metrics
Kafka Administration
  Kafka Administration Basics
    Broker Log Management
    Record Management
    Broker Garbage Log Collection and Log Rotation
    Adding Users as Kafka Administrators
  Migrating Brokers in a Cluster
    Using rsync to Copy Files from One Broker to Another
  Setting User Limits for Kafka
    Setting Quotas
  Kafka Administration Using Command Line Tools
    Unsupported Command Line Tools
    Notes on Kafka CLI Administration
    Enabling DEBUG or TRACE in command line scripts
    Understanding the kafka-run-class Bash Script
    JBOD Setup and Migration
    JBOD Operational Procedures
Kafka Reference
  Metrics Reference
  Useful Shell Command Reference
    Hardware Information
    Disk Space
    I/O Activity and Utilization
    File Descriptor Usage
    Network Ports, States, and Connections
    Process Information
    Kernel Configuration
Kafka Architecture
As is the case with all real-world systems, Kafka's architecture deviates from the ideal publish-subscribe system. Some
of the key differences are:
• Messaging is implemented on top of a replicated, distributed commit log.
• The client has more functionality and, therefore, more responsibility.
• Messaging is optimized for batches instead of individual messages.
• Messages are retained even after they are consumed; they can be consumed again.
The results of these design decisions are:
• Extreme horizontal scalability
• Very high throughput
• High availability
• Different semantics and message delivery guarantees, as a trade-off
The next few sections provide an overview of some of the more important parts, while later sections describe design
specifics and operations in greater detail.
In the ideal system presented above, messages from one publisher would somehow find their way to each subscriber.
Kafka implements the concept of a topic. A topic allows easy matching between publishers and subscribers.
A topic is a queue of messages written by one or more producers and read by one or more consumers. A topic is
identified by its name. This name is part of a global namespace of that Kafka cluster.
Specific to Kafka:
• Publishers are called producers.
• Subscribers are called consumers.
As each producer or consumer connects to the publish-subscribe system, it can read from or write to a specific topic.
Kafka is a distributed system that implements the basic features of the ideal publish-subscribe system described above.
Each host in the Kafka cluster runs a server called a broker that stores messages sent to the topics and serves consumer
requests.
Kafka is designed to run on multiple hosts, with one broker per host. If a host goes offline, Kafka does its best to ensure
that the other hosts continue running. This solves part of the “No Downtime” and “Unlimited Scaling” goals from the
ideal publish-subscribe system.
All Kafka brokers talk to ZooKeeper for distributed coordination, which provides additional help for the "Unlimited Scaling" goal from
the ideal system.
Topics are replicated across brokers. Replication is an important part of the “No Downtime,” “Unlimited Scaling,” and
“Message Retention” goals.
There is one broker that is responsible for coordinating the cluster. That broker is called the controller.
As mentioned earlier, an ideal topic behaves as a queue of messages. In reality, having a single queue has scaling issues.
Kafka implements partitions for adding robustness to topics.
In Kafka, a publish-subscribe message is called a record. A record consists of a key/value pair and metadata including
a timestamp. The key is not required, but can be used to identify messages from the same data source. Kafka stores
keys and values as arrays of bytes. It does not otherwise care about the format.
The metadata of each record can include headers. Headers may store application-specific metadata as key-value pairs.
In the context of the header, keys are strings and values are byte arrays.
For specific details of the record format, see the Record definition in the Apache Kafka documentation.
Instead of all records handled by the system being stored in a single log, Kafka divides records into partitions. Partitions
can be thought of as a subset of all the records for a topic. Partitions help with the ideal of “Unlimited Scaling”.
Records in the same partition are stored in order of arrival.
When a topic is created, it is configured with two properties:
partition count
The number of partitions that records for this topic will be spread among.
replication factor
The number of copies of a partition that are maintained to ensure consumers always have access to the queue of
records for a given topic.
Each partition of a topic has one leader partition. If the replication factor is greater than one, each partition also has
follower partitions. (For a replication factor of M, each partition has M-1 follower partitions.)
Any Kafka client (a producer or consumer) communicates only with the leader partition for data. All other partitions
exist for redundancy and failover. Follower partitions are responsible for copying new records from their leader
partitions. Ideally, the follower partitions have an exact copy of the contents of the leader. Such partitions are called
in-sync replicas (ISR).
With N brokers and a topic replication factor of M:
• If M < N, each broker will have a subset of all the partitions.
• If M = N, each broker will have a complete copy of the partitions.
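The two cases can be illustrated with a small placement sketch. This is a simplified model, not Kafka's actual replica assignment algorithm: the class name `ReplicaPlacementSketch` and the rule "replica r of partition p goes to broker (p + r) mod N" are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simplified model of replica placement; the placement rule is an assumption
// for illustration and is not the algorithm Kafka itself uses.
final class ReplicaPlacementSketch {

    // For each broker, return the set of partitions it hosts (as leader or follower).
    // Replica r of partition p is placed on broker (p + r) % brokers.
    static List<TreeSet<Integer>> place(int brokers, int partitions, int replicationFactor) {
        if (replicationFactor > brokers) {
            throw new IllegalArgumentException("replication factor cannot exceed broker count");
        }
        List<TreeSet<Integer>> hosted = new ArrayList<>();
        for (int b = 0; b < brokers; b++) {
            hosted.add(new TreeSet<>());
        }
        for (int p = 0; p < partitions; p++) {
            for (int r = 0; r < replicationFactor; r++) {
                hosted.get((p + r) % brokers).add(p);
            }
        }
        return hosted;
    }

    public static void main(String[] args) {
        // M < N: each broker ends up with a subset of the partitions.
        System.out.println("N=3, M=2: " + place(3, 3, 2));
        // M = N: each broker ends up with a copy of every partition.
        System.out.println("N=2, M=2: " + place(2, 3, 2));
    }
}
```

With three brokers and a replication factor of two, every broker hosts two of the three partitions. With two brokers and a replication factor of two, every broker hosts all three.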
In the following illustration, there are N = 2 brokers and a replication factor of M = 2. Each producer may generate records
that are assigned across multiple partitions.
Figure 4: Records in a Topic are Stored in Partitions, Partitions are Replicated across Brokers
Partitions are the key to keeping good record throughput. Choosing the correct number of partitions and partition
replications for a topic:
• Spreads leader partitions evenly on brokers throughout the cluster.
• Keeps partitions within the same topic roughly the same size.
• Balances the load on brokers.
If the order of records is important, the producer can ensure that records are sent to the same partition. The producer
can include metadata in the record to override the default assignment in one of two ways:
• The record can indicate a specific partition.
• The record can include an assignment key.
The hash of the key and the number of partitions in the topic determine which partition the record is assigned
to. Including the same key in multiple records ensures all the records are appended to the same partition.
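The key-based assignment can be sketched as a hash of the key bytes taken modulo the partition count. This is a minimal sketch under stated assumptions: `KeyPartitionSketch` is a hypothetical class, and `Arrays.hashCode` stands in for the hash function that Kafka's default partitioner actually uses, so the partition numbers it produces will not match a real cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: Kafka's default partitioner uses its own hash function;
// Arrays.hashCode is a stand-in so the sketch needs nothing beyond the JDK.
final class KeyPartitionSketch {

    // Map a record key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Math.floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 3;
        // The repeated key "sensor-1" lands on the same partition both times.
        for (String key : new String[] {"sensor-1", "sensor-2", "sensor-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Because the result depends on the partition count, the same key can map to a different partition if partitions are later added to the topic.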
Note: These references to “log” should not be confused with where Kafka brokers store their
operational logs.
In actuality, each partition does not keep all the records sequentially in a single file. Instead, it breaks each log into log
segments. Log segments can be defined using a size limit (for example, 1 GB), a time limit (for example, 1 day), or
both. Administration around Kafka records often occurs at the log segment level.
Each of the partitions is broken into segments, with Segment N containing the most recent records and Segment 1
containing the oldest retained records. This is configurable on a per-topic basis.
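The rolling behavior can be modeled in a few lines. The sketch below is a hypothetical model and not Kafka's internal implementation: the class `SegmentRollSketch`, its fields, and the tiny limits used in `main` are all assumptions chosen to make the size-based and time-based rolls easy to see.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of segment rolling; names and limits are illustrative,
// not Kafka's internal classes or defaults.
final class SegmentRollSketch {
    final long maxBytes;      // size limit per segment
    final long maxAgeMillis;  // time limit per segment
    // Each entry is {createdAtMillis, bytesWritten}; the last entry is the active segment.
    final List<long[]> segments = new ArrayList<>();

    SegmentRollSketch(long maxBytes, long maxAgeMillis) {
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Append a record of the given size, rolling to a new segment when the
    // active one would exceed its size limit or has reached its time limit.
    void append(long recordBytes, long nowMillis) {
        if (segments.isEmpty() || shouldRoll(recordBytes, nowMillis)) {
            segments.add(new long[] {nowMillis, 0L});
        }
        segments.get(segments.size() - 1)[1] += recordBytes;
    }

    private boolean shouldRoll(long recordBytes, long nowMillis) {
        long[] active = segments.get(segments.size() - 1);
        boolean tooBig = active[1] + recordBytes > maxBytes;
        boolean tooOld = nowMillis - active[0] >= maxAgeMillis;
        return tooBig || tooOld;
    }

    public static void main(String[] args) {
        SegmentRollSketch log = new SegmentRollSketch(100, 1_000);
        log.append(60, 0);     // first segment is created
        log.append(60, 10);    // would exceed 100 bytes, so a second segment is rolled
        log.append(10, 2_000); // active segment is older than 1000 ms, so a third is rolled
        System.out.println("segments: " + log.segments.size());
    }
}
```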
Kafka Setup
Hardware Requirements
Kafka can function on a fairly small amount of resources, especially with some configuration tuning. Out-of-the-box
configurations can run on as little as 1 core and 1 GB of memory, with storage scaled based on data retention requirements.
These are the defaults for both broker and Mirror Maker in Cloudera Manager version 6.x.
CPU is rarely a bottleneck because Kafka is I/O heavy, but a moderately-sized CPU with enough threads is still important
to handle concurrent connections and background tasks.
Kafka brokers tend to have a similar hardware profile to HDFS data nodes. How you build them depends on what is
important for your Kafka use cases. Use the following guidelines:
Networking requirements: Gigabit Ethernet or 10 Gigabit Ethernet. Avoid clusters that span multiple data centers.
It is common to run ZooKeeper on 3 broker nodes that are dedicated for Kafka. However, for optimal performance
Cloudera recommends the usage of dedicated ZooKeeper hosts. This is especially true for larger, production
environments.
* hard as unlimited
* soft as unlimited
Kernel Limits
There are three settings you must configure properly for the kernel.
• File Descriptors
You can set these in Cloudera Manager via Kafka > Configuration > Maximum Process File Descriptors. We
recommend a configuration of 100000 or higher.
• Max Memory Map
You must configure this in your specific kernel settings. We recommend a configuration of 32000 or higher.
• Max Socket Buffer Size
Set the buffer size larger than any Kafka send buffers that you define.
Kafka Clients
Kafka clients are created to read data from and write data to the Kafka system. Clients can be producers, which publish
content to Kafka topics. Clients can be subscribers, which read content from Kafka topics.
where the ZooKeeper connect string zkinfo is a comma-separated list of the ZooKeeper nodes in host:port format.
Validate the topic was created successfully
Produce messages
The following command can be used to publish a message to the Kafka cluster. After the command, each typed line
is a message that is sent to Kafka. After the last message, send an EOF or stop the command with Ctrl-D.
where kafkainfo is a comma-separated list of the Kafka brokers in host:port format. Using more than one
makes sure that the command can find a running broker.
Consume messages
The following command can be used to consume messages from the Kafka cluster.
The output shows the same messages that you entered in the producer session.
Set a ZooKeeper root node
It’s possible to use a root node (chroot) for all Kafka nodes in ZooKeeper by setting a value for zookeeper.chroot
in Cloudera Manager. Append this value to the end of your ZooKeeper connect string.
Set chroot in Cloudera Manager:
--zookeeper zkinfo/kafka
If you set chroot and then use only the host and port in the connect string, you'll see the following exception:
Kafka Producers
Kafka producers are the publishers responsible for writing records to topics. Typically, this means writing a program
using the KafkaProducer API. To instantiate a producer:
Most of the important producer settings, including those mentioned below, are in the configuration passed to this constructor.
The full write path for records from a producer is to the leader partition and then to all of the follower replicas. The
producer can control which point in the path triggers an acknowledgment. Depending on the acks setting, the producer
may wait for the write to propagate all the way through the system or only wait for the earliest success point.
Valid acks values are:
• 0: Do not wait for any acknowledgment from the partition (fastest throughput).
• 1: Wait only for the leader partition response.
• all: Wait for enough follower partition responses to meet the configured minimum (slowest throughput).
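As a rough illustration, the acks setting is one entry in the configuration object handed to the producer. The sketch below only builds that configuration with the JDK: the class name `ProducerConfigSketch` and the broker addresses are placeholders, while the property keys are standard Kafka producer settings. Creating the producer itself requires the kafka-clients library and is shown only as a comment.

```java
import java.util.Properties;

final class ProducerConfigSketch {

    // Build the configuration that would be passed to the KafkaProducer constructor.
    // The broker addresses are placeholders supplied by the caller.
    static Properties producerConfig(String bootstrapServers, String acks) {
        Properties config = new Properties();
        config.setProperty("bootstrap.servers", bootstrapServers);
        config.setProperty("acks", acks); // "0", "1", or "all"
        config.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        config.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return config;
    }

    public static void main(String[] args) {
        Properties config = producerConfig("broker1:9092,broker2:9092", "all");
        // With the kafka-clients library on the classpath, the producer would be
        // created with: new KafkaProducer<String, String>(config)
        config.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Choosing "all" trades throughput for durability, while "0" does the opposite, which matches the ordering in the list above.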
In Kafka, the partitioner determines how records map to partitions. Use the mapping to ensure the order of records
within a partition and manage the balance of messages across partitions. The default partitioner uses the entire key
to determine which partition a message corresponds to. Records with the same key are always mapped to the same
partition (assuming the number of partitions does not change for a topic). Consider writing a custom partitioner if you
have information about how your records are distributed that can produce more efficient load balancing across
partitions. A custom partitioner lets you take advantage of the other data in the record to control partitioning.
Kafka Consumers
Kafka consumers are the subscribers responsible for reading records from one or more topics and one or more partitions
of a topic. Subscription to a topic can be manual or automatic; typically, this means writing a
program using the KafkaConsumer API.
To instantiate a consumer:
The KafkaConsumer class has two generic type parameters. Just as producers can send data (the values) with keys,
the consumer can read data by keys. In this example both the keys and values are strings. If you define different types,
you need to define a deserializer to accommodate the alternate types. For deserializers you need to implement the
org.apache.kafka.common.serialization.Deserializer interface.
Subscribing to a topic
Subscribe to a topic using the subscribe() method:
kafkaConsumer.subscribe(Collections.singletonList(topic), rebalanceListener);
Here we specify a list of topics that we want to consume from and a 'rebalance listener.' Rebalancing is an important
part of the consumer's life. Whenever the cluster or the consumers’ state changes, a rebalance will be issued. This will
ensure that all the partitions are assigned to a consumer.
After subscribing to a topic, the consumer polls to see if there are new records:
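while (true) {
    // poll(Duration) assumes kafka-clients 2.0 or later; the 100 ms timeout is illustrative
    ConsumerRecords<String, String> data = kafkaConsumer.poll(Duration.ofMillis(100));
    // do something with 'data'
    kafkaConsumer.commitSync(); // commit offsets only after processing completes
}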
The poll returns multiple records that can be processed by the client. After processing the records the client commits
offsets synchronously, thus waiting until processing completes before continuing to poll.
The last important point is to save the progress. This can be done with the commitSync() and commitAsync() methods.
Auto commit is not recommended; manual commit is appropriate in the majority of use cases.
Startup Protocol
As mentioned before, consumers usually work in groups. So a major part of the startup process is spent
figuring out the consumer group.
At startup, the first step is to match protocol versions. It is possible that the broker and the consumer are of different
versions (the broker is older and the consumer is newer, or vice versa). This matching is done by the API_VERSIONS request.
The next step is to collect cluster information, such as the addresses of all the brokers (prior to this point we used the
bootstrap server as a reference), partition counts, and partition leaders. This is done in the METADATA request.
After acquiring the metadata, the consumer has the information needed to join the group. By this time on the broker
side, a coordinator has been selected per consumer group. The consumers must find their coordinator with the
After finding the coordinator, the consumer(s) are ready to join the group. Every consumer in the group sends their
own member-specific metadata to the coordinator in the JOIN_GROUP request. The coordinator waits until all the
consumers have sent their request, then assigns a leader for the group. The response, plus the collected metadata,
is sent to the leader, so it knows about its group.
The remaining step is to assign partitions to consumers and propagate this state. Similar to the previous request, all
consumers send a SYNC_GROUP request to the coordinator; the leader provides the assignments in this request. After
it receives the sync request from each group member, the coordinator propagates this member state in the response.
By the end of this step, the consumers are ready and can start consuming.
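The assignment that the group leader computes can be pictured with a simplified round-robin sketch. This is an assumption-laden illustration: Kafka's built-in assignors are more elaborate, and the class `GroupAssignmentSketch` and the member names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified round-robin assignment; Kafka's built-in assignors are more
// elaborate, and the member names here are hypothetical.
final class GroupAssignmentSketch {

    // Spread partition indexes 0..partitionCount-1 across the members in turn.
    static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one member");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three members sharing five partitions.
        System.out.println(assign(Arrays.asList("consumer-a", "consumer-b", "consumer-c"), 5));
        // If consumer-c leaves the group, the assignment is recomputed for the remaining members.
        System.out.println(assign(Arrays.asList("consumer-a", "consumer-b"), 5));
    }
}
```

Every partition ends up with exactly one owner, which is the property the group protocol has to preserve whenever membership changes.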
Consumption Protocol
When consuming, the first step is to query where the consumer should start. This is done in the OFFSET_FETCH request.
This is not mandatory: the consumer can also provide the offset manually. After this, the consumer is free to pull data
from the broker. Data consumption happens in FETCH requests. These are long-poll requests: they are answered
only when the broker has enough data, so a request can remain outstanding for an extended period of time.
From time to time, the application must save the offsets, either manually or automatically, in an OFFSET_COMMIT
request and also send heartbeats in HEARTBEAT requests. The former ensures that the position is saved, while the
latter ensures that the coordinator knows that the consumer is alive.
Shutdown Protocol
The last step when consumption is done is to shut down the consumer gracefully. This is done in a single step,
the LEAVE_GROUP request.
Rebalancing Partitions
You may notice that there are multiple points in the protocol between consumers and brokers where failures can
occur. There are also points in the normal operation of the system where you need to change the consumer group
assignments, for example, to consume a new partition or to respond to a consumer going offline. The process of
responding to changes in cluster information is called a rebalance. It can occur in the following cases:
• A consumer leaves. It can be a software failure where the session times out or a connection stalls for too long,
but it can also be a graceful shutdown.
• A consumer joins. It can be a new consumer or an old one that just recovered from a software failure (automatically
or manually).
• A partition is adjusted. A partition can go offline because of a broker failure, or it can come back online.
Alternatively, an administrator can add partitions to or remove partitions from the broker. In these cases the consumers
must reassign who is consuming.
• The cluster is adjusted. When a broker goes offline, the partitions that are led by this broker will be reassigned.
In turn the consumers must rebalance so that they consume from the new leader. When a broker comes back,
then eventually a preferred leader election happens which restores the original leadership. The consumers must
follow this change as well.
On the consumer side, this rebalance is propagated to the client via the ConsumerRebalanceListener interface. It
has two methods. The first, onPartitionsRevoked, will be invoked when any partition goes offline. This call happens
before the changes are reflected in any of the consumers, so this is the chance to save offsets if manual offset commit
is used. On the other hand, onPartitionsAssigned is invoked after partition reassignment. This allows the
programmer to detect which partitions are currently assigned to the current consumer. Complete examples can be
found in the development section.
In Kafka, retries typically happen only for certain kinds of errors. When a retriable error is returned, the clients are
constrained by two factors: the timeout period and the backoff period.
The timeout period determines how long the consumer can retry the operation. The backoff period determines how often
the consumer should retry. There is no generic approach for the "number of retries." The number of retries is usually
controlled by the timeout period.
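The interaction of the two periods can be sketched as follows. This is a minimal illustration that uses only the JDK, not the retry code of the Kafka clients; the class BoundedRetry and its retry method are hypothetical names. It shows why the number of retries results from the timeout and backoff values instead of being set directly.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of the Kafka client API. */
final class BoundedRetry {

    /**
     * Runs the attempt until it succeeds or the timeout period is used up,
     * waiting for the backoff period between attempts.
     * Returns true if an attempt succeeded, false if the timeout expired.
     */
    static boolean retry(BooleanSupplier attempt, long timeoutMs, long backoffMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            // Give up when there is no time left for another backoff.
            if (remainingNanos < backoffMs * 1_000_000L) {
                return false;
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The simulated operation fails twice with a retriable error, then succeeds.
        boolean ok = retry(() -> ++calls[0] >= 3, 1000, 10);
        System.out.println("succeeded=" + ok + " attempts=" + calls[0]);
    }
}
```

In this sketch a timeout of 1000 ms and a backoff of 10 ms allow roughly 100 attempts before the operation is abandoned.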
In releases before version 2.0 of CDK Powered by Apache Kafka, the same metadata was located in ZooKeeper. The
new model removes the dependency and load from ZooKeeper. In the old approach:
• The consumers save their offsets in a "consumer metadata" section of ZooKeeper.
• With most Kafka setups, there are often a large number of Kafka consumers. The resulting client load on ZooKeeper
can be significant, therefore this solution is discouraged.
Kafka Brokers
This section covers how a broker operates in greater detail. As we go over these details, we will
illustrate how these pieces can cause brokers to have issues.
Leader Positions
In the baseline example, each broker shown has three partitions per topic. In the figure above, the Kafka cluster has
well balanced leader partitions. Recall the following:
• Producer writes and consumer reads occur at the partition level
• Leader partitions are responsible for ensuring that the follower partitions keep their records in sync
In the baseline example, since the leader partitions were evenly distributed, most of the time the load to the overall
Kafka cluster will be relatively balanced.
In the example below, since a large chunk of the leaders for Topic A and Topic B are on Broker 1, a lot more of the
overall Kafka workload will occur at Broker 1. This will cause a backlog of work, which slows down the cluster throughput,
which will worsen the backlog.
Even if a cluster starts with perfectly balanced topics, broker failures can cause these imbalances: if the leader of a
partition goes down, one of the replicas becomes the leader. When the original (preferred) leader comes back, it
regains leadership only if automatic leader rebalancing is enabled; otherwise the node becomes a replica and
the cluster becomes imbalanced.
In-Sync Replicas
Let’s look at Topic A from the previous example with follower partitions:
• Broker 1 has six leader partitions, broker 2 has two leader partitions, and broker 3 has one leader partition.
• Assuming a replication factor of 3.
Assuming all replicas are in-sync, then any leader partition can be moved from Broker 1 to another broker without
issue. However, in the case where some of the follower partitions have not caught up, then the ability to change leaders
or have a leader election will be hampered.
Topic Configuration
We already introduced the concept of topics. When managing a Kafka cluster, configuring a topic can require some
planning. For small clusters or low record throughput, topic planning isn’t particularly tricky, but as you scale to
large clusters and high record throughput, such planning becomes critical.
Topic Creation
To be able to use a topic, it has to be created. This can happen automatically or manually. When enabled, the Kafka
cluster creates topics on demand.
Topic Properties
There are numerous properties that influence how topics are handled by the cluster. These can be set with the
kafka-topics tool on topic creation or later on with kafka-configs. The most commonly used properties are:
• min.insync.replicas: specifies how many brokers have to replicate the records before the leader sends back
an acknowledgment to the producer (if the producer property acks is set to all). With a replication factor of 3, a
minimum in-sync replicas of 2 guarantees a higher level of durability. It is not recommended that you set this
value equal to the replication factor as it makes producing to the topic impossible if one of the brokers is temporarily
unavailable.
• retention.bytes and determines when a record is considered outdated. When data stored
in one partition exceeds given limits, broker starts a cleanup to save disk space.
• segment.bytes and determines how much data is stored in the same log segment (that is, in the
same file). If any of these limits is reached, a new log segment is created.
• unclean.leader.election.enable: if true, replicas that are not in-sync may be elected as new leaders. This
only happens when there are no live replicas in-sync. As enabling this feature may result in data loss, it should be
switched on only if availability is more important than durability.
• cleanup.policy: either delete or compact. If delete is set, old log segments will be deleted. Otherwise, only
the latest record is retained. This process is called log compaction. This is covered in greater detail in the Record
Management on page 59 section.
If you do not specify these properties, the prevailing broker level configuration will take effect. A complete list of
properties can be found in the Topic-Level Configs section of the Apache Kafka documentation.
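The advice on min.insync.replicas above can be illustrated with a small model. This is a sketch under simplifying assumptions: the class MinIsrCheck is hypothetical, and it models only the comparison between the number of in-sync replicas and min.insync.replicas for a producer that uses acks=all. It is not broker code.

```java
/** Hypothetical model of the min.insync.replicas check; not Kafka code. */
final class MinIsrCheck {

    /** With acks=all, a write is accepted only if enough replicas are in sync. */
    static boolean canAcknowledge(int inSyncReplicas, int minInSyncReplicas) {
        return inSyncReplicas >= minInSyncReplicas;
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        // One broker is temporarily down, so one replica drops out of the in-sync set.
        int inSyncAfterOneFailure = replicationFactor - 1;
        for (int min = 2; min <= replicationFactor; min++) {
            System.out.println("min.insync.replicas=" + min
                    + " -> producing possible: " + canAcknowledge(inSyncAfterOneFailure, min));
        }
    }
}
```

With a replication factor of 3 and one broker down, the model accepts writes for a minimum of 2 but rejects them for a minimum of 3, which is why setting the value equal to the replication factor is discouraged.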
Partition Management
Partitions are at the heart of how Kafka scales performance. Some of the administrative issues around partitions can
be some of the biggest challenges in sustaining high performance.
When creating a topic, you specify which brokers should have a copy of which partition or you specify replication factor
and number of partitions and the controller generates a replica assignment for you. If there are multiple brokers that
are assigned a partition, the first one in the list is always the preferred leader.
Whenever the leader of a partition goes down, Kafka moves leadership to another broker. Whether this is possible
depends on the current set of in-sync replicas and the value of unclean.leader.election.enable. However, no
new Kafka broker will start to replicate the partition to reach replication factor again. This is to avoid unnecessary load
on brokers when one of them is temporarily down. Kafka will regularly try to balance leadership between brokers by
electing the preferred leader. But this balance is based on number of leaderships and not throughput.
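The rule that the first broker in a replica list is the preferred leader can be sketched as follows. The class PreferredLeaders and the sample assignment are hypothetical and serve only as an illustration; the sketch counts leaderships per broker, which is the quantity Kafka balances, and says nothing about throughput.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of preferred leaders; not Kafka code. */
final class PreferredLeaders {

    /** Counts, per broker ID, the partitions for which that broker is listed first. */
    static Map<Integer, Integer> countByBroker(List<List<Integer>> replicaAssignment) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> replicas : replicaAssignment) {
            int preferredLeader = replicas.get(0); // the first replica in the list
            counts.merge(preferredLeader, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample assignment (assumption): four partitions, replication factor 3, brokers 1 to 3.
        List<List<Integer>> assignment = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(2, 3, 1),
                Arrays.asList(3, 1, 2),
                Arrays.asList(1, 3, 2));
        System.out.println(countByBroker(assignment));
    }
}
```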
Partition Reassignment
Some cases require manual reassignment of partitions:
• If the initial distribution of partitions and leaderships creates an uneven load on brokers.
• If you want to add or remove brokers from the cluster.
Use the kafka-reassign-partitions tool to move partitions between brokers. The typical workflow consists of the
following steps:
• Generate a reassignment file by specifying the topics to move and the brokers to move them to (by passing
--topics-to-move-json-file and --broker-list to the --generate command).
• Optionally edit the reassignment file and verify it with the tool.
• Reassign the partitions (with the --execute option).
• Verify that the process has finished as intended (with the --verify option).
Note: When specifying throttles for inter broker communication, make sure you use the command
with --verify option to remove limitations on replication speed.
Adding Partitions
You can use the kafka-topics tool to increase the number of partitions in a given topic. However, note that adding
partitions will in most cases break the guarantee of preserving the order of records with the same key, because it
changes which partition a record key is produced to. Although the order of records is preserved for both the old partition
the key was produced to and the new one, it still might happen that records from the new partition are consumed
before records from the old one.
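The effect can be seen in a simplified model of key-based partitioning. Note the assumption: the sketch uses String.hashCode as a stand-in hash, whereas the default Kafka partitioner hashes the serialized key with a different function. The class KeyPartitionDemo is hypothetical. What carries over is that the target partition is derived from the hash modulo the partition count.

```java
/** Hypothetical, simplified key partitioner; uses String.hashCode, not Kafka's hash. */
final class KeyPartitionDemo {

    /** Maps a key to a partition as hash modulo the number of partitions. */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // Compare the target partition before (3) and after (4) adding a partition.
        for (String key : new String[] {"a", "b", "c"}) {
            System.out.println(key + ": " + partitionFor(key, 3) + " -> " + partitionFor(key, 4));
        }
    }
}
```

In this model the key "c" moves from partition 0 to partition 3 when the topic grows from three to four partitions, so older and newer records for that key end up in different partitions.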
The controller is one of the brokers that has additional partition and replica management responsibilities. It is
involved whenever partition metadata or state is changed, such as when:
• Topics or partitions are created or deleted.
• Brokers join or leave the cluster and partition leader or replica reassignment is needed.
It also tracks the list of in-sync replicas (ISRs) and maintains broker, partition, and ISR data in ZooKeeper.
Controller Election
Any of the brokers can play the role of the controller, but in a healthy cluster there is exactly one controller. Normally
this is the broker that started first, but there are certain situations when a re-election is needed:
• If the controller is shut down or crashes.
• If it loses connection to ZooKeeper.
When a broker starts or participates in controller reelection, it will attempt to create an ephemeral node
(“/controller”) in ZooKeeper. If it succeeds, the broker becomes the controller. If it fails, there is already a controller,
but the broker will watch the node.
If the controller loses its connection to ZooKeeper or stops, ZooKeeper removes the ephemeral node and the brokers
get a notification to start a controller election.
Every controller election will increase the “controller epoch”. The controller epoch is used to detect situations when
there are multiple active controllers: if a broker gets a message with a lower epoch than the result of the last election,
it can be safely ignored. It is also used to detect a “split brain” situation when multiple nodes believe that they are in
the controller role.
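The epoch comparison described above can be sketched as a small guard. This is an illustration only: the class EpochGuard is hypothetical and does not reflect how the broker implements the check internally.

```java
/** Hypothetical guard that drops controller messages with a stale epoch. */
final class EpochGuard {

    private int latestEpoch;

    EpochGuard(int initialEpoch) {
        this.latestEpoch = initialEpoch;
    }

    /**
     * Returns true if a message with the given controller epoch should be processed.
     * Messages from an older epoch are rejected; a newer epoch is remembered.
     */
    synchronized boolean accept(int messageEpoch) {
        if (messageEpoch < latestEpoch) {
            return false; // sent by a controller that has since been replaced
        }
        latestEpoch = messageEpoch;
        return true;
    }

    public static void main(String[] args) {
        EpochGuard guard = new EpochGuard(5);
        System.out.println(guard.accept(5)); // current controller
        System.out.println(guard.accept(6)); // controller elected later
        System.out.println(guard.accept(5)); // old controller that still believes it is active
    }
}
```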
Having 0 or 2+ controllers means the cluster is in a critical state, as broker and partition state changes are blocked.
Therefore it’s important to ensure that the controller has a stable connection to ZooKeeper to avoid controller elections
as much as possible.
Kafka Integration
Kafka Security
Client-Broker Security with TLS
Kafka allows clients to connect over TLS. By default, TLS is disabled, but can be turned on as needed.
openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
The generated CA is a public-private key pair and certificate used to sign other certificates.
Add the generated CA to the client truststores so that clients can trust this CA:
Note: If you configure Kafka brokers to require client authentication by setting ssl.client.auth
to be requested or required on the Kafka brokers config, you must provide a truststore for the Kafka
brokers as well. The truststore must have all the CA certificates by which the clients keys are signed.
The keystore created in step 1 stores each machine’s own identity. In contrast, the truststore of a
client stores all the certificates that the client should trust. Importing a certificate into a truststore
means trusting all certificates that are signed by that certificate. This attribute is called the chain of
trust. It is particularly useful when deploying SSL on a large Kafka cluster. You can sign all certificates
in the cluster with a single CA, and have all machines share the same truststore that trusts the CA.
That way, all machines can authenticate all other machines.
• keystore: the location of the keystore
openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days validity \
-CAcreateserial -passin pass:ca-password
• ca-cert: the certificate of the CA
• ca-key: the private key of the CA
• cert-signed: the signed certificate of the server
• ca-password: the passphrase of the CA
3. Import both the certificate of the CA and the signed certificate into the keystore:
The following Bash script demonstrates the steps described above. One of the commands assumes a password of
SamplePassword123, so either use that password or edit the command before running it.
#Step 1
keytool -keystore server.keystore.jks -alias localhost -validity 365 -genkey
#Step 2
openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert
keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert
#Step 3
keytool -keystore server.keystore.jks -alias localhost -certreq -file cert-file
openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days 365 \
-CAcreateserial -passin pass:SamplePassword123
keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert
keytool -keystore server.keystore.jks -alias localhost -import -file cert-signed
where kafka-broker-host-name is the FQDN of the broker that you selected from the Instances page in Cloudera
Manager. In the above sample configurations we used PLAINTEXT and SSL protocols for the SSL enabled brokers.
For information about other supported security protocols, see Using Kafka’s Inter-Broker Security on page 33.
4. Repeat the previous step for each broker.
The advertised.listeners configuration is needed to connect the brokers from external clients.
5. Deploy the above client configurations and rolling restart the Kafka service from Cloudera Manager.
Kafka CSD auto-generates listeners for Kafka brokers, depending on your SSL and Kerberos configuration. To enable
SSL for Kafka installations, do the following:
1. Turn on SSL for the Kafka service by turning on the ssl_enabled configuration for the Kafka CSD.
2. Set as SSL, if Kerberos is disabled; otherwise, set it as SASL_SSL.
The following SSL configurations are required on each broker. Each of these values can be set in Cloudera Manager.
Be sure to replace this example with the truststore password.
For instructions, see Changing the Configuration of a Service or Role Instance.
Note: Due to import regulations in some countries, Oracle implementation of JCA limits the strength
of cryptographic algorithms. If you need stronger algorithms, you must obtain the JCE Unlimited
Strength Jurisdiction Policy Files and install them in the JDK/JRE as described in JCA Providers
After SSL is configured on your broker, the logs should show an endpoint for SSL communication:
You can also check the SSL communication to the broker by running the following command:
This check can indicate that the server keystore and truststore are set up properly.
{variable sized random bytes}
subject=/C=US/ST=CA/L=Palo Alto/O=org/OU=org/CN=Franz Kafka
issuer=/C=US/ST=CA/L=Palo Alto
If the certificate does not appear, or if there are any other error messages, your keystore is not set up properly.
If client authentication is required, a keystore must be created as in step 1, it needs to be signed by the CA as in step
3, and you must also configure the following properties:
Other configuration settings might also be needed, depending on your requirements and the broker configuration:
• ssl.provider (Optional). The name of the security provider used for SSL connections. Default is the default
security provider of the JVM.
• ssl.cipher.suites (Optional). A cipher suite is a named combination of authentication, encryption, MAC, and
a key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network
protocol.
• ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1. This property should list at least one of the protocols
configured on the broker side.
• ssl.truststore.type=JKS
• ssl.keystore.type=JKS
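The client-side settings discussed above can be collected in a java.util.Properties object before a producer or consumer is created. The following is a minimal sketch: the class TlsClientProps, the file paths, and the passwords are hypothetical placeholders. The keystore entries are added only when the brokers require client authentication.

```java
import java.util.Properties;

/** Hypothetical helper that assembles TLS settings for a Kafka client. */
final class TlsClientProps {

    /** Pass null for keystorePath when the brokers do not require client authentication. */
    static Properties build(String truststorePath, String truststorePassword,
                            String keystorePath, String keystorePassword, String keyPassword) {
        Properties props = new Properties();
        props.setProperty("security.protocol", "SSL");
        // The truststore holds the CA certificate that signed the broker certificates.
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", truststorePassword);
        if (keystorePath != null) {
            // The keystore holds the client's own signed certificate and private key.
            props.setProperty("ssl.keystore.location", keystorePath);
            props.setProperty("ssl.keystore.password", keystorePassword);
            props.setProperty("ssl.key.password", keyPassword);
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder paths and passwords; replace them with values for your environment.
        Properties props = build("/path/to/client.truststore.jks", "truststore-password",
                "/path/to/client.keystore.jks", "keystore-password", "key-password");
        props.list(System.out);
    }
}
```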
• Enabling SSL encryption for client-broker communication but keeping broker-broker communication as PLAINTEXT.
Because SSL has performance overhead, you might want to keep inter-broker communication as PLAINTEXT if
your Kafka brokers are behind a firewall and not susceptible to network snooping.
• Migrating from a non-secure Kafka configuration to a secure Kafka configuration without requiring downtime.
Use a rolling restart and keep set to a protocol that is supported by all
brokers until all brokers are updated to support the new protocol.
For example, if you have a Kafka cluster that needs to be configured to enable Kerberos without downtime, follow
these steps:
1. Set to PLAINTEXT.
2. Update the Kafka service configuration to enable Kerberos.
3. Perform a rolling restart.
Kafka 2.0 and higher supports the combinations of protocols listed here.
SSL Kerberos
SSL Yes No
These protocols can be defined for broker-to-client interaction and for broker-to-broker interaction. The property allows the broker-to-broker communication protocol to be different than the
broker-to-client protocol, allowing rolling upgrades from non-secure to secure clusters. In most cases, set to the protocol you are using for broker-to-client communication. Set to a protocol different than the broker-to-client protocol only when you are
performing a rolling upgrade from a non-secure to a secure Kafka cluster.
KafkaClient { required
If you use a keytab, use this configuration. To generate keytabs, see Step 6: Get or Create a Kerberos Principal for
Each User Account).
KafkaClient { required
kinit user
11. Verify that the jaas.conf file is used by setting the environment.
export KAFKA_OPTS=""
Note: Cloudera's distribution of Kafka can make use of LDAP-based user groups when the LDAP
directory is synchronized to Linux via tools such as SSSD. CDK does not support direct integration with
LDAP, either through direct Kafka's LDAP authentication, or via Hadoop's group mapping (when is set to LdapGroupMapping). For more information, see Configuring LDAP
Group Mappings.
Authorizable Resources
Authorizable resources are resources or entities in a Kafka cluster that require special permissions for a user to be able
to perform actions on them. Kafka has four authorizable resources.
• Cluster: controls who can perform cluster-level operations such as creating or deleting a topic. This resource can
only have one value, kafka-cluster, as one Kafka cluster cannot have more than one cluster resource.
• Topic: controls who can perform topic-level operations such as producing and consuming topics. Its value must
match exactly the topic name in the Kafka cluster.
With CDH 5.15.0 and CDK 3.1 and later, wildcards (*) can be used to refer to any topic in the privilege.
• Consumergroup: controls who can perform consumergroup-level operations such as joining or describing a
consumergroup. Its value must exactly match the of a consumergroup.
With CDH 5.14.1 and later, you can use a wildcard (*) to refer to any consumer groups in the privilege. This resource
is useful when used with Spark Streaming, where a generated may be needed.
• Host: controls from where specific operations can be performed. Think of this as a way to achieve IP filtering in
Kafka. You can set the value of this resource to the wildcard (*), which represents all hosts.
Note: Only IP addresses should be specified in the host component of Kafka Sentry privileges;
hostnames are not supported.
Authorized Actions
You can perform multiple actions on each resource. The following operations are supported by Kafka, though not all
actions are valid on all resources.
• ALL is a wildcard action, and represents all possible actions on a resource.
• read
• write
• create
• delete
• alter
• describe
• clusteraction
Authorizing Privileges
Privileges define what actions are allowed on a resource. A privilege is represented as a string in Sentry. The following
rules apply to a valid privilege.
• Can have at most one Host resource. If you do not specify a Host resource in your privilege string, Host=* is
assumed.
• Must have exactly one non-Host resource.
• Must have exactly one action specified at the end of the privilege string.
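The three rules above can be expressed as a small validator. This is a sketch under stated assumptions, not Sentry code: it assumes that a privilege string consists of Key=Value components separated by ->, as in Host=127.0.0.1->Topic=testTopic->action=write, and that component names are matched case-insensitively. The class PrivilegeCheck is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical validator for the privilege string rules; not part of Sentry. */
final class PrivilegeCheck {

    private static final Set<String> NON_HOST_RESOURCES =
            new HashSet<>(Arrays.asList("cluster", "topic", "consumergroup"));

    static boolean isValid(String privilege) {
        String[] parts = privilege.split("->");
        int hosts = 0;
        int otherResources = 0;
        for (int i = 0; i < parts.length; i++) {
            String[] keyValue = parts[i].split("=", 2);
            if (keyValue.length != 2 || keyValue[1].isEmpty()) {
                return false; // every component must look like Key=Value
            }
            String key = keyValue[0].toLowerCase(Locale.ROOT); // assumption: case-insensitive names
            boolean isLast = (i == parts.length - 1);
            if (isLast) {
                if (!key.equals("action")) {
                    return false; // the string must end with the action
                }
            } else if (key.equals("host")) {
                hosts++;
            } else if (NON_HOST_RESOURCES.contains(key)) {
                otherResources++;
            } else {
                return false; // unknown component, or an action that is not at the end
            }
        }
        return hosts <= 1 && otherResources == 1;
    }

    public static void main(String[] args) {
        System.out.println(isValid("Host=127.0.0.1->Topic=testTopic->action=write"));
        System.out.println(isValid("Host=127.0.0.1->action=write"));
    }
}
```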
For example, the following are valid privilege strings:
kafka-sentry -lr
If Sentry privileges caching is enabled, as recommended, the new privileges you assign take some time to appear in
the system. The time is the time-to-live interval of the Sentry privileges cache, which is set using By default, this interval is 30 seconds. For test clusters, it is beneficial to have
changes appear within the system as fast as possible, therefore, Cloudera recommends that you either use a lower
time interval, or disable caching with sentry.kafka.caching.enable.
1. Allow users in testGroup to write to testTopic from localhost, which allows users to produce to testTopic.
Users need both write and describe permissions.
4. Create testTopic.
Now you can produce to and consume from the Kafka cluster.
1. Produce to testTopic.
Note that you have to pass a configuration file,, with information on JAAS configuration
and other Kerberos authentication related information. See SASL Configuration for Kafka Clients.
3. Allow users in testGroup to describe testTopic from localhost, which the user creates and uses.
5. Allow users in testGroup to read from a consumer group, testconsumergroup, that it will start and join.
6. Allow users in testGroup to read from testTopic from localhost and to consume from testTopic.
This is a message
This is another message
Setting just Sentry to DEBUG mode avoids the debug output from undesired dependencies, such as Jetty.
• Run the Kafka client or Kafka CLI with the required arguments and capture the Kafka log, which should be similar
This log information can provide insight into which privilege is not assigned to a user, causing a particular operation
to fail.
Important: You cannot install an old Kafka parcel on a new CDH 6.x cluster.
= current_Kafka_version
log.message.format.version = current_Kafka_version
Make sure you enter full Kafka version numbers with three values, such as 0.10.0. Otherwise, you'll see an
error similar to the following:
At this point, the brokers are running in compatibility mode with older clients. They can run in this mode indefinitely. If
you do upgrade clients, after all clients are upgraded, remove the Safety Valve properties and restart the cluster.
CDH Kafka Version | Upstream Kafka Version | Detailed Upstream Instructions
CDH 6.1.0 | 2.0.0 | Upgrading from 0.8.x through 1.1.x to 2.0.0
CDH 6.0.0 | 1.0.1 | Upgrading from 0.8.x through 0.11.0.x to 1.0.0
Kafka 3.1.0 | 1.0.1 | Upgrading from 0.8.x through 0.11.0.x to 1.0.0
Kafka 3.0.0 | 0.11.0 | Upgrading from 0.8.x through 0.10.2.x to
Kafka 2.2 | 0.10.2 | Upgrading from 0.8.x through 0.10.1.x to
Kafka 2.1 | 0.10.0 | Upgrading from 0.8.x through
Kafka 2.0 | 0.9.0 | Upgrading from 0.8.0 through 0.8.2.x to
While the diagram shows copying to one topic, Mirror Maker’s main mode of operation is running continuously, copying
one or more topics from the source cluster to the destination cluster.
Keep in mind the following design notes when configuring Mirror Maker:
• Mirror Maker runs as a single process.
• Mirror Maker can run with multiple consumers that read from multiple partitions in the source cluster.
• Mirror Maker uses a single producer to copy messages to the matching topic in the destination cluster.
Consumer/Producer Compatibility
The Mirror Maker consumer needs to be client compatible with the source cluster. The Mirror Maker producer needs
to be client compatible with the destination cluster.
See Client/Broker Compatibility Across Kafka Versions on page 40 for more details about what it means to be
compatible.
Consumer setting
• auto.commit.enable=false
MirrorMaker setting
• abort.on.send.failure=true
Note: Do not configure a Kafka source to send data to a Kafka sink. If you do, the Kafka source sets
the topic in the event header, overriding the sink configuration and creating an infinite loop, sending
messages back and forth between the source and sink. If you need to use both a Kafka source and a
sink, use an interceptor to modify the event header and set a different topic.
For information on configuring Kafka to securely communicate with Flume, see Configuring Flume Security with Kafka.
The following sections describe how to configure Kafka sub-components for directing topics to long-term storage:
Use the Kafka source to stream data in Kafka topics to Hadoop. The Kafka source can be combined with any Flume
sink, making it easy to write Kafka data to HDFS, HBase, and Solr.
The following Flume configuration example uses a Kafka source to send data to an HDFS sink:
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.zookeeperConnect =
tier1.sources.source1.topic = weblogs
tier1.sources.source1.groupId = flume
tier1.sources.source1.channels = channel1
tier1.sources.source1.interceptors = i1
tier1.sources.source1.interceptors.i1.type = timestamp = 100
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /tmp/kafka/%{topic}/%y-%m-%d
tier1.sinks.sink1.hdfs.rollInterval = 5
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.channel = channel1
For higher throughput, configure multiple Kafka sources to read from the same topic. If you configure all the sources
with the same groupID, and the topic contains multiple partitions, each source reads data from a different set of
partitions, improving the ingest rate.
The following table describes parameters that the Kafka source supports. Required properties are listed in bold.
1. auto.commit.enable is set to false by the source, committing every batch. For improved performance, set
this parameter to true using the setting. Note that this change can lead to data
loss if the source goes down before committing.
2. is set to 10, so when Flume polls Kafka for new data, it waits no more than 10 ms for
the data to be available. Setting this parameter to a higher value can reduce CPU utilization due to less frequent
polling, but the trade-off is that it introduces latency in writing batches to the channel.
Kafka Sinks
Use the Kafka sink to send data to Kafka from a Flume source. You can use the Kafka sink in addition to Flume sinks,
such as HBase or HDFS.
The following Flume configuration example uses a Kafka sink with an exec source:
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
tier1.sources.source1.type = exec
tier1.sources.source1.command = /usr/bin/vmstat 1
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000
tier1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
tier1.sinks.sink1.topic = sink1
tier1.sinks.sink1.brokerList =,
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.batchSize = 20
The following table describes parameters the Kafka sink supports. Required properties are listed in bold.
The Kafka sink uses the topic and key properties from the FlumeEvent headers to determine where to send events
in Kafka. If the header contains the topic property, that event is sent to the designated topic, overriding the configured
topic. If the header contains the key property, that key is used to partition events within the topic. Events with the
same key are sent to the same partition. If the key parameter is not specified, events are distributed randomly to
partitions. Use these properties to control the topics and partitions to which events are sent through the Flume source
or interceptor.
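The routing rules above can be summarized in a few lines of code. This is a sketch of the decision logic only, not the Flume sink implementation; the class SinkRouting and its method names are hypothetical, and event headers are modeled as a plain map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of how event headers select the topic and key. */
final class SinkRouting {

    /** A topic header, if present, overrides the topic configured on the sink. */
    static String resolveTopic(Map<String, String> headers, String configuredTopic) {
        String fromHeader = headers.get("topic");
        return fromHeader != null ? fromHeader : configuredTopic;
    }

    /** Returns the partitioning key, or null if the event has no key header. */
    static String resolveKey(Map<String, String> headers) {
        return headers.get("key");
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("topic", "audit");   // hypothetical topic set by a source or interceptor
        headers.put("key", "host-17");   // hypothetical key; equal keys go to the same partition
        System.out.println(resolveTopic(headers, "sink1") + " / " + resolveKey(headers));
    }
}
```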
Kafka Channels
CDH includes a Kafka channel to Flume in addition to the existing memory and file channels. You can use the Kafka
channel:
• To write to Hadoop directly from Kafka without using a source.
• To write to Kafka directly from Flume sources without additional buffering.
• As a reliable and highly available channel for any source/sink combination.
The following Flume configuration uses a Kafka channel with an exec source and HDFS sink:
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
tier1.sources.source1.type = exec
tier1.sources.source1.command = /usr/bin/vmstat 1
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type =
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.zookeeperConnect =
tier1.channels.channel1.parseAsFlumeEvent = false
tier1.channels.channel1.topic = channel2 = channel2-grp = earliest
tier1.channels.channel1.kafka.bootstrap.servers =,
tier1.channels.channel1.transactionCapacity = 1000
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /tmp/kafka/channel
tier1.sinks.sink1.hdfs.rollInterval = 5
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.channel = channel1
The following table describes parameters the Kafka channel supports. Required properties are listed in bold.
In Cloudera Manager, on the Flume Configuration page, select the Kafka service you want to connect to. This generates
the following files:
• flume.keytab
• jaas.conf
It also generates security protocol and Kerberos service name properties for the Flume agent configuration. If TLS/SSL
is also configured for Kafka brokers, the setting also adds SSL truststore properties to the beginning of the Flume agent
configuration.
Review the deployed agent configuration and if the defaults do not match your environment (such as the truststore
password), you can override the settings by adding the same property to the agent configuration.
CDH Spark Version | Embedded Kafka Client (Choose based on Kafka cluster) | Apache Kafka Version | Minimum CDH Kafka Version (Remote or Local) | Upstream Integration Guide | API Stability
CDH 6.0 | spark-streaming-kafka-0-10 | 0.10.0 | Kafka 2.1 | Spark 2.2 + Kafka 0.10 | Stable
CDH 6.0 | spark-streaming-kafka-0-8 | | Kafka 2.0 | Spark 2.2 + Kafka 0.8 | Deprecated
Spark 2.2 | spark-streaming-kafka-0-10 | 0.10.0 | Kafka 2.1 | Spark 2.2 + Kafka 0.10 | Experimental
Spark 2.2 | spark-streaming-kafka-0-8 | | Kafka 2.0 | Spark 2.2 + Kafka 0.8 | Stable
Spark 2.1 | spark-streaming-kafka-0-10 | 0.10.0 | Kafka 2.1 | Spark 2.1 + Kafka 0.10 | Experimental
Spark 2.1 | spark-streaming-kafka-0-8 | | Kafka 2.0 | Spark 2.1 + Kafka 0.8 | Stable
Spark 2.0 | spark-streaming-kafka-0-8 | 0.8.2.? | Kafka 2.0 | Spark 2.0 + Kafka | Stable
Note: If multiple applications use the same group and topic, each application receives a subset of
the data.
<project xmlns=""
The example includes Java properties for setting up the client identified in the comments; the functional parts of the
code are in bold. This code is compatible with versions as old as the 0.9.0-kafka-2.0.0 version of Kafka.
package com.cloudera.kafkaexamples;
import java.util.Date;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
props.setProperty(ProducerConfig.ACKS_CONFIG, "1");
Note that this consumer is designed as an infinite loop. In normal operation of Kafka, all the producers could be idle
while consumers are likely to be still running.
The example includes Java properties for setting up the client identified in the comments; the functional parts of the
code are in bold. This code is compatible with versions as old as the 0.9.0-kafka-2.0.0 version of Kafka.
package com.cloudera.kafkaexamples;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
Not Recommended
while (true) {
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
ConsumerRecords<String, String> records = consumer.poll(100);
Similarly, it is recommended that you use one KafkaConsumer and/or KafkaProducer object per thread. Creating
more objects opens multiple ports per broker connection. Overusing ephemeral ports can cause performance issues.
In addition, Cloudera recommends to set and use a fixed for producers and consumers when they are
connecting to the brokers. If this is not done, Kafka will assign a new client id every time a new connection is established,
which can severely increase resource utilization (memory) on the broker side.
while (true) {
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
ConsumerRecords<String, String> records = consumer.poll(100);
// the call below should return quickly in all cases
while (true) {
try {
ConsumerRecords<String, String> records = consumer.poll(100);
} catch (Exception e) {
} finally {
kafka.javaapi.* kafka.api.*
kafka.producer.KeyedMessage kafka.clients.producer.ProducerRecord
• KafkaProducer Javadoc
These Javadoc pages are quite dense with information. They assume you have sufficient background in reliable
computing, networking, multithreading, and distributed systems to use the APIs correctly. While the previous sections
point out many caveats in using the client APIs, the Javadoc (and ultimately the source code) provides a more detailed
Kafka Metrics
Kafka uses Yammer metrics to record internal performance measurements. The metrics are exposed via Java
Management Extensions (JMX) and can be read with a JMX console.
Metrics Categories
There are metrics available in the various components of Kafka. In addition, there are some metrics specific to how
Cloudera Manager and Kafka interact. This table has pointers to both the Apache Kafka metrics names and the Cloudera
Manager metric names.
Common Client
Producer/Consumer Client-to-Broker
Producer Producer
Producer Sender
Mirror Maker Mirror Maker Metrics on page 159 Same as Producer or Consumer tables
Viewing Metrics
Cloudera Manager records most of these metrics and makes them available via Chart Builder.
Because Cloudera Manager cannot track metrics on any clients (that is, producer or consumer), you may wish to use
an alternative JMX console program to check metrics. There are several JMX console options:
• The JDK comes with the jconsole utility.
• VisualVM has a MBeans plugin.
Partition activity
Chart tracking partition activity on a single broker.
kafka_partitions, kafka_under_replicated_partitions
producer or consumer metric
Kafka Administration
This section describes managing a Kafka cluster in production, including:
Record Management
There are two pieces to record management: log segments and the log cleaner.
As part of the general data storage, Kafka rolls logs periodically based on size or time limits. Once either limit is hit, a
new log segment is created with all new data being placed there, while older log segments should generally no
longer change. This helps limit the risk of data loss or corruption to a single segment instead of the entire log.
• log.roll.{ms|hours}: The time period for each log segment. Once the current segment is older than this
value, it goes through log segment rotation.
• log.segment.bytes: The maximum size for a single log segment.
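The roll decision described above can be sketched as a simple check. This is a simplified model, not the broker's implementation: the class and method names are hypothetical, and the real broker applies additional conditions.

```java
// Simplified model of log segment rotation: a segment is rolled as soon as
// either the size limit or the time limit is reached.
final class SegmentRollCheck {

    static boolean shouldRoll(long segmentBytes, long segmentAgeMs,
                              long maxSegmentBytes, long rollMs) {
        return segmentBytes >= maxSegmentBytes || segmentAgeMs >= rollMs;
    }

    public static void main(String[] args) {
        long maxSegmentBytes = 1073741824L;      // log.segment.bytes (1 GiB)
        long rollMs = 7L * 24 * 60 * 60 * 1000;  // log.roll.hours=168, in ms
        System.out.println(shouldRoll(500_000_000L, 3_600_000L, maxSegmentBytes, rollMs));
    }
}
```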
As an alternative to simply removing log segments for a partition, Kafka provides a feature based on the log
cleaner. When the log cleaner is enabled, individual records in older log segments can be managed differently:
• log.cleaner.enable: This is a global setting in Kafka to enable the log cleaner.
• cleanup.policy: This is a per-topic property that is usually set at topic creation time. There are two valid values
for this property, delete and compact.
• This is the retention period for the “head” of the log. Only records
outside of this retention period will be compacted by the log cleaner.
The compact policy, also called log compaction, assumes that the "most recent Kafka record is important." Some
examples include tracking a current email address or tracking a current mailing address. With log compaction, older
records with the same key are removed from a log segment and the latest one is kept. This effectively removes some
offsets from the partition.
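The effect of the compact policy can be illustrated with a small in-memory model. This is a sketch of the idea only, not the log cleaner's code. It assumes the input is in ascending offset order, and the record type, keys, and values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of the compact policy; not the broker's log cleaner code.
final class CompactionSketch {

    // One record in a partition: its offset, key, and value.
    static final class Rec {
        final long offset;
        final String key;
        final String value;

        Rec(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    // Keeps only the latest record for each key, in ascending offset order.
    static List<Rec> compact(List<Rec> log) {
        Map<String, Rec> latestByKey = new LinkedHashMap<>();
        for (Rec rec : log) {
            // Remove first so the key moves to the end of the iteration order.
            latestByKey.remove(rec.key);
            latestByKey.put(rec.key, rec);
        }
        return new ArrayList<>(latestByKey.values());
    }

    public static void main(String[] args) {
        List<Rec> log = Arrays.asList(
            new Rec(0, "alice", "alice@old.example.com"),
            new Rec(1, "bob", "bob@example.com"),
            new Rec(2, "alice", "alice@new.example.com"));
        for (Rec rec : compact(log)) {
            System.out.println(rec.offset + " " + rec.key + " " + rec.value);
        }
    }
}
```

In this example the record at offset 0 is dropped because a newer record with the same key exists at offset 2, so the surviving offsets are 1 and 2.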
Changing the default directory of garbage collection logs is currently not supported. However, you can configure
properties related to garbage collection log rotation with the Kafka Broker Environment Advanced Configuration Snippet (Safety
Valve) property.
1. In Cloudera Manager, go to the Kafka service and click Configuration.
2. Find the Kafka Broker Environment Advanced Configuration Snippet (Safety Valve) property.
3. Add the following line to the property:
Modify the values of as required.
KAFKA_GC_LOG_OPTS="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
3. Change of the new broker to the of the old one both in Cloudera Manager and in
data directory/
4. (Optional) Run rsync to copy files from one broker to another.
See Using rsync to Copy Files from One Broker to Another on page 61.
5. Start up the new broker.
It re-replicates data from the other nodes.
Note that data intensive administration operations such as rebalancing partitions, adding a broker, removing a broker,
or bootstrapping a new machine can cause significant additional load on the cluster.
To avoid performance degradation of business workloads, you can limit the resources that these background processes
can consume by specifying the --throttle parameter when running kafka-reassign-partitions.
rsync -avz
Cloudera recommends setting the value to a relatively high starting point, such as 32,768.
You can monitor the number of file descriptors in use on the Kafka Broker dashboard. In Cloudera Manager:
1. Go to the Kafka service.
2. Select a Kafka Broker.
3. Open Charts Library > Process Resources and scroll down to the File Descriptors chart.
See Viewing Charts for Cluster, Service, Role, and Host Instances.
For a quick video introduction to quotas, see Quotas.
In CDK 2.0 Powered by Apache Kafka and higher, Kafka can enforce quotas on produce and fetch requests. Producers
and consumers can use very high volumes of data. This can monopolize broker resources, cause network saturation,
and generally deny service to other clients and the brokers themselves. Quotas protect against these issues and are
important for large, multi-tenant clusters where a small set of clients using high volumes of data can degrade the user
experience.
Quotas are byte-rate thresholds, defined per client ID. A client ID logically identifies an application making a request.
A single client ID can span multiple producer and consumer instances. The quota is applied for all instances as a single
entity. For example, if a client ID has a produce quota of 10 MB/s, that quota is shared across all instances with that
same ID.
When running Kafka as a service, quotas can enforce API limits. By default, each unique client ID receives a fixed quota
in bytes per second, as configured by the cluster (quota.producer.default, quota.consumer.default). This
quota is defined on a per-broker basis. Each client can publish or fetch a maximum of X bytes per second per broker
before it gets throttled.
The broker does not return an error when a client exceeds its quota, but instead attempts to slow the client down.
The broker computes the amount of delay needed to bring a client under its quota and delays the response for that
amount of time. This approach keeps the quota violation transparent to clients (outside of client-side metrics). This
also prevents clients from having to implement special backoff and retry behavior.
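One way to picture the delay calculation is the following sketch. It is an illustrative model under stated assumptions, not the broker's code: it assumes the rate is measured over a fixed window and that the delay stretches that window just enough to bring the average rate down to the quota. The class name, method name, and window length are hypothetical.

```java
// Illustrative model of quota throttling: how long a response would need to
// be delayed so that the average rate over the window drops to the quota.
final class QuotaDelaySketch {

    static long delayMs(double observedBytesPerSec, double quotaBytesPerSec, long windowMs) {
        if (observedBytesPerSec <= quotaBytesPerSec) {
            return 0L; // Within quota: no delay.
        }
        double excessRatio = (observedBytesPerSec - quotaBytesPerSec) / quotaBytesPerSec;
        return Math.round(excessRatio * windowMs);
    }

    public static void main(String[] args) {
        // A client observed at 15 MB/s against a 10 MB/s quota, 1 second window.
        System.out.println(delayMs(15_000_000, 10_000_000, 1000)); // prints 500
    }
}
```

In the example, 15 MB sent in one second needs 1.5 seconds in total to average 10 MB/s, which is why the sketch yields a 500 ms delay.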
Setting Quotas
You can override the default quota for client IDs that need a higher or lower quota. The mechanism is similar to per-topic
log configuration overrides. Write your client ID overrides to ZooKeeper under /config/clients. All brokers read
the overrides, which are effective immediately. You can change quotas without having to do a rolling restart of the
entire cluster.
By default, each client ID receives an unlimited quota. The following configuration sets the default quota per producer
and consumer client ID to 10 MB/s.
To set quotas using Cloudera Manager, open the Kafka Configuration page and search for Quota. Use the fields provided
to set the Default Consumer Quota or Default Producer Quota. For more information, see Modifying Configuration
Properties Using Cloudera Manager.
Note: Output examples in this document are cleaned and formatted for easier readability.
Tool Notes
connect-distributed Kafka Connect is currently not supported.
kafka-configs Use Cloudera Manager to adjust any broker or security
properties instead of the kafka-configs tool. This tool
should only be used to modify topic properties.
kafka-delete-records Do not use with CDH.
kafka-mirror-maker Use Cloudera Manager to create any CDH Mirror Maker
kafka-preferred-replica-election This tool causes leadership for each partition to be
transferred back to the 'preferred replica'. It can be used
to balance leadership among the servers.
It is recommended to use kafka-reassign-partitions
instead of kafka-preferred-replica-election.
kafka-replay-log-producer Can be used to “rename” a topic.
kafka-replica-verification Validates that all replicas for a set of topics have the same
data. This tool is a “heavy duty” version of the ISR column
of kafka-topics tool.
kafka-server-start Use Cloudera Manager to manage any Kafka host.
export JAVA_HOME=/usr/java/jdk1.8.0_144-cloudera
• Using any ZooKeeper command manually can be very difficult to get right when it comes to interaction with Kafka.
Cloudera recommends that you avoid doing any write operations or ACL modifications in ZooKeeper.
Use the kafka-topics tool to generate a snapshot of topics in the Kafka cluster.
The output lists each topic and basic partition information. Note the following about the output:
• Partition count: The more partitions, the higher the possible parallelism among consumers and producers.
• Replication factor: Shows 1 for no redundancy and higher for more redundancy.
• Replicas and in-sync replicas (ISR): Shows which broker IDs have the partitions and which replicas are current.
There are situations where this tool shows an invalid value for the leader broker ID or the number of ISRs is fewer than
the number of replicas. In those cases, there may be something wrong with those specific topics.
It is possible to change topic configuration properties using this tool. Increasing the partition count, the replication
factor or both is not recommended.
The kafka-configs tool allows you to set and unset properties on topics. Cloudera recommends that you use Cloudera
Manager instead of this tool to change properties on brokers, because this tool bypasses any Cloudera Manager safety
Setting a topic property:
The kafka-console-consumer tool can be useful in a couple of ways:
• Acting as an independent consumer of particular topics. This can be useful to compare results against a consumer
program that you’ve written.
• To test general topic consumption without the need to write any consumer code.
Examples of usage:
This tool is used to write messages to a topic. It is typically not as useful as the console consumer, but it can be useful
when the messages are in a text based format. In general, the usage will be something like:
The basic usage of the kafka-consumer-groups tool is:
This tool is primarily useful for debugging consumer offset issues. The output from the tool shows the log and consumer
offsets for each partition connected to the consumer group corresponding to GROUP_ID. You can see at a glance which
consumers are current with their partition and which ones are behind. From there, you can determine which partitions
(and likely the corresponding brokers) are slow.
Beyond this debugging usage, there are other more advanced options to this tool:
• --execute --reset-offsets SCENARIO_OPTION: Resets the offsets for a consumer group to a particular
value based on the SCENARIO_OPTION flag given.
Valid flags for SCENARIO_OPTION are:
– --to-datetime
– --by-period
– --to-earliest
– --to-latest
– --shift-by
– --from-file
– --to-current
You will likely want to set the --topic flag to restrict this change to a specific topic or a specific set of partitions
within that topic.
This tool can be used to reset all offsets on all topics. This is something you probably won’t ever want to do. It is highly
recommended that you use this command carefully.
This tool provides substantial control over partitions in a Kafka cluster. It is mainly used to balance storage loads across
brokers through the following reassignment actions:
• Change the ordering of the partition assignment list. Used to control leader imbalances between brokers.
• Reassign partitions from one broker to another. Used to expand existing clusters.
• Reassign partitions between log directories on the same broker. Used to resolve storage load imbalance among
available disks in the broker.
• Reassign partitions between log directories across multiple brokers. Used to resolve storage load imbalance across
multiple brokers.
The tool uses two JSON files for input. Both of these are created by the user. The two files are the following:
• Topics-to-Move JSON on page 66
• Reassignment Configuration JSON on page 66
Topics-to-Move JSON
This JSON file specifies the topics that you want to reassign. This is a simple file that tells the
kafka-reassign-partitions tool which partitions it should look at when generating a proposal for the
reassignment configuration. The user has to create the topics-to-move JSON file from scratch.
The format of the file is the following:
The reassignment configuration contains multiple properties that each control and specify an aspect of the
configuration. The Reassignment Configuration Properties table lists each property and its description.
Property Description
topic Specifies the topic.
partition Specifies the partition.
replicas Specifies the brokers that the selected partition is assigned
to. The brokers are listed in order, which means that the
first broker in the list is always the leader for that partition.
Change the order of brokers to resolve any leader
balancing issues among brokers. Change the broker IDs to
reassign partitions to different brokers.
log_dirs Specifies the log directory of the brokers. The log
directories are listed in the same order as the brokers. By
default any is specified as the log directory, which means
that the broker is free to choose where it places the
replica. By default, the current broker implementation
selects the log directory using a round-robin algorithm.
An absolute path beginning with a / can be used to
explicitly set where to store the partition replica.
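To show how the four properties fit together, the following sketch assembles one reassignment entry and wraps it in a document with a version field and a partitions list. The helper class and its method names are hypothetical, and the sketch assumes that topic names and paths contain no characters that need JSON escaping.

```java
import java.util.StringJoiner;

// Hypothetical helper that assembles a reassignment configuration as text.
final class ReassignmentJson {

    // One entry: replicas and log_dirs are listed in the same broker order.
    static String partitionEntry(String topic, int partition, int[] replicas, String[] logDirs) {
        if (replicas.length != logDirs.length) {
            throw new IllegalArgumentException("one log directory per replica is required");
        }
        StringJoiner replicaList = new StringJoiner(",", "[", "]");
        for (int brokerId : replicas) {
            replicaList.add(Integer.toString(brokerId));
        }
        StringJoiner dirList = new StringJoiner(",", "[", "]");
        for (String dir : logDirs) {
            dirList.add("\"" + dir + "\"");
        }
        return "{\"topic\":\"" + topic + "\",\"partition\":" + partition
            + ",\"replicas\":" + replicaList + ",\"log_dirs\":" + dirList + "}";
    }

    static String document(String... entries) {
        return "{\"version\":1,\"partitions\":[" + String.join(",", entries) + "]}";
    }

    public static void main(String[] args) {
        // Broker 4 is listed first, so it is the preferred leader; the replica
        // on broker 5 is pinned to an absolute log directory.
        String entry = partitionEntry("mytopic1", 0,
            new int[] {4, 5}, new String[] {"any", "/log/directory1"});
        System.out.println(document(entry));
    }
}
```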
Tool Usage
To reassign partitions, complete the following steps:
1. Create a topics-to-move JSON file that specifies the topics you want to reassign. Use the following format:
2. Generate the content for the reassignment configuration JSON with the following command:
Running the command lists the distribution of partition replicas on your current brokers followed by a proposed
partition reassignment configuration.
Example output:
In this example, the tool proposed a configuration which reassigns existing partitions on broker 1, 2, and 3 to
brokers 4 and 5.
3. Copy and paste the proposed partition reassignment configuration into an empty JSON file.
4. Review, and if required, modify the suggested reassignment configuration.
5. Save the file.
6. Start the redistribution process with the following command:
Note: Specifying a bootstrap server with the --bootstrap-server option is only required
when an absolute log directory path is specified for a replica in the reassignment configuration
JSON file.
The tool prints a list containing the original replica assignment and a message that reassignment has started.
Example output:
The tool prints the reassignment status of all partitions. Example output:
There are multiple ways to modify the configuration file. The following list of examples shows how a user can modify
a proposed configuration and what these changes do. Changes to the original example are marked in bold.
Suppose that the kafka-reassign-partitions tool generated the following proposed reassignment configuration:
This reassignment configuration moves partition mytopic1-0 to the /log/directory1 log directory.
Reassign partitions between log directories across multiple brokers
To reassign partitions between log directories across multiple brokers, change the broker ID specified in replicas
and the appropriate any entry to an absolute path. For example:
The kafka-log-dirs tool allows users to query a list of replicas per log directory on a broker. The tool provides
information that is required for optimizing replica assignment across brokers.
On successful execution, the tool prints a list of partitions per log directory for the specified topics and brokers. The
list contains information on topic partition, size, offset lag, and reassignment state. Example output:
{
  "brokers": [ {
    "broker": 86,
    "logDirs": [ {
      "error": null,
      "logDir": "/var/local/kafka/data",
      "partitions": [ {
        "isFuture": false,
        "offsetLag": 0,
        "partition": "mytopic1-2",
        "size": 0
      } ]
    } ]
  } ],
  "version": 1
}
The Contents of the kafka-log-dirs Output table gives an overview of the information provided by the kafka-log-dirs
Property Description
broker Displays the ID of the broker.
error Indicates if there is a problem with the disk that hosts the
topic partition. If an error is detected,
is displayed. If no error is detected, the value is null.
logDir Specifies the location of the log directory. Returns an
absolute path.
isFuture The reassignment state of the partition. This property
shows whether there is currently replica movement
underway between the log directories.
offsetLag Displays the offset lag of the partition.
partition Displays the name of the partition.
size Displays the size of the partition in bytes.
Tool Usage
To retrieve replica assignment information, run the following command:
Important: On secure clusters the admin client config property file has to be specified with the
--command-config option. Otherwise, the tool fails to execute.
If no topic is specified with the --topic-list option, then all topics are queried. If no broker is specified with the
--broker-list option, then all brokers are queried. If a log directory is offline, the log directory will be marked
offline in the script output. Error example:
The kafka-*-perf-test tool can be used in several ways. In general, it is expected that these tools should be used
on a test or development cluster.
• Measuring read and/or write throughput.
• Stress testing the cluster based on specific parameters (such as message size).
• Load testing for the purpose of evaluating specific metrics or determining the impact of cluster configuration
The kafka-producer-perf-test script can either create a randomly generated byte record:
where the INPUT_FILE is a concatenated set of pre-generated messages separated by DELIMITER. This script keeps
producing messages until it is stopped or until it reaches the limit set by the --num-records flag.
cp /etc/kafka/conf/ /var/tmp
sed -i -e 's/WARN/DEBUG/g' /var/tmp/
export KAFKA_OPTS="-Dlog4j.configuration=file:/var/tmp/"
As of CDH 6.1.0, Kafka clusters with nodes using JBOD configurations are supported by Cloudera.
JBOD refers to a system configuration where disks are used independently rather than organizing them into redundant
arrays (RAID). Using RAID usually results in more reliable hard disk configurations even if the individual disks are not
reliable. RAID setups like these are common in large scale big data environments built on top of commodity hardware.
RAID enabled configurations are more expensive and more complicated to set up. In a large number of environments,
JBOD configurations are preferred for the following reasons:
• Reduced storage cost: RAID-10 is recommended to protect against disk failures. However, scaling RAID-10
configurations can become excessively expensive. Storing the data redundantly on each node means that storage
space requirements have to be multiplied because the data is also replicated across nodes.
• Improved performance: Just like HDFS, the slowest disk in a RAID-10 configuration limits overall throughput. Writes
need to go through a RAID controller. On the other hand, when using JBOD, IO performance is increased as a
result of isolated writes across disks without a controller.
• Manual operation and administration: Monitoring offline directories and JBOD related metrics is done through
Cloudera Manager. However, identifying failed disks and rebalancing partitions between disks is done manually.
• Manual load balancing between disks: Unlike with RAID-10, JBOD does not automatically distribute data across
disks. The process is fully manual.
To provide robust JBOD support in Kafka, changes in the Kafka protocol have been made. When performing an upgrade
to a new version of Kafka, make sure that you follow the recommended rolling upgrade process.
For more information, see Upgrading the CDH Cluster.
For more information regarding the JBOD related Kafka protocol changes, see KIP-112 and KIP-113.
To set up JBOD in your Kafka environment, perform the following steps:
1. Mount the required number of disks on your system.
2. In Cloudera Manager, set up log directories for all Kafka brokers.
a. Go to the Kafka service, select Instances and select the broker.
b. Go to Configuration and find the Data Directories property.
c. Modify the path of the log directories so that they correspond with the newly mounted disks.
Note: Depending on your setup, you may need to add or remove multiple data directories.
d. Enter a Reason for change, and then click Save Changes to commit the changes.
3. Go to the Kafka service and select Configuration.
4. Find and configure the following properties depending on your system and use case.
• Number of I/O Threads
• Number of Network Threads
• Number of Replica Fetchers
• Minimum Number of Replicas in ISR
5. Set replication factor to at least 3.
Important: If you set replication factor to less than 3, your data will be at risk. In addition, in
case of a disk failure, disk maintenance cannot be carried out without system downtime.
Migrating data from one disk to another is achieved with the kafka-reassign-partitions tool. The following
instructions focus on migrating existing Kafka partitions to JBOD configured disks. For a full tool description, see
kafka-reassign-partitions on page 65.
Note: Cloudera recommends that you minimize the volume of replica changes per command instance.
Instead of moving 10 replicas with a single command, move two at a time in order to save cluster
• Set up JBOD in your Kafka environment. For more information, see Setup on page 72.
• Collect the log directory paths on the JBOD disks where you want to migrate existing data.
• Collect the broker IDs of the brokers you want to migrate data to.
• Collect the name of the topics you want to migrate partitions from.
Note: Output examples in these instructions are cleaned and formatted to make them easily readable.
2. Generate the content for the reassignment configuration JSON with the following command:
Running the command lists the distribution of partition replicas on your current brokers followed by a proposed
partition reassignment configuration.
Example output:
In this example, the tool proposed a configuration which reassigns existing partitions on broker 1, 2, and 3 to
brokers 4 and 5.
3. Copy and paste the proposed partition reassignment configuration into an empty JSON file.
4. Modify the suggested reassignment configuration.
When migrating data you have two choices. You can move partitions to a different log directory on the same
broker, or move it to a different log directory on another broker.
a. To reassign partitions between log directories on the same broker, change the appropriate any entry to an
absolute path. For example:
b. To reassign partitions between log directories across different brokers, change the broker ID specified in
replicas and the appropriate any entry to an absolute path. For example:
Important: The bootstrap server has to be specified with the --bootstrap-server option if
an absolute log directory path is specified for a replica in the reassignment configuration JSON
The tool prints a list containing the original replica assignment and a message that reassignment has started.
Example output:
The tool prints the reassignment status of all partitions. Example output:
Replication Status
Monitor replication status using Cloudera Manager Health Tests. Cloudera Manager automatically and continuously
monitors both the OfflineLogDirectoryCount and OfflineReplicaCount metrics. Alerts are raised when
failures are detected. For more information, see Cloudera Manager Health Tests.
Disk Capacity
Monitor free space on mounted disks and open file descriptors. For more information, see Useful Shell Command
Reference on page 161. Reassign partitions or move log files around if necessary. For more information, see
kafka-reassign-partitions on page 65.
Important: If there are no healthy log directories present in the system, the broker stops working.
The cause of disk failures can be analyzed with the help of the kafka-log-dirs on page 69 tool, or by reviewing the error
messages of KafkaStorageException entries in the Kafka broker log file.
To view the Kafka broker log file, complete the following steps:
1. In Cloudera Manager go to the Kafka service, select Instances and select the broker.
2. Go to Log Files > Role Log File.
In case of a disk failure, a Kafka administrator can carry out either of the following actions. The action taken depends
on the failure type and system environment:
• Replace the faulty disk with a new one.
• Remove the disk and redistribute data across remaining disks to restore the desired replication factor.
Note: Disk replacement and disk removal both require stopping the broker. Therefore, Cloudera
recommends that you perform these actions during a maintenance window.
Disk Replacement
To replace a disk, complete the following steps:
1. Stop the broker that has a faulty disk.
a. In Cloudera Manager, go to the Kafka service, select Instances and select the broker.
b. Go to Actions > Gracefully stop this Kafka Broker.
2. Replace the disk.
3. Mount the disk.
4. Set up the directory structure on the new disk the same way as it was set up on the previous disk.
Note: You can find the directory paths for the old disk in the Data Directories property of the
Disk Removal
To remove a disk from the configuration, complete the following steps:
1. Stop the broker that has a faulty disk.
a. In Cloudera Manager, go to the Kafka service, select Instances and select the broker.
b. Go to Actions > Gracefully stop this Kafka Broker.
2. Remove the log directories on the faulty disk from the broker.
a. Go to Configuration and find the Data Directories property.
b. Remove the affected log directories with the Remove button.
c. Enter a Reason for change, and then click Save Changes to commit the changes.
3. Start the broker.
a. In Cloudera Manager go to the Kafka service, select Instances and select the broker.
b. Go to Actions > Start this Kafka Broker.
The Kafka broker redistributes data across the cluster.
Tuning Brokers
Topics are divided into partitions. Each partition has a leader. Topics that are properly configured for reliability will
consist of a leader partition and 2 or more follower partitions. When the leaders are not balanced properly, one might
be overworked, compared to others.
Depending on your system and how critical your data is, you want to be sure that you have sufficient replication sets
to preserve your data. For each topic, Cloudera recommends starting with one partition per physical storage disk and
one consumer per partition.
Tuning Producers
Kafka uses an asynchronous publish/subscribe model. When your producer calls send(), the result returned is a future.
The future provides methods to let you check the status of the information in process. When the batch is ready, the
producer sends it to the broker. The Kafka broker waits for an event, receives the result, and then responds that the
transaction is complete.
If you do not use a future, you could get just one record, wait for the result, and then send a response. Latency is very
low, but so is throughput. If each transaction takes 5 ms, throughput is 200 events per second — slower than the
expected 100,000 events per second.
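The arithmetic behind these figures can be sketched as follows. The model is deliberately simple: it assumes each request takes a fixed round-trip time and carries a fixed number of records. The figure of 500 records per request is a hypothetical value chosen only to show how batching could reach the 100,000 events per second mentioned above.

```java
// Worked arithmetic: events per second when each request takes a fixed
// round-trip time and carries a fixed number of records.
final class SendThroughput {

    static double eventsPerSecond(double roundTripMs, int recordsPerRequest) {
        return recordsPerRequest * 1000.0 / roundTripMs;
    }

    public static void main(String[] args) {
        // One record per 5 ms round trip: 200 events per second.
        System.out.println(eventsPerSecond(5.0, 1));
        // Hypothetical batch of 500 records per 5 ms round trip: 100,000 per second.
        System.out.println(eventsPerSecond(5.0, 500));
    }
}
```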
When you use Producer.send(), you fill up buffers on the producer. When a buffer is full, the producer sends the
buffer to the Kafka broker and begins to refill the buffer.
Two parameters are particularly important for latency and throughput: batch size and linger time.
Batch Size
batch.size measures batch size in total bytes instead of the number of messages. It controls how many bytes of
data to collect before sending messages to the Kafka broker. Set this as high as possible, without exceeding available
memory. The default value is 16384.
If you increase the size of your buffer, it might never get full. The Producer sends the information eventually, based
on other triggers, such as linger time in milliseconds. Although you can impair memory usage by setting the buffer
batch size too high, this does not impact latency.
If your producer is sending all the time, you are probably getting the best throughput possible. If the producer is often
idle, you might not be writing enough data to warrant the current allocation of resources.
Linger Time sets the maximum time to buffer data in asynchronous mode. For example, the setting of 100 means that
it batches 100ms of messages to send at once. This improves throughput, but the buffering adds message delivery
By default, the producer does not wait. It sends the buffer any time data is available.
Instead of sending immediately, you can set the linger time to 5 and send more messages in one batch. This would reduce
the number of requests sent, but would add up to 5 milliseconds of latency to records sent, even if the load on the
system does not warrant the delay.
The farther away the broker is from the producer, the more overhead required to send messages. Increase the linger time
for higher latency and higher throughput in your producer.
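As a minimal sketch, the two settings discussed in this section can be placed in the producer's configuration as follows. The keys batch.size and linger.ms are the Kafka producer configuration names for batch size and linger time. The helper class is hypothetical, and the values shown are examples rather than tuned recommendations.

```java
import java.util.Properties;

// Collects the two producer settings that most affect batching behavior.
final class ProducerBatchingConfig {

    static Properties batchingProps(int batchSizeBytes, int lingerMs) {
        Properties props = new Properties();
        // Maximum bytes collected per partition batch before a send.
        props.setProperty("batch.size", Integer.toString(batchSizeBytes));
        // Maximum time in milliseconds to wait for a batch to fill.
        props.setProperty("linger.ms", Integer.toString(lingerMs));
        return props;
    }

    public static void main(String[] args) {
        // The default batch size of 16384 bytes with a 5 ms linger time.
        Properties props = batchingProps(16384, 5);
        System.out.println(props.getProperty("batch.size") + " " + props.getProperty("linger.ms"));
    }
}
```

These properties would be merged with the rest of the producer configuration before the KafkaProducer is constructed.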
Tuning Consumers
Consumers can create throughput issues on the other side of the pipeline. The maximum number of consumers in a
consumer group for a topic is equal to the number of partitions. You need enough partitions to handle all the consumers
needed to keep up with the producers.
Consumers in the same consumer group split the partitions among them. Adding more consumers to a group can
enhance performance (up to the number of partitions). Adding more consumer groups does not affect performance.
• The Kafka producer can compress messages. For example, if the original message is a text-based format (such as
XML), in most cases the compressed message will be sufficiently small.
• Use the compression.type producer configuration parameter to enable compression. gzip, lz4 and Snappy
are supported.
• If shared storage (such as NAS, HDFS, or S3) is available, consider placing large files on the shared storage and
using Kafka to send a message with the file location. In many cases, this can be much faster than using Kafka to
send the large file itself.
• Split large messages into 1 KB segments with the producing client, using partition keys to ensure that all segments
are sent to the same Kafka partition in the correct order. The consuming client can then reconstruct the original
large message.
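The segment-splitting approach in the last item above can be sketched with plain byte arrays. This sketch covers only the splitting and reassembly. Sending each segment with the same partition key, and tracking segment order and completeness on the consumer side, are left out. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Splits a large payload into 1 KB segments and puts it back together.
final class MessageSegments {

    static final int SEGMENT_BYTES = 1024;

    // Producer side: cut the payload into segments of at most 1 KB each.
    static List<byte[]> split(byte[] message) {
        List<byte[]> segments = new ArrayList<>();
        for (int start = 0; start < message.length; start += SEGMENT_BYTES) {
            int end = Math.min(start + SEGMENT_BYTES, message.length);
            segments.add(Arrays.copyOfRange(message, start, end));
        }
        return segments;
    }

    // Consumer side: concatenate the segments in the order they were received.
    static byte[] reassemble(List<byte[]> segments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] segment : segments) {
            out.write(segment, 0, segment.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] message = new byte[2500];
        List<byte[]> segments = split(message);
        System.out.println(segments.size() + " segments, "
            + reassemble(segments).length + " bytes after reassembly");
    }
}
```

Reassembly relies on all segments of one message arriving in order, which is why the segments need to share a partition key.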
If you still need to send large messages with Kafka, modify the configuration parameters presented in the following
sections to match your requirements.
log.segment.bytes 1073741824 (1 GiB) Size of a Kafka data file. Must be larger than any single
fetch.max.bytes 52428800 (50 MiB) The maximum amount of data the server should return for a fetch request.
Note: The consumer is able to consume a message batch that is larger than the default value of the
max.partition.fetch.bytes or fetch.max.bytes property. However, the batch will be sent
alone, which can cause performance degradation.
However, if you want to size a cluster without simulation, a very simple rule could be to size the cluster based on the
amount of disk-space required (which can be computed from the estimated rate at which you get data times the
required data retention period).
A slightly more sophisticated estimation can be done based on network and disk throughput requirements. To make
this estimation, let's plan for a use case with the following characteristics:
• W - MB/sec of data that will be written
• R - Replication factor
• C - Number of consumer groups, that is the number of readers for each write
Kafka is mostly limited by the disk and network throughput.
The volume of writing expected is W * R (that is, each replica writes each message). Data is read by replicas as part
of the internal cluster replication and also by consumers. Because every replica except the leader reads each write, the
read volume of replication is (R - 1) * W. In addition, each of the C consumers reads each write, so there will be a read
volume of C * W. This gives the following:
• Writes: W * R
• Reads: (R + C - 1) * W
However, note that reads may actually be cached, in which case no actual disk I/O happens. We can model the effect
of caching fairly easily. If the cluster has M MB of memory, then a write rate of W MB/second allows M/(W * R) seconds
of writes to be cached. So a server with 32 GB of memory taking writes at 50 MB/second serves roughly the last 10
minutes of data from cache. Readers may fall out of cache for a variety of reasons, such as a slow consumer or a failed
server that recovers and needs to catch up. An easy way to model this is to assume a number of lagging readers that you
need to budget for. Let’s call the number of lagging readers L. A very pessimistic assumption would be that L = R +
C - 1, that is, that all consumers are lagging all the time. A more realistic assumption might be that no more than
two consumers are lagging at any given time.
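The cache window can be checked with a few lines of arithmetic. This sketch uses the 32 GB and 50 MB/second figures from the text, with the replication factor left at 1 to mirror the single-server example; the function name is hypothetical.

```python
def cached_seconds(memory_mb, write_mb_per_sec, replication_factor=1):
    """Seconds of recent writes that fit in memory: M / (W * R)."""
    return memory_mb / (write_mb_per_sec * replication_factor)


window = cached_seconds(32_000, 50)
print(window, "seconds, or about", round(window / 60, 1), "minutes")
```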
Based on this, we can calculate our cluster-wide I/O requirements:
• Disk Throughput (Read + Write): W * R + L * W
• Network Read Throughput: (R + C - 1) * W
• Network Write Throughput: W * R
A single server provides a given disk throughput as well as network throughput. For example, if you have a 1 Gigabit
Ethernet card with full duplex, then that would give 125 MB/sec read and 125 MB/sec write; likewise, six 7200 RPM SATA
drives might give roughly 300 MB/sec read + write throughput. Once you know the total requirements, as well as what
is provided by one machine, you can divide to get the total number of machines needed. This gives a machine count
running at maximum capacity, assuming no overhead for network protocols, as well as perfect balance of data and
load. Since there is protocol overhead as well as imbalance, you want to have at least 2x this ideal capacity to ensure
sufficient capacity.
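The estimate above can be put together in a short sketch. The per-machine figures (300 MB/sec disk, 125 MB/sec network each way) and the 2x headroom come from the text; the workload values W=100, R=3, C=2, L=2 and the function names are hypothetical.

```python
import math


def cluster_requirements(w, r, c, lagging):
    """Cluster-wide throughput needs in MB/sec, following the formulas above."""
    return {
        "disk": w * r + lagging * w,
        "network_read": (r + c - 1) * w,
        "network_write": w * r,
    }


def machines_needed(requirements, disk_mb, net_read_mb, net_write_mb, headroom=2.0):
    """Divide each requirement by one machine's capacity, take the largest
    ratio, then apply the headroom factor and round up."""
    ideal = max(
        requirements["disk"] / disk_mb,
        requirements["network_read"] / net_read_mb,
        requirements["network_write"] / net_write_mb,
    )
    return math.ceil(ideal * headroom)


needs = cluster_requirements(w=100, r=3, c=2, lagging=2)
print(needs)
print(machines_needed(needs, disk_mb=300, net_read_mb=125, net_write_mb=125))
```

In this example the network read requirement is the bottleneck, so it determines the machine count.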
Cloudera recommends setting the JVM heap size to 4-8 GB for the brokers, depending on your use case. As Kafka’s
performance depends heavily on the operating system’s page cache, it is not recommended to colocate Kafka with other
memory-hungry applications.
• Large messages can cause longer garbage collection (GC) pauses as brokers allocate large chunks. Monitor the GC
log and the server log.
Add this to Broker Java Options:
-XX:+PrintGC -XX:+PrintGCDetails
• If long GC pauses cause Kafka to abandon the ZooKeeper session, you may need to configure longer timeout
values. See Kafka-ZooKeeper Performance Tuning on page 85 for details.
ISR Management
An in-sync replica (ISR) set for a topic partition contains all follower replicas that are caught up with the leader partition
and are situated on a broker that is alive.
• If a replica lags “too far” behind the partition leader, it is removed from the ISR set. The definition of what
is too far is controlled by the configuration setting replica.lag.time.max.ms. If a follower hasn't sent any
fetch requests or hasn't consumed up to the leader's log end offset for at least this time, the leader removes the
follower from the ISR set.
• num.replica.fetchers is a cluster-wide configuration setting that controls how many fetcher threads are in
a broker. These threads are responsible for replicating messages from a source broker (that is, where partition
leader resides). Increasing this value results in higher I/O parallelism and fetcher throughput. Of course, there is
a trade-off: brokers use more CPU and network.
• replica.fetch.min.bytes controls the minimum number of bytes to fetch from a follower replica. If there
are not enough bytes, the fetch waits up to replica.fetch.wait.max.ms.
• replica.fetch.wait.max.ms controls how long to sleep before checking for new messages from a fetcher
replica. This value should be less than replica.lag.time.max.ms, otherwise the replica is kicked out of the
ISR set.
• To check the ISR set for topic partitions, run the following command:
kafka-topics --zookeeper ZOOKEEPER_HOST:2181/kafka --describe --topic TOPIC
• If a partition leader dies, a new leader is selected from the ISR set. There will be no data loss. If there is no ISR,
unclean leader election can be used with the risk of data loss.
• Unclean leader election occurs if unclean.leader.election.enable is set to true. By default, this is set to
false.
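The lag rule in the first bullet of this list can be modeled in a few lines. This is a simplified sketch, not broker code: max_lag_ms stands in for the lag time threshold, and the broker ids and timestamps are hypothetical.

```python
def in_sync_followers(now_ms, last_caught_up_ms, max_lag_ms):
    """Return follower ids whose last caught-up time is within the threshold.

    last_caught_up_ms maps a follower's broker id to the time (in ms) at which
    it last reached the leader's log end offset.
    """
    return sorted(
        broker_id
        for broker_id, caught_up in last_caught_up_ms.items()
        if now_ms - caught_up <= max_lag_ms
    )


# Follower 2 last caught up 15 seconds ago, so it drops out of the set.
followers = {1: 99_000, 2: 85_000}
print(in_sync_followers(now_ms=100_000, last_caught_up_ms=followers, max_lag_ms=10_000))
```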
Log Cleaner
As discussed in Record Management on page 59, the log cleaner implements log compaction. The following cluster-wide
configuration settings can be used to fine tune log compaction:
• log.cleaner.threads controls how many background threads are responsible for log compaction. Increasing
this value improves performance of log compaction at the cost of increased I/O activity.
• log.cleaner.io.max.bytes.per.second throttles log cleaner’s I/O activity so that the sum of its read and
write is less than this value on average.
• log.cleaner.dedupe.buffer.size specifies memory used for log compaction across all cleaner threads.
• log.cleaner.io.buffer.size controls total memory used for log cleaner I/O buffers across all cleaner threads.
• log.cleaner.min.compaction.lag.ms controls how long messages are left uncompacted.
• log.cleaner.io.buffer.load.factor controls log cleaner’s load factor for the dedupe buffer. Increasing
this value allows the system to clean more logs at once but increases hash collisions.
• log.cleaner.backoff.ms controls how long to wait until the next check if there is no log to compact.
The broker needs additional file descriptors to communicate via network sockets with external parties (such as clients,
other brokers, ZooKeeper, Sentry, and Kerberos).
The Maximum Process File Descriptors setting can be monitored in Cloudera Manager and increased if usage
requires a larger value than the default ulimit (often 64K). It should be reviewed for use case suitability.
• To review the FD limit currently set for a running Kafka broker, run cat /proc/KAFKA_BROKER_PID/limits, and
look for Max open files.
• To see open file descriptors, run:
lsof -p KAFKA_BROKER_PID
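The same checks can be scripted. The sketch below parses the Max open files line from the text of /proc/PID/limits and counts the entries under /proc/PID/fd. The sample text is an illustrative stand-in, not output from a real broker, and the function names are hypothetical.

```python
import os

# Illustrative excerpt in the layout of /proc/PID/limits (not real broker output).
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            65536                65536                files
"""


def max_open_files(limits_text):
    """Return the (soft, hard) limits as strings; either may be 'unlimited'."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' line found")


def open_fd_count(pid):
    """Count entries in /proc/PID/fd (Linux only; needs permission to read it)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


print(max_open_files(SAMPLE_LIMITS))
```

On a broker host, the text would come from reading /proc/KAFKA_BROKER_PID/limits, and open_fd_count would be called with the broker's process ID.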
Linux records when a file's inode was last changed (ctime), when its content was last modified (mtime), and when it
was last accessed (atime). The value noatime is a special mount option for filesystems (such as EXT4) in Linux that tells
the kernel not to update inode information every time a file is accessed (that is, when it was last read). Using this option
may result in a write performance gain. Kafka does not rely on atime. The value relatime is another mount option that
optimizes how atime is persisted: access time is only updated if the previous atime was earlier than the current
modified time.
To view mount options, run the mount -l or cat /etc/fstab command.
Networking Parameters
Kafka is designed to handle a huge amount of network traffic. By default, the Linux kernel is not tuned for this scenario.
The following kernel settings may need to be tuned based on use case or specific Kafka workload:
• net.core.wmem_default: Default send socket buffer size.
• net.core.rmem_default: Default receive socket buffer size.
• net.core.wmem_max: Maximum send socket buffer size.
• net.core.rmem_max: Maximum receive socket buffer size.
• net.ipv4.tcp_wmem: Memory reserved for TCP send buffers.
• net.ipv4.tcp_rmem: Memory reserved for TCP receive buffers.
• net.ipv4.tcp_window_scaling: TCP Window Scaling option.
• net.ipv4.tcp_max_syn_backlog: Maximum number of outstanding TCP SYN requests (connection requests).
• net.core.netdev_max_backlog: Maximum number of queued packets on the kernel input side (useful to deal
with spikes of network requests).
To specify the parameters, you can use Cloudera Enterprise Reference Architecture as a guideline.
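Before tuning, it can help to record the current values. On Linux, each sysctl name maps to a file under /proc/sys with the dots replaced by slashes. The following sketch reads the settings listed above; the helper names are hypothetical, and a value of None means the file could not be read on the host where the script runs.

```python
SETTINGS = [
    "net.core.wmem_default",
    "net.core.rmem_default",
    "net.core.wmem_max",
    "net.core.rmem_max",
    "net.ipv4.tcp_wmem",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_window_scaling",
    "net.ipv4.tcp_max_syn_backlog",
    "net.core.netdev_max_backlog",
]


def sysctl_path(name):
    """Map a sysctl name to its /proc/sys file path."""
    return "/proc/sys/" + name.replace(".", "/")


def read_sysctl(name):
    """Return the setting's values as a list of strings, or None if unreadable."""
    try:
        with open(sysctl_path(name)) as handle:
            return handle.read().split()
    except OSError:
        return None


for setting in SETTINGS:
    print(setting, read_sysctl(setting))
```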
Kafka Reference
Metrics Reference
In addition to these metrics, many aggregate metrics are available. If an entity type has parents defined, you can
formulate all possible aggregate metrics using the formula base_metric_across_parents.
In addition, metrics for aggregate totals can be formed by adding the prefix total_ to the front of the metric name.
Use the type-ahead feature in the Cloudera Manager chart browser to find the exact aggregate metric name, in case
the plural form does not end in "s". For example, the following metric names may be valid for Kafka:
• alerts_rate_across_clusters
• total_alerts_rate_across_clusters
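The naming rule can be expressed as a small helper. This is only a sketch of the pattern described above; the function name is hypothetical, and the plural form of the parent entity must still be confirmed in the chart browser.

```python
def aggregate_metric(base_metric, parents_plural, total=False):
    """Build an aggregate metric name: base_metric_across_parents,
    optionally with the total_ prefix."""
    name = f"{base_metric}_across_{parents_plural}"
    return f"total_{name}" if total else name


print(aggregate_metric("alerts_rate", "clusters"))
print(aggregate_metric("alerts_rate", "clusters", total=True))
```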
Some metrics, such as alerts_rate, apply to nearly every metric context. Others only apply to a certain service or
role.
For more information about metrics in Cloudera Manager, see Cloudera Manager Metrics and Metric Aggregation.
Note: The following sections are identical to the metrics listed with Cloudera Manager. Be sure to
scroll horizontally to see the full content in each table.
Base Metrics
Broker Metrics
Metric Name | Description | Unit | Parents | CDH Version
kafka_bytes_fetched_15min_rate | Amount of data consumers fetched from this topic on this broker: 15 Min Rate | bytes per second | cluster, kafka, rack | CDH 5, CDH 6
kafka_bytes_fetched_1min_rate | Amount of data consumers fetched from this topic on this broker: 1 Min Rate | bytes per second | cluster, kafka, rack | CDH 5, CDH 6
kafka_bytes_fetched_5min_rate | Amount of data consumers fetched from this topic on this broker: 5 Min Rate | bytes per second | cluster, kafka, rack | CDH 5, CDH 6
kafka_bytes_fetched_avg_rate | Amount of data consumers fetched from this topic on this broker: Avg Rate | bytes per second | cluster, kafka, rack | CDH 5, CDH 6
kafka_bytes_fetched_rate | Amount of data consumers fetched from this topic on this broker | bytes per second | cluster, kafka, rack | CDH 5, CDH 6
kafka_bytes_received_15min_rate | Amount of data written to topic on this broker: 15 Min Rate | bytes per second | cluster, kafka, rack | CDH 5, CDH 6
kafka_bytes_received_1min_rate | Amount of data written to topic on this broker: 1 Min Rate | bytes per second | cluster, kafka, rack | CDH 5, CDH 6
kafka_bytes_received_5min_rate | Amount of data written to topic on this broker: 5 Min Rate | bytes per second | cluster, kafka, rack | CDH 5, CDH 6
kafka_bytes_received_avg_rate | Amount of data written to topic on this broker: Avg Rate | bytes per second | cluster, kafka, rack | CDH 5, CDH 6
Replica Metrics
Disk Space
df or mount show the disks mounted and can be used to show disk space.
On a file or directory level, the du command is useful for seeing how much disk space is being used.
Process Information
• top shows a sorted list of processes.
• ps shows a snapshot list of processes. Arguments can be used to filter the output.
• ps -o min_flt,maj_flt pid shows page fault information.
Kernel Configuration
• ulimit -a displays kernel limits and shows which flags affect which kernel settings.
• ulimit -n FD sets a limit on open file descriptors.
What is Kafka?
Kafka is a streaming message platform. Breaking it down a bit further:
“Streaming”: Lots of messages (think tens or hundreds of thousands) being sent frequently by publishers ("producers").
Message polling occurring frequently by lots of subscribers ("consumers").
“Message”: From a technical standpoint, a key value pair. From a non-technical standpoint, a relatively small number
of bytes (think hundreds to a few thousand bytes).
If this isn’t your planned use case, Kafka may not be the solution you are looking for. Contact your favorite Cloudera
representative to discuss and find out. It is better to understand what you can and cannot do upfront than to go ahead
based on some enthusiastic arbitrary vendor message with a solution that will not meet your expectations in the end.
What is Kafka not well fitted for (or what are the tradeoffs)?
It’s very easy to get caught up in all the things that Kafka can be used for without considering the tradeoffs. Kafka
configuration is also not automatic. You need to understand each of your use cases to determine which configuration
properties can be used to tune (and retune!) Kafka for each use case.
Some more specific examples where you need to be deeply knowledgeable and careful when configuring are:
• Using Kafka as your microservices communication hub
Kafka can replace both the message queue and the service discovery part of your software infrastructure. However,
this is generally at the cost of some added latency as well as the need to monitor a new complex system (that is, your
Kafka cluster).
• Using Kafka as long-term storage
While Kafka does have a way to configure message retention, it’s primarily designed for low latency message
delivery. Kafka does not have any support for the features that are usually associated with filesystems (such as
metadata or backups). As such, using some form of long-term ingestion, such as HDFS, is recommended instead.
• Using Kafka as an end-to-end solution
Kafka is only part of a solution. There are a lot of best practices to follow and support tools to build before you
can get the most out of it (see this wise LinkedIn post).
• Deploying Kafka without the right support
Uber has given some numbers for their engineering organization. These numbers could help give you an idea of what
it takes to reach that kind of scale: 1300 microservices, 2000 engineers.
What’s a good size of a Kafka record if I care about performance and stability?
There is an older blog post from 2014 from LinkedIn titled: Benchmarking Apache Kafka: 2 Million Writes Per Second
(On Three Cheap Machines). In the “Effect of Message Size” section, you can see two charts which indicate that Kafka
throughput starts being affected at a record size of 100 bytes through 1000 bytes and bottoms out around 10000
bytes. In general, keeping topics specific and keeping message sizes deliberately small helps you get the most out of
Kafka.
Excerpting from Deploying Apache Kafka: A Practical FAQ:
• If shared storage is available (HDFS, S3, NAS), place the large payload on shared storage and use Kafka just to
send a message with the payload location.
• Handle large messages by chopping them into smaller parts before writing into Kafka, using a message key to
make sure all the parts are written to the same partition so that they are consumed by the same Consumer, and
re-assembling the large message from its parts when consuming.
Use Cases
Like most open source projects, Kafka provides a lot of configuration options to maximize performance. In some cases,
it is not obvious how best to map your specific use case to those configuration options. We attempt to address some
of those situations below.
How can I configure Kafka to ensure that events are stored reliably?
The following recommendations for Kafka configuration settings make it extremely difficult for data loss to occur.
1. Producer
a. block.on.buffer.full=true
b. retries=Long.MAX_VALUE
c. acks=all
e. Remember to close the producer when it is finished or when there is a long pause.
2. Broker
a. Topic replication.factor >= 3
b. min.insync.replicas = 2
c. Disable unclean leader election
3. Consumer
a. Disable enable.auto.commit
b. Commit offsets after messages are processed by your consumer client(s).
If you have more than 3 hosts, you can increase the broker settings appropriately on topics that need more protection
against data loss.
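To see how the broker settings interact, consider this small sketch. It assumes producers use acks=all, in which case a write is accepted only while at least min.insync.replicas replicas are in sync. The function name is hypothetical.

```python
def tolerated_broker_failures(replication_factor, min_insync_replicas):
    """Replica brokers that can be unavailable while a topic still accepts
    acks=all writes."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and the replication factor")
    return replication_factor - min_insync_replicas


# The recommended settings above tolerate the loss of one replica broker.
print(tolerated_broker_failures(3, 2))
# A hypothetical stricter topic on a larger cluster.
print(tolerated_broker_failures(5, 3))
```

If both values are raised together, tolerance for unavailable brokers does not grow; widening the gap between them is what adds headroom for writes.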
Once I’ve followed all the previous recommendations, my cluster should never lose data, right?
Kafka does not ensure that data loss never occurs. There are the following tradeoffs:
1. Throughput vs. reliability. For example, the higher the replication factor, the more resilient your setup will be
against data loss. However, to make those extra copies takes time and can affect throughput.
2. Reliability vs. free disk space. Extra copies due to replication use up disk space that would otherwise be used for
storing events.
Beyond the above design tradeoffs, there are also the following issues:
• To ensure events are consumed you need to monitor your Kafka brokers and topics to verify sufficient consumption
rates are sustained to meet your ingestion requirements.
• Ensure that replication is enabled on any topic that requires consumption guarantees. This protects against Kafka
broker failure and host failure.
• Kafka is designed to store events for a defined duration after which the events are deleted. You can increase the
duration that events are retained up to the amount of supporting storage space.
• You will always run out of disk space unless you add more nodes to the cluster.
How do I size my topic? Alternatively: What is the “right” number of partitions for a topic?
Choosing the proper number of partitions for a topic is the key to achieving a high degree of parallelism with respect
to writes and reads and to distributing load. Evenly distributed load over partitions is a key factor in achieving good
throughput (avoiding hot spots). Making a good decision requires estimation based on the desired throughput of producers
and consumers per partition.
For example, if you want to be able to read 1 GB/sec, but your consumer is only able to process 50 MB/sec, then you
need at least 20 partitions and 20 consumers in the consumer group. Similarly, if you want to achieve the same for
producers, and 1 producer can only write at 100 MB/sec, you need 10 partitions. In this case, if you have 20 partitions,
you can maintain 1 GB/sec for producing and consuming messages. You should adjust the exact number of partitions
to the number of consumers or producers, so that each consumer and producer achieves its target throughput.
So a simple formula could be:
#Partitions = max(NP, NC)
where:
• NP is the number of required producers determined by calculating: TT/TP
• NC is the number of required consumers determined by calculating: TT/TC
• TT is the total expected throughput for our system
• TP is the max throughput of a single producer to a single partition
• TC is the max throughput of a single consumer from a single partition
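The calculation can be sketched with the numbers from the example above (1 GB/sec total, 100 MB/sec per producer, 50 MB/sec per consumer). Rounding up and taking the larger of NP and NC is an assumption that matches the worked example; the function name is hypothetical.

```python
import math


def partitions_needed(total_throughput, producer_throughput, consumer_throughput):
    """Return the larger of NP = TT/TP and NC = TT/TC, each rounded up."""
    producers = math.ceil(total_throughput / producer_throughput)
    consumers = math.ceil(total_throughput / consumer_throughput)
    return max(producers, consumers)


# TT = 1000 MB/sec, TP = 100 MB/sec, TC = 50 MB/sec
print(partitions_needed(1000, 100, 50))
```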
This calculation gives you a rough indication of the number of partitions. It's a good place to start. Keep in mind the
following considerations for improving the number of partitions after you have your system in place:
• The number of partitions can be specified at topic creation time or later.
• Increasing the number of partitions also affects the number of open file descriptors, so make sure you set the file
descriptor limit properly.
• Reassigning partitions can be very expensive, and therefore it's better to over- than under-provision.
• Changing the number of partitions that are based on keys is challenging and involves manual copying (see Kafka
Administration on page 59).
• Reducing the number of partitions is not currently supported. Instead, create a new topic with a lower number
of partitions and copy over existing data.
• Metadata about partitions is stored in ZooKeeper in the form of znodes. Having a large number of partitions
has effects on ZooKeeper and on client resources:
– Unneeded partitions put extra pressure on ZooKeeper (more network requests), and might introduce delay
in controller and/or partition leader election if a broker goes down.
– Producer and consumer clients need more memory, because they need to keep track of more partitions and
also buffer data for all partitions.
• As a guideline for optimal performance, you should not have more than 4000 partitions per broker and not more
than 200,000 partitions in a cluster.
Make sure consumers don’t lag behind producers by monitoring consumer lag. To check consumers' position in a
consumer group (that is, how far behind the end of the log they are), use the following command:
kafka-consumer-groups --bootstrap-server BROKER_ADDRESS --describe --group CONSUMER_GROUP
In general, if everything is going well with a particular topic, each consumer’s CURRENT-OFFSET should be up-to-date
or nearly up-to-date with the LOG-END-OFFSET. From this command, you can determine whether a particular host
or a particular partition is having issues keeping up with the data rate.
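Lag per partition is the difference between LOG-END-OFFSET and CURRENT-OFFSET. The sketch below computes it from rows that have already been parsed into dictionaries. The row layout, the topic name, the offsets, and the threshold are all hypothetical.

```python
def consumer_lag(rows):
    """Map (topic, partition) to LOG-END-OFFSET minus CURRENT-OFFSET."""
    return {
        (row["topic"], row["partition"]): row["log_end_offset"] - row["current_offset"]
        for row in rows
    }


def lagging_partitions(rows, threshold):
    """Return the partitions whose lag exceeds the threshold."""
    return sorted(key for key, lag in consumer_lag(rows).items() if lag > threshold)


# Hypothetical rows for one consumer group.
rows = [
    {"topic": "events", "partition": 0, "current_offset": 1000, "log_end_offset": 1000},
    {"topic": "events", "partition": 1, "current_offset": 400, "log_end_offset": 1000},
]
print(consumer_lag(rows))
print(lagging_partitions(rows, threshold=100))
```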
• On secure clusters, the source cluster and destination cluster must be in the same Kerberos realm.
How can I build a Spark streaming application that consumes data from Kafka?
You will need to set up your development environment to use both Spark libraries and Kafka libraries:
• Building Spark Applications
1. Kafka basic training:
2. Kafka developer training:
3. Doctor Kafka:
4. Kafka manager:
5. Cruise control:
6. Upstream documentation:
Apache License
Version 2.0, January 2004
Appendix: Apache License, Version 2.0
licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their
Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against
any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated
within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under
this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution.
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You meet the following conditions:
1. You must give any other recipients of the Work or Derivative Works a copy of this License; and
2. You must cause any modified files to carry prominent notices stating that You changed the files; and
3. You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark,
and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part
of the Derivative Works; and
4. If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute
must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices
that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE
text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along
with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party
notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify
the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or
as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be
construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license
terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as
a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated
in this License.
5. Submission of Contributions.
Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the
Licensor shall be under the terms and conditions of this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement
you may have executed with Licensor regarding such Contributions.
6. Trademarks.
This License does not grant permission to use the trade names, trademarks, service marks, or product names of the
Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing
the content of the NOTICE file.
7. Disclaimer of Warranty.
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides
its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied,
including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or
FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or
redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability.
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required
by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable
to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising
as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss
of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even
if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability.
While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance
of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in
accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any
other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional