Apache Kafka
Table of Contents
Introduction 1.1
Overview of Kafka 1.2
KafkaRequestHandlerPool — Pool of Daemon KafkaRequestHandler Threads 2.20
KafkaScheduler 2.21
LogDirFailureHandler 2.22
LogManager 2.23
Metadata 2.24
Metadata Update Listener 2.24.1
MetadataCache 2.25
MetadataResponse 2.26
MetadataUpdater 2.27
DefaultMetadataUpdater 2.27.1
NetworkClient — Non-Blocking KafkaClient 2.28
KafkaClient 2.28.1
NetworkClientUtils 2.28.2
OffsetConfig 2.29
Partition 2.30
PartitionStateMachine 2.31
ReplicaManager 2.32
ReplicaFetcherManager 2.32.1
AbstractFetcherManager 2.32.1.1
ReplicaFetcherThread 2.32.2
AbstractFetcherThread 2.32.2.1
ReplicaFetcherBlockingSend 2.32.2.2
ReplicationQuotaManager 2.32.3
ReplicationUtils 2.32.4
ReplicaStateMachine 2.32.5
Selector — Selectable on Socket Channels (from Java’s New IO API) 2.33
Selectable 2.33.1
ShutdownableThread 2.34
SocketServer 2.35
TopicDeletionManager 2.36
TransactionCoordinator 2.37
TransactionStateManager 2.38
ZkUtils 2.39
ZKRebalancerListener 2.40
Kafka Features
Topic Replication 3.1
Topic Deletion 3.2
Kafka Controller Election 3.3
Kafka Architecture
Broker Nodes — Kafka Servers 6.1
Broker 6.1.1
Topics 6.2
Messages 6.3
Kafka Clients 6.4
Producers 6.4.1
Consumers 6.4.2
RequestCompletionHandler 6.5
ClientResponse 6.6
Clusters 6.7
Kafka Metrics
Sensor 8.1
MetricsReporter 8.2
ProducerMetrics 8.3
SenderMetrics 8.4
Kafka Tools
Kafka Tools 9.1
kafka-configs.sh 9.1.1
kafka-topics.sh 9.1.2
Kafka Configuration
Properties 10.1
bootstrap.servers 10.1.1
client.id 10.1.2
enable.auto.commit 10.1.3
group.id 10.1.4
retry.backoff.ms 10.1.5
Logging 10.2
Kafka Connect
WorkerGroupMember 12.1
ConnectDistributed 12.2
Appendix
Further reading or watching 13.1
Introduction
I’m Jacek Laskowski, an independent consultant, software developer and technical instructor
specializing in Apache Spark, Apache Kafka and Kafka Streams (with Scala, sbt,
Kubernetes and a bit of Apache Mesos, DC/OS, Hadoop YARN).
I offer software development and consultancy services with very hands-on in-depth
workshops and mentoring. Reach out to me at jacek@japila.pl or @jaceklaskowski to
discuss opportunities.
Consider joining me at Warsaw Scala Enthusiasts and Warsaw Spark meetups in Warsaw,
Poland.
Tip I’m also writing the Mastering Apache Spark 2, Mastering Spark SQL, Mastering Kafka Streams and Spark Structured Streaming Notebook gitbooks.
This collection of notes (what some may rashly call a "book") serves as my ultimate place to collect all the nuts and bolts of using Apache Kafka in your projects. The notes help me design and develop better products with Kafka. They are also viable proof of my understanding of Kafka (which I believe will help me reach the highest level of mastery in Apache Kafka).
Expect text and code snippets from a variety of public sources. Attribution follows.
Overview of Kafka
Apache Kafka is an open source project for a distributed publish-subscribe messaging
system rethought as a distributed commit log.
Kafka stores messages in topics that are partitioned and replicated across multiple brokers
in a cluster. Producers send messages to topics from which consumers read.
Messages are byte arrays (with String, JSON, and Avro being the most common formats). If
a message has a key, Kafka makes sure that all messages of the same key are in the same
partition.
Consumers may be grouped in a consumer group with multiple consumers. Each consumer
in a consumer group will read messages from a unique subset of partitions in each topic
they subscribe to. Each message is delivered to one consumer in the group, and all
messages with the same key arrive at the same consumer.
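The following is a minimal sketch of a consumer in a consumer group (the broker address, topic and group names are examples):
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
// consumers with the same group.id share the partitions of the subscribed topics
props.put("group.id", "my-group")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Seq("my-topic").asJava)
val records = consumer.poll(1000) // poll for up to 1000 ms
records.asScala.foreach { r =>
  // records with the same key always come from the same partition
  println(s"partition=${r.partition} offset=${r.offset} key=${r.key} value=${r.value}")
}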
Durability — Kafka does not track which messages were read by each consumer. Kafka keeps all messages for a finite amount of time, and it is the consumers' responsibility to track their position per topic partition, i.e. the offsets.
It is worth noting that Kafka is often compared to the following open source projects:
1. Apache ActiveMQ and RabbitMQ given they are message broker systems, too.
2. Apache Flume for its ingestion capabilities designed to send data to HDFS and Apache
HBase.
AdminManager
AdminManager is…FIXME
Figure 1. AdminManager
logIdent is [Admin Manager on Broker [brokerId]].
createTopicPolicy
topicPurgatory
Refer to Logging.
createTopics(
timeout: Int,
validateOnly: Boolean,
createInfo: Map[String, CreateTopicsRequest.TopicDetails],
responseCallback: (Map[String, ApiError]) => Unit): Unit
createTopics …FIXME
KafkaConfig
Metrics
MetadataCache
ZkUtils
Authorizer
Authorizer is…FIXME
configure Method
Caution FIXME
Cluster
Cluster represents a subset of the nodes and topic partitions in a Kafka cluster.
A special variant of a cluster is a bootstrap cluster that is made up of the bootstrap brokers
that are mandatory (and specified explicitly) when Kafka clients are created, i.e.
KafkaAdminClient, AdminClient, KafkaConsumer and KafkaProducer.
Note A bootstrap cluster does not hold all information about the cluster.
partitionsByTopic
bootstrap Method
bootstrap …FIXME
isBootstrapConfigured Method
boolean isBootstrapConfigured()
partitionsForTopic returns a collection of zero or more partitions of the input topic.
Note partitionsForTopic is used when:
Metadata getClusterForCurrentTopics
KafkaAdminClient describeTopics
KafkaConsumer partitionsFor
Cluster (deprecated)
Important It seems that the Cluster class is created using ZkUtils.getCluster that is used exclusively when ZKRebalancerListener does syncedRebalance (which in turn happens for the currently-deprecated ZookeeperConsumerConnector). In other words, the Cluster class and this page are soon to be removed.
topics Method
Caution FIXME
availablePartitionsForTopic Method
Caution FIXME
ClusterConnectionStates
ClusterConnectionStates is…FIXME
connecting Method
connecting …FIXME
disconnected Method
disconnected …FIXME
ClusterResourceListener (and ClusterResourceListeners Collection)
ClusterResourceListener is the contract for objects that want to be notified about changes in the cluster metadata.
package org.apache.kafka.common;
ClusterResourceListeners Collection
ClusterResourceListeners collection holds zero or more ClusterResourceListener objects (e.g. registered when KafkaServer starts up).
DynamicConfigManager
DynamicConfigManager is…FIXME
startup Method
startup
startup …FIXME
DynamicConfigManager
Fetcher
Fetcher is created exclusively when KafkaConsumer is created.
Table 1. Fetcher’s Internal Properties (e.g. Registries and Counters) (in alphabetical order)
Name Description
client ConsumerNetworkClient that is given when Fetcher is created.
ConsumerNetworkClient
Fetch size
Metadata
SubscriptionState
Metrics
FetcherMetricsRegistry
Time
IsolationLevel
FIXME
sendFetches Method
Caution FIXME
beginningOffsets Method
Caution FIXME
retrieveOffsetsByTimes Method
Caution FIXME
getAllTopicMetadata gets topic metadata specifying no topics (which means all topics
available).
GroupCoordinator
GroupCoordinator is…FIXME
Caution FIXME
Broker ID
GroupConfig
OffsetConfig
GroupMetadataManager
DelayedOperationPurgatory[DelayedHeartbeat]
DelayedOperationPurgatory[DelayedJoin]
Time
startup first prints out the following INFO message to the logs:
In the end, startup prints out the following INFO message to the logs:
GroupMetadataManager
GroupMetadataManager is…FIXME
groupMetadataTopicPartitionCount
scheduler KafkaScheduler
enableMetadataExpiration Method
enableMetadataExpiration(): Unit
Broker ID
ApiVersion
OffsetConfig
ReplicaManager
ZkUtils
Time
cleanupGroupMetadata takes the current time (using time) and, for every GroupMetadata in the internal cache:
1. FIXME
In the end, cleanupGroupMetadata prints out the following INFO message to the logs:
getGroupMetadataTopicPartitionCount: Int
getGroupMetadataTopicPartitionCount gives the number of partitions of the __consumer_offsets topic.
InterBrokerSendThread
InterBrokerSendThread …FIXME
doWork Method
doWork …FIXME
Kafka — Standalone Command-Line Application
kafka.Kafka is a standalone command-line application that starts a Kafka broker.
getPropsFromArgs Method
Caution FIXME
In the end, main starts the KafkaServerStartable and waits till it finishes.
main terminates the JVM with status 0 when KafkaServerStartable shuts down properly.
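kafka.Kafka is what the kafka-server-start.sh shell script launches under the covers, with the properties file as the argument:
$ ./bin/kafka-server-start.sh config/server.properties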
registerLoggingSignalHandler(): Unit
registerLoggingSignalHandler registers signal handlers for TERM , INT and HUP signals
so that, once received, it prints out the following INFO message to the logs:
KafkaApis — API Request Handler
KafkaApis handles API requests (that KafkaRequestHandler threads of a KafkaRequestHandlerPool relay to it).
ControlledShutdown handleControlledShutdownRequest
CreatePartitions handleCreatePartitionsRequest
CreateTopics handleCreateTopicsRequest
DeleteTopics handleDeleteTopicsRequest
Fetch handleFetchRequest
LeaderAndIsr handleLeaderAndIsrRequest
AlterReplicaLogDirs handleAlterReplicaLogDirsRequest
Metadata handleTopicMetadataRequest
OffsetFetch handleOffsetFetchRequest
Refer to Logging.
handle first prints out the following TRACE message to the logs:
handle then relays the input request to the corresponding handler per the apiKey (from the request header).
handleLeaderAndIsrRequest …FIXME
handleAlterReplicaLogDirsRequest …FIXME
handleCreateTopicsRequest …FIXME
handleOffsetFetchRequest …FIXME
handleFetchRequest …FIXME
Caution FIXME
Caution FIXME
handleCreatePartitionsRequest …FIXME
handleDeleteTopicsRequest …FIXME
handleControlledShutdownRequest …FIXME
RequestChannel
ReplicaManager
AdminManager
GroupCoordinator
TransactionCoordinator
KafkaController
ZkUtils
Broker ID
KafkaConfig
MetadataCache
Metrics
Optional Authorizer
QuotaManagers
BrokerTopicStats
Cluster ID
Time
KafkaHealthcheck
KafkaHealthcheck registers the broker it runs on with Zookeeper (which in turn makes the
broker visible to other brokers that together can form a Kafka cluster).
Figure 1. KafkaHealthcheck
Table 1. KafkaHealthcheck’s Internal Properties (e.g. Registries and Counters)
Name Description
sessionExpireListener SessionExpireListener
Broker ID
Advertised endpoints
ZkUtils
ApiVersion
startup
register(): Unit
For every EndPoint with no host assigned (in advertisedEndpoints), register assigns the
fully-qualified domain name of the local host.
register then finds the first EndPoint with PLAINTEXT security protocol or creates an
empty EndPoint .
Tip Define EndPoint with PLAINTEXT security protocol for older clients to connect.
In the end, register requests ZkUtils to registerBrokerInZk for brokerId, the host and port
of the PLAINTEXT endpoint, the updated endpoints, the JMX port, the optional rack and
protocol version.
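A minimal sketch of the relevant entries in config/server.properties (the host name is an example):
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://broker1.example.com:9092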
Note register makes a broker visible for other brokers to form a Kafka cluster.
handleNewSession Method
Caution FIXME
KafkaServerStartable — Thin Management Layer over KafkaServer
KafkaServerStartable is a thin management layer to manage a single KafkaServer instance.
awaitShutdown Method
Caution FIXME
shutdown Method
Caution FIXME
1. KafkaConfig
2. Collection of KafkaMetricsReporters
Caution FIXME
startup Method
startup(): Unit
In case of any exceptions, startup exits the JVM with status 1 . You should see the
following FATAL message in the logs if that happens.
Note startup is used when a Kafka Broker starts (on command line).
KafkaServer
KafkaServer is a Kafka broker that wires (creates and starts) Kafka services together.
apis KafkaApis
brokerState BrokerState
_brokerTopicStats BrokerTopicStats
_clusterId Cluster ID
credentialProvider CredentialProvider
dynamicConfigHandlers
dynamicConfigManager DynamicConfigManager
groupCoordinator GroupCoordinator
kafkaController KafkaController
kafkaHealthcheck KafkaHealthcheck
logContext LogContext
logDirFailureChannel LogDirFailureChannel
logManager LogManager
metadataCache MetadataCache
Collection of MetricsReporter
reporters
Used when…FIXME
requestHandlerPool KafkaRequestHandlerPool
socketServer SocketServer
transactionCoordinator TransactionCoordinator
quotaManagers QuotaManagers
zkUtils ZkUtils
Caution FIXME
Caution FIXME
Caution FIXME
Caution FIXME
KafkaConfig
Caution FIXME
startup(): Unit
Internally, startup first prints out the following INFO message to the logs:
startup notifies cluster resource listeners (i.e. the KafkaMetricsReporters and the configured metrics reporters).
startup creates the SocketServer (for KafkaConfig, Metrics and CredentialProvider) and requests it to start up.
startup creates the KafkaController (for KafkaConfig, ZkUtils, Metrics and the optional thread name prefix).
startup creates the GroupCoordinator (for KafkaConfig, ZkUtils and ReplicaManager) and configures it.
startup creates the KafkaRequestHandlerPool (for the broker ID, the RequestChannel, KafkaApis, Time and num.io.threads).
startup creates the KafkaHealthcheck (for the broker ID, the advertised listeners, ZkUtils and the protocol version) and registers an app-info MBean with the name kafka.server:type=app-info,id=[brokerId].
In the end, you should see the following INFO message in the logs:
Note The INFO message above uses the so-called log ident with the value of the broker.id property and is always in the format [Kafka Server [brokerId]]. It is printed out after a Kafka server has fully started.
KafkaConfig
KafkaConfig is the configuration of a Kafka server and its services.
hostName host.name
numNetworkThreads num.network.threads
port port
replicaLagTimeMaxMs
Caution FIXME
getConfiguredInstances Method
Caution FIXME
getListeners: Seq[EndPoint]
getListeners creates the EndPoints if defined using the listeners Kafka property, or defaults to a PLAINTEXT endpoint on host.name and port.
KafkaController
KafkaController is a Kafka service responsible for:
topic deletion
…FIXME
In a Kafka cluster, one of the brokers serves as the controller, which is responsible for
managing the states of partitions and replicas and for performing administrative tasks
like reassigning partitions.
Figure 1. KafkaController
KafkaController is part of every Kafka broker, but only one KafkaController is active at any given time.
KafkaController uses controller events (with their ControllerState and the actions they trigger):
ControlledShutdown (state ControlledShutdown) — carries a broker ID and a controlledShutdownCallback: Try[Set[TopicAndPartition]] => Unit
Reelect (state ControllerChange) — resigns as the active controller (only when the broker is no longer an active controller) and then elect
Startup (state ControllerChange) — 1. registerSessionExpirationListener 2. registerControllerChangeListener 3. elect
controllerContext
eventManager ControllerEventManager for controllerContext.stats.rateAndTimeMetrics and the updateMetrics listener
kafkaScheduler KafkaScheduler with a single daemon thread with prefix kafka-scheduler
partitionStateMachine PartitionStateMachine
replicaStateMachine ReplicaStateMachine
topicDeletionManager TopicDeletionManager
isrChangeNotificationListener Registered in registerIsrChangeNotificationListener when KafkaController does onControllerFailover. De-registered in deregisterIsrChangeNotificationListener when KafkaController resigns as the active controller.
logDirEventNotificationListener LogDirEventNotificationListener. De-registered in deregisterTopicDeletionListener when KafkaController resigns as the active controller.
Refer to Logging.
initiateReassignReplicasForTopicPartition Method
initiateReassignReplicasForTopicPartition
initiateReassignReplicasForTopicPartition …FIXME
deregisterPartitionReassignmentIsrChangeListeners Method
deregisterPartitionReassignmentIsrChangeListeners
deregisterPartitionReassignmentIsrChangeListeners …FIXME
resetControllerContext Method
resetControllerContext
resetControllerContext …FIXME
deregisterBrokerChangeListener Method
deregisterBrokerChangeListener
deregisterBrokerChangeListener …FIXME
deregisterTopicChangeListener Method
deregisterTopicChangeListener
deregisterTopicChangeListener …FIXME
onControllerResignation(): Unit
onControllerResignation starts by printing out the following DEBUG message to the logs:
Resigning
onControllerResignation then de-registers the znode listeners and removes the following metrics in order:
offlinePartitionCount
preferredReplicaImbalanceCount
globalTopicCount
globalPartitionCount
onControllerResignation then deregisterPartitionReassignmentIsrChangeListeners.
onControllerResignation deregisterTopicChangeListener (and the partitionModificationsListeners).
onControllerResignation deregisterTopicDeletionListener.
onControllerResignation deregisterBrokerChangeListener.
onControllerResignation resetControllerContext.
In the end, onControllerResignation prints out the following DEBUG message to the logs:
Resigned
deregisterIsrChangeNotificationListener(): Unit
deregisterIsrChangeNotificationListener prints out the following message to the logs:
De-registering IsrChangeNotificationListener
deregisterLogDirEventNotificationListener(): Unit
deregisterLogDirEventNotificationListener prints out the following message to the logs:
De-registering logDirEventNotificationListener
deregisterPreferredReplicaElectionListener(): Unit
deregisterPartitionReassignmentListener(): Unit
triggerControllerMove(): Unit
triggerControllerMove …FIXME
Note triggerControllerMove is used when:
1. KafkaController handleIllegalState
2. KafkaController caught an exception while electing or becoming a controller
handleIllegalState …FIXME
sendUpdateMetadataRequest Method
sendUpdateMetadataRequest(): Unit
sendUpdateMetadataRequest …FIXME
updateLeaderEpochAndSendRequest(): Unit
updateLeaderEpochAndSendRequest …FIXME
shutdown Method
shutdown(): Unit
shutdown …FIXME
Caution FIXME
onBrokerStartup Method
onBrokerStartup …FIXME
elect Method
elect(): Unit
elect …FIXME
Note elect is used when KafkaController enters Startup and Reelect states.
onControllerFailover Method
Caution FIXME
isActive Method
isActive: Boolean
isActive says whether the activeControllerId equals the broker ID (from KafkaConfig).
registerIsrChangeNotificationListener Internal
Method
registerIsrChangeNotificationListener(): Option[Seq[String]]
registerIsrChangeNotificationListener …FIXME
deregisterIsrChangeNotificationListener
Internal Method
deregisterIsrChangeNotificationListener(): Unit
deregisterIsrChangeNotificationListener …FIXME
KafkaConfig
ZkUtils
Time
Metrics
startup(): Unit
startup puts Startup event at the end of the event queue of ControllerEventManager and
requests it to start.
registerSessionExpirationListener(): Unit
registerControllerChangeListener(): Unit
registerControllerChangeListener registers a ControllerChangeListener (with the event queue of ControllerEventManager).
Note ControllerChangeListener emits:
1. ControllerChange event with the current controller ID (on the event queue of ControllerEventManager) every time the data of a znode changes
2. Reelect event when the data associated with a znode has been deleted
registerBrokerChangeListener(): Option[Seq[String]]
getControllerID(): Int
getControllerID returns the ID of the active Kafka controller that is associated with the /controller znode.
Internally, getControllerID requests ZkUtils for data associated with /controller znode.
If available, getControllerID parses the data (being the current controller info in JSON
format) to extract brokerid field.
$ ./bin/zookeeper-shell.sh 0.0.0.0:2181
Connecting to 0.0.0.0:2181
Welcome to ZooKeeper!
...
get /controller
{"version":1,"brokerid":100,"timestamp":"1506197069724"}
cZxid = 0xf9
ctime = Sat Sep 23 22:04:29 CEST 2017
mZxid = 0xf9
mtime = Sat Sep 23 22:04:29 CEST 2017
pZxid = 0xf9
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x15eaa3a4fdd000d
dataLength = 56
numChildren = 0
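A sketch of the parsing step with a hypothetical parseControllerId helper (not part of Kafka) that extracts the brokerid field the way getControllerID does:
// hypothetical helper: extract "brokerid" from the JSON stored in /controller
def parseControllerId(json: String): Int = {
  val BrokerId = """.*"brokerid":(\d+).*""".r
  json.trim match {
    case BrokerId(id) => id.toInt
    case _            => -1 // no active controller
  }
}

parseControllerId("""{"version":1,"brokerid":100,"timestamp":"1506197069724"}""")
// res: Int = 100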
registerTopicDeletionListener(): Option[Seq[String]]
deregisterTopicDeletionListener(): Unit
ControllerEventManager
ControllerEventManager is…FIXME
thread ControllerEventThread with controller-event-thread thread name
start(): Unit
ControllerEventThread
ControllerEventThread is a ShutdownableThread that is started when ControllerEventManager is started.
doWork(): Unit
doWork takes and removes the head of event queue (waiting if necessary until an element
becomes available).
Note The very first event in the event queue is Startup that KafkaController puts when it is started.
doWork finds the KafkaTimer for the state in the rateAndTimeMetrics lookup table (of ControllerEventManager).
ControllerEvent
ControllerEvent is the contract of events in the lifecycle of KafkaController state machine
that, once emitted, triggers state change and the corresponding process action.
package kafka.controller
Note ControllerEvent is a Scala sealed trait and so all the available events are in a single compilation unit (i.e. a file).
process Used when ControllerEventThread does the work to trigger an action associated with a state change.
TopicDeletion Controller Event
state is TopicDeletion .
process Method
process(): Unit
Note process is executed on the active controller only (and does nothing otherwise).
process requests ControllerContext for allTopics and finds the topics that are supposed to be deleted.
If there are any non-existent topics, process prints out the following WARN message to the
logs and requests ZkUtils to deletePathRecursive /admin/delete_topics/[topicName] znode
for every topic in the list.
With delete.topic.enable enabled (i.e. true ), process prints out the following INFO
message to the logs:
With delete.topic.enable disabled (i.e. false ), process prints out the following INFO
message to the logs (for every topic):
ControllerBrokerRequestBatch
ControllerBrokerRequestBatch is…FIXME
sendRequestsToBrokers Method
sendRequestsToBrokers …FIXME
ReplicaStateMachine handleStateChanges
KafkaMetricsReporter
Caution FIXME
KafkaRequestHandler
KafkaRequestHandler is a thread of execution (i.e. Java’s Runnable) that is responsible for relaying client requests to KafkaApis.
Refer to Logging.
run(): Unit
Caution FIXME
ID
Broker ID
RequestChannel
KafkaApis
Time
KafkaRequestHandlerPool — Pool of Daemon KafkaRequestHandler Threads
KafkaRequestHandlerPool is a pool of daemon kafka-request-handler threads (that are created and started immediately when KafkaRequestHandlerPool is created).
shutdown Method
Caution FIXME
Broker ID
RequestChannel
KafkaApis
Time
KafkaScheduler
KafkaScheduler is a Scheduler to schedule tasks in Kafka.
Refer to Logging.
startup(): Unit
When startup is executed, you should see the following DEBUG message in the logs:
startup initializes the executor with threads threads. The names of the threads are in the format [threadNamePrefix][number].
Caution FIXME
shutdown Method
Caution FIXME
Caution FIXME
schedule(name: String, fun: () => Unit, delay: Long, period: Long, unit: TimeUnit): Unit
When schedule is executed, you should see the following DEBUG message in the logs:
DEBUG Scheduling task [name] with initial delay [delay] ms and period [period] ms. (kafka.utils.KafkaScheduler)
schedule first makes sure that KafkaScheduler is running (which simply means that startup was executed).
For positive period , schedule schedules the thread every period after the initial delay .
Otherwise, schedule schedules the thread once.
Whenever the thread is executed, and before fun gets triggered, you should see the following TRACE message in the logs:
After the execution thread is finished, you should see the following TRACE message in the
logs:
In case of any exceptions, the execution thread catches them and you should see the
following ERROR message in the logs:
Scheduler Contract
trait Scheduler {
  def startup(): Unit
  def shutdown(): Unit
  def isStarted: Boolean
  def schedule(name: String, fun: () => Unit, delay: Long = 0, period: Long = -1, unit: TimeUnit = TimeUnit.MILLISECONDS)
}
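A minimal sketch of using KafkaScheduler per the Scheduler contract above (the task names are examples):
import java.util.concurrent.TimeUnit
import kafka.utils.KafkaScheduler

val scheduler = new KafkaScheduler(threads = 1)
scheduler.startup()

// a periodic task: initial delay 0 ms, then every 1000 ms
scheduler.schedule("demo-task", () => println("tick"), delay = 0, period = 1000, unit = TimeUnit.MILLISECONDS)

// a one-off task: a non-positive period means "run once"
scheduler.schedule("once", () => println("run once"), delay = 500, period = -1, unit = TimeUnit.MILLISECONDS)

scheduler.shutdown()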
LogDirFailureHandler
LogDirFailureHandler is…FIXME
start Method
Caution FIXME
LogManager
Caution FIXME
startup Method
Caution FIXME
Metadata
Metadata describes a Kafka cluster…FIXME
AdminClient
1. Request Metadata for an update (that simply turns the needUpdate flag on)
3. Request Metadata to wait for metadata change (i.e. until the metadata version has
changed)
Updated when:
listeners
lastSuccessfulRefreshMs
Flag…FIXME
Disabled (i.e. false ) when Metadata is created
needMetadataForAllTopics
Updated when Metadata is requested to set state to
indicate that metadata for all topics in Kafka cluster is
required
Metadata version
version 0 when Metadata is created
Refer to Logging.
updateRequested …FIXME
failedUpdate …FIXME
getClusterForCurrentTopics …FIXME
timeToNextUpdate Method
timeToNextUpdate …FIXME
ConsumerNetworkClient ensureFreshMetadata
Note
DefaultMetadataUpdater (of NetworkClient ) isUpdateDue and
maybeUpdate
add Method
add …FIXME
requestUpdate Method
requestUpdate …FIXME
awaitUpdate …FIXME
Caution FIXME
now .
flag is on and turns needUpdate flag off (that may have been turned on…FIXME).
update prints out the cluster ID and notifies clusterResourceListeners that the cluster has changed.
refreshBackoffMs
metadataExpireMs
allowAutoTopicCreation flag
topicExpiryEnabled flag
ClusterResourceListeners
needMetadataForAllTopics .
add
Note
needMetadataForAllTopics
setTopics
Metadata Update Listener
package org.apache.kafka.clients;
MetadataCache
MetadataCache is…FIXME
MetadataResponse
MetadataResponse holds information about a Kafka cluster, i.e. the broker nodes, the cluster ID, the controller ID and the topic metadata.
Figure 1. MetadataResponse
MetadataResponse is created mainly when KafkaApis handles a Metadata request.
cluster Method
Cluster cluster()
cluster …FIXME
throttleTimeMs
Broker nodes
cluster ID
controller ID
Collection of TopicMetadata
MetadataUpdater
MetadataUpdater Contract
package org.apache.kafka.clients;
interface MetadataUpdater {
  List<Node> fetchNodes();
  void handleDisconnection(String destination);
  void handleAuthenticationFailure(AuthenticationException exception);
  void handleCompletedMetadataResponse(RequestHeader requestHeader, long now, MetadataResponse metadataResponse);
  boolean isUpdateDue(long now);
  long maybeUpdate(long now);
  void requestUpdate();
}
requestUpdate
handleAuthenticationFailure
handleCompletedMetadataResponse
Used exclusively when NetworkClient handles
completed receives
DefaultMetadataUpdater
DefaultMetadataUpdater is a MetadataUpdater that NetworkClient uses to…FIXME
Tip Add the following line to config/log4j.properties :
log4j.logger.org.apache.kafka.clients.NetworkClient=DEBUG, stdout
Refer to Logging.
FIXME
isUpdateDue Method
Caution FIXME
maybeUpdate(long now)
maybeUpdate takes requestTimeoutMs for the time to wait till a metadata fetch in progress finishes.
maybeUpdate takes the maximum of the two values above to check if the current cluster metadata has expired.
If not, maybeUpdate gives the maximum value (that says how long to wait till the current cluster metadata expires).
Otherwise, maybeUpdate selects the node to request a cluster metadata from and calls the internal maybeUpdate (with the input now timestamp and the node).
If no node was found, maybeUpdate prints out the following DEBUG message to the logs and
gives reconnectBackoffMs.
maybeUpdate …FIXME
When there are no nodes in the cluster, handleCompletedMetadataResponse prints out the
following TRACE message to the logs and requests Metadata to record a failure (with no
exception).
NetworkClient — Non-Blocking KafkaClient
NetworkClient is a non-blocking KafkaClient that uses Selectable for network communication.
Selector is the one and only Selectable that uses Java’s selectable channels for
Note stream-oriented connecting sockets (i.e. Java’s
java.nio.channels.SocketChannel).
NetworkClient does the actual reads and writes (to sockets) every poll.
Figure 1. NetworkClient
NetworkClient is created when:
TransactionMarkerChannelManager is created
ReplicaFetcherBlockingSend is created
Tip Add the following line to config/log4j.properties :
log4j.logger.org.apache.kafka.clients.NetworkClient=DEBUG, stdout
Refer to Logging.
ClientRequest newClientRequest(
  String nodeId,
  AbstractRequest.Builder<?> requestBuilder,
  long createdTimeMs,
  boolean expectResponse)
newClientRequest …FIXME
initiateConnect requests Selectable to connect to the broker node (at a given host and
port).
Note initiateConnect passes the sizes of send and receive buffers for the socket connection.
DefaultMetadataUpdater maybeUpdate
ready Method
ready …FIXME
wakeup Method
void wakeup()
poll requests MetadataUpdater for cluster metadata update (if needed and possible).
In the end, poll handles completed request sends, receives, disconnected connections,
records any connections to new brokers, initiates API version requests, expire in-flight
requests, and finally triggers their RequestCompletionHandlers .
handleCompletedReceives Method
handleCompletedReceives …FIXME
Arguments Description
MetadataUpdater
Metadata
Selectable
Client ID
maxInFlightRequestsPerConnection
reconnectBackoffMs
reconnectBackoffMax
requestTimeoutMs
Time
discoverBrokerVersions Flag…
ApiVersions
Sensor
LogContext
completeResponses informs every ClientResponse (in the input responses) that a response has been completed.
In case of any exception, completeResponses prints out the following ERROR message to
the logs:
KafkaClient
KafkaClient is the contract for…FIXME
KafkaClient Contract
package org.apache.kafka.clients;
Used when:
newClientRequest
…FIXME
Used when:
wakeup
…FIXME
Used when:
ConsumerNetworkClient polls
NetworkClientUtils
NetworkClientUtils is…FIXME
sendAndReceive Method
sendAndReceive …FIXME
awaitReady …FIXME
OffsetConfig
OffsetConfig is…FIXME
Partition
A Kafka topic is spread across a Kafka cluster as a virtual group of one or more partitions.
A single partition of a topic (topic partition) can be replicated across a Kafka cluster to one
or more Kafka brokers.
A topic partition has one partition leader node and zero or more replicas.
Kafka producers publish messages to partition leaders, and Kafka consumers consume messages from them.
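Use kafka-topics.sh --describe to see the leader, replicas and ISR of the partitions of a topic (the output below is an example):
$ ./bin/kafka-topics.sh --describe --topic my-topic --zookeeper localhost:2181
Topic:my-topic  PartitionCount:1  ReplicationFactor:1  Configs:
    Topic: my-topic  Partition: 0  Leader: 100  Replicas: 100  Isr: 100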
Partition is…FIXME
maybeExpandIsr Method
FIXME
maybeExpandIsr …FIXME
maybeShrinkIsr Method
maybeShrinkIsr …FIXME
updateReplicaLogReadResult Method
updateReplicaLogReadResult …FIXME
updateIsr …FIXME
Note updateIsr is used when Partition is requested to expand or shrink the ISR.
makeLeader Method
makeLeader(
controllerId: Int,
partitionStateInfo: LeaderAndIsrRequest.PartitionState,
correlationId: Int): Boolean
makeLeader …FIXME
makeFollower Method
makeFollower(
controllerId: Int,
partitionStateInfo: LeaderAndIsrRequest.PartitionState,
correlationId: Int): Boolean
makeFollower …FIXME
leaderReplicaIfLocal Method
leaderReplicaIfLocal: Option[Replica]
leaderReplicaIfLocal gives…FIXME
maybeShrinkIsr Method
Caution FIXME
Topic name
Partition ID
Time
ReplicaManager
PartitionStateMachine
PartitionStateMachine is…FIXME
triggerOnlinePartitionStateChange Method
triggerOnlinePartitionStateChange(): Unit
triggerOnlinePartitionStateChange …FIXME
handleStateChanges Method
handleStateChanges(
partitions: Set[TopicAndPartition],
targetState: PartitionState,
leaderSelector: PartitionLeaderSelector = noOpPartitionLeaderS...,
callbacks: Callbacks): Unit
handleStateChanges …FIXME
shutdown Method
FIXME
shutdown …FIXME
ReplicaManager
ReplicaManager is created and started when KafkaServer starts up.
ReplicaManager is a KafkaMetricsGroup .
lastIsrChangeMs Time when isrChangeSet has a new TopicPartition added.
logDirFailureHandler LogDirFailureHandler
OfflinePartition
createReplicaFetcherManager(
  metrics: Metrics,
  time: Time,
  threadNamePrefix: Option[String],
  quotaManager: ReplicationQuotaManager): ReplicaFetcherManager
createReplicaFetcherManager …FIXME
shutdown Method
shutdown …FIXME
alterReplicaLogDirs Method
alterReplicaLogDirs …FIXME
becomeLeaderOrFollower Method
becomeLeaderOrFollower(
  correlationId: Int,
  leaderAndISRRequest: LeaderAndIsrRequest,
  onLeadershipChange: (Iterable[Partition], Iterable[Partition]) => Unit): BecomeLeaderOrFollowerResult
becomeLeaderOrFollower …FIXME
makeFollowers(
  controllerId: Int,
  epoch: Int,
  partitionState: Map[Partition, LeaderAndIsrRequest.PartitionState],
  correlationId: Int,
  responseMap: mutable.Map[TopicPartition, Errors]): Set[Partition]
makeFollowers …FIXME
recordIsrChange Method
recordIsrChange adds the input topicPartition to the isrChangeSet internal registry and sets lastIsrChangeMs to the current time.
updateFollowerLogReadResults(
  replicaId: Int,
  readResults: Seq[(TopicPartition, LogReadResult)]): Seq[(TopicPartition, LogReadResult)]
updateFollowerLogReadResults …FIXME
fetchMessages Method
fetchMessages(
timeout: Long,
replicaId: Int,
fetchMinBytes: Int,
fetchMaxBytes: Int,
hardMaxBytesLimit: Boolean,
fetchInfos: Seq[(TopicPartition, FetchRequest.PartitionData)],
quota: ReplicaQuota = UnboundedQuota,
responseCallback: Seq[(TopicPartition, FetchPartitionData)] => Unit,
isolationLevel: IsolationLevel): Unit
fetchMessages …FIXME
getLeaderPartitions: List[Partition]
getLeaderPartitions gives the partitions (from allPartitions) that are not offline and whose leader replica is on this broker.
isr-expiration Task
Caution FIXME
isr-change-propagation Task
Caution FIXME
maybePropagateIsrChanges Method
maybePropagateIsrChanges(): Unit
maybePropagateIsrChanges …FIXME
KafkaConfig
Metrics
Time
ZkUtils
Scheduler
LogManager
isShuttingDown flag
ReplicationQuotaManager
BrokerTopicStats
MetadataCache
LogDirFailureChannel
DelayedOperationPurgatory[DelayedProduce]
DelayedOperationPurgatory[DelayedFetch]
DelayedOperationPurgatory[DelayedDeleteRecords]
startup(): Unit
startup schedules the following recurring tasks:
1. isr-expiration
2. isr-change-propagation
maybeShrinkIsr(): Unit
TRACE Evaluating ISR list of partitions to see which replicas can be removed from the ISR
maybeShrinkIsr requests the partitions (from the allPartitions pool that are not offline partitions) to maybeShrinkIsr (with replicaLagTimeMaxMs).
ReplicaFetcherManager
ReplicaFetcherManager is an AbstractFetcherManager that…FIXME (describe properties)
createFetcherThread Method
createFetcherThread …FIXME
KafkaConfig
ReplicaManager
Metrics
Time
ReplicationQuotaManager
AbstractFetcherManager
AbstractFetcherManager is…FIXME
addFetcherForPartitions Method
addFetcherForPartitions …FIXME
Note addFetcherForPartitions is used when ReplicaManager makeFollowers.
ReplicaFetcherThread
ReplicaFetcherThread is an AbstractFetcherThread…FIXME
Name
Fetcher ID
Source BrokerEndPoint
KafkaConfig
ReplicaManager
Metrics
Time
ReplicationQuotaManager
earliestOrLatestOffset …FIXME
fetchEpochsFromLeader Method
fetchEpochsFromLeader …FIXME
AbstractFetcherThread
AbstractFetcherThread is…FIXME
ReplicaFetcherBlockingSend
ReplicaFetcherBlockingSend is…FIXME
Node
sourceNode
Used when…FIXME
BrokerEndPoint
KafkaConfig
Metrics
Time
Fetcher ID
Client ID
LogContext
sendRequest requests NetworkClient to create a new client request to the source broker.
sendRequest requests NetworkClientUtils to send the client request and wait for a
response.
Note sendRequest is a blocking operation (i.e. blocks the current thread) and polls for responses until one arrives or a disconnection or a version mismatch happens.
close Method
close(): Unit
close …FIXME
ReplicationQuotaManager
ReplicationQuotaManager is…FIXME
ReplicationUtils
ReplicationUtils …FIXME
propagateIsrChanges Method
propagateIsrChanges …FIXME
ReplicaStateMachine
ReplicaStateMachine is…FIXME
handleStateChanges Method
handleStateChanges(
replicas: Set[PartitionAndReplica],
targetState: ReplicaState,
callbacks: Callbacks): Unit
handleStateChanges …FIXME
shutdown Method
FIXME
shutdown …FIXME
Selector — Selectable on Socket Channels (from Java’s New IO API)
Selector is created when:
KafkaConsumer is created
KafkaProducer is created
AdminClient is created
TransactionMarkerChannelManager is created
Processor is created
ReplicaFetcherBlockingSend is created
connect Method
void connect(
String id,
InetSocketAddress address,
int sendBufferSize,
int receiveBufferSize) throws IOException
connect …FIXME
Selectable
Selectable is the contract for asynchronous, multi-channel network I/O.
package org.apache.kafka.common.network;
List<Send> completedSends();
List<NetworkReceive> completedReceives();
connect
Used exclusively when NetworkClient is requested to
establish a connection to a broker
poll
ShutdownableThread
ShutdownableThread is the contract for non-daemon threads of execution.
shutdownLatch Java’s java.util.concurrent.CountDownLatch with the count of 1
run Method
run(): Unit
Note run is a part of java.lang.Runnable that is executed when the thread is started.
run first prints out the following INFO message to the logs:
Starting
In the end, run decrements the count of shutdownLatch and prints out the following INFO
message to the logs:
Stopped
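A minimal sketch of a custom ShutdownableThread (the Heartbeater name is an example): doWork is called in a loop until shutdown is requested.
import kafka.utils.ShutdownableThread

class Heartbeater extends ShutdownableThread(name = "heartbeater") {
  override def doWork(): Unit = {
    println("beat")
    Thread.sleep(1000) // one pass of the loop
  }
}

val t = new Heartbeater
t.start()
// ... later: stops the loop and awaits termination (via shutdownLatch)
t.shutdown()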
SocketServer
SocketServer is a NIO socket server.
MemoryPoolAvailable
MemoryPoolUsed
Table 2. SocketServer’s Internal Properties (e.g. Registries and Counters) (in alphabetical
order)
Name Description
acceptors Acceptor threads per EndPoint
connectionQuotas ConnectionQuotas
maxQueuedRequests
maxConnectionsPerIp
maxConnectionsPerIpOverrides
memoryPool
requestChannel
totalProcessorThreads Total number of processors, i.e. numProcessorThreads for every endpoint
Caution FIXME
startup(): Unit
For every endpoint (in endpoints registry) startup does the following:
4. Starts a non-daemon thread for the Acceptor with the name as kafka-socket-acceptor-
[listenerName]-[securityProtocol]-[port] (e.g. kafka-socket-acceptor-
In the end, startup prints out the following INFO message to the logs:
KafkaConfig
Metrics
Time
CredentialProvider
TopicDeletionManager
TopicDeletionManager is…FIXME
topicsIneligibleForDeletion The names of the topics that must not be deleted (i.e. are ineligible for deletion)
Refer to Logging.
Caution FIXME
enqueueTopicsForDeletion Method
Caution FIXME
failReplicaDeletion Method
Caution FIXME
KafkaController
ControllerEventManager
markTopicIneligibleForDeletion Method
If there are any topics in the intersection, markTopicIneligibleForDeletion prints out the
following INFO message to the logs:
KafkaController initiateReassignReplicasForTopicPartition
reset(): Unit
(only with delete.topic.enable Kafka property enabled) reset removes all elements from the
following internal registries:
topicsToBeDeleted
partitionsToBeDeleted
topicsIneligibleForDeletion
TransactionCoordinator
TransactionCoordinator is…FIXME
startup Method
startup
startup …FIXME
TransactionCoordinator
TransactionStateManager
TransactionStateManager is…FIXME
getTransactionTopicPartitionCount Method
getTransactionTopicPartitionCount
getTransactionTopicPartitionCount …FIXME
ZkUtils
ZkUtils is…FIXME
Table 2. ZkUtils’s Internal Properties (e.g. Registries and Counters) (in alphabetical order)
Name Description
persistentZkPaths
zkPath
getCluster Method
getCluster(): Cluster
getCluster gets the children znodes of /brokers/ids znode and reads their data (as a
JSON blob).
getCluster then creates a Broker from every znode id and its JSON blob (with a host and port) and adds it to the Cluster.
deletePathRecursive Method
Caution FIXME
deletePath Method
Caution FIXME
apply(
zkUrl: String,
sessionTimeout: Int,
connectionTimeout: Int,
isZkSecurityEnabled: Boolean): ZkUtils
apply …FIXME
2. FIXME
2. FIXME
subscribeChildChanges …FIXME
and childListener .
and dataListener .
registerBrokerInZk Method
registerBrokerInZk(
id: Int,
host: String,
port: Int,
advertisedEndpoints: Seq[EndPoint],
jmxPort: Int,
rack: Option[String],
apiVersion: ApiVersion): Unit
registerBrokerInZk …FIXME
getTopicPartitionCount Method
getTopicPartitionCount …FIXME
"version":1
"brokerid":[brokerId]
"timestamp":[timestamp]
import kafka.utils._
scala> ZkUtils.controllerZkData(1, System.currentTimeMillis())
res0: String = {"version":1,"brokerid":1,"timestamp":"1506161225262"}
ZkClient
ZkConnection
isSecure flag
readDataMaybeNull returns None (for Option[String] ) when path znode is not available.
ZKRebalancerListener
ZKRebalancerListener is…FIXME
syncedRebalance Method
syncedRebalance(): Unit
syncedRebalance …FIXME
Topic Replication
Topic Replication is the process that offers fail-over capability for a topic by replicating its partitions across brokers.
./bin/kafka-topics.sh --create \
--topic my-topic \
--replication-factor 1 \ // <-- define replication factor
--partitions 1 \
--zookeeper localhost:2181
Producers always send requests to the broker that is the current leader replica for a topic
partition.
Data from producers is first saved to a commit log before consumers can find out that it is available. It will only be visible to consumers when the followers acknowledge that they have got the data and stored it in their local logs.
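On the producer side this surfaces as the acks property. A minimal sketch of the relevant configuration:
import java.util.Properties

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
// acks=all: a send is acknowledged only once all in-sync replicas have stored it
// acks=1: only the leader has to store it; acks=0: do not wait at all
props.put("acks", "all")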
Topic Deletion
Topic Deletion is a feature of Kafka that allows for deleting topics.
$ ./bin/kafka-server-start.sh config/server.properties \
--override delete.topic.enable=true \
--override broker.id=100 \
--override log.dirs=/tmp/kafka-logs-100 \
--override port=9192
Note that the broker 100 is the leader for remove-me topic.
Stop the broker 100 and start another with broker ID 200 .
$ ./bin/kafka-server-start.sh config/server.properties \
--override delete.topic.enable=true \
--override broker.id=200 \
--override log.dirs=/tmp/kafka-logs-200 \
--override port=9292
As you may have noticed, kafka-topics.sh --delete will only delete a topic if the topic’s
leader broker is available (and can acknowledge the removal). Since the broker 100 is down
and currently unavailable the topic deletion has only been recorded in Zookeeper.
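The deletion itself is requested with kafka-topics.sh --delete, which prints out messages along these lines:
$ ./bin/kafka-topics.sh --delete \
  --topic remove-me \
  --zookeeper localhost:2181
Topic remove-me is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.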
As long as the leader broker 100 is not available, the topic to be deleted remains marked
for deletion.
$ ./bin/kafka-server-start.sh config/server.properties \
--override delete.topic.enable=true \
--override broker.id=100 \
--override log.dirs=/tmp/kafka-logs-100 \
--override port=9192
With kafka.controller.KafkaController logger at DEBUG level, you should see the following
messages in the logs:
DEBUG [Controller id=100] Delete topics listener fired for topics remove-me to be deleted (kafka.controller.KafkaController)
INFO [Controller id=100] Starting topic deletion for topics remove-me (kafka.controller.KafkaController)
INFO [GroupMetadataManager brokerId=100] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
DEBUG [Controller id=100] Removing replica 100 from ISR 100 for partition remove-me-0. (kafka.controller.KafkaController)
INFO [Controller id=100] Retaining last ISR 100 of partition remove-me-0 since unclean leader election is disabled (kafka.controller.KafkaController)
INFO [Controller id=100] New leader and ISR for partition remove-me-0 is {"leader":-1,"leader_epoch":1,"isr":[100]} (kafka.controller.KafkaController)
INFO [ReplicaFetcherManager on broker 100] Removed fetcher for partitions remove-me-0 (kafka.server.ReplicaFetcherManager)
INFO [ReplicaFetcherManager on broker 100] Removed fetcher for partitions (kafka.server.ReplicaFetcherManager)
INFO [ReplicaFetcherManager on broker 100] Removed fetcher for partitions remove-me-0 (kafka.server.ReplicaFetcherManager)
INFO Log for partition remove-me-0 is renamed to /tmp/kafka-logs-100/remove-me-0.fe6d039ff884498b9d6113fb22a75264-delete and is scheduled for deletion (kafka.log.LogManager)
DEBUG [Controller id=100] Delete topic callback invoked for org.apache.kafka.common.requests.StopReplicaResponse@8c0f4f0 (kafka.controller.KafkaController)
INFO [Controller id=100] New topics: [Set()], deleted topics: [Set()], new partition replica assignment [Map()] (kafka.controller.KafkaController)
DEBUG [Controller id=100] Delete topics listener fired for topics to be deleted (kafka.controller.KafkaController)
The topic is now deleted. Use Zookeeper CLI tool to confirm it.
Kafka Controller Election
$ ./bin/zookeeper-server-start.sh config/zookeeper.properties
...
INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
Add the following line to config/log4j.properties to enable DEBUG logging level for
kafka.controller.KafkaController logger.
log4j.logger.kafka.controller.KafkaController=DEBUG, stdout
$ ./bin/kafka-server-start.sh config/server.properties \
--override broker.id=100 \
--override log.dirs=/tmp/kafka-logs-100 \
--override port=9192
...
INFO Registered broker 100 at path /brokers/ids/100 with addresses: EndPoint(192.168.1.4,9192,ListenerName(PLAINTEXT),PLAINTEXT) (kafka.utils.ZkUtils)
INFO Kafka version : 1.0.0-SNAPSHOT (org.apache.kafka.common.utils.AppInfoParser)
INFO Kafka commitId : 852297efd99af04d (org.apache.kafka.common.utils.AppInfoParser)
INFO [KafkaServer id=100] started (kafka.server.KafkaServer)
Connect to Zookeeper using Zookeeper CLI (command-line interface). Use the official
distribution of Apache Zookeeper as described in Zookeeper Tips.
Once connected, execute get /controller to get the data associated with /controller
znode where the active Kafka controller stores the controller ID.
(optional) Clear the consoles of the two Kafka brokers so you have the election logs only.
You should see the following in the logs in the consoles of the two Kafka brokers.
KafkaProducer — Main Class For Kafka Producers
Figure 1. KafkaProducer
KafkaProducer is a part of the public API and is created with properties and (key and value)
serializers as configuration.
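A minimal sketch of creating a KafkaProducer with explicit serializers and sending a keyed record (the broker address and topic are examples):
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")

// key and value serializers given explicitly (the alternative is the
// key.serializer and value.serializer properties)
val producer = new KafkaProducer[String, String](props, new StringSerializer, new StringSerializer)

// records with the same key always land in the same partition
producer.send(new ProducerRecord[String, String]("my-topic", "key-1", "hello"))
producer.flush()
producer.close()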
Metadata
Created when KafkaProducer is created with the
following properties:
retry.backoff.ms for refreshBackoffMs
metadata.max.age.ms for metadataExpireMs
metadata
allowAutoTopicCreation flag enabled
topicExpiryEnabled flag enabled
Updated with a bootstrap cluster when
KafkaProducer is created
Used in waitOnMetadata
Refer to Logging.
send …FIXME
doSend …FIXME
FIXME
Configuring ClusterResourceListeners — configureClusterResourceListeners Internal Method
ClusterResourceListeners configureClusterResourceListeners(
Serializer<K> keySerializer,
Serializer<V> valueSerializer,
List<?>... candidateLists)
configureClusterResourceListeners creates a ClusterResourceListeners collection (from the candidate lists) and registers the keySerializer and valueSerializer.
partitionsFor waits on cluster metadata for the input topic for up to max.block.ms. Once available, it gives the partitions of the topic.
ClusterAndWaitTime waitOnMetadata(
String topic,
Integer partition,
long maxWaitMs) throws InterruptedException
waitOnMetadata first checks if the available cluster metadata could be current enough.
waitOnMetadata requests Metadata for the current cluster information and then requests the cluster for the number of partitions of the topic.
If the cluster metadata is not current enough (i.e. the number of partitions is unavailable or
the partition is above the current count), waitOnMetadata prints out the following TRACE
message to the logs:
waitOnMetadata requests Metadata for update and requests Sender to wake up.
waitOnMetadata then requests Metadata to wait for a metadata update and then requests Metadata for the current cluster again.
waitOnMetadata keeps doing it until the number of partitions of the input topic is available.
unauthorized.
available partitions.
Invalid partition given with record: [partition] is not in the range [0...[partitionsCount]).
Producer
Producer is…FIXME
DefaultPartitioner
DefaultPartitioner is…FIXME
Partitioner
Partitioner is…FIXME
ProducerInterceptor
ProducerInterceptor is…FIXME
Sender
Sender is a thread of execution that handles the sending of produce requests to a Kafka cluster.
Sender is created when KafkaProducer is created.
sendProduceRequests …FIXME
void sendProduceRequest(
long now,
int destination,
short acks,
int timeout,
List<ProducerBatch> batches)
sendProduceRequest …FIXME
sendProducerData …FIXME
run …FIXME
Note run is used exclusively when Sender is started (as a thread of execution).
void run()
Note run is a part of java.lang.Runnable that is executed when the thread is started.
run first prints out the following DEBUG message to the logs:
run keeps running (with the current time in milliseconds) until running flag is turned off.
run …FIXME
LogContext
KafkaClient
Metadata
RecordAccumulator
guaranteeMessageOrder flag
maxRequestSize
acks
SenderMetricsRegistry
Time
requestTimeout
retryBackoffMs
TransactionManager
ApiVersions
Serializer
Serializer is…FIXME
KafkaConsumer — Main Class For Kafka Consumers
Figure 1. KafkaConsumer
KafkaConsumer is a part of the public API and is created with properties and (key and value)
deserializers as configuration.
// sandbox/kafka-sandbox
val bootstrapServers = "localhost:9092"
val groupId = "kafka-sandbox"
import org.apache.kafka.clients.consumer.ConsumerConfig
val configs: Map[String, Object] = Map(
// required properties
ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrapServers,
ConsumerConfig.GROUP_ID_CONFIG -> groupId
)
import org.apache.kafka.common.serialization.StringDeserializer
val keyDeserializer = new StringDeserializer
val valueDeserializer = new StringDeserializer
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
val consumer = new KafkaConsumer[String, String](
configs.asJava,
keyDeserializer,
valueDeserializer)
client ConsumerNetworkClient
Used mainly (?) to create the Fetcher and ConsumerCoordinator
Used also in poll, pollOnce and wakeup (but I think the usage should be limited to creating the Fetcher and ConsumerCoordinator)
clientId
ConsumerCoordinator
coordinator Initialized when KafkaConsumer is created
Used for…FIXME
Fetcher
fetcher Created right when KafkaConsumer is created.
Used when…FIXME
Metadata
metadata Created right when KafkaConsumer is created.
Used when…FIXME
metrics Metrics
Refer to Logging.
assign …FIXME
unsubscribe Method
void unsubscribe()
unsubscribe …FIXME
import scala.collection.JavaConverters._
consumer.subscribe(topics.asJava)
Internally, subscribe prints out the following DEBUG message to the logs:
subscribe then requests SubscriptionState to subscribe for the topics and listener .
In the end, subscribe requests SubscriptionState for groupSubscription that it then passes
along to Metadata to set the topics to track.
val seconds = 10
while (true) {
println(s"Polling for records for $seconds secs")
val records = consumer.poll(seconds * 1000)
// do something with the records here
}
If there are records available, poll checks Fetcher for sendFetches and ConsumerNetworkClient for pendingRequestCount. If either is positive, poll requests ConsumerNetworkClient to pollNoWakeup.
Caution FIXME Make the above more user-friendly, e.g. when could interceptors be empty?
commitSync Method
void commitSync()
commitSync …FIXME
seek Method
seek …FIXME
Caution FIXME
endOffsets Method
Caution FIXME
offsetsForTimes Method
Caution FIXME
updateFetchPositions Method
Caution FIXME
pollOnce …FIXME
Internally, listTopics simply requests Fetcher for metadata for all topics and returns it.
beginningOffsets Method
Note The KafkaConsumer API offers other constructors that in the end use the public 3-argument constructor that in turn passes the call on to the private internal constructor.
// Public API
KafkaConsumer(
Map<String, Object> configs,
Deserializer<K> keyDeserializer,
Deserializer<V> valueDeserializer)
When created, KafkaConsumer adds the keyDeserializer and valueDeserializer to configs (as
key.deserializer and value.deserializer properties respectively) and creates a
ConsumerConfig.
KafkaConsumer(
ConsumerConfig config,
Deserializer<K> keyDeserializer,
Deserializer<V> valueDeserializer)
When called, the internal KafkaConsumer constructor prints out the following DEBUG
message to the logs:
KafkaConsumer sets the internal clientId to client.id or generates one with the prefix consumer- (followed by a sequence number).
KafkaConsumer sets the internal Metrics (and JmxReporter with kafka.consumer prefix).
1. retryBackoffMs
2. metadata.max.age.ms
3. allowAutoTopicCreation enabled
4. topicExpiryEnabled disabled
fetch.min.bytes
fetch.max.bytes
fetch.max.wait.ms
max.partition.fetch.bytes
max.poll.records
check.crcs
In the end, KafkaConsumer prints out the following DEBUG message to the logs:
wakeup Method
void wakeup()
Note Causes the first selection operation that has not yet returned to return immediately.
Read about Selection in java.nio.channels.Selector's javadoc.
Configuring ClusterResourceListeners — configureClusterResourceListeners Internal Method
ClusterResourceListeners configureClusterResourceListeners(
Deserializer<K> keyDeserializer,
Deserializer<V> valueDeserializer,
List<?>... candidateLists)
valueDeserializer .
Consumer
Consumer is the contract for Kafka consumers.
Note KafkaConsumer is the main public class that Kafka developers use to write Kafka consumers.
Consumer Contract
package org.apache.kafka.clients.consumer;
void unsubscribe();
void commitSync();
void commitSync(Map<TopicPartition, OffsetAndMetadata> offsets);
void commitAsync();
void commitAsync(OffsetCommitCallback callback);
void commitAsync(Map<TopicPartition, OffsetAndMetadata> offsets, OffsetCommitCallback callback);
void close();
void close(long timeout, TimeUnit unit);
void wakeup();
}
commitSync
seek
subscribe
unsubscribe
wakeup
Deserializer
Caution FIXME
ConsumerConfig
Caution FIXME
ConsumerCoordinator
ConsumerCoordinator is a concrete AbstractCoordinator that…FIXME
session.timeout.ms sessionTimeoutMs
heartbeat.interval.ms heartbeatIntervalMs
retry.backoff.ms retryBackoffMs
enable.auto.commit autoCommitEnabled
auto.commit.interval.ms autoCommitIntervalMs
exclude.internal.topics excludeInternalTopics
internal.leave.group.on.close leaveGroupOnClose
maybeAutoCommitOffsetsAsync …FIXME
maybeAutoCommitOffsetsSync …FIXME
void doAutoCommitOffsetsAsync()
doAutoCommitOffsetsAsync …FIXME
close Method
close …FIXME
commitOffsetsAsync Method
void commitOffsetsAsync(
  final Map<TopicPartition, OffsetAndMetadata> offsets,
  final OffsetCommitCallback callback)
commitOffsetsAsync …FIXME
commitOffsetsSync Method
boolean commitOffsetsSync(
  Map<TopicPartition, OffsetAndMetadata> offsets,
  long timeoutMs)
commitOffsetsSync …FIXME
refreshCommittedOffsetsIfNeeded Method
void refreshCommittedOffsetsIfNeeded()
refreshCommittedOffsetsIfNeeded …FIXME
onJoinComplete Method
void onJoinComplete(
int generation,
String memberId,
String assignmentStrategy,
ByteBuffer assignmentBuffer)
onJoinComplete …FIXME
onJoinPrepare Method
onJoinPrepare …FIXME
performAssignment Method
performAssignment …FIXME
maybeLeaveGroup Method
maybeLeaveGroup …FIXME
updatePatternSubscription Method
updatePatternSubscription …FIXME
needRejoin Method
boolean needRejoin()
needRejoin …FIXME
timeToNextPoll Method
timeToNextPoll …FIXME
poll …FIXME
void addMetadataListener()
addMetadataListener …FIXME
fetchCommittedOffsets Method
fetchCommittedOffsets …FIXME
LogContext
ConsumerNetworkClient
Group ID
rebalanceTimeoutMs
sessionTimeoutMs
heartbeatIntervalMs
Collection of PartitionAssignors
Metadata
SubscriptionState
Metrics
Time
retryBackoffMs
autoCommitEnabled flag
autoCommitIntervalMs
ConsumerInterceptors
excludeInternalTopics flag
leaveGroupOnClose flag
AbstractCoordinator
AbstractCoordinator is a contract for…FIXME
AbstractCoordinator is part of the org.apache.kafka.clients.consumer.internals package.
onJoinPrepare: used exclusively when AbstractCoordinator is requested to joinGroupIfNeeded (and the needsJoinPrepare flag is on)

performAssignment

needsJoinPrepare: controls when to execute onJoinPrepare while performing joinGroupIfNeeded
AbstractCoordinator takes the following when created:

LogContext
ConsumerNetworkClient
Group ID
rebalanceTimeoutMs
sessionTimeoutMs
heartbeatIntervalMs
Metrics
Time
retryBackoffMs
leaveGroupOnClose flag
joinGroupIfNeeded Method
void joinGroupIfNeeded()
joinGroupIfNeeded …FIXME
initiateJoinGroup …FIXME
ensureActiveGroup Method
void ensureActiveGroup()
ensureActiveGroup …FIXME
ConsumerInterceptor
Example

The following is a minimal sketch of a custom ConsumerInterceptor (the package pl.jaceklaskowski.kafka comes from the original example; the class name and log output are illustrative):
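package pl.jaceklaskowski.kafka

import java.util
import org.apache.kafka.clients.consumer.{ConsumerInterceptor, ConsumerRecords, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition

// Logs how many records each poll returned and what offsets get committed
class LoggingConsumerInterceptor extends ConsumerInterceptor[String, String] {

  override def onConsume(records: ConsumerRecords[String, String]): ConsumerRecords[String, String] = {
    println(s"onConsume: ${records.count} record(s)")
    records // return the (possibly transformed) records to the consumer
  }

  override def onCommit(offsets: util.Map[TopicPartition, OffsetAndMetadata]): Unit =
    println(s"onCommit: $offsets")

  override def close(): Unit = ()

  override def configure(configs: util.Map[String, _]): Unit = ()
}

Register the interceptor using the interceptor.classes configuration property of a consumer.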
onConsume Method
Caution FIXME
ConsumerNetworkClient
ConsumerNetworkClient is a higher-level network client (wrapping a KafkaClient) used by the Kafka consumer internals that…FIXME
Refer to Logging.
checkDisconnects …FIXME
send records the new ClientRequest (and its node) in the unsent internal registry.
wakeup Method
void wakeup()
wakeup turns the internal wakeup flag on and requests the KafkaClient to wakeup.

ensureFreshMetadata Method

void ensureFreshMetadata()

ensureFreshMetadata waits for a metadata update if Metadata has been requested to be updated (or is due for a refresh).
pendingRequestCount Method
Caution FIXME
leastLoadedNode Method
Caution FIXME
poll Method
poll …FIXME
awaitMetadataUpdate …FIXME
awaitPendingRequests Method
Caution FIXME
pollNoWakeup Method
void pollNoWakeup()
pollNoWakeup …FIXME
ConsumerNetworkClient takes the following when created:

LogContext
KafkaClient
Metadata
Time
retryBackoffMs
requestTimeoutMs
ConsumerRebalanceListener
ConsumerRebalanceListener is a callback interface to be notified when the set of partitions assigned to a Kafka consumer changes (e.g. as part of a consumer group rebalance).
package org.apache.kafka.clients.consumer;
interface ConsumerRebalanceListener {
void onPartitionsAssigned(Collection<TopicPartition> partitions);
void onPartitionsRevoked(Collection<TopicPartition> partitions);
}
onPartitionsAssigned: used exclusively when ConsumerCoordinator is requested to onJoinComplete

onPartitionsRevoked: used exclusively when ConsumerCoordinator is requested to onJoinPrepare
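A minimal sketch of a listener follows (the class name and log output are illustrative); a real listener could, for instance, commit offsets in onPartitionsRevoked before the partitions are reassigned:

import java.util.{Collection => JCollection}
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener
import org.apache.kafka.common.TopicPartition

class LoggingRebalanceListener extends ConsumerRebalanceListener {
  override def onPartitionsAssigned(partitions: JCollection[TopicPartition]): Unit =
    println(s"Assigned: $partitions")

  override def onPartitionsRevoked(partitions: JCollection[TopicPartition]): Unit =
    println(s"Revoked: $partitions")
}

// Registered when subscribing, e.g.
// consumer.subscribe(java.util.Arrays.asList("my-topic"), new LoggingRebalanceListener)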
SubscriptionState
SubscriptionState allows for tracking the topics, partitions, and offsets assigned to a Kafka
consumer.
subscribe Method
subscribe …FIXME
Broker Nodes — Kafka Servers
Note Since topics are always partitioned across the brokers in a cluster, a single broker hosts topic partitions of one or more topics (even when a topic is only partitioned to just a single partition).
A broker’s prime responsibility is to bring sellers and buyers together; a broker is thus a third-party facilitator between a buyer and a seller.
A Kafka broker receives messages from producers and stores them on disk, keyed by unique offsets.
A Kafka broker allows consumers to fetch messages by topic, partition and offset.
Kafka brokers can create a Kafka cluster by sharing information between each other directly
or indirectly using Zookeeper.
A Kafka cluster has exactly one broker that acts as the Controller.
Start Zookeeper first:

./bin/zookeeper-server-start.sh config/zookeeper.properties

Only when Zookeeper is up and running can you start a Kafka server (which connects to Zookeeper):

./bin/kafka-server-start.sh config/server.properties
kafka-server-start.sh script
kafka-server-start.sh starts a Kafka broker.
$ ./bin/kafka-server-start.sh
USAGE: ./bin/kafka-server-start.sh [-daemon] server.properties [--override property=value]*

Unless already defined, kafka-server-start.sh sets KAFKA_LOG4J_OPTS to use config/log4j.properties:

KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:config/log4j.properties"

Command-line options:

--override property=value: a value that should override the value set for property in the server.properties file.
Broker
Broker represents a Kafka broker, i.e. its id, host, port and communication endpoints.
createBroker Method
createBroker …FIXME
Topics
Topics are virtual groups of one or many partitions across Kafka brokers in a Kafka cluster.
A single Kafka broker stores messages in a partition in an ordered fashion, i.e. appends
them one message after another and creates a log file.
Producers write messages to the tail of these logs that consumers read at their own pace.
Kafka scales topic consumption by distributing partitions among a consumer group, which is
a set of consumers sharing a common group identifier.
Partitions
Topics can be partitioned to improve read/write performance and resiliency. You can lay out a topic (as partitions) across a cluster of machines to allow data streams larger than the capacity of a single machine. Partitions are log files on disk with sequential writes only. Kafka guarantees message ordering within a partition.
The log end offset is the offset of the last message written to a log.
The high watermark offset is the offset of the last message that was successfully copied to
all of the log’s replicas.
Note A consumer can only read up to the high watermark offset (to prevent reading unreplicated messages).
Messages
Messages are the data that brokers store in the partitions of a topic.
Messages are sequentially appended to the end of the partition log file and numbered by unique offsets. They are persisted on disk (aka disk-based persistence) and replicated within the cluster to prevent data loss. Kafka relies on the operating system's in-memory page cache to improve data reads. Messages stay in partitions until deleted, either when their time-to-live (retention period) expires or through log compaction.
Offsets
Offsets are message positions in a topic partition.
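A short sketch (topic name and broker address are illustrative) of how a consumer can inspect and move its position using offsets:

import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import scala.collection.JavaConverters._

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
val partition = new TopicPartition("my-topic", 0)

// assign (rather than subscribe) to control the position manually
consumer.assign(Seq(partition).asJava)
consumer.seekToBeginning(Seq(partition).asJava) // jump to the earliest offset
println(s"position: ${consumer.position(partition)}")
consumer.seek(partition, 42L)                   // jump to an arbitrary offset
consumer.close()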
Kafka Clients
Producers
Consumers
Producers
Producers send (aka push) messages to topics; the messages are appended to the end of partitions. Multiple producers can write to a topic concurrently. Producers can batch messages before they are sent over the wire to a topic, support message compression, and can send messages in synchronous (with acknowledgement) or asynchronous mode.
import java.util.Properties
import org.apache.kafka.common.serialization._
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerRecord

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
val producer = new KafkaProducer[String, String](props, new StringSerializer, new StringSerializer)
val f = producer.send(new ProducerRecord[String, String]("my-topic", "hello"))

scala> f.get
res7: org.apache.kafka.clients.producer.RecordMetadata = my-topic-0@1

producer.close
Kafka Consumers
Multiple concurrent consumers read (aka pull) messages from topics at their own pace, using offsets to track their position. Unlike typical messaging systems, Kafka does not push messages to consumers; consumers pull them from a topic using offsets.
Note Kafka 0.9.0.0 introduced a brand new Consumer API, aka the New Consumer.
When a consumer is created, it requires bootstrap.servers, which is the initial list of brokers to discover the full set of alive brokers in a cluster from.

A consumer has to subscribe to the topics it wants to read messages from, which is called topic subscription.
Consumer Contract
Topic Subscription
Topic Subscription is the process of announcing the topics a consumer wants to read
messages from.
Note The subscribe method is not incremental: you must always include the full list of topics that you want to consume from.

You can change the set of topics a consumer is subscribed to at any time and (given the note above) any topics previously subscribed to will be replaced by the new list after subscribe. See the sketch below.
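A minimal sketch (topic names, group id and broker address are illustrative) of the replace-not-add semantics of subscribe:

import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "subscription-demo")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
val consumer = new KafkaConsumer[String, String](props)

consumer.subscribe(Seq("topic-a", "topic-b").asJava)
// subscribe is not incremental: this call replaces the earlier subscription,
// so the consumer now reads from topic-c only
consumer.subscribe(Seq("topic-c").asJava)
println(consumer.subscription()) // prints [topic-c]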
Caution FIXME
Consumer Groups
A consumer group is a set of Kafka consumers that share a common group identifier (group.id).
Caution FIXME
The new consumer uses a group coordination protocol built into Kafka:
For each group, one of the brokers is selected as the group coordinator. The
coordinator is responsible for managing the state of the group. Its main job is to mediate
partition assignment when new members arrive, old members depart, and when topic
metadata changes. The act of reassigning partitions is known as rebalancing the group.
When a group is first initialized, the consumers typically begin reading from either the
earliest or latest offset in each partition. The messages in each partition log are then
read sequentially. As the consumer makes progress, it commits the offsets of messages
it has successfully processed.
198
Consumers
When a partition gets reassigned to another consumer in the group, the initial position is set to the last committed offset. If a consumer suddenly crashes, the group member taking over the partition begins consumption from the last committed offset (possibly reprocessing messages that the failed consumer had already processed but not yet committed). The sketch below shows this read-process-commit cycle.
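A minimal sketch (topic name, group id and broker address are illustrative) of the read-process-commit cycle with at-least-once semantics:

import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "my-group")          // all members share this group id
props.put("enable.auto.commit", "false")   // commit explicitly after processing
props.put("auto.offset.reset", "earliest") // where to start with no committed offset
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Seq("my-topic").asJava)
try {
  while (true) {
    val records = consumer.poll(1000)
    records.asScala.foreach(r => println(s"${r.partition}/${r.offset}: ${r.value}"))
    consumer.commitSync() // committing after processing gives at-least-once semantics
  }
} finally consumer.close()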
RequestCompletionHandler
RequestCompletionHandler is the contract to attach an action that is executed when a
request is complete, i.e. the corresponding response has been received or there was a
disconnection while handling the request.
RequestCompletionHandler is part of the org.apache.kafka.clients package. Known implementations include TxnRequestHandler and TransactionMarkerRequestCompletionHandler.
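Since the contract is a single onComplete callback, an implementation can be as small as the following sketch (the log output is illustrative):

import org.apache.kafka.clients.{ClientResponse, RequestCompletionHandler}

// Logs whether the request got a response or ended with a disconnection
val loggingHandler = new RequestCompletionHandler {
  override def onComplete(response: ClientResponse): Unit =
    if (response.wasDisconnected) println("request failed: disconnected")
    else println(s"response received: $response")
}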
ClientResponse
ClientResponse is…FIXME
onComplete Method
void onComplete()
Clusters
A Kafka cluster is the central data exchange backbone for an organization.
kafka-consumer-groups.sh
ConsumerGroupCommand
ConsumerGroupCommand is a standalone command-line application that is used for the
following actions:
(only for the old Zookeeper-based consumer API) Deleting consumer group info (--delete option)
main Method

main parses the command-line arguments (as ConsumerGroupCommandOptions) and…FIXME
--list option
Caution FIXME
list Option
main simply requests the groups from a consumer group service (e.g. KafkaConsumerGroupService for the new consumer API) and prints them out to the console.
KafkaConsumerGroupService
KafkaConsumerGroupService is a ConsumerGroupService that ConsumerGroupCommand uses for the new Java consumer API (and hence does not use Zookeeper to store information).
listGroups Method

listGroups(): List[String]
listGroups requests AdminClient for all consumer groups and takes their group ids.
describeGroup Method
Caution FIXME
resetOffsets Method
Caution FIXME
ConsumerGroupCommandOptions
prepareOffsetsToReset Method
FIXME
prepareOffsetsToReset …FIXME
getPartitionsToReset Method
FIXME
getPartitionsToReset …FIXME
collectGroupAssignment Method
FIXME
collectGroupAssignment …FIXME
ConsumerGroupService
ConsumerGroupService is…FIXME
KafkaAdminClient
KafkaAdminClient is an AdminClient that…FIXME
describeTopics Method
describeTopics …FIXME
alterReplicaLogDirs Method
AlterReplicaLogDirsResult alterReplicaLogDirs(
  Map<TopicPartitionReplica, String> replicaAssignment,
  final AlterReplicaLogDirsOptions options)
alterReplicaLogDirs …FIXME
KafkaAdminClient takes the following when created:

AdminClientConfig
client ID
sanitized client ID
Time
Metadata
Metrics
KafkaClient
TimeoutProcessorFactory
LogContext
AdminClient
AdminClient is…FIXME (it uses a ConsumerNetworkClient internally).
Refer to Logging.
create …FIXME
send Method

send(
  target: Node,
  api: ApiKeys,
  request: AbstractRequest.Builder[_ <: AbstractRequest]): AbstractResponse
send waits until the future result is available, after which the request is removed from the pendingFutures registry. When the future result has completed, send takes the response body for a successful result or reports a RuntimeException.
findAllBrokers Method
findAllBrokers(): List[Node]
findAllBrokers creates a Metadata API request and sends it to one of the bootstrap
brokers.
findAllBrokers returns the nodes from the cluster metadata of the MetadataResponse .
findCoordinator Method
FIXME
findCoordinator …FIXME
deleteRecordsBefore Method
FIXME
deleteRecordsBefore …FIXME
listGroups Method
FIXME
listGroups …FIXME
listAllBrokerVersionInfo Method
FIXME
listAllBrokerVersionInfo …FIXME
awaitBrokers Method
FIXME
awaitBrokers …FIXME
listAllConsumerGroups Method
FIXME
listAllConsumerGroups …FIXME
listAllGroups finds all brokers (in a cluster) and collects their groups.
listAllGroupsFlattened Method
listAllGroupsFlattened(): List[GroupOverview]
listAllConsumerGroupsFlattened Method

listAllConsumerGroupsFlattened(): List[GroupOverview]
listGroupOffsets Method
FIXME
listGroupOffsets …FIXME
ReassignPartitionsCommand
ReassignPartitionsCommand is…FIXME
main …FIXME
executeAssignment Method
executeAssignment(
zkUtils: ZkUtils,
adminClientOpt: Option[AdminClient],
opts: ReassignPartitionsCommand.ReassignPartitionsCommandOptions): Unit
executeAssignment(
zkUtils: ZkUtils,
adminClientOpt: Option[AdminClient],
reassignmentJsonString: String,
throttle: ReassignPartitionsCommand.Throttle,
timeoutMs: Long = 10000L): Unit
executeAssignment …FIXME
reassignPartitions Method
reassignPartitions …FIXME
alterReplicaLogDirsIgnoreReplicaNotAvailable Internal Method
alterReplicaLogDirsIgnoreReplicaNotAvailable(
replicaAssignment: Map[TopicPartitionReplica, String],
adminClient: JAdminClient,
timeoutMs: Long): Set[TopicPartitionReplica]
alterReplicaLogDirsIgnoreReplicaNotAvailable …FIXME
TopicCommand
kafka.admin.TopicCommand
Sensor
Sensor is…FIXME
MetricsReporter
JmxReporter
JmxReporter is a metrics reporter that is always included (in addition to any reporters in the metric.reporters setting), with a component-specific JMX prefix (e.g. kafka.consumer for Kafka consumers).
ProducerMetrics
ProducerMetrics is…FIXME
SenderMetrics
SenderMetrics is…FIXME
Kafka Tools
ConsoleProducer
kafka.tools.ConsoleProducer
ConsoleConsumer
kafka.tools.ConsoleConsumer
kafka-configs.sh

The following example alters the retention.ms configuration of the test topic:
./bin/kafka-configs.sh \
--zookeeper localhost:2181 \
--alter \
--entity-type topics \
--entity-name test \
--add-config retention.ms=5000
kafka-topics.sh
Kafka Properties

Table 1. Properties (name, default value, importance, description; the descriptions were partly lost in extraction and are reconstructed where clear)

auto.commit.interval.ms: How often (in milliseconds) consumer offsets are auto-committed (when enable.auto.commit is enabled)

authorizer.class.name

bootstrap.servers: (default: empty; required: yes) A comma-separated list of host:port pairs used for the initial connection to the Kafka cluster, e.g. localhost:9092

broker.rack

client.id: (default: random-generated)

heartbeat.interval.ms: The expected time between heartbeats to the consumer coordinator when using Kafka's group management facilities

inter.broker.protocol.version

interceptor.classes: (default: empty) Comma-separated list of interceptor classes, set programmatically with props.put(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG, …)

max.block.ms

max.poll.records: Use ConsumerConfig.MAX_POLL_RECORDS_CONFIG to refer to the property. Internally, max.poll.records …FIXME

metadata.max.age.ms

receive.buffer.bytes: The hint about the size of the TCP receive buffer to use when reading data. If the value is -1, the OS default is used

replica.lag.time.max.ms

replica.socket.timeout.ms

retry.backoff.ms: Use ConsumerConfig.RETRY_BACKOFF_MS_CONFIG to refer to the property

sasl.enabled.mechanisms

send.buffer.bytes: The hint about the size of the TCP send buffer to use when sending data. If the value is -1, the OS default is used

session.timeout.ms: In Kafka Connect, use DistributedConfig.SESSION_TIMEOUT_MS_CONFIG (import org.apache.kafka.connect.runtime.distributed.DistributedConfig) to refer to the property
bootstrap.servers Property
bootstrap.servers is a comma-separated list of host and port pairs that are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster that a Kafka client connects to initially to bootstrap itself, e.g.:

localhost:9092
localhost:9092,another.host:9092

bootstrap.servers provides the initial hosts that act as the starting point for a Kafka client to discover the full set of alive servers in the cluster.

Note Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list does not have to contain the full set of servers (you may want more than one, though, in case a server is down).
Tip Use the public org.apache.kafka.clients.CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG value to refer to the property.
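A minimal sketch (broker addresses are illustrative) of setting the property through the constant:

import java.util.Properties
import org.apache.kafka.clients.CommonClientConfigs
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
// equivalent to props.put("bootstrap.servers", "localhost:9092,another.host:9092")
props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,another.host:9092")
props.put("group.id", "bootstrap-demo")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
val consumer = new KafkaConsumer[String, String](props)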
client.id Property
An optional identifier of a Kafka consumer (in a consumer group) that is passed to a Kafka
broker with every request.
Its sole purpose is to track the source of requests beyond just IP and port, by allowing a logical application name to be included in Kafka logs and monitoring aggregates.
enable.auto.commit Property
enable.auto.commit …FIXME
By default, as the consumer reads messages from Kafka, it periodically commits its current offset (defined as the offset of the next message to be read) for the partitions it is reading from back to Kafka. Often you would like more control over exactly when offsets are committed. In that case you can set enable.auto.commit to false and call one of the commit methods (commitSync or commitAsync) on the consumer yourself, as in the sketch below.
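A minimal sketch (names are illustrative) of disabling auto-commit and committing manually, using the ConsumerConfig constants:

import java.util.Properties
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import scala.collection.JavaConverters._

val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG, "manual-commit-demo")
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false") // take control of commits
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringDeserializer")
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Seq("my-topic").asJava)
val records = consumer.poll(1000)
records.asScala.foreach(r => println(r.value)) // "process" the records
consumer.commitSync() // commit only after the records have been processed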
group.id Property
group.id specifies the name of the consumer group a Kafka consumer belongs to.
When the Kafka consumer is constructed and group.id does not exist yet (i.e. there are no
existing consumers that are part of the group), the consumer group will be created
automatically.
retry.backoff.ms Property
retry.backoff.ms is the time to wait before attempting to retry a failed request to a given
topic partition.
This avoids repeatedly sending requests in a tight loop under some failure scenarios.
Logging
Kafka uses Apache Log4j for logging. A log4j.properties like the following enables DEBUG logging for a given logger (here org.apache.kafka.streams):
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.logger.org.apache.kafka.streams=DEBUG, stdout
log4j.additivity.org.apache.kafka.streams=false
build.sbt
libraryDependencies += "org.slf4j" % "slf4j-simple" % "1.8.0-alpha2"
Tip Replace slf4j’s simple binding to switch between logging frameworks (e.g. slf4j-log4j12 for log4j).
build.sbt
val logback = "1.2.3"
libraryDependencies += "ch.qos.logback" % "logback-core" % logback
libraryDependencies += "ch.qos.logback" % "logback-classic" % logback
With logback’s configuration (as described in the above tip) you may see a message about multiple SLF4J bindings on the classpath. Excluding one of the bindings (here logback-classic, commented out) makes it go away:
build.sbt
val logback = "1.2.3"
libraryDependencies += "ch.qos.logback" % "logback-core" % logback
//libraryDependencies += "ch.qos.logback" % "logback-classic" % logback
Gradle Tips
Building Kafka Distribution
Building a Kafka distribution takes around 2 minutes (after all the dependencies have been downloaded once).
Zookeeper Tips
The Zookeeper shell shipped with Kafka works with no support for command-line history because the jline jar is missing (see KAFKA-2385).
A solution is to use the official distribution of Apache Zookeeper (3.4.10 as of this writing)
from Apache ZooKeeper Releases.
Once downloaded, use ./bin/zkCli.sh to connect to the Zookeeper instance used by Kafka.
[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, controller_epoch, controller, brokers, zookeeper, admin, isr_change_notifica
tion, consumers, log_dir_event_notification, latest_producer_id_block, config]
Kafka in Scala REPL for Interactive Exploration
Note The reason for executing the console command after sbt has started up is that command history did not work using the up and down arrow keys. YMMV.
build.sbt

name := "kafka-sandbox"
version := "1.0"
scalaVersion := "2.12.3"

// the Kafka clients library to explore in the REPL (assumed; use the version you need)
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "1.0.0"
➜ kafka-sandbox sbt
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /Users/jacek/dev/sandbox/kafka-sandbox/project
[info] Loading settings from build.sbt ...
[info] Set current project to kafka-sandbox (in build file:/Users/jacek/dev/sandbox/ka
fka-sandbox/)
[info] sbt server started at 127.0.0.1:4408
sbt:kafka-sandbox> console
[info] Starting scala interpreter...
Welcome to Scala 2.12.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144).
Type in expressions for evaluation. Or try :help.
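Once the interpreter is up, you can explore the Kafka clients API interactively, e.g. create a consumer and list the topics (a sketch; the broker address is illustrative and a broker must be running):

scala> import org.apache.kafka.clients.consumer.KafkaConsumer
scala> val props = new java.util.Properties
scala> props.put("bootstrap.servers", "localhost:9092")
scala> props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
scala> props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
scala> val consumer = new KafkaConsumer[String, String](props)
scala> consumer.listTopics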
Running Kafka Broker in Docker
You can use Docker Compose to run such an installation where all the components are
dockerized (i.e. run as Docker containers).
There are two projects with the Docker images for the components that seem to have been
trusted the most:
1. wurstmeister/kafka
2. spotify/kafka
Note ches/docker-kafka is another Docker image (that I have not tried myself yet).
wurstmeister/kafka gives separate images for Apache Zookeeper and Apache Kafka, together with a docker-compose.yml configuration for Docker Compose, which makes it a very good starting point and allows for further customizations.
Let’s start a very basic one-broker Kafka cluster using Docker and wurstmeister/kafka
project.
// Clone the wurstmeister/kafka-docker project first
$ git clone https://github.com/wurstmeister/kafka-docker.git
$ cd kafka-docker

// Edit `docker-compose.yml`
// 1. Change the docker host IP in `KAFKA_ADVERTISED_HOST_NAME`
//    e.g. KAFKA_ADVERTISED_HOST_NAME: docker.for.mac.localhost on macOS
//    See https://docs.docker.com/docker-for-mac/networking/
// 2. Expose port `9092` of the `kafka` service to the host
//    i.e. change it to `9092:9092`

// Start the one-broker cluster (Zookeeper and a single Kafka broker)
$ docker-compose up -d

// Check the connection from the host to the single Kafka broker
$ nc -vz localhost 9092
found 0 associations
found 1 connections:
1: flags=82<CONNECTED,PREFERRED>
outif lo0
src ::1 port 60643
dst ::1 port 9092
rank info not available
TCP aux info available
WorkerGroupMember
Caution FIXME WorkerCoordinator? DistributedHerder?
ConnectDistributed
ConnectDistributed is a command-line utility that runs Kafka Connect in distributed mode.
Caution FIXME Doh, I’d rather not enter Kafka Connect yet. Not interested in it yet.
Further reading or watching
Articles
1. Apache Kafka for Beginners - an excellent article that you should start your Kafka
journey with.