Lab-Kafka Administration VI
1. Prerequisites: ...................................................................................................................... 3
2. Installation of Kafka – 90 Mins ....................................................................................... 8
3. Installation Confluent Kafka (Local) – 30 Minutes ....................................................... 16
4. Basic Kafka Operations - CLI (Topic) – 30 Mins ........................................................... 17
5. Basic Kafka Operations - CLI (Producer & Consumer) – 30 Mins ............................... 24
6. Zookeeper – 120 Minutes ............................................................................................... 27
7. Kafka cluster – 90 Minutes ............................................................................................ 46
8. Securing Kafka SSL – 90 Minutes .................................................................................. 71
9. Securing Kafka ACL – 60 Minutes ................................................................................. 99
10. Mirroring data between clusters – MirrorMaker V 1 – 90 Minutes ............................ 122
11. Mirroring data between clusters – MirrorMaker V2 – 90 Minutes ............................. 135
12. Kafka Connector ( File & JDBC) - 150 Minutes ............................................................ 145
13. Schema Registry - Manage Schemas for Topics – 30 Minutes .................................... 174
14. Performance & Tuning – 60 Minutes (ND) .................................................................. 179
15. Errors ............................................................................................................................. 192
I. {test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient) ....... 192
16. LOG verification + segment Sizing ...............................................................................194
17. Annexure Code: ............................................................................................................. 195
II. DumplogSegment ....................................................................................................... 195
2 Kafka – Administration
Hardware:
8 GB RAM, 30 GB HDD, CentOS 7 or above. Access to the internet.
Software Inventory:
• Zookeeper Version: apache-zookeeper-3.8.0-bin.tar
• Apache Kafka: 2.13-3.2.1
• JDK 11.0.16
• Eclipse for Linux. (Any Latest version for JEE Development)
1. Prerequisites:
Option I
Start the VM using VM Player and log on to the server using telnet or directly in the VM
console. Enter the root credentials to log on.
Option II.
Using docker:
Instantiate a container, kafka0.
You can copy files from the host to the container using the docker cp command.
Optional:
#docker run --name kafka0 --hostname kafka0 -p 9092:9092 -p 9081:8081 -p 9082:8082
-p 3181:2181 -p 9990:9999 -p 9021:9021 -i -t --privileged --network spark-net -v
/Volumes/Samsung_T5/software/:/Software -v
/Volumes/Samsung_T5/software/install/:/opt -v
/Volumes/Samsung_T5/software/data/:/data centos:7 /usr/sbin/init
// Don't change the listener address if the client also runs on the host machine.
advertised.listeners=PLAINTEXT://localhost:9092
Install Kafka as done before if required, or copy the Kafka folder under a different name to
preserve the libraries.
Kafka Mirror Maker: (It will execute the Kafka Mirroring process)
#docker run --name kafkam --hostname kafkam -p 6092:9092 -p 6081:8081 -i -t --privileged --network spark-net -v /Users/henrypotsangbam/Documents/Docker:/opt centos:7 /usr/sbin/init
Note:
If you are using Docker, ensure that you update server.properties with the following entries
for accessing the broker from the host machine.
// Changes Begin
listeners=PLAINTEXT://0.0.0.0:9092,PLAINTEXT_HOST://0.0.0.0:8081
advertised.listeners=PLAINTEXT://kafka0:9092,PLAINTEXT_HOST://localhost:8081
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
// Changes End
Use the container name kafka0 to connect from within Docker; however, use localhost:8081
for connecting from the host machine.
kafka_2.13-3.2.1.tgz
jdk-11.0.16_linux-aarch64_bin.tar
apache-zookeeper-3.8.0-bin.tar.gz
Installation of Java
#tar -xvf jdk-11.0.16_linux* -C /opt
#cd /opt
#mv jdk* jdk
#vi ~/.bashrc
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin
Installing Zookeeper
Option I (Fresh Installation): (We will use this for our lab)
Extract the zookeeper archive file in /opt and rename the installation folder for brevity.
Create a zookeeper configuration file and update with the following values.
#vi /opt/zookeeper/conf/zoo.cfg
tickTime=2000
dataDir=/opt/data/zookeeper
clientPort=2181
Update zoo.cfg with the above entries. You can save the file with Esc followed by :wq!
# /opt/zookeeper/bin/zkServer.sh start
#bin/zookeeper-server-start.sh config/zookeeper.properties
You can now validate that Zookeeper is running correctly in standalone mode by connecting
to the client port and sending the four-letter command srvr:
#yum install telnet
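With telnet installed, the srvr check can be performed as follows. This is a sketch; note that recent ZooKeeper versions (3.5+) only answer four-letter words that are whitelisted, so you may need to add 4lw.commands.whitelist=srvr to zoo.cfg first (an assumption about your configuration).

```shell
# Connect to the ZooKeeper client port.
telnet localhost 2181
# Once connected, type the four-letter command:
# srvr
# ZooKeeper replies with its version, latency stats, and mode (standalone),
# then closes the connection.
```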
If you are using Docker, kindly refer to the Prerequisites section for Docker-specific settings.
#mkdir /opt/scripts
All the common execution scripts will be stored in the above folder.
The following script will start ZooKeeper along with a broker. Create the following file and
add the commands shown; running it will start the ZooKeeper and Kafka broker.
#cd /opt/scripts
#vi startABroker.sh
############################# Scripts Begin ####################
#!/usr/bin/env bash
# Start Zookeeper.
/opt/zookeeper/bin/zkServer.sh start
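As shown, the script only starts ZooKeeper; the broker start is expected to follow. A minimal sketch of the complete startABroker.sh, assuming Kafka is installed in /opt/kafka and using the -daemon flag of kafka-server-start.sh:

```shell
#!/usr/bin/env bash
# Start Zookeeper.
/opt/zookeeper/bin/zkServer.sh start
# Give ZooKeeper a few seconds to come up before starting the broker.
sleep 5
# Start a single Kafka broker in the background.
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
```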
To stop ZooKeeper, create the following script. Don't include the ---- lines.
#vi /opt/scripts/stopZookeeper.sh
Update the following commands in the above script and save it.
--------------------------------------------------------------------
#!/usr/bin/env bash
# Stop Zookeeper.
/opt/zookeeper/bin/zkServer.sh stop
echo "Stop zookeeper Successfully"
#!/usr/bin/env bash
rm -fr /opt/kafka/data/kafka-logs/*
rm -fr /opt/kafka/data/zookeeper/*
cp /opt/kafka/config/server.properties_plain /opt/kafka/config/server.properties
--------------------------------------------------------------------
The installation lab ends here.
The purpose of this lab is to demonstrate the basic and most powerful capabilities of Confluent Platform
– Schema Registry.
export CONFLUENT_HOME=/opt/confluent
Set your PATH variable:
# vi ~/.bashrc
export PATH=/opt/confluent/bin:${PATH};
In this lab you will create a topic and perform some operations to understand topic details
such as partitions and replication.
You need to start the broker using startABroker.sh. The script should be in the /opt/scripts
folder.
#sh startABroker.sh
#jps
Once the Kafka broker is started, we can verify that it is working by performing some
simple operations against it, such as creating a test topic.
List only single topic named "test" (prints only topic name)
Describe only single topic named "test" (prints details about the topic)
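These two operations can be done with kafka-topics.sh; a sketch, assuming the single-broker setup of this lab:

```shell
# List topics; filter the output for the "test" topic only.
/opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list | grep '^test$'
# Describe the "test" topic: partitions, leader, replicas, ISR.
/opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic test
```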
Create Topics
What does the tool do?
By default, Kafka auto-creates a topic if "auto.create.topics.enable" is set to true on the server.
This creates a topic with a default number of partitions and replication factor, and uses Kafka's
default scheme for replica assignment. Sometimes we would like to customize a topic while
creating it. This tool lets you create a topic and also specify the number of partitions, the
replication factor, and the replica assignment list for the topic.
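A typical invocation might look like the following sketch (the topic name and counts are illustrative; with a single broker the replication factor must be 1):

```shell
# Create a topic with an explicit partition count and replication factor.
/opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic test --partitions 3 --replication-factor 1
```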
As shown above, this generates an error because there is only a single broker. It will be fixed later.
In this lab we will send messages to the broker and consume messages using Kafka's
built-in commands.
You need to complete the previous lab before proceeding ahead.
You need to start the broker using startABroker.sh if not done earlier. The script should be
in the /opt/scripts folder.
#sh startABroker.sh
#jps
Send messages to the test topic: open a console to send messages to the topic test, and
enter some text as shown below.
Consume messages from the test topic: as soon as you run the following command in a
separate terminal, you should be able to consume the messages that were typed in the
producer console.
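The two consoles can be started with the built-in tools, roughly as follows (the broker address assumes the single-broker lab setup):

```shell
# Terminal 1: producer - type messages, one per line.
/opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

# Terminal 2: consumer - prints the messages typed in the producer.
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic test --from-beginning
```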
In this lab, we will perform familiarization of ZK cli features and configure zookeeper
ensemble:
To perform ZooKeeper CLI operations, first start your ZooKeeper server and then the
ZooKeeper client by executing "bin/zkCli.sh", which is in the ZooKeeper installation folder.
/opt/zookeeper/bin
#bin/zkCli.sh
or New CLI
As highlighted above, the CLI is connected to the ZooKeeper instance running on localhost
and listening on port 2181.
#ls /
#ls -R /
#ls -R /brokers
#get /brokers/ids/0
Here, node 0 is owned by the broker at IP 192.168.0.111, port 9092.
This way you can determine the node details whenever there are issues, as shown
below:
Using the Kafka tool zookeeper-shell.sh (Broker Installation folder) we can connect to a
ZooKeeper host in our cluster and look at how data is stored.
#zookeeper-shell.sh localhost:2181
#ls /brokers/topics
If we look at the path /brokers/topics you should see a list of the topics that you have
created.
You should also be able to see the topic __consumer_offsets. That is a topic you did not
create; it is in fact a private topic used internally by Kafka itself. It stores the committed
offsets for each topic and partition, per group id.
The path /controller also exists in ZooKeeper; run the following command to look at its
current value:
zookeeper-shell.sh localhost:2181
get /controller
dataDir=/opt/data/zookeeper/1
clientPort=2181
initLimit=5
syncLimit=2
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
dataDir=/opt/data/zookeeper/2
clientPort=2182
initLimit=5
syncLimit=2
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
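The third node's configuration follows the same pattern; a sketch of zoo3.cfg (tickTime added here for completeness, matching the standalone config earlier):

```
tickTime=2000
dataDir=/opt/data/zookeeper/3
clientPort=2183
initLimit=5
syncLimit=2
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
```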
To complete your multi-node configuration, you will specify a node ID on each of the
servers. To do this, you will create a myid file on each node. Each file will contain a number
that correlates to the server number assigned in the configuration file.
#cd /opt/data/zookeeper/
#mkdir 1 2 3
#echo 1 > 1/myid
#echo 2 > 2/myid
#echo 3 > 3/myid
To start the servers, simply reference the configuration files explicitly:
cd <ZOOKEEPER_HOME>
bin/zkServer.sh start conf/zoo1.cfg
bin/zkServer.sh start conf/zoo2.cfg
bin/zkServer.sh start conf/zoo3.cfg
You will now start a ZooKeeper command line client and connect to ZooKeeper on node 1:
#ls /
#cd /opt/zookeeper
#bin/zkServer.sh status conf/zoo1.cfg
# bin/zkServer.sh status conf/zoo2.cfg
# bin/zkServer.sh status conf/zoo3.cfg
log.dirs=/opt/data/kafka-logs1
You can verify from the server.log that the broker is referring to the new property as shown
below: /opt/kafka/logs/server.log
Let us identify the ZooKeeper leader and kill the leader process.
http://localhost:8080/commands/leader
or using the CLI:
server.1=localhost:2888:3888 (2181)
server.2=localhost:2889:3889 (2182) - First Leader
server.3=localhost:2890:3890 (2183) - Second Leader
We are going to create a replicated topic and then demonstrate consumer along with broker
failover. We also demonstrate load balancing of Kafka consumers.
We show how, with many groups, Kafka acts like a publish/subscribe system. But when we
put all of our consumers in the same group, Kafka will load-share the messages among the
consumers in that group (more like a queue than a topic in the traditional MOM sense).
Next, you need to copy server properties for three brokers (detailed instructions to follow).
Then we will modify these Kafka server properties to add unique Kafka ports, Kafka log
locations, and unique Broker ids. Then we will create three scripts to start these servers up
using these properties, and then start the servers.
Lastly, we create replicated topic and use it to demonstrate Kafka consumer failover, and
Kafka broker failover.
Copy the server properties files as follows. We will store all the servers' configurations in a
single config folder.
$ cd /opt
$ mkdir -p kafka-config/config
$ cp kafka/config/server.properties kafka-config/config/server-0.properties
$ cp kafka/config/server.properties kafka-config/config/server-1.properties
$ cp kafka/config/server.properties kafka-config/config/server-2.properties
With your favourite text editor, update server-0.properties as shown below. Leave the rest
of the file the same, and make sure log.dirs is defined only once.
#vi /opt/kafka-config/config/server-0.properties
broker.id=0
listeners=PLAINTEXT://kafka0:9092
advertised.listeners=PLAINTEXT://kafka0:9092
log.dirs=/opt/data/kafka-logs/kafka-0
With your favorite text editor, change broker.id, the listeners, and log.dirs of server-1.properties as follows.
#vi /opt/kafka-config/config/server-1.properties
broker.id=1
listeners=PLAINTEXT://kafka0:9093
advertised.listeners=PLAINTEXT://kafka0:9093
log.dirs=/opt/data/kafka-logs/kafka-1
With your favorite text editor, change broker.id, the listeners, and log.dirs of server-2.properties as follows.
#vi /opt/kafka-config/config/server-2.properties
broker.id=2
listeners=PLAINTEXT://kafka0:9094
advertised.listeners=PLAINTEXT://kafka0:9094
log.dirs=/opt/data/kafka-logs/kafka-2
The startup scripts will just run kafka-server-start.sh with the corresponding properties file.
#vi /opt/kafka-config/start-1st-server.sh
#!/usr/bin/env bash
## Run Kafka
/opt/kafka/bin/kafka-server-start.sh "/opt/kafka-config/config/server-0.properties"
#vi /opt/kafka-config/start-2nd-server.sh
#!/usr/bin/env bash
## Run Kafka
/opt/kafka/bin/kafka-server-start.sh "/opt/kafka-config/config/server-1.properties"
#vi /opt/kafka-config/start-3rd-server.sh
#!/usr/bin/env bash
## Run Kafka
/opt/kafka/bin/kafka-server-start.sh "/opt/kafka-config/config/server-2.properties"
Notice we are passing the Kafka server properties files that we created in the last step.
Now run all three in separate terminals/shells.
$ ./start-1st-server.sh
$ ./start-2nd-server.sh
$ ./start-3rd-server.sh
Now we will create a replicated topic that the console producers and console consumers can
use.
Notice that the replication factor gets set to 3, and the topic name is my-failsafe-topic, and
like before it has 13 partitions.
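The create command for such a topic might look like the following sketch (broker addresses assume the three-broker setup above):

```shell
/opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka0:9092,kafka0:9093 \
  --create --topic my-failsafe-topic --partitions 13 --replication-factor 3
```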
Notice that a list of Kafka servers is passed to the --bootstrap-server parameter. Only two of
the three servers that we ran earlier get passed. Even though only one broker is needed, the
consumer client will learn about the other brokers from just one server. Usually, you list
multiple brokers so that the client can still connect during an outage.
Next, we create a script that starts the producer. Then launch the producer with the script
you create.
Notice we start Kafka producer and pass it a list of Kafka Brokers to use via the parameter --
broker-list.
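The producer start script might be sketched as follows (the file name start-producer.sh is illustrative):

```shell
#!/usr/bin/env bash
# Console producer against two of the three brokers.
/opt/kafka/bin/kafka-console-producer.sh \
  --broker-list kafka0:9092,kafka0:9093 --topic my-failsafe-topic
```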
Now use the start producer script to launch the producer as follows.
Producer Console.
Consumer Console
Now start two more consumers in their own terminal windows and send more messages
from the producer. (Replace localhost with the hostname of your server.)
Consumer 1.
Consumer 2.
Consumer Console 1: you should be able to view whatever messages you type on the
producer console after the new consumer console was started.
Consumer Console 2 (in a new terminal): similarly, all messages typed on the producer
console should also be displayed in the second console after it was started.
Notice that the messages are sent to all of the consumers because each consumer is in a
different consumer group.
Stop the producers and consumers from before, but leave Kafka and ZooKeeper running. You
can use Ctrl+C.
We want to put all of the consumers in same consumer group. This way the consumers will
share the messages as each consumer in the consumer group will get its share of partitions.
Notice that the script is the same as before except we added --consumer-property
group.id=mygroup which will put every consumer that runs with this script into
the mygroup consumer group.
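The consumer invocation with the group id might be sketched as:

```shell
/opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server kafka0:9092,kafka0:9093 --topic my-failsafe-topic \
  --consumer-property group.id=mygroup
```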
Now we just run the producer and three consumers.
Notice that the messages are spread evenly among the consumers.
Notice that each consumer in the group got a share of the messages.
Next, let’s demonstrate consumer failover by killing one of the consumers and sending
seven more messages. Kafka should divide up the work to the consumers that are running.
First, kill the third consumer (CTRL-C in the consumer terminal does the trick).
Notice that the messages are spread evenly among the remaining consumers.
We killed one consumer, sent seven more messages, and saw Kafka spread the load to
remaining consumers. Kafka consumer failover works!
Run describe-topics.
We are going to list which broker owns (is the leader of) which partition, and list the replicas
and ISRs of each partition. ISRs are replicas that are up to date. Remember there are 13 partitions.
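The describe call might look like this sketch:

```shell
/opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka0:9092,kafka0:9093 \
  --describe --topic my-failsafe-topic
```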
$ kill `ps aux | grep java | grep server-0.properties | tr -s " " | cut -d " " -f2`
You can stop the first broker by hitting CTRL-C in the broker terminal or by running the
above command.
Now that the first Kafka broker has stopped, let’s use Kafka topics describe to see that new
leaders were elected!
Notice how Kafka spreads the leadership over the 2nd and 3rd Kafka brokers.
Create the consolidated script as mentioned below. It will help you start ZooKeeper and Kafka
using a single script.
Start 3 Brokers.
#vi /opt/scripts/start3Brokers.sh
#!/usr/bin/env bash
# Start Zookeeper.
/opt/zookeeper/bin/zkServer.sh start
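The script as shown only starts ZooKeeper; a sketch of the full start3Brokers.sh, assuming the properties files created earlier and the -daemon flag of kafka-server-start.sh:

```shell
#!/usr/bin/env bash
# Start Zookeeper.
/opt/zookeeper/bin/zkServer.sh start
# Give ZooKeeper a few seconds to come up.
sleep 5
# Start all three brokers in the background.
for i in 0 1 2; do
  /opt/kafka/bin/kafka-server-start.sh -daemon \
    "/opt/kafka-config/config/server-$i.properties"
done
```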
You need to shut down all the brokers before going ahead with this lab.
[hints: jps and kill -9 all kafka/zookeeper processes]
Before going ahead with the SSL configuration, let us verify whether TLS is already configured.
#openssl s_client -debug -connect kafka0:9092 -tls1
Change the alias (i.e., the kafka0 parameter) to your hostname.
#cd /opt/scripts/cer
#keytool -keystore server.keystore.jks -alias kafka0 -validity 365 -genkey -keyalg RSA
This command will prompt you for a password. After entering and confirming the
password, the next prompt is for the first and last name. This is actually the common name.
Enter kafka0 for our test cluster. (Alternatively, if you are accessing the cluster by a different
hostname, enter that name instead.) Leave the other fields blank. At the final prompt, hit y
to confirm.
The result will be a server.keystore.jks file deposited in the current directory.
The following command can be run afterwards to verify the contents of the generated
certificate:
#keytool -list -v -keystore server.keystore.jks
Output:
[root@kafka0 cer]# keytool -list -v -keystore server.keystore.jks
Enter keystore password:
Keystore type: PKCS12
Keystore provider: SUN
SHA1: D7:0E:14:58:EB:1D:21:19:C8:CA:C4:B8:35:3A:FF:E2:7B:E2:17:A8
SHA256:
38:89:82:DF:15:7D:11:7F:80:18:AD:85:1D:C5:10:5D:34:8D:D2:74:4B:86:0F:E6:8D:88:72:2E:13:
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 3
Extensions:
*******************************************
*******************************************
After the first step, each machine in the cluster has a public-private key pair, and a
certificate to identify the machine. The certificate, however, is unsigned, which means that
an attacker can create such a certificate to pretend to be any machine.
Therefore, it is important to prevent forged certificates by signing them for each machine in
the cluster. A certificate authority (CA) is responsible for signing certificates. A CA works like
a government that issues passports—the government stamps (signs) each passport so that
the passport becomes difficult to forge. Other governments verify the stamps to ensure the
passport is authentic. Similarly, the CA signs the certificates, and the cryptography
guarantees that a signed certificate is computationally difficult to forge. Thus, as long as the
CA is a genuine and trusted authority, the clients have high assurance that they are
connecting to the authentic machines.
#openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
The generated CA is simply a public-private key pair and certificate, and it is intended to
sign other certificates. (Enter Phrase: EnjoyKafka)
You are required to provide a password, which may differ from the password used in the
previous step. Leave all fields empty with the exception of the Common Name, which
should be set to kafka0.
Output:
[root@kafka0 cer]# openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
Generating a RSA private key
....................+++++
......................................................................................................+++++
writing new private key to 'ca-key'
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:
State or Province Name (full name) []:
Locality Name (eg, city) [Default City]:
Organization Name (eg, company) [Default Company Ltd]:
Organizational Unit Name (eg, section) []:
[root@kafka0 cer]# keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert
Enter keystore password:
Re-enter new password:
Owner: CN=kafka0, O=Default Company Ltd, L=Default City, C=XX
Extensions:
Note: If you configure the Kafka brokers to require client authentication by setting
ssl.client.auth to be "requested" or "required" on the Kafka brokers config then you
must provide a truststore for the Kafka brokers as well and it should have all the CA
certificates that clients' keys were signed by.
Password: kafka213
Output:
[root@kafka0 cer]# keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert
Enter keystore password:
Re-enter new password:
Owner: CN=kafka0, O=Default Company Ltd, L=Default City, C=XX
Issuer: CN=kafka0, O=Default Company Ltd, L=Default City, C=XX
Serial number: 24cd3f803451fc94d6a851be5ca7a52ed4e7204
Valid from: Mon Oct 24 11:07:46 IST 2022 until: Tue Oct 24 11:07:46 IST 2023
Certificate fingerprints:
SHA1: 43:AC:2A:34:BF:60:9C:9D:99:1D:C9:56:76:7B:81:56:71:04:66:E0
SHA256:
54:A6:4B:AF:AA:E4:58:D1:5C:12:C3:07:1E:72:97:E8:AC:D1:EF:ED:21:A8:F2:FA:1A:77:8A:
64:DB:94:2E:72
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 3
Extensions:
AuthorityKeyIdentifier [
KeyIdentifier [
0000: CD CC 71 10 CF 2A 80 A2 67 96 A0 E1 AF 1A 9A 87 ..q..*..g.......
0010: 67 D1 62 DD g.b.
]
]
In contrast to the keystore in step 1 that stores each machine's own identity, the
truststore of a client stores all the certificates that the client should trust. Importing a
certificate into one's truststore also means trusting all certificates that are signed by
that certificate. As in the analogy above, trusting the government (CA) also means
trusting all passports (certificates) that it has issued. This attribute is called the chain
of trust, and it is particularly useful when deploying SSL on a large Kafka cluster. You
can sign all certificates in the cluster with a single CA, and have all machines share the
same truststore that trusts the CA. That way all machines can authenticate all other
machines.
The next step is to sign all certificates generated in step 1 with the CA generated in
step 2. First, you need to export the certificate from the keystore. Replace kafka0 with
the hostname you entered before.
Password: kafka213
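The export command itself is not shown above; it would typically be a keytool certificate request, sketched as follows (the output file name cert-file matches the -in argument of the signing command in this lab):

```shell
# Export a certificate signing request from the server keystore.
keytool -keystore server.keystore.jks -alias kafka0 -certreq -file cert-file
```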
This produces cert-file, the certificate signing request. Then sign it with the CA:
#openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days 365 -CAcreateserial
#
Password: EnjoyKafka
Output:
[root@kafka0 cer]# openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days 365 -CAcreateserial
Signature ok
subject=C = Unknown, ST = Unknown, L = Unknown, O = Unknown, OU = Unknown, CN
= kafka0
Getting CA Private Key
Enter pass phrase for ca-key:
Finally, you need to import both the CA certificate and the signed certificate into the server's
keystore. The CA certificate must be imported under the CARoot alias. Accept yes for all
Y/N queries.
Output:
[root@kafka0 cer]# keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert
Enter keystore password:
Owner: CN=kafka0, O=Default Company Ltd, L=Default City, C=XX
Issuer: CN=kafka0, O=Default Company Ltd, L=Default City, C=XX
Serial number: 24cd3f803451fc94d6a851be5ca7a52ed4e7204
Valid from: Mon Oct 24 11:07:46 IST 2022 until: Tue Oct 24 11:07:46 IST 2023
Certificate fingerprints:
SHA1: 43:AC:2A:34:BF:60:9C:9D:99:1D:C9:56:76:7B:81:56:71:04:66:E0
SHA256:
54:A6:4B:AF:AA:E4:58:D1:5C:12:C3:07:1E:72:97:E8:AC:D1:EF:ED:21:A8:F2:FA:1A:77:8A:64:DB:94:2
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 3
Extensions:
[root@kafka0 cer]#
Then, import the signed certificate into the server’s keystore under the kafka0 alias.
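That import might be sketched as:

```shell
# Import the CA-signed certificate under the same alias used for the key pair.
keytool -keystore server.keystore.jks -alias kafka0 -import -file cert-signed
```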
So far, we have configured the server key and certificate. Now we can configure the broker
to use SSL.
listeners
If SSL is not enabled for inter-broker communication, both PLAINTEXT and SSL ports will
be necessary. Update the entry as shown below. Substitute with your hostname.
listeners= PLAINTEXT://kafka0:9092,SSL://kafka0:8092
We have enabled SSL on port 8092. This broker can be reached over either the PLAINTEXT
or the SSL protocol.
Comment all the following highlighted entries:
The following SSL configs are needed on the broker side. Add them to server.properties; you
can append them at the end.
ssl.keystore.location=/opt/scripts/cer/server.keystore.jks
ssl.keystore.password=kafka213
ssl.key.password=kafka213
ssl.truststore.location=/opt/scripts/cer/server.truststore.jks
ssl.truststore.password=kafka213
ssl.endpoint.identification.algorithm=
ssl.client.auth=required
Ensure that the generated keys are under the /opt/scripts/cer/ folder.
Note: ssl.truststore.password is technically optional but highly recommended. If a password
is not set, access to the truststore is still available, but integrity checking is disabled.
Once you start the broker, check the server.log. In /opt/kafka/logs/server.log there will be
an entry, as shown below, indicating the SSL port being configured.
You can also verify the listening port number using the following command.
#yum install lsof
#lsof -i:8092
To check quickly if the server keystore and truststore are setup properly you can run the
following command
#openssl s_client -debug -connect kafka0:8092
If the certificate does not show up or if there are any other error messages, then your
keystore is not setup properly.
Since the client runs on the same machine as the broker, you don't need to generate a
different certificate. The existing broker certificate can be used for connecting to the server.
security.protocol=SSL
ssl.truststore.location=/opt/scripts/cer/server.truststore.jks
ssl.truststore.password=kafka213
ssl.keystore.location=/opt/scripts/cer/server.keystore.jks
ssl.keystore.password=kafka213
ssl.key.password=kafka213
ssl.client.auth=required
You should be able to consume the messages with the following commands.
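Assuming the client properties above are saved as client-ssl.properties (an illustrative file name), consuming over the SSL port might look like:

```shell
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka0:8092 \
  --topic test --from-beginning --consumer.config client-ssl.properties
```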
Let us now configure ACL. Before going ahead with the next section, you need to roll back
the server.properties.
Steps:
We will perform the following activity on a single-node broker only, enabling ACLs on one
broker.
To run a secure broker, two steps need to be performed.
1. Configure the Kafka brokers with ACLs using a JAAS file.
2. Kafka clients: pass the credentials.
First, we need to let the broker know authorized users’ credentials. This will be stored in a
JAAS file.
Add a JAAS configuration file for each Kafka broker. Create a kafka_plain_jaas.conf file as
specified below:
KafkaServer {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="admin"
password="admin"
user_admin="admin"
user_alice="alice"
user_bob="bob"
user_charlie="charlie";
};
Let’s understand the content of kafka_plain_jaas.conf file and how Kafka Brokers and
Kafka Clients use it.
KafkaServer Section:
The KafkaServer section defines four users: admin, alice, bob and charlie. The properties
username and password are used by the broker to initiate connections to other brokers. In
this example, admin is the user for inter-broker communication. The set of properties
user_{userName} defines the passwords for all users that connect to the broker and the
broker validates all client connections including those from other brokers using these
properties.
This file needs to be passed in as a JVM config option when running the broker, using -
Djava.security.auth.login.config=[path_to_jaas_file].
#vi /opt/scripts/kafkasecurity.sh
#!/usr/bin/env bash
export KAFKA_HOME=/opt/kafka
export ZOOKEEPER_HOME=/opt/zookeeper
export KAFKA_PLAIN_PARAMS="-Djava.security.auth.login.config=/opt/kafka/config/kafka_plain_jaas.conf"
export KAFKA_OPTS="$KAFKA_PLAIN_PARAMS $KAFKA_OPTS"
export PATH="$PATH:$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin:$JAVA_HOME/bin"
listeners=SASL_PLAINTEXT://kafka0:9092
security.inter.broker.protocol=SASL_PLAINTEXT
sasl.mechanism.inter.broker.protocol=PLAIN
sasl.enabled.mechanisms=PLAIN
advertised.listeners=SASL_PLAINTEXT://kafka0:9092
super.users=User:admin
Before starting the server, let us clean the ZooKeeper/Kafka log data so that there won't be
any conflict with the configuration of the previous cluster (/opt/data/zookeeper and
/opt/data/kafka-logs).
All the commands shown below should be executed from the path /opt/scripts.
Create a topic:
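With SASL enabled, administrative tools need credentials too. A sketch of the create command, assuming an admin client config such as the admin.properties file created later in this lab:

```shell
/opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka0:9092 \
  --command-config admin.properties --create --topic plain-topic \
  --partitions 1 --replication-factor 1
```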
Before running the Kafka console Producer configure the producer.properties file as shown:
# vi producer.properties
Update with the following entries.
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="alice" \
password="alice";
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
The security configuration still does not give specific permissions to our Kafka users (except
for admin, who is a super user). These permissions are defined using the ACL command
(bin/kafka-acls.sh). To verify the existing ACLs, run:
#cd /opt/scripts
#vi admin.properties
Update the following content in it.
// Begin - Exclude it
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="admin" \
password="admin";
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
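The listing command itself might be sketched as:

```shell
/opt/kafka/bin/kafka-acls.sh --bootstrap-server kafka0:9092 \
  --command-config admin.properties --list
```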
This returns no ACL definitions. You have handled authentication, but have not yet provided
any authorization rules defining which users can run specific APIs and access certain Kafka
resources.
Now you can try sending message again from the producer console. Type one or two
messages in the producer console.
# cd /opt/scripts
#/opt/kafka/bin/kafka-console-producer.sh --broker-list kafka0:9092 --topic plain-topic --producer.config producer.properties
We will consume from the topic using the user bob, hence grant an ACL to bob.
Next, you need to let user bob consume (fetch) from topic plain-topic using the Fetch API, as a member of the bob-group consumer group. Bob's ACL for fetching from topic plain-topic is:
Principal bob is Allowed Operation Read From Host * On Topic plain-topic.
or
# /opt/kafka/bin/kafka-acls.sh --bootstrap-server=kafka0:9092 --command-config admin.properties --add --allow-principal User:bob --operation Read --topic plain-topic
Bob needs a second ACL for committing offsets to group bob-group (using
the OffsetCommit API):
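The command for this second ACL is not reproduced above; a hedged sketch, mirroring the earlier topic ACL but scoped to the consumer group:

```shell
# Hedged sketch: allow bob to commit offsets (Read) on consumer group bob-group
/opt/kafka/bin/kafka-acls.sh --bootstrap-server kafka0:9092 \
  --command-config admin.properties \
  --add --allow-principal User:bob \
  --operation Read --group bob-group
```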
By granting these permissions to bob, he can now consume messages from topic plain-topic as a member of bob-group.
Before running Kafka console consumer configure the consumer.properties file as shown:
# vi consumer.properties
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="bob" \
password="bob";
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
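A sketch of the consumer invocation, assuming the consumer.properties file above and the bob-group group name:

```shell
# Hedged sketch: consume from plain-topic as bob, in group bob-group
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka0:9092 \
  --topic plain-topic --from-beginning \
  --group bob-group \
  --consumer.config consumer.properties
```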
Send a few messages from the producer console, and you will see the messages being consumed on the consumer console.
# vi group.properties
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="charlie" \
password="charlie";
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
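For charlie to describe the group, an ACL grant is needed first. A hedged sketch (the exact set of operations required can vary by Kafka version; Describe on the group is the typical starting point):

```shell
# Hedged sketch: let charlie describe the consumer group bob-group
/opt/kafka/bin/kafka-acls.sh --bootstrap-server kafka0:9092 \
  --command-config admin.properties \
  --add --allow-principal User:charlie \
  --operation Describe --group bob-group
```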
Now Charlie is able to get the proper listing of offsets in the group:
$ /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka0:9092 --describe --
group bob-group --command-config group.properties
or
As you can see in the consumer group details above, the log end offsets are showing as 3/5, i.e. the current offsets are the same as the end offsets. This is an ideal cluster, where there is no lag between production and consumption.
The ACLs above grant enough permissions for this use case. To summarize the configured ACLs, execute the following command.
Errors:
In this lab we will demonstrate the replication of messages across clusters for DR.
As shown in the picture above, there are two clusters: a Kafka cluster in DC1 (kafka0:9092) and another Kafka cluster in DC2 (kafka0:9093).
Point the consumer at the source cluster and the producer at the mirror cluster (or use the broker.list parameter).
Create two instances of Kafka nodes, which will act as separate DCs.
§ DC 1 (kafka0:9092) : zookeeper : /source
§ DC 2 (kafka0:9093) : zookeeper : /dest
Start a common zookeeper for DC1 and DC2 with different znode.
Follow these steps to start a new ZooKeeper for MirrorMaker:
dataDir=/opt/data/zookeeper-km/
clientPort=2181
initLimit=5
syncLimit=2
Use the above zookeeper config file to start the zookeeper instance
#cd /opt/zookeeper
#bin/zkServer.sh start conf/zoom.cfg
$ cd /opt
$ mkdir -p kafka-config/config
broker.id=0
listeners=PLAINTEXT://kafka0:9092
advertised.listeners=PLAINTEXT://kafka0:9092
log.dirs=/opt/data/kafka-logs/kafka-p
zookeeper.connect=localhost:2181/source
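The corresponding start command for this DC1 broker, a sketch mirroring the DC2 command used later in this section:

```shell
# Start the DC1 broker with the source-cluster properties
/opt/kafka/bin/kafka-server-start.sh "/opt/kafka-config/config/server-p.properties"
```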
#vi /opt/kafka-config/config/server-s.properties
broker.id=0
listeners=PLAINTEXT://kafka0:9093
advertised.listeners=PLAINTEXT://kafka0:9093
log.dirs=/opt/data/kafka-logs/kafka-s
zookeeper.connect=localhost:2181/dest
/opt/kafka/bin/kafka-server-start.sh "/opt/kafka-config/config/server-s.properties"
Let us create the following configuration in the data center DC2. Change the directory to /opt/scripts.
#cd /opt/scripts
Create consumer.props
In this example, the file that lists the properties and values for the consumers that will read
messages from the topics in Apache Kafka is named consumer.props. It contains this list:
group.id=cg.1
bootstrap.servers=kafka0:9092
shallow.iterator.enable=false
The file that lists the properties and values for the producers that will publish messages to
topics in Kafka DC2 is named producer.props. It contains this list: (Destination Brokers)
#vi producer.props
bootstrap.servers=kafka0:9093
Here is an example showing how to mirror a single topic (named test-topic) from an input cluster. If test-topic is not present, ensure that you have created it on DC1:
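A hedged sketch of creating that topic on the DC1 (source) cluster:

```shell
# Hedged sketch: create test-topic on DC1 before starting MirrorMaker
/opt/kafka/bin/kafka-topics.sh --create \
  --bootstrap-server kafka0:9092 \
  --replication-factor 1 \
  --partitions 1 \
  --topic test-topic
```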
Let us configure replication across the DCs using MirrorMaker. You need to execute the following on DC2. It can be executed from either DC; however, we created the configuration files on DC2, hence we will execute this from the /opt/scripts folder of the DC2 VM.
#cd /opt/scripts
#/opt/kafka/bin/kafka-mirror-maker.sh --consumer.config consumer.props --
producer.config producer.props --whitelist test-topic
Note that we specify the list of topics with the --whitelist option. This option allows any
regular expression using Java-style regular expressions. So you could mirror two topics
named A and B using --whitelist 'A|B'. Or you could mirror all topics using --whitelist '*'.
Make sure to quote any regular expression to ensure the shell doesn't try to expand it as a
file path. For convenience we allow the use of ',' instead of '|' to specify a list of topics.
Combining mirroring with the configuration auto.create.topics.enable=true makes it
possible to have a replica cluster that will automatically create and replicate all data in a
source cluster even as new topics are added.
The consumer groups tool is useful to gauge how well your mirror is keeping up with the source cluster. Note that the --bootstrap-server argument should point to the source cluster's broker (DC1 in this scenario). For example:
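A sketch of that check, assuming the cg.1 group.id configured in consumer.props:

```shell
# Gauge mirror lag: describe the MirrorMaker consumer group on the source cluster (DC1)
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka0:9092 \
  --describe --group cg.1
```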
Send messages from DC1. Ensure that you open a separate terminal for each console. You can verify that the messages are being replicated; the lag shows as 0.
In this lab we will demonstrate the replication of messages across clusters for DR, this time using MirrorMaker 2.
As shown in the picture above, there are two clusters: a Kafka cluster in DC1 (kafka0:9092) and another Kafka cluster in DC2 (kafka0:9093).
Create two instances of Kafka nodes, which will act as separate DCs.
§ DC 1 (kafka0:9092) : zookeeper : /source
§ DC 2 (kafka0:9093) : zookeeper : /dest
Start a common ZooKeeper for DC1 and DC2 with different znodes.
Follow these steps to start a new ZooKeeper for MirrorMaker:
dataDir=/opt/data/zookeeper-km/
clientPort=2181
initLimit=5
syncLimit=2
Use the above zookeeper config file to start the zookeeper instance
#cd /opt/zookeeper
#bin/zkServer.sh start conf/zoom.cfg
$ cd /opt
$ mkdir -p kafka-config/config
$ cp kafka/config/server.properties kafka-config/config/server-p.properties
$ cp kafka/config/server.properties kafka-config/config/server-s.properties
broker.id=0
listeners=PLAINTEXT://kafka0:9092
advertised.listeners=PLAINTEXT://kafka0:9092
log.dirs=/opt/data/kafka-logs/kafka-p
zookeeper.connect=localhost:2181/source
Start a Kafka node against the same ZooKeeper, using the separate znode /dest:
#vi /opt/kafka-config/config/server-s.properties
broker.id=0
listeners=PLAINTEXT://kafka0:9093
advertised.listeners=PLAINTEXT://kafka0:9093
log.dirs=/opt/data/kafka-logs/kafka-s
zookeeper.connect=localhost:2181/dest
#/opt/kafka/bin/kafka-server-start.sh "/opt/kafka-config/config/server-s.properties"
Let us create the following configuration in the data center DC2. Change the directory to /opt/scripts.
#cd /opt/scripts
Create mirror-maker.properties:
#vi mirror-maker.properties
primary->secondary.enabled = true
secondary->primary.enabled = false
topics = .*
groups = .*
replication.factor = 1
refresh.topics.enabled = true
refresh.topics.interval.seconds = 30
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
#File Ends here.
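As listed, the file above does not define the cluster aliases themselves; MirrorMaker 2 normally also expects a clusters list and per-cluster bootstrap servers. A hedged sketch of the lines that would map primary/secondary onto this lab's two DCs:

```shell
# Hedged sketch: append the cluster alias definitions MirrorMaker 2 expects
cat >> mirror-maker.properties <<'EOF'
clusters = primary, secondary
primary.bootstrap.servers = kafka0:9092
secondary.bootstrap.servers = kafka0:9093
EOF
```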
Here is an example showing how to mirror a single topic (named test-topic) from an input cluster. If test-topic is not present, ensure that you have created it on DC1:
# Run in secondary's data center, reading from the remote `primary` cluster
$ /opt/kafka/bin/connect-mirror-maker.sh mirror-maker.properties --clusters secondary
The --clusters secondary tells the MirrorMaker process that the given cluster(s) are nearby,
and prevents it from replicating data or sending configuration to clusters at other, remote
locations.
Send messages from DC1. Ensure that you open a separate terminal for each of the console.
https://fossies.org/linux/kafka/config/connect-mirror-maker.properties
https://access.redhat.com/documentation/en-
us/red_hat_amq/7.7/html/using_amq_streams_on_rhel/assembly-mirrormaker-str
--------------------- -------------------------- Labs End Here -----------------------------------------
Apache Kafka Connector – Connectors are the components of Kafka that can be set up to listen for changes that happen to a data source, such as a file or database, and pull in those changes automatically.
When working with Kafka you might need to write data from a local file to a Kafka topic. This is actually very easy to do with Kafka Connect. Kafka Connect is a framework that provides scalable and reliable streaming of data to and from Apache Kafka. With Kafka Connect, writing a file's content to a topic requires only a few simple steps.
Create a topic to write to; the messages will be fetched from the file and published to this topic.
#cd /opt/scripts
#/opt/kafka/bin/kafka-topics.sh \
--create \
--bootstrap-server kafka0:9092 \
--replication-factor 1 \
--partitions 1 \
--topic my-kafka-connect
Since we are reading the contents of a local file and writing to Kafka, this file is considered our "source". Therefore, we will use the FileSource connector. We must create a configuration file to use with this connector. For the most part you can copy the example available in $KAFKA_HOME/config/connect-file-source.properties. Below is an example of our my-file-source.properties file.
This file indicates that we will use the FileStreamSource connector class, read data from the
/tmp/my-test.txt file, and publish records to the my-kafka-connect Kafka topic. We are also
only using 1 task to push this data to Kafka, since we are reading/publishing a single file.
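The file listing itself is in the lab screenshot; a hedged sketch consistent with the description above (the connector name is an assumption):

```shell
# Hedged sketch: my-file-source.properties as described in the text
cat > my-file-source.properties <<'EOF'
name=my-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/my-test.txt
topic=my-kafka-connect
EOF
```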
Processes that execute Kafka Connect connectors and tasks are called workers. Since we are
reading data from a single machine and publishing to Kafka, we can use the simpler of the
two types, standalone workers (as opposed to distributed workers). You can find a sample
config file for standalone workers in $KAFKA_HOME/config/connect-
standalone.properties. We will call our file my-standalone.properties.
Create a file.
#vi my-standalone.properties
The main change in this example in comparison to the default is the key.converter and
value.converter settings. Since our file contains simple text, we use the StringConverter
types.
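A hedged sketch of the standalone worker file, based on the default connect-standalone.properties with the converter change described above:

```shell
# Hedged sketch: my-standalone.properties with StringConverter for plain text
cat > my-standalone.properties <<'EOF'
bootstrap.servers=kafka0:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
EOF
```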
Open a terminal.
Our input file /tmp/my-test.txt will be read in a single process into the Kafka my-kafka-connect topic. Here is a look at the file contents:
Now it is time to run Kafka Connect with our worker and source configuration files. As
mentioned before we will be running Kafka Connect in standalone mode. Here is an
example of doing this with our custom config files:
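A sketch of the invocation, assuming the two file names used in this section:

```shell
# Run Kafka Connect in standalone mode with the worker and source connector configs
/opt/kafka/bin/connect-standalone.sh my-standalone.properties my-file-source.properties
```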
Open another terminal and execute the following consumer to consume the message.
Tools - https://www.confluent.io/hub/confluentinc/kafka-connect-jdbc
Note that plugin.path is the path where we need to place the library that we downloaded.
#vi /software/connect-distributed.properties
# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
bootstrap.servers=localhost:9092
# Unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs.
group.id=connect-cluster
# The converters specify the format of data in Kafka and how to translate it into Connect data.
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated.
config.storage.topic=connect-configs
config.storage.replication.factor=1
# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
status.storage.topic=connect-status
status.storage.replication.factor=1
plugin.path=/software/plugins
Go to the bin folder of the Kafka installation, or specify the full path.
#/opt/kafka/bin/connect-distributed.sh /software/connect-distributed.properties
After running the connector, we can confirm that the Connect REST endpoint is accessible, and that the JDBC connector appears in the plugin list, by calling http://localhost:8083/connector-plugins
# curl http://localhost:8083/connector-plugins
Install the PostgreSQL DB. We will configure PostgreSQL to store some data, which will be transferred by Connect to a Kafka topic.
For Centos 7:
Or
The configuration for the plugin is stored in the /software/jdbc-source.json file on the Kafka broker server. Its contents are as follows (replace localhost with the container IP of the PostgreSQL DB if you are using Docker):
{
"name": "jdbc_source_connector_postgresql_01",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:postgresql://localhost:5432/postgres",
"connection.user": "postgres",
"connection.password": "postgres",
"topic.prefix": "postgres-01-",
"poll.interval.ms" : 3600000,
"mode":"bulk"
}
}
Create a table and insert a few records. Execute all the SQL commands below from the PostgreSQL CLI.
To exit: \q
Allow remote servers to connect to PostgreSQL by updating the following contents:
#vi /var/lib/pgsql/13/data/postgresql.conf
listen_addresses = '*'
# vi /var/lib/pgsql/13/data/pg_hba.conf
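The exact pg_hba.conf entry is in the lab screenshot; a hedged sketch of a typical allow-all rule, written here to a local copy for illustration (adjust the CIDR and auth method to your environment before applying it to the real file):

```shell
# Hedged sketch: a pg_hba.conf rule permitting remote password logins
cat >> pg_hba.conf <<'EOF'
host    all    all    0.0.0.0/0    md5
EOF
```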
As we operate in distributed mode, we run the connectors by calling REST endpoints with the configuration JSON. We can supply the configuration payload from a file with the curl command. The following command starts the connector; execute it from the directory where you have stored the configuration JSON file.
curl -d @"jdbc-source.json" \
-H "Content-Type: application/json" \
-X POST http://localhost:8083/connectors
We can see that the PostgreSQL table leads is loaded into the Kafka topic postgres-01-leads, and each row in the table is loaded as a message. You can verify this using the consumer console.
The configuration for the plugin is stored in the /software/jdbc-source_smt.json file on the Kafka broker server. Its contents are as follows (replace localhost with the container IP of the PostgreSQL DB if you are using Docker):
{
"name": "jdbc_source_connector_postgresql_02",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:postgresql://localhost:5432/postgres",
"connection.user": "postgres",
"connection.password": "postgres",
"topic.prefix": "postgres-02-",
"poll.interval.ms" : 3600000,
"mode":"bulk",
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "name:fullname"
}
}
In the above configuration, the field name is renamed to fullname.
Verify the topic and check the messages inside it.
As shown above, the message field has been transformed from name to fullname.
#curl http://localhost:8083/connectors
#curl http://localhost:8083/connectors/jdbc_source_connector_postgresql_01/config
Retrieve details for specific tasks (JDBC Tasks) - Get a list of tasks currently running for the connector.
#curl http://localhost:8083/connectors/jdbc_source_connector_postgresql_02/tasks
#curl http://localhost:8083/connectors/jdbc_source_connector_postgresql_01/status
# curl
http://localhost:8083/connectors/jdbc_source_connector_postgresql_01/tasks/0/status
Download and install Confluent Kafka: Schema Registry only. (Refer to the Confluent installation section.)
kafkastore.bootstrap.servers=PLAINTEXT://kafka0:9092
#cd /opt/confluent/
#schema-registry-start /opt/confluent/etc/schema-registry/schema-registry.properties
Use the Schema Registry API to add a schema for the topic my-kafka.
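The request body is in the lab screenshot; a hedged sketch registering a minimal Avro schema under the my-kafka-value subject (the schema itself is illustrative):

```shell
# Hedged sketch: register a minimal Avro schema for subject my-kafka-value
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\": \"string\"}"}' \
  http://localhost:8081/subjects/my-kafka-value/versions
```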
You can verify the topic which maintains the schema details in the broker.
# kafka-topics.sh --list --bootstrap-server kafka0:9092
Use this two-step process to find subjects associated with a given ID:
E.g.:
#curl -X GET http://localhost:8081/subjects/my-kafka-value/versions/latest
https://docs.confluent.io/platform/current/schema-registry/develop/using.html
/opt/kafka/bin/kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--replication-factor 1 \
--partitions 1 \
--topic benchmark-1-1-none
Let us execute the below benchmark that will simulate the pushing of 150000000 records.
--producer-props \
acks=1 \
bootstrap.servers=tos.master.com:9092 \
buffer.memory=67108864 \
compression.type=none \
batch.size=8196
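The leading part of the command is not reproduced above; a hedged sketch combining it with the --producer-props shown (the record size is an assumption):

```shell
# Hedged sketch: producer benchmark pushing 150000000 records
/opt/kafka/bin/kafka-producer-perf-test.sh \
  --topic benchmark-1-1-none \
  --num-records 150000000 \
  --record-size 100 \
  --throughput -1 \
  --producer-props \
    acks=1 \
    bootstrap.servers=tos.master.com:9092 \
    buffer.memory=67108864 \
    compression.type=none \
    batch.size=8196
```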
This is because in an effort to increase availability and durability, version 0.8 introduced
intra-cluster replication support, and by default a producer waits for an acknowledgement
response from the broker on every message (or batch of messages if async mode is used). It
is possible to mimic the old behavior, but we were not very interested in that given that we
intend to use replication in production.
Performance degraded further once we started using a sample of real ~1KB sized log
messages rather than the synthetic messages produced by the Kafka tool, resulting in a
throughput of about 10 MB/sec.
/opt/kafka/bin/kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--partitions 1 \
--topic benchmark-3-0-none
Demo:
First let’s create a topic for the test data. The following example will result in a topic named
my-perf-test with 2 partitions, a replication factor of 1 and retention time of 24 hours.
Replace broker0 as needed for the Kafka cluster in your environment:
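A hedged sketch matching that description (replace broker0:9092 as needed):

```shell
# Hedged sketch: my-perf-test with 2 partitions, RF 1, 24h retention
/opt/kafka/bin/kafka-topics.sh --create \
  --bootstrap-server broker0:9092 \
  --replication-factor 1 \
  --partitions 2 \
  --config retention.ms=86400000 \
  --topic my-perf-test
```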
Then run the producer performance test script with different configuration settings. The following example will use the topic created above to store 1 lakh (100,000) messages with a size of 1 KiB each. The -1 value for --throughput means that messages are produced as quickly as possible, with no throttling limit. Kafka producer configuration properties like acks and bootstrap.servers are passed via the --producer-props argument:
In this example, roughly 2.3k messages are produced per second on average, with a maximum
latency of approx. 15 seconds.
To run the consumer performance test script, the following example reads 1 lakh (100,000) messages from the same topic my-perf-test.
#kafka-consumer-perf-test.sh --topic my-perf-test --broker-list kafka0:9092 --messages 100000
The throughput figures of interest (MB.sec and nMsg.sec) are 13.75 and 14084. Verify the numbers after increasing the partition count; the differences will be observable if you have multiple nodes.
Tuning Brokers
Topics are divided into partitions. Each partition has a leader. Topics that are properly configured for reliability will have a leader replica and two or more follower replicas per partition. When leaders are not balanced properly, one broker might be overworked compared to others. Depending on your system and how critical your data is, ensure that you have sufficient replicas to preserve your data. For each topic, it is recommended to start with one partition per physical storage disk and one consumer per partition.
#kafka-topics.sh --bootstrap-server kafka0:9092 --describe --topic my-perf-test4
Tuning Producers
When you use Producer.send(), you fill up buffers on the producer. When a buffer is full,
the producer sends the buffer to the Kafka broker and begins to refill the buffer.
batch.size measures batch size in total bytes instead of the number of messages.
Tuning Consumers
Consumers can create throughput issues on the other side of the pipeline. The maximum
number of consumers in a consumer group for a topic is equal to the number of partitions.
You need enough partitions to handle all the consumers needed to keep up with the
producers.
Consumers in the same consumer group split the partitions among them. Adding more consumers to a group can enhance performance (up to the number of partitions). Adding more consumer groups does not affect performance. Execute the following command from 2 terminals.
• To check the ISR set for topic partitions, run the following command:
• If a partition leader dies, a new leader is selected from the ISR set, with no data loss. If there is no ISR, unclean leader election can be used, with the risk of data loss.
• Unclean leader election occurs if unclean.leader.election.enable is set to true. By default, this
is set to false.
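The ISR check referred to in the first bullet above can be sketched as follows (the topic name is assumed from this section):

```shell
# Describe the topic; the Isr column lists the in-sync replicas per partition
/opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka0:9092 \
  --describe --topic my-perf-test
```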
15. Errors
I. {test=LEADER_NOT_AVAILABLE}
(org.apache.kafka.clients.NetworkClient)
Solutions: /opt/kafka/config/server.properties
Update the following information.
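The specific properties are in the lab screenshot; this error commonly stems from the advertised listener not resolving for clients. A hedged sketch of the usual fix, written here to a local server.properties for illustration (the lab's file is /opt/kafka/config/server.properties):

```shell
# Hedged sketch: ensure listeners/advertised.listeners resolve for clients
cat >> server.properties <<'EOF'
listeners=PLAINTEXT://kafka0:9092
advertised.listeners=PLAINTEXT://kafka0:9092
EOF
```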
II. DumpLogSegments
/opt/kafka/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files \
/tmp/kafka-logs/my-kafka-connect-0/00000000000000000000.log | head -n 4
III. Resources
https://developer.ibm.com/hadoop/2017/04/10/kafka-security-mechanism-saslplain/
https://sharebigdata.wordpress.com/2018/01/21/implementing-sasl-plain/
https://developer.ibm.com/code/howtos/kafka-authn-authz
https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-tools-GetOffsetShell.html