
Apache Kafka® and Confluent Enterprise Reference Architecture

Written by: Gwen Shapira


March 2018

www.confluent.io/contact ©2018 Confluent, Inc.


Table of Contents

Introduction
Apache Kafka and Confluent Enterprise Architecture
ZooKeeper
Kafka Brokers
Kafka Connect Workers
Kafka Clients
Kafka Streams API
Confluent KSQL Server
Confluent REST Proxy
Confluent Schema Registry
Confluent Replicator
Confluent Auto Data Balancing
Confluent Control Center
Large Cluster Reference Architecture
Small Cluster Reference Architecture
Capacity Planning
Storage
Memory
CPU
Network
Hardware Recommendations for On-Premise Deployment
Large Cluster
Small Cluster
Cloud Deployment
Amazon AWS EC2
Microsoft Azure
Google Cloud Compute Engine
Conclusion



Introduction
Choosing the right deployment model is critical for the success and scalability of the Confluent streaming platform. We want to match the right hardware (and cloud instances) to each use case to ensure that the system reliably delivers high-throughput, low-latency data streams.

This white paper provides a reference for data architects and system administrators who are planning to deploy Apache Kafka® and
Confluent Platform in production. We discuss important considerations for production deployments and provide guidelines for the
selection of hardware and cloud instances. We also provide recommendations on how to deploy the Kafka Connect API in production,
as well as components of Confluent Platform that integrate with Apache Kafka, such as the Confluent Schema Registry, Confluent
REST Proxy and Confluent Control Center.



Apache Kafka and Confluent Enterprise Architecture
Apache Kafka is an open source streaming platform designed to provide the basic components necessary for managing streaming
data: Storage (Kafka core), integration (Kafka Connect), and processing (Kafka Streams). Apache Kafka is proven technology,
deployed in countless production environments to power some of the world’s largest stream processing systems.

Confluent Platform includes Apache Kafka, as well as selected software projects that are frequently used with Kafka, which makes
Confluent Platform a one-stop shop for setting up a production-ready streaming platform. These projects include clients for C, C++,
Python, and Go programming languages; Connectors for JDBC, Elasticsearch, and HDFS; Confluent Schema Registry for managing
metadata for Kafka topics; and Confluent REST Proxy for integrating with web applications.

Confluent Enterprise takes this to the next level by addressing requirements of modern enterprise streaming applications. It
includes Confluent Control Center for end-to-end monitoring and management, Confluent Replicator for managing multi-datacenter
deployments, and Confluent Auto Data Balancing for optimizing resource utilization and easy scalability.

We’ll describe the architecture from the ground up, and for each component we’ll explain when it is needed and how best to deploy it in several scenarios. We will not discuss capacity or hardware recommendations at this point, as these are covered in depth in a later section. In addition, you should refer to the Confluent documentation for installation instructions.

ZooKeeper
ZooKeeper is a centralized service for managing distributed processes and is a mandatory component in every Apache Kafka cluster.
While the Kafka community has been working to reduce the dependency of Kafka clients on ZooKeeper, Kafka brokers still use
ZooKeeper to manage cluster membership and elect a cluster controller.

In order to provide high availability, you will need at least 3 ZooKeeper nodes (allowing for one-node failure) or 5 nodes (allowing
for two-node failures). All ZooKeeper nodes are equivalent, so they will usually run on identical nodes. Note that the number of
ZooKeeper nodes MUST be odd.

Kafka Brokers
Kafka brokers are the main storage and messaging components of Apache Kafka. Kafka is a streaming platform that uses
messaging semantics. The Kafka cluster maintains streams of messages called topics; the topics are sharded into partitions
(ordered, immutable logs of messages) and the partitions are replicated and distributed for high availability. The servers that run the
Kafka cluster are called brokers.

You will usually want at least 3 Kafka brokers in a cluster, each running on a separate server. This way you can replicate each Kafka
partition at least 3 times and have a cluster that will survive a failure of 2 nodes without data loss. Note that with 3 Kafka brokers, if
any broker is not available, you won’t be able to create new topics with 3 replicas until all brokers are available again. For this reason,
if you have use-cases that require creating new topics frequently, we recommend running at least 4 brokers in a cluster.
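
For example, here is a minimal sketch of creating a replicated topic with the Java AdminClient; the broker addresses, topic name, and partition count are illustrative assumptions rather than recommendations:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateReplicatedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // 12 partitions spread across the brokers, each kept on 3 replicas for fault tolerance
                NewTopic topic = new NewTopic("clickstream", 12, (short) 3);
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }

With a replication factor of 3, the topic can tolerate the loss of two replicas of any partition before data is at risk.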

If the Kafka cluster is not going to be highly loaded, it is acceptable to run Kafka brokers on the same servers as the ZooKeeper
nodes. In this case, it is recommended to allocate separate disks for ZooKeeper (as we’ll specify in the hardware recommendations
below). For high-throughput use cases we do recommend installing Kafka brokers on separate nodes.



Kafka Connect Workers
Kafka Connect is a component of Apache Kafka that allows it to integrate with external systems to pull data from source systems
and push data to sink systems. It works as a pluggable interface, so you plug in Connectors for the systems you want to integrate
with. For example, you deploy Kafka Connect with JDBC and Elasticsearch Connectors to copy data from MySQL to Kafka and from
Kafka to Elasticsearch. The full list of available Connectors can be found here: http://www.confluent.io/product/connectors

Kafka Connect can be deployed in one of two modes:

• Standalone mode: This is similar to how Logstash or Apache Flume are deployed. If you need to get logs from a specific machine into Kafka, you run Kafka Connect with a file connector or spooling directory connector on that machine, and it reads the files and sends events to Kafka.
• Cluster mode: This is the recommended deployment mode for Kafka Connect in production. You install Kafka Connect and its Connectors on several machines. They discover each other, with Kafka brokers serving as the synchronization layer, and they automatically load-balance and fail over work between them. To start and stop Connectors anywhere on the cluster, you connect to any Kafka Connect node (known as a “worker”) and use a REST API to start, stop, pause, resume, and configure Connectors (see the sketch after this list). No matter which node you use to start a connector, Kafka Connect will determine the optimal level of parallelism for the Connector and start parallel tasks to pull or push data as needed on the least loaded available worker nodes.
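
As a minimal sketch of that REST API, the following submits a hypothetical JDBC source connector to one worker using Java’s built-in HTTP client; the worker hostname, connector name, and connection settings are placeholders you would replace with your own:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class StartConnector {
        public static void main(String[] args) throws Exception {
            // Illustrative JDBC source connector configuration (names and URLs are placeholders)
            String connectorJson =
                "{ \"name\": \"jdbc-orders-source\"," +
                "  \"config\": {" +
                "    \"connector.class\": \"io.confluent.connect.jdbc.JdbcSourceConnector\"," +
                "    \"connection.url\": \"jdbc:mysql://db-host:3306/shop\"," +
                "    \"topic.prefix\": \"mysql-\"," +
                "    \"mode\": \"incrementing\"," +
                "    \"incrementing.column.name\": \"id\" } }";

            // Any worker in the cluster can accept the request; Connect distributes the tasks
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect-worker-1:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();

            HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }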

In standalone mode you will deploy Kafka Connect on the servers that contain the files or applications you want to integrate with.
In cluster mode, Kafka Connect will usually run on a separate set of machines, especially if you plan to run multiple Connectors
simultaneously. As Kafka Connect workers are stateless, they can also be safely deployed in containers.

Kafka Clients
Apache Kafka clients are used in the applications that produce and consume events. Apache Kafka’s Java client JARs are included in the Confluent Platform Kafka packages and are installed alongside Kafka brokers, but they are typically deployed with the application that imports them, by adding the client libraries as application dependencies using a build manager such as Apache Maven.

At the core of Confluent’s other clients (C, C++, Python, and Go) is librdkafka, Confluent’s C/C++ client for Apache Kafka. Librdkafka is an open source, well-proven, reliable, and high-performance client. By basing our Python (confluent-kafka-python) and Go (confluent-kafka-go) clients on librdkafka, we provide consistent APIs and semantics and high-performance, high-quality clients across programming languages.

Confluent Platform includes librdkafka packages and these should be installed on servers where applications using the C, C++,
Python or Go clients will be installed.
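
As a minimal sketch, an application using the Java client produces events like this; the broker addresses and topic name are illustrative:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class EventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for all in-sync replicas to acknowledge

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Events with the same key always land in the same partition, preserving per-key ordering
                producer.send(new ProducerRecord<>("clickstream", "user-42", "page_view"));
            }
        }
    }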

Kafka Streams API


Kafka Streams, a component of open source Apache Kafka, is a powerful, easy-to-use library for building highly scalable, fault-tolerant, stateful distributed stream processing applications on top of Apache Kafka. It builds upon important concepts for stream processing such as properly distinguishing between event-time and processing-time, handling of late-arriving data, and efficient management of application state.

Kafka Streams is a library that is embedded in the application code (just like Jetty, for instance), and as such, you don’t need to allocate Kafka Streams servers, but you do need to allocate servers for the applications that use the Kafka Streams library (or at least resources for their containers). Kafka Streams will use parallel-running tasks for the different partitions and processing stages in the application, and as such will benefit from a higher core count. If you deploy multiple instances of the application on multiple servers (recommended!), the Kafka Streams library will handle load balancing and failover automatically. In order to maintain its application state, Kafka Streams uses an embedded RocksDB database. It is recommended to use persistent SSD disks for the RocksDB storage.
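
A minimal sketch of such an embedded application is shown below; the application ID, topic names, and state directory are illustrative assumptions:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    public class ClickCountApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-count-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams"); // RocksDB state lives here

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> clicks = builder.stream("clickstream");
            // Stateful aggregation: counts are kept in a local RocksDB store and changelogged to Kafka
            KTable<String, Long> clicksPerUser = clicks.groupByKey().count();
            clicksPerUser.toStream().to("clicks-per-user", Produced.with(Serdes.String(), Serdes.Long()));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

Running a second copy of this application on another server causes the two instances to split the input partitions between them automatically.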



Confluent KSQL Server
Confluent KSQL is an open source streaming SQL engine that implements continuous queries against Apache Kafka. It allows you to query, read, write, and process data in Apache Kafka in real time, at scale, using SQL-like semantics. The KSQL Command Line Interface (CLI) allows you to write KSQL queries interactively; the CLI acts as a client and can run on any machine (server or laptop) with access to the KSQL server. The KSQL server runs the engine that executes KSQL queries, which includes the data processing as well as reading data from and writing data to the target Kafka cluster.

KSQL is typically deployed on a set of servers that form a cluster. The number of servers in the cluster is determined by the processing capacity required: this includes the number of concurrent queries that will execute on the cluster, as well as the complexity of the queries. KSQL servers are essentially instances of a Kafka Streams application, so you can expect similar behavior. Like Kafka Streams, the KSQL cluster will handle load balancing and failover between the nodes in the cluster automatically. It will benefit from higher CPU counts, good network throughput, and SSD storage for the RocksDB state store.

Confluent REST Proxy


The Confluent REST Proxy is an open source HTTP server that provides a RESTful interface to a Kafka cluster. It makes it easy to
produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka
protocol or clients. The REST Proxy is not a mandatory component of the platform — you will choose to use the REST Proxy if you
wish to produce and consume messages to/from Kafka using a RESTful HTTP protocol. If your applications only use the native
clients (mentioned above), you can choose not to deploy the REST Proxy.
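
As a rough sketch, assuming a REST Proxy instance listening on port 8082 and an illustrative topic, an HTTP-only application could publish JSON records like this:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RestProxyProduce {
        public static void main(String[] args) throws Exception {
            // Records wrapped in the REST Proxy v2 JSON envelope (topic, host, and fields are placeholders)
            String body = "{ \"records\": [ { \"value\": { \"user\": \"user-42\", \"action\": \"page_view\" } } ] }";

            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://rest-proxy-1:8082/topics/clickstream"))
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

            HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body()); // partition/offset per record
        }
    }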

The REST Proxy is typically deployed on a separate set of machines. For additional throughput and high availability, it is recommended to deploy multiple REST Proxy servers behind a sticky load balancer. When using the high-level consumer API, it is important that all requests for the same consumer are directed to the same REST Proxy server, so a “sticky” load-balancing policy is recommended.

Since REST Proxy servers are stateless, they can also be safely deployed in containers.

Confluent Schema Registry


Confluent Schema Registry is an open source serving layer for your metadata. It provides a RESTful interface for storing and
retrieving Avro schemas. It stores a versioned history of all schemas, provides multiple compatibility settings, and allows evolution
of schemas according to the configured compatibility setting. The Confluent Schema Registry packages also include serializers that
plug into Kafka clients and automatically handle schema storage and retrieval for Kafka messages that are sent in the Avro format.

Schema Registry is typically installed on its own servers, although for smaller installations it can safely be installed alongside Confluent REST Proxy and Kafka Connect workers. For high availability, you’ll want to install multiple Schema Registry servers. With multiple servers, the Schema Registry uses a leader-follower architecture: at most one Schema Registry instance is the leader at any given moment, and only the leader can publish writes to the underlying Kafka log, but all nodes can serve read requests directly. Follower nodes forward write requests to the current leader. Schema Registry stores all its schemas in Kafka, and therefore Schema Registry nodes do not require storage and can be deployed in containers.
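
A minimal sketch of wiring the Avro serializers into a Java producer configuration follows; the broker and Schema Registry addresses are illustrative:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class AvroProducerConfig {
        public static Properties build() {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            // Confluent's Avro serializers register and retrieve schemas automatically
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                      "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                      "io.confluent.kafka.serializers.KafkaAvroSerializer");
            // Point the serializers at the Schema Registry cluster
            props.put("schema.registry.url", "http://schema-registry-1:8081");
            return props;
        }
    }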

Confluent Replicator
Confluent Replicator is a new component added to Confluent Enterprise to help manage multi-cluster deployments of Confluent
Platform and Apache Kafka. It provides a centralized configuration of cross-cluster replication. Unlike Apache Kafka’s MirrorMaker, it
replicates topic configuration in addition to the messages in the topics.

Confluent Replicator is integrated with the Kafka Connect framework and should be installed on the Connect nodes in the destination
cluster. If there are multiple Connect worker nodes, Replicator should be installed on all of them. By installing Replicator on a larger
number of nodes, Replicator will scale to replicate at higher throughput and will be highly available through a built-in failover mechanism.



Confluent Auto Data Balancing
Confluent Auto Data Balancing is a new component added to Confluent Enterprise to optimize resource utilization and help scale Kafka clusters. Auto Data Balancing evaluates information on the number of brokers, partitions, leaders, and partition sizes to decide on a balanced placement of partitions on brokers, and it modifies the replicas assigned to each broker to achieve that placement. For example, when a new broker is added to the cluster, Auto Data Balancing will move partitions to the new broker to balance the load between all brokers available in the cluster. To avoid impact on production workloads, the rebalancing traffic can be throttled to a fraction of the available network capacity.

Auto Data Balancing can be installed on any machine in the Confluent Platform cluster — it just needs to be able to communicate
with the Kafka brokers and ZooKeeper to collect load metrics and send instructions to move partitions. For convenience, we
recommend installing it alongside Kafka brokers or on the Confluent Control Center node if available.

Confluent Control Center


Confluent Control Center is Confluent’s web-based tool for managing and monitoring Apache Kafka. It is part of Confluent Platform Enterprise and provides several key types of functionality for building and monitoring production data pipelines and streaming applications:

• Data Stream Monitoring and Alerting: You can use Control Center to monitor your data streams end to end, from producer to consumer. Use Control Center to verify that every message sent is received (and received only once), and to measure system performance end to end. Drill down to better understand cluster usage and identify any problems. Configure alerts to notify you when end-to-end performance does not match SLAs or when sent messages are not received.
• Multi-cluster monitoring and management: A single Control Center node can monitor data flows in multiple clusters and
manage data replication between the clusters.
• Kafka Connect configuration: You can also use Control Center to manage and monitor Kafka Connect: the open source
toolkit for connecting external systems to Kafka. You can easily add new sources to load data from external data systems
and new sinks to write data into external data systems. Additionally, you can manage, monitor, and configure connectors
with Confluent Control Center.

Confluent Control Center currently runs on a single machine, and due to the resources required we recommend dedicating a separate
machine for Control Center.



Large Cluster Reference Architecture
Taking all the recommendations above, a Confluent Platform cluster that is built for high-throughput long-term scalability will have an
architecture similar to the following:

[Figure: Large cluster reference architecture. Dedicated nodes for Confluent Schema Registry (leader and follower), Kafka Connect workers running Connectors and Replicator, Confluent KSQL servers, and Confluent REST Proxy instances behind a sticky load balancer; a tier of Kafka brokers running the rebalancer; five ZooKeeper nodes; and a single Confluent Control Center node. Applications, microservices, and Kafka Streams applications connect as clients.]

This architecture was built to scale. Each component is given its own servers, and if any layer becomes overly loaded it can be scaled
independently simply by adding nodes to that specific layer. For example, when adding applications that use the Confluent REST
Proxy, you may find that the REST Proxy no longer provides the required throughput while the underlying Kafka brokers still have
spare capacity. In that case, you only need to add REST Proxy nodes in order to scale your entire platform.



Small Cluster Reference Architecture
Usually companies start out by adopting Confluent Platform for one use case with limited load, and when this proves successful they
grow the cluster to accommodate additional applications and teams. This architecture is recommended for the early stages where
investing in full-scale deployment is usually not required for the success of the project. In those cases, starting with fewer servers
and installing multiple components per server is the way to go. We still give several resource-intensive components like Confluent
Control Center and Confluent KSQL dedicated servers.

[Figure: Small cluster reference architecture. Three nodes each co-locate ZooKeeper, a Kafka broker, and the rebalancer; two Confluent Platform nodes each co-locate Kafka Connect (with Replicator and Connectors), Schema Registry, and REST Proxy behind a sticky load balancer; two dedicated KSQL servers; and a single Confluent Control Center node. Applications, microservices, and Kafka Streams applications connect as clients.]

Note that this architecture, although requiring far fewer servers, still provides the high availability of a full-scale cluster. As the use case expands, you will notice bottlenecks develop in the system. In this case the correct approach is to start by moving the bottleneck components onto their own servers, and when further growth is required, scale by adding servers to each component. With time, this architecture will evolve to resemble the recommended large-scale architecture.



Capacity Planning
When planning your architecture for Confluent Platform, you need to provide sufficient resources for the planned workload. Storage, memory, CPU, and network resources can all be potential bottlenecks and must be considered. Since every component is scalable, usage of storage, memory, and CPU can be monitored on each node and additional nodes can be added when required. Because most components do not store state, nodes can be added at any time and the added capacity used immediately. The main exception is Kafka brokers, which serve as the main storage component for the cluster. It is critical to monitor Kafka brokers closely and to add capacity and rebalance before any broker is overloaded, usually when any resource reaches 60-70% of capacity. The reason is that the rebalancing operation itself takes resources: the more resources you can spare for rebalancing, the less time it will take to rebalance and the sooner the cluster will benefit from the additional capacity.

Performance can be monitored via Confluent Control Center and any third party JMX monitoring tool. Confluent Control Center
enables you to configure alerts when SLAs are not met, which will allow you to take action proactively.

Storage
Storage is mostly a concern on ZooKeeper and Kafka brokers.

For ZooKeeper the main concern is low-latency writes to the transaction log. Therefore we recommend dedicated disks specifically for storing the ZooKeeper transaction log (even in small-scale deployments where ZooKeeper is installed alongside Kafka brokers).

Kafka brokers are the main storage for the Confluent Platform cluster and therefore usually require ample storage capacity. Most
deployments use 6-12 disks, usually 1TB each. The exact amount of storage you will need obviously depends on the number of
topics, partitions, the rate at which applications will be writing to each topic and the retention policies you configure.
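
As an illustrative sizing sketch (the numbers are assumptions, not recommendations): a cluster ingesting 20 MB/s with a 7-day retention policy and a replication factor of 3 needs roughly 20 MB/s × 604,800 s × 3 ≈ 36 TB of raw log capacity, or about 6 TB per broker in a 6-broker cluster, before leaving headroom for growth and rebalancing.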

You also want to consider the type of storage. SSD and spinning magnetic drives offer different performance characteristics and
depending on your use case, SSD performance benefits may be worth their higher cost. While Kafka brokers write sequentially to
each partition, most deployments store more than one partition per disk and if your use case requires Kafka brokers to access disk
frequently, minimizing seek times will increase throughput.

When selecting a file system, we recommend either EXT4 or XFS — both have been tested and used extensively in production
Kafka clusters.

Note that the use of shared-storage devices, while supported, is not recommended. Confluent Platform is not tested with SAN/NAS
and very careful configuration is required in order to achieve good performance and availability when using shared storage.

Confluent Control Center relies on local state in RocksDB. We recommend at least 300GB of storage space, preferably SSDs. All local
data is kept in the directory specified by the confluent.controlcenter.data.dir config parameter.

Kafka Streams and Confluent KSQL are both stateful and use RocksDB as a local persistent state store. The exact use of storage depends on the specific streams application. Aggregations, windowed aggregations, and windowed joins all use RocksDB stores to hold their state. The size used will depend on the number of partitions, the number of unique keys in the stream (cardinality), the size of keys and values, and the retention for windowed operations (specified in the DSL using the until() operator). Note that Kafka Streams uses quite a few file descriptors for its RocksDB stores, so make sure to increase the file descriptor limit to 64K or above. Since calculating exact usage is complex, we typically allocate generous disk space to streams applications to allow for ample local state; 100-300GB is a good starting point for capacity planning. Our KSQL performance tests use SSD storage for the RocksDB instances, so we recommend this configuration.



Memory
Sufficient memory is essential for the efficient operation of almost all Confluent Platform components.

ZooKeeper uses the JVM heap, and 4GB RAM is typically sufficient. Too small a heap will result in high CPU due to constant garbage collection, while too large a heap may result in long garbage collection pauses and loss of connectivity within the ZooKeeper cluster.

Kafka brokers use both the JVM heap and the OS page cache. The JVM heap is used for replication of partitions between brokers and for log compaction. Replication requires 1MB (the default replica.fetch.max.bytes) for each partition on the broker. In Apache Kafka 0.10.1 (Confluent Platform 3.1), we added a new configuration (replica.fetch.response.max.bytes) that limits the total RAM used for replication to 10MB, to avoid memory and garbage collection issues when the number of partitions on a broker is high. For log compaction, calculating the required memory is more complicated and we recommend referring to the Kafka documentation if you are using this feature. For small to medium-sized deployments, a 4GB heap size is usually sufficient. In addition, it is highly recommended that consumers always read from memory, i.e. from data that was written to Kafka and is still stored in the OS page cache. The amount of memory this requires depends on the rate at which this data is written and how far behind you expect consumers to get. If you write 20GB per hour per broker and you allow brokers to fall 3 hours behind in a normal scenario, you will want to reserve 60GB for the OS page cache. In cases where consumers are forced to read from disk, performance will drop significantly.

Kafka Connect itself does not use much memory, but some connectors buffer data internally for efficiency. If you run multiple
connectors that use buffering, you will want to increase the JVM heap size to 1GB or higher.

The more memory you give Confluent Control Center the better, but we recommend at least 32GB of RAM. The JVM heap size can be fairly small (it defaults to 3GB), but the application needs the additional memory for RocksDB in-memory indexes and caches, as well as OS page cache for faster access to persistent data.

Kafka producer clients can benefit from generous JVM heap sizes to achieve high throughput. Our clients attempt to batch data as it is sent to brokers in order to use the network more efficiently, and in addition they store messages in memory until they are acknowledged successfully by the brokers. Having sufficient memory for the producer buffers will allow the producer to keep retrying to send messages to the broker in the event of network issues or a leader election, rather than blocking or throwing exceptions.

Kafka Streams and Confluent KSQL have several memory areas, and the total memory usage will depend on your specific streams application and on the configuration. Starting with Apache Kafka 0.10.1 (Confluent Platform 3.1), there is a streams buffer cache. It defaults to 10MB and is controlled through the cache.max.bytes.buffering configuration; setting it higher will generally result in better performance for your streams application. In addition, streams uses RocksDB memory stores for each partition involved in each aggregation, windowed aggregation, and windowed join. Kafka Streams exposes the RocksDB configuration, and we recommend using the RocksDB tuning guide to size those stores. In addition, Kafka Streams uses a Kafka consumer for each thread you configure for your application, and each consumer allocates the lower of either 1MB per partition or 50MB per broker. Since calculating all these variables is complex, and since more memory generally increases the performance of streams applications, we typically allocate large amounts of memory: 32GB and above.
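
As an illustrative starting point (the values are assumptions to be tuned per workload), the settings most often adjusted when giving a streams application more memory are the record cache and the thread count:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class StreamsMemoryTuning {
        public static Properties build() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-count-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            // Raise the record cache from its 10MB default to reduce downstream writes and state store churn
            props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 512 * 1024 * 1024L);
            // Each stream thread runs its own tasks and its own Kafka consumer
            props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
            return props;
        }
    }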

Confluent REST Proxy buffers data for both producers and consumers. Consumers use at least 2MB per consumer and up to 64MB
in cases of large responses from brokers (typical for bursty traffic). Producers will have a buffer of 64MB each. Start by allocating
1GB RAM and add 64MB for each producer and 16MB for each consumer planned.
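
For example, a REST Proxy node expected to serve 20 producers and 50 consumers would be sized at roughly 1GB + (20 × 64MB) + (50 × 16MB) ≈ 3.1GB of RAM.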

Note that in all cases, we recommend using the G1 garbage collector for the JVM to minimize garbage collection overhead.



CPU
Most of the Confluent Platform components are not particularly CPU bound. If you notice high CPU it is usually a result of
misconfiguration, insufficient memory, or a bug.

The few exceptions are:


• Compression: Kafka producers and consumers will compress and decompress data if you configure them to do so. We recommend enabling compression since it improves network and disk utilization, but it does use more CPU on the clients. Kafka brokers older than 0.10.0 decompressed and recompressed the messages before storing them to disk, which increased CPU utilization on the brokers as well.
• Encryption: Starting with version 0.9.0, Kafka clients can communicate with brokers using SSL. There is a small performance overhead on both the client and the broker when using encryption, and a larger overhead when it is the consumer that connects over SSL: because the broker needs to encrypt messages before sending them to the consumer, it can’t use the normal zero-copy optimization and therefore uses significantly more CPU. Large-scale deployments often go to some lengths to make sure consumers are deployed within the same LAN as the brokers, where encryption is often not a requirement.
• High rate of client requests: If you have a large number of clients, or if consumers are configured with fetch.max.wait.ms=0, they can send very frequent requests and effectively saturate the broker. In those cases, configuring clients to batch requests will improve performance.

Note that many components are multi-threaded and will benefit more from a large number of cores than from faster cores.

Network
Large-scale Kafka deployments that use 1GbE will typically become network-bound. If you are planning to scale the cluster to over 100MB/s, you will need to provision a higher-bandwidth network. When provisioning network capacity, you will want to take replication traffic into account and leave some overhead for rebalancing operations and bursty clients. Network is one of the resources that is most difficult to provision, since adding nodes will eventually run up against switch limitations; therefore, consider enabling compression to get better throughput from existing network resources. Note that the Kafka producer compresses messages in batches, so configuring the producer to send larger batches will result in a better compression ratio and improved network utilization.
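
For example, a producer tuned for compression and batching might start from settings like the following (the values are illustrative starting points, not prescriptions):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class ProducerThroughputTuning {
        public static Properties build() {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            // Compress whole batches on the client: costs CPU on producers and consumers,
            // saves network bandwidth and broker disk
            props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
            // Larger batches compress better: wait up to 20ms to fill batches of up to 256KB
            props.put(ProducerConfig.LINGER_MS_CONFIG, 20);
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 256 * 1024);
            // Total memory available for buffering records that have not yet been sent or acknowledged
            props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 128 * 1024 * 1024);
            return props;
        }
    }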



Hardware Recommendations for On-Premise Deployment

Large Cluster

• ZooKeeper: 5 nodes recommended for fault tolerance. Storage: 512GB SSD for the transaction log, plus 2 x 1TB SATA in RAID 10. Memory: 32GB RAM. CPU: typically not a bottleneck; 2-4 cores suffice.

• Kafka broker: at least 3 nodes; more for additional storage, RAM, or network throughput. Storage: 12 x 1TB disks (RAID 10 is optional); keep OS disks separate from Kafka storage. Memory: 64GB RAM or more; more is better. CPU: usually dual 12-core sockets.

• Kafka Connect: at least 2 nodes for high availability. Storage: not needed beyond installation. Memory: 0.5-4GB heap, depending on the connectors. CPU: typically not CPU-bound; more cores is better than faster cores.

• Confluent Schema Registry: at least 2 nodes for high availability. Storage: not needed beyond installation. Memory: 1GB heap. CPU: typically not CPU-bound; more cores is better than faster cores.

• Confluent REST Proxy: at least 2 nodes for high availability; more for additional throughput. Storage: not needed beyond installation. Memory: 1GB overhead plus 64MB per producer and 16MB per consumer. CPU: at least 16 cores to handle HTTP requests in parallel and background threads for consumers and producers.

• Confluent KSQL: at least 2 nodes for high availability; more for additional throughput. Storage: use SSD; sizing depends on the number of concurrent queries and the aggregations performed. Memory: 20GB minimum; we test on 30GB. CPU: at least 4 cores.

• Confluent Control Center: 1 node. Storage: at least 300GB, preferably SSDs. Memory: 32GB+. CPU: at least 8 cores; more preferred.



Small Cluster

• ZooKeeper + Kafka broker: at least 3 nodes. Storage: 12 x 1TB disks, with a dedicated disk for the ZooKeeper transaction log, a dedicated disk (or two) for the OS, and the remaining disks dedicated to Kafka data. Memory: 64GB RAM or more; more is better. CPU: usually dual 12-core sockets.

• Kafka Connect + Confluent Schema Registry + Confluent REST Proxy: at least 2 nodes. Storage: not needed beyond installation. Memory: 1GB for Connect, 1GB for Schema Registry, and 1GB plus 64MB per producer and 16MB per consumer for the REST Proxy. CPU: at least 16 cores.

• Confluent KSQL: at least 2 nodes for high availability; more for additional throughput. Storage: use SSD; sizing depends on the number of concurrent queries and the aggregations performed. Memory: 20GB minimum; we test on 30GB. CPU: at least 4 cores.

• Confluent Control Center: 1 node. Storage: at least 300GB, preferably SSDs. Memory: 32GB+. CPU: at least 8 cores; more preferred.

Public Cloud Deployment

Today many deployments run on public clouds, where node sizing is more flexible than ever before. The hardware recommendations discussed earlier are applicable when provisioning cloud instances. Special considerations to take into account:

• Cores: Take into account that cloud providers use “virtual” cores when sizing machines. Those are typically weaker than the modern cores you would use in your data center. You may need to scale up the number of cores when planning cloud deployments.
• Network: Most cloud providers only provide 10GbE on their highest-tier nodes. Make sure your cluster has sufficient nodes and network capacity to provide the throughput you need after taking replication traffic into account.

Below are some examples of instance types that can be used with the various cloud providers. Note that cloud offerings evolve continuously and there is typically a variety of node types with similar characteristics. As long as you adhere to the hardware recommendations, you will be in good shape; the instance types below are just examples.

While the examples below show each component on a separate node, some operations teams prefer to standardize on a single instance type. This approach makes automation easier, but it does require standardizing on the largest required instance type. In this case, you can choose to co-locate some services, as long as there are sufficient resources for all components on the instance.

You’ll need multiple instances of each node type. Previous recommendations regarding the number of Kafka brokers, Confluent REST Proxy servers, Kafka Connect workers, etc. still apply.



Amazon AWS EC2

• ZooKeeper: m5.large (8GB RAM, 2 vCPU). Storage: 1 x 32GB SSD. Network: up to 2,120 Mbps.

• Kafka broker: r4.xlarge (30.5GB RAM, 4 vCPU). Storage: SSD-based EBS; we recommend configuring the instances as “EBS optimized”. Network: up to 10 Gigabit.

• Kafka Connect: c5.xlarge (8GB RAM, 4 vCPU). Storage: EBS. Network: up to 2,250 Mbps.

• Confluent REST Proxy: c5.xlarge (8GB RAM, 4 vCPU). Storage: EBS. Network: up to 2,250 Mbps.

• Confluent Schema Registry: m5.large (8GB RAM, 2 vCPU). Storage: EBS. Network: up to 2,120 Mbps.

• Confluent KSQL: i3.xlarge or r4.xlarge (30.5GB RAM, 4 vCPU). Storage: EBS (SSD, optimized). Network: up to 10 Gigabit.

• Confluent Control Center: m5.2xlarge (32GB RAM, 8 vCPU). Storage: EBS (SSD, optimized). Network: up to 2,120 Mbps.

Note: In past versions of this document, we also recommended using “Storage Optimized” instances with local SSDs for Kafka brokers. At the time we had concerns about EBS stability, latency, and throughput. Our experience in the past year has shown that EBS is stable and can deliver the latency and throughput Confluent Platform users require. AWS gives you a choice of 4 tiers of EBS performance and “EBS Optimized Instances” with QoS guarantees, which is useful in cases where consistent storage performance is important.



Microsoft Azure

• ZooKeeper: Standard_DS2 (7GB RAM, 2 vCPU). Storage: 4 x 14GB SSD. Network: high.

• Kafka broker: Standard_DS4 (28GB RAM, 8 vCPU) with 16 x 56GB SSD or WASB storage, or Standard_D4_v2 (28GB RAM, 8 vCPU) with 16 x 400GB disks or WASB storage. Network: high.

• Kafka Connect: Standard_A3 (7GB RAM, 4 vCPU). Storage: 8 x 285GB. Network: high.

• Confluent Schema Registry: Standard_A2 (3.5GB RAM, 2 vCPU). Storage: 4 x 135GB. Network: moderate.

• Confluent REST Proxy: Standard_D4 (28GB RAM, 8 vCPU). Storage: 16 x 500GB. Network: high.

• Confluent Control Center: Standard_DS4 (28GB RAM, 8 vCPU). Storage: 16 x 56GB SSD or WASB storage. Network: high.

Google Cloud Compute Engine

• ZooKeeper: n1-standard-2 (7.5GB RAM, 2 vCPU). Storage: up to 16 disks, up to 64TB each.

• Kafka broker: n1-highmem-4 (26GB RAM, 4 vCPU). Storage: up to 16 disks, up to 64TB each.

• Kafka Connect: n1-standard-4 (15GB RAM, 4 vCPU). Storage: up to 16 disks, up to 64TB each.

• Confluent REST Proxy: n1-standard-4 (15GB RAM, 4 vCPU). Storage: up to 16 disks, up to 64TB each.

• Confluent Schema Registry: n1-standard-2 (7.5GB RAM, 2 vCPU). Storage: up to 16 disks, up to 64TB each.

• Confluent KSQL: n1-highmem-4 (26GB RAM, 4 vCPU). Storage: up to 16 disks, up to 64TB each.

• Confluent Control Center: n1-highmem-8 (52GB RAM, 8 vCPU). Storage: up to 16 disks, up to 64TB each.



Conclusion

This paper is intended to share some of our best practices around the deployment of Confluent Platform. Of course, each use case and workload is slightly different, and the best architectures are tailored to the specific requirements of the organization. When designing an architecture, considerations such as workload characteristics, access patterns, and SLAs are very important, but they are too specific to cover in a general paper. To choose the right deployment strategy for specific cases, we recommend engaging with Confluent’s professional services for an architecture and operational review.

Apache, Apache Kafka, Kafka, and associated open source project names are trademarks of the Apache Software Foundation. ©2018 Confluent, Inc.
