Apache Kafka and Confluent Enterprise Reference Architecture
Contents
Introduction
ZooKeeper
Kafka Brokers
Kafka Clients
Confluent Replicator
Capacity Planning
Storage
Memory
CPU
Network
Large Cluster
Small Cluster
Cloud Deployment
Microsoft Azure
Conclusion
Introduction
This white paper provides a reference for data architects and system administrators who are planning to deploy Apache Kafka® and
Confluent Platform in production. We discuss important considerations for production deployments and provide guidelines for the
selection of hardware and cloud instances. We also provide recommendations on how to deploy the Kafka Connect API in production,
as well as components of Confluent Platform that integrate with Apache Kafka, such as the Confluent Schema Registry, Confluent
REST Proxy, and Confluent Control Center.
Confluent Platform includes Apache Kafka, as well as selected software projects that are frequently used with Kafka, which makes
Confluent Platform a one-stop shop for setting up a production-ready streaming platform. These projects include clients for C, C++,
Python, and Go programming languages; Connectors for JDBC, Elasticsearch, and HDFS; Confluent Schema Registry for managing
metadata for Kafka topics; and Confluent REST Proxy for integrating with web applications.
Confluent Enterprise takes this to the next level by addressing requirements of modern enterprise streaming applications. It
includes Confluent Control Center for end-to-end monitoring and management, Confluent Replicator for managing multi-datacenter
deployments, and Confluent Auto Data Balancing for optimizing resource utilization and easy scalability.
We’ll describe the architecture from the ground up, and for each component we’ll explain when it is needed and how best to deploy
it in several scenarios. We will not discuss capacity or hardware recommendations at this point, as these are covered in depth in a
later section. In addition, you should refer to the Confluent documentation for installation instructions.
ZooKeeper
ZooKeeper is a centralized service for managing distributed processes and is a mandatory component in every Apache Kafka cluster.
While the Kafka community has been working to reduce the dependency of Kafka clients on ZooKeeper, Kafka brokers still use
ZooKeeper to manage cluster membership and elect a cluster controller.
In order to provide high availability, you will need at least 3 ZooKeeper nodes (tolerating a one-node failure) or 5 nodes (tolerating
two node failures). All ZooKeeper nodes are equivalent, so they usually run on identical hardware. Note that the number of
ZooKeeper nodes must be odd, since ZooKeeper requires a majority of nodes to be available in order to form a quorum.
Kafka Brokers
Kafka brokers are the main storage and messaging components of Apache Kafka. Kafka is a streaming platform that uses
messaging semantics. The Kafka cluster maintains streams of messages called topics; the topics are sharded into partitions
(ordered, immutable logs of messages) and the partitions are replicated and distributed for high availability. The servers that run the
Kafka cluster are called brokers.
You will usually want at least 3 Kafka brokers in a cluster, each running on a separate server. This way you can replicate each Kafka
partition at least 3 times and have a cluster that will survive a failure of 2 nodes without data loss. Note that with 3 Kafka brokers, if
any broker is not available, you won’t be able to create new topics with 3 replicas until all brokers are available again. For this reason,
if you have use-cases that require creating new topics frequently, we recommend running at least 4 brokers in a cluster.
If the Kafka cluster is not going to be highly loaded, it is acceptable to run Kafka brokers on the same servers as the ZooKeeper
nodes. In this case, it is recommended to allocate separate disks for ZooKeeper (as we’ll specify in the hardware recommendations
below). For high-throughput use cases we do recommend installing Kafka brokers on separate nodes.
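As a minimal sketch of the replication settings discussed above, the following Java example creates a topic with replication factor 3 using the Kafka AdminClient. The broker addresses and topic name are placeholders.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker addresses; with 3 or more brokers a replication factor of 3 is possible
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 12 partitions, 3 replicas: the topic can survive the loss of two brokers without data loss
            NewTopic topic = new NewTopic("example-topic", 12, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```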
Kafka Connect
Kafka Connect, the framework for streaming data between Apache Kafka and external systems, can be deployed in one of two modes:
• Standalone mode: This is similar to how Logstash or Apache Flume are deployed. If you need to get logs from a specific
machine to Kafka, you run Kafka Connect with a file connector or spooling directory connector on that machine, and it reads
the files and sends events to Kafka.
• Cluster mode: This is the recommended deployment mode for Kafka Connect in production. You install Kafka Connect and
its Connectors on several machines. They discover each other, with Kafka brokers serving as the synchronization layer, and
they automatically load-balance and fail over work between them. To start and stop Connectors anywhere on the cluster, you
connect to any Kafka Connect node (known as a "worker") and use its REST API to start, stop, pause, resume, and configure
Connectors. No matter which node you use to start a connector, Kafka Connect will determine the optimal level of parallelism
for the Connector and start parallel tasks to pull or push data as needed on the least loaded available worker nodes.
In standalone mode you will deploy Kafka Connect on the servers that contain the files or applications you want to integrate with.
In cluster mode, Kafka Connect will usually run on a separate set of machines, especially if you plan to run multiple Connectors
simultaneously. As Kafka Connect workers are stateless, they can also be safely deployed in containers.
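To illustrate the Connect REST API described above, here is a minimal sketch (assuming Java 11+ and a worker reachable at connect-host:8083; the connector name, file path, and topic are hypothetical) that starts a file source connector on the cluster:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StartConnector {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector configuration: a FileStreamSource reading a local log file
        String body = "{ \"name\": \"local-file-source\", \"config\": {"
                + "\"connector.class\": \"org.apache.kafka.connect.file.FileStreamSourceConnector\","
                + "\"tasks.max\": \"1\","
                + "\"file\": \"/var/log/app.log\","
                + "\"topic\": \"app-logs\" } }";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect-host:8083/connectors"))  // any worker in the cluster
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```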
Kafka Clients
Apache Kafka clients are used in the applications that produce and consume events. The Apache Kafka Java client JARs are included
in the Confluent Platform Kafka packages and are installed alongside the Kafka brokers, but they are typically deployed with the application
that uses them, by adding the client libraries as application dependencies using a build manager such as Apache Maven.
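As a minimal illustration of the Java client, the sketch below produces a single message; the broker addresses and topic name are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");  // wait for all in-sync replicas to acknowledge

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("example-topic", "key", "value"));
            producer.flush();
        }
    }
}
```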
At the core of Confluent’s other clients (C, C++, Python, and Go) is librdkafka, Confluent’s C/C++ client for Apache Kafka. Librdkafka is
an open-source, well-proven, reliable, and high-performance client. By basing our Python (confluent-kafka-python) and Go (confluent-
kafka-go) clients on librdkafka, we provide consistent APIs and semantics and high-performance, high-quality clients across
programming languages.
Confluent Platform includes librdkafka packages and these should be installed on servers where applications using the C, C++,
Python or Go clients will be installed.
Kafka Streams is a library that is embedded in the application code (just like Jetty, for instance), and as such, you don’t need to
allocate Kafka Streams servers, but you do need to allocate servers for the applications that use the Kafka Streams library (or at least
resources for their containers). Kafka Streams runs parallel tasks for the different partitions and processing stages in the
application, and as such benefits from a higher core count. If you deploy multiple instances of the application on multiple servers
(recommended!), the Kafka Streams library will handle load balancing and failover automatically. To maintain its application state,
Kafka Streams uses an embedded RocksDB database. We recommend persistent SSD disks for the RocksDB storage.
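The following is a minimal sketch of an embedded Kafka Streams application; the application id, topic names, and state directory are placeholders. The counting step creates a RocksDB-backed state store, and num.stream.threads controls the parallel tasks mentioned above.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class ClickCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-count-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/data/kafka-streams");   // place local state on SSD
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);              // parallel tasks per instance
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> clicks = builder.stream("clicks");
        // groupByKey().count() maintains its state in an embedded RocksDB store
        KTable<String, Long> counts = clicks.groupByKey().count();
        counts.toStream().to("click-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```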
KSQL is typically deployed on a set of servers that form a cluster. The number of servers in the cluster is determined by the
processing capacity required, which includes the number of concurrent queries that will execute on the cluster as well as the
complexity of those queries. KSQL servers are essentially instances of a Kafka Streams application, so you can expect similar behavior.
Like Kafka Streams, the KSQL cluster will handle load balancing and failover between the nodes in the cluster automatically. It will
benefit from higher CPU counts, good network throughput, and SSD storage for the RocksDB state store.
The REST Proxy is typically deployed on a separate set of machines. For additional throughput and high availability, it is
recommended to deploy multiple REST Proxy servers behind a load balancer. When using the high-level consumer API, it is
important that all requests for the same consumer are directed to the same REST Proxy server, so a "sticky" load balancing
policy is recommended.
Since REST Proxy servers are stateless, they can also be safely deployed in containers.
Schema Registry is typically installed on its own servers, although for smaller installations it can safely be installed alongside
Confluent REST Proxy and Kafka Connect workers. For high availability, you’ll want to install multiple Schema Registry servers.
With multiple servers, Schema Registry uses a leader-follower architecture. In this configuration, at most one Schema Registry
instance is the leader at any given moment. Only the leader is capable of publishing writes to the underlying Kafka log, but all nodes are
capable of serving read requests directly. Follower nodes forward write requests to the current leader. Schema Registry stores
all its schemas in Kafka, and therefore Schema Registry nodes do not require storage and can be deployed in containers.
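To illustrate how clients typically use Schema Registry (host names below are placeholders), a producer is configured with the Avro serializer and the schema.registry.url setting; listing multiple Schema Registry instances lets the client fail over between them.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class SchemaRegistryClientConfig {
    public static Properties producerProperties() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        // Multiple Schema Registry instances listed for high availability; placeholder host names
        props.put("schema.registry.url", "http://schema-registry-1:8081,http://schema-registry-2:8081");
        return props;
    }
}
```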
Confluent Replicator
Confluent Replicator is a new component added to Confluent Enterprise to help manage multi-cluster deployments of Confluent
Platform and Apache Kafka. It provides a centralized configuration of cross-cluster replication. Unlike Apache Kafka’s MirrorMaker, it
replicates topic configuration in addition to the messages in the topics.
Confluent Replicator is integrated with the Kafka Connect framework and should be installed on the Connect nodes in the destination
cluster. If there are multiple Connect worker nodes, Replicator should be installed on all of them. By installing Replicator on a larger
number of nodes, Replicator will scale to replicate at higher throughput and will be highly available through a built-in failover mechanism.
Auto Data Balancing can be installed on any machine in the Confluent Platform cluster — it just needs to be able to communicate
with the Kafka brokers and ZooKeeper to collect load metrics and send instructions to move partitions. For convenience, we
recommend installing it alongside Kafka brokers or on the Confluent Control Center node if available.
Confluent Control Center provides the following capabilities:
• Data stream monitoring and alerting: You can use Control Center to monitor your data streams end to end, from producer
to consumer. Use Control Center to verify that every message sent is received (and received only once), and to measure
system performance end to end. Drill down to better understand cluster usage and identify any problems. Configure alerts
to notify you when end-to-end performance does not meet your SLAs or when messages that were sent are not received.
• Multi-cluster monitoring and management: A single Control Center node can monitor data flows in multiple clusters and
manage data replication between the clusters.
• Kafka Connect configuration: You can also use Control Center to manage and monitor Kafka Connect: the open source
toolkit for connecting external systems to Kafka. You can easily add new sources to load data from external data systems
and new sinks to write data into external data systems. Additionally, you can manage, monitor, and configure connectors
with Confluent Control Center.
Confluent Control Center currently runs on a single machine, and due to the resources required, we recommend dedicating a separate
machine for Control Center.
[Figure: Large cluster reference architecture, with five ZooKeeper nodes, Kafka brokers, Kafka Streams applications, and Confluent Control Center each on dedicated servers]
This architecture was built to scale. Each component is given its own servers, and if any layer becomes overly loaded it can be scaled
independently simply by adding nodes to that specific layer. For example, when adding applications that use the Confluent REST
Proxy, you may find that the REST Proxy no longer provides the required throughput while the underlying Kafka brokers still have
spare capacity. In that case, you only need to add REST Proxy nodes in order to scale your entire platform.
[Figure: Small cluster reference architecture. Three nodes each co-host a ZooKeeper node, a Kafka broker, and the rebalancer; two nodes each co-host Kafka Connect with Replicator and Connectors, Schema Registry, and REST Proxy; KSQL servers, Kafka Streams applications, and microservices run alongside; Confluent Control Center runs on its own node]
Note that this architecture, although requiring far fewer servers, provides the high availability of a full-scale cluster. As the use case
expands, you will notice bottlenecks develop in the system. In this case the correct approach is to start by separating the bottleneck
components onto their own servers, and when further growth is required, to scale by adding servers to each component. Over time, this
architecture will evolve to resemble the recommended large-scale architecture.
Performance can be monitored via Confluent Control Center and any third-party JMX monitoring tool. Confluent Control Center
enables you to configure alerts when SLAs are not met, allowing you to take action proactively.
Capacity Planning
Storage
Storage is mostly a concern for ZooKeeper and Kafka brokers.
For ZooKeeper the main concern is low-latency writes to the transaction log, so we recommend dedicated disks for storing the
ZooKeeper transaction log (even in small-scale deployments where ZooKeeper is installed alongside Kafka brokers).
Kafka brokers are the main storage for the Confluent Platform cluster and therefore usually require ample storage capacity. Most
deployments use 6-12 disks, usually 1TB each. The exact amount of storage you will need obviously depends on the number of
topics, partitions, the rate at which applications will be writing to each topic and the retention policies you configure.
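As a rough, purely illustrative sizing exercise: if producers write 50 MB/s into the cluster, topics are retained for 7 days, and the replication factor is 3, the cluster stores roughly 50 MB/s x 86,400 s/day x 7 days x 3, or about 90 TB, before accounting for compression, indexes, or headroom for rebalancing.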
You also want to consider the type of storage. SSDs and spinning magnetic drives offer different performance characteristics, and
depending on your use case, the performance benefits of SSDs may be worth their higher cost. While Kafka brokers write sequentially to
each partition, most deployments store more than one partition per disk, and if your use case requires Kafka brokers to access the disks
frequently, minimizing seek times will increase throughput.
When selecting a file system, we recommend either EXT4 or XFS — both have been tested and used extensively in production
Kafka clusters.
Note that the use of shared-storage devices, while supported, is not recommended. Confluent Platform is not tested with SAN/NAS
and very careful configuration is required in order to achieve good performance and availability when using shared storage.
Confluent Control Center relies on local state in RocksDB. We recommend at least 300GB of storage space, preferably SSDs. All local
data is kept in the directory specified by the confluent.controlcenter.data.dir config parameter.
Kafka Streams and Confluent KSQL are both stateful and use RocksDB as a local persistent state store. The exact use of storage
depends on the specific streams application. Aggregation, windowed aggregation and windowed join all use RocksDB stores to store
their state. The size used will depend on the number of partitions, unique keys in the stream (cardinality), size of keys and values
and the retention period for windowed operations (specified in the DSL using the until operator). Note that Kafka Streams uses quite a few
file descriptors for its RocksDB stores, so make sure to increase the file descriptor limit to 64K or above. Since calculating exact
usage is complex, we typically allocate generous disk space to streams applications to allow for ample local state; 100-300GB is a
good starting point for capacity planning. Our KSQL performance tests use SSD storage for the RocksDB instances, so we recommend
this configuration.
Memory
ZooKeeper uses the JVM heap, and 4GB of RAM is typically sufficient. Too small a heap will result in high CPU usage due to constant garbage
collection, while too large a heap may result in long garbage collection pauses and loss of connectivity within the ZooKeeper cluster.
Kafka brokers use both the JVM heap and the OS page cache. The JVM heap is used for replication of partitions between brokers
and for log compaction. Replication requires 1MB (the default replica.fetch.max.bytes) for each partition on the broker. In Apache
Kafka 0.10.1 (Confluent Platform 3.1), we added a new configuration (replica.fetch.response.max.bytes) that limits
the total RAM used for replication to 10MB, to avoid memory and garbage collection issues when the number of partitions on a
broker is high. For log compaction, calculating the required memory is more complicated, and we recommend referring to the Kafka
documentation if you are using this feature. For small to medium-sized deployments, a 4GB heap size is usually sufficient. In addition,
it is highly recommended that consumers always read from memory, i.e. from data that was written to Kafka and is still stored in
the OS page cache. The amount of memory this requires depends on the rate at which this data is written and how far behind you expect
consumers to get. If you write 20GB per hour per broker and you allow consumers to fall 3 hours behind in a normal scenario, you will want
to reserve 60GB for the OS page cache. In cases where consumers are forced to read from disk, performance will drop significantly.
Kafka Connect itself does not use much memory, but some connectors buffer data internally for efficiency. If you run multiple
connectors that use buffering, you will want to increase the JVM heap size to 1GB or higher.
The more memory you give Confluent Control Center, the better, but we recommend at least 32GB of RAM. The JVM heap size can be
fairly small (it defaults to 3GB), but the application needs the additional memory for RocksDB in-memory indexes and caches, as well as
OS page cache for faster access to persistent data.
Kafka producer clients can benefit from generous JVM heap sizes to achieve high throughput. Our clients attempt to batch data
as it is sent to brokers in order to use the network more efficiently, and in addition they store messages in memory until they are
acknowledged successfully by the brokers. Having sufficient memory for the producer buffers will allow the producer to keep retrying
to send messages to the broker in the event of network issues or leader elections, rather than block or throw exceptions.
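As a sketch of the memory-related producer settings described above (the values are illustrative, not recommendations), the buffer size and blocking behavior are controlled with a few configuration parameters:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ProducerMemorySettings {
    public static Properties settings() {
        Properties props = new Properties();
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 128 * 1024 * 1024L); // 128MB of producer-side buffering
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60_000);              // how long send() blocks when the buffer is full
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);        // keep retrying during broker leader elections
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        return props;
    }
}
```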
Kafka Streams and Confluent KSQL have several memory areas, and the total memory usage will depend on your specific streams
application and its configuration. Starting with Apache Kafka 0.10.1 (Confluent Platform 3.1), there is a streams buffer
cache. It defaults to 10MB and is controlled through the cache.max.bytes.buffering configuration. Setting it higher will generally result in
better performance for your streams application. In addition, streams uses RocksDB memory stores for each partition involved in
each aggregation, windowed aggregation, and windowed join. Kafka Streams exposes the RocksDB configuration, and we recommend
using the RocksDB tuning guide to size those stores. In addition, Kafka Streams uses a Kafka consumer for each thread you configure
for your application. Each consumer allocates the lower of either 1MB per partition or 50MB per broker. Since calculating all these
variables is complex, and since more memory generally increases the performance of streams applications, we typically allocate large
amounts of memory: 32GB and above.
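A brief sketch of the cache setting mentioned above (the 64MB value is illustrative):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsCacheSettings {
    public static Properties settings() {
        Properties props = new Properties();
        // A larger buffer cache generally improves streams throughput; the default is 10MB
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 64 * 1024 * 1024L);
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4); // one Kafka consumer per thread
        return props;
    }
}
```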
Confluent REST Proxy buffers data for both producers and consumers. Consumers use at least 2MB per consumer and up to 64MB
in cases of large responses from brokers (typical for bursty traffic). Producers will have a buffer of 64MB each. Start by allocating
1GB RAM and add 64MB for each producer and 16MB for each consumer planned.
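For example, a REST Proxy instance expected to serve 10 producers and 50 consumers (hypothetical numbers) would be sized at roughly 1GB + 10 x 64MB + 50 x 16MB, or about 2.4GB of RAM.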
Note that in all cases, we recommend using the G1 garbage collector for the JVM to minimize garbage collection overhead.
CPU
Note that many components are multi-threaded and will benefit more from a large number of cores than from faster cores.
Network
Large-scale Kafka deployments that are using 1GbE will typically become network-bound. If you are planning on scaling the cluster
to over 100MB/s, you will need to provision a higher bandwidth network. When provisioning for network capacity, you will want to
take into account the replication traffic and leave some overhead for rebalancing operations and bursty clients. The network is one of
the most difficult resources to provision, since adding nodes will eventually run up against switch limitations; therefore, consider
enabling compression to get better throughput from existing network resources. Note that the Kafka producer compresses messages
in batches, so configuring the producer to send larger batches will result in a better compression ratio and improved network utilization.
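As a sketch of the network-related producer settings discussed above (the values are illustrative, not recommendations):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ProducerNetworkSettings {
    public static Properties settings() {
        Properties props = new Properties();
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compress whole batches on the producer
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 256 * 1024);  // larger batches compress better
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);           // wait briefly to fill batches before sending
        return props;
    }
}
```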
Large Cluster
For a large cluster, size the Confluent REST Proxy as follows: storage is not needed beyond installation; allow 1GB of memory overhead plus 64MB per producer and 16MB per consumer; provide at least 16 cores to handle HTTP requests in parallel and background threads for consumers and producers; and run at least 2 nodes for high availability, more for additional throughput.
Cloud Deployment
Today many deployments run on public clouds, where node sizing is more flexible than ever before. The hardware recommendations
discussed earlier are applicable when provisioning cloud instances. Special considerations to take into account are:
• Cores: Take into account that cloud providers use "virtual" cores when sizing machines. These are typically weaker than
the modern cores you would use in your data center, so you may need to scale up the number of cores when planning cloud
deployments.
• Network: Most cloud providers only provide 10GbE on their highest tier nodes. Make sure your cluster has sufficient nodes
and network capacity to provide the throughput you need after taking replication traffic into account.
Below are some examples of instance types that can be used with various cloud providers. Note that cloud offerings
continuously evolve and there is typically a variety of node types with similar characteristics. As long as you adhere to the
hardware recommendations, you will be in good shape. The instance types below are just examples.
While the examples below show each component on a separate node, some operations teams prefer to standardize on a single instance
type. This approach makes automation easier, but it does require standardizing on the largest required instance type. In this case, you
can choose to co-locate some services together, as long as there are sufficient resources for all components on the instance.
You’ll need multiple instances of each node type. The previous recommendations regarding the number of Kafka brokers, Confluent REST
Proxy servers, Kafka Connect workers, etc., still apply.
Amazon Web Services
Confluent REST Proxy: c5.xlarge, 8GB RAM, 4 vCPU, EBS storage, up to 2,250 Mbps
Confluent Schema Registry: m5.large, 8GB RAM, 2 vCPU, EBS storage, up to 2,120 Mbps
In past versions of this document, we also recommended using "Storage Optimized" instances with local SSDs
for Kafka brokers. At the time we had concerns about EBS stability, latency, and throughput. Our experience
in the past year has shown that EBS is stable and can deliver the latency and throughput Confluent Platform
users require. AWS gives you a choice of 4 tiers of EBS performance and "EBS Optimized Instances" with QoS
guarantees, which is useful in cases where consistent storage performance is important.
Microsoft Azure
Kafka broker: Standard_DS4, 28GB RAM, 8 vCPU, 16 x 56GB SSD or WASB storage, high network performance; or Standard_D4_v2, 28GB RAM, 8 vCPU, 16 x 400GB or WASB storage, high network performance
Confluent Schema Registry: Standard_A2, 3.5GB RAM, 2 vCPU, 4 x 135GB, moderate network performance
Confluent REST Proxy: Standard_D4, 28GB RAM, 8 vCPU, 16 x 500GB, high network performance
Google Cloud Platform
ZooKeeper: n1-standard-2, 7.5GB RAM, 2 vCPU, max 16 disks of up to 64TB each
Kafka broker: n1-highmem-4, 26GB RAM, 4 vCPU, max 16 disks of up to 64TB each
Kafka Connect: n1-standard-4, 15GB RAM, 4 vCPU, max 16 disks of up to 64TB each
Confluent KSQL: n1-highmem-4, 26GB RAM, 4 vCPU, max 16 disks of up to 64TB each
Conclusion
This paper is intended to share some of our best practices around the deployment of Confluent Platform. Of course, each use case
and workload is slightly different, and the best architectures are tailored to the specific requirements of the organization. When
designing an architecture, considerations such as workload characteristics, access patterns, and SLAs are very important, but they are
too specific to cover in a general paper. To choose the right deployment strategy for specific cases, we recommend engaging with
Confluent's professional services for an architecture and operational review.
Apache, Apache Kafka, Kafka, and associated open source project names are trademarks of the Apache Software Foundation. ©2018 Confluent, Inc.