Kafka

Contents

● What?
○ Concept
● Why?
○ Multiple Partitions (Scalability)
○ Low Latencies (Performance)
○ Fault-tolerant cluster (Resilience)
○ Intra-cluster replication (Resilience)
● How?
○ Maintaining Commit Log
○ Partitions
○ Partition Rebalance
● Setup?
○ Setup
■ Setting up Kafka
■ Creating multiple topics
■ Creating producer systems and sending messages from producer
■ Creating consumer systems and consuming messages from queue

Script

Scaler has a huge number of learners who use the platform in parallel and perform many activities: during live lectures they ask questions, send chat messages, and give reactions; at other times they solve assignment and homework problems, check the dashboard, check the leaderboard, and order swag from the Scaler store. Every one of these activities, for every student, needs to be tracked and monitored. These actions serve as inputs for an array of backend applications such as machine learning systems, search optimizers, and report generators, all of which play an important role in enriching the user experience.
There is also other information, such as the device a student logs in from and possibly location updates. What would be a correct way to design a system that can take in all this information and store it for future reference? This monitoring and storing of data should not impact the performance of the application, and students should not feel any lag or slowness in the system.

Scaler also wants to increase the number of students on the platform, so the system you design must keep working when the load becomes 10x the current load.

What to do now?

Of course, you talk to some fellow engineers or read something online and find the solution: set up a messaging queue. It lets you create a process that consumes all the activities students perform on the website and stores them, without impacting the performance of the actual system.

Messaging Queue
A message queue is a form of asynchronous service-to-service communication used in
serverless and microservices architectures. Messages are stored on the queue until they are
processed and deleted. Each message is processed only once, by a single consumer. Message
queues can be used to decouple heavyweight processing, to buffer or batch work, and to
smooth spiky workloads.
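The semantics above (messages are stored until processed, then deleted, and each message is handled exactly once by a single consumer) can be sketched with a minimal in-memory queue. This is an illustration of the concept, not how Kafka is implemented:

```python
from collections import deque

# Minimal in-memory sketch of a message queue: producers enqueue and return
# immediately; a consumer processes each message once, after which it is gone.
queue = deque()

def produce(message):
    queue.append(message)        # producer does not wait for the consumer

def consume():
    if queue:
        return queue.popleft()   # processed once, then removed from the queue
    return None

produce("lecture_question")
produce("chat_message")
assert consume() == "lecture_question"   # FIFO: first message out first
assert consume() == "chat_message"
assert consume() is None                 # queue is empty again
```

Because `produce` returns immediately, the sender never blocks on the receiver, which is exactly the decoupling that keeps the main application responsive.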

We can have single or multiple senders pushing messages, and multiple receivers accepting them as well. A messaging queue works asynchronously and helps transfer data between multiple services.

A common example is a single sender feeding multiple receivers via a messaging queue.

There are multiple messaging queues available, such as:


1. Kafka
2. AWS SQS
3. RabbitMQ
4. TIBCO

We will be using Kafka as our messaging queue going forward.

Kafka
Kafka is described by the official documentation as:

A distributed event streaming platform that lets you read, write, store, and process
events (also called records or messages in the documentation) across many machines.

At a high level, Apache Kafka allows you to publish and subscribe to streams of records, store
these streams in the order they were created, and process these streams in real time.
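"Store these streams in the order they were created" comes down to an append-only commit log: each record is written at the end and receives a monotonically increasing offset, and consumers can replay from any stored offset. A toy sketch of that idea:

```python
# Sketch of the append-only commit log behind a Kafka partition
# (illustrative only; real Kafka persists the log to disk).
log = []

def append(record):
    log.append(record)
    return len(log) - 1          # the record's offset in this partition

def read_from(offset):
    return log[offset:]          # a consumer replays from any stored offset

assert append("booked_test") == 0
assert append("cancelled_test") == 1
assert read_from(0) == ["booked_test", "cancelled_test"]
assert read_from(1) == ["cancelled_test"]
```

Reads never mutate the log, which is why many independent consumers can process the same stream at their own pace.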

Now let’s dig a bit deeper.

Producer
A client application that pushes events into topics.
Cluster
One or more servers (called brokers) running Apache Kafka

Topic
The mechanism used to categorize and durably store events. There are two types of topics: compacted and regular. Records in compacted topics do not expire based on time or space bounds; newer messages update older messages that have the same key, and Apache Kafka does not delete the latest message for a key unless the user deletes it. For regular topics, records can be configured to expire, deleting old data to free storage space.
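The compaction behaviour described above (only the latest record per key survives) can be illustrated with a small sketch. The keys and values here are made up:

```python
# Sketch of log compaction: for each key, only the most recent record is kept.
log = [("user_1", "v1"), ("user_2", "v1"), ("user_1", "v2")]

def compact(records):
    latest = {}
    for key, value in records:   # later records overwrite earlier ones per key
        latest[key] = value
    return list(latest.items())

assert compact(log) == [("user_1", "v2"), ("user_2", "v1")]
```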

Partition
The mechanism to distribute data across multiple storage servers (brokers). Messages are
indexed and stored together with a timestamp and ordered by the position of the message
within a partition. Partitions are distributed across a node cluster and are replicated to multiple
servers to ensure that Apache Kafka delivers message streams in a fault-tolerant manner.
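When a message has a key, the producer maps that key to a partition deterministically, so all messages for the same key land on the same partition and stay in order. A sketch of that mapping (using `crc32` as a stand-in for Kafka's actual murmur2 hash):

```python
import zlib

# Sketch of key-based partitioning: the same key always maps to the same
# partition, preserving per-key ordering across the topic.
def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode()) % num_partitions

p = partition_for("student_42", 3)
assert p == partition_for("student_42", 3)   # deterministic per key
assert 0 <= p < 3                            # always a valid partition index
```

Messages without a key can instead be spread across partitions for load balancing, at the cost of a global ordering guarantee.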

Consumers
Client applications that read and process events from partitions. The Apache Kafka Streams API allows writing Java applications that pull data from topics and write results back to Apache Kafka. External stream-processing systems such as Apache Spark, Apache Apex, Apache Flink, Apache NiFi, and Apache Storm can also be applied to these message streams.
Why?

Multiple Partitions
By dividing a topic into multiple partitions, Apache Kafka provides load balancing over a pool of
servers. This allows you to scale production clusters up or down to fit your needs and to spread
clusters across geographic regions or availability zones.

Low Latencies
By decoupling data streams, Apache Kafka is able to deliver messages at network-limited throughput using a cluster of servers, with extremely low latency (as low as 2 ms).
Fault-tolerant cluster and Intra-cluster replication
Fault tolerance refers to the ability of a system (computer, network, cloud cluster, etc.) to
continue operating without interruption when one or more of its components fail.

The objective of creating a fault-tolerant system is to prevent disruptions arising from a single
point of failure, ensuring the high availability and business continuity of mission-critical
applications or systems.

Intra-cluster replication means that each partition's data is copied to multiple brokers within the same cluster, so the failure of any single broker does not lose the partition's messages.

Apache Kafka makes data highly fault-tolerant and durable in two main ways. First, it protects against server failure by distributing storage of data streams across a fault-tolerant cluster with intra-cluster replication. Second, it makes messages durable by persisting them to disk.

Examples

We will build two different projects to understand how Kafka works and how to set it up.

1. A basic setup with one producer and one consumer, pushing to and consuming messages from a single Kafka topic. We will build and run this from our terminal.

2. A basic design for a distributed system made of different microservices: an online test consultation service, which will have the following microservices:

● A dashboard for users to book a test -> BookingService
● A service that accepts bookings and allocates a lab for each test -> LabService
● A service that consumes user bookings and stores them in a DB for analytics purposes -> DataService

BookingService will produce data and push messages to a Kafka topic; LabService and
DataService will consume the messages from that topic and perform their respective
tasks.
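The key property of this design is that LabService and DataService are independent consumers (in Kafka terms, separate consumer groups), so each receives every booking without interfering with the other. A minimal in-memory sketch of that fan-out, with all service and field names as placeholders:

```python
from collections import defaultdict

# Sketch of the planned design: BookingService publishes to one topic;
# LabService and DataService read it as separate consumer groups, each
# tracking its own committed offset into the same log.
topic = []                          # the shared "bookings" topic
offsets = defaultdict(int)          # committed offset per consumer group

def publish(event):
    topic.append(event)             # BookingService produces

def poll(group):
    new = topic[offsets[group]:]    # each group reads from its own offset
    offsets[group] = len(topic)
    return new

publish({"user": "u1", "test": "blood_panel"})
publish({"user": "u2", "test": "x_ray"})

assert len(poll("LabService")) == 2    # LabService sees both bookings
assert len(poll("DataService")) == 2   # DataService sees them independently
assert poll("LabService") == []        # nothing new since the last poll
```

Because each group keeps its own offset into the same log, adding a third consumer (say, a notification service) requires no change to BookingService at all.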
