Class 3 Cassandra
Systems
Section 3: Apache Cassandra
Instructor: Yutong Zhao
Outline
1. Introduction to Cassandra
2. Architecture & Data Model
3. Working with Cassandra
4. Querying Data with CQL
5. Indexing & Performance Optimization
6. Replication, Consistency, and Security
7. Backup, Monitoring & Recovery
8. Advanced Topics & Real-world Use Cases
1. Introduction to Cassandra
What is Cassandra?
• Apache Cassandra is a distributed NoSQL database designed to handle large volumes of structured data across many servers with no single point of failure.
• Originally developed at Facebook to power its inbox search, it was later open-sourced and became an Apache Software Foundation project.
• Key Characteristics:
  • Scalability: Scales horizontally by adding more nodes.
  • High Availability: No single point of failure; data is replicated across multiple nodes.
  • NoSQL Model: Uses a wide-column (column-family) storage model with a flexible schema.
  • Optimized for Write-Intensive Workloads: Handles high-rate inserts and updates efficiently.
  • Eventual Consistency: Favors availability over strict consistency (an AP system under the CAP theorem), with consistency levels tunable per operation.
• Used By: Netflix, Twitter, eBay, Uber, and many others for high-performance, globally distributed applications.
Key Features of Cassandra (1)
Key Features of Cassandra (2)
When to Use Cassandra?
Cassandra vs. Relational Database (RDBMS)
Cassandra vs. Other NoSQL Databases
2. Architecture & Data Model
Architecture Overview: Nodes, Partitions
Architecture Overview: Replication
Gossip Protocol – How Nodes Communicate
Hinted Handoff & Read Repair
Cassandra’s Data Model
Key Features of Cassandra
3. Working with Cassandra
Creating a Keyspace
Creating a Table with Primary and Clustering Keys
• product_id is the partition key: Cassandra hashes it to a token, and that token determines which node (and its replicas) stores the row.
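The way a partition key such as product_id selects a node can be sketched as a lookup on a token ring. This is a minimal illustration, not Cassandra's implementation: MD5 stands in for the default Murmur3 partitioner, the token space is simplified, each node owns a single token (no virtual nodes), and the node names and the `TokenRing` class are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List

RING_SIZE = 2 ** 64  # illustrative token space, not Cassandra's actual range


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 used as a stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


class TokenRing:
    """Hypothetical ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens: Dict[str, int]) -> None:
        entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in entries]
        self._names = [name for _, name in entries]

    def owner_of_token(self, token: int) -> str:
        # First node whose token is >= the key's token; wrap past the end.
        idx = bisect.bisect_left(self._tokens, token) % len(self._tokens)
        return self._names[idx]

    def owner(self, partition_key: str) -> str:
        return self.owner_of_token(token_for(partition_key))

    def replicas(self, partition_key: str, rf: int) -> List[str]:
        # Simplified placement: the owner plus the next nodes clockwise.
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        count = min(rf, len(self._names))
        return [self._names[(start + i) % len(self._names)] for i in range(count)]


# Three hypothetical nodes with evenly spaced tokens.
ring = TokenRing({
    "node-a": 0,
    "node-b": RING_SIZE // 3,
    "node-c": 2 * RING_SIZE // 3,
})

if __name__ == "__main__":
    for product_id in ("P1001", "P1002", "P1003"):
        print(product_id, "->", ring.owner(product_id), ring.replicas(product_id, 2))
```

Because the owner depends only on the hash of the partition key, every row with the same product_id lands on the same node, which is why the choice of partition key drives data distribution.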
Best Practices for Schema Design (1)
Best Practices for Schema Design (2)
Partitioning Strategies for Performance Optimization
Denormalization vs. Normalization in Cassandra
4. Querying Data with CQL
Inserting and Querying Data
CRUD Operations in Cassandra
Query Data with CQL – Basic SELECT
Query Data with CQL – Filtering Data
Query Data with CQL – Aggregation
Lightweight Transactions (LWT) – Ensuring Consistency
Using TTL (Time-to-Live) for Expiring Data
Batch Queries: Benefits and Pitfalls
5. Indexing & Performance Optimization
Secondary Index
Materialized View
SASI Indexes – Advanced Searching
Read/Write Path Internals – Memtables, SSTables, and Commitlogs
6. Replication, Consistency, and Security
Performance Optimization & Data Replication
Replication Strategies in Cassandra
Understanding Consistency Levels
Role-Based Access Control (RBAC) & User Management
TLS Encryption & Authentication for Secure Cassandra
Audit Logging & Monitoring Access
Tuning Read & Write Performance
Using Caching & Compaction Strategies
7. Backup, Monitoring & Recovery
Backup and Restore in Cassandra
Multi-Data Center Replication for Global Availability
Handling Node Failures in Production
Security Features in Cassandra
Monitoring and Troubleshooting Cassandra
Advanced Data Modeling in Cassandra
Integration with Other Tools
8. Advanced Topics & Real-world Use Cases
Case Studies of Cassandra in Production