Cassandra Notes

Cassandra is a distributed database that stores huge datasets across commodity servers in a way that maintains high availability and no single point of failure. It uses a decentralized architecture with no master node and provides tunable consistency. Data is distributed across nodes through consistent hashing and replicated for fault tolerance. The main interfaces to Cassandra are CQL, a SQL-like language, and Thrift.

Uploaded by

Amit Shah

http://wiki.apache.org/cassandra/ArticlesAndPresentations
http://docs.datastax.com/en/landing_page/doc/landing_page/current.html

Info from website: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

================================================
Written in: Java
Main point: Store huge datasets in "almost" SQL
License: Apache
Protocol: CQL3 & Thrift
- CQL3 is very similar to SQL, but with some limitations that come from the
scalability (most notably: no JOINs, and only limited aggregate functions; built-in
aggregates such as MIN, MAX, SUM and AVG arrived in Cassandra 2.2).
- CQL3 is now the official interface. Don't look at Thrift unless you're working
on a legacy app. This way, you can live without understanding ColumnFamilies,
SuperColumns, etc.
- Querying by key, or key range (secondary indices are also available)
- Tunable trade-offs for distribution and replication (N, R, W)
- Data can have expiration (set on INSERT).
- Writes can be much faster than reads (when reads are disk-bound)
- Map/reduce possible with Apache Hadoop
- All nodes are similar, as opposed to Hadoop/HBase
- Very good and reliable cross-datacenter replication
- Distributed counter datatype.
- You can write triggers in Java.
Best used: When you need to store data so huge that it doesn't fit on one server, but
still want a friendly, familiar interface to it.
For example: Web analytics, to count hits by hour, by browser, by IP, etc.
Transaction logging. Data collection from huge sensor arrays.
================================================

A) Cassandra Architecture: -

Cassandra
- A distributed database.
- There is no master-slave concept and each node is equal.
- A cluster can easily span more than one data center.

Snitch
- It determines how the nodes in a cluster learn the topology of the cluster (which
data center and rack each node belongs to).
- Types: SimpleSnitch, RackInferringSnitch, PropertyFileSnitch,
GossipingPropertyFileSnitch, Ec2Snitch, Ec2MultiRegionSnitch. Dynamic snitching is
not a separate type; it wraps whichever snitch is configured.

Gossip (Internal communication)

- It is how the nodes in a cluster communicate with each other.
- Every second, each node exchanges state with up to three other nodes: information
about itself and about all other nodes it knows about.
Note: For external communication, such as from an application to a C* database,
CQL (Cassandra Query Language) or Thrift is used.
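A toy, single-process Python model of the exchange described above. It is not Cassandra's actual gossip implementation, and the class and function names are hypothetical: each node keeps the newest heartbeat version it has seen for every node, and in each round it swaps state with up to three random peers, both sides keeping the higher version of each entry.

```python
import random


class Node:
    """Toy gossip participant: tracks the newest heartbeat version seen per node."""

    def __init__(self, name):
        self.name = name
        self.heartbeat = 0
        # Known cluster state: node name -> highest heartbeat version seen.
        self.state = {name: 0}

    def tick(self):
        # Each node bumps its own heartbeat once per round (about once per second in C*).
        self.heartbeat += 1
        self.state[self.name] = self.heartbeat

    def merge(self, other_state):
        # Keep the newest version of every entry we hear about.
        for node, version in other_state.items():
            if version > self.state.get(node, -1):
                self.state[node] = version


def gossip_round(nodes, fanout=3, rng=random):
    """One round: every node exchanges state with up to `fanout` random peers."""
    for node in nodes:
        node.tick()
    for node in nodes:
        peers = [n for n in nodes if n is not node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            # Two-way exchange: both sides end up with the newest versions of both maps.
            peer.merge(node.state)
            node.merge(peer.state)


if __name__ == "__main__":
    rng = random.Random(42)
    cluster = [Node(f"node{i}") for i in range(10)]
    rounds = 0
    while not all(len(n.state) == len(cluster) for n in cluster):
        gossip_round(cluster, rng=rng)
        rounds += 1
    print(f"every node learned about every other node after {rounds} round(s)")
```

Because every exchange is two-way and each node contacts several peers per round, information spreads to the whole cluster within a few rounds without any central coordinator.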

Data Distribution
- It is done through consistent hashing, to strive for an even distribution of data
across the nodes in the cluster.
- Rather than all rows of a table existing on only one node, the rows are
distributed across the nodes in the cluster, in an attempt to evenly spread out the
load of the table's data.
- To distribute the rows across the nodes, a partitioner is used. The partitioner
uses an algorithm to determine which node a given row of data will go to.
- The default partitioner in Cassandra is Murmur3Partitioner.
Murmur3: It hashes the row's partition key (the first component of the primary key)
into a 64-bit token between -2^63 and 2^63 - 1. Tokens are not guaranteed to be
unique; different partition keys can hash to the same token.
Calculate the token ranges: -
<The Python one-liner below calculates evenly spaced initial tokens for 4 nodes
(one token per node, i.e. no vnodes); replace 4 with the actual number of nodes for your env.>
$ python3 -c 'print([str((2**64 // 4) * i - 2**63) for i in range(4)])'
['-9223372036854775808', '-4611686018427387904', '0', '4611686018427387904']
OR
Use a Murmur3 calculator
- Each node in a cluster is assigned one token range (or multiple ranges with
virtual nodes).
e.g.: Each node is responsible for the token range that starts just after the
previous node's token and ends at (and includes) its own token.
Each node's token (endpoint) is defined below.
NodeA: -100
NodeB: 0
NodeC: 51
NodeD: 100
-> NodeA stores values greater than 100 (wrapping around the ring) and values less
than or equal to -100
-> NodeB can store value from -99 to 0
-> NodeC can store value from 1 to 51
-> NodeD can store value from 52 to 100
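The ownership rule in this example amounts to a lookup on a sorted ring: the owner is the first node whose token is greater than or equal to the row's token, wrapping around to the lowest token. A minimal Python sketch using the hypothetical four-node ring above (real tokens come from Murmur3Partitioner and are 64-bit values, not small integers):

```python
import bisect

# Hypothetical ring from the example above: (token, node), sorted by token.
RING = [(-100, "NodeA"), (0, "NodeB"), (51, "NodeC"), (100, "NodeD")]
TOKENS = [token for token, _ in RING]


def owner(token):
    """Return the node owning `token`: the first node whose token is >= it.

    Tokens greater than the highest node token wrap around to the node
    with the lowest token (NodeA here).
    """
    i = bisect.bisect_left(TOKENS, token)
    if i == len(TOKENS):
        i = 0  # wrap around the ring
    return RING[i][1]


for t in (-150, -100, -99, 0, 1, 51, 52, 100, 101):
    print(t, "->", owner(t))
```

Binary search keeps the lookup at O(log n) in the number of tokens, which matters once vnodes put hundreds of tokens per node on the ring.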

Replication Factor:
- It must be specified whenever a keyspace (database) is defined.
- It specifies how many copies of each row are kept within the cluster (per data
center, when NetworkTopologyStrategy is used).
- Although 1 can be specified, it is common to specify 2, 3, or more so that if a
node goes down, at least one other replica of the data exists and the data is not
lost with the down node.
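Under SimpleStrategy, the first replica is placed on the node that owns the token and each additional replica on the next node clockwise around the ring. A minimal Python sketch, assuming the same hypothetical four-node ring as the token example and one token per node (it ignores racks and data centers, which NetworkTopologyStrategy takes into account):

```python
import bisect

# Hypothetical single-token ring: (token, node), sorted by token.
RING = [(-100, "NodeA"), (0, "NodeB"), (51, "NodeC"), (100, "NodeD")]
TOKENS = [token for token, _ in RING]


def replicas(token, replication_factor):
    """Nodes holding copies of `token` under a SimpleStrategy-style placement."""
    if replication_factor > len(RING):
        # Simplification: Cassandra would just place at most one replica per node.
        raise ValueError("replication factor exceeds the number of nodes")
    # Primary replica: first node whose token is >= the data token (wrapping).
    start = bisect.bisect_left(TOKENS, token) % len(RING)
    # Remaining replicas: walk clockwise to the following nodes.
    return [RING[(start + k) % len(RING)][1] for k in range(replication_factor)]


print(replicas(10, 3))   # ['NodeC', 'NodeD', 'NodeA']
print(replicas(150, 2))  # ['NodeA', 'NodeB']
```

With vnodes the walk must skip tokens that belong to a node already chosen, since consecutive ring positions can then map to the same physical node.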

Virtual Nodes:
- They are an alternative way to assign token ranges to nodes, and "Virtual Nodes" are
now the default in Cassandra.
- With Virtual Nodes, instead of a node being responsible for just one token range,
it is instead responsible for many small token ranges (by default, 256 of them, set
by num_tokens in cassandra.yaml).
- Virtual Nodes allow for assigning a high number of ranges to a powerful
computer (e.g. 512) and a lower number of ranges (e.g. 128) to a less powerful
computer.
- Virtual Nodes (aka vnodes) were created to make it easier to add new nodes to a
cluster while keeping the cluster balanced
- When a new node is added, it receives many small token range slices from the
existing nodes, to maintain a balanced cluster
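To see why many small ranges help, this hypothetical Python sketch gives each node `num_tokens` random tokens (an assumption: roughly the random assignment older versions used, not the newer token-allocation algorithm) and then adds a fifth node. With 256 tokens per node, the new node's ranges are carved out of every existing node; with a single token, they all come from one neighbor. Node names and the seed are illustrative.

```python
import bisect
import random

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1


def build_ring(nodes, num_tokens, rng):
    """Give each node `num_tokens` random tokens; return a sorted list of (token, node)."""
    ring = [(rng.randint(MIN_TOKEN, MAX_TOKEN), node)
            for node in nodes for _ in range(num_tokens)]
    return sorted(ring)


def owner(ring, token):
    """Node owning `token`: first ring token >= it, wrapping to the start."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token) % len(ring)
    return ring[i][1]


def donors(ring, new_node):
    """For each range owned by `new_node`, which node owned it before it joined?"""
    old_ring = [(t, n) for t, n in ring if n != new_node]
    return {owner(old_ring, t) for t, n in ring if n == new_node}


if __name__ == "__main__":
    rng = random.Random(7)
    ring = build_ring(["n1", "n2", "n3", "n4"], num_tokens=256, rng=rng)
    # The joining node brings its own 256 random tokens into the ring.
    ring = sorted(ring + build_ring(["n5"], num_tokens=256, rng=rng))
    print("n5 took ranges from:", sorted(donors(ring, "n5")))
```

Because each of the new node's ranges lies between two old tokens, the whole range previously belonged to the old owner of its upper token, which is what `donors` looks up.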

===================================================================================
=========================================================================
B) Installing and Configuring

Installation: -
- http://www.planetcassandra.org/cassandra/
- Cassandra is installed in the directory where you unzip the downloaded archive.

Configuration: -
- Go inside conf directory to see configuration files.
(/Users/ashah/cassandra/dsc-cassandra-3.0.0/conf)
- cassandra.yaml is the main configuration file.
File permission: -
<If you have pointed cassandra.yaml (and the log directory) at the locations below,
create those directories and give your user ownership of them>
- sudo mkdir /var/lib/cassandra
- sudo mkdir /var/log/cassandra
- sudo chown -R $USER:$(id -gn) /var/lib/cassandra
- sudo chown -R $USER:$(id -gn) /var/log/cassandra

Starting/Stopping Cassandra: -

Way 1)
-> Start
<for now it is run as the root user>
- $ pwd
output: /Users/ashah/cassandra/dsc-cassandra-3.0.0
- bin/cassandra

-> Stop
- ps aux | grep cass
- kill <pid>

Way 2)
- start: bin/cassandra -f (runs Cassandra in the foreground)
- stop: Ctrl + C in the same terminal

Checking Status: -
- bin/nodetool status
- bin/nodetool info [-h <host>]
- bin/nodetool ring
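In `nodetool status` output, each node line starts with a two-letter code: the first letter is the status (U = Up, D = Down) and the second the state (N = Normal, L = Leaving, J = Joining, M = Moving), as the legend printed above the table shows. A hypothetical Python helper that decodes those lines, assuming the 3.x layout where the address is the second column:

```python
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def parse_status(text):
    """Return (address, status, state) for each node line of `nodetool status` output.

    Assumes node lines start with a two-letter code such as 'UN' followed by
    the node address; legend and header lines ('|/', '--') are skipped.
    """
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and len(fields[0]) == 2:
            status, state = fields[0][0], fields[0][1]
            if status in STATUS and state in STATE:
                nodes.append((fields[1], STATUS[status], STATE[state]))
    return nodes
```

Usage, for example: `parse_status(subprocess.run(["bin/nodetool", "status"], capture_output=True, text=True).stdout)`, then flag any node whose status is "Down".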

Accessing the Cassandra system.log File

- Location: /Users/ashah/cassandra/dsc-cassandra-3.0.0/logs
- File names are system.log and debug.log.
- Current version: the log file directory is set in
/Users/ashah/cassandra/dsc-cassandra-3.0.0/conf/logback.xml
- Earlier versions: the log file directory is set in
/Users/ashah/cassandra/dsc-cassandra-<x>/conf/log4j-server.properties
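When scanning system.log, it helps to pull out only warnings and errors. A minimal Python sketch, assuming each entry begins with its log level (as the default pattern in conf/logback.xml produces, e.g. `WARN  [thread] date File.java:line - message`); lines that do not start with a level, such as stack-trace lines, are treated as continuations of the previous entry. The function name is hypothetical.

```python
LEVELS = ("TRACE", "DEBUG", "INFO", "WARN", "ERROR")


def filter_log(lines, wanted=("WARN", "ERROR")):
    """Yield log lines belonging to entries whose level is in `wanted`."""
    keep = False
    for line in lines:
        first = line.split(maxsplit=1)[0] if line.strip() else ""
        if first in LEVELS:
            # A new entry starts: decide whether to keep it and its continuation lines.
            keep = first in wanted
        if keep:
            yield line.rstrip("\n")
```

Usage, for example: `with open("/Users/ashah/cassandra/dsc-cassandra-3.0.0/logs/system.log") as f: print("\n".join(filter_log(f)))`.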

===================================================================================
=========================================================================
C) Communicating with Cassandra

Understanding ways to communicate with Cassandra: -

- CQL (Cassandra Query Language) is a SQL-like query language for communicating
with Cassandra, created to make it easy for people familiar with SQL to work with
Cassandra.
e.g.: select home_id, datetime, event, code_used from activity;
* CQL keywords are not case-sensitive.
* Although CQL looks similar to SQL, it does not support everything SQL does (for
example, joins and subqueries), due to the distributed nature of a C* database.
- Thrift is a low-level API that predates CQL. It is still available in Cassandra
3.x but was removed in Cassandra 4.0.
- For administrative activities, such as cluster monitoring and management tasks,
tools built on JMX (Java Management Extensions), such as nodetool, are commonly used.

CQLSH: -
- bin/cqlsh
- cqlsh> HELP
- cqlsh> help create_keyspace
- Semicolon (";") is optional for cqlsh shell commands (such as HELP or DESCRIBE)
but mandatory for CQL statements.

===================================================================================
========================================================================
D) Creating a database

Understanding a Cassandra Database: -

- In C*, a database is called a keyspace; tables are defined within a keyspace.
- Check existing keyspaces: -
cqlsh> describe keyspaces;
- To see inside keyspace: -
cqlsh> describe keyspace <name>;

Defining a keyspace: -
- A keyspace name is case-sensitive only if you put it inside double quotes;
otherwise it is converted to lower case.
e.g.: a) CREATE KEYSPACE "Test" :: This will be created as Test.
b) CREATE KEYSPACE Test :: This will be created as test.
- A keyspace can be defined through the create keyspace command.
->
CREATE KEYSPACE vehicle_tracker WITH REPLICATION =
{'class':'NetworkTopologyStrategy', 'dc1':3, 'dc2':2};
<'dc1':3 means data center 1 holds 3 replicas of the data and, in the same way, data
center 2 holds 2 replicas. The data center names must match the names reported by
the snitch.>
->
CREATE KEYSPACE vehicle_tracker WITH REPLICATION = {'class':'SimpleStrategy',
'replication_factor':1};
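The case rule for names can be written as a small helper. A hypothetical Python sketch (not part of any Cassandra driver) that mirrors how CQL treats identifiers: unquoted names are folded to lower case, double-quoted names keep their case, and a doubled quote inside a quoted name stands for one literal quote.

```python
def cql_identifier(name):
    """Return the name Cassandra stores for a keyspace, table or column identifier."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Quoted: keep the case, unescape "" to ".
        return name[1:-1].replace('""', '"')
    # Unquoted: case-insensitive, stored in lower case.
    return name.lower()


print(cql_identifier('"Test"'))          # Test
print(cql_identifier('Test'))            # test
print(cql_identifier('vehicle_tracker'))  # vehicle_tracker
```

This is why a keyspace created as "Test" must be quoted in every later statement (e.g. USE "Test";), while USE Test; looks for test.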

Deleting a keyspace: -
- DROP KEYSPACE vehicle_tracker;

Working inside a keyspace: -

- USE <keyspace_name>

===================================================================================
=========================================================================
E) Creating a Table

Creating/dropping a Table: -
- CREATE TABLE activity
(home_id text, datetime timestamp, event text, code_used text,
PRIMARY KEY (home_id, datetime)) WITH CLUSTERING ORDER BY (datetime DESC);
- DROP TABLE activity;

Defining Columns And Data Type: -

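- Data types: ascii, bigint, blob, boolean, counter, decimal, double, float, inet,
int, list, map, set, text, timestamp, uuid, timeuuid, varchar, varint (not an
exhaustive list; e.g. date, time, smallint, tinyint, tuple and user-defined types
also exist in 3.0)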

Defining a primary key: -

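- Declared with PRIMARY KEY as in other databases, but in Cassandra the primary key
has two parts: the partition key (the first component) and optional clustering
columns (the remaining columns), which order the rows within a partition.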

Recognizing a partition key: -

- The partition key is hashed by the partitioner to determine which node in the
cluster will store the partition.
- For a single-column primary key, that column is the partition key.
- For a compound primary key, the first component listed defines the partition
key (it can itself be several columns in parentheses, e.g. PRIMARY KEY ((a, b), c)).
-> Internally, all of the CQL rows that have the same partition key value are
stored in the same partition (aka RowKey in Thrift).

Specifying a descending clustering order

- A table can be defined to store its data in ascending (default) or descending
order
e.g.: WITH CLUSTERING ORDER BY (datetime DESC)
- Specifying descending causes writes to take a little longer, as cells are
inserted at the start of a partition, rather than added at the end, but improves
read performance when descending order is needed by an application.
- Once the clustering order is defined, it cannot be changed (ALTER TABLE does not
support it); the table has to be recreated with the new order.
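How PRIMARY KEY (home_id, datetime) with CLUSTERING ORDER BY (datetime DESC) organizes the activity table can be pictured with an in-memory model: rows are grouped by partition key, and within each partition they are kept sorted by the clustering column, newest first. A hypothetical Python sketch (the extra rows are made up; this models the logical layout, not the on-disk format):

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows for the activity table: (home_id, datetime, event, code_used).
ROWS = [
    ("H01474777", datetime(2014, 5, 21, 7, 32, 16), "alarm set", "5599"),
    ("H01474777", datetime(2014, 5, 21, 18, 5, 0), "alarm turned off", "5599"),
    ("H00999943", datetime(2014, 5, 22, 9, 0, 0), "alarm set", "1111"),
]


def build_partitions(rows):
    """Group rows by partition key (home_id), newest datetime first in each partition."""
    partitions = defaultdict(list)
    for home_id, ts, event, code in rows:
        partitions[home_id].append((ts, event, code))
    for cells in partitions.values():
        # CLUSTERING ORDER BY (datetime DESC): newest row first.
        cells.sort(key=lambda cell: cell[0], reverse=True)
    return dict(partitions)


for home_id, cells in build_partitions(ROWS).items():
    print(home_id, [event for _, event, _ in cells])
```

A query such as SELECT ... WHERE home_id = 'H01474777' LIMIT 1 then only has to read the start of one partition to get the latest event.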

===================================================================================
=========================================================================
F) Inserting Data

Understanding Ways to Write Data

- INSERT INTO (CQL command)
- COPY command
- sstableloader tool (bulk loading)

Using the INSERT INTO command

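- Same syntax as an SQL INSERT, but in Cassandra an INSERT is an upsert: inserting a
row with an existing primary key overwrites it. An optional USING TTL <seconds>
makes the inserted data expire.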
e.g. : INSERT INTO activity (home_id, datetime , event, code_used) VALUES
('H01474777', '2014-05-21 07:32:16', 'alarm set', '5599');

Using the COPY command

- The COPY command can be used to import data (COPY FROM) from a .csv file.
e.g.: COPY activity (home_id, datetime , event, code_used) FROM
'/Users/ashah/events.csv' WITH header = true AND delimiter = '|';

- The COPY command can be used to export data (COPY TO) to a .csv file.
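The COPY FROM example above expects a header line and '|' as the delimiter. A minimal Python sketch that produces such a file with the standard csv module; the sample rows and function names are hypothetical:

```python
import csv
import io

HEADER = ["home_id", "datetime", "event", "code_used"]

# Hypothetical sample rows matching the activity table columns.
ROWS = [
    ("H01474777", "2014-05-21 07:32:16", "alarm set", "5599"),
    ("H01474777", "2014-05-21 18:05:00", "alarm turned off", "5599"),
]


def events_csv_text(rows):
    """Render a header plus '|'-delimited rows, matching WITH header = true AND delimiter = '|'."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


def write_events_csv(path, rows):
    """Write the CSV text to `path` for loading with cqlsh COPY ... FROM."""
    with open(path, "w", newline="") as f:
        f.write(events_csv_text(rows))
```

Usage, for example: `write_events_csv("/Users/ashah/events.csv", ROWS)`, then run the COPY ... FROM command in cqlsh. The csv module quotes any field that itself contains '|', so values stay intact.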

How Data is stored in C*

- Internally, a partition key value (in Thrift, referred to as a row key value) is
what makes an internal storage row unique.

How Data is stored on Disk

- When data is written to a table in Cassandra, it goes to both a commit log on
disk (for replay, in case of node failure) and to an in-memory structure called a
memtable.
- Once the memtable for a table is full, it is flushed to disk as an SSTable.
- For each table on each node there is a memtable.
- The SSTables for a table are stored on disk, in the location specified by
data_file_directories in the cassandra.yaml file.
- To see the contents of an SSTable, sstable2json can be used in versions before
3.0; it was removed in 3.0, and later 3.x releases provide sstabledump instead.
- To force the memtables to be flushed to disk, use the command below.
-> bin/nodetool flush home_security
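The write path above (commit log first, then the in-memory memtable, then a flush to an immutable SSTable) can be modeled in a few lines. A toy Python sketch; the class, the flush threshold, and the simple newest-wins read are simplifying assumptions (real Cassandra merges cells by timestamp and recycles shared commit-log segments):

```python
class ToyTable:
    """Toy model of the write path for one table on one node."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # partitions held before flushing
        self.commit_log = []   # append-only; replayed after a crash
        self.memtable = {}     # partition key -> latest value
        self.sstables = []     # each flush adds one immutable, sorted list

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # then the in-memory structure
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Roughly what `nodetool flush` forces: memtable -> new SSTable on disk.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            # Flushed data no longer needs the commit log for recovery.
            self.commit_log = []

    def read(self, key):
        # Simplified newest-wins read: memtable first, then SSTables newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None
```

Writes only append and update memory, which is why writes can be faster than reads: a read may have to consult the memtable and several SSTables.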

===================================================================================
=========================================================================
G) Modelling Data

===================================================================================
=========================================================================
H) Creating an application

===================================================================================
=========================================================================
