Apache Cassandra
Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters,[1] with asynchronous masterless replication allowing low latency operations for all clients.

Cassandra also places a high value on performance. In 2012, University of Toronto researchers studying NoSQL systems concluded that "In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments" although "this comes at the price of high write and read latencies."[2]

Cassandra's data model is a partitioned row store with tunable consistency.[3] Rows are organized into tables; the first component of a table's primary key is the partition key; within a partition, rows are clustered by the remaining columns of the key.[4] Other columns may be indexed separately from the primary key.[5]

Tables may be created, dropped, and altered at runtime without blocking updates and queries.[6]

Cassandra does not support joins or subqueries, except for batch analysis via Hadoop. Rather, Cassandra emphasizes denormalization through features like collections.[7]

0.8, released Jun 2 2011, added the Cassandra Query Language (CQL), self-tuning memtables, and support for zero-downtime upgrades[13]
1.0, released Oct 17 2011, added integrated compression, leveled compaction, and improved read performance[14]
1.1, released Apr 23 2012, added self-tuning caches, row-level isolation, and support for mixed ssd/spinning disk deployments[15]
1.2, released Jan 2 2013, added clustering across virtual nodes, inter-node communication, atomic batches, and request tracing[16]
2.0, released Sep 4 2013, added lightweight transactions (based on the Paxos consensus protocol), triggers, and improved compactions
2.0.4, released Dec 30 2013, added the ability to specify which datacenters participate in a repair, client encryption support for sstableloader, and removal of snapshots of no-longer-existing CFs[17]
2.1.0, released Sep 10 2014[18]

2 Licensing and support

Apache Cassandra is an Apache Software Foundation project, so it has an Apache License (version 2.0).

Scalability: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
Fault-tolerant: Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.

Tunable consistency: Writes and reads offer a tunable level of consistency, all the way from "writes never fail" to "block for all replicas to be readable", with the quorum level in the middle.[3]

MapReduce support: Cassandra has Hadoop integration, with MapReduce support. There is also support for Apache Pig and Apache Hive.[20]

1. RandomPartitioner (RP): This partitioner randomly distributes the key-value pairs over the network, resulting in good load balancing. Compared to OPP, more nodes have to be accessed to get a number of keys.
2. OrderPreservingPartitioner (OPP): This partitioner distributes the key-value pairs in a natural way so that similar keys are not far away. The advantage is that fewer nodes have to be accessed. The drawback is the uneven distribution of the key-value pairs.
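The trade-off between the two partitioners described above can be illustrated with a small Python sketch. It is a simplified model, not Cassandra's implementation: the node names, the split points, and the modulo placement are assumptions made for illustration (a real cluster assigns token ranges on a ring). The only detail taken from Cassandra itself is that RandomPartitioner derives tokens from an MD5 hash of the key.

```python
import bisect
import hashlib

# Hypothetical four-node cluster; the names are illustrative only.
NODES = ["node-a", "node-b", "node-c", "node-d"]

# Hypothetical split points for order-preserving placement:
# node-a owns keys below "g", node-b below "n", node-c below "t",
# and node-d owns everything else.
SPLIT_POINTS = ["g", "n", "t"]


def random_partitioner_node(key: str) -> str:
    """Place a key by hashing it, so similar keys scatter across nodes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big")
    # Simplification: modulo placement instead of token ranges on a ring.
    return NODES[token % len(NODES)]


def order_preserving_node(key: str) -> str:
    """Place a key by its sort order, so similar keys stay together."""
    return NODES[bisect.bisect_right(SPLIT_POINTS, key)]


def nodes_touched(place, keys):
    """Return the set of nodes that must be contacted to read all keys."""
    return {place(k) for k in keys}


# One hundred similar keys, such as a contiguous range of user ids.
keys = [f"user{i:03d}" for i in range(100)]
rp_nodes = nodes_touched(random_partitioner_node, keys)
opp_nodes = nodes_touched(order_preserving_node, keys)

print("RandomPartitioner touches", len(rp_nodes), "node(s)")
print("OrderPreservingPartitioner touches", len(opp_nodes), "node(s)")
```

In this model every key in the range sorts after the last split point, so the order-preserving placement reads them all from a single node, which also has to store all of them. The hashed placement spreads the same keys over several nodes, which balances load but means more nodes are contacted for the range.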
Query language: Cassandra introduces CQL (Cassandra Query Language), a SQL-like alternative to the traditional RPC interface. Language drivers are available for Java (JDBC), Python (DBAPI2), Node.JS (Helenus), Go (gocql) and C++.[21]

6 Prominent users

@WalmartLabs[24] (previously Kosmix) uses Cassandra with SSD
Apple uses 75,000 Cassandra nodes, as revealed at Cassandra Summit San Francisco 2014,[25] although it has not elaborated for which products, services or features.
IBM has done research in building a scalable email system based on Cassandra.[38]
Mahalo.com uses Cassandra to record user activity logs and topics for their Q&A website[39][40]
Netflix uses Cassandra as their back-end database for their streaming services[41][42]
Ooyala built a scalable, flexible, real-time analytics engine using Cassandra[43]

Facebook moved off its pre-Apache Cassandra deployment in late 2010 when they replaced Inbox Search with the Facebook Messaging platform.[36] In 2012, Facebook began using Apache Cassandra in its Instagram unit.[59]

Cassandra is the most popular wide column store,[60] and in September 2014 surpassed Sybase to become the 9th most popular database, close behind Microsoft Access and SQLite.[61]
[11] The Apache Software Foundation Announces Apache Cassandra Release 0.6 : The Apache Software Foundation Blog
[12] The Apache Software Foundation Announces Apache Cassandra 0.7 : The Apache Software Foundation Blog
[13] [Cassandra-user] [RELEASE] 0.8.0 - Grokbase
[14] Cassandra 1.0.0. Is Ready for the Enterprise
[15] The Apache Software Foundation Announces Apache Cassandra v1.1 : The Apache Software Foundation Blog
[16] The Apache Software Foundation Announces Apache Cassandra v1.2 : The Apache Software Foundation Blog. apache.org. Retrieved 11 December 2014.
[28] Re: Cassandra users survey. Mail-archive.com. 2009-11-21. Archived from the original on 17 April 2010. Retrieved 2010-03-29.
[29] 4 Months with Cassandra, a love story | Cloudkick, manage servers better
[30] Finley, Klint (2011-02-18). This Week in Consolidation: HP Buys Vertica, Constant Contact Buys Bantam Live and More. Read Write Enterprise.
[31] Eure, Ian. Looking to the future with Cassandra.
[32] Quinn, John. Saying Yes to NoSQL; Going Steady with Cassandra.
[33] Schonfeld, Erick. As Digg Struggles, VP Of Engineering Is Shown The Door.
[34] Is Cassandra to Blame for Digg v4's Failures?.
[35] Niet compatibele browser. Facebook. Retrieved 2010-03-29.
[36] Muthukkaruppan, Kannan. The Underlying Technology of Messages.
[37] Cozzi, Martin (2011-08-31). Cassandra at Formspring.
[38] BlueRunner: Building an Email Service in the Cloud. ieee.org. 2009-07-20. Retrieved 2010-03-29.
[39] Mahalo.com powered by Apache Cassandra. DataStax.com. Santa Clara, CA, USA: DataStax. 2012-04-10. Retrieved 2014-06-13.
[40] Watch Cassandra at Mahalo.com | DataStax Episodes | Blip
[41] Cockcroft, Adrian (2011-07-11). Migrating Netflix from Datacenter Oracle to Global Cassandra. slideshare.net. Retrieved 2014-06-13.
[47] Grigorik, Ilya (2011-03-29). Webpulp TV: Scaling PostRank with Ilya Grigorik.
[48] Hadoop and Cassandra (at Rackspace). Stu Hood. 2010-04-23. Retrieved 2011-09-01.
[49] david [ketralnis] (2010-03-12). what's new on reddit: She who entangles men. blog.reddit. Archived from the original on 25 March 2010. Retrieved 2010-03-29.
[50] Posted by the reddit admins at (2010-05-11). blog.reddit -- what's new on reddit: reddit's May 2010 State of the Servers report. blog.reddit. Archived from the original on 14 May 2010. Retrieved 2010-05-16.
[51] Pattishall, Dathan Vance (2011-03-23). Cassandra is my NoSQL Solution but.
[56] King, Ryan (2010-07-10). Cassandra at Twitter Today. blog.twitter.com. San Francisco, CA, USA: Twitter. Retrieved 2014-06-20.
[57] Onnen, Erik. From 100s to 100s of Millions.
[58] Wicke, Gabriel. Wikimedia REST content API is now available in beta.
[59] Rick Branson (2013-06-26). Cassandra at Instagram. DataStax. Retrieved 2013-07-25.
[60] DB-Engines. DB-Engines Ranking of Wide Column Stores.
[61] DB-Engines. DB-Engines Ranking.
[62] Introduction to Cassandra Architecture. Edureka.co. Retrieved 11 December 2014.

Lakshman, Avinash (2008-08-25). Cassandra - A structured storage system on a P2P Network. Engineering @ Facebook's Notes. Retrieved 2014-06-17.
The Apache Cassandra Project. Forest Hill, MD, USA: The Apache Software Foundation. Retrieved 2014-06-17.
Project Wiki. Forest Hill, MD, USA: The Apache Software Foundation. Retrieved 2014-06-17.
Hewitt, Eben (2010-12-01). Adopting Apache Cassandra. infoq.com. InfoQ, C4Media Inc. Retrieved 2014-06-17.