NoSQL Complete QB
NoSQL Basics
1. What is a NoSQL database?
• NoSQL is a type of database management system (DBMS) designed to handle and store large volumes of unstructured and semi-structured data.
• Instead of tables and predefined schemas, NoSQL databases use flexible data models that can adapt to changes in data structure, scale horizontally, and handle growing amounts of data.
• NoSQL is also referred to as “non-SQL” or “non-relational” Databases, but the term has evolved to mean “not
only SQL”, as NoSQL databases have expanded to include a wide range of different database architectures
and data models.
• NoSQL, originally referring to “non-SQL” or “non-relational”, is a database that provides a mechanism for the storage and retrieval of data. This data is modeled in means other than the tabular relations used in relational databases.
• Such databases came into existence in the late 1960s, but did not obtain the NoSQL moniker until a surge of popularity in the early twenty-first century. NoSQL databases are used in real-time web applications and big data, and their use is increasing over time.
1. Dynamic schema: NoSQL databases do not have a fixed schema and can accommodate changing
data structures without the need for migrations or schema alterations.
2. Horizontal scalability: NoSQL databases are designed to scale out by adding more nodes to a
database cluster, making them well-suited for handling large amounts of data and high levels of
traffic.
3. Document-based: Some NoSQL databases, such as MongoDB, use a document-based data model,
where data is stored in semi-structured format, such as JSON or BSON.
4. Key-value-based: Other NoSQL databases, such as Redis, Amazon DynamoDB and Oracle NoSQL Database, use a key-value data model, where data is stored as a collection of key-value pairs.
5. Column-based: Some NoSQL databases, such as Cassandra, HBase and Hypertable, use a column-based data model, where data is organized into columns instead of rows.
6. Distributed and high availability: NoSQL databases are often designed to be highly available and
to automatically handle node failures and data replication across multiple nodes in a database
cluster.
7. Flexibility: NoSQL databases allow developers to store and retrieve data in a flexible and dynamic manner, without being constrained by a fixed schema.
8. Performance: NoSQL databases are optimized for high performance and can handle a high volume of reads and writes, making them suitable for big data and real-time applications.
3. Column-oriented: Examples - HBase, Bigtable, Accumulo
A column-oriented database is a non-relational database that stores data in columns instead of rows. That means when you want to run analytics on a small number of columns, you can read those columns directly without consuming memory on the unwanted data.
Columnar databases are designed to read data more efficiently and retrieve it with greater speed. A columnar database is used to store large amounts of data.
Documents can be stored and retrieved in a form that is much closer to the data objects used in applications, which means less translation is required to use the data in an application. In a document database, particular elements can be accessed by using the index value that is assigned to them for faster querying.
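For instance, a single document in such a store might look like the following (the field names and values are purely illustrative):
{
   "_id": "101",
   "title": "NoSQL Overview",
   "tags": ["nosql", "database"],
   "author": { "name": "Ravi", "dept": "CS" }
}
The nested structure maps directly onto the objects used in application code.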
6. Discuss horizontal and vertical scalability in NoSQL databases.
Scaling alters the size of a system. In the scaling process, we either compress or expand the system to meet the expected needs. Scaling can be achieved by adding resources to the current system, by adding a new system alongside the existing one, or both.
Types of Scaling:
Scaling can be categorized into 2 types:
Vertical Scaling: When new resources are added to the existing system to meet the expectation, it is
known as vertical scaling.
Consider a rack of servers and resources that comprises the existing system. When the existing system fails to meet the expected needs, and the expected needs can be met just by adding resources, this is considered vertical scaling. Vertical scaling is based on the idea of adding more power (CPU, RAM) to the existing machine, basically adding more resources.
Vertical scaling is not only easy but also often cheaper than horizontal scaling, and it requires less time to put in place.
Horizontal Scaling: When new server racks are added to the existing system to meet the higher
expectation, it is known as horizontal scaling.
8. Differences between Horizontal and Vertical Scaling.
23. Differentiate between RDBMS and MongoDB.
Apart from the differences in terminology, a few other differences are shown below:
1. Relational databases are known for enforcing data integrity. This is not an explicit requirement in MongoDB.
2. An RDBMS requires that data be normalized first so that orphan records and duplicates are prevented. Normalizing data then requires more tables, which results in more table joins, thus requiring more keys and indexes. As databases start to grow, performance can become an issue. Again, this is not an explicit requirement in MongoDB; MongoDB is flexible and does not need the data to be normalized first.
24. Give the structure of document in MongoDB. Explain with example. (same as 12)
25. Explain MongoDB data Modelling with example.
26. Give the methods to create, update, read and delete document/documents in MongoDB
with example.
27. Explain the methods: - find (), pretty (), skip (), limit (), sort ()
The find() Method
To query data from MongoDB collection, you need to use MongoDB's find() method.
Syntax
The basic syntax of find() method is as follows −
>db.COLLECTION_NAME.find()
The find() method displays all the documents in a non-structured way.
Example
Assume we have created a collection named mycol and inserted documents into it. The following query projects only the title field, skips the first document, and limits the result to a single document:
>db.mycol.find({},{"title":1,_id:0}).limit(1).skip(1)
{"title":"NoSQL Overview"}
>
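The other methods named in the question are chained onto find() in the same way; a brief sketch (again assuming the mycol collection):
>db.mycol.find().pretty() // pretty() formats each result document with indentation for readability
>db.mycol.find().limit(2) // limit() restricts the result to the given number of documents
>db.mycol.find().skip(1) // skip() omits the given number of documents from the beginning of the result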
The sort() Method
To sort documents in MongoDB, you need to use sort() method. The method accepts a document containing
a list of fields along with their sorting order. To specify sorting order 1 and -1 are used. 1 is used for
ascending order while -1 is used for descending order.
Syntax
The basic syntax of sort() method is as follows −
>db.COLLECTION_NAME.find().sort({KEY:1})
Example
Consider that the collection mycol contains documents with a title field.
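As a sketch, assume the titles include "MongoDB Overview" and "NoSQL Overview"; a descending sort on title would then look like this:
>db.mycol.find({},{"title":1,_id:0}).sort({"title":-1})
{"title":"NoSQL Overview"}
{"title":"MongoDB Overview"}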
28. What is the meaning of projecting fields in MongoDB? Explain with examples.
In MongoDB, projection means selecting only the necessary data rather than the whole of the data in a document. If a document has 5 fields and you need to show only 3, then select only those 3 fields.
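As a sketch (assuming the same mycol collection used earlier, whose documents have a title field), the second argument to find() is a projection document where 1 includes a field and 0 excludes it:
>db.mycol.find({},{"title":1,_id:0})
{"title":"MongoDB Overview"}
{"title":"NoSQL Overview"}
Here only the title field is returned and the _id field is suppressed; the titles shown are illustrative.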
29. What are aggregation pipeline stages in MongoDB? List and give their use.
($project, $match, $group, $sort, $skip, $limit, $unwind, $out)
• Aggregation pipeline means the ability to execute an operation on some input and use the output as the input for the next command, and so on. MongoDB supports the same concept in its aggregation framework: there is a set of possible stages, each of which takes a set of documents as input and produces a resulting set of documents (or the final resulting JSON document at the end of the pipeline), which can in turn be used as the input for the next stage, and so on. The main stages are described below, followed by a short example.
• $skip − With this, it is possible to skip forward in the list of documents by a given number of documents.
• $limit − This limits the number of documents to look at, by the given number starting from the current position.
• $unwind − This is used to unwind documents that use arrays. When using an array, the data is in a way pre-joined, and this operation undoes that so that there are individual documents again. Thus, this stage increases the number of documents for the next stage.
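The remaining stages from the question behave as follows: $match filters the incoming documents, $group aggregates documents by a grouping key, $project reshapes or selects fields, $sort orders the documents, and $out writes the pipeline result into a collection. A minimal sketch (the collection name, field names and values are illustrative):
>db.mycol.aggregate([
   {$match: {status: "active"}},
   {$group: {_id: "$by_user", num_tutorial: {$sum: 1}}},
   {$sort: {num_tutorial: -1}},
   {$limit: 5}
])
Here only documents with status "active" are kept, they are grouped by the by_user field with a per-user count, the groups are sorted by that count in descending order, and only the top five groups are passed on.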
Using a regex Expression
For example, assume we have inserted a document into a collection named posts. The following regex query searches for all the posts containing the string tutorialspoint in them −
> db.posts.find({post_text:{$regex:"tutorialspoint"}}).pretty()
{
"_id" : ObjectId("5dd7ce28f1dd4583e7103fe0"),
"post_text" : "enjoy the mongodb articles on tutorialspoint",
"tags" : [
"mongodb",
"tutorialspoint"
]
}
The same query can also be written using a regular-expression literal instead of the $regex operator:
>db.posts.find({post_text:/tutorialspoint/})
HBase
31. What is HBase?
• HBase is a distributed column-oriented database built on top of the Hadoop file system.
• It is an open-source project and is horizontally scalable.
• HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS).
• It is a part of the Hadoop ecosystem that provides random real-time read/write access to data in the
Hadoop File System.
• One can store the data in HDFS either directly or through HBase.
• Data consumers read/access the data in HDFS randomly using HBase. HBase sits on top of the Hadoop File System and provides read and write access.
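As a small sketch of this random read/write access, data can be written and read from the HBase shell (the table, column family and values below are illustrative):
hbase(main):001:0> create 'emp', 'personal'
hbase(main):002:0> put 'emp', 'row1', 'personal:name', 'Ravi'
hbase(main):003:0> get 'emp', 'row1'
The create command defines a table with one column family, put writes a cell value for a row, and get reads the cells of that row back.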
33. Differentiate between HBase and RDBMS.
CouchDB
Why CouchDB:
• CouchDB is easy to use. There is one word to describe CouchDB: “Relax”. It is also the byline of the official CouchDB logo.
• CouchDB has an HTTP-based REST API, which makes communication with the database very easy.
• CouchDB has the simple structure of HTTP resources and methods (GET, PUT, DELETE) that are
easy to understand and use.
• In CouchDB, data is stored in a flexible document-based structure, so there is no need to worry about the structure of the data.
• CouchDB facilitates users with powerful data mapping, which allows querying, combining, and
filtering the information.
• CouchDB provides easy-to-use replication, using which you can copy, share, and synchronize the
data between databases and machines.
35. Give the features of CouchDB.
Following is a list of the most attractive features of CouchDB:
• Document Storage: CouchDB is a NoSQL database which follows document storage. Documents are
the primary unit of data where each field is uniquely named and contains values of various data types
such as text, number, Boolean, lists, etc. Documents don't have a set limit to text size or element
count.
• Browser Based GUI: CouchDB provides an interface Futon which facilitates a browser-based GUI to
handle your data, permission and configuration.
• Replication: CouchDB provides the simplest form of replication. No other database is so simple to replicate.
• ACID Properties: The CouchDB file layout follows all the features of ACID properties. Once the data is entered into the disk, it will not be overwritten. Document updates (add, edit, delete) follow atomicity, i.e., they will be saved completely or not saved at all. The database will not have any partially saved or edited documents. Almost all of these updates are serialized, and any number of clients can read a document without waiting and without being interrupted.
• JSONP for Free: If you update your config to allow_jsonp = true, then your database is accessible cross-domain for GET requests.
• Authentication and Session Support: CouchDB facilitates you to keep authentication open via a session cookie, like a web application.
• Security: CouchDB also provides database-level security. The permissions per database are separated
into readers and admins. Readers can both read and write to the database.
• Validation: You can validate the inserted data into the database by combining with authentication to
ensure the creator of the document is the one who is logged in.
• Map/Reduce, List and Show: One of the main reasons behind the popularity of CouchDB (and MongoDB) is its map/reduce system; CouchDB additionally offers list and show functions for formatting the results of views.
37. Differentiate between MongoDB and CouchDB.
The cURL utility is a command-line tool for sending HTTP requests, and it can fetch any URL, for example:
curl www.Facebook.com/
In the same way, you can access the home page of CouchDB by sending a GET request to the CouchDB instance installed.
curl http://127.0.0.1:5984/
This gives you a JSON document as shown below where CouchDB specifies the details such as version
number, name of the vendor, and version of the software.
$ curl http://127.0.0.1:5984/
{
"couchdb" : "Welcome",
"uuid" : "8f0d59acd0e179f5e9f0075fa1f5e804",
"version" : "1.6.1",
"vendor" : {
"name":"The Apache Software Foundation",
"version":"1.6.1"
}
}
39. Explain curl utility commands to create, update, delete documents with examples.
Creating a Database
You can create a database in CouchDB using cURL with the PUT method, using the following syntax −
$ curl -X PUT http://127.0.0.1:5984/database_name
Example
As an example, using the above syntax, create a database with the name my_database as shown below.
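A sketch of the command and the typical acknowledgement CouchDB returns:
$ curl -X PUT http://127.0.0.1:5984/my_database
{"ok":true}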
Verification
Verify whether the database has been created by listing out all the databases as shown below. Here you can observe the name of the newly created database, "my_database", in the list.
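A sketch of the verification (the exact list depends on which databases exist on your instance):
$ curl -X GET http://127.0.0.1:5984/_all_dbs
["_replicator","_users","my_database"]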
Updating Documents using cURL
You can update a document in CouchDB by sending an HTTP request to the server using PUT method
through cURL utility. Following is the syntax to update a document.
Example
Suppose there is a document with id 001 in the database named my_database. You can update it as shown below.
First of all, get the revision id of the document that is to be updated. You can find the _rev of the document in the document itself; therefore, get the document as shown below.
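A sketch of the full update sequence; the revision ids and document fields below are placeholders:
$ curl -X GET http://127.0.0.1:5984/my_database/001
{"_id":"001","_rev":"1-abc123","name":"Raju","age":"23"}
$ curl -X PUT http://127.0.0.1:5984/my_database/001 -d '{"name":"Raju","age":"24","_rev":"1-abc123"}'
{"ok":true,"id":"001","rev":"2-def456"}
The new document body must carry the current _rev; CouchDB stores it as a new revision and returns the updated revision id.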
Deleting Documents using cURL
You can delete a document in CouchDB by sending an HTTP request to the server using the DELETE method through the cURL utility. Using −X, we can specify a custom HTTP request method to use while communicating with the HTTP server; in this case, we are using the DELETE method. To delete a document, /database_name/document_id alone is not enough: you also have to pass the most recent revision id through the URL. The "?" character is used to append such parameters to the URL.
Example
Suppose there is a document in the database named my_database with document id 001. To delete this document, you have to get the rev id of the document. Get the document data as shown below.
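A sketch of the delete sequence; the revision id below is a placeholder:
$ curl -X GET http://127.0.0.1:5984/my_database/001
{"_id":"001","_rev":"2-def456","name":"Raju","age":"24"}
$ curl -X DELETE http://127.0.0.1:5984/my_database/001?rev=2-def456
{"ok":true,"id":"001","rev":"3-ghi789"}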
40. Explain CouchDB HTTP API.
Using HTTP requests, you can communicate with CouchDB. Through these requests we can retrieve data from the database, store data into the database in the form of documents, and view as well as format the documents stored in a database.
While communicating with the database we will use different request formats like get, head, post, put,
delete, and copy. For all operations in CouchDB, the input data and the output data structures will be in
the form of JavaScript Object Notation (JSON) object.
Following are the different request formats of HTTP Protocol used to communicate with CouchDB.
• GET − This format is used to get a specific item. To get different items, you have to send specific
url patterns. In CouchDB using this GET request, we can get static items, database documents and
configuration, and statistical information in the form of JSON documents (in most cases).
• HEAD − The HEAD method is used to get the HTTP header of a GET request without the body of
the response.
• POST − Post request is used to upload data. In CouchDB using POST request, you can set values,
upload documents, set document values, and can also start certain administration commands.
• PUT − Using PUT request, you can create new objects, databases, documents, views and design
documents.
• DELETE − Using DELETE request, you can delete documents, views, and design documents.
• COPY − Using COPY method, you can copy documents and objects.
Response Headers
These are the headers of the response sent by the server. These headers give information about the content sent by the server as a response.
• Content-type − This header specifies the MIME type of the data returned by the server. For most requests, the returned MIME type is text/plain.
• Cache-control − This header tells the client how to treat the information sent by the server. CouchDB mostly returns must-revalidate, which indicates that the information should be revalidated if possible.
• Content-length − This header returns the length of the content sent by the server, in bytes.
• Etag − This header is used to show the revision for a document, or a view.
Cassandra
44. Diagrammatically explain Cassandra architecture with its components.
o Cassandra was designed to handle big data workloads across multiple nodes without a single point of
failure. It has a peer-to-peer distributed system across its nodes, and data is distributed among all the
nodes in a cluster.
o In Cassandra, each node is independent and at the same time interconnected to other nodes. All the
nodes in a cluster play the same role.
o Every node in a cluster can accept read and write requests, regardless of where the data is actually
located in the cluster.
o In the case of failure of one node, Read/Write requests can be served from other nodes in the network.
45. What is CQL? Explain write and read operations in Cassandra.
Cassandra Query Language (CQL) is used to access Cassandra through its nodes. CQL treats the database (keyspace) as a container of tables. Programmers use cqlsh, a command-line prompt, to work with CQL, or else separate application-language drivers.
The client can approach any of the nodes for its read/write operations. That node (the coordinator) acts as a proxy between the client and the nodes holding the data.
Write Operations:
Every write activity of the nodes is captured by the commit logs written on the nodes. Later the data is stored in the mem-table. Whenever the mem-table is full, data is written into the SSTable data file. All writes are automatically partitioned and replicated throughout the cluster. Cassandra periodically consolidates the SSTables, discarding unnecessary data.
Read Operations
In Read operations, Cassandra gets values from the mem-table and checks the bloom filter to find the
appropriate SSTable which contains the required data.
There are three types of read request that is sent to replicas by coordinators.
o Direct request
o Digest request
o Read repair request
The coordinator sends a direct request to one of the replicas. After that, the coordinator sends a digest request to the number of replicas specified by the consistency level and checks whether the returned data is up to date. After that, the coordinator sends digest requests to all the remaining replicas. If any node gives an out-of-date value, a background read repair request will update that data. This process is called the read repair mechanism.
46. Explain Cassandra Data Model: Cluster & Keyspace.
The data model in Cassandra is totally different from what we normally see in an RDBMS. Let's see how Cassandra stores its data.
Cluster
Cassandra database is distributed over several machines that are operated together. The
outermost container is known as the Cluster which contains different nodes. Every node
contains a replica, and in case of a failure, the replica takes charge. Cassandra arranges the
nodes in a cluster, in a ring format, and assigns data to them.
Keyspace
Keyspace is the outermost container for data in Cassandra. Following are the basic attributes of
Keyspace in Cassandra:
• Replication factor: It specifies the number of machines in the cluster that will receive copies of the same data.
• Replica placement strategy: It is the strategy which specifies how to place replicas in the ring.
There are three types of strategies, such as:
1) Simple strategy (rack-unaware strategy)
2) Old network topology strategy (rack-aware strategy)
3) Network topology strategy (datacenter-shared strategy)
• Column families: Column families are placed under the keyspace. A keyspace is a container for a list of one or more column families, while a column family is a container for a collection of rows. Each row contains ordered columns. Column families represent the structure of your data.
• Each keyspace has at least one and often many column families. In Cassandra, a good data model is very important, because a bad data model can degrade performance, especially when you try to implement RDBMS concepts on Cassandra.
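A minimal CQL sketch that sets these attributes when creating a keyspace (the keyspace name and replication factor are illustrative):
cqlsh> CREATE KEYSPACE tutorialspoint
   WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
cqlsh> USE tutorialspoint;
Here SimpleStrategy is the replica placement strategy and replication_factor specifies how many copies of each row the cluster keeps.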
47. Explain the concept of indexing and ordering in CouchDB & Cassandra.
A B-tree index scales well for large amounts of data. In CouchDB, the B-tree implementation has
specialized features like MultiVersion Concurrency Control and append-only design. MultiVersion
Concurrency Control (MVCC) implies that multiple reads and writes can occur in parallel without
the need for exclusive locking. All writes are sequenced and reads are not impacted by writes. An
“append-only” design refers to a data storage approach where new data is only added (appended) to
the database, and existing data is not updated or deleted.
INDEXING IN APACHE CASSANDRA:
• Indexing in Apache Cassandra is a way to improve the efficiency and performance of queries on non-
primary key columns.
• In Cassandra, data is organized in tables and each table has a primary key, which consists of one or
more columns that uniquely identify each row in the table.
• Queries that use the primary key to retrieve data are very efficient, but queries that use other columns
in the WHERE clause can be slower.
• To solve this problem, Cassandra has secondary indexes that enable querying on columns other than the primary key columns.
• A secondary index is built on a table's column, and it maintains a separate index data structure that associates the values of the indexed column with the corresponding table rows.
• Looking up the rows in the index and then obtaining the relevant data from the table enables queries on that column to be processed quickly.
49. Differentiate between RDBMS and Cassandra.
50. Explain the concept of index in Apache Cassandra.
Index:
• We can access data using attributes that form the partition key.
• For example, if Emp_id is a column of an Employee table and is the partition key of that table, then we can filter or search data with the help of the partition key.
• In this case we can use a WHERE clause to define a condition over that attribute and search the data.
• But suppose there is a column which is not a partition key of that table and we want to filter, search or access data on it using a WHERE clause; then the query will not be executed and will give an error.
• So, to access data using attributes other than the partition key, and to allow fast and efficient lookup of data matching a given condition, we need to define an index (see the sketch below). Indexes can be used for various purposes, for example on collections, static columns, collection columns, and any other columns except counter columns.
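A minimal CQL sketch, reusing the Employee example above (the dept column and its value are illustrative):
cqlsh:tutorialspoint> CREATE INDEX ON Employee (dept);
cqlsh:tutorialspoint> SELECT * FROM Employee WHERE dept = 'Sales';
Without the secondary index on dept, the SELECT with a WHERE clause on a non-partition-key column would be rejected (unless ALLOW FILTERING is forced).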
List
List is used in the cases where
• the order of the elements is to be maintained, and
• a value is to be stored multiple times.
You can get the values of a list data type using the index of the elements in the list.
Creating a Table with List
Given below is an example to create a sample table with two columns, name and email. To store multiple
emails, we are using list.
cqlsh:tutorialspoint> CREATE TABLE data(name text PRIMARY KEY, email list<text>);
SET
Set is a data type that is used to store a group of elements. The elements of a set will be returned in a sorted
order.
Creating a Table with Set
The following example creates a sample table with two columns, name and phone. For storing multiple
phone numbers, we are using set.
cqlsh:tutorialspoint> CREATE TABLE data2 (name text PRIMARY KEY, phone set<varint>);
MAP
Map is a data type that is used to store a key-value pair of elements.
Creating a Table with Map
The following example shows how to create a sample table with two columns, name and address. For
storing multiple address values, we are using map.
cqlsh:tutorialspoint> CREATE TABLE data3 (name text PRIMARY KEY, address map<timestamp, text>);
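The following sketch inserts sample values into the three tables created above (the names, addresses and numbers are illustrative):
cqlsh:tutorialspoint> INSERT INTO data (name, email) VALUES ('ramu', ['abc@gmail.com', 'cba@yahoo.com']);
cqlsh:tutorialspoint> INSERT INTO data2 (name, phone) VALUES ('rahman', {9848022338, 9848022339});
cqlsh:tutorialspoint> INSERT INTO data3 (name, address) VALUES ('robin', {'2015-06-01' : 'Hyderabad', '2016-02-01' : 'Chennai'});
List values are written in square brackets, set values in curly braces, and map values as key : value pairs in curly braces.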
Redis
Features of Redis
• Speed: Redis stores data in primary memory, blazing at 110,000 SETs/second and 81,000
GETs/second even on basic Linux setups. Command pipelining and multi-value commands
turbocharge communication.
• Persistence: Data lives in memory but saves changes asynchronously on disk based on time or
update count. Supports append-only file persistence.
• Data Structures: Supports strings, hashes, sets, lists, sorted sets, bitmaps, hyperloglogs, and
geospatial indexes, enabling diverse data manipulation.
• Atomic Operations: Operations on different data types are atomic, ensuring safety for various
actions like setting keys, adding/removing set elements, or increasing counters.
• Supported Languages: Offers extensive language support from ActionScript to Tcl, catering to
diverse developer preferences.
• Master/Slave Replication: Simple setup with one line in the config file. Syncs 10 million keys in 21
seconds on an Amazon EC2 instance.
• Sharding: Effortlessly distributes datasets across multiple instances, simplifying scalability.
• Portability: Written in ANSI C, works on Linux, BSD, macOS, and more, but lacks official
Windows support (although it might work with Cygwin).
53. Explain Redis Architecture.
There are two main processes in Redis architecture:
o Redis Client
o Redis Server
The client and server can be on the same computer or on two different computers.
The Redis server is used to store data in memory. It controls all types of management and forms the main part of the architecture. The Redis client can be the Redis console client (redis-cli), which is installed along with the Redis application, or a Redis API for the programming language you use.
54. Explain Redis keys, strings, hashes, lists, sets, sorted sets, transactions commands with examples.
Redis keys:
Redis keys commands are used for managing keys in Redis. Following is the syntax for using redis keys
commands.
Syntax
redis 127.0.0.1:6379> COMMAND KEY_NAME
Example
redis 127.0.0.1:6379> SET Saail redis
OK
redis 127.0.0.1:6379> DEL Saail
(integer) 1
In the above example, DEL is the command, while Saail is the key. If the key is deleted, then the output of the command will be (integer) 1, otherwise it will be (integer) 0.
Similarly, we can use other key commands such as DUMP, EXISTS, EXPIRE and TTL.
Redis Strings
Redis strings commands are used for managing string values in Redis. Following is the syntax for using
Redis string commands.
Syntax
redis 127.0.0.1:6379> COMMAND KEY_NAME
Example
redis 127.0.0.1:6379> SET tutorialspoint redis
OK
redis 127.0.0.1:6379> GET tutorialspoint
"redis"
In the above example, SET and GET are the commands, while tutorialspoint is the key.
Redis Hashes:
Redis Hashes are maps between the string fields and the string values. Hence, they are the perfect data type
to represent objects.
In Redis, every hash can store more than 4 billion field-value pairs (up to 2^32 - 1).
Example
redis 127.0.0.1:6379> HMSET tutorialspoint name "redis tutorial"
description "redis basic commands for caching" likes 20 visitors 23000
OK
redis 127.0.0.1:6379> HGETALL tutorialspoint
1) "name"
2) "redis tutorial"
3) "description"
4) "redis basic commands for caching"
5) "likes"
6) "20"
7) "visitors"
8) "23000"
In the above example, we have set Redis tutorials detail (name, description, likes, visitors) in hash named
‘tutorialspoint’.
Redis Lists:
Redis Lists are simply lists of strings, sorted by insertion order. You can add elements in Redis lists in the
head or the tail of the list.
Maximum length of a list is 2^32 - 1 elements (4294967295, more than 4 billion elements per list).
Example
redis 127.0.0.1:6379> LPUSH tutorials redis
(integer) 1
redis 127.0.0.1:6379> LPUSH tutorials mongodb
(integer) 2
redis 127.0.0.1:6379> LPUSH tutorials mysql
(integer) 3
redis 127.0.0.1:6379> LRANGE tutorials 0 10
1) "mysql"
2) "mongodb"
3) "redis"
In the above example, three values are inserted in Redis list named ‘tutorials’ by the command LPUSH.
Redis SETS:
Redis Sets are an unordered collection of unique strings. Unique means a set does not allow repetition of data in a key.
In a Redis set you can add, remove, and test for the existence of members in O(1) (constant time regardless of the number of elements contained inside the set). The maximum size of a set is 2^32 - 1 elements (4294967295, more than 4 billion elements per set).
Example
redis 127.0.0.1:6379> SADD tutorials redis
(integer) 1
redis 127.0.0.1:6379> SADD tutorials mongodb
(integer) 1
redis 127.0.0.1:6379> SADD tutorials mysql
(integer) 1
redis 127.0.0.1:6379> SADD tutorials mysql
(integer) 0
redis 127.0.0.1:6379> SMEMBERS tutorials
1) "mysql"
2) "mongodb"
3) "redis"
In the above example, three values are inserted in Redis set named ‘tutorials’ by the command SADD.
Redis Sorted Sets are similar to Redis Sets, with the unique feature that every member of a sorted set is associated with a score, which is used to keep the sorted set ordered from the smallest to the greatest score.
In a Redis sorted set, adding and removing members takes O(log N) time, while testing for the existence of a member is O(1). The maximum size of a sorted set is 2^32 - 1 elements (4294967295, more than 4 billion elements per set).
Example
redis 127.0.0.1:6379> ZADD tutorials 1 redis
(integer) 1
redis 127.0.0.1:6379> ZADD tutorials 2 mongodb
(integer) 1
redis 127.0.0.1:6379> ZADD tutorials 3 mysql
(integer) 1
redis 127.0.0.1:6379> ZADD tutorials 3 mysql
(integer) 0
redis 127.0.0.1:6379> ZADD tutorials 4 mysql
(integer) 0
redis 127.0.0.1:6379> ZRANGE tutorials 0 10 WITHSCORES
1) "redis"
2) "1"
3) "mongodb"
4) "2"
5) "mysql"
6) "4"
In the above example, three values are inserted with their scores into the Redis sorted set named ‘tutorials’ by the command ZADD.
Redis Transactions:
Redis transactions allow the execution of a group of commands in a single step. Following are the two
properties of Transactions.
• All commands in a transaction are sequentially executed as a single isolated operation. It is not
possible that a request issued by another client is served in the middle of the execution of a Redis
transaction.
• Redis transaction is also atomic. Atomic means either all of the commands or none are processed.
Example
A Redis transaction is initiated by the command MULTI; then you pass the list of commands that should be executed in the transaction, after which the entire transaction is executed by the EXEC command.
The following example shows how a Redis transaction is initiated and executed.
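A sketch of such a transaction (the key names are illustrative):
redis 127.0.0.1:6379> MULTI
OK
redis 127.0.0.1:6379> SET tutorial redis
QUEUED
redis 127.0.0.1:6379> INCR visitors
QUEUED
redis 127.0.0.1:6379> GET tutorial
QUEUED
redis 127.0.0.1:6379> EXEC
1) OK
2) (integer) 1
3) "redis"
Each command issued after MULTI is only queued; EXEC then runs the whole queue atomically and returns the replies in order. DISCARD can be used instead of EXEC to abandon the queued commands.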
55. Differentiate between Redis and MongoDB & Redis and RDBMS.
Redis vs MongoDB
Redis vs RDBMS
Cloud databases
1. Schema-less Data Model: Unlike traditional relational databases, the Datastore is schema-less,
meaning that you do not need to define a fixed structure for your data beforehand. You can add
properties to entities (records) dynamically without modifying a formal schema.
2. Automatic Scaling: The Datastore can automatically scale to handle varying levels of read and write
traffic. Google manages the underlying infrastructure, ensuring that your application can scale
seamlessly as demand increases.
3. High Availability and Durability: Datastore is designed to be highly available and durable. It
replicates data across multiple data centers, and it provides strong consistency for reads and writes.
4. Querying: Datastore supports queries for retrieving data based on specific criteria. However, the
querying capabilities are a bit different from traditional relational databases, as it is optimized for
large-scale, distributed environments.
5. Transactions: The Datastore supports transactions to ensure the consistency of data. This allows you
to perform a series of operations on entities as a single, atomic unit.
6. Indexes: Datastore automatically creates indexes for your queries. You can also define custom
indexes to optimize specific queries.
7. Integration with App Engine: Datastore is tightly integrated with Google App Engine, making it easy to use for developers building applications on this platform. However, it can also be used independently of App Engine.
1. Schema-Free: Like many NoSQL databases, SimpleDB is schema-free, meaning you can add and
remove attributes (fields) on the fly without a predefined schema. This flexibility is useful for
applications with evolving or dynamic data requirements.
2. Data Storage: SimpleDB stores data in domains, which are roughly equivalent to database tables.
Each item within a domain is similar to a record or row in a traditional relational database.
3. Attributes and Values: Data is stored as key-value pairs within items. Each item can have multiple
attributes, and each attribute has a corresponding value. This structure allows for efficient and
flexible data modeling.
4. Automatic Scaling: SimpleDB automatically scales in response to changes in data volume and
query traffic. AWS manages the infrastructure, ensuring that the database can handle varying
workloads.
5. Availability and Durability: SimpleDB is designed to be highly available and durable. Data is
automatically replicated across multiple servers and data centers, providing fault tolerance.
6. Query Language: SimpleDB supports a SQL-like query language for retrieving data. Queries are expressed as Select expressions, and you can use conditions to filter and sort results (a short sketch follows this list).
7. Indexed Data: SimpleDB automatically indexes all attributes, making queries efficient. You can also
specify custom indexing for specific attributes.
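A sketch of such a Select expression (the domain and attribute names are illustrative):
select * from my_domain where city = 'Seattle'
The expression reads like SQL, but it always runs against a single domain and returns the matching items with their attributes.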
Consistency –
Consistency means that the nodes will have the same copies of a replicated data item visible for various transactions: a guarantee that every node in a distributed cluster returns the same, most recent, successful write. Consistency refers to every client having the same view of the data. There are various types of consistency models; consistency in CAP refers to sequential consistency, a very strong form of consistency.
Availability –
Availability means that each read or write request for a data item will either be processed successfully or
will receive a message that the operation cannot be completed. Every non-failing node returns a response
for all the read and write requests in a reasonable amount of time. The key word here is “every”. In
simple terms, every node (on either side of a network partition) must be able to respond in a reasonable
amount of time.
Partition Tolerance –
Partition tolerance means that the system can continue operating even if the network connecting the
nodes has a fault that results in two or more partitions, where the nodes in each partition can only
communicate among each other. That means, the system continues to function and upholds its
consistency guarantees in spite of network partitions. Network partitions are a fact of life. Distributed
systems guaranteeing partition tolerance can gracefully recover from partitions once the partition heals.
The CAP theorem states that distributed databases can have at most two of the three properties:
consistency, availability, and partition tolerance. As a result, database systems prioritize only two
properties at a time.
CA (Consistency and Availability) −
The system delivers consistency and availability as long as the network is reliable; it cannot tolerate network partitions. Traditional single-node relational database systems (e.g. MySQL, PostgreSQL) are the usual examples of CA systems.
CP (Consistency and Partition Tolerance) −
The system prioritizes consistency over availability: during a partition, some requests may be refused rather than return stale data. Example databases: MongoDB, HBase, Google Cloud Spanner.
AP (Availability and Partition Tolerance) −
The system prioritizes availability over consistency and can respond with possibly stale data. The system can be distributed across multiple nodes and is designed to operate reliably even in the face of network partitions.
Example databases: Cassandra, CouchDB, Riak, Voldemort, Amazon DynamoDB.
Riak is used as an eventually consistent system in that the data you want to read should remain
available in most failure scenarios, although it may not be the most up-to-date version of that
data.
Features of Riak:
Graph database: Neo4J
60. What are graph databases? Give its structure with examples.
Instead of storing all data on a single server, we distribute it across several servers.
This reduces the load on a single resource and instead distributes it equally across
all the servers. This allows us to serve more requests and traffic from the growing
number of customers while maintaining performance.
• Sharding can be used in system design interviews to help demonstrate a candidate’s
understanding of scalability and database design. When designing a sharded database, the
following key considerations should be taken into account:
• Data distribution: How the data will be split across the shards, either based on a specific key such
as the user ID or by using a hash function.
• Shard rebalancing: How the data will be balanced across the shards as the amount of data
changes over time.
• Query routing: How queries will be directed to the correct shard, either by using a dedicated
routing layer or by including the shard information in the query.
• Data consistency: How data consistency will be maintained across the shards, for example by
using transaction logs or by employing a distributed database system.
• Failure handling: How the system will handle the failure of one or more shards, including data
recovery and data redistribution.
• Performance: How the sharded database will perform in terms of read and write speed, as well as
overall system performance and scalability.
Horizontal scalability, on the other hand, focuses on expanding operational capacity by adding more servers. These servers can range from large, robust ones to smaller, more affordable units. When your architecture is horizontally scalable, meeting growing operational needs becomes as simple as adding more identical servers to distribute the load. Big players like Amazon have mastered this technique, scaling up their infrastructure during peak times, such as the holiday shopping season, and downsizing afterward. They've even monetized this spare compute power by renting it out to other businesses.
• Cypher query language − Neo4j provides a declarative query language to represent the graph
visually, using an ascii-art syntax. The commands of this language are in human readable format and
very easy to learn.
• No joins − Using Neo4j, complex joins are NOT required to retrieve connected/related data, as it is very easy to retrieve a node's adjacent nodes or relationship details without joins or indexes.
67. Explain Neo4j Graph Database building blocks: -
Nodes, Properties, Relationships, Labels, Data Browser
Node
Node is a fundamental unit of a Graph. It contains properties with key-value pairs as shown in the following
image.
Here, Node Name = "Employee" and it contains a set of properties as key-value pairs.
Properties
Properties are key-value pairs used to describe nodes and relationships, where the key is a string and the value may be represented using any of the Neo4j data types.
Relationships
Relationships are another major building block of a Graph Database. It connects two nodes as depicted in
the following figure.
Here, Emp and Dept are two different nodes. "WORKS_FOR" is a relationship between Emp and Dept
nodes.
As it denotes, the arrow mark from Emp to Dept, this relationship describes −
Emp WORKS_FOR Dept
Labels
Label associates a common name with a set of nodes or relationships. A node or relationship can contain one or more labels. We can add new labels to existing nodes or relationships, and we can remove existing labels from existing nodes or relationships.
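A minimal Cypher sketch that creates the two nodes above with labels and properties and connects them with the WORKS_FOR relationship (the property values are illustrative):
CREATE (e:Employee {name: "Ravi", id: 101})
CREATE (d:Dept {name: "Sales"})
CREATE (e)-[:WORKS_FOR]->(d)
RETURN e, d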
68. What is Neo4j CQL and its features?
CQL stands for Cypher Query Language
Neo4j CQL
• Is a query language for Neo4j Graph Database.
• Is a declarative pattern-matching language.
• Follows SQL like syntax.
• Syntax is very simple and in human readable format.
69. What are read and write clauses of Neo4j? list and explain.
Memcached
Components of Memcached:
Memcached is made up of 4 main components. These components allow the client and the server to work together in order to deliver cached data as efficiently as possible:
1. Client Software: It is used to give a list of available Memcached servers.
2. A Client-based hashing algorithm: It chooses a server based on the key.
3. Server Software: It is used to store values and their keys into an internal hash table.
4. LRU: LRU stands for Least Recently Used. This determines when to throw out old data or reuse
memory.
Features of Memcached
o It is open source.
o It is very scalable; just add boxes with memory to spare.
o Memcached runs as a standalone service. So, if you take your application down, the cached data will
remain in memory as long as the service runs.
o Memcached server is a big hash table.
o It reduces the database load.
o It is very efficient for websites with high database load.
o The cache nodes are very ignorant: they have no knowledge about the other nodes participating. This makes the management and configuration of such a system extremely easy.
o It is distributed under BSD (Berkeley Software Distribution) license.
o It is a client server application over UDP or TCP.
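As a sketch, once the server is running you can speak the text protocol to it directly (for example over telnet); the key, flags, expiry time and value below are illustrative:
set greeting 0 900 5
hello
STORED
get greeting
VALUE greeting 0 5
hello
END
The set line names the key, client flags, expiry in seconds and the value length in bytes; the value follows on the next line, and get returns it until it expires or is evicted by the LRU policy.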
72. Differentiate between Memcached and Redis.