NoSQL Complete QB

NoSQL databases are designed to manage large volumes of unstructured and semi-structured data, offering flexible data models that allow for dynamic schema and horizontal scalability. They encompass various types such as document-based, key-value stores, column-oriented, and graph databases, each with unique features and use cases. Unlike traditional SQL databases, NoSQL databases prioritize performance and flexibility, making them suitable for real-time applications and big data environments.


NOSQL Technologies

NoSQL Basics
1. What is NoSQL database?
• NoSQL is a type of database management system (DBMS) that is designed to handle and store large volumes
of unstructured and semi-structured data.
• Instead of tables and predefined schemas to store data, NoSQL databases use flexible data models that can
adapt to changes in data structures and are capable of scaling horizontally to handle growing amounts of
data.
• NoSQL is also referred to as “non-SQL” or “non-relational” databases, but the term has evolved to mean “not
only SQL”, as NoSQL databases have expanded to include a wide range of different database architectures
and data models.
• NoSQL, originally referring to “non-SQL” or “non-relational”, is a database that provides a mechanism for storage
and retrieval of data. This data is modeled in means other than the tabular relations used in relational
databases.
• Such databases came into existence in the late 1960s, but did not obtain the NoSQL moniker until a surge of
popularity in the early twenty-first century. NoSQL databases are used in real-time web applications and big
data, and their use is increasing over time.

2. What are the features of NoSQL databases?

1. Dynamic schema: NoSQL databases do not have a fixed schema and can accommodate changing
data structures without the need for migrations or schema alterations.
2. Horizontal scalability: NoSQL databases are designed to scale out by adding more nodes to a
database cluster, making them well-suited for handling large amounts of data and high levels of
traffic.
3. Document-based: Some NoSQL databases, such as MongoDB, use a document-based data model,
where data is stored in semi-structured format, such as JSON or BSON.
4. Key-value-based: Other NoSQL databases, such as Redis, Amazon DynamoDB, and Oracle NoSQL
Database, use a key-value data model, where data is stored as a collection of key-value pairs.
5. Column-based: Some NoSQL databases, such as Cassandra, HBase, and Hypertable, use a column-
based data model, where data is organized into columns instead of rows.
6. Distributed and high availability: NoSQL databases are often designed to be highly available and
to automatically handle node failures and data replication across multiple nodes in a database
cluster.
7. Flexibility: NoSQL databases allow developers to store and retrieve data in flexible formats
without having to define a fixed structure in advance.
8. Performance: NoSQL databases are optimized for high performance and can handle a high volume
of reads and writes, making them suitable for big data and real-time applications.

3. What are the types of NoSQL databases?


Types of NoSQL databases and the name of the databases system that falls in that category are:
1. Graph Databases: Examples - Amazon Neptune, Neo4j
2. Key-value store: Examples - Memcached, Redis, Coherence
3. Column-oriented: Examples - HBase, Bigtable, Accumulo
4. Document-based: Examples - MongoDB, CouchDB, Cloudant

1. Graph Databases: Examples - Amazon Neptune, Neo4j


Graph-based databases focus on the relationships between elements. They store the data in the form of nodes
in the database. The connections between the nodes are called links or relationships.

2. Key-value store: Examples - Memcached, Redis, Coherence


A key-value store is a nonrelational database. The simplest form of a NoSQL database is a key-value store.
Every data element in the database is stored in key-value pairs. The data can be retrieved by using a unique
key allotted to each element in the database. The values can be simple data types like strings and numbers or
complex objects.

3. Column-oriented: Examples - HBase, Bigtable, Accumulo
A column-oriented database is a non-relational database that stores the data in columns instead of rows. That
means that when you want to run analytics on a small number of columns, you can read those columns directly
without consuming memory on unwanted data.
Columnar databases are designed to read data more efficiently and retrieve the data with greater speed. A
columnar database is used to store a large amount of data.

4. Document-based: Examples - MongoDB, CouchDB, Cloudant


The document-based database is a nonrelational database. Instead of storing the data in rows and columns
(tables), it uses the documents to store the data in the database. A document database stores data in JSON,
BSON, or XML documents.

Documents can be stored and retrieved in a form that is much closer to the data objects used in applications,
which means less translation is required to use this data in the applications. In a document database,
particular elements can be accessed by using the index value that is assigned for faster querying.

4. Differentiate between SQL and NoSQL databases.


• SQL databases are categorized as Relational Database Management Systems (RDBMS), whereas NoSQL
databases are categorized as non-relational or distributed database systems.
• SQL databases have a fixed, static, predefined schema, whereas NoSQL databases have a dynamic schema.
• SQL databases display data in the form of tables, so they are known as table-based databases, whereas
NoSQL databases store data as collections of key-value pairs, documents, graphs, or wide-column stores.
• SQL databases are vertically scalable, whereas NoSQL databases are horizontally scalable.
• SQL databases use a powerful language, "Structured Query Language", to define and manipulate the data,
whereas in NoSQL databases collections of documents are queried; the query language is sometimes called
an unstructured query language and varies from database to database.
• SQL databases are best suited for complex queries, whereas NoSQL databases are not as good for complex
queries because their query languages are not as powerful as SQL.
• SQL databases are not best suited for hierarchical data storage, whereas NoSQL databases are well suited
for hierarchical data storage.
• MySQL, Oracle, SQLite, PostgreSQL, MS-SQL, etc. are examples of SQL databases, whereas MongoDB,
BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j, CouchDB, etc. are examples of NoSQL databases.

5. What is database scalability? Explain.


Database scalability refers to the ability of a database to handle increasing amounts of data, numbers of
users, and types of requests without sacrificing performance or availability. A scalable database tackles
these database server challenges and adapts to growing demands by either adding resources such as
hardware or software, by optimizing its design and configuration, or by undertaking some combined
strategy.
Vertical database scalability means adding more processing power and memory to a single server.
Horizontal database scalability means adding more servers (i.e., database nodes) to distribute the
workload.
Techniques such as sharding or replication are integral to horizontal scalability. These methods partition
and distribute data across multiple servers to improve performance and reliability.
Some scalable NoSQL databases use a distributed architecture to achieve high scalability and
availability. These databases are often designed to be horizontally scalable, meaning that data is
partitioned across multiple servers, and new servers can be added to the system to increase capacity as
needed.

6. Discuss horizontal and vertical scalability in NoSQL databases.
Scaling alters the size of a system. In the scaling process, we either compress or expand the system to
meet the expected needs. The scaling operation can be achieved by adding resources to the current system,
by adding a new system to the existing one, or both.
Types of Scaling:
Scaling can be categorized into 2 types:
Vertical Scaling: When new resources are added to the existing system to meet the expectation, it is
known as vertical scaling.
Consider a rack of servers and resources that comprises the existing system. When the existing system fails
to meet the expected needs, and the expected needs can be met by just adding resources, this is considered
vertical scaling. Vertical scaling is based on the idea of adding more power (CPU, RAM) to existing systems,
basically adding more resources.
Vertical scaling is not only easier but also cheaper than horizontal scaling. It also takes less time to
implement.

Horizontal Scaling: When new server racks are added to the existing system to meet the higher
expectation, it is known as horizontal scaling.

7. Explain ACID properties in RDBMS and in distributed systems.


To maintain consistency in a database, before and after a transaction,
ACID properties are required. ACID stands for Atomicity, Consistency,
Isolation and Durability:
• Atomicity – Guarantees that each transaction is treated as a single
unit that is either executed completely or not at all. If not, the process will stop
and the database will revert to its previous state. This prevents data
corruption or loss in the dataset.
• Consistency – A processed transaction will never endanger the
structural integrity of the database. It ensures that a processed
transaction does not affect the validity of the database, by only
allowing updates according to established rules and policies.
• Isolation – Transactions cannot compromise the integrity of other
transactions by interacting with them while they are still in progress.
• Durability – Committed transactions will remain committed even
upon system failures.

Methods to Achieve ACID Compliance on Distributed Systems


Two-Phase Commit
One common approach for maintaining atomicity is to use the Two Phase Commit (2PC) method, which
enables coordinated transaction management over a distributed system. Coordination refers to the process of
agreement between the distributed system nodes to ensure that the transaction is not committed until all
partitions in the distributed environment acknowledge the transaction.

Multi-Version Concurrency Control


A common approach for ensuring isolation as well as atomicity is Multi-Version Concurrency
Control (MVCC), an effective algorithm that creates point-in-time consistent snapshots in a data system.
With this method, the data is not overwritten but is versioned.

8. Differences between Horizontal and Vertical Scaling.

Document Databases (e.g., MongoDB)

9. Explain aggregate models in NoSQL. (Document databases, Key-value stores, Column-family stores, Graph databases)

23. Differentiate between RDBMS and MongoDB.

Apart from the terms differences, a few other differences are shown below
1. Relational databases are known for enforcing data integrity. This is not an explicit requirement in
MongoDB.
2. RDBMS requires that data be normalized first so that it can prevent orphan records and duplicates.
Normalizing data then has the requirement of more tables, which will then result in more table joins,
thus requiring more keys and indexes. As databases start to grow, performance can start becoming an
issue. Again, this is not an explicit requirement in MongoDB. MongoDB is flexible and does not need the
data to be normalized first.

24. Give the structure of document in MongoDB. Explain with example. (same as 12)

25. Explain MongoDB data Modelling with example.

• The data in MongoDB has a flexible schema.


• Unlike in SQL databases, where you must have a table’s schema declared before inserting data,
MongoDB’s collections do not enforce document structure.
• This sort of flexibility is what makes MongoDB so powerful.
• When modeling data in Mongo, keep the following things in mind
1. What are the needs of the application – Look at the business needs of the application and see
what data, and what type of data, is needed for the application. Based on this, ensure that the structure
of the document is decided accordingly.
2. What are data retrieval patterns – If you foresee a heavy query usage then consider the use of
indexes in your data model to improve the efficiency of queries.
3. Are frequent inserts, updates and removals happening in the database? Reconsider the use of
indexes or incorporate sharding if required in your data modeling design to improve the
efficiency of your overall MongoDB environment.

The below example shows how a document can be modelled in MongoDB.
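A minimal sketch of such an order document (the _id value and the customer field are illustrative assumptions; OrderID, Product, and Quantity are the fields referred to in the notes below):

{
    _id: ObjectId("507f191e810c19729de860ea"),
    CustomerName: "John",
    Orders: [
        { OrderID: 111, Product: "Laptop", Quantity: 2 }
    ]
}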


1. The _id field is added by MongoDB to uniquely identify the document in the
collection.
2. What you can note is that the Order Data (OrderID, Product, and Quantity), which in an
RDBMS would normally be stored in a separate table, is in MongoDB actually
stored as an embedded document in the collection itself. This is one of the key
differences in how data is modelled in MongoDB.

26. Give the methods to create, update, read and delete document/documents in MongoDB
with example.
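A minimal sketch of the usual shell methods (collection name and field values are assumed):

>db.mycol.insert({ title: "MongoDB Overview" })                                          // create a document
>db.mycol.find({ title: "MongoDB Overview" }).pretty()                                   // read it back
>db.mycol.update({ title: "MongoDB Overview" }, { $set: { title: "MongoDB Basics" } })   // update it
>db.mycol.remove({ title: "MongoDB Basics" })                                            // delete it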

27. Explain the methods: - find (), pretty (), skip (), limit (), sort ()
The find() Method
To query data from MongoDB collection, you need to use MongoDB's find() method.
Syntax
The basic syntax of find() method is as follows −
>db.COLLECTION_NAME.find()
find() method will display all the documents in a non-structured way.
Example
Assume we have created a collection named mycol as –

> use sampleDB


switched to db sampleDB
> db.createCollection("mycol")
{ "ok" : 1 }
>

The pretty() Method


To display the results in a formatted way, you can use pretty() method.
Syntax
>db.COLLECTION_NAME.find().pretty()
Example
Following example retrieves all the documents from the collection named mycol and arranges them in an
easy-to-read format.
> db.mycol.find().pretty()
The skip() Method
Apart from the limit() method, there is one more method, skip(), which also accepts a number type argument and is
used to skip that number of documents from the beginning of the result set.
Syntax
The basic syntax of skip() method is as follows −
>db.COLLECTION_NAME.find().limit(NUMBER).skip(NUMBER)
Example
Following example will display only the second document.

>db.mycol.find({},{"title":1,_id:0}).limit(1).skip(1)
{"title":"NoSQL Overview"}
>

The limit() Method


To limit the records in MongoDB, you need to use limit() method. The method accepts one number type
argument, which is the number of documents that you want to be displayed.
Syntax
The basic syntax of limit() method is as follows −
>db.COLLECTION_NAME.find().limit(NUMBER)
Example
Consider that the collection mycol has the following data.

{_id : ObjectId("507f191e810c19729de860e1"), title: "MongoDB Overview"},


{_id : ObjectId("507f191e810c19729de860e2"), title: "NoSQL Overview"},
{_id : ObjectId("507f191e810c19729de860e3"), title: "Tutorials Point Overview"}

The sort() Method
To sort documents in MongoDB, you need to use sort() method. The method accepts a document containing
a list of fields along with their sorting order. To specify sorting order 1 and -1 are used. 1 is used for
ascending order while -1 is used for descending order.
Syntax
The basic syntax of sort() method is as follows −

>db.COLLECTION_NAME.find().sort({KEY:1})

Example
Consider that the collection mycol has the following data.

{_id : ObjectId("507f191e810c19729de860e1"), title: "MongoDB Overview"}


{_id : ObjectId("507f191e810c19729de860e2"), title: "NoSQL Overview"}
{_id : ObjectId("507f191e810c19729de860e3"), title: "Tutorials Point Overview"}

28. What is the meaning of projecting fields in MongoDB? Explain with examples.
In MongoDB, projection means selecting only the necessary data rather
than the whole of the data of a document. If a document has 5
fields and you need to show only 3, then select only those 3 fields.

The find() Method


To query data from MongoDB collection, you need to use MongoDB's find() method.
Syntax
The basic syntax of find() method is as follows −
>db.COLLECTION_NAME.find()
find() method will display all the documents in a non-structured way.
Example
Assume we have created a collection named mycol as −

> use sampleDB


switched to db sampleDB
> db.createCollection("mycol")
{ "ok" : 1 }
>
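A minimal sketch of a projection on that collection (assuming documents with a title field, as in the earlier examples), where 1 includes a field and 0 excludes it:

>db.mycol.find({},{"title":1,_id:0})
{"title":"MongoDB Overview"}
{"title":"NoSQL Overview"}
{"title":"Tutorials Point Overview"}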

29. What are aggregation pipeline stages in MongoDB? List and give their use.
($project, $match, $group, $sort, $skip, $limit, $unwind, $out)
• Aggregation pipeline means the possibility to execute an operation on some input and use the output
as the input for the next command, and so on. MongoDB supports the same concept in its aggregation
framework. There is a set of possible stages, and each of these takes a set of documents as
input and produces a resulting set of documents (or the final resulting JSON document at the end of
the pipeline), which can then in turn be used as input for the next stage, and so on.

• $project − Used to select some specific fields from a collection.


• $match − This is a filtering operation and thus this can reduce the amount of documents that are
given as input to the next stage.
• $group − This does the actual aggregation as discussed above.
• $sort − Sorts the documents.

• $skip − With this, it is possible to skip forward in the list of documents for a given amount of
documents.
• $limit − This limits the amount of documents to look at, by the given number starting from the
current positions.
• $unwind − This is used to unwind documents that use arrays. When using an array, the data is
kind of pre-joined, and this operation undoes that to produce individual documents again.
Thus, with this stage, the amount of documents for the next stage increases.
• $out − Writes the resulting documents of the aggregation pipeline to a specified collection.
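A minimal sketch of chaining such stages (the posts collection, its tags array, and the by_user field are assumptions for illustration):

>db.posts.aggregate([
    { $match: { tags: "mongodb" } },
    { $group: { _id: "$by_user", total: { $sum: 1 } } },
    { $sort: { total: -1 } },
    { $limit: 5 }
])

Here $match filters the input documents, $group counts them per user, $sort orders the groups by that count, and $limit keeps only the top five.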

30. Explain MongoDB $regex regular expression operator with example.


Regular Expressions are frequently used in all languages to search for a pattern or word in any string.
MongoDB also provides functionality of regular expression for string pattern matching using
the $regex operator. MongoDB uses PCRE (Perl Compatible Regular Expression) as regular expression
language.
Unlike text search, we do not need to do any configuration or command to use regular expressions.

For example:
Assume we have inserted a document in a collection named posts.
Using regex Expression
The following regex query searches for all the posts containing string tutorialspoint in it –

> db.posts.find({post_text:{$regex:"tutorialspoint"}}).pretty()
{
"_id" : ObjectId("5dd7ce28f1dd4583e7103fe0"),
"post_text" : "enjoy the mongodb articles on tutorialspoint",
"tags" : [
"mongodb",
"tutorialspoint"
]
}

The same query can also be written as −

>db.posts.find({post_text:/tutorialspoint/})

HBase
31. What is HBase?
• HBase is a distributed column-oriented database built on top of the Hadoop file system.
• It is an open-source project and is horizontally scalable.
• HBase is a data model that is similar to Google’s Bigtable, designed to provide quick random access to
huge amounts of structured data. It leverages the fault tolerance provided by the Hadoop File System
(HDFS).
• It is a part of the Hadoop ecosystem that provides random real-time read/write access to data in the
Hadoop File System.
• One can store the data in HDFS either directly or through HBase.
• Data consumer reads/accesses the data in HDFS randomly using HBase. HBase sits on top of the
Hadoop File System and provides read and write access.

32. How is storage mechanism in HBase? Explain.


• HBase is a column-oriented database and the tables in it are sorted by row.
• The table schema defines only column families, which are the key-value pairs.
• A table can have multiple column families, and each column family can have any number of columns.
• Subsequent column values are stored contiguously on the disk. Each cell value of the table has a
timestamp.
• In short, in an HBase:
o Table is a collection of rows.
o Row is a collection of column families.
o Column family is a collection of columns.
o Column is a collection of key value pairs.
• Given below is an example schema of table in HBase.
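A minimal sketch of such a schema (table name and column families are assumptions), created from the HBase shell:

create 'emp', 'personal data', 'professional data'

Here the emp table has a row key plus two column families: 'personal data' might hold columns such as name and city, while 'professional data' might hold designation and salary.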

33. Differentiate between HBase and RDBMS.

CouchDB

34. What is CouchDB? Why to use CouchDB?


• CouchDB is an open-source NoSQL database which focuses on ease of use. It is developed
by the Apache Software Foundation.
• It is fully compatible with the web. CouchDB uses JSON to store data, JavaScript as its query language to
transform the documents using MapReduce, and HTTP for an API.

Why CouchDB:
• CouchDB is easy to use. There is one word to describe CouchDB: “Relax”. It is also the byline of the
official CouchDB logo.
• CouchDB has an HTTP-based REST API, which makes communication with the database very
easy.
• CouchDB has the simple structure of HTTP resources and methods (GET, PUT, DELETE) that are
easy to understand and use.
• In CouchDB, data is stored in the flexible document-based structure so, there is no need to worry
about the structure of the data.
• CouchDB facilitates users with powerful data mapping, which allows querying, combining, and
filtering the information.
• CouchDB provides easy-to-use replication, using which you can copy, share, and synchronize the
data between databases and machines.

35. Give the features of CouchDB.
Following is a list of most attractive features of CouchDB:
• Document Storage: CouchDB is a NoSQL database which follows document storage. Documents are
the primary unit of data where each field is uniquely named and contains values of various data types
such as text, number, Boolean, lists, etc. Documents don't have a set limit to text size or element
count.
• Browser Based GUI: CouchDB provides an interface called Futon, which facilitates a browser-based GUI to
handle your data, permissions, and configuration.
• Replication: CouchDB provides the simplest form of replication. No other database is so
simple to replicate.
• ACID Properties: The CouchDB file layout follows all the features of ACID properties. Once the
data is entered into the disk, it will not be overwritten. Document updates (add, edit, delete) follow
Atomicity, i.e., they will be saved completely or not saved at all. The database will not have any
partially saved or edited documents. Almost all of these updates are serialized and any number of
clients can read a document without waiting and without being interrupted.
• JSONP for Free: If you update your config to allow_jsonp = true, then your database is accessible
cross-domain for GET requests.
• Authentication and Session Support: CouchDB facilitates you to keep authentication open via a
session cookie, like a web application.
• Security: CouchDB also provides database-level security. The permissions per database are separated
into readers and admins. Readers can both read and write to the database.
• Validation: You can validate data inserted into the database by combining it with authentication to
ensure the creator of the document is the one who is logged in.
• Map/Reduce, List and Show: The main reason behind the popularity of MongoDB and CouchDB is the
map/reduce system.

36. Explain the architecture of CouchDB.


• CouchDB Engine: It is based on B-tree and in it, data is accessed by keys or key ranges which map
directly to the underlying B-tree operations. It is the core of the system which manages to store
internal data, documents, and views.
• HTTP Request: It is used to create indices and extract data from documents. Views are written in
JavaScript, allowing the creation of ad hoc views that are made of MapReduce jobs.
• Document: It stores a large amount of data.
• Replica Database: It is used for replicating data to a local or remote database and synchronizing
design documents.

37. Differentiate between MongoDB and CouchDB.

38. How to communicate with CouchDB using curl? Explain.


The cURL utility is a way to communicate with CouchDB.
It is a tool to transfer data from or to a server, using one of the supported protocols (HTTP, HTTPS, FTP,
FTPS, TFTP, DICT, TELNET, LDAP or FILE). The command is designed to work without user interaction.
The cURL utility is available in operating systems such as UNIX, Linux, Mac OS X and Windows.
Using cURL Utility
You can access any website using the cURL utility by simply typing curl followed by the website address, as
shown below −

curl www.facebook.com/

You can access the homepage of the CouchDB by sending a GET request to the CouchDB instance
installed.

curl http://127.0.0.1:5984/

This gives you a JSON document as shown below where CouchDB specifies the details such as version
number, name of the vendor, and version of the software.

$ curl http://127.0.0.1:5984/
{
"couchdb" : "Welcome",
"uuid" : "8f0d59acd0e179f5e9f0075fa1f5e804",
"version" : "1.6.1",
"vendor" : {
"name":"The Apache Software Foundation",
"version":"1.6.1"
}
}

List of All Databases


Following is the syntax to get the list of all databases in CouchDB.

curl -X GET http://127.0.0.1:5984/_all_dbs

It gives you the list of all databases in CouchDB as shown below.

$ curl -X GET http://127.0.0.1:5984/_all_dbs


[ "_replicator" , "_users" ]

39. Explain curl utility commands to create, update, delete documents with examples.
Creating a Database
You can create a database in CouchDB using cURL with PUT header using the following syntax −
$ curl -X PUT http://127.0.0.1:5984/database_name
Example
As an example, using the above given syntax create a database with name my_database as shown below.

$ curl -X PUT http://127.0.0.1:5984/my_database


{"ok":true}

Verification
Verify whether the database is created by listing out all the databases as shown below. Here you can
observe the name of the newly created database, "my_database", in the list.

$ curl -X GET http://127.0.0.1:5984/_all_dbs

[ "_replicator " , "_users" , "my_database" ]

Updating Documents using cURL
You can update a document in CouchDB by sending an HTTP request to the server using PUT method
through cURL utility. Following is the syntax to update a document.

curl -X PUT http://127.0.0.1:5984/database_name/document_id -d '{ "field" : "value", "_rev" : "revision id" }'

Example
Suppose there is a document with id 001 in the database named my_database. You can update it as shown
below.
First of all, get the revision id of the document that is to be updated. You can find the _rev of the document
in the document itself, therefore get the document as shown below.

$ curl -X GET http://127.0.0.1:5984/my_database/001


{
"_id" : "001",
"_rev" : "2-04d8eac1680d237ca25b68b36b8899d3 " ,
"age" : "23"
}
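With that revision id, a sketch of the actual update (the new age value is assumed) looks like:

$ curl -X PUT http://127.0.0.1:5984/my_database/001 -d '{ "age" : "24", "_rev" : "2-04d8eac1680d237ca25b68b36b8899d3" }'

If the update succeeds, CouchDB responds with {"ok":true}, the document id, and a new revision id.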

Deleting a Document using cURL Utility


You can delete a document in CouchDB by sending an HTTP request to the server using DELETE method
through cURL utility. Following is the syntax to delete a document.

curl -X DELETE http://127.0.0.1:5984/database_name/document_id?rev=revision_id

Using -X, we can specify a custom HTTP request method to use while communicating with the
HTTP server. In this case, we are using the DELETE method. To delete a document, /database_name/document_id is
not enough. You have to pass the latest revision id of the document through the URL; the "?" is used to append
the rev query parameter.
Example
Suppose there is a document in database named my_database with document id 001. To delete this
document, you have to get the rev id of the document. Get the document data as shown below.

$ curl -X GET http://127.0.0.1:5984/my_database/001


{
" _id " : " 001 ",
" _rev " : " 2-04d8eac1680d237ca25b68b36b8899d3 " ,
" age " : " 23 "
}
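Using that revision id, a sketch of the delete request is:

$ curl -X DELETE http://127.0.0.1:5984/my_database/001?rev=2-04d8eac1680d237ca25b68b36b8899d3

CouchDB again responds with {"ok":true}, the document id, and a new (deleted) revision id.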

40. Explain CouchDB HTTP API.
Using HTTP request headers, you can communicate with CouchDB. Through these requests we can
retrieve data from the database, store data into the database in the form of documents, and view
as well as format the documents stored in a database.
While communicating with the database we will use different request formats like get, head, post, put,
delete, and copy. For all operations in CouchDB, the input data and the output data structures will be in
the form of JavaScript Object Notation (JSON) object.


HTTP Request Formats


Following are the different request formats of HTTP Protocol used to communicate with CouchDB.

• GET − This format is used to get a specific item. To get different items, you have to send specific
url patterns. In CouchDB using this GET request, we can get static items, database documents and
configuration, and statistical information in the form of JSON documents (in most cases).
• HEAD − The HEAD method is used to get the HTTP header of a GET request without the body of
the response.
• POST − Post request is used to upload data. In CouchDB using POST request, you can set values,
upload documents, set document values, and can also start certain administration commands.
• PUT − Using PUT request, you can create new objects, databases, documents, views and design
documents.
• DELETE − Using DELETE request, you can delete documents, views, and design documents.
• COPY − Using COPY method, you can copy documents and objects.

Response Headers

These are the headers of the response sent by the server. These headers give information about the content
sent by the server as a response.

• Content-type − This header specifies the MIME type of the data returned by the server. For most
requests, the returned MIME type is text/plain.
• Cache-control − This header tells the client how to treat the information sent by the server.
CouchDB mostly returns must-revalidate, which indicates that the information should be
revalidated if possible.
• Content-length − This header returns the length of the content sent by the server, in bytes.
• Etag − This header is used to show the revision for a document, or a view.

Cassandra

41. What is Cassandra? Why to use it?


• Cassandra is defined as an open-source NoSQL data storage system that leverages a distributed
architecture to enable high availability, scalability, and reliability; it is maintained by the Apache
Software Foundation.
• Cassandra is an open-source NoSQL distributed database that manages large amounts of data across
commodity servers. It is a decentralized, scalable storage system designed to handle vast volumes of
data across multiple commodity servers, providing high availability without a single point of failure.

42. Give important points of Cassandra.


• It is scalable, fault-tolerant, and consistent.
• It is a column-oriented database.
• Its distribution design is based on Amazon’s Dynamo and its data model on Google’s
Bigtable.
• Created at Facebook, it differs sharply from relational database management systems.
• Cassandra implements a Dynamo-style replication model with no single point of
failure, but adds a more powerful “column family” data model.
• Cassandra is being used by some of the biggest companies, such as Facebook, Twitter, Cisco,
Rackspace, eBay, Netflix, and more.

43. What are the features of Cassandra?


Cassandra has become so popular because of its outstanding technical features. Given below
are some of the features of Cassandra:
• Elastic scalability − Cassandra is highly scalable; it allows adding more hardware to
accommodate more customers and more data as per requirement.
• Always on architecture − Cassandra has no single point of failure and it is
continuously available for business-critical applications that cannot afford a failure.
• Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your
throughput as you increase the number of nodes in the cluster. Therefore it maintains
a quick response time.
• Flexible data storage − Cassandra accommodates all possible data formats including:
structured, semi-structured, and unstructured. It can dynamically accommodate
changes to your data structures according to your need.
• Easy data distribution − Cassandra provides the flexibility to distribute data where
you need by replicating data across multiple data centers.
• Transaction support − Cassandra supports properties like Atomicity, Consistency,
Isolation, and Durability (ACID).
• Fast writes − Cassandra was designed to run on cheap commodity hardware. It
performs blazingly fast writes and can store hundreds of terabytes of data, without
sacrificing the read efficiency.

44. Diagrammatically explain Cassandra architecture with its components.
o Cassandra was designed to handle big data workloads across multiple nodes without a single point of
failure. It has a peer-to-peer distributed system across its nodes, and data is distributed among all the
nodes in a cluster.
o In Cassandra, each node is independent and at the same time interconnected to other nodes. All the
nodes in a cluster play the same role.
o Every node in a cluster can accept read and write requests, regardless of where the data is actually
located in the cluster.
o In the case of failure of one node, Read/Write requests can be served from other nodes in the network.

The main components of Cassandra are:


o Node: A Cassandra node is a place where data is stored.
o Data center: Data center is a collection of related nodes.
o Cluster: A cluster is a component which contains one or more data centers.
o Commit log: In Cassandra, the commit log is a crash-recovery mechanism. Every write operation
is written to the commit log.
o Mem-table: A mem-table is a memory-resident data structure. After the commit log, the data will be
written to the mem-table. Sometimes, for a single column family, there will be multiple mem-
tables.
o SSTable: It is a disk file to which the data is flushed from the mem-table when its contents reach
a threshold value.
o Bloom filter: These are nothing but quick, nondeterministic algorithms for testing whether an
element is a member of a set. It is a special kind of cache. Bloom filters are accessed after every
query.

45. What is CQL? Explain write and read operations in Cassandra.
Cassandra Query Language (CQL) is used to access Cassandra through its nodes. CQL treats the
database (Keyspace) as a container of tables. Programmers use cqlsh (a prompt to work with CQL) or
separate application language drivers.

The client can approach any of the nodes for its read-write operations. That node (the coordinator) acts as a
proxy between the client and the nodes holding the data.

Write Operations:

Every write activity of nodes is captured by the commit logs written in the nodes. Later the data will be
captured and stored in the mem-table. Whenever the mem-table is full, data will be written into the SSTable
data file. All writes are automatically partitioned and replicated throughout the cluster. Cassandra periodically
consolidates the SSTables, discarding unnecessary data.

Read Operations
In Read operations, Cassandra gets values from the mem-table and checks the bloom filter to find the
appropriate SSTable which contains the required data.
There are three types of read requests that a coordinator sends to replicas.
o Direct request
o Digest request
o Read repair request
The coordinator sends a direct request to one of the replicas. After that, the coordinator sends a digest request
to the number of replicas specified by the consistency level and checks whether the returned data is up to date.
After that, the coordinator sends a digest request to all the remaining replicas. If any node gives an out-of-date
value, a background read repair request will update that data. This process is called the read repair mechanism.

46. Explain Cassandra Data Model: Cluster & Keyspace.

The data model in Cassandra is totally different from what we normally see in an RDBMS. Let's see how
Cassandra stores its data.

Cluster
Cassandra database is distributed over several machines that are operated together. The
outermost container is known as the Cluster which contains different nodes. Every node
contains a replica, and in case of a failure, the replica takes charge. Cassandra arranges the
nodes in a cluster, in a ring format, and assigns data to them.

Keyspace
Keyspace is the outermost container for data in Cassandra. Following are the basic attributes of
Keyspace in Cassandra:
• Replication factor: It specifies the number of machines in the cluster that will receive copies of the
same data.
• Replica placement strategy: It is a strategy which specifies how to place replicas in the ring.
There are three types of strategies such as:
1) Simple strategy (rack-unaware strategy)
2) Old network topology strategy (rack-aware strategy)
3) Network topology strategy (datacenter-shared strategy)
• Column families: Column families are placed under the keyspace. A keyspace is a container for a list of
one or more column families, while a column family is a container of a collection of rows.
• Each row contains ordered columns. Column families represent the structure of your data.
• Each keyspace has at least one and often many column families. In Cassandra, a good data model is
very important, because a bad data model can degrade performance, especially when you try to
implement RDBMS concepts on Cassandra.
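A minimal sketch (keyspace name assumed) of creating a keyspace with a replica placement strategy and replication factor in CQL:

cqlsh> CREATE KEYSPACE tutorialspoint
   WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};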

47. Explain the concept of indexing and ordering in CouchDB & Cassandra.

INDEXING AND ORDERING IN COUCHDB


• Unlike MongoDB, the indexing features in CouchDB are automatic and are triggered for all changed
data sets when they are first read after the change.
• When data is accessed for the first time, a B-tree index is built out of this data. On subsequent
querying, the data is returned from the B-tree and the underlying data is untouched.
• This means queries beyond the very first one leverage the B-tree index.

• The B-tree Index in CouchDB:

A B-tree index scales well for large amounts of data. In CouchDB, the B-tree implementation has
specialized features like MultiVersion Concurrency Control and append-only design. MultiVersion
Concurrency Control (MVCC) implies that multiple reads and writes can occur in parallel without
the need for exclusive locking. All writes are sequenced and reads are not impacted by writes. An
“append-only” design refers to a data storage approach where new data is only added (appended) to
the database, and existing data is not updated or deleted.

INDEXING IN APACHE CASSANDRA:
• Indexing in Apache Cassandra is a way to improve the efficiency and performance of queries on non-
primary key columns.
• In Cassandra, data is organized in tables and each table has a primary key, which consists of one or
more columns that uniquely identify each row in the table.
• Queries that use the primary key to retrieve data are very efficient, but queries that use other columns
in the WHERE clause can be slower.
• Cassandra has secondary indexes that enable querying on columns other than the main key columns
to solve this problem.
• A secondary index is built on a table’s column, and it maintains a different index data structure that
associates the values of the indexed column with the associated table rows.
• Looking up the rows in the index and then obtaining the relevant data from the table enables
queries on that column to be processed quickly.

48. Differentiate between HBase and Cassandra.

49. Differentiate between RDBMS and Cassandra.

50. Explain the concept of index in Apache Cassandra.
Index:
• We can access data using attributes that form the partition key.
• For example, if Emp_id is a column of an Employee table and is the partition key of that table,
then we can filter or search data with the help of that partition key.
• In this case, we can use the WHERE clause to define a condition over the attribute and search the data.
• But suppose there exists a column which is not a partition key of that table and we want to filter,
search, or access data on it using the WHERE clause; then the query will not be executed and will give an
error.
• So, to access data in that case using attributes other than the partition key, for fast and efficient
lookup of data matching a given condition, we need to define an index. It can be used for various
purposes, such as for collections, static columns, collection columns, and any other columns except
counter columns.

When to use an Index:


• Built-in indexes are the best option on a table that has many rows containing the indexed value.
• The more unique values that exist in a particular column, the more overhead there will be, on
average, to query and maintain the index.
• Example:
Suppose you had a cricket match entry table with a million entries for players across hundreds of
matches and wanted to look up players by the year of the match they played in. Many entries
will share the same column value for match year. The match_year column is a good option for an
index.
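Continuing that example, a minimal sketch of defining and using such an index in CQL (table and column names are assumed):

cqlsh:tutorialspoint> CREATE INDEX match_year_idx ON matches (match_year);
cqlsh:tutorialspoint> SELECT * FROM matches WHERE match_year = 2015;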

51. Write a short note on Cassandra Collections.


CQL provides the facility of using Collection data types. Using these Collection types, you can store
multiple values in a single variable.

List
List is used in the cases where
• the order of the elements is to be maintained, and
• a value is to be stored multiple times.
You can get the values of a list data type using the index of the elements in the list.
Creating a Table with List
Given below is an example to create a sample table with two columns, name and email. To store multiple
emails, we are using list.
cqlsh:tutorialspoint> CREATE TABLE data(name text PRIMARY KEY, email list<text>);
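A minimal usage sketch (the inserted values are assumed) for this list column:

cqlsh:tutorialspoint> INSERT INTO data (name, email) VALUES ('ramu', ['abc@gmail.com', 'cba@yahoo.com']);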

SET
Set is a data type that is used to store a group of elements. The elements of a set will be returned in a sorted
order.
Creating a Table with Set
The following example creates a sample table with two columns, name and phone. For storing multiple
phone numbers, we are using set.
cqlsh:tutorialspoint> CREATE TABLE data2 (name text PRIMARY KEY, phone set<varint>);

MAP
Map is a data type that is used to store a key-value pair of elements.
Creating a Table with Map
The following example shows how to create a sample table with two columns, name and address. For
storing multiple address values, we are using map.
cqlsh:tutorialspoint> CREATE TABLE data3 (name text PRIMARY KEY, address map<timestamp, text>);

Redis

52. What is Redis? Give its features.


Redis is a NoSQL database which follows the principle of a key-value store. The key-value store provides the
ability to store some data, called a value, inside a key. You can retrieve this data later only if you know the
exact key used to store it.
Redis is a flexible, open-source (BSD licensed), in-memory data structure store, used as database, cache,
and message broker. Redis is a NoSQL database, so it facilitates users to store huge amounts of data
without the limits of a relational database.
Redis supports various types of data structures like strings, hashes, lists, sets, sorted sets, bitmaps,
hyperloglogs and geospatial indexes with radius queries.

Features of Redis
• Speed: Redis stores data in primary memory, blazing at 110,000 SETs/second and 81,000
GETs/second even on basic Linux setups. Command pipelining and multi-value commands
turbocharge communication.
• Persistence: Data lives in memory but saves changes asynchronously on disk based on time or
update count. Supports append-only file persistence.
• Data Structures: Supports strings, hashes, sets, lists, sorted sets, bitmaps, hyperloglogs, and
geospatial indexes, enabling diverse data manipulation.
• Atomic Operations: Operations on different data types are atomic, ensuring safety for various
actions like setting keys, adding/removing set elements, or increasing counters.
• Supported Languages: Offers extensive language support from ActionScript to Tcl, catering to
diverse developer preferences.
• Master/Slave Replication: Simple setup with one line in the config file. Syncs 10 million keys in 21
seconds on an Amazon EC2 instance.
• Sharding: Effortlessly distributes datasets across multiple instances, simplifying scalability.
• Portability: Written in ANSI C, works on Linux, BSD, macOS, and more, but lacks official
Windows support (although it might work with Cygwin).

53. Explain Redis Architecture.
There are two main processes in Redis architecture:
o Redis Client
o Redis Server
The client and server can be on the same computer or on two different computers.

Redis server is used to store data in memory. It controls all types of management and forms the main part
of the architecture. You get a Redis console client (redis-cli) when you install the Redis
application, or you can use a Redis client library for your programming language.
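A minimal sketch of connecting the console client to a local Redis server and checking that it responds:

$ redis-cli
redis 127.0.0.1:6379> PING
PONG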

54. Explain Redis keys, strings, hashes, lists, sets, sorted sets, transactions commands with examples.

Redis keys:
Redis keys commands are used for managing keys in Redis. Following is the syntax for using redis keys
commands.
Syntax
redis 127.0.0.1:6379> COMMAND KEY_NAME
Example
redis 127.0.0.1:6379> SET Saail redis
OK
redis 127.0.0.1:6379> DEL Saail
(integer) 1
In the above example, DEL is the command, while Saail is the key. If the key is deleted, then the output of
the command will be (integer) 1, otherwise it will be (integer) 0.
Similarly, we can use DUMP, EXISTS, TTL, etc.

Redis Strings
Redis strings commands are used for managing string values in Redis. Following is the syntax for using
Redis string commands.
Syntax
redis 127.0.0.1:6379> COMMAND KEY_NAME
Example
redis 127.0.0.1:6379> SET tutorialspoint redis
OK
redis 127.0.0.1:6379> GET tutorialspoint
"redis"
In the above example, SET and GET are the commands, while tutorialspoint is the key.
Redis Hashes:
Redis Hashes are maps between the string fields and the string values. Hence, they are the perfect data type
to represent objects.
In Redis, every hash can store up to 2^32 - 1 field-value pairs (more than 4 billion).

Example
redis 127.0.0.1:6379> HMSET tutorialspoint name "redis tutorial"
description "redis basic commands for caching" likes 20 visitors 23000
OK
redis 127.0.0.1:6379> HGETALL tutorialspoint
1) "name"
2) "redis tutorial"
3) "description"
4) "redis basic commands for caching"
5) "likes"
6) "20"
7) "visitors"
8) "23000"

In the above example, we have set the Redis tutorial details (name, description, likes, visitors) in a hash named
‘tutorialspoint’.

Similarly we can use: HDEL, HGET, HEXISTS, HKEYS, etc…

Redis Lists:
Redis Lists are simply lists of strings, sorted by insertion order. You can add elements in Redis lists in the
head or the tail of the list.
The maximum length of a list is 2^32 - 1 elements (4,294,967,295; more than 4 billion elements per list).

Example
redis 127.0.0.1:6379> LPUSH tutorials redis
(integer) 1
redis 127.0.0.1:6379> LPUSH tutorials mongodb
(integer) 2
redis 127.0.0.1:6379> LPUSH tutorials mysql
(integer) 3
redis 127.0.0.1:6379> LRANGE tutorials 0 10
1) "mysql"
2) "mongodb"
3) "redis"

In the above example, three values are inserted in Redis list named ‘tutorials’ by the command LPUSH.

Redis SETS:

Redis Sets are an unordered collection of unique strings. Unique means that sets do not allow repetition of data
within a key.

In a Redis set, you can add, remove, and test for the existence of members in O(1) (constant time regardless of the
number of elements contained inside the set). The maximum size of a set is 2^32 - 1 elements
(4,294,967,295; more than 4 billion elements per set).
Example
redis 127.0.0.1:6379> SADD tutorials redis
(integer) 1
redis 127.0.0.1:6379> SADD tutorials mongodb
(integer) 1
redis 127.0.0.1:6379> SADD tutorials mysql
(integer) 1
redis 127.0.0.1:6379> SADD tutorials mysql
(integer) 0
redis 127.0.0.1:6379> SMEMBERS tutorials
1) "mysql"
2) "mongodb"
3) "redis"

In the above example, three values are inserted in Redis set named ‘tutorials’ by the command SADD.

Redis Sorted Sets:

Redis Sorted Sets are similar to Redis Sets, with the unique feature that every member of a Sorted Set is
associated with a score. The score is used to keep the sorted set ordered, from the smallest to the greatest
score.

In a Redis sorted set, you can add, remove, and test for the existence of members in O(1) (constant time regardless of
the number of elements contained inside the set). The maximum size of a sorted set is 2^32 - 1 elements
(4,294,967,295; more than 4 billion elements per set).

Example
redis 127.0.0.1:6379> ZADD tutorials 1 redis
(integer) 1
redis 127.0.0.1:6379> ZADD tutorials 2 mongodb
(integer) 1
redis 127.0.0.1:6379> ZADD tutorials 3 mysql
(integer) 1
redis 127.0.0.1:6379> ZADD tutorials 3 mysql
(integer) 0
redis 127.0.0.1:6379> ZADD tutorials 4 mysql
(integer) 0
redis 127.0.0.1:6379> ZRANGE tutorials 0 10 WITHSCORES
1) "redis"
2) "1"
3) "mongodb"
4) "2"
5) "mysql"
6) "4"

In the above example, three values are inserted with their scores into the Redis sorted set named ‘tutorials’ by the
command ZADD.

Redis Transactions:

Redis transactions allow the execution of a group of commands in a single step. Following are the two
properties of Transactions.

• All commands in a transaction are sequentially executed as a single isolated operation. It is not
possible that a request issued by another client is served in the middle of the execution of a Redis
transaction.
• Redis transaction is also atomic. Atomic means either all of the commands or none are processed.

Sample

Redis transaction is initiated by command MULTI and then you need to pass a list of commands that should
be executed in the transaction, after which the entire transaction is executed by EXEC command.

redis 127.0.0.1:6379> MULTI


OK
List of commands here
redis 127.0.0.1:6379> EXEC
Example

Following example explains how Redis transaction can be initiated and executed.

redis 127.0.0.1:6379> MULTI


OK
redis 127.0.0.1:6379> SET tutorial redis
QUEUED
redis 127.0.0.1:6379> GET tutorial
QUEUED
redis 127.0.0.1:6379> INCR visitors
QUEUED
redis 127.0.0.1:6379> EXEC
1) OK
2) "redis"
3) (integer) 1

55. Differentiate between Redis and MongoDB & Redis and RDBMS.
Redis vs MongoDB

Redis vs RDBMS

Cloud databases

56. Write a note on GOOGLE APP ENGINE DATA STORE.


• Google App Engine Datastore is a NoSQL database service provided by Google Cloud Platform
(GCP) as part of the Google App Engine platform. It is designed to store and retrieve data for
applications running on Google’s infrastructure.
• The Datastore is a fully managed, schema-less, and scalable database that automatically handles data
sharding and replication.

Key features of Google App Engine Datastore include:

1. Schema-less Data Model: Unlike traditional relational databases, the Datastore is schema-less,
meaning that you do not need to define a fixed structure for your data beforehand. You can add
properties to entities (records) dynamically without modifying a formal schema.
2. Automatic Scaling: The Datastore can automatically scale to handle varying levels of read and write
traffic. Google manages the underlying infrastructure, ensuring that your application can scale
seamlessly as demand increases.
3. High Availability and Durability: Datastore is designed to be highly available and durable. It
replicates data across multiple data centers, and it provides strong consistency for reads and writes.
4. Querying: Datastore supports queries for retrieving data based on specific criteria. However, the
querying capabilities are a bit different from traditional relational databases, as it is optimized for
large-scale, distributed environments.
5. Transactions: The Datastore supports transactions to ensure the consistency of data. This allows you
to perform a series of operations on entities as a single, atomic unit.
6. Indexes: Datastore automatically creates indexes for your queries. You can also define custom
indexes to optimize specific queries.
7. Integration with App Engine: Datastore is tightly integrated with Google App Engine, making it
easy to use for developers building applications on this platform. However, it can also be used
independently of App Engine.

57. Write a note on AMAZON SIMPLEDB.


Amazon SimpleDB is a distributed NoSQL database service provided by Amazon Web Services (AWS).
It is designed to provide a simple and scalable solution for storing and querying structured data.
SimpleDB is part of AWS’s suite of cloud computing services, offering a fully managed, schema-free
database that automatically scales to meet the demands of applications.

Key features of Amazon SimpleDB include:

1. Schema-Free: Like many NoSQL databases, SimpleDB is schema-free, meaning you can add and
remove attributes (fields) on the fly without a predefined schema. This flexibility is useful for
applications with evolving or dynamic data requirements.
2. Data Storage: SimpleDB stores data in domains, which are roughly equivalent to database tables.
Each item within a domain is similar to a record or row in a traditional relational database.
3. Attributes and Values: Data is stored as key-value pairs within items. Each item can have multiple
attributes, and each attribute has a corresponding value. This structure allows for efficient and
flexible data modeling.

4. Automatic Scaling: SimpleDB automatically scales in response to changes in data volume and
query traffic. AWS manages the infrastructure, ensuring that the database can handle varying
workloads.
5. Availability and Durability: SimpleDB is designed to be highly available and durable. Data is
automatically replicated across multiple servers and data centers, providing fault tolerance.
6. Query Language: SimpleDB supports a SQL-like query language for retrieving data. Queries are
expressed using a SQL-like SELECT syntax, and you can use conditions to filter
and sort results.
7. Indexed Data: SimpleDB automatically indexes all attributes, making queries efficient. You can also
specify custom indexing for specific attributes.

58. Explain CAP theorem.


• The three letters in CAP refer to three desirable properties of distributed systems with replicated
data: consistency (among replicated copies), availability (of the system for read and write operations)
and partition tolerance (in the face of the nodes in the system being partitioned by a network fault).
• The CAP theorem states that it is not possible to guarantee all three of the desirable properties –
consistency, availability, and partition tolerance at the same time in a distributed system with data
replication.
• The theorem states that networked shared-data systems can only strongly support two of the
following three properties:

Consistency –
Consistency means that the nodes will have the same copies of a replicated data item visible for various
transactions. A guarantee that every node in a distributed cluster returns the same, most recent and a
successful write. Consistency refers to every client having the same view of the data. There are various
types of consistency models. Consistency in CAP refers to sequential consistency, a very strong form of
consistency.

Availability –
Availability means that each read or write request for a data item will either be processed successfully or
will receive a message that the operation cannot be completed. Every non-failing node returns a response
for all the read and write requests in a reasonable amount of time. The key word here is “every”. In
simple terms, every node (on either side of a network partition) must be able to respond in a reasonable
amount of time.

Partition Tolerance –
Partition tolerance means that the system can continue operating even if the network connecting the
nodes has a fault that results in two or more partitions, where the nodes in each partition can only
communicate among each other. That means, the system continues to function and upholds its
consistency guarantees in spite of network partitions. Network partitions are a fact of life. Distributed
systems guaranteeing partition tolerance can gracefully recover from partitions once the partition heals.

The CAP theorem states that distributed databases can have at most two of the three properties:
consistency, availability, and partition tolerance. As a result, database systems prioritize only two
properties at a time.
CA (Consistency and Availability) -
The system provides consistency and availability but assumes the network never partitions, so it cannot
tolerate partition failures.
Example databases: traditional single-node relational systems such as MySQL and PostgreSQL.

AP (Availability and Partition Tolerance) -
The system prioritizes availability over consistency and can respond with possibly stale data.
The system can be distributed across multiple nodes and is designed to operate reliably even in the face
of network partitions.
Example databases: Cassandra, CouchDB, Riak, Amazon DynamoDB.

CP(Consistency and Partition Tolerance)-


The system prioritizes consistency over availability and responds with the latest updated data.
The system can be distributed across multiple nodes and is designed to operate reliably even in the face
of network partitions.
Example databases: Apache HBase, MongoDB, Redis, Google Cloud Spanner.
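The trade-off can be illustrated with a toy sketch (not a real database) of one key replicated on three
nodes. An AP-style read answers from whichever replica responds, even a stale one, while a CP-style read
insists on a quorum and the newest version, and refuses to answer if no quorum is reachable.

# Toy sketch contrasting AP-style and CP-style reads over three replicas of one key.
replicas = [
    {"value": "v2", "version": 2},  # up to date
    {"value": "v2", "version": 2},  # up to date
    {"value": "v1", "version": 1},  # stale, e.g. cut off by a network partition
]

def ap_read(replicas):
    # Availability first: answer from whichever replica responds, even if stale.
    return replicas[-1]["value"]

def cp_read(replicas, quorum=2):
    # Consistency first: require a quorum and return the newest version;
    # if a partition leaves no quorum reachable, refuse to answer.
    reachable = replicas[:quorum]  # pretend only these replicas respond
    if len(reachable) < quorum:
        raise RuntimeError("no quorum reachable - unavailable during partition")
    return max(reachable, key=lambda r: r["version"])["value"]

print(ap_read(replicas))  # may print the stale 'v1'
print(cp_read(replicas))  # prints the latest 'v2'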

59. What is Riak? Give its features.


Riak is a distributed database designed to deliver maximum data availability by distributing data
across multiple servers. As long as your Riak client can reach one Riak server, it should be able
to write data.

Riak is an eventually consistent system: the data you want to read should remain available in most
failure scenarios, although it may not be the most up-to-date version of that data.

Features of Riak:

• Distributed Architecture: Riak is designed to operate in a distributed environment, spreading
data across multiple nodes. This allows it to handle large amounts of data and provide fault
tolerance.
• High Availability: It ensures that data remains accessible even in the event of hardware
failures or network partitions. Riak's replication and partitioning strategies contribute to its
high availability.
• Fault Tolerance: It employs techniques like replication and data partitioning to ensure that
data is preserved and accessible, even if certain nodes in the cluster fail.
• AP Design: Riak follows the CAP theorem, prioritizing availability and partition tolerance
over strict consistency. This makes it suitable for scenarios where availability is crucial, even
in the face of network partitions.
• Key-Value Store: Riak operates as a distributed key-value store, making it easy to store and
retrieve data using keys.
• Conflict Resolution: It provides conflict resolution mechanisms to handle conflicts that may
arise due to concurrent updates to the same data in a distributed system.
• Riak Search: It offers Riak Search, which allows for querying and indexing data stored in
Riak using various search criteria.
• Multi-Datacenter Replication: Riak supports replication of data across multiple data centers,
enabling better disaster recovery and geographical distribution of data.
• HTTP and Protocol Buffers Interfaces: Riak provides HTTP and Protocol Buffers interfaces
for interacting with the database, offering flexibility in integrating it with various
applications and systems (a minimal HTTP sketch follows this list).
• Extensibility: It's designed to be extensible, allowing users to add custom functionality
through various hooks and plug-ins.
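As a rough illustration of the HTTP interface mentioned above, the sketch below stores and reads one
value using the requests library. The host, default port 8098 and the /buckets/<bucket>/keys/<key> path
follow Riak's commonly documented defaults, but may differ per installation and version.

# Minimal sketch of Riak's HTTP key-value interface (defaults assumed, adjust as needed).
import requests

BASE = "http://localhost:8098"

def put_value(bucket, key, value):
    # Store a value under (bucket, key); Riak treats the body as opaque data.
    r = requests.put(
        f"{BASE}/buckets/{bucket}/keys/{key}",
        data=value,
        headers={"Content-Type": "text/plain"},
    )
    r.raise_for_status()

def get_value(bucket, key):
    # Read it back; with eventual consistency this may briefly return an older value.
    r = requests.get(f"{BASE}/buckets/{bucket}/keys/{key}")
    r.raise_for_status()
    return r.text

put_value("users", "u42", "Nitin Sharma")
print(get_value("users", "u42"))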

Graph database: Neo4J
60. What are graph databases? Give its structure with examples.

A graph is a pictorial representation of a set of objects where some pairs of objects are connected by
links. It is composed of two elements - nodes (vertices) and relationships (edges).

A graph database is a database used to model data in the form of a graph. Here, the nodes of the graph
depict the entities, while the relationships depict the associations between these nodes.
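A toy in-memory sketch of this structure (the node and relationship names are made up) shows how
entities and their associations are stored and traversed directly, without joins.

# Toy picture of the graph model: nodes are entities with properties,
# relationships are typed, directed links between them.
nodes = {
    "emp1": {"label": "Employee", "name": "Nitin"},
    "dept1": {"label": "Dept", "name": "Sales"},
}
relationships = [
    ("emp1", "WORKS_FOR", "dept1"),
]

# Traversal is simply following the stored links:
for start, rel_type, end in relationships:
    print(nodes[start]["name"], rel_type, nodes[end]["name"])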

61. Differentiate between RDBMS and Graph database.

• Data model: An RDBMS stores data in tables of rows and columns, whereas a graph database stores data
as nodes and relationships.
• Relationships: In an RDBMS, relationships are expressed through foreign keys and resolved with joins;
in a graph database, relationships are stored directly between nodes as first-class citizens.
• Schema: An RDBMS requires a fixed, predefined schema; a graph database is schema-flexible.
• Querying: An RDBMS uses SQL; a graph database such as Neo4j uses a graph query language like Cypher,
and traversing connected data avoids the cost of many joins.

62. How to scale a graph database/ explain database sharding?


Sharding is a way of scaling horizontally. A sharded database architecture splits a
large database into several smaller databases. Each smaller component is called a
shard.

Instead of storing all data on a single server, we distribute it across several servers.
This reduces the load on a single resource and instead distributes it equally across
all the servers. This allows us to serve more requests and traffic from the growing
number of customers while maintaining performance.

• Sharding can be used in system design interviews to help demonstrate a candidate’s
understanding of scalability and database design. When designing a sharded database, the
following key considerations should be taken into account:
• Data distribution: How the data will be split across the shards, either based on a specific key such
as the user ID or by using a hash function (a minimal routing sketch follows this list).
• Shard rebalancing: How the data will be balanced across the shards as the amount of data
changes over time.
• Query routing: How queries will be directed to the correct shard, either by using a dedicated
routing layer or by including the shard information in the query.
• Data consistency: How data consistency will be maintained across the shards, for example by
using transaction logs or by employing a distributed database system.
• Failure handling: How the system will handle the failure of one or more shards, including data
recovery and data redistribution.
• Performance: How the sharded database will perform in terms of read and write speed, as well as
overall system performance and scalability.
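A minimal sketch of hash-based shard routing is shown below. The shard names and the user_id key are
hypothetical; production systems usually prefer consistent hashing so that adding or removing a shard
does not remap most keys.

# Minimal sketch of hash-based shard routing (shard names and key are hypothetical).
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(user_id: str) -> str:
    # Hash the sharding key and map it onto one of the shards.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user-1001"))  # all queries for this user are routed to the same shard
print(shard_for("user-1002"))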

63. Explain Vertically and Horizontally scaling graph databases.


Vertical scalability involves enhancing the capability of a single server to manage increased operational
demands. This is done by beefing up its resources like memory, processing power, or storage. While it's
relatively straightforward for smaller setups to add more capacity, there are limitations on how much a
single server can handle, especially as you scale up. At higher levels, it becomes more complex and
expensive to upgrade a single server significantly.

Horizontal scalability, on the other hand, focuses on expanding operational capacity by adding more
servers. These servers can range from large, robust ones to smaller, more affordable units. When your
architecture is horizontally scalable, meeting growing operational needs becomes as simple as adding
more identical servers to distribute the load. Big players like Amazon have mastered this technique,
scaling up their infrastructure during peak times, such as the holiday shopping season, and downsizing
afterward. They've even monetized this spare compute power by renting it out to other businesses.

64. What is Neo4j? what are its advantages.


Neo4j is one of the most popular Graph Databases and uses the Cypher Query Language (CQL). Neo4j is
written in Java. It is the world's leading open source Graph Database, and it is highly scalable and
schema free (NoSQL).
Advantages of Neo4J:
• Flexible data model − Neo4j provides a flexible simple and yet powerful data model, which can be
easily changed according to the applications and industries.
• Real-time insights − Neo4j provides results based on real-time data.
• High availability − Neo4j is highly available for large enterprise real-time applications with
transactional guarantees.
• Connected and semi-structured data − Using Neo4j, you can easily represent connected and semi-
structured data.
• Easy retrieval − Using Neo4j, you can not only represent but also easily retrieve (traverse/navigate)
connected data faster when compared to other databases.

• Cypher query language − Neo4j provides a declarative query language to represent the graph
visually, using an ascii-art syntax. The commands of this language are in human readable format and
very easy to learn.
• No joins − Using Neo4j, it does NOT require complex joins to retrieve connected/related data as it is
very easy to retrieve its adjacent node or relationship details without joins or indexes.

65. Give the features of Neo4j.


• Data model (flexible schema) − Neo4j follows a data model named native property graph model.
Here, the graph contains nodes (entities) and these nodes are connected with each other (depicted by
relationships). Nodes and relationships store data in key-value pairs known as properties.
In Neo4j, there is no need to follow a fixed schema. You can add or remove properties as per
requirement. It also provides schema constraints.
• ACID properties − Neo4j supports full ACID (Atomicity, Consistency, Isolation, and Durability)
rules.
• Scalability and reliability − You can scale the database by increasing the number of reads/writes,
and the volume without affecting the query processing speed and data integrity. Neo4j also provides
support for replication for data safety and reliability.
• Cypher Query Language − Neo4j provides a powerful declarative query language known as
Cypher. It uses ASCII-art for depicting graphs. Cypher is easy to learn and can be used to create and
retrieve relations between data without writing complex queries like joins.
• Built-in web application − Neo4j provides a built-in Neo4j Browser web application. Using this,
you can create and query your graph data.
• Drivers − Neo4j can work with −
o REST API to work with programming languages such as Java, Spring, Scala etc.
o JavaScript to work with UI MVC frameworks such as Node JS.
o It supports two kinds of Java API: Cypher API and Native Java API to develop Java
applications. In addition to these, you can also work with other databases such as MongoDB,
Cassandra, etc.
• Indexing − Neo4j supports Indexes by using Apache Lucene.

66. Explain Neo4j Data Model.


• The data model in Neo4j organizes data using the concepts of nodes and relationships. Both
nodes and relationships can have properties, which store the data items associated with nodes
and relationships.
• Nodes can have labels:
o A node can have zero, one, or several labels.
o The nodes that have the same label are grouped into a collection that identifies a subset
of the nodes in the database graph for querying purposes.
• Relationships are directed; each relationship has a start node and an end node, as well as a
relationship type, which serves a similar role to a node label by identifying similar relationships
that have the same relationship type.
• Properties can be specified via a map pattern, which is made of one or more "name : value" pairs
enclosed in curly brackets.
• Example: {Lname: 'Sharma', Fname: 'Nitin', Minit: 'B'}.
• The Neo4j graph data model resembles how data is represented in the ER and EER models more than
in conventional graph theory.

67. Explain Neo4j Graph Database building blocks: -
Nodes, Properties, Relationships, Labels, Data Browser

Node
Node is a fundamental unit of a Graph. It contains properties stored as key-value pairs.

For example, a node named "Employee" contains a set of properties as key-value pairs.

Properties

Property is a key-value pair to describe Graph Nodes and Relationships.


Key = Value

Where Key is a String and Value may be represented using any Neo4j Data types.

Relationships

Relationships are another major building block of a Graph Database. A relationship connects two nodes.

For example, Emp and Dept are two different nodes, and "WORKS_FOR" is a relationship between the Emp
and Dept nodes. The arrow mark from Emp to Dept shows the direction of the relationship, which reads −
Emp WORKS_FOR Dept

Labels

Label associates a common name to a set of nodes or relationships. A node or relationship can contain one
or more labels. We can create new labels to existing nodes or relationships. We can remove the existing
labels from the existing nodes or relationships.

Data Browser

Neo4j provides a built-in Data Browser (the Neo4j Browser web application). Using it, you can execute
CQL commands and view the created graph data as nodes and relationships.
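As a small illustration, the sketch below creates the Emp WORKS_FOR Dept pattern described above using
the official neo4j Python driver. The connection URI and credentials are placeholders for a local
installation.

# Minimal sketch creating nodes, a relationship, labels and properties with the
# neo4j Python driver (URI and credentials are placeholders).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

create_query = """
CREATE (e:Employee {name: $emp_name})-[:WORKS_FOR]->(d:Dept {name: $dept_name})
"""

with driver.session() as session:
    session.run(create_query, emp_name="Nitin", dept_name="Sales")

driver.close()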
68. What is Neo4j CQL and its features?
CQL stands for Cypher Query Language
Neo4j CQL
• Is a query language for Neo4j Graph Database.
• Is a declarative pattern-matching language.
• Follows an SQL-like syntax.
• Its syntax is very simple and in a human readable format.

Neo4j CQL Features


o CQL is the query language for the Neo4j Graph Database.
o It is a declarative pattern-matching language (an example query is shown below).
o The syntax of CQL is similar to SQL syntax.
o The syntax of CQL is very simple and in a human readable format.
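A short read query in Cypher, run through the same (placeholder) driver setup as the earlier sketch,
might look like this:

# Minimal sketch of a Cypher MATCH query via the neo4j Python driver
# (URI and credentials are placeholders for a local installation).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

match_query = """
MATCH (e:Employee)-[:WORKS_FOR]->(d:Dept {name: $dept_name})
RETURN e.name AS employee
"""

with driver.session() as session:
    for record in session.run(match_query, dept_name="Sales"):
        print(record["employee"])

driver.close()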

69. What are read and write clauses of Neo4j? list and explain.

Neo4j CQL clauses are broadly grouped into read clauses, which fetch data from the graph without
changing it, and write clauses, which create and modify data.

Read clauses:
• MATCH − the basic pattern-matching clause used to search for nodes and relationships.
• OPTIONAL MATCH − like MATCH, but returns null for missing parts of the pattern instead of discarding
the row.
• WHERE − filters the results of a MATCH or OPTIONAL MATCH.
• RETURN − specifies which nodes, relationships and properties appear in the query result.

Write clauses:
• CREATE − creates nodes and relationships.
• MERGE − matches an existing pattern, or creates it if it does not exist.
• SET − adds or updates labels and properties.
• DELETE − deletes nodes and relationships.
• REMOVE − removes labels and properties from nodes and relationships.

Memcached

70. What is Memcached, components, & its features?


• Memcached is pronounced as mem-cash-dee or mem-cached. It is a free, open-source, high-performance,
distributed memory object caching system. Memcached is used to speed up dynamic web applications by reducing
the database load. Memcached is used by all the major websites having huge data for example, YouTube,
Wikipedia, Twitter etc.
• Memcached is a popular choice for in-memory caching because it is very easy to install on any Windows or Unix
system. It offers API integration for all the major languages like PHP, Java, C/C++, Python, Ruby, Perl etc.
• It stores small, arbitrary pieces of data (strings, objects) against keys, including:

o Results of database calls
o API calls
o Page rendering

Components of Memcached:
Memcached is made up of 4 main components. These components allow the client and the server to work
together in order to deliver cached data as efficiently as possible:
1. Client Software: It is used to give a list of available Memcached servers.
2. A Client-based hashing algorithm: It chooses a server based on the key (a small sketch of this idea
follows the list).
3. Server Software: It is used to store values and their keys into an internal hash table.
4. LRU: LRU stands for Least Recently Used. This determines when to throw out old data or reuse
memory.
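Component 2 above can be illustrated with a toy key-to-server mapping; the server addresses are made up,
and real client libraries usually use consistent hashing so that adding or removing a server remaps as
few keys as possible.

# Toy sketch of client-side hashing: the client, not the server, decides
# which Memcached instance owns a given key.
import hashlib

SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def server_for(key: str) -> str:
    digest = hashlib.md5(key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print(server_for("user:42:profile"))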

Features of Memcached
o It is open source.
o It is very scalable; just add boxes with memory to spare.
o Memcached runs as a standalone service. So, if you take your application down, the cached data will
remain in memory as long as the service runs.
o Memcached server is a big hash table.
o It reduces the database load.
o It is very efficient for websites with high database load.
o The cache nodes are very ignorant: which means they have no knowledge about other nodes
participating. This makes the management and configuration of such a system extremely easy.
o It is distributed under BSD (Berkeley Software Distribution) license.
o It is a client server application over UDP or TCP.

71. How does Memcached Work?


• Memcached has four main components and these components are what allow it to store and retrieve data.
Each item comprises a key, expiration time, and raw data.
• At a high-level Memcached works as follows:
o The client requests a piece of data, then Memcached checks to see if it is stored in cache.
o There are two possible outcomes:
o If the data is stored in cache: return the data from Memcached (there is no need to check the
database).
o If the data is not stored in cache: query the database, retrieve the data, and subsequently store it
in Memcached.
o Whenever information is modified or the expiry value of an item has expired, Memcached updates its
cache to ensure fresh content is delivered to the client (this cache-aside flow is sketched in code
after this answer).
• This setup has various Memcached servers and many clients. Clients use a hashing algorithm to determine
which Memcached storage server to use. This helps to distribute the load.
• The server then computes a second hash of the key in order to determine where it should store the
corresponding value in an internal hash table. Some important things about Memcached architecture are:
o Data is only sent to one server
o Servers don't share data
o Servers keep the values in RAM - if RAM runs out the oldest value is discarded.
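The cache-aside flow described above might look roughly like this with the pymemcache client;
fetch_user_from_db and the key naming scheme are hypothetical stand-ins for a real application.

# Minimal cache-aside sketch using pymemcache (values are stored as bytes).
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def fetch_user_from_db(user_id):
    # Placeholder for a real (and comparatively slow) database query.
    return f"user-record-{user_id}".encode()

def get_user(user_id):
    key = f"user:{user_id}"
    value = cache.get(key)                # 1. check the cache first
    if value is not None:
        return value                      # cache hit: no database work needed
    value = fetch_user_from_db(user_id)   # 2. cache miss: query the database
    cache.set(key, value, expire=300)     # 3. store it with a TTL for next time
    return value

print(get_user(42))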

72. Differentiate between Memcached and Redis.

Both Memcached and Redis are open-source, in-memory key-value stores, but they differ in several ways:
• Data types: Memcached stores simple string/object values, while Redis supports rich data structures
such as strings, lists, sets, sorted sets and hashes.
• Persistence: Memcached is purely an in-memory cache with no persistence; Redis can persist data to
disk using snapshots and append-only files.
• Replication: Memcached has no built-in replication; Redis supports replication and high-availability
setups.
• Threading: Memcached is multi-threaded and can use multiple CPU cores; Redis executes commands on a
single thread.
• Use cases: Memcached suits simple, high-speed object caching; Redis is used as a cache, a data store
and a message broker.
