UNIT 4 CAP MONGODB


UNIT 4 Topics covered:

1. CAP THEOREM
2. NoSQL, types of NoSQL databases
3. MongoDB: introduction, history of MongoDB, installation and configuration,
key features, core servers and tools, basic commands
4. Comparison of relational databases to MongoDB, Cassandra, HBase, etc.

Unit 4_DBMS
Topic: The CAP Theorem in DBMS
The CAP theorem, originally introduced as the CAP principle, can be used to
explain some of the competing requirements in a distributed system with
replication. It is a tool used to make system designers aware of the trade-offs
while designing networked shared-data systems.
The three letters in CAP refer to three desirable properties of distributed systems
with replicated data: consistency (among replicated copies), availability (of the
system for read and write operations) and partition tolerance (in the face of the
nodes in the system being partitioned by a network fault). The CAP theorem states
that it is not possible to guarantee all three of the desirable properties –
consistency, availability, and partition tolerance at the same time in a distributed
system with data replication.
The theorem states that networked shared-data systems can strongly support only
two of the following three properties:
1. Consistency – Consistency means that the nodes hold the same copies
of a replicated data item, visible to the various transactions. It is a
guarantee that every node in a distributed cluster returns the same, most
recent successful write, so that every client has the same view of the
data. There are various types of consistency models. Consistency in CAP
refers to linearizability (atomic consistency), a very strong form of
consistency.

2. Availability – Availability means that each read or write request for a data
item will either be processed successfully or will receive a message that
the operation cannot be completed. Every non-failing node returns a
response for all the read and write requests in a reasonable amount of time.
The key word here is “every”. In simple terms, every node (on either side
of a network partition) must be able to respond in a reasonable amount of
time.

DBMS NOTES BY DR. DEEPIKA BHATIA


3. Partition Tolerance – Partition tolerance means that the system can
continue operating even if the network connecting the nodes has a fault that
results in two or more partitions, where the nodes in each partition can only
communicate with each other. That means the system continues to
function and upholds its consistency guarantees in spite of network
partitions. Network partitions are a fact of life. Distributed systems
guaranteeing partition tolerance can gracefully recover from partitions
once the partition heals.

The word consistency in CAP and the word consistency in ACID do not refer to the
same concept. In CAP, the term consistency refers to the consistency of
the values in different copies of the same data item in a replicated distributed
system. In ACID, it refers to the fact that a transaction will not violate the integrity
constraints specified on the database schema.
The CAP theorem states that distributed databases can have at most two of the
three properties: consistency, availability, and partition tolerance. As a result,
database systems prioritize only two properties at a time.
The following figure represents which database systems prioritize specific
properties at a given time:

CAP theorem with databases examples

1. CA (Consistency and Availability) – The system provides both consistency
and availability as long as no network partition occurs; it cannot tolerate a
partition. Example databases: single-node relational databases such as MySQL
and PostgreSQL.

2. AP (Availability and Partition Tolerance) – The system prioritizes
availability over consistency and can respond with possibly stale data. The
system can be distributed across multiple nodes and is designed to operate
reliably even in the face of network partitions. Example databases: Cassandra,
CouchDB, Riak, Voldemort, Amazon DynamoDB.

3. CP (Consistency and Partition Tolerance) – The system prioritizes
consistency over availability and responds with the latest updated data.
The system can be distributed across multiple nodes and is designed to
operate reliably even in the face of network partitions. Example databases:
Apache HBase, MongoDB, Redis, Google Cloud Spanner.
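
The trade-off between CP and AP behaviour during a partition can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class names `Replica` and `TinyCluster`, the two-replica setup and the `partitioned` flag are all invented for illustration and do not correspond to any real database.

```python
class Replica:
    """One copy of a single replicated data item."""

    def __init__(self, value):
        self.value = value


class TinyCluster:
    """Two replicas that behave as either a CP or an AP system (toy model)."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a = Replica("v1")
        self.b = Replica("v1")
        self.partitioned = False  # True means a and b cannot communicate

    def write(self, replica, value):
        if not self.partitioned:
            # Healthy network: both copies are updated together.
            self.a.value = self.b.value = value
            return "ok"
        if self.mode == "CP":
            # Refuse the write rather than let the copies diverge.
            return "error: unavailable during partition"
        # AP: accept the write locally; the other replica is now stale.
        replica.value = value
        return "ok"

    def read(self, replica):
        return replica.value


cp = TinyCluster("CP")
cp.partitioned = True
cp_result = cp.write(cp.a, "v2")   # rejected, replicas stay identical

ap = TinyCluster("AP")
ap.partitioned = True
ap_result = ap.write(ap.a, "v2")   # accepted, replica b still holds "v1"

print(cp_result, cp.read(cp.a), cp.read(cp.b))
print(ap_result, ap.read(ap.a), ap.read(ap.b))
```

In the CP case the client gets an error but never sees two different values; in the AP case every request is answered, at the cost of a stale read from the second replica.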

2. NoSQL, Types of NoSQL databases


What is NoSQL?
NoSQL Database is a non-relational Data Management System, that does not require a fixed
schema. It avoids joins, and is easy to scale. The major purpose of using a NoSQL database is
for distributed data stores with humongous data storage needs. NoSQL is used for Big data and
real-time web apps. For example, companies like Twitter, Facebook and Google collect
terabytes of user data every single day.

NoSQL stands for “Not Only SQL” or “Not SQL.” Though a better term would be
“NoREL”, NoSQL caught on. Carlo Strozzi introduced the NoSQL name in 1998.

A traditional RDBMS uses SQL syntax to store and retrieve data for further insights. In contrast, a
NoSQL database system encompasses a wide range of database technologies that can store
structured, semi-structured, unstructured and polymorphic data.

Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like Google, Facebook,
Amazon, etc. who deal with huge volumes of data. The system response time becomes slow
when you use RDBMS for massive volumes of data.

To resolve this problem, we could “scale up” our systems by upgrading our existing hardware.
This process is expensive.

The alternative for this issue is to distribute database load on multiple hosts whenever the load
increases. This method is known as “scaling out.”

Because NoSQL databases are non-relational and were designed with web applications in mind,
they generally scale out more easily than relational databases.

Brief History of NoSQL Databases

• 1998- Carlo Strozzi used the term NoSQL for his lightweight, open-source relational
database
• 2000- Graph database Neo4j is launched
• 2004- Google BigTable is launched
• 2005- CouchDB is launched
• 2007- The research paper on Amazon Dynamo is released
• 2008- Facebook open-sources the Cassandra project
• 2009- The term NoSQL was reintroduced

Features of NoSQL
Non-relational

• NoSQL databases do not follow the relational model
• Do not provide tables with flat fixed-column records
• Work with self-contained aggregates or BLOBs
• Do not require object-relational mapping or data normalization
• No complex features like query languages, query planners, referential integrity, joins, or
ACID

Schema-free

• NoSQL databases are either schema-free or have relaxed schemas
• Do not require any sort of definition of the schema of the data
• Offer heterogeneous structures of data in the same domain

• NoSQL is Schema-Free

Simple API

• Offers easy-to-use interfaces for storing and querying data
• APIs allow low-level data manipulation and selection methods
• Text-based protocols, mostly HTTP REST with JSON
• Mostly no standards-based NoSQL query language
• Web-enabled databases running as internet-facing services

Distributed

• Multiple NoSQL databases can be executed in a distributed fashion
• Offers auto-scaling and fail-over capabilities
• The ACID concept is often sacrificed for scalability and throughput
• Mostly no synchronous replication between distributed nodes; instead asynchronous
multi-master replication, peer-to-peer replication or HDFS replication
• Often provides only eventual consistency
• Shared-nothing architecture, which enables less coordination and higher distribution

NoSQL is Shared Nothing.

Types of NoSQL Databases


NoSQL databases are mainly categorized into four types: key-value pair, column-oriented,
graph-based and document-oriented. Every category has its unique attributes and limitations.
No single type is best for solving all problems. Users should select
the database based on their product needs.

Types of NoSQL Databases:

• Key-value pair based
• Column-oriented
• Graph-based
• Document-oriented

Key Value Pair Based


Data is stored in key/value pairs. It is designed in such a way to handle lots of data and heavy
load.

Key-value pair storage databases store data as a hash table where each key is unique, and the
value can be a JSON, BLOB(Binary Large Objects), string, etc.

For example, a key-value pair may contain a key like “Website” associated with a value like
“Guru99”.

It is one of the most basic types of NoSQL database. This kind of NoSQL database is used for
collections, dictionaries, associative arrays, etc. Key-value stores help the developer to store
schema-less data. They work well for shopping cart contents.

Redis, Dynamo and Riak are some examples of key-value store databases. Dynamo and Riak
follow the design described in Amazon’s Dynamo paper.
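
As a minimal sketch of the key-value idea, a Python dict can stand in for the hash table. The helper names `put` and `get` and the key `cart:42` are assumptions made up for this illustration; real key-value stores expose their own client APIs.

```python
import json

store = {}  # a plain dict stands in for the store's hash table


def put(key, value):
    # The store treats the value as an opaque blob; here it is JSON text.
    store[key] = json.dumps(value)


def get(key):
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


put("Website", "Guru99")
put("cart:42", {"items": [{"sku": "A1", "qty": 2}]})  # shopping cart contents

print(get("Website"))
print(get("cart:42"))
```

Lookups go only through the key; the store itself cannot search inside the value, which is the main difference from the document stores described later.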

Column-based
Column-oriented databases work on columns and are based on the BigTable paper by Google.
Every column is treated separately. The values of a single column are stored contiguously.

Column based NoSQL database


They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as
the data is readily available in a column.
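
A small sketch can show why aggregation is cheap in a column layout. The sample rows and the field names `city` and `amount` are invented for illustration; the point is only that each column becomes one contiguous list.

```python
# Row layout: one record per row, as in a relational table.
rows = [
    {"id": 1, "city": "Delhi", "amount": 120},
    {"id": 2, "city": "Pune", "amount": 80},
    {"id": 3, "city": "Delhi", "amount": 50},
]

# Column layout: the values of each column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregations touch a single column and never read the others.
total = sum(columns["amount"])
count = len(columns["amount"])
smallest = min(columns["amount"])
average = total / count

print(columns["amount"], total, count, smallest, average)
```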

Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM and library card catalogs.

HBase, Cassandra and Hypertable are examples of column-based databases.

Document-Oriented
Document-Oriented NoSQL DB stores and retrieves data as a key value pair but the value part
is stored as a document. The document is stored in JSON or XML formats. The value is
understood by the DB and can be queried.

Relational Vs. Document

In the diagram, the left side shows a relational table with rows and columns, and the right side shows
a document database, which has a structure similar to JSON. For the relational database,
you have to know in advance which columns you have. For a document database, you
store data as JSON-like objects and do not need to define their structure beforehand, which makes it flexible.
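
The following sketch illustrates the two points made above: documents in one collection may have different fields, and the database can query inside the value. The `find` helper and the sample documents are hypothetical and only mimic the idea; they are not the API of any particular document database.

```python
# Two documents in the same collection with different sets of fields.
collection = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "tags": ["vip"], "address": {"city": "Pune"}},
]


def find(docs, **criteria):
    """Return the documents whose top-level fields equal the given criteria."""
    return [
        doc for doc in docs
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]


matches = find(collection, name="Ravi")
print(matches)
```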

The document type is mostly used for CMS systems, blogging platforms, real-time analytics and
e-commerce applications. It should not be used for complex transactions which require multiple
operations or queries against varying aggregate structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular
document-oriented DBMS systems.

Graph-Based
A graph database stores entities as well as the relations amongst those entities. The entity is
stored as a node, with the relationships as edges. An edge gives a relationship between nodes.
Every node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph database is
multi-relational in nature. Traversing relationships is fast as they are already captured in the
DB, and there is no need to calculate them.
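
As a rough illustration of nodes, edges and traversal, the sketch below stores edges explicitly and walks them breadth-first. The node names, the edge identifiers and the `FOLLOWS` label are made up for this example.

```python
from collections import deque

# Nodes and edges each carry their own identifier.
nodes = {1: "Alice", 2: "Bob", 3: "Carol", 4: "Dave"}
edges = [
    ("e1", 1, "FOLLOWS", 2),
    ("e2", 2, "FOLLOWS", 3),
    ("e3", 1, "FOLLOWS", 4),
]

# Relationships are stored, so traversal is a lookup rather than a computed join.
adjacency = {}
for _edge_id, source, _label, target in edges:
    adjacency.setdefault(source, []).append(target)


def reachable(start):
    """Breadth-first walk returning the names reachable from the start node."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(nodes[neighbour])
                queue.append(neighbour)
    return order


print(reachable(1))
```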

Graph databases are mostly used for social networks, logistics and spatial data.

Neo4j, InfiniteGraph, OrientDB and FlockDB are some popular graph-based databases.

Query Mechanism tools for NoSQL


The most common data retrieval mechanism is REST-based retrieval of a value by
its key/ID with a GET request on the resource.

Document store databases offer more complex queries, as they understand the value in a
key-value pair. For example, CouchDB allows defining views with MapReduce.

What is the CAP Theorem?

The CAP theorem is also called Brewer’s theorem. It states that it is impossible for a distributed data
store to offer more than two of the following three guarantees:

1. Consistency
2. Availability
3. Partition Tolerance

Consistency:

The data should remain consistent even after the execution of an operation. This means that once
data is written, any future read request should return that data. For example, after updating
the order status, all the clients should be able to see the same data.

Availability:

The database should always be available and responsive. It should not have any downtime.

Partition Tolerance:

Partition tolerance means that the system should continue to function even if the
communication among the servers is not stable. For example, the servers can be partitioned
into multiple groups which may not communicate with each other. Here, if part of the database
is unavailable, the other parts should keep working.

Eventual Consistency
The term “eventual consistency” arises from keeping copies of data on multiple machines to get high
availability and scalability. Changes made to any data item on one machine have to be
propagated to the other replicas.

Data replication may not be instantaneous, as some copies will be updated immediately while
others are updated in due course of time. These copies may be mutually inconsistent for a while, but in
due course of time they become consistent. Hence the name eventual consistency.
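
The propagation delay described above can be sketched with a queue of pending updates. The replica names `r1` to `r3` and the functions `write` and `propagate_one` are assumptions for this toy model; real systems use far more elaborate replication protocols.

```python
from collections import deque

replicas = {"r1": {}, "r2": {}, "r3": {}}
pending = deque()  # updates not yet delivered: (target replica, key, value)


def write(origin, key, value):
    """Apply the write locally and queue it for the other replicas."""
    replicas[origin][key] = value
    for name in replicas:
        if name != origin:
            pending.append((name, key, value))


def propagate_one():
    """Deliver a single queued update, simulating asynchronous replication."""
    if pending:
        target, key, value = pending.popleft()
        replicas[target][key] = value


write("r1", "status", "shipped")
stale_read = replicas["r3"].get("status")  # r3 has not received the update yet

while pending:
    propagate_one()

converged = all(r.get("status") == "shipped" for r in replicas.values())
print(stale_read, converged)
```

Between the write and the end of propagation, a read from a lagging replica returns old data; once the queue drains, all replicas agree.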

BASE: Basically Available, Soft state, Eventual consistency

• Basically available means the DB is available all the time, in the sense of availability in the CAP theorem
• Soft state means the system state may change even without input
• Eventual consistency means that the system will become consistent over time

Advantages of NoSQL

• Can be used as a primary or analytic data source
• Big data capability
• No single point of failure
• Easy replication
• No need for a separate caching layer
• Provides fast performance and horizontal scalability
• Can handle structured, semi-structured, and unstructured data equally well
• Supports object-oriented programming that is easy to use and flexible
• Does not need a dedicated high-performance server
• Supports key developer languages and platforms
• Simpler to implement than an RDBMS
• It can serve as the primary data source for online applications.
• Handles big data which manages data velocity, variety, volume, and complexity
• Excels at distributed database and multi-data center operations
• Eliminates the need for a specific caching layer to store data
• Offers a flexible schema design which can easily be altered without downtime or
service disruption

Disadvantages of NoSQL

• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively mature
• It does not offer some traditional database capabilities, like consistency when multiple
transactions are performed simultaneously
• When the volume of data increases, it becomes difficult to maintain unique values as keys
• Doesn’t work as well with relational data
• The learning curve is steep for new developers
• Many options are open source, which makes them less popular with enterprises

What is MongoDB? Introduction, Architecture, Features & Example

What is MongoDB?

MongoDB is a document-oriented NoSQL database used for high-volume data storage. Instead
of using tables and rows as in traditional relational databases, MongoDB makes use of
collections and documents. Documents consist of key-value pairs, which are the basic unit of
data in MongoDB. Collections contain sets of documents and are the equivalent
of relational database tables. MongoDB came to light around the mid-2000s.

MongoDB is an open-source document-oriented database that is designed to store a large scale
of data and also allows you to work with that data very efficiently. It is categorized as a
NoSQL (Not only SQL) database because the storage and retrieval of data in MongoDB
are not in the form of tables.

MongoDB is developed and managed by MongoDB, Inc. under the SSPL (Server Side
Public License) and was initially released in February 2009. It also provides official driver support
for popular languages such as C, C++, C#/.NET, Go, Java, Node.js, Perl, PHP, Python,
Ruby, Scala and Swift, along with libraries such as Motor and Mongoid, so you can create an
application using any of these languages. Many companies, such as Facebook, Nokia,
eBay, Adobe and Google, have used MongoDB to store large amounts of data.

How it works?

Now we will see how things actually happen behind the scenes. MongoDB is
a database server, and the data is stored in its databases. In other words, the MongoDB
environment gives you a server that you can start and then create multiple databases on.

Because it is a NoSQL database, the data is stored in collections and documents. The
database, collections, and documents are related to each other as shown below:

• The MongoDB database contains collections just like the MySQL database contains
tables. You are allowed to create multiple databases and multiple collections.

• Inside a collection we have documents. These documents contain the data we
want to store in the MongoDB database. A single collection can contain multiple
documents, and collections are schema-less, meaning that one document need not be
similar to another.

• Documents are made up of fields. Fields are key-value pairs in the documents,
similar to columns in a relational database. The value of a field can be of any
BSON data type, such as double, string, boolean, etc.

• The data stored in MongoDB is in the format of BSON documents. Here, BSON
stands for a binary representation of JSON documents. In other words, in the backend
the MongoDB server converts the JSON data into a binary form known as BSON,
which can be stored and queried more efficiently.

• In MongoDB documents, you are allowed to store nested data. This nesting of data
allows you to create complex relations between data and store them in the same
document, which can make working with and fetching data more efficient than
in SQL, where you may need to write joins to get the data from table
1 and table 2. The maximum size of a BSON document is 16 MB.
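
To make the idea of a binary representation of JSON concrete, here is a minimal hand-written encoder for a flat document. It is a sketch only: it covers just four value types (string, boolean, 32-bit integer, double), the function names are invented for this example, and real applications would rely on the BSON support that ships with a MongoDB driver.

```python
import struct


def encode_cstring(text):
    """Field names are stored as NUL-terminated UTF-8 strings."""
    return text.encode("utf-8") + b"\x00"


def encode_element(name, value):
    """Encode one field as: type byte, field name, then the value bytes."""
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # Type 0x02: int32 byte length (including the trailing NUL), then bytes.
        return b"\x02" + encode_cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return b"\x08" + encode_cstring(name) + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        # Type 0x10: 32-bit little-endian integer (larger values are not handled).
        return b"\x10" + encode_cstring(name) + struct.pack("<i", value)
    if isinstance(value, float):
        return b"\x01" + encode_cstring(name) + struct.pack("<d", value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_document(doc):
    """A document is: int32 total size, the elements, and a closing NUL byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items()) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


encoded = encode_document({"hello": "world"})
print(len(encoded), encoded)
```

Because every element is prefixed with its type and strings carry their length, a reader can skip over fields it does not need without parsing text, which is one reason the binary form is quicker to scan than JSON.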

MongoDB Features

1. Each database contains collections which in turn contains documents. Each document
can be different with a varying number of fields. The size and content of each document
can be different from each other.

2. The document structure is more in line with how developers construct their classes and
objects in their respective programming languages. Developers will often say that their
classes are not rows and columns but have a clear structure with key-value pairs.

3. The rows (or documents, as they are called in MongoDB) don’t need to have a schema defined
beforehand. Instead, the fields can be created on the fly.

4. The data model available within MongoDB allows you to represent hierarchical
relationships, to store arrays, and other more complex structures more easily.

5. Scalability – MongoDB environments are very scalable. Companies across the
world have defined clusters, some of them running 100+ nodes with
millions of documents within the database.

MongoDB Example

The below example shows how a document can be modeled in MongoDB.

1. The _id field is added by MongoDB to uniquely identify the document in the collection.

2. The order data (OrderID, Product, and Quantity), which in an
RDBMS would normally be stored in a separate table, is in MongoDB
stored as an embedded document in the collection itself. This is one of the key
differences in how data is modeled in MongoDB.
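
A document of the kind described in the two points above might look like the following sketch, written as a Python dict. The field name `Order` and the order values (111, "ProductA", 5) are assumptions chosen for illustration, and the `_id` is shown as a plain string rather than the ObjectId type MongoDB would use.

```python
import json

customer = {
    "_id": "563479cc8a8a4246bd27d784",   # added by MongoDB if not supplied
    "CustomerID": 11,
    "CustomerName": "Guru99",
    # Order data is embedded instead of living in a separate table.
    "Order": {"OrderID": 111, "Product": "ProductA", "Quantity": 5},
}

print(json.dumps(customer, indent=2))

# The embedded order is read directly from the document, with no join.
quantity = customer["Order"]["Quantity"]
```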

Key Components of MongoDB Architecture

Below are a few of the common terms used in MongoDB

1. _id – This is a field required in every MongoDB document. The _id field represents a
unique value in the MongoDB document. The _id field is like the document’s primary
key. If you create a new document without an _id field, MongoDB will automatically
create the field. So, for the customer example above,
MongoDB will add a unique identifier, shown as 24 hexadecimal characters, to each document in the collection.

_id                       CustomerID  CustomerName
563479cc8a8a4246bd27d784  11          Guru99
563479cc7a8a4246bd47d784  22          Trevor Smith
563479cc9a8a4246bd57d784  33          Nicole

2. Collection – This is a grouping of MongoDB documents. A collection is the equivalent
of a table in an RDBMS such as Oracle or MS SQL. A collection
exists within a single database. As seen in the introduction, collections don’t enforce
any sort of structure.

3. Cursor – This is a pointer to the result set of a query. Clients can iterate through a
cursor to retrieve results.

4. Database – This is a container for collections, just as in an RDBMS it is a container
for tables. Each database gets its own set of files on the file system. A MongoDB server
can store multiple databases.

5. Document – A record in a MongoDB collection is called a document. The
document, in turn, consists of field names and values.

6. Field – A name-value pair in a document. A document has zero or more fields. Fields
are analogous to columns in relational databases. For example, in the customer
documents above, CustomerID and 11
form one of the key-value pairs defined in the document.

7. JSON – This stands for JavaScript Object Notation. It is a human-readable, plain
text format for expressing structured data. JSON is currently supported in many
programming languages.

Just a quick note on the key difference between the _id field and a normal collection field. The
_id field is used to uniquely identify the documents in a collection and is automatically added
by MongoDB when a document is inserted without one.
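
The automatically generated _id is an ObjectId: 12 bytes, usually displayed as 24 hexadecimal characters, made up of a 4-byte timestamp, a 5-byte random value fixed per process, and a 3-byte incrementing counter. The sketch below imitates that layout with the standard library only; the function name `new_object_id` is hypothetical, and real drivers generate these identifiers for you.

```python
import itertools
import os
import time

_process_random = os.urandom(5)  # random value chosen once per process
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))  # random start


def new_object_id():
    """Build an ObjectId-like value and return it as 24 hex characters."""
    timestamp = int(time.time()).to_bytes(4, "big")            # seconds since epoch
    counter = (next(_counter) % 0x1000000).to_bytes(3, "big")  # wraps at 3 bytes
    return (timestamp + _process_random + counter).hex()


first_id = new_object_id()
second_id = new_object_id()
print(first_id, second_id)
```

Since the timestamp comes first, identifiers created later sort after earlier ones to within one second, which is why sorting on _id roughly follows insertion order.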

Why Use MongoDB?

Below are a few of the reasons why one should start using MongoDB:

1. Document-oriented – Since MongoDB is a NoSQL type database, instead of having
data in a relational type format, it stores the data in documents. This makes MongoDB
very flexible and adaptable to real business world situations and requirements.

2. Ad hoc queries – MongoDB supports searching by field, range queries, and regular
expression searches. Queries can be made to return specific fields within documents.

3. Indexing – Indexes can be created to improve the performance of searches within
MongoDB. Any field in a MongoDB document can be indexed.

4. Replication – MongoDB can provide high availability with replica sets. A replica set
consists of two or more MongoDB instances. Each replica set member may act in the
role of the primary or a secondary replica at any time. The primary replica is the main
server, which interacts with the client and by default handles all the read/write operations. The
secondary replicas maintain a copy of the primary’s data using built-in replication.
When the primary replica fails, the replica set automatically elects a
secondary, which then becomes the primary server.

5. Load balancing – MongoDB uses the concept of sharding to scale horizontally by
splitting data across multiple MongoDB instances. MongoDB can run over multiple
servers, balancing the load and/or duplicating data to keep the system up and running
in case of hardware failure.
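
The idea behind sharding in point 5 can be sketched as routing each document to a shard based on a hash of its shard key. This is a deliberate simplification with invented names (`SHARDS`, `shard_for`); MongoDB itself assigns ranges of key values to shards and rebalances them, rather than using a fixed modulo.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_for(shard_key_value):
    """Pick a shard from a stable hash of the shard key value."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Distribute twelve customer IDs across the shards.
placement = {name: [] for name in SHARDS}
for customer_id in range(1, 13):
    placement[shard_for(customer_id)].append(customer_id)

for name, ids in placement.items():
    print(name, ids)
```

The same key always maps to the same shard, so a query that includes the shard key can be sent to a single server instead of all of them.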

Data Modelling in MongoDB

As we have seen from the Introduction section, the data in MongoDB has a flexible schema.
Unlike in SQL databases, where you must have a table’s schema declared before inserting data,
MongoDB’s collections do not enforce document structure. This sort of flexibility is what
makes MongoDB so powerful.

When modeling data in MongoDB, keep the following things in mind:

1. What are the needs of the application – Look at the business needs of the application
and see what data and the type of data needed for the application. Based on this, ensure
that the structure of the document is decided accordingly.

2. What are data retrieval patterns – If you foresee a heavy query usage then consider the
use of indexes in your data model to improve the efficiency of queries.

3. Are frequent inserts, updates and removals happening in the database? Reconsider the
use of indexes or incorporate sharding if required in your data modeling design to
improve the efficiency of your overall MongoDB environment.

Difference between MongoDB & RDBMS

Below are some of the key term differences between MongoDB and RDBMS:

RDBMS    MongoDB     Difference

Table    Collection  In RDBMS, the table contains the columns and rows which are used to store the data, whereas in MongoDB
                     this same structure is known as a collection. The collection contains documents, which in turn contain
                     fields, which in turn are key-value pairs.

Row      Document    In RDBMS, the row represents a single, implicitly structured data item in a table. In MongoDB, the data is
                     stored in documents.

Column   Field       In RDBMS, the column denotes a set of data values. These in MongoDB are known as fields.

Joins    Embedded    In RDBMS, data is sometimes spread across various tables, and in order to show a complete view of all
         documents   data, a join is sometimes formed across tables to get the data. In MongoDB, the data is normally stored in
                     a single collection, but separated by using embedded documents, so joins are usually not needed in
                     MongoDB.

Apart from the terms differences, a few other differences are shown below

1. Relational databases are known for enforcing data integrity. This is not an explicit
requirement in MongoDB.

2. RDBMS requires that data be normalized first so that it can prevent orphan records and
duplicates. Normalizing data then requires more tables, which will then
result in more table joins, thus requiring more keys and indexes. As databases start to
grow, performance can start becoming an issue. Again, this is not an explicit requirement
in MongoDB. MongoDB is flexible and does not need the data to be normalized first.

MONGODB INSTALLATION

Installing and setting up MongoDB

Installing MongoDB Community Edition on Windows

Here are the steps to install MongoDB Community Edition on Windows:

1. Go to the MongoDB download page: https://www.mongodb.com/try/download/community

2. Choose the latest version of MongoDB Community Edition and download the .msi file.

3. Run the .msi file and follow the instructions to install MongoDB on your computer.

4. Add MongoDB to the PATH environment variable by following these steps:

• Open the Control Panel and click on System and Security.

• Click on System.

• Click on Advanced system settings.

• Click on Environment Variables.

• Under System Variables, scroll down and select Path.

• Click on Edit.

• Click on New.

• Add the path to the MongoDB bin folder, which is typically “C:\Program
Files\MongoDB\Server\<version>\bin”.

• Click OK to close all the windows.

5. Run the following command to start the MongoDB server:

mongod --dbpath C:\mongodb\data

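This will start the MongoDB server using the “data” folder in the “mongodb” folder on
your C:\ drive as its data directory. Create this folder first if it does not exist, because mongod
does not start without an existing data directory.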

6. To connect to the MongoDB server, open another command prompt window and run the
following command:

mongo

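This will start the MongoDB shell and connect to the MongoDB server running on your local
machine. In MongoDB 6.0 and later the legacy mongo shell is no longer included; install and run
mongosh instead.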

Installing MongoDB Community Edition on macOS

1. Go to the MongoDB download page: https://www.mongodb.com/try/download/community

2. Choose the latest version of MongoDB Community Edition and download the .tgz file.

3. Once the download is complete, open the downloaded file to extract it.

4. Rename the extracted folder to “mongodb” and move it to /usr/local, so that the binaries are in
/usr/local/mongodb/bin, the path used in the next step.

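5. Add MongoDB to the PATH environment variable by running the following command
in the terminal (use ~/.zshrc instead of ~/.bash_profile if your shell is zsh, the default on
recent macOS versions):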

echo 'export PATH="/usr/local/mongodb/bin:$PATH"' >> ~/.bash_profile

6. Run the following command to start the MongoDB server:

mongod --dbpath ~/mongodb/data

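This will start the MongoDB server using the “mongodb/data” folder in your home directory as its data
directory. Create it first (for example with mkdir -p ~/mongodb/data) if it does not exist.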

7. To connect to the MongoDB server, open another terminal window and run the following
command:

mongo

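This will start the MongoDB shell and connect to the MongoDB server running on your local
machine. In MongoDB 6.0 and later the legacy mongo shell is no longer included; install and run
mongosh instead.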

MONGODB BASIC COMMANDS:

For these, refer to the commands in the following tutorials:

MongoDB Tutorial (w3schools.com)

What is a MongoDB Query? - GeeksforGeeks

Comparison of relational databases to MongoDB, Cassandra, HBase etc.
