UNIT 4 CAP MONGODB
1. CAP THEOREM
2. NoSQL, Types of NoSQL databases
3. MongoDB: Introduction, History of MongoDB, installation and configuration,
key features, core servers and tools, basic commands
4. Comparison of relational databases to MongoDB, Cassandra, HBase etc.
Unit 4_DBMS
Topic: The CAP Theorem in DBMS
The CAP theorem, originally introduced as the CAP principle, can be used to
explain some of the competing requirements in a distributed system with
replication. It is a tool used to make system designers aware of the trade-offs
while designing networked shared-data systems.
The three letters in CAP refer to three desirable properties of distributed systems
with replicated data: consistency (among replicated copies), availability (of the
system for read and write operations) and partition tolerance (in the face of the
nodes in the system being partitioned by a network fault). The CAP theorem states
that it is not possible to guarantee all three of the desirable properties
(consistency, availability, and partition tolerance) at the same time in a distributed
system with data replication.
The theorem states that networked shared-data systems can only strongly support
two of the following three properties:
1. Consistency – Consistency means that the nodes will have the same copies
of a replicated data item visible for various transactions. It is a guarantee that
every node in a distributed cluster returns the same, most recent, successful
write. Consistency refers to every client having the same view of the data.
There are various types of consistency models. Consistency in CAP is usually
formalized as linearizability (also called atomic consistency), a very strong
form of consistency.
2. Availability – Availability means that each read or write request for a data
item will either be processed successfully or will receive a message that
the operation cannot be completed. Every non-failing node returns a
response for all the read and write requests in a reasonable amount of time.
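The key word here is “every”. In simple terms, every node (on either side
of a network partition) must be able to respond in a reasonable amount of
time.
3. Partition tolerance – Partition tolerance means that the system can continue
operating even if the network connecting the nodes has a fault that splits it
into two or more partitions, where the nodes in each partition can only
communicate among each other.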
The use of the word consistency in CAP and its use in ACID do not refer to the
same concept. In CAP, the term consistency refers to the consistency of
the values in different copies of the same data item in a replicated distributed
system. In ACID, it refers to the fact that a transaction will not violate the integrity
constraints specified on the database schema.
The CAP theorem states that distributed databases can have at most two of the
three properties: consistency, availability, and partition tolerance. As a result,
database systems prioritize only two properties at a time.
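The trade-off can be seen in a toy model. The following is a minimal Python sketch, assuming one data item replicated on two nodes A and B; the class names ToyCluster and Replica are hypothetical and this is not a real replication protocol. During a simulated partition, a "CP" cluster refuses requests to stay consistent, while an "AP" cluster keeps answering and lets the copies diverge.

```python
# Toy model of one data item replicated on two nodes, A and B.
# It only illustrates the CAP trade-off; it is not a real replication protocol.


class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class ToyCluster:
    def __init__(self, mode):
        # "CP": refuse requests while partitioned (keep consistency, lose availability)
        # "AP": keep answering while partitioned (keep availability, allow stale reads)
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("A")
        self.b = Replica("B")
        self.partitioned = False  # True simulates a network fault between A and B

    def write(self, replica, value):
        if not self.partitioned:
            # Healthy network: update both copies before acknowledging.
            self.a.value = value
            self.b.value = value
        elif self.mode == "CP":
            raise RuntimeError(f"{replica.name} unavailable: cannot reach the other replica")
        else:
            # AP: accept the write locally; the other copy is now out of date.
            replica.value = value

    def read(self, replica):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError(f"{replica.name} unavailable: cannot confirm the latest value")
        return replica.value


cp = ToyCluster("CP")
cp.write(cp.a, 1)
cp.partitioned = True
try:
    cp.write(cp.a, 2)
except RuntimeError as err:
    print("CP:", err)

ap = ToyCluster("AP")
ap.write(ap.a, 1)
ap.partitioned = True
ap.write(ap.a, 2)
print("AP: A reads", ap.read(ap.a), "but B reads", ap.read(ap.b))  # A reads 2 but B reads 1
```

The model is deliberately simplified: in real CP systems, nodes on the majority side of a partition can often keep serving requests, and only the minority side becomes unavailable.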
The following figure represents which database systems prioritize specific
properties at a given time:
NoSQL database stands for “Not Only SQL” or “Not SQL.” Though a better term would be
“NoREL”, NoSQL caught on. Carlo Strozzi introduced the NoSQL concept in 1998.
Traditional RDBMS uses SQL syntax to store and retrieve data for further insights. Instead, a
NoSQL database system encompasses a wide range of database technologies that can store
structured, semi-structured, unstructured and polymorphic data. Let’s understand NoSQL
with a diagram:
As the data and the load on a database grow, we could “scale up” our systems by upgrading
our existing hardware. This process is expensive.
The alternative is to distribute the database load across multiple hosts whenever the load
increases. This method is known as “scaling out.”
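One simple way to decide which host stores which piece of data when scaling out is hash partitioning. The following is a minimal Python sketch, assuming three hypothetical host names and a hypothetical pick_host helper; real NoSQL systems use more elaborate schemes.

```python
import hashlib


def pick_host(key, hosts):
    """Return the host responsible for `key` using simple hash partitioning."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then take it modulo the host count.
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


hosts = ["db-host-1", "db-host-2", "db-host-3"]  # hypothetical host names
placement = {}
for key in ["user:1", "user:2", "user:3", "user:4", "user:5", "user:6"]:
    placement.setdefault(pick_host(key, hosts), []).append(key)
print(placement)
```

With plain modulo hashing, adding or removing a host moves most keys to a different host; techniques such as consistent hashing are used to reduce that data movement.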
• 1998- Carlo Strozzi used the term NoSQL for his lightweight, open-source relational
database
• 2000- Graph database Neo4j is launched
• 2004- Google BigTable is launched
• 2005- CouchDB is launched
• 2007- The research paper on Amazon Dynamo is released
• 2008- Facebook open-sources the Cassandra project
• 2009- The term NoSQL was reintroduced
Features of NoSQL
Non-relational
Schema-free
• NoSQL databases are schema-free
Simple API
• Offer easy-to-use interfaces for storing and querying data
• APIs allow low-level data manipulation and selection methods
• Text-based protocols are mostly used, typically HTTP REST with JSON
• Mostly no standards-based NoSQL query language is used
• Web-enabled databases running as internet-facing services
Distributed
Types of NoSQL Databases
Key-Value Pair Based
Key-value pair storage databases store data as a hash table where each key is unique, and the
value can be a JSON document, a BLOB (Binary Large Object), a string, etc.
For example, a key-value pair may contain a key like “Website” associated with a value like
“Guru99”.
It is one of the most basic types of NoSQL database. This kind of NoSQL database is used to
implement collections, dictionaries, associative arrays, etc. Key-value stores help the developer
to store schema-less data. They work well for data such as shopping cart contents.
Redis, Dynamo and Riak are some examples of key-value store databases. Riak’s design is based
on Amazon’s Dynamo paper.
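The key-value model can be sketched with a Python dict, which is itself a hash table. The following is a minimal sketch, assuming a hypothetical KeyValueStore class and a hypothetical "cart:<user>" key format for shopping carts. The store never looks inside a value; it only stores and returns it by key.

```python
class KeyValueStore:
    """Minimal in-memory key-value store: unique keys, opaque values."""

    def __init__(self):
        self._data = {}  # a Python dict is a hash table

    def put(self, key, value):
        self._data[key] = value  # overwrites any existing value for this key

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("Website", "Guru99")
# Shopping-cart contents stored under one key per user (hypothetical key format).
store.put("cart:alice", {"book": 2, "pen": 5})
print(store.get("Website"))            # Guru99
print(store.get("cart:alice")["pen"])  # 5
store.delete("cart:alice")
print(store.get("cart:alice"))         # None
```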
Column-Based
Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM and library card catalogs.
HBase, Cassandra and Hypertable are examples of column-based databases.
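HBase and Cassandra organize data as rows of column families, where each row can have its own set of columns. The following is a minimal Python sketch of that wide-column layout using nested dicts, assuming a hypothetical library-catalog table with "info" and "loans" column families; it ignores storage, versioning and distribution.

```python
from collections import defaultdict

# Hypothetical library-catalog table in a wide-column layout:
# table[row_key][column_family][column_name] = value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    """Store one cell; rows can add new columns at any time."""
    table[row_key][family][column] = value


def get_family(row_key, family):
    """Read a single column family of a row without touching the others."""
    row = table.get(row_key, {})
    return dict(row.get(family, {}))


put("book:001", "info", "title", "Dune")
put("book:001", "info", "author", "Frank Herbert")
put("book:001", "loans", "2024-01-05", "member:17")
put("book:002", "info", "title", "Emma")  # this row has no "loans" columns at all

print(get_family("book:001", "info"))   # {'title': 'Dune', 'author': 'Frank Herbert'}
print(get_family("book:002", "loans"))  # {}
```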
Document-Oriented
Document-Oriented NoSQL DB stores and retrieves data as a key value pair but the value part
is stored as a document. The document is stored in JSON or XML formats. The value is
understood by the DB and can be queried.
The document type is mostly used for CMS systems, blogging platforms, real-time analytics &
e-commerce applications. It should not be used for complex transactions that require multiple
operations or queries against varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular document-oriented
DBMS systems.
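Unlike a key-value store, a document store parses the value, so it can query fields inside it. The following is a minimal Python sketch, assuming hypothetical blog-post documents stored as JSON text and a hypothetical find helper. As in MongoDB, a query on an array field matches when the array contains the value.

```python
import json

# Documents arrive as JSON text; the store parses them so it can query inside the value.
raw_documents = {
    "post:1": '{"title": "Intro to NoSQL", "author": "asha", "tags": ["nosql", "intro"]}',
    "post:2": '{"title": "CAP explained", "author": "ravi", "tags": ["cap"]}',
    "post:3": '{"title": "MongoDB basics", "author": "asha", "tags": ["mongodb", "nosql"]}',
}
documents = {key: json.loads(text) for key, text in raw_documents.items()}


def find(docs, field, value):
    """Return keys of documents whose field equals value, or whose list field contains it."""
    matches = []
    for key, doc in docs.items():
        field_value = doc.get(field)
        if field_value == value or (isinstance(field_value, list) and value in field_value):
            matches.append(key)
    return matches


print(find(documents, "author", "asha"))  # ['post:1', 'post:3']
print(find(documents, "tags", "nosql"))   # ['post:1', 'post:3']
```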
Graph-Based
A graph type database stores entities as well as the relations among those entities. The entity is
stored as a node with the relationship as edges. An edge gives a relationship between nodes.
Every node and edge has a unique identifier.
Compared to a relational database where tables are loosely connected, a graph database is
multi-relational in nature. Traversing relationships is fast because they are already stored in the
DB, so there is no need to calculate them.
Graph databases are mostly used for social networks, logistics and spatial data.
Neo4J, Infinite Graph, OrientDB and FlockDB are some popular graph-based databases.
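Because edges are stored directly, finding related entities is a traversal rather than a join. The following is a minimal Python sketch, assuming a hypothetical social graph kept as an adjacency list and a hypothetical within_hops helper that performs a breadth-first search. Real graph databases expose this through query languages such as Neo4j's Cypher.

```python
from collections import deque

# Adjacency list: each node maps to the nodes it has a "friend" edge to.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "erin"],
    "dave": ["bob"],
    "erin": ["carol"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable from start in 1..max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    result = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                result.append(neighbour)
                frontier.append((neighbour, depth + 1))
    return result


print(within_hops(friends, "alice", 1))  # ['bob', 'carol']
print(within_hops(friends, "alice", 2))  # ['bob', 'carol', 'dave', 'erin']
```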
Document stores support more complex queries than key-value stores because they understand the
value in a key-value pair. For example, CouchDB allows defining views with MapReduce.
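The idea behind a MapReduce view can be sketched in Python. In CouchDB, view map functions are normally written in JavaScript and call emit(key, value). The following is a minimal Python imitation, assuming hypothetical post and comment documents and hypothetical map_fn, reduce_fn and run_view helpers. It counts posts per author.

```python
from collections import defaultdict

docs = [
    {"_id": "p1", "type": "post", "author": "asha"},
    {"_id": "p2", "type": "post", "author": "ravi"},
    {"_id": "p3", "type": "post", "author": "asha"},
    {"_id": "c1", "type": "comment", "author": "ravi"},
]


def map_fn(doc):
    """Emit (author, 1) for every post, in the spirit of a CouchDB map function."""
    if doc.get("type") == "post":
        yield doc["author"], 1


def reduce_fn(values):
    """Combine all values emitted for one key."""
    return sum(values)


def run_view(documents, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


print(run_view(docs, map_fn, reduce_fn))  # {'asha': 2, 'ravi': 1}
```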
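The CAP theorem (also called Brewer’s theorem) states that a distributed data store cannot
provide more than two of the following three guarantees at the same time:
1. Consistency
2. Availability
3. Partition Tolerance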
Consistency:
The data should remain consistent even after the execution of an operation. This means once
data is written, any future read request should contain that data. For example, after updating
the order status, all the clients should be able to see the same data.
Availability:
The database should always be available and responsive. It should not have any downtime.
Partition Tolerance:
Partition Tolerance means that the system should continue to function even if the
communication among the servers is not stable. For example, the servers can be partitioned
into multiple groups which may not communicate with each other. Here, even if part of the
database is unreachable, the other parts continue to work.
Eventual Consistency
Eventual consistency applies when copies of data are kept on multiple machines to get high
availability and scalability. Changes made to any data item on one machine have to be
propagated to the other replicas.
Data replication may not be instantaneous: some copies will be updated immediately while
others are updated in due course of time. These copies may be mutually inconsistent for a
while, but in due course of time they become consistent. Hence the name eventual consistency.
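Eventual consistency can be simulated with a few replicas and a queue of pending updates. The following is a minimal Python sketch, assuming a hypothetical EventuallyConsistentStore class in which a write is applied to one replica immediately and delivered to the others only when propagate() runs.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Writes reach one replica immediately; other replicas receive them later."""

    def __init__(self, names):
        self.replicas = {name: Replica(name) for name in names}
        self.pending = []  # (target replica, key, value) updates not yet applied

    def write(self, at, key, value):
        self.replicas[at].data[key] = value
        for name in self.replicas:
            if name != at:
                self.pending.append((name, key, value))

    def read(self, at, key):
        return self.replicas[at].data.get(key)

    def propagate(self):
        """Deliver all pending updates (the 'in due course of time' step)."""
        for name, key, value in self.pending:
            self.replicas[name].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["r1", "r2", "r3"])
store.write("r1", "order:42", "shipped")
print(store.read("r1", "order:42"), store.read("r2", "order:42"))  # shipped None
store.propagate()
print([store.read(name, "order:42") for name in ["r1", "r2", "r3"]])  # ['shipped', 'shipped', 'shipped']
```

The sketch ignores conflicting concurrent writes to the same key. Real systems need a rule for resolving them, such as last-writer-wins timestamps or version vectors.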
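Systems built around eventual consistency are often described by the acronym BASE
(Basically Available, Soft state, Eventual consistency):
• Basically Available means the DB is available all the time, as per the CAP theorem
• Soft state means the system state may change over time even without new input, for
example while replicas converge
• Eventual consistency means that the system will become consistent over time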
Disadvantages of NoSQL
• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively mature
• Many NoSQL systems lack traditional database capabilities, such as consistency when multiple
transactions are performed simultaneously.
MongoDB is a document-oriented NoSQL database used for high volume data storage. Instead
of using tables and rows as in the traditional relational databases, MongoDB makes use of
collections and documents. Documents consist of key-value pairs which are the basic unit of
data in MongoDB. Collections contain sets of documents and are the equivalent of relational
database tables. MongoDB is a database which came to light around the mid-2000s.
The MongoDB database is developed and managed by MongoDB Inc. under the SSPL (Server Side
Public License) and was initially released in February 2009. It also provides official drivers
for popular languages such as C, C++, C#/.NET, Go, Java, Node.js, Perl, PHP, Python (including
the async Motor driver), Ruby (including the Mongoid ODM), Scala and Swift, so you can create
an application using any of these languages. Nowadays many companies, such as Facebook, Nokia,
eBay, Adobe and Google, use MongoDB to store large amounts of data.
How does it work?
Now, let’s see what actually happens behind the scenes. MongoDB is a database server, and the
data is stored in databases on that server. In other words, the MongoDB environment gives you
a server that you can start and then create multiple databases on.
• The MongoDB database contains collections just like the MySQL database contains
tables. You are allowed to create multiple databases and multiple collections.
• Inside a collection we have documents. These documents contain the data we want to
store in the MongoDB database, and a single collection can contain multiple documents.
MongoDB is schema-less, which means one document does not need to have the same
structure as another.
• The documents are created using fields. Fields are key-value pairs in the documents,
just like columns in a relational database. The value of a field can be of any
BSON data type, like double, string, boolean, etc.
• The data stored in MongoDB is in the format of BSON documents. Here, BSON stands
for Binary JSON, a binary representation of JSON-like documents. In other words, in the
backend the JSON-like data is converted into a binary form known as BSON, which can be
stored and queried more efficiently.
• In MongoDB documents, you are allowed to store nested data. This nesting of data
allows you to create complex relations between data and store them in the same
document, which makes working with and fetching the data efficient.
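To make "binary representation of JSON" concrete, here is a minimal Python sketch of a BSON encoder for a tiny subset of the format: flat documents whose values are UTF-8 strings or 32-bit integers. It follows the layout in the BSON specification (int32 total length, typed elements, trailing NUL byte). The function name encode_bson is hypothetical; real applications use their driver's BSON library.

```python
import struct


def _cstring(text):
    """BSON field names are NUL-terminated UTF-8 strings."""
    return text.encode("utf-8") + b"\x00"


def encode_bson(document):
    """Encode a flat dict of str/int32 values using a subset of the BSON spec."""
    body = b""
    for name, value in document.items():
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            encoded = value.encode("utf-8") + b"\x00"
            # type 0x02 = string: int32 byte length (including NUL), bytes, NUL
            body += b"\x02" + _cstring(name) + struct.pack("<i", len(encoded)) + encoded
        elif isinstance(value, int) and -2**31 <= value < 2**31:
            # type 0x10 = 32-bit signed integer, little-endian
            body += b"\x10" + _cstring(name) + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported value for {name!r} in this sketch")
    # document = int32 total length (including these 4 bytes), elements, trailing NUL
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


print(encode_bson({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The length prefixes and type bytes let a reader skip over fields without parsing their text. This is one reason BSON is faster to traverse than plain JSON text.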
MongoDB Features
1. Each database contains collections, which in turn contain documents. Each document
can be different with a varying number of fields. The size and content of each document
can be different from each other.
2. The document structure is more in line with how developers construct their classes and
objects in their respective programming languages. Developers will often say that their
classes are not rows and columns but have a clear structure with key-value pairs.
3. The rows (or documents, as they are called in MongoDB) don’t need to have a schema
defined beforehand. Instead, the fields can be created on the fly.
4. The data model available within MongoDB allows you to represent hierarchical
relationships, to store arrays, and other more complex structures more easily.
5. Scalability – MongoDB environments are very scalable. Companies across the
world have defined clusters, some of them running 100+ nodes with millions of
documents within the database.
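Features 1, 3 and 4 can be illustrated with plain Python dicts standing in for documents. The following is a minimal sketch, assuming a hypothetical "customers" collection. Documents in the same collection have different fields, one holds an array and a nested sub-document, and a new field appears without any schema change. In the mongo shell, each document would be added with something like db.customers.insertOne({...}).

```python
# Hypothetical database -> collection -> documents hierarchy, using plain dicts and lists.
database = {"customers": []}

database["customers"].append({"name": "Asha", "city": "Pune"})
database["customers"].append({
    "name": "Ravi",
    "phones": ["555-0101", "555-0102"],             # array field
    "address": {"city": "Delhi", "pin": "110001"},  # nested (embedded) document
})
database["customers"].append({"name": "Meera", "loyalty_points": 120})  # field created on the fly

for doc in database["customers"]:
    print(sorted(doc.keys()))
```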
MongoDB Example
1. The _id field is added by MongoDB to uniquely identify the document in the collection.
2. Note that the Order Data (OrderID, Product and Quantity), which in an RDBMS would
normally be stored in a separate table, is stored in MongoDB as an embedded document in
the collection itself. This is one of the key differences in how data is modeled in
MongoDB.
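The difference between a separate orders table and embedded order data can be sketched in Python. The following is a minimal sketch with hypothetical values; the field names CustomerName and Orders are assumptions, while CustomerID, OrderID, Product and Quantity come from the description above. The relational version needs a join-like lookup, while the embedded document returns everything in one read.

```python
# Relational style: two "tables" linked by CustomerID (values are hypothetical).
customers = [{"CustomerID": 11, "CustomerName": "Guru99"}]
orders = [
    {"OrderID": 111, "CustomerID": 11, "Product": "Laptop", "Quantity": 1},
    {"OrderID": 112, "CustomerID": 11, "Product": "Mouse", "Quantity": 2},
]


def join_customer_orders(customer_id):
    """Emulate a SQL join: find the customer row, then scan the orders table."""
    customer = next(c for c in customers if c["CustomerID"] == customer_id)
    customer_orders = [
        {k: v for k, v in o.items() if k != "CustomerID"}
        for o in orders
        if o["CustomerID"] == customer_id
    ]
    return {**customer, "Orders": customer_orders}


# MongoDB style: the order data is embedded in the customer document itself.
customer_document = {
    "_id": "563479cc8a8a4246bd27d784",
    "CustomerID": 11,
    "CustomerName": "Guru99",
    "Orders": [
        {"OrderID": 111, "Product": "Laptop", "Quantity": 1},
        {"OrderID": 112, "Product": "Mouse", "Quantity": 2},
    ],
}

joined = join_customer_orders(11)
print(joined["Orders"] == customer_document["Orders"])  # True
```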
1. _id – This is a field required in every MongoDB document. The _id field represents a
unique value in the MongoDB document. The _id field is like the document’s primary
key. If you create a new document without an _id field, MongoDB will automatically
create the field. For example, in the customer example above, MongoDB will add a
24-character hexadecimal unique identifier (a 12-byte ObjectId) to each document in the
collection, as in these rows (columns: _id, CustomerID, customer name):
563479cc8a8a4246bd27d784   11   Guru99
563479cc9a8a4246bd57d784   33   Nicole
2. Cursor – This is a pointer to the result set of a query. Clients can iterate through a
cursor to retrieve results.
3. Field – A name-value pair in a document. A document has zero or more fields. Fields
are analogous to columns in relational databases. The following diagram shows an
example of fields with key-value pairs. In the example below, CustomerID: 11
is one of the key-value pairs defined in the document.
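A cursor lets the client walk through a result set without receiving it all at once. MongoDB drivers similarly fetch query results from the server in batches. The following is a minimal Python sketch, assuming a hypothetical ToyCursor class and an in-memory list standing in for the server-side result set.

```python
class ToyCursor:
    """Iterates over query results, fetching them from the 'server' in batches."""

    def __init__(self, results, batch_size=2):
        self._results = results  # stands in for the server-side result set
        self._batch_size = batch_size
        self._position = 0       # how many results have been fetched so far
        self._buffer = []        # current batch held on the client
        self.batches_fetched = 0

    def __iter__(self):
        return self

    def __next__(self):
        if not self._buffer:
            if self._position >= len(self._results):
                raise StopIteration
            # Fetch the next batch only when the client-side buffer is empty.
            end = self._position + self._batch_size
            self._buffer = list(self._results[self._position:end])
            self._position = end
            self.batches_fetched += 1
        return self._buffer.pop(0)


cursor = ToyCursor([{"CustomerID": 11}, {"CustomerID": 22}, {"CustomerID": 33}], batch_size=2)
for doc in cursor:
    print(doc["CustomerID"])
print(cursor.batches_fetched)  # 2
```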
Just a quick note on the key difference between the _id field and a normal collection field. The
_id field is used to uniquely identify the documents in a collection and is automatically added
by MongoDB when a document is inserted without one.
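The 24 hexadecimal characters of an ObjectId encode 12 bytes. According to the MongoDB documentation, these are a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value generated once per process, and a 3-byte incrementing counter. The following is a minimal Python sketch of that layout, assuming hypothetical helpers new_object_id and timestamp_of; real applications should use their driver's ObjectId type.

```python
import itertools
import os
import struct
import time

_process_random = os.urandom(5)  # 5 random bytes, generated once per process
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))  # counter starts at a random value


def new_object_id():
    """Build a 12-byte ObjectId-style value and return it as 24 hex characters."""
    timestamp = struct.pack(">I", int(time.time()))             # 4 bytes, big-endian seconds
    counter = (next(_counter) & 0xFFFFFF).to_bytes(3, "big")  # 3 bytes, wraps around
    return (timestamp + _process_random + counter).hex()


def timestamp_of(object_id_hex):
    """Seconds since the Unix epoch stored in the first 4 bytes (8 hex characters)."""
    return int(object_id_hex[:8], 16)


oid = new_object_id()
print(oid, len(oid))
print(timestamp_of("563479cc8a8a4246bd27d784"))  # 1446279628
```

Because the timestamp comes first, ObjectIds created later generally sort after earlier ones, and the creation time of a document can be read back from its _id.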
Below are a few of the reasons why one should start using MongoDB:
1. Ad hoc queries – MongoDB supports searching by field, range queries, and regular
expression searches. Queries can be made to return specific fields within documents.
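The three query styles can be mimicked over a list of dicts. In the mongo shell, a range query with a projection looks like db.customers.find({ age: { $gte: 18, $lt: 30 } }, { name: 1 }). The following is a minimal Python imitation supporting only equality, $gte, $lt and $regex, assuming hypothetical customer data and hypothetical matches/find helpers. Unlike MongoDB, this toy projection does not include _id by default.

```python
import re

customers = [  # hypothetical data
    {"name": "Guru99", "age": 25, "city": "Mumbai"},
    {"name": "Nicole", "age": 34, "city": "Pune"},
    {"name": "Gupta", "age": 17, "city": "Delhi"},
]


def matches(doc, query):
    """True if doc satisfies every field condition in query (toy subset of operators)."""
    for field, condition in query.items():
        value = doc.get(field)
        if not isinstance(condition, dict):  # plain equality, e.g. {"city": "Pune"}
            if value != condition:
                return False
            continue
        for op, arg in condition.items():
            if op == "$gte":
                ok = value is not None and value >= arg
            elif op == "$lt":
                ok = value is not None and value < arg
            elif op == "$regex":
                ok = isinstance(value, str) and re.search(arg, value) is not None
            else:
                raise ValueError(f"operator {op} is not supported in this sketch")
            if not ok:
                return False
    return True


def find(docs, query, projection=None):
    """Yield matching documents, optionally keeping only the listed fields."""
    for doc in docs:
        if matches(doc, query):
            yield {k: doc[k] for k in projection if k in doc} if projection else doc


print(list(find(customers, {"age": {"$gte": 18, "$lt": 30}}, ["name"])))  # [{'name': 'Guru99'}]
print(list(find(customers, {"name": {"$regex": "^Gu"}}, ["name", "age"])))
print(list(find(customers, {"city": "Pune"}, ["name"])))  # [{'name': 'Nicole'}]
```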
As we have seen from the Introduction section, the data in MongoDB has a flexible schema.
Unlike in SQL databases, where you must have a table’s schema declared before inserting data,
MongoDB’s collections do not enforce document structure. This sort of flexibility is what
makes MongoDB so powerful. When designing a data model in MongoDB, consider the following:
1. What are the needs of the application – Look at the business needs of the application
and see what data, and what type of data, the application needs. Based on this, decide
the structure of the document accordingly.
2. What are the data retrieval patterns – If you foresee heavy query usage, consider the
use of indexes in your data model to improve the efficiency of queries.
3. Are frequent inserts, updates and removals happening in the database? Reconsider the
use of indexes or incorporate sharding if required in your data modeling design to
improve the efficiency of your overall MongoDB environment.
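Why indexes help heavy query loads, and why they cost something on writes, can be seen by comparing a full scan with an index lookup. The following is a minimal Python sketch, assuming hypothetical order documents and a dict-based single-field index. MongoDB's own indexes are B-trees created with db.collection.createIndex(...), not dicts.

```python
from collections import defaultdict

# 10,000 hypothetical orders spread over 100 customers.
orders = [{"OrderID": i, "CustomerID": i % 100, "Quantity": i % 7} for i in range(10_000)]


def full_scan(docs, field, value):
    """Without an index: every document must be examined."""
    examined = 0
    hits = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            hits.append(doc)
    return hits, examined


def build_index(docs, field):
    """Single-field index: field value -> positions of the matching documents."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        index[doc.get(field)].append(position)
    return index


def indexed_lookup(docs, index, value):
    """With an index: only the matching documents are touched."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)


by_customer = build_index(orders, "CustomerID")
scan_hits, scan_examined = full_scan(orders, "CustomerID", 42)
idx_hits, idx_examined = indexed_lookup(orders, by_customer, 42)
print(len(scan_hits), scan_examined)  # 100 10000
print(len(idx_hits), idx_examined)    # 100 100
```

The index must be kept up to date on every insert, update and removal. This is why a write-heavy workload is a reason to reconsider which indexes to keep.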
Below are some of the key term differences between MongoDB and RDBMS:
• Table (RDBMS) vs. Collection (MongoDB): In RDBMS, the table contains the columns and rows
which are used to store the data, whereas in MongoDB this same structure is known as a
collection. The collection contains documents, which in turn contain fields, which in turn
are key-value pairs.
• Row (RDBMS) vs. Document (MongoDB): In RDBMS, the row represents a single, implicitly
structured data item in a table. In MongoDB, the data is stored in documents.
• Column (RDBMS) vs. Field (MongoDB): In RDBMS, the column denotes a set of data values.
These are known as fields in MongoDB.
• Joins (RDBMS) vs. Embedded documents (MongoDB): In RDBMS, data is sometimes spread across
various tables, and in order to show the related data together, a join is formed across
tables. In MongoDB, related data is normally stored in a single collection, organized using
embedded documents, so joins are usually not needed in MongoDB.
Apart from the terms differences, a few other differences are shown below
1. Relational databases are known for enforcing data integrity. This is not an explicit
requirement in MongoDB.
2. RDBMS requires that data be normalized first so that it can prevent orphan records and
duplicates. Normalizing data then requires more tables, which will then
result in more table joins, thus requiring more keys and indexes. As databases start to
grow, performance can start becoming an issue. Again, this is not an explicit requirement
in MongoDB. MongoDB is flexible and does not need the data to be normalized first.
MONGODB INSTALLATION
3. Run the .msi file and follow the instructions to install MongoDB on your computer.
• Click on System.
• Click on Edit.
• Click on New.
• Add the path to the MongoDB bin folder, which is typically “C:\Program
Files\MongoDB\Server\<version>\bin”.
This will start the MongoDB server and create the “data” folder in the “mongodb” folder in
your C:\ drive.
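6. To connect to the MongoDB server, open another command prompt window and run the
following command (MongoDB 6.0 and later use the mongosh shell; older releases ship the
legacy mongo shell instead):
mongosh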
This will start the MongoDB shell and connect to the MongoDB server running on your local
machine.
2. Choose the latest version of MongoDB Community Edition and download the .tgz file.
4. Rename the extracted folder to “mongodb” and move it to the root directory of your
computer.
5. Add MongoDB to the PATH environment variable by running the following command
in the terminal:
This will start the MongoDB server and create the “data” folder in your home directory.
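7. To connect to the MongoDB server, open another terminal window and run the following
command (MongoDB 6.0 and later use the mongosh shell; older releases ship the legacy mongo
shell instead):
mongosh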
This will start the MongoDB shell and connect to the MongoDB server running on your local
machine.
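After editing PATH on either Windows or macOS/Linux, a quick way to confirm that the MongoDB executables are visible is to look them up. PATH changes only apply to newly opened command prompt or terminal windows. The following is a minimal Python sketch using the standard library's shutil.which, assuming the executable names mongod, mongosh and mongo; the helper name find_mongodb_tools is hypothetical.

```python
import shutil


def find_mongodb_tools(names=("mongod", "mongosh", "mongo")):
    """Report where each executable is found on PATH (None if it is not found)."""
    return {name: shutil.which(name) for name in names}


for tool, location in find_mongodb_tools().items():
    print(f"{tool}: {location or 'not found on PATH'}")
```

On recent versions, it is normal for mongo to be reported as not found, because only mongosh is shipped.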