MongoDB Notes
MongoDB development began in 2007, when the New York based company 10gen (now known as MongoDB Inc.)
was building a platform as a service (PaaS) similar to Windows Azure.
"Windows Azure is a cloud computing platform and infrastructure, created by Microsoft, to build, deploy
and manage applications and services through a global network."
MongoDB was initially developed as part of that PaaS. In 2009 it was introduced to the market as an
open source database server, maintained and supported by MongoDB Inc.
Version 1.4, released in March 2010, is considered the first production-ready release of MongoDB.
MongoDB 2.4.9, released on January 10, 2014, was the latest stable version at that time.
What is MongoDB
By definition, MongoDB is an open source database that uses a document-oriented data model and a
query language that is not SQL. It is one of the most widely used NoSQL databases available today.
Being a NoSQL database means it does not use the rows and columns that we associate with relational
database management systems. Instead, its architecture is built on collections and documents.
The basic unit of data is a document, which consists of a set of key-value pairs.
Documents are allowed to have different fields and structures. MongoDB stores them in a format called
BSON, which is a binary encoding of JSON-like documents.
The MongoDB data model is highly flexible: it lets you combine and store data of many types without
compromising on indexing options, data access, or validation rules. Schemas can be modified
dynamically with no downtime. This means you can concentrate on making your data work harder rather
than spending time preparing the data for the database.
The architecture of MongoDB NoSQL Database
The database: In simple terms, a database is the physical container for data. Each database has its own
set of files on the file system, and multiple databases can exist on a single MongoDB server.
The Collection: A collection is a group of database documents. The RDBMS equivalent of a collection is
a table. A collection exists entirely within a single database. Collections do not enforce a schema:
the documents inside a collection can have different fields, although in practice the documents in a
collection usually serve the same or a related purpose.
The Document: A document is a set of key-value pairs. Documents have dynamic schemas, which means that
the documents in a single collection do not need to have the same structure or fields, and that a field
common to several documents can hold a different type of data in each.
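As a minimal sketch of what a dynamic schema means in practice, the plain JavaScript below models a
collection as an array of documents. The collection name `people`, the field names, and the helper
`typesOfField` are made up for illustration; no MongoDB calls are involved.

```javascript
// Hypothetical "people" collection modeled as a plain array of documents.
const people = [
  { name: "Ada", age: 36 },                                  // two fields
  { name: "Linus", languages: ["C", "Perl"] },               // different fields
  { name: "Grace", age: "unknown", rank: { navy: "RADM" } }  // same field, different type
];

// List the distinct JavaScript types seen for one field across documents.
function typesOfField(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (field in doc) seen.add(typeof doc[field]);
  }
  return [...seen].sort();
}

console.log(typesOfField(people, "age")); // [ 'number', 'string' ]
```

The `age` field holds a number in one document and a string in another, which a relational table
column would not allow.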
Replication: MongoDB supports master-slave replication and uses native replication to maintain
multiple copies of data. Preventing database downtime is one of the features of a replica set, which is
self-healing: when the primary fails, the remaining members elect a new one.
Multiple Servers: The database can run over multiple servers. Data is duplicated to protect the
system against hardware failure.
Auto-Sharding: This process distributes data across multiple physical partitions called shards.
Sharding also gives MongoDB automatic load balancing across those shards.
Failure Handling: MongoDB is easy to administer when failures occur. Running a large number of
replicas increases protection and data availability in the face of rack failures, multiple machine
failures, data center failures, and even network partitions.
GridFS: Files of any size can be stored without complicating your stack. GridFS divides files into
smaller parts (chunks) and stores each part as a separate document.
Procedures: MongoDB works well with JavaScript, which the database uses in place of stored
procedures.
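To make the GridFS idea more concrete, here is a rough sketch in plain JavaScript (Node.js) of how a
file can be cut into chunk documents and put back together. It is a toy model and does not talk to
MongoDB. The chunk size of 255 KiB matches the GridFS default, and the field names `files_id`, `n` and
`data` follow the layout of the GridFS chunks collection; the function names and the `"demo-file"`
identifier are assumptions made for illustration (a real `files_id` is normally an ObjectId).

```javascript
// Toy model of GridFS chunking; it does not talk to MongoDB.
const DEFAULT_CHUNK_SIZE = 255 * 1024; // GridFS default chunk size in bytes

// Split a Buffer into chunk documents shaped like entries in the chunks collection.
function toChunkDocuments(fileId, buffer, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n += 1) {
    chunks.push({
      files_id: fileId, // links the chunk to its file
      n,                // sequence number of the chunk
      data: buffer.subarray(offset, offset + chunkSize)
    });
  }
  return chunks;
}

// Reassemble the original bytes by concatenating chunks in order of n.
function fromChunkDocuments(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const file = Buffer.alloc(600 * 1024, 7); // hypothetical 600 KiB file
const chunks = toChunkDocuments("demo-file", file);
console.log(chunks.length); // 3
```

A 600 KiB file yields two full 255 KiB chunks and one 90 KiB remainder, each of which would be stored
as its own document.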
Storage engines:
There are four major differences to note between the two storage engines that have served as the
MongoDB default: WiredTiger (the current default) and MMAPv1 (the former default).
Scalability.
o WiredTiger performs better on multicore systems.
o MMAPv1 is not designed to scale with multiple cores; adding CPU cores does not
improve performance by much.
Concurrency.
o WiredTiger locks at the document level, whereas MMAPv1 locks at the collection level, so
WiredTiger allows far more concurrent writes.
Compression.
o WiredTiger supports zlib and snappy (default) compression for indexes and collections;
MMAPv1 does not support compression.
o WiredTiger collections are typically smaller on disk than their MMAPv1 equivalents, with or
without compression enabled.
o WiredTiger supports index-prefix compression, reducing the size of indexes both on disk
and loaded in-memory.
Encryption.
o The enterprise version of MongoDB with WiredTiger includes an option for encryption at rest.
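As an illustration of the compression options, the sketch below overrides the default snappy
compressor for a single collection. It is written as a function that receives the `db` handle
available in the mongo shell; the function name and the collection name `logs` are hypothetical, and
the snippet assumes a deployment that runs WiredTiger.

```javascript
// Intended for the mongo shell, where `db` is the handle to the current database.
// "logs" is a hypothetical collection name.
function createCompressedCollection(db, name = "logs") {
  // Ask WiredTiger to use zlib instead of the default snappy for this collection.
  return db.createCollection(name, {
    storageEngine: {
      wiredTiger: { configString: "block_compressor=zlib" }
    }
  });
}
```

In the shell this would be called as `createCompressedCollection(db)`. zlib usually compresses better
than snappy at the cost of more CPU, so it tends to suit data that is written once and read rarely.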
Is WiredTiger the better choice? In short, yes, but only if your workload is suitable for it. For
example, MMAPv1 works very well when you have large documents that you update frequently, but only in
a few fields each time. With WiredTiger, such a workload uses much more I/O, so it might make sense to
use MMAPv1 instead.
Ultimately, WiredTiger performs well in most use-cases, whereas MMAPv1’s design choices make it
suitable in specific, specialized cases.
MongoDB CRUD Operations
Create Operations
Create or insert operations add new documents to a collection. If the collection does not currently exist,
insert operations will create the collection.
db.collection.insertOne()
db.collection.insertMany()
In MongoDB, insert operations target a single collection. All write operations in MongoDB
are atomic on the level of a single document.
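A short sketch of the two insert methods follows. It is written as a function that takes the `db`
handle available in the mongo shell; the function name, the `users` collection and the sample
documents are assumptions made for illustration.

```javascript
// Intended for the mongo shell; `db` is the current database and
// "users" is a hypothetical collection.
function insertExamples(db) {
  // Insert one document; the collection is created on first insert if it is missing.
  const single = db.users.insertOne({ name: "Ada", age: 36 });
  // Insert several documents in one call. Each document is written atomically,
  // but the batch as a whole is not a single atomic operation.
  const batch = db.users.insertMany([
    { name: "Linus", age: 54 },
    { name: "Grace", languages: ["COBOL"] }
  ]);
  return { single, batch };
}
```

In the shell, `insertExamples(db)` returns the acknowledgement documents of both calls, which include
the generated `_id` values.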
Read Operations
Read operations retrieve documents from a collection, i.e. they query a collection for documents.
MongoDB provides the following method to read documents from a collection:
db.collection.find()
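The following sketch shows a typical `find()` call with a filter, a projection, and the cursor
methods `sort()`, `limit()` and `toArray()`. The function name, the `users` collection and its `name`
and `age` fields are hypothetical.

```javascript
// Intended for the mongo shell; "users" is a hypothetical collection.
function findAdults(db, minAge = 18) {
  // First argument: the filter. Second argument: the projection (fields to return).
  return db.users
    .find({ age: { $gte: minAge } }, { name: 1, age: 1, _id: 0 })
    .sort({ age: -1 }) // oldest first
    .limit(10)         // at most ten documents
    .toArray();        // exhaust the cursor into an array
}
```

Calling `find()` with no arguments returns every document in the collection; the filter narrows the
result in the same way a WHERE clause does in SQL.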
Update Operations
Update operations modify existing documents in a collection. MongoDB provides the following methods
to update documents of a collection:
db.collection.updateOne()
db.collection.updateMany()
db.collection.replaceOne()
In MongoDB, update operations target a single collection. All write operations in MongoDB
are atomic on the level of a single document.
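The three update methods differ mainly in how many documents they touch and in whether they patch or
replace a document. A minimal sketch, again assuming a hypothetical `users` collection and made-up
field values:

```javascript
// Intended for the mongo shell; "users" is a hypothetical collection.
function updateExamples(db) {
  // Change one field of the first document that matches the filter.
  const one = db.users.updateOne({ name: "Ada" }, { $set: { age: 37 } });
  // Apply the same change to every matching document.
  const many = db.users.updateMany({ age: { $exists: false } }, { $set: { age: null } });
  // Replace the whole document (except its _id) with a new one.
  const replaced = db.users.replaceOne({ name: "Linus" }, { name: "Linus", age: 55 });
  return { one, many, replaced };
}
```

`updateOne()` and `updateMany()` take update operators such as `$set`, whereas `replaceOne()` takes a
complete replacement document.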
Delete Operations
Delete operations remove documents from a collection. MongoDB provides the following methods to
delete documents of a collection:
db.collection.deleteOne()
db.collection.deleteMany()
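A brief sketch of both delete methods, using the same hypothetical `users` collection as an assumed
example:

```javascript
// Intended for the mongo shell; "users" is a hypothetical collection.
function deleteExamples(db) {
  // Remove at most one document: the first that matches the filter.
  const one = db.users.deleteOne({ name: "Grace" });
  // Remove every document that matches the filter.
  const many = db.users.deleteMany({ age: { $lt: 18 } });
  return { one, many };
}
```

Both calls return a result document whose `deletedCount` field reports how many documents were
removed. Passing an empty filter `{}` to `deleteMany()` empties the collection, so it is worth
double-checking the filter first.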
In MongoDB, the _id field acts as the primary key for the collection so that each document can be
uniquely identified in the collection. By default, the _id field contains a unique ObjectId value.
When you insert a document into a collection without supplying a field named _id, MongoDB
automatically adds an _id field holding a newly generated ObjectId.
When you query the documents in a collection, you can see the ObjectId for each document in the
collection.
If you do not want MongoDB to generate the _id value and would rather specify your own id, you need to
supply it explicitly when you insert the document.
When explicitly supplying an id, the field must be named _id, and its value must be unique within the
collection.
If the insert succeeds, running the find command on the collection displays the documents with the
_id values you supplied.
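The difference between a generated and an explicit `_id` can be sketched as follows. The `employees`
collection, the function name and the value `22` are made up for illustration.

```javascript
// Intended for the mongo shell; "employees" is a hypothetical collection.
function insertWithIds(db) {
  // No _id supplied: MongoDB adds an ObjectId automatically.
  const auto = db.employees.insertOne({ name: "Ada" });
  // _id supplied explicitly: MongoDB stores it as given.
  // It must be unique within the collection, or the insert fails.
  const manual = db.employees.insertOne({ _id: 22, name: "Linus" });
  return { autoId: auto.insertedId, manualId: manual.insertedId };
}
```

In the shell, `autoId` would be an ObjectId and `manualId` would be `22`. Inserting a second document
with `_id: 22` raises a duplicate key error.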
Journaling
MongoDB uses write-ahead logging to an on-disk journal to guarantee write operation durability. The
MMAPv1 storage engine also requires the journal in order to provide crash resiliency.
The WiredTiger storage engine does not require journaling to guarantee a consistent state after a crash.
The database will be restored to the last consistent checkpoint during recovery. However, if MongoDB
exits unexpectedly in between checkpoints, journaling is required to recover writes that occurred after the
last checkpoint.
With journaling enabled, if mongod stops unexpectedly, the program can recover everything written to
the journal. On restart, MongoDB re-applies the journaled write operations and so maintains a
consistent state. By default, the writes that can be lost, i.e., those not yet made to the journal, are
at most those made in the last 100 milliseconds, plus the time it takes to perform the actual journal
writes.
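The write-ahead principle itself can be illustrated with a toy model in plain JavaScript. This is a
conceptual sketch only: the class `ToyStore` and its methods are invented for illustration and say
nothing about MongoDB's actual journal format or recovery code.

```javascript
// Toy write-ahead log: a conceptual model only, not MongoDB's journal format.
class ToyStore {
  constructor(journal = []) {
    this.journal = journal; // stands in for the on-disk journal
    this.data = new Map();  // stands in for data that is not yet checkpointed
  }

  set(key, value) {
    this.journal.push({ key, value }); // 1. record the write in the journal first
    this.data.set(key, value);         // 2. then apply it to the data
  }

  // Rebuild state after a crash by replaying every journaled write in order.
  static recover(journal) {
    const store = new ToyStore(journal);
    for (const { key, value } of journal) store.data.set(key, value);
    return store;
  }
}

const store = new ToyStore();
store.set("a", 1);
store.set("a", 2);
const surviving = store.journal; // only the "disk" survives the crash
const recovered = ToyStore.recover(surviving);
console.log(recovered.data.get("a")); // 2
```

Because every write reaches the journal before the data is changed, replaying the journal in order
reproduces the state as of the last journaled write.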
Disable Journaling
To disable journaling, start mongod with the --nojournal command line option.
To get acknowledgement that a write has been committed to the journal, use a write concern with the j
option set to true.
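A minimal sketch of a journaled write concern on a single insert follows. The function name and the
`orders` collection are hypothetical; `w: 1` together with `j: true` is one reasonable combination,
not the only one.

```javascript
// Intended for the mongo shell; "orders" is a hypothetical collection.
function insertJournaled(db, doc) {
  // j: true asks mongod to acknowledge the write only after it has been
  // written to the on-disk journal.
  return db.orders.insertOne(doc, { writeConcern: { w: 1, j: true } });
}
```

The call returns later than an unjournaled write would, because it waits for the next journal commit,
but an acknowledged write then survives an unexpected mongod shutdown.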
With the MMAPv1 storage engine, MongoDB may pre-allocate journal files if the mongod process
determines that it is more efficient to pre-allocate journal files than to create new journal files as
needed.
Depending on your filesystem, you might experience a pre-allocation lag the first time you start
a mongod instance with journaling enabled. Pre-allocating the files might take several minutes; during
this time, you will not be able to connect to the database. This is a one-time pre-allocation and does
not occur with future invocations.
To avoid pre-allocation lag, you can pre-allocate files in the journal directory by copying them from
another instance of mongod.
Pre-allocated files do not contain data. It is safe to remove them later, but if you restart mongod
with journaling, mongod will create them again.
EXAMPLE
The following sequence pre-allocates journal files for an instance of mongod running on port 27017 with
a database path of /data/db.
For demonstration purposes, the sequence starts by creating a set of journal files in the usual way.
1. Create a temporary directory into which to create a set of journal files:
mkdir ~/tmpDbpath
2. Create a set of journal files by starting a mongod instance that uses the temporary directory.
For example:
3. When you see the following log output, indicating mongod has the files, press CONTROL+C to
stop the mongod instance:
4. Move the pre-allocated journal files into the data directory of the target mongod instance:
mv ~/tmpDbpath/journal /data/db/
serverStatus
The serverStatus command returns database status information that is useful for assessing
performance.
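As a small example of reading the result, the sketch below runs the command and picks out a few
commonly inspected fields. The function name and the choice of fields are assumptions made for
illustration; the full output contains many more sections.

```javascript
// Intended for the mongo shell; `db` is the handle to the current database.
function summarizeServerStatus(db) {
  const status = db.runCommand({ serverStatus: 1 });
  return {
    uptimeSeconds: status.uptime,                   // seconds since mongod started
    currentConnections: status.connections.current, // open client connections
    inserts: status.opcounters.insert,              // inserts since startup
    queries: status.opcounters.query                // queries since startup
  };
}
```

Sampling these counters at intervals and comparing the values gives a rough picture of the load on
the server.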
Change the Group Commit Interval for MMAPv1
For the MMAPv1 storage engine, you can set the group commit interval using the
--journalCommitInterval command line option. The allowed range is 2 to 300 milliseconds.
Lower values increase the durability of the journal at the expense of disk performance.
On a restart after a crash, MongoDB replays all journal files in the journal directory before the server
becomes available. If MongoDB must replay journal files, mongod notes these events in the log output.