MongoDB Atlas Best Practices
Introduction
MongoDB Atlas provides all of the features of MongoDB, without the operational heavy lifting required for any new application. MongoDB Atlas is available on-demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on what you do best.

It's easy to get started – use a simple GUI to select the instance size, region, and features you need. MongoDB Atlas provides:

• Security features to protect access to your data
• Built-in replication for always-on availability, tolerating complete data center failure
• Backups and point-in-time recovery to protect against data corruption
• Fine-grained monitoring to let you know when to scale. Additional instances can be provisioned with the push of a button
• Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
• A choice of cloud providers, regions, and billing options

MongoDB Atlas is versatile. It's great for everything from a quick Proof of Concept, to test/QA environments, to complete production clusters. If you decide you want to bring operations back under your control, it is easy to move your databases onto your own infrastructure and manage them using MongoDB Ops Manager or MongoDB Cloud Manager. The user experience across MongoDB Atlas, Cloud Manager, and Ops Manager is consistent, ensuring that disruption is minimal if you decide to migrate to your own infrastructure.

MongoDB Atlas is automated, it's easy, and it's from the creators of MongoDB. Learn more and take it for a spin.

While MongoDB Atlas radically simplifies the operation of MongoDB, there are still some decisions to take to ensure the best performance and reliability for your application. This paper provides guidance on best practices for deploying, managing, and optimizing the performance of your database with MongoDB Atlas.

This guide outlines considerations for achieving performance at scale with MongoDB Atlas across a number of key dimensions, including instance size
selection, application patterns, schema design and indexing, and disk I/O. While this guide is broad in scope, it is not exhaustive. Following the recommendations in this guide will provide a solid foundation for ensuring optimal application performance.

For the most detailed information on specific topics, please see the on-line documentation at mongodb.com. Many links are provided throughout this white paper to help guide users to the appropriate resources.

Preparing for a MongoDB Deployment

Schema Design

Developers and data architects should work together to develop the right data model, and they should invest time in this exercise early in the project. The requirements of the application should drive the data model, updates, and queries of your MongoDB system. Given MongoDB's dynamic schema, developers and data architects can continue to iterate on the data model throughout the development and deployment processes to optimize performance and storage efficiency, as well as support the addition of new application features. All of this can be done without expensive schema migrations.

Document Model

MongoDB stores data as documents in a binary representation called BSON. The BSON encoding extends the popular JSON representation to include additional types such as int, long, date, and decimal128. BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, sub-documents, and binary data. It may be helpful to think of documents as roughly equivalent to rows in a relational database, and fields as roughly equivalent to columns. However, MongoDB documents tend to have all related data for a given record or object in a single document, whereas in a relational database, that data is usually normalized across rows in many tables. For example, data that belongs to a parent-child relationship in two RDBMS tables can frequently be collapsed (embedded) into a single document in MongoDB. For operational applications, the document model makes JOINs redundant in many cases.

Where possible, store all data for a record in a single document. When data for a record is stored in a single document, the entire record can be retrieved in a single seek operation, which is very efficient. In some cases it may not be practical to store all data in a single document, or it may negatively impact other operations. Make the trade-offs that are best for your application. MongoDB supports ACID compliance at the document level; multi-document ACID transactions are available for use cases that require them.

Collections

Collections are groupings of documents. Typically all documents in a collection have similar or related purposes for an application. It may be helpful to think of collections as being analogous to tables in a relational database.

Dynamic Schema & JSON Schema Validation

MongoDB documents can vary in structure. For example, documents that describe users might all contain the user id and the last date they logged into the system, but only some of these documents might contain the user's shipping address, and perhaps some of those contain multiple shipping addresses. MongoDB does not require that all documents conform to the same structure. Furthermore, there is no need to declare the structure of documents to the system – documents are self-describing.

DBAs and developers have the option to define JSON Schema Validation rules, enabling them to precisely control document structures, choose the level of validation required, and decide how exceptions are handled. As a result, DBAs can apply data governance standards, while developers maintain the benefits of a flexible document model. Validation occurs during updates and inserts. When validation rules are added to a collection, existing documents do not undergo validation checks until modification.
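To illustrate these controls, the following is a minimal sketch in the mongo shell; the users collection and its fields are hypothetical examples rather than names used elsewhere in this paper:

    // Require an email on every user document; existing documents are
    // validated only when they are next modified ("moderate" level).
    db.createCollection("users", {
      validator: {
        $jsonSchema: {
          bsonType: "object",
          required: ["email"],
          properties: {
            email: { bsonType: "string" },
            shippingAddresses: {
              bsonType: "array",
              items: { bsonType: "object" }
            }
          }
        }
      },
      validationLevel: "moderate",  // skip documents that predate the rule
      validationAction: "error"     // reject inserts and updates that fail
    })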
Indexes

MongoDB uses B-tree indexes to optimize queries. Indexes are defined on a collection's document fields. MongoDB includes support for many indexes, including compound, geospatial, TTL, text search, sparse, partial, unique, and others. For more information see the section on indexing below.

Transactions

There are, however, use cases where transactional ACID guarantees need to be applied to a set of operations that span multiple documents, most commonly with apps that deal with the exchange of value between different parties and require "all-or-nothing" execution. Back office "System of Record" or "Line of Business" (LoB) applications are the typical class of workload where multi-document transactions can be useful.

To maintain predictable levels of database performance while using transactions, developers should consider the following:

• As a best practice, no more than 1,000 documents should be modified within a transaction. For operations that need to modify more than 1,000 documents, developers should break the transaction into separate parts that process documents in batches.

You can review all best practices in the documentation for transactions.
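The following is a minimal multi-document transaction sketch in the mongo shell (MongoDB 4.0 or later); the bank database, accounts collection, and balances are illustrative:

    // Transfer funds between two accounts with all-or-nothing semantics.
    const session = db.getMongo().startSession();
    const accounts = session.getDatabase("bank").accounts;
    session.startTransaction();
    try {
      accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } });
      accounts.updateOne({ _id: "B" }, { $inc: { balance: 100 } });
      session.commitTransaction();   // both updates become visible together
    } catch (e) {
      session.abortTransaction();    // neither update is applied
      throw e;
    } finally {
      session.endSession();
    }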
Adding transactions does not make MongoDB a relational database – many developers have already experienced that the document model is superior to the relational one. All best practices relating to MongoDB data modeling continue to apply when using multi-document transactions, or to other relational-type features such as fully expressive JOINs. Where practical, all data relating to an entity should be stored in a single, rich document structure. Simply moving tabular data normalized for relational tables into MongoDB will not allow users to take advantage of MongoDB's natural, fast, and flexible document model, or its distributed systems architecture.
For more information on schema design, please see Data Modeling Considerations for MongoDB in the MongoDB Documentation.

Application Access Patterns

While schema design has a huge influence on performance, how the application accesses the data can also have a major impact.

Searching on indexed attributes is typically the single most important pattern as it avoids collection scans. Taking it a step further, using covered queries avoids the need to access the collection data altogether. Covered queries return results from the indexes directly without accessing documents and are therefore very efficient. For a query to be covered, all the fields included in the query must be present in an index, and all the fields returned by the query must also be present in that index. To determine whether a query is a covered query, use the explain() method. If the explain() output displays true for the indexOnly field, the query is covered by an index, and MongoDB queries only that index to match the query and return the results.
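As a hedged illustration, the sketch below creates a compound index and issues a query it covers; the collection and field names are illustrative. Both the filter and the projection reference only indexed fields, and _id is excluded so the index alone can satisfy the query:

    // Compound index on the queried and returned fields.
    db.users.createIndex({ lastName: 1, firstName: 1 })

    // Covered: the filter and projection stay entirely within the index.
    db.users.find(
      { lastName: "Smith" },
      { _id: 0, lastName: 1, firstName: 1 }
    ).explain("executionStats")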
Rather than retrieving the entire document in your application, updating fields, then saving the document back to the database, instead issue the update to specific fields. This has the advantage of less network usage and reduced database overhead.
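For example, the following update sends only the changed fields to the server rather than rewriting the whole document (the collection, _id value, and field names are illustrative):

    // Modify one embedded field and record the change time.
    db.users.updateOne(
      { _id: 12345 },
      {
        $set: { "address.city": "Berlin" },
        $currentDate: { lastModified: true }
      }
    )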
Document Size

MongoDB supports documents up to 16 MB in size, and it is best practice to avoid patterns that let documents grow unbounded. For example, rather than modeling a product and all of its customer reviews as a single document, it would be better to model each review or group of reviews as a separate document with a reference to the product document, while also storing the key reviews in the product document for fast access.

In practice most documents are a few kilobytes or less. Consider documents more like rows in a table than the tables themselves. Rather than maintaining lists of records in a single document, instead make each record a document. For large media items, such as video or images, consider using GridFS, a convention implemented by all the drivers that automatically stores the binary data across many smaller documents.

Field names are repeated across documents and consume space – RAM in particular. By using smaller field names your data will consume less space, which allows for a larger number of documents to fit in RAM.

GridFS

For files larger than 16 MB, MongoDB provides a convention called GridFS, which is implemented by all MongoDB drivers. GridFS automatically divides large data into 256 KB pieces called chunks and maintains the metadata for all chunks. GridFS allows for retrieval of individual chunks as well as entire documents. For example, an application could quickly jump to a specific timestamp in a video. GridFS is frequently used to store large binary files such as images and videos directly in MongoDB, without offloading them to a separate filesystem.

Data Lifecycle Management

MongoDB provides features to facilitate the management of data lifecycles, including Time to Live (TTL) indexes and capped collections.
Time to Live (TTL) Indexes

If documents in a collection should only persist for a pre-defined period of time, a TTL index can delete them automatically once they reach a given age. For example, the TTL can be set to 3600 seconds for a date field called lastActivity that exists in documents used to track user sessions and their last interaction with the system. A background thread will automatically check all these documents and delete those that have been idle for more than 3600 seconds. Another example use case for TTL is a price quote that should automatically expire after a period of time.
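A TTL index matching the session example above could be declared as follows (the sessions collection name is illustrative):

    // Documents are removed once lastActivity is over 3600 seconds old.
    db.sessions.createIndex(
      { lastActivity: 1 },
      { expireAfterSeconds: 3600 }
    )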
Capped Collections

Capped collections are fixed-size collections in which, once the allocated space is filled, new documents overwrite the oldest documents in insertion order. This makes them useful for data with natural turnover, such as rolling logs.
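For example, a capped collection can be created explicitly as follows (the collection name and size are illustrative):

    // Fixed-size collection: once 1 GB is reached, the oldest
    // documents are overwritten in insertion order.
    db.createCollection("log", { capped: true, size: 1024 * 1024 * 1024 })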
Indexing

Indexes improve query performance, but they also add overhead to writes and consume disk space and RAM, so they should be managed carefully on collections that support insert-heavy (or writes which modify indexed values) workloads.

MongoDB Atlas surfaces index information for a collection in the Data Explorer. The Indexes tab of the Data Explorer lists the indexes and associated index information — the index definition, the size, and the usage frequency — for the collection. Alternatively, an $indexStats aggregation stage can be used to determine how frequently each index is used.

Query plans should be reviewed just as they would be with relational databases. You should test every query in your application using explain(). MongoDB Atlas also includes a Performance Advisor that monitors slow queries and suggests new indexes. For more information, visit the Performance Advisor documentation.
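For example, per-index usage counters can be inspected with an aggregation such as the following (the orders collection is illustrative):

    // One result document per index, including how often each index
    // has been used since the server last restarted.
    db.orders.aggregate([
      { $indexStats: {} },
      { $project: { name: 1, "accesses.ops": 1 } }
    ])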
MongoDB Atlas allows you to create indexes within the Data Explorer UI in the Indexes tab. Atlas also enables you to build indexes in a rolling fashion to reduce the impact of building indexes on replica sets and sharded clusters. To maintain cluster availability during this process, Atlas removes one node from the cluster at a time, starting with a secondary.

Index Limitations

As with any database, indexes consume disk space and memory, so they should only be used as necessary. Indexes can impact update performance. An update must first locate the data to change, so an index will help in this regard, but index maintenance itself has overhead and this work will reduce update performance.

There are several index limitations that should be observed when deploying MongoDB:

• A collection cannot have more than 64 indexes
• Index entries cannot exceed 1024 bytes
• The name of an index must not exceed 125 characters (including its namespace)
• In-memory sorting of data without an index is limited to 32MB. This operation is very CPU intensive, and in-memory sorts indicate an index should be created to optimize these queries

Common Mistakes Regarding Indexes

The following tips may help to avoid some common mistakes regarding indexes:

• Use a compound index rather than index intersection: For best performance when querying via multiple predicates, compound indexes will generally be a better option
• Compound indexes: Compound indexes are defined and ordered by field. So, if a compound index is defined for last name, first name and city, queries that specify last name or last name and first name will be able to use this index, but queries that try to search based on city will not be able to benefit from this index. Remove indexes that are prefixes of other indexes
• Low selectivity indexes: An index should radically reduce the set of possible documents to select from. For example, an index on a field that indicates gender is not as beneficial as an index on zip code, or even better, phone number
• Regular expressions: Indexes are ordered by value, hence leading wildcards are inefficient and may result in full index scans. Trailing wildcards can be efficient if there are sufficient case-sensitive leading characters in the expression
• Negation: Inequality queries can be inefficient with respect to indexes. Like most database systems, MongoDB does not index the absence of values, and negation conditions may require scanning all documents. If negation is the only condition and it is not selective (for example, querying an orders table, where 99% of the orders are complete, to identify those that have not been fulfilled), all records will need to be scanned.
• Eliminate unnecessary indexes: Indexes are resource-intensive: they consume RAM, and as fields are updated their associated indexes must be maintained, incurring additional disk I/O overhead. To understand the effectiveness of the existing indexes, use the Indexes tab of the Data Explorer, which shows all the indexes for a given collection and their usage frequency. Alternatively, an $indexStats aggregation stage can be used to determine how frequently each index is used. If there are indexes that are not used, removing them will reduce storage and speed up writes.
• Partial indexes: If only a subset of documents need to be included in a given index, the index can be made partial by specifying a filter expression. For example, if an index on the userID field is only needed for querying open orders, it can be made conditional on the order status being set to in progress. In this way, partial indexes improve query performance while minimizing overheads; a sketch follows this list.
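A minimal sketch of the partial index described above (the field names and status value are illustrative):

    // Index userID only for orders that are still in progress.
    db.orders.createIndex(
      { userID: 1 },
      { partialFilterExpression: { status: "in progress" } }
    )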
Working Sets

MongoDB makes extensive use of RAM to speed up database operations. In MongoDB, all data is read and manipulated through in-memory representations of the
data. Reading data from memory is measured in nanoseconds and reading data from disk is measured in milliseconds; thus reading from memory is orders of magnitude faster than reading from disk.

The set of data and indexes that are accessed during normal operations is called the working set. It is best practice that the working set fits in RAM. It may be the case that the working set represents a fraction of the entire database, such as in applications where data related to recent events or popular products is accessed most commonly.

When MongoDB attempts to access data that has not been loaded in RAM, it must be read from disk. If there is free memory then the operating system can locate the data on disk and load it into memory directly. However, if there is no free memory, MongoDB must write some other data from memory to disk, and then read the requested data into memory. This process can be time consuming and significantly slower than accessing data that is already resident in memory.

Some operations may inadvertently purge a large percentage of the working set from memory, which adversely affects performance. For example, a query that scans all documents in the database, where the database is larger than available RAM on the server, will cause documents to be read into memory and may lead to portions of the working set being written out to disk. Other examples include various maintenance operations such as compacting or repairing a database and rebuilding indexes.

If your database working set size exceeds the available RAM of your system, consider provisioning an instance with larger RAM capacity (scaling up) or sharding the database across additional instances (scaling out). Scaling is an automated, on-line operation which is launched by selecting the new configuration after clicking the CONFIGURE button in MongoDB Atlas. For a discussion on this topic, refer to the section on Sharding Best Practices later in the guide. It is easier to implement sharding before the system's resources are consumed, so capacity planning is an important element in successful project delivery.

Data Migration

Users should assess how best to model their data for their applications rather than simply importing the flat file exports of their legacy systems. In a traditional relational database environment, data tends to be moved between systems using delimited flat files such as CSV. While it is possible to ingest data into MongoDB from CSV files, this may in fact only be the first step in a data migration process. It is typically the case that MongoDB's document data model provides advantages and alternatives that do not exist in a relational data model.

There are many options to migrate data from flat files into rich JSON documents, including mongoimport, custom scripts, ETL tools, and the application itself, which can read from the existing RDBMS and then write a JSON version of the document back to MongoDB.

For importing data from a pre-existing MongoDB system, MongoDB Atlas includes a live migration service built into the GUI. This functionality works with any MongoDB replica set or sharded cluster running MongoDB 2.6 or higher. Step-by-step tutorials for the Atlas Live Migration Service can be found here.

Other tools such as mongodump and mongorestore, or MongoDB Atlas backups, are also useful for moving data between different MongoDB systems.

MongoDB Atlas Cluster Selection

The following recommendations are only intended to provide high-level guidance for hardware for a MongoDB deployment. The specific configuration of your hardware will be dependent on your data, queries, performance SLA, and availability requirements.

Memory

As with most databases, MongoDB performs best when the working set (indexes and most frequently accessed data) fits in RAM. Sufficient RAM is the most important factor for instance selection; other optimizations may not significantly improve the performance of the system if there is insufficient RAM. When selecting which MongoDB Atlas instance size to use, opt for one that has sufficient
RAM to hold the full working data set (or the appropriate subset if sharding).

If your working set exceeds the available RAM, consider using a larger instance type or adding additional shards to your system.

Storage

Using faster storage can increase database performance and latency consistency. Each node must be configured with sufficient storage for the full data set, or for the subset to be stored in a single shard. The storage speed and size can be set when picking the MongoDB Atlas instance during cluster creation or reconfiguration.

Data volumes for customers deploying on AWS, Azure, and GCP are always encrypted. By default, Atlas automatically adds storage to your deployment when disk utilization reaches 90%. Auto-scaling for storage can be turned off.

CPU

MongoDB Atlas instances are multi-threaded and can take advantage of many CPU cores. Specifically, the total number of active threads (i.e., concurrent operations) relative to the number of CPUs can impact performance:

• Throughput increases as the number of concurrent active operations increases up to and beyond the number of CPUs
• Throughput eventually decreases as the number of concurrent active operations exceeds the number of CPUs by some threshold amount

The threshold amount depends on your application. You can determine the optimum number of concurrent active operations for your application by experimenting and measuring throughput.

The larger MongoDB Atlas instances include more virtual CPUs and so should be considered for heavily concurrent workloads.

Cluster Class — AWS Only

MongoDB Atlas offers different classes of clusters for M40 and larger deployments on AWS:

• Low CPU: Lower ratio of CPU to memory with high network performance. Well-suited for memory-intensive, latency-sensitive workloads.
• General: A balance of compute, memory, and network resources. Well-suited for a wide variety of workloads.
• Local NVMe SSD: Local Non-Volatile Memory Express (NVMe) based SSDs. Well-suited for low latency, very high I/O performance, and high sequential read throughput to large data sets.

Scaling a MongoDB Atlas Cluster

Horizontal Scaling with Sharding

MongoDB Atlas provides horizontal scale-out for databases using a technique called sharding, which is transparent to applications. MongoDB distributes data across multiple replica sets called shards. With automatic balancing, MongoDB ensures data is equally distributed across shards as data volumes grow or the size of the cluster increases or decreases. Sharding allows MongoDB deployments to scale beyond the limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.

MongoDB Atlas supports multiple sharding policies, enabling administrators to accommodate diverse query patterns:

• Range-based sharding: Documents are partitioned across shards according to the shard key value. Documents with shard key values close to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range-based queries.
• Hash-based sharding: Documents are uniformly distributed according to an MD5 hash of the shard key value. Documents with shard key values close to one another are unlikely to be co-located on the same shard. This approach guarantees a uniform distribution of writes across shards – provided that the shard key has high cardinality – making it optimal for write-intensive workloads. A brief example follows this list.
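As a hedged illustration of these policies, hash-based sharding can be enabled from the mongo shell as follows; the database, collection, and shard key are illustrative:

    // Enable sharding for the database, then shard the collection
    // on a hashed key for uniform write distribution.
    sh.enableSharding("mydb")
    sh.shardCollection("mydb.orders", { customerId: "hashed" })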
Users should consider deploying a sharded cluster in the following situations:

• RAM Limitation: The size of the system's active working set plus indexes is expected to exceed the capacity of the maximum amount of RAM in the system
• Disk I/O Limitation: The system will have a large amount of write activity, and the operating system will not be able to write data fast enough to meet demand, or I/O bandwidth will limit how fast the writes can be flushed to disk
• Storage Limitation: The data set will grow to exceed the storage capacity of a single node in the system

Applications that meet these criteria, or that are likely to do so in the future, should be designed for sharding in advance rather than waiting until they have consumed available capacity. For applications that will eventually benefit from sharding, users should consider which collections they will want to shard and the corresponding shard keys when designing their data models. If a system has already reached or exceeded its capacity, it will be challenging to deploy sharding without impacting the application's performance.

Between 1 and 24 shards can be self-configured in the MongoDB Atlas GUI; customers interested in more than 24 shards should contact MongoDB.

Sharding Best Practices

Users who choose to shard should consider the following best practices.

Select a good shard key: When selecting fields to use as a shard key, there are at least three key criteria to consider:

1. Cardinality: Data partitioning is managed in 64 MB chunks by default. Low cardinality (e.g., a user's home country) will tend to group documents together on a small number of shards, which in turn will require frequent rebalancing of the chunks; a single country is also likely to exceed the 64 MB chunk size. Instead, a shard key should exhibit high cardinality.

2. Insert Scaling: Writes should be evenly distributed across all shards based on the shard key. If the shard key is monotonically increasing, for example, all inserts will go to the same shard even if they exhibit high cardinality, thereby creating an insert hotspot. Instead, the key should be evenly distributed.

3. Query Isolation: Queries should be targeted to a specific shard to maximize scalability. If queries cannot be isolated to a specific shard, all shards will be queried in a pattern called scatter/gather, which is less efficient than querying a single shard.

4. Ensure uniform distribution of shard keys: When shard keys are not uniformly distributed for reads and writes, operations may be limited by the capacity of a single shard. When shard keys are uniformly distributed, no single shard will limit the capacity of the system.

For more on selecting a shard key, see Considerations for Selecting Shard Keys.

Avoid scatter-gather queries: In sharded systems, queries that cannot be routed to a single shard must be broadcast to multiple shards for evaluation. Because these queries involve multiple shards for each request, they do not scale well as more shards are added.

Use hash-based sharding when appropriate: For applications that issue range-based queries, range-based sharding is beneficial because operations can be routed to the fewest shards necessary, usually a single shard. However, range-based sharding requires a good understanding of your data and queries, which in some cases may not be practical. Hash-based sharding ensures a uniform distribution of reads and writes, but it does not provide efficient range-based operations.

Apply best practices for bulk inserts: Pre-split data into multiple chunks so that no balancing is required during the insert process. For more information see Create Chunks in a Sharded Cluster in the MongoDB Documentation.

Add capacity before it is needed: Cluster maintenance is lower risk and simpler to manage if capacity is added before the system is over-utilized.

Global Clusters

Atlas Global Clusters allow organizations with distributed applications to easily geo-partition a single fully managed
database to provide low latency reads and writes anywhere in the world. Data can be associated and directed to nodes in specified cloud regions, keeping it in close proximity to nearby application servers and local users, and helping with compliance to data regulations like the General Data Protection Regulation (GDPR). Global Clusters also allow organizations to easily replicate data worldwide for multi-region fault tolerance and fast, responsive access to any data set, anywhere.

You can set up global clusters — available on Amazon Web Services, Microsoft Azure, and Google Cloud Platform — with just a few clicks in the MongoDB Atlas UI. MongoDB Atlas takes care of the deployment and management of infrastructure and database resources required to ensure that data is written to and read from different regions. For example, you may have an accounts collection that you want to distribute among your three regions of business: North America, Europe, and Asia. Atlas will recommend a templated configuration of zones, each with unique region mappings, to help reduce latency for distributed application servers. You also have the option of creating your own zones by choosing cloud regions in an easy to use, visual interface. Atlas will then automatically deploy your configuration and shard the accounts collection, ensuring that data tagged for a specific zone is appropriately routed. When performing reads and writes, you just need to include a combination of the country code and the account identifier in the query, and Atlas will ensure all operations are directed to the right shard in the right data zone. Zones can also contain different numbers of shards. For example, if there are more application users in North America, you can provision more shards in that region and scale them on demand.

For more information, visit the Global Clusters documentation.

Continuous Availability & Data Consistency

Replication

Data Redundancy

MongoDB Atlas maintains multiple copies of data, called replicas, using native replication. Replica failover is fully automated, eliminating the need to manually intervene to recover nodes in the event of a failure.

A replica set consists of multiple replica nodes. At any given time, one member acts as the primary replica and the other members act as secondary replicas. If the primary member fails for any reason (e.g., a failure of the host system), one of the secondary members is automatically elected to primary and begins to accept all writes; this is typically completed in 2 seconds or less, and reads can optionally continue on the secondaries.

Sophisticated algorithms control the election process, ensuring only the most suitable secondary member is promoted to primary, and reducing the risk of unnecessary failovers (also known as "false positives"). The election algorithm processes a range of parameters including connectivity status and analysis of histories to identify those replica set members that have applied the most recent updates from the primary.

A larger number of replica nodes provides increased protection against database downtime in case of multiple machine failures.

MongoDB Atlas replica sets have a minimum of 3 nodes and can be configured to have a maximum of 50 nodes. Atlas automatically distributes the nodes within a replica set across the availability zones in the region you choose. The definition of an availability zone varies slightly between AWS, Azure, and GCP, but the general concept is similar. For example, on AWS, each availability zone is isolated and connected with low-latency links.

Atlas also enables you to deploy across multiple regions (within a single cloud provider) for better availability guarantees. Replica set members in additional regions will participate in the election and automated failover process should your primary fail. Note that you can also select your preferred region; this is the region where reads and writes will default to, assuming that there are no active failure or failover conditions.

More information on replica sets can be found on the MongoDB documentation page.
Write Guarantees

MongoDB allows administrators to specify the level of persistence guarantee when issuing writes to the database, which is called the write concern. The following options can be selected in the application code:

• Write Acknowledged: This is the default write concern. The mongod will confirm the execution of the write operation, allowing the client to catch network, duplicate key, Document Validation, and other exceptions
• Replica Acknowledged: It is also possible to wait for acknowledgment of writes to other replica set members. MongoDB supports writing to a specific number of replicas. This mode also ensures that the write is written to the journal on the secondaries. Because replicas can be deployed across racks within data centers and across multiple data centers, ensuring writes propagate to additional replicas can provide extremely robust durability (a sketch follows this list)
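For example, a write concern can be supplied per operation in the application code (the collection, document, and w value are illustrative):

    // Wait for acknowledgment from two replica set members and
    // for the write to reach the journal before returning.
    db.orders.insertOne(
      { item: "abc", qty: 1 },
      { writeConcern: { w: 2, j: true } }
    )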
Read Preferences

Reading from the primary replica is the default configuration, as it guarantees consistency. You may choose to read from secondaries if your application can tolerate eventual consistency.

MongoDB Atlas allows you to easily deploy read-only replica set members in different geographic regions using the Atlas UI or API. Read-only replica set members allow you to optimize for local reads (reduce read latency) across different geographic regions using a single MongoDB deployment. These replica set members will not participate in the election and failover process and can never be elected to a primary replica set member.

A very useful option is primaryPreferred, which issues reads to a secondary replica only if the primary is unavailable. This configuration allows for the continuous availability of reads during the short failover process.

For more on the subject of configurable reads, see the MongoDB Documentation page on replica set Read Preference.
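A read preference can also be applied per query from the shell, for example (collection name illustrative):

    // Prefer the primary, but fall back to a secondary during failover.
    db.orders.find({ status: "open" }).readPref("primaryPreferred")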
Managing MongoDB

Organizations and Projects

Atlas provides a hierarchy based on organizations and projects to facilitate the management of your clusters. Organizations can contain multiple projects, and projects can contain multiple database clusters. Billing happens at the organization level while preserving visibility into usage in each project.

Users must be a member of an organization to access the organization or a project within that organization. Depending on the user's role in the organization, the Atlas user may also be required to be a member of the project in order to access a project. At the organization level, you can group users into teams. You can use teams to bulk assign organization users to projects within the organization.

This hierarchy allows you to:

• Isolate different environments (for instance, development/QA/prod environments) from each other.
• Associate different users or teams with different environments, or give different permissions to users in different environments.
• Maintain separate cluster security configurations.
• Create different alert settings. For example, configure alerts for production environments differently than development environments.
• Deploy into different regions or cloud platforms.

Deployments and Upgrades

All the user needs to do in order for MongoDB Atlas to automatically deploy the cluster is to select a handful of options:

• Instance size
• Storage size (optional)
• Storage speed (optional)
• Number of replicas in the replica set (optional)
• Number of shards (optional)
• Automated backups

Most of the automated deployment and configuration steps available in the Atlas UI can also be triggered with the Atlas API.

The database nodes will automatically be kept up to date with the latest stable MongoDB and underlying operating system software versions; rolling upgrades ensure that your applications are not impacted during upgrades. Atlas allows you to set a preferred window for automated maintenance and defer maintenance when needed, with the exception of critical security patches.

Monitoring & Capacity Planning

System performance and capacity planning are two important topics that should be addressed as part of any MongoDB deployment. Part of your planning should involve establishing baselines on data volume, system load, performance, and system capacity utilization. These baselines should reflect the workloads you expect the system to perform in production, and they should be revisited periodically as the number of users, application features, performance SLA, or other factors change.

Featuring charts and automated alerting, MongoDB Atlas tracks key database and system health metrics including disk free space, operations counters, memory and CPU utilization, replication status, open connections, queues, and node status.

Historic performance can be reviewed in order to create operational baselines and to support capacity planning. Integration with existing monitoring tools is also straightforward via the MongoDB Atlas RESTful API, making the deep insights from MongoDB Atlas part of a consolidated view across your operations.

Atlas provides a real-time performance panel that allows you to see what's happening in your cluster live and diagnose issues quickly. This view displays operations, reads/writes, network in/out, and memory, and highlights the hottest collections and slowest operations.

MongoDB Atlas allows administrators to set custom alerts when key metrics are out of range. Alerts can be configured for a range of parameters affecting individual hosts and replica sets. Alerts can be sent via email, webhooks, Flowdock, HipChat, and Slack, or integrated into existing incident management systems such as PagerDuty, VictorOps, OpsGenie, and Datadog.
When it's time to scale, just hit the CONFIGURATION button in the MongoDB Atlas GUI and choose the required instance size and number of shards – the automated, on-line scaling will then be performed.

Things to Monitor

MongoDB Atlas monitors database-specific metrics, including page faults, ops counters, queues, connections, and replica set status. Alerts can be configured against each monitored metric to proactively warn administrators of potential issues before users experience a problem. The MongoDB Atlas team also monitors the underlying infrastructure, ensuring that it is always in a healthy state.

Application Logs and Database Logs

Application and database logs should be monitored for errors and other system information. It is important to correlate your application and database logs in order to determine whether activity in the application is ultimately responsible for other issues in the system. For example, a spike in user writes may increase the volume of writes to MongoDB, which in turn may overwhelm the underlying storage system. Without the correlation of application and database logs, it might take more time than necessary to establish that the application is responsible for the increase in writes rather than some process running in MongoDB.

Page Faults

When a working set ceases to fit in memory, or other operations have moved working set data out of memory, the volume of page faults may spike in your MongoDB system.

Disk

Beyond memory, disk I/O is also a key performance consideration for a MongoDB system because writes are journaled and regularly flushed to disk. Under heavy write load the underlying disk subsystem may become overwhelmed, other processes could be contending with MongoDB, or the storage speed chosen may be inadequate for the volume of writes.

CPU

A variety of issues could trigger high CPU utilization. This may be normal under most circumstances, but if high CPU utilization is observed without other issues such as disk saturation or page faults, there may be an unusual issue in the system. For example, a MapReduce job with an infinite loop, or a query that sorts and filters a large number of documents from the working set without good index coverage, might cause a spike in CPU without triggering issues in the disk system or page faults.

Connections

MongoDB drivers implement connection pooling to facilitate efficient use of resources. Each connection consumes 1MB of RAM, so be careful to monitor the total number of connections so they do not overwhelm RAM and reduce the available memory for the working set. This typically happens when client applications do not properly close their connections, or with Java in particular, which relies on garbage collection to close connections.
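Drivers expose the pool size through the connection string's maxPoolSize option; in the placeholder URI below, the host, credentials, and limit are illustrative:

    mongodb+srv://user:password@cluster0.example.mongodb.net/test?maxPoolSize=100

Capping the pool keeps connection overhead predictable so that idle connections cannot crowd out the RAM needed for the working set.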
Op Counters

The utilization baselines for your application will help you determine a normal count of operations. If these counts start to substantially deviate from your baselines, it may be an indicator that something has changed in the application, or that a malicious attack is underway.

Queues

If MongoDB is unable to complete all requests in a timely fashion, requests will begin to queue up. A healthy deployment will exhibit very short queues. If metrics start to deviate from baseline performance, requests from applications will start to queue. The queue is therefore a good first place to look to determine if there are issues that will affect user experience.

Shard Balancing

One of the goals of sharding is to uniformly distribute data across multiple servers. If the utilization of server resources is not approximately equal across servers, there may be an underlying issue that is problematic for the deployment. For
example, a poorly selected shard key can result in uneven data distribution. In this case, most if not all of the queries will be directed to the single mongod that is managing the data. Furthermore, MongoDB may be attempting to redistribute the documents to achieve a more ideal balance across the servers. While redistribution will eventually result in a more desirable distribution of documents, there is substantial work associated with rebalancing the data, and this activity itself may interfere with achieving the desired performance SLA.

If in the course of a deployment it is determined that a new shard key should be used, it will be necessary to reload the data with a new shard key because the designation and values of the shard keys are immutable. To support the use of a new shard key, it is possible to write a script that reads each document, updates the shard key, and writes it back to the database.

Backups

MongoDB Atlas backups are maintained continuously, just a few seconds behind the operational system, so the backup is only moments behind, minimizing exposure to data loss.

In addition, MongoDB Atlas includes queryable backups, which allow you to perform queries against existing snapshots to more easily restore data at the document/object level. Queryable backups allow you to accomplish the following with less time and effort:

• Restore a subset of objects/documents within the MongoDB cluster
• Identify whether data has been changed in an undesirable way by looking at previous versions alongside current data
• Identify the best point in time to restore a system by comparing data from multiple snapshots

mongodump

mongodump is a tool bundled with MongoDB that performs a live backup of the data in MongoDB; together with mongorestore, it can also be used to move data between MongoDB systems.
Many operations teams use Application Performance Monitoring (APM) platforms to monitor their complete IT infrastructure from a single management UI. Issues that risk affecting customer experience can be quickly identified and isolated to specific components – whether attributable to devices, hardware infrastructure, networks, APIs, application code, databases, and more.

The MongoDB drivers include an API that exposes query performance metrics to APM tools. Administrators can monitor time spent on each operation, and identify slow running queries that require further analysis and optimization.

Security

MongoDB Atlas provides extensive capabilities to defend, detect, and control access to your database, including:

• Encryption: Protect data in motion over the network and at rest in persistent storage

To ensure a secure system right out of the box, authentication and IP address whitelisting are automatically enabled.

Review the security section of the MongoDB Atlas documentation to learn more about each of the security features discussed below.
Read-Only, Redacted Views

DBAs can define non-materialized views that expose only a subset of data from an underlying collection, i.e., a view that filters out specific fields. DBAs can define a view of a collection that's generated from an aggregation over another collection(s) or view.

Views are defined using the standard MongoDB Query Language and aggregation pipeline. They allow the inclusion or exclusion of fields, masking of field values, filtering, schema transformation, grouping, sorting, limiting, and joining of data using $lookup and $graphLookup to another collection.

You can learn more about MongoDB read-only views from the documentation.
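A minimal sketch of such a view in the mongo shell (the view, source collection, and fields are illustrative):

    // Expose only non-sensitive contact fields from the customers
    // collection; consumers query the view like a normal collection.
    db.createView("customerContacts", "customers", [
      { $project: { _id: 0, name: 1, email: 1 } }
    ])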
Documents can also be redacted dynamically using the $redact aggregation stage. The application must pass the redaction logic to the database on each request. It therefore relies on trusted middleware running in the application to ensure the redaction pipeline stage is appended to any query that requires the redaction logic.

Encryption

MongoDB Atlas provides encryption of data in flight over the network and at rest on disk.

Support for SSL/TLS allows clients to connect to MongoDB over an encrypted channel. Clients are defined as any entity capable of connecting to MongoDB Atlas, including:

• Users and administrators
• Applications

Atlas provides encryption of data at rest with encrypted storage volumes.

Optionally, Atlas users can configure an additional layer of encryption on their data at rest using the MongoDB Encrypted Storage Engine and their Atlas-compatible key management service. Currently, Atlas integrates with AWS Key Management Service and Azure Key Vault.

Granular Database Auditing

Atlas allows you to answer detailed questions about system activity by tracking DDL, DML, and DCL commands. You can easily select the actions you want audited, as well as the MongoDB users, Atlas roles, or LDAP groups you want to audit, from the Atlas UI. Alternatively, create an audit filter as a JSON string. The auditing configuration applies to all dedicated clusters within an Atlas project, and audit logs can be downloaded in the UI or retrieved using the MongoDB Atlas API.

Business Intelligence

Business Intelligence (BI) and analytics provides an essential set of technologies and processes that organizations have relied upon over many years to inform operational insight and guide strategic business decisions. MongoDB's flexible data model and dynamic schema allow you to store data in rich, multi-dimensional documents to quickly build and evolve your apps. However, your business intelligence platforms expect fixed schemas and tabular data. To support these platforms, organizations often rely on developers to write and maintain complex extract, transform, and load (ETL) jobs. ETLs move data between disparate systems and manipulate its structure, but as new data is introduced, ETLs must be updated for compatibility. If updates do not happen in a timely fashion, teams are not able to access their data to derive value, and in the worst case, these jobs can break, causing downtime.

The MongoDB BI Connector for Atlas lets you use your MongoDB Atlas cluster as a data source for SQL-based BI and analytics platforms. Using the BI Connector for Atlas is a recommended best practice when:
• You don't have the time or bandwidth to worry about provisioning, patching, and failover of BI middleware

The BI Connector for Atlas allows you to connect to your BI platform of choice, such as Tableau, Qlik, Spotfire, Cognos, SAP Business Objects, MicroStrategy, etc. You can then use these tools to seamlessly create visualizations and dashboards that will help you extract the insights and hidden value in your multi-structured data.

You can learn more about the MongoDB BI Connector for Atlas from our documentation.

Considerations for Benchmarks

• Prime the system for several minutes: Database performance tests typically run against RAM. MongoDB must first fetch the working set into RAM, so prime the system with representative queries for several minutes before running the tests to get an accurate sense for how MongoDB will perform in production.

• Monitor everything to locate your bottlenecks: It is important to understand the bottleneck for a benchmark. Depending on many factors, any component of the overall system could be the limiting factor. A variety of popular tools can be used with MongoDB – many are listed in the manual.
MongoDB Stitch Mobile Sync keeps data synchronized between MongoDB Mobile and the backend database. MongoDB Mobile allows mobile developers to use the full power of MongoDB locally. Stitch Mobile Sync ensures that data is kept up to date across phones and all other clients in real time.

We Can Help

We are the MongoDB experts. Over 8,300 organizations rely on our commercial products. We offer software and services to make your life easier:

MongoDB Enterprise Advanced is the best way to run MongoDB in your data center. It's a finely-tuned package of advanced software, support, certifications, and other services designed for the way you do business.

MongoDB Atlas is a database as a service for MongoDB, letting you focus on apps instead of ops. With MongoDB Atlas, you only pay for what you use with a convenient hourly billing model. With the click of a button, you can scale up and down when you need to, with no downtime, full security, and high performance.

MongoDB Training helps you become a MongoDB expert, from design to operating mission-critical systems at scale. Whether you're a developer, DBA, or architect, we can make you better at MongoDB.

Resources

For more information, please visit mongodb.com or contact us at sales@mongodb.com.

Case Studies (mongodb.com/customers)
Presentations (mongodb.com/presentations)
Free Online Training (university.mongodb.com)
Webinars and Events (mongodb.com/events)
Documentation (docs.mongodb.com)
MongoDB Enterprise Download (mongodb.com/download)
MongoDB Atlas database as a service for MongoDB (mongodb.com/cloud)
MongoDB Stitch Serverless Platform (mongodb.com/cloud/stitch)
MongoDB Mobile (mongodb.com/products/mobile)