NoSQL
Prepared by S SRIDEVI
NoSQL: why, what and when?
• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively mature
• Many NoSQL stores do not offer traditional database guarantees, such as consistency
when multiple transactions are performed simultaneously.
• As the volume of data increases, it becomes difficult to maintain unique keys
• Doesn't work as well with relational data
• The learning curve is steep for new developers
• Many options are open source, which makes them less popular with enterprises.
What is the CAP Theorem?
• The CAP theorem is also called Brewer's theorem. It states that it is impossible for a distributed data store to offer
more than two out of the following three guarantees:
1. Consistency
2. Availability
3. Partition Tolerance
• Consistency:
• The data should remain consistent even after the execution of an operation. This means that once data is written,
any future read request should return that data. For example, after updating the order status, all the clients
should be able to see the same data.
• Availability:
• The database should always be available and responsive. It should not have any downtime.
• Partition Tolerance:
• Partition Tolerance means that the system should continue to function even if the communication among the
servers is not stable. For example, the servers can be partitioned into multiple groups which may not
communicate with each other. Here, if part of the database is unavailable, other parts are always unaffected.
• Eventual Consistency
• The term "eventual consistency" applies to systems that keep copies of data on multiple
machines to get high availability and scalability. Thus, changes made to any data
item on one machine have to be propagated to the other replicas.
• Data replication may not be instantaneous: some copies will be updated
immediately while others are updated in due course of time. These copies may be mutually
inconsistent for a while, but in due course of time they become consistent. Hence the name eventual
consistency.
• BASE: Basically Available, Soft state, Eventual consistency
• Basically Available means the database stays available, in the sense of availability in the CAP theorem
• Soft state means the system state may change even without an input
• Eventual consistency means that the system will become consistent over time
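The replication behaviour described above can be illustrated with a small simulation. This is only a toy sketch, not how any particular database implements replication: the class names `Replica` and `EventuallyConsistentStore`, the single "first replica takes the write" rule, and the explicit `propagate()` call are all assumptions made for illustration.

```python
from collections import deque


class Replica:
    """One copy of the data, held on a single (hypothetical) machine."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and is queued for the others."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = deque()  # updates not yet applied: (replica, key, value)

    def write(self, key, value):
        # Apply the write to the first replica immediately.
        primary, *others = self.replicas
        primary.data[key] = value
        # Queue the same change for every other replica.
        for replica in others:
            self.pending.append((replica, key, value))

    def read(self, replica_index, key):
        # A read is served by a single replica and may return stale data.
        return self.replicas[replica_index].data.get(key)

    def propagate(self, steps=None):
        """Deliver queued updates; with steps=None, deliver all of them."""
        while self.pending and (steps is None or steps > 0):
            replica, key, value = self.pending.popleft()
            replica.data[key] = value
            if steps is not None:
                steps -= 1

    def is_consistent(self):
        first = self.replicas[0].data
        return all(r.data == first for r in self.replicas[1:])


store = EventuallyConsistentStore(["a", "b", "c"])
store.write("order:1", "shipped")
stale = store.read(1, "order:1")            # replica "b" has not seen the write yet
consistent_before = store.is_consistent()   # replicas disagree at this point
store.propagate()                           # replication catches up
fresh = store.read(1, "order:1")
```

Between `write` and `propagate` the replicas disagree (the soft state), and once all queued updates are delivered every replica holds the same data.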
Types of NoSQL Databases:
• Amazon DynamoDB: Probably the most widely used key-value store database. In
fact, it was Amazon's research into Dynamo, the predecessor of DynamoDB, that
helped make NoSQL popular.
• Aerospike: Open-source database that is optimized for in-memory storage.
• Berkeley DB: An open-source, high-performance embedded database
storage library, although it’s relatively basic.
• Memcached: Helps speed up websites by storing cache data in RAM, plus it’s free
and open-source.
• Riak: Made for developing apps, it works well with other databases and apps.
• Redis: A multi-purpose database that also acts as memory cache and message
broker.
• Reference: https://www.predictiveanalyticstoday.com/top-sql-key-value-store-databases/
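What the key-value stores listed above have in common is a very small interface: put a value under a key, get it back, delete it. A minimal sketch of that idea as an in-memory cache with optional expiry follows. The class `KeyValueCache`, its method names, and the injectable `clock` parameter are hypothetical and do not correspond to the API of any of the products above.

```python
import time


class KeyValueCache:
    """Toy in-memory key-value store with optional expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._items = {}      # key -> (value, expires_at or None)

    def put(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._items[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # drop the expired entry lazily
            return default
        return value

    def delete(self, key):
        # True if the key was present, False otherwise.
        return self._items.pop(key, None) is not None


# Usage with a fake clock so the example is deterministic.
now = [0.0]
cache = KeyValueCache(clock=lambda: now[0])
cache.put("session:42", {"user": "alice"}, ttl_seconds=30)
hit = cache.get("session:42")
now[0] = 31.0                      # 31 seconds later the entry has expired
miss = cache.get("session:42")
```

The store knows nothing about the structure of the value; all lookups go through the key, which is why this model is fast but offers limited query capabilities.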
Column store NoSQL database
• Column stores are excellent at compression and therefore are efficient in terms of storage.
This means you can reduce disk resources while holding massive amounts of information in
a single column
• Since the values of a column are stored together, aggregation queries are quite fast,
which is important for projects that run a large number of queries in a small amount of
time.
• Scalability is excellent with column-store databases. They can be spread across
large clusters of machines, even numbering in the
thousands. That also means that they are well suited to massively parallel processing (MPP)
• Load times are similarly good: very large tables can be bulk loaded quickly,
so data can be queried soon after it is loaded.
• Large amounts of flexibility as columns do not necessarily have to look like each other. That
means you can add new and different columns without disrupting the whole database. That
being said, entering completely new record queries requires a change to all tables.
• Overall, column-store databases are great for analytics and reporting: fast querying speeds
and the ability to hold large amounts of data without adding a lot of overhead make them ideal.
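The points above about aggregation and compression follow from how the data is laid out. The sketch below stores the same records row-wise and column-wise using plain Python lists. The sample records, the helper names `to_columns` and `run_length_encode`, and the use of run-length encoding as the compression scheme are illustrative assumptions, not a description of a specific product.

```python
# The same three records, stored two ways (sample data invented for illustration).
rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 200.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column.

    Assumes every record has the same keys.
    """
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited to sum a single field.
total_from_rows = sum(record["amount"] for record in rows)
# Column layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])


def run_length_encode(values):
    """Compress runs of repeated values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


# Values within one column share a type and often repeat, so they compress well.
encoded_region = run_length_encode(sorted(columns["region"]))
```

Both totals are identical; the difference is how much data has to be read to compute them, which is what makes column stores attractive for analytics.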
Disadvantages of Column Databases
• Designing an indexing schema that’s effective is difficult and time
consuming. Even then, the said schema would still not be as effective
as simple relational database schemas.
• While this may not be an issue for some users, incremental data
loading is suboptimal and should be avoided if possible.
• This goes for all NoSQL database types and not just columnar ones:
security vulnerabilities in web applications are ever present, and the
fact that many NoSQL databases have few built-in security features doesn’t help.
If security is your number one priority, you should either consider a
relational database or use a well-defined
schema if possible.
• Online Transaction Processing (OLTP) applications are also not
compatible with columnar databases due to the way data is stored.
Column Databases
• Use cases
• Developers mainly use column databases in:
• Content management systems
• Blogging platforms
• Systems that maintain counters
• Services that have expiring usage
• Systems that require heavy write requests (like log
aggregators)
Examples of Column Database
• Examples of columnar databases are Bigtable, Cassandra, HBase, Vertica, Druid, Accumulo, and
Hypertable
Column Database
Graph Based Data Model
Graph Databases
• Data Model:
• Nodes and Relationships
• Examples:
• Neo4j, OrientDB, InfiniteGraph, AllegroGraph
Graph Databases: Pros and Cons
• Pros:
• Powerful data model, as general as RDBMS
• Connected data locally indexed
• Easy to query
• Cons
• Sharding is difficult (lots of people are working on this)
• Scales up reasonably well, but scaling out depends on sharding
• Requires rewiring your brain
What are graphs good for?
• Recommendations
• Business intelligence
• Social computing
• Geospatial
• Systems management
• Web of things
• Genealogy
• Time series data
• Product catalogue
• Web analytics
• Scientific computing (especially bioinformatics)
• Indexing your slow RDBMS
• And much more!
What is a Graph?
• An abstract representation of a set of objects where some pairs are
connected by links.
• Pseudo Graph
• Multi Graph
• Hyper Graph
More Kinds of Graphs
• Weighted Graph
• Labeled Graph
• Property Graph
What is a Graph Database?
• A database with an explicit graph structure
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of a local step (or hop)
remains the same
• Plus an Index for lookups
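The properties listed above can be sketched with nodes that hold direct references to their neighbours, plus a dictionary that serves as the lookup index. The names `Node`, `Graph`, and `reachable_within` are hypothetical; this is a minimal in-memory illustration, not how Neo4j or any other graph database stores data on disk.

```python
class Node:
    """A node that holds direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # reachable without any global lookup


class Graph:
    def __init__(self):
        self.index = {}  # name -> Node, used only to find a starting point

    def add_node(self, name):
        if name not in self.index:
            self.index[name] = Node(name)
        return self.index[name]

    def connect(self, source, target):
        self.add_node(source).neighbours.append(self.add_node(target))

    def reachable_within(self, start, hops):
        """Names of nodes reachable from `start` in at most `hops` steps."""
        frontier = [self.index[start]]  # one index lookup to enter the graph
        seen = {start}
        for _ in range(hops):
            next_frontier = []
            for node in frontier:
                for neighbour in node.neighbours:  # local hop: follow a reference
                    if neighbour.name not in seen:
                        seen.add(neighbour.name)
                        next_frontier.append(neighbour)
            frontier = next_frontier
        return seen - {start}


graph = Graph()
graph.connect("alice", "bob")
graph.connect("bob", "carol")
graph.connect("carol", "dave")
friends_of_friends = graph.reachable_within("alice", 2)
```

The index is consulted once, to find the starting node. Every later step follows a reference stored on the node itself, so the cost of a hop depends on that node's neighbours and not on the total number of nodes in the graph.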
Relational Databases
Graph Databases
Neo4j
• Neo4j is a leading open-source
graph database, developed in Java. It
is highly scalable and schema-free (NoSQL).
Neo4j Tips
• Each entity table is represented by a label on nodes
• Each row in an entity table is a node
• Columns on those tables become node properties.
• Join tables are transformed into relationships, columns on those
tables become relationship properties
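The mapping rules above can be tried out on a small example. The sketch below converts two entity tables and one join table into nodes and relationships held in plain Python structures. The table contents, the column names, and the helper functions `table_to_nodes` and `join_to_relationships` are invented for illustration; a real migration would write to Neo4j through its own tooling.

```python
# Hypothetical relational data: two entity tables and one join table.
persons = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
movies = [{"id": 10, "title": "Example Movie"}]
acted_in = [{"person_id": 1, "movie_id": 10, "role": "lead"}]


def table_to_nodes(label, rows):
    """Each row becomes a node; the table name becomes the node's label."""
    return {
        (label, row["id"]): {"label": label, "properties": dict(row)}
        for row in rows
    }


def join_to_relationships(rel_type, rows, start, end):
    """Each join-table row becomes a relationship.

    Foreign-key columns pick the two nodes; remaining columns become
    relationship properties.
    """
    start_label, start_col = start
    end_label, end_col = end
    relationships = []
    for row in rows:
        properties = {k: v for k, v in row.items() if k not in (start_col, end_col)}
        relationships.append({
            "type": rel_type,
            "start": (start_label, row[start_col]),
            "end": (end_label, row[end_col]),
            "properties": properties,
        })
    return relationships


nodes = {**table_to_nodes("Person", persons), **table_to_nodes("Movie", movies)}
relationships = join_to_relationships(
    "ACTED_IN", acted_in, start=("Person", "person_id"), end=("Movie", "movie_id")
)
```

After the conversion the join table no longer exists as a separate structure: its single row is now one `ACTED_IN` relationship carrying the `role` property.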
Node in Neo4j
Relationships in Neo4j
• Relationships between nodes are a key part of Neo4j.
Twitter and relationships
Properties
• Both nodes and relationships can have properties.
• Properties are key-value pairs where the key is a string.
• Property values can be either a primitive or an
array of one primitive type. For example, String, int and int[] values
are valid for properties.
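The property rules above can be expressed as a small check. This sketch uses Python types as stand-ins, which is an assumption: `bool`, `int`, `float`, and `str` play the role of primitives and a `list` plays the role of an array. The function `is_valid_property` is hypothetical and is not part of any Neo4j driver.

```python
# Python stand-ins for primitive property types (an assumption for this sketch).
PRIMITIVES = (bool, int, float, str)


def is_valid_property(key, value):
    """Check a key-value pair against the rules described above."""
    if not isinstance(key, str):
        return False  # keys must be strings
    if isinstance(value, PRIMITIVES):
        return True   # a single primitive value
    if isinstance(value, list):
        # An array must hold primitives that all share one exact type.
        element_types = {type(item) for item in value}
        return len(element_types) <= 1 and all(
            isinstance(item, PRIMITIVES) for item in value
        )
    return False      # nested maps, objects, and so on are rejected


checks = {
    "string": is_valid_property("name", "Neo"),
    "int": is_valid_property("age", 30),
    "int_array": is_valid_property("scores", [1, 2, 3]),
    "mixed_array": is_valid_property("mixed", [1, "a"]),
    "non_string_key": is_valid_property(1, "x"),
    "nested": is_valid_property("address", {"city": "Zion"}),
}
```

The first three entries in `checks` pass and the last three fail, mirroring the String, int and int[] examples given in the bullet above.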
Paths in Neo4j
• A path is one or more nodes with connecting
relationships, typically retrieved as a query or traversal
result.
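A path as defined above can be represented as a list that alternates nodes and relationships. The following sketch finds one with a breadth-first traversal. The sample relationships and the function `shortest_path` are hypothetical, and this is plain Python rather than Neo4j's traversal API or Cypher.

```python
from collections import deque

# Hypothetical relationships: (start node, relationship type, end node).
relationships = [
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "KNOWS", "Carol"),
    ("Alice", "KNOWS", "Dave"),
]


def shortest_path(start, goal, relationships):
    """Breadth-first search returning nodes and relationship types alternately."""
    outgoing = {}
    for source, rel_type, target in relationships:
        outgoing.setdefault(source, []).append((rel_type, target))

    queue = deque([[start]])  # each queue entry is a partial path
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for rel_type, target in outgoing.get(path[-1], []):
            if target not in visited:
                visited.add(target)
                queue.append(path + [rel_type, target])
    return None  # no path exists in the direction of the relationships


path = shortest_path("Alice", "Carol", relationships)
single_node_path = shortest_path("Alice", "Alice", relationships)
no_path = shortest_path("Carol", "Alice", relationships)
```

A path with a single node and no relationships is still a path, which matches the "one or more nodes" wording above.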
Starting and Stopping
Print the data
Remove the data
The Matrix Graph Database
Summary
• NoSQL is a non-relational DMS that does not require a fixed schema, avoids joins, and is easy to scale
• The concept of NoSQL databases became popular with Internet giants like Google, Facebook,
Amazon, etc. who deal with huge volumes of data
• In 1998, Carlo Strozzi used the term NoSQL for his lightweight, open-source relational
database
• NoSQL databases do not follow the relational model; they are either schema-free or have relaxed schemas
• The four types of NoSQL database are 1) Key-value pair based 2) Column-oriented 3) Graph based
4) Document-oriented
• NoSQL can handle structured, semi-structured, and unstructured data with equal effect
• The CAP theorem covers three guarantees: Consistency, Availability, and Partition Tolerance
• BASE stands for Basically Available, Soft state, Eventual consistency
• The term "eventual consistency" refers to keeping copies of data on multiple machines to get high
availability and scalability, with the copies becoming consistent over time
• NoSQL offers limited query capabilities