Futureinternet 15 00010
Article
NewSQL Databases Assessment: CockroachDB, MariaDB
Xpand, and VoltDB
Eduardo Pina 1 , Filipe Sá 1 and Jorge Bernardino 1,2, *
Abstract: Background: Relational databases have been a prevalent technology for decades, using
SQL (Structured Query Language) to manage data. However, the emergence of new technologies,
such as the web and the cloud, has brought the requirement to handle more complex data. NewSQL
is the latest technology that incorporates the ability to scale and ensures the availability of NoSQL
(Not Only SQL) without losing the ACID properties (Atomicity, Consistency, Isolation, Durability)
associated with relational databases. Methods: We evaluated CockroachDB, MariaDB Xpand, and
VoltDB with OSSpal methodology and experimentally using the Star Schema Benchmark (SSB). The
scalability and performance capabilities of each database were assessed. Results: Applying the
OSSpal methodology, the results showed that MariaDB Xpand outperformed CockroachDB and
VoltDB. On the other hand, we concluded that with Star Schema Benchmark, CockroachDB had better
scalability, while VoltDB had a faster query execution time. Conclusions: CockroachDB and VoltDB
are the best performing databases in terms of scalability and performance.
Citation: Pina, E.; Sá, F.; Bernardino, J. NewSQL Databases Assessment: CockroachDB, MariaDB
Xpand, and VoltDB. Future Internet 2023, 15, 10. https://doi.org/10.3390/fi15010010
Academic Editor: Wolf-Tilo Balke
Received: 11 November 2022
Revised: 19 December 2022
Accepted: 20 December 2022
Published: 26 December 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Today’s business world is accelerating, where advances in web technology and the
creation of Internet-connected sensors and mobile devices have resulted in huge datasets
that need to be processed and stored like never before. Relational Database Management
Systems, also known as RDBMS, are one of the most successful technologies in computing,
being the default choice for model adoption in business worldwide and its SQL standard
language [1–3]. A DBMS consists of a collection of programs and services that allow the
user to insert, update, delete, and query the stored data, making it easier to maintain and
access the data [4]. With the huge volume and complex evolution of data, businesses have
started to look for alternatives that allow them to work with high volume, velocity, and
variety. SQL’s increased capacity allows the management of huge amounts of data but is
not sufficient for the requirements of Big Data, which demands rapid response and high
scalability. With such challenges, Not Only SQL (NoSQL) offers resources such as flexibility
and horizontal scalability [5], but despite supporting the capability of highly available data
volume technologies, NoSQL systems do not ensure ACID properties [6].
NewSQL emerged as a set of innovative SQL database engines with high performance
and scalability. These engines seek to promote the same performance and scalability
improvement of NoSQL systems and design solutions that have the advantages of the
relational model, and the benefit of using SQL language and fulfilling the ACID properties [7].
In 2011, Matthew Aslett first used the term NewSQL in a business analysis report [8],
which discussed the advent of new database systems. NewSQL systems come in different
types, targeting different workloads and practices, opening up new opportunities
in business, where real-time decisions are critical. Among several use cases, we have
streaming data, either from cameras or sensors, organizing a team and assigning roles and
responsibilities, risk monitoring forecast, financial markets, Internet of Things, etc. There is
no doubt that without NewSQL, these systems could only be developed using multiple
systems at too high a price, and some of these applications would not perform as fast [8].
Moreover, NewSQL databases are designed to meet distributed architectures and scalability
requirements, improving performance in such a way that horizontal scalability is a reality
and incorporating new storage mechanisms [9].
This paper assesses three of the most popular NewSQL databases according to the DB-Engines
Ranking 2022 [10]: CockroachDB [11], MariaDB Xpand [12], and VoltDB [13]. The
main purpose of this work is to determine which one offers the best functionality according
to the OSSpal methodology and, experimentally, which one has the best scalability and
performance under the Star Schema Benchmark (SSB).
The open source software assessment methodology, OSSpal, came out as a successor
of the Business Readiness Rating (OpenBRR), allowing users to make an evaluation with
decisive questions divided into categories [14]. OSSpal methodology combines quantitative
and qualitative software evaluation measures, which results in a numerical value that
permits the comparison between applications. Some categories, which accommodate
common characteristics, will be used to evaluate the open source and commercial solutions.
OSSpal methodology has been used by the scientific community in several scientific papers
and dissertations [15–19]. However, to the best of our knowledge, this is one of the first
studies to apply the OSSpal methodology to NewSQL databases.
Star Schema Benchmark (SSB) is a benchmark designed as an alternative to TPC-H
Benchmark. It is based on the TPC-H benchmark model with improvements that implement
a traditional star schema. Its goal is to measure the performance of database products and
to test a star schema strategy [20].
In the experiments using the OSSpal methodology, we conclude that MariaDB Xpand has a
higher score than CockroachDB and VoltDB. Each NewSQL database was evaluated using a
collection of functionalities retrieved from the DB-Engines Ranking 2022, and other common
characteristics that allowed us to determine which database is the best. In the experimental
evaluation, using the Star Schema Benchmark, CockroachDB presented improved results
in scalability, while VoltDB had better results in query execution time.
The main contributions of this work are the following:
• Revealing the strengths and weaknesses of NewSQL databases;
• Best database according to the evaluation of the OSSpal methodology;
• Experimental evaluation of NewSQL databases using a standard benchmark;
• Best NewSQL database regarding performance and scalability;
• Limitations in the practical use of NewSQL Databases.
The rest of this paper is structured as follows. Section 2 presents related work. Section 3
describes the use of the OSSpal methodology and Star Schema Benchmark (SSB). Section 4
analyzes the three popular NewSQL databases, their advantages, and their limitations.
Section 5 presents the evaluation based on OSSpal methodology and Star Schema Bench-
mark. Section 6 discusses the results of the experiments. Finally, Section 7 presents the
conclusions and future work.
2. Related Work
There are not many works related to the evaluation of NewSQL databases and their
performance. This section presents some of the related works that evaluate NewSQL
database performance.
In Reference [21], the authors compared three NewSQL databases, VoltDB, MemSQL,
and NuoDB, using the YCSB benchmark. Comparing the results, MemSQL had better
results in most workloads, followed by VoltDB. NuoDB had some problems with data
ingestion, having worse results.
Another group of authors in Reference [22] performed experiments with four NewSQL
databases, VoltDB, NuoDB, CockroachDB, and MemSQL. They applied a set of operations
to evaluate read, write, update latency, and query execution time among the databases.
NuoDB was the database with the best performance, followed by MemSQL and VoltDB, with
CockroachDB showing the highest latencies among the four databases.
In Reference [23], the authors performed a benchmarking test against MySQL and
Spanner, a NewSQL database. The authors used datasets provided by the Autonomous
City of Buenos Aires. The results proved that when measuring its behavior by increasing
the number of records and query complexity, Spanner performed better than MySQL.
In Reference [24], the authors performed an evaluation of two NewSQL databases,
MemSQL and VoltDB, using TPC-H benchmark queries with 1 GB of data. During the
evaluation, MemSQL performed better than VoltDB.
In Reference [25], the authors compared CockroachDB, MemSQL, NuoDB, and VoltDB
with two benchmarks, YCSB and Voter. Comparing the results with both benchmarks,
MemSQL performed better, having a higher throughput and lower latency. VoltDB and
NuoDB performed similarly, despite NuoDB restrictions with the open source version.
CockroachDB presented the worst results, with a lower number of transactions per second.
Another group of authors in Reference [26] performed a benchmarking test with
weather data collected from Romanian cities between 2008 and 2020 using several NewSQL
databases. The assessment was set with minimum resources, focusing on the read, write,
and update latencies, as well as query execution. CockroachDB and FaircomCO presented
higher values with write latency. Regarding query execution time, Citus, TIBCO, and
VoltDB presented good results, with HarperDB being the database with the best results.
Moreover, with update latency, CockroachDB presented inferior outcomes, while other
databases had some difficulties with the number of records. Lastly, the read latency test
showed that, once more, CockroachDB had the worst outcomes, while Citus and VoltDB
presented lower read latencies.
The work conducted in this paper differs from that of the other authors presented in
this section because we use a reliable benchmark (SSB) and a methodology (OSSpal) to
choose the best NewSQL database. Furthermore, we use the latest available versions of the
NewSQL databases.
• Query Flight 2: The queries Q2.1, Q2.2, and Q2.3 restrict the data in two dimensions,
by comparing revenues from all orders in all years for suppliers in a given region and
for a given product class.
• Query Flight 3: The queries Q3.1, Q3.2, Q3.3, and Q3.4 focus on restrictions in three
dimensions and calculate revenue volume over a given time period by customer country,
supplier country, and year within a given region.
• Query Flight 4: The queries Q4.1, Q4.2, and Q4.3 represent a ‘What-If’ sequence. It
begins with query Q4.1, which uses a group by on two dimensions and weak constraints on
three dimensions, and measures the aggregate profit, which is defined as
(lo_revenue - lo_supplycost).
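To make the profit measure of Query Flight 4 concrete, the following is a minimal sketch in the spirit of Q4.1, run against an in-memory SQLite database. It is not the benchmark query itself: only the customer and date dimensions are joined, the date dimension is named dates here, and all rows are invented for illustration. Only the column names lo_revenue and lo_supplycost come from the text above; the remaining names follow common SSB conventions and should be read as assumptions.

```python
import sqlite3

# Toy star schema: one fact table (lineorder) and two dimension tables.
# All rows are invented for illustration; this is not SSB-generated data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates    (d_datekey INTEGER PRIMARY KEY, d_year INTEGER);
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_nation TEXT, c_region TEXT);
CREATE TABLE lineorder (
    lo_orderdate  INTEGER,
    lo_custkey    INTEGER,
    lo_revenue    INTEGER,
    lo_supplycost INTEGER
);
INSERT INTO dates VALUES (19970101, 1997), (19980101, 1998);
INSERT INTO customer VALUES
    (1, 'BRAZIL', 'AMERICA'),
    (2, 'CANADA', 'AMERICA'),
    (3, 'FRANCE', 'EUROPE');
INSERT INTO lineorder VALUES
    (19970101, 1, 1000, 600),
    (19970101, 1,  500, 200),
    (19970101, 2,  800, 500),
    (19980101, 2,  900, 400),
    (19980101, 3,  700, 100);
""")


def profit_by_year_and_nation(connection, region):
    """Aggregate profit (lo_revenue - lo_supplycost) grouped on two dimensions."""
    query = """
        SELECT d_year, c_nation, SUM(lo_revenue - lo_supplycost) AS profit
        FROM lineorder
        JOIN dates    ON lo_orderdate = d_datekey
        JOIN customer ON lo_custkey   = c_custkey
        WHERE c_region = ?
        GROUP BY d_year, c_nation
        ORDER BY d_year, c_nation
    """
    return connection.execute(query, (region,)).fetchall()


rows = profit_by_year_and_nation(conn, "AMERICA")
for row in rows:
    print(row)
```

The group by on year and nation mirrors the two grouping dimensions mentioned for Q4.1, while the region filter stands in for the weak constraints on the dimension tables.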
4.1. CockroachDB
CockroachDB is a scalable database management system that was built in 2014 to
support demanding OLTP workloads while maintaining simultaneously strong consistency
and high availability [28]. CockroachDB, as its name implies and according to its creators,
is disaster-resistant through automatic replication and recovery mechanisms. CockroachDB
is a distributed relational database that scales without much effort, which is consistent
with ACID transactions and provides a traditional SQL API for structuring, manipulating,
and querying data [29]. Moreover, this database was built on a consistent key–value
store, with crash recovery capabilities, and offers horizontal scalability, present in many
NoSQL applications.
CockroachDB’s architecture is designed in layers to make it easier to manage. Figure 1
shows a diagram of its architecture [30].
Figure 1. CockroachDB architecture diagram.
CockroachDB is divided into layers, and each one has the following functionalities:
• SQL Layer: The highest level of abstraction for developers. This layer adds support
for a wide range of SQL expressions and syntax from PostgreSQL libraries, with
some modifications;
• Distributed Key–Value Store: This layer was implemented as a monolithic sorted map,
allowing to run multiple computers in parallel, being able to work with larger datasets;
• Distributed Transactions: It is not necessarily considered a part of the layered
architecture, but a necessary component of the system. Implementing distributed transactions
allows to connect the layers of the architecture: from SQL to stores and ranges on
each node;
• Nodes: They are mostly considered as physical machines, virtual machines, or
containers that include stores. The distributed key–value (KV) store routes messages
to nodes;
• Store: Each node in the database can contain one or more stores, and in turn, each
store can contain many ranges;
• Range: Ranges are the lowest-level unit of key–value data, and every store contains
ranges. Each range holds a contiguous, sorted partition of the keys within the store and
is replicated using the Raft consensus algorithm. Raft is a consensus algorithm designed
as a more understandable alternative to Paxos, a family of protocols that solves
consensus in a network of unreliable processors.
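The relationship between the sorted key–value map, ranges, and nodes described in the list above can be illustrated with a toy lookup. This is only a sketch of the general idea of range partitioning over a sorted keyspace, not CockroachDB's implementation; the split keys, node names, and replica placement are all hypothetical.

```python
import bisect

# Hypothetical split points dividing a sorted keyspace into four ranges:
#   range 0: keys < "g"      range 1: "g" <= keys < "n"
#   range 2: "n" <= keys < "t"   range 3: keys >= "t"
SPLIT_KEYS = ["g", "n", "t"]

# Hypothetical placement of the replicas of each range on nodes (three each).
RANGE_REPLICAS = {
    0: ["node1", "node2", "node3"],
    1: ["node2", "node3", "node4"],
    2: ["node1", "node3", "node4"],
    3: ["node1", "node2", "node4"],
}


def range_for_key(key):
    """Return the index of the range whose key span contains `key`."""
    # bisect_right counts how many split keys are <= key, which is
    # exactly the index of the range that owns the key.
    return bisect.bisect_right(SPLIT_KEYS, key)


def nodes_for_key(key):
    """Return the nodes that hold a replica of the range containing `key`."""
    return RANGE_REPLICAS[range_for_key(key)]


for k in ["apple", "g", "melon", "tiger"]:
    print(k, range_for_key(k), nodes_for_key(k))
```

Because the keyspace is sorted, locating a key only needs a binary search over the split points, which is what makes a single logical sorted map practical to spread across many machines.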
Figure 2 shows an example of three nodes running in a cluster, having access to the
CockroachDB web user interface (UI).
The main advantages of CockroachDB are [31]:
• Cockroach offers customer support with migrations from other databases, as well as
helping users with difficulties in using this database;
• CockroachDB can be used anywhere. It can be deployed in virtual machines, containers,
Amazon web services, and many other applications;
• CockroachDB maintains data integrity and is able to survive crashes, due to the use of
ACID properties;
• CockroachDB offers high performance and availability;
• CockroachDB scales horizontally and offers cloud support;
• CockroachDB has extensive documentation and tutorials guides;
• Supports PostgreSQL libraries to make use of SQL commands.
The main limitations are the following [32]:
• CockroachDB does not support database transactions with high complexity, since
this database’s purpose is speed;
• To make full use of CockroachDB, we need to pay for the enterprise version, whereas
the core version can be used for most purposes with several functionality restrictions;
• Multi-region tables cannot be restored into tables that are not multi-region tables;
• SQL statements comprising numerous subqueries modifying the same table can
cause corruption;
• CockroachDB does not support the use of RESTORE with multi-region table localities;
• The SET command, which allows modifying one of the session configuration variables,
does not ROLLBACK in a transaction;
• JSONB/JSON comparison operators are not implemented.
Overall, CockroachDB is a database that has several functionalities, which allows
working with a large data volume and scale without much effort.
Figure 3. Xpand Performance Topology.
The MaxScale nodes monitor the health and availability of each node and accept clients
and application connections. The Xpand nodes receive queries from MaxScale nodes, store
data in a distributed manner, and execute queries using parallel streaming. Figure 4 shows
an example of the MariaDB Xpand user interface.
The main advantages of MariaDB Xpand are the following [37]:
• It provides distributed SQL capabilities and is ACID-compliant;
• It is highly available because it maintains replicas of each slice (a logical representation
of data saved in a partition of a disk, which contains pieces of user databases and
tables), allowing it to recover from a node failure without losing data;
• It can maintain multiple replicas of each slice and is zone-aware, allowing it to recover
from multi-node failures or zone failures without losing data;
• Its rebalancer maintains data distribution, meaning that a node or zone failure causes
the creation of new replicas for each slice, and the rebalancer then redistributes the data;
• It performs operations in parallel across the nodes to have the latest data;
• It scales out because each node can read and write, plus reads are lockless. Writes do
not block reads, and additional nodes can be added to increase capacity.
However, MariaDB Xpand also has the following limitations [37]:
• Difficulty with migrations involving modern software;
• Crash recovery through data replication takes too long;
• No option to export stored procedures’ query results.
Overall, MariaDB Xpand is a SQL distributed database that allows scaling out without
much difficulty, with the ability to access data through computers around the world. On the
other hand, it has limitations in analyzing query performance and in the ability to migrate
data with modern applications.
(1) Shows the number of nodes associated with the current database. It allows nodes to be
added to or removed from the database;
(2) Configuration properties of the current database;
(3) VoltDB displays CPU, memory, latency, and transaction metrics to help the user
analyze the execution of the database.
The main advantages of VoltDB are the following [41,42]:
• In-memory storage: It uses synchronous replication for durability;
• Designed for online/operational transaction processing, OLTP;
• Optimized performance: Each execution engine is single-threaded. It removes latches,
locks, and buffer pool management in order to eliminate the overhead of legacy
architecture systems;
• VoltDB scales without much difficulty;
• Distributed database: In this case, the tables are partitioned, with one partition
per node;
• VoltDB uses Java stored procedures to eliminate client–server round-trips;
• VoltDB provides several options for backing up database data and schema;
• High availability: VoltDB contains three capabilities, namely K-safety, network fault
tolerance, and live node rejoin.
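The partitioning idea in the list above, where each table is split into partitions and each partition is handled by a single-threaded engine, can be sketched as a simple router. This is a toy illustration only: the section does not describe VoltDB's actual hashing scheme, so the CRC32 hash, the function names, and the sample rows are all assumptions.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a partitioning-column value to a partition id with a stable hash.

    CRC32 is an arbitrary stand-in chosen because it is deterministic
    across runs (unlike Python's built-in hash() for strings).
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def route_rows(rows, key_column, num_partitions):
    """Group rows by the partition that owns their partitioning-column value."""
    partitions = {p: [] for p in range(num_partitions)}
    for row in rows:
        partitions[partition_for(row[key_column], num_partitions)].append(row)
    return partitions


# Invented sample rows; "customer_id" plays the role of the partitioning column.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
parts = route_rows(rows, "customer_id", num_partitions=3)
for pid, owned in parts.items():
    print(pid, [r["customer_id"] for r in owned])
```

Since every row with the same partitioning value lands in the same partition, a procedure that touches a single key can run on one engine without locks, which is the property the single-threaded design relies on.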
The major limitations of VoltDB are the following:
• Not optimized for OLAP, since VoltDB is focused on OLTP, and is not good at solving
problems that require large database column scans;
• Not optimized to return an excessive amount of data from stored procedures;
• The database must fit into the available memory of the system;
• Not optimized to work with complex queries and huge tables;
• No support for foreign keys and check constraints;
• No Windows version available.
VoltDB is a fast in-memory database that provides high availability and fast query
response. On the other hand, VoltDB does not support the Windows operating system.
Moreover, VoltDB does not support Windows. Regarding supported languages, each database
can be accessed from a variety of programming languages, which allows us to create
applications on top of it. It is important to notice that all the latest releases are from 2022.
Overall, each solution was designed to support large dataset volumes, without losing
ACID properties.
5. Experimental Evaluation
In this section, we describe the assessment of each NewSQL Database, CockroachDB,
MariaDB Xpand, and VoltDB with OSSpal methodology, and the experimental evaluation
using the Star Schema Benchmark.
Categories Weight
Functionality 35%
Operational Software Characteristics 25%
Documentation 15%
Support and Service 15%
Software Technology Attributes 10%
The next step is to define and evaluate the different characteristics that involve
NewSQL solutions to analyze the ‘Functionality’ category. The features chosen to evalu-
ate the tools were based on [36]. At this stage, the characteristics that suit both types of
tools were included. For this, a score was assigned to each feature on the following scale:
1—slightly important, 2—important, 3—very important. Table 3 shows the weights and
marks assigned to each metric according to what we consider to be the most important
features of a NewSQL database.
Feature            Weight  CockroachDB  MariaDB Xpand  VoltDB
OLTP               3       3            3              3
CRUD               3       3            3              3
Replication        3       2            2              3
Partitioning       3       3            3              3
Consistency        3       3            3              3
Crash Recovery     3       3            3              3
Triggers (SQL)     2       0            2              0
Indexes (SQL)      2       2            2              2
PL/SQL functions   2       1            2              2
Cloud Support      1       1            1              1
Total              25      21           24             23
Final Score                3            4              4
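The totals and final scores in the table above can be checked with a short sketch. The column order (weight, CockroachDB, MariaDB Xpand, VoltDB) is inferred from the surrounding text. The mapping from the percentage of the maximum total to a 1 to 5 score is not stated in this section, so the thresholds below (65%, 80%, 90%, 96%), which follow the convention commonly given for OpenBRR and OSSpal, are an assumption.

```python
# (weight, CockroachDB, MariaDB Xpand, VoltDB) per feature, copied from the table.
FEATURES = {
    "OLTP": (3, 3, 3, 3),
    "CRUD": (3, 3, 3, 3),
    "Replication": (3, 2, 2, 3),
    "Partitioning": (3, 3, 3, 3),
    "Consistency": (3, 3, 3, 3),
    "Crash Recovery": (3, 3, 3, 3),
    "Triggers (SQL)": (2, 0, 2, 0),
    "Indexes (SQL)": (2, 2, 2, 2),
    "PL/SQL functions": (2, 1, 2, 2),
    "Cloud Support": (1, 1, 1, 1),
}


def column_totals(features):
    """Sum each column: maximum attainable total, then one total per database."""
    return tuple(sum(column) for column in zip(*features.values()))


def functionality_score(achieved, maximum):
    """Map achieved/maximum to a 1-5 score.

    ASSUMPTION: thresholds follow the usual OpenBRR/OSSpal convention;
    they are not given in this section. Integer arithmetic avoids
    floating-point trouble exactly at the 96% boundary.
    """
    if 100 * achieved > 96 * maximum:
        return 5
    if 100 * achieved >= 90 * maximum:
        return 4
    if 100 * achieved >= 80 * maximum:
        return 3
    if 100 * achieved >= 65 * maximum:
        return 2
    return 1


max_total, *db_totals = column_totals(FEATURES)
scores = [functionality_score(total, max_total) for total in db_totals]
print(max_total, db_totals, scores)
```

Under these assumed thresholds, the computed scores agree with the Final Score row of the table.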
After weights are assigned to all categories, each tool is evaluated to assess which
database earns the highest score.
The final evaluation in each category, besides ‘Functionality’, is based on the follow-
ing formula:
Category                              Weight  CockroachDB  MariaDB Xpand  VoltDB
Functionality                         35%     3            4              4
Operational Software Characteristics  25%     4            3.75           3.5
Software Technology Attributes        10%     4.35         3.75           3.75
• In ‘Documentation’, CockroachDB has the highest score by having very good docu-
mentation available for users;
• In ‘Support and Services’, every single database has a score of 5, which means that all
databases have many people who require their services for training and webinars for
new updates;
• Finally, in ‘Software Technology Attributes’, CockroachDB has the maximum score.
As a result, the database with the highest score is MariaDB Xpand with 4.16, followed
by VoltDB with 4.10, and lastly, CockroachDB with 3.99.
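As a rough illustration of how the category weights combine into a final score, the sketch below computes a weighted sum for CockroachDB. The weights, the Functionality, Operational Software Characteristics, and Software Technology Attributes scores, and the Support score of 5 are taken from the text. The Documentation score is not shown in this excerpt, so the value of 5 used here is a placeholder assumption, and the weighted-sum form itself is assumed rather than quoted from the paper.

```python
# Category weights as listed in the weights table.
WEIGHTS = {
    "Functionality": 0.35,
    "Operational Software Characteristics": 0.25,
    "Documentation": 0.15,
    "Support and Service": 0.15,
    "Software Technology Attributes": 0.10,
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted sum of category scores (assumed aggregation rule)."""
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the weighted categories")
    return sum(weights[category] * scores[category] for category in weights)


cockroachdb = {
    "Functionality": 3,
    "Operational Software Characteristics": 4,
    "Documentation": 5,  # ASSUMPTION: placeholder, row not shown in this excerpt
    "Support and Service": 5,
    "Software Technology Attributes": 4.35,
}

total = weighted_score(cockroachdb)
print(f"{total:.3f}")
```

With the placeholder Documentation value, the result is approximately 3.985, which is in line with the 3.99 reported for CockroachDB.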
Although these databases have close values, it will be useful to assess which one
has the best performance and scalability by evaluating its performance with a standard
benchmark, as in the next section.
Figure 8. Results of query execution time with SF = 1: (a) Average Query Execution Time with
SF = 1; (b) Average Total Time Results with Scale Factor 1.
On the other hand, CockroachDB has an almost linear execution time compared to the
other two databases. This demonstrates that no matter how complex the query is, the time
to process and return a result would be almost the same as the other queries.
Lastly, MariaDB Xpand takes more time to execute the queries. In the first flight
group, we were surprised that the execution time is almost the same as VoltDB. We were
expecting to see close results to VoltDB, but the remaining queries took more time to process,
especially Q2.1, Q3.1, Q3.2, Q4.2, and Q4.3, which are the queries that access more tables
and return the most data.
Figure 9 shows the results of RAM usage in the queries execution with SF = 1. VoltDB
presents a better RAM consumption, with an average total of 34 MB, while CockroachDB
and MariaDB Xpand results are higher, with 78 MB and 198 MB, respectively. VoltDB has
a higher memory consumption in queries Q4.1 and Q4.2, with 85 MB and 98 MB usage,
respectively. This last group of queries has more aggregations and data to be retrieved
from tables with large data volumes. In the remaining queries, the memory consumption
was below 100 MB.
Figure 9. Results of memory used with SF = 1: (a) Average Memory Used per Query with SF = 1;
(b) Average Total Memory Used with SF = 1.
On the other hand, CockroachDB is the second database with the highest memory
used, with 78 MB. This database displayed better control over the usage of RAM, with
small peaks in queries Q1.2, Q3.1, and Q4.1. Moreover, given the complexity of queries
executed in this experiment, CockroachDB has a controlled memory usage.
MariaDB Xpand is the database with the highest use of memory. This database has an
increase of 2.54 times the usage of memory comparing to CockroachDB and 5.83 times the
VoltDB usage of memory. MariaDB Xpand requires more RAM when queries with more
aggregated operations are performed, as we can see with groups Q2, Q3, and Q4. Group
Q1 has one aggregator operation so, the use of memory was relatively low.
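The memory ratios quoted above can be recomputed from the average totals reported for SF = 1 (34 MB, 78 MB, and 198 MB). The sketch below is only a worked check of that arithmetic; because the reported averages are rounded to whole megabytes, a recomputed ratio may differ from the published figure in the last digit.

```python
# Average total memory used with SF = 1, in MB, as reported in the text.
avg_memory_mb = {
    "VoltDB": 34,
    "CockroachDB": 78,
    "MariaDB Xpand": 198,
}


def memory_ratio(database, baseline, averages=avg_memory_mb):
    """Return how many times more memory `database` used than `baseline`."""
    return averages[database] / averages[baseline]


xpand_vs_cockroach = memory_ratio("MariaDB Xpand", "CockroachDB")
xpand_vs_voltdb = memory_ratio("MariaDB Xpand", "VoltDB")
print(f"MariaDB Xpand vs CockroachDB: {xpand_vs_cockroach:.2f}x")
print(f"MariaDB Xpand vs VoltDB: {xpand_vs_voltdb:.2f}x")
```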
Figure 10 illustrates the results of CPU used in the queries execution with a SF = 1.
MariaDB Xpand is the database with the lowest percentage of CPU usage, with 36.3%, while
VoltDB has an average of 40.3%, and CockroachDB with 41.1% of CPU usage. MariaDB
Xpand results show an increase in CPU usage from Q1.1 to Q2.2, since more tables were
accessed, and from query Q3.2 to Q4.3, we notice a decrease in CPU usage. Even though
this group of queries has a higher number of tables with aggregations, MariaDB Xpand
displays adequate results with the last group of queries.
VoltDB is the second database with the highest use of CPU, with 40.3%. By analyzing
the queries during VoltDB operation, we notice a substantial increase in queries Q2.1, Q2.2,
Q2.3, and Q3.2. These queries have an average use of CPU between 50% and 80% since
VoltDB requires more resources to process and return a result.
Lastly, CockroachDB has a higher average of CPU use, with an almost linear percentage
of CPU usage from query group Q2 to Q4. These queries have a higher processing time
than group Q1 since these queries have more aggregations.
Figure 11. Results of execution time with SF = 10: (a) Average Query Execution Time with
SF = 10; (b) Average Total Time Results with SF = 10.
CockroachDB, by increasing the scale factor 10 times, displays an average increase of
7.47 times. The results in Figure 11 show a similar linear execution time in Figure 8 using
SF = 1. Furthermore, CockroachDB manages to have a linear execution time despite the
complexity of the queries and the increase in data size.
MariaDB Xpand is the database which takes more time to execute the queries, with an
increase of 7.97 times compared to SF = 1. In this case, MariaDB Xpand does not show the
same performance in the first group of queries Q1 as VoltDB in Figure 8 with SF = 1. In
addition, MariaDB Xpand is the database that shows the highest query execution time, with
an increase of 1.52 times compared to CockroachDB and 8.1 times compared to VoltDB.
Figure 12 shows the results of RAM percentage consumed in the query execution time
with SF = 10. CockroachDB demonstrates better results with memory consumption, with
an average of only 145 MB. Throughout all queries, CockroachDB displays good memory
control through the execution of each query, with group Q4 being the highest group to
consume memory, between 170 MB and 305 MB. The remaining queries remained below
225 MB.
By contrast, VoltDB is the second highest, with a total average memory use of 326 MB.
The first query flight, Q1, displays adequate results between 5 MB and 35 MB. The
other query flights have a higher consumption, between 300 MB and 600 MB, an
approximate 9-fold increase in the consumption of RAM.
MariaDB Xpand is the database with the highest average consumption of memory,
having an average consumption of 367 MB. In query groups Q2, Q3, and Q4, the higher the
number of aggregations, the higher the consumption of memory, as this database requires
greater memory usage when more aggregations are involved.
Despite these queries, query flight Q1 displays lower RAM consumption, since
these are simpler queries that search for data only in small tables.
Figure 13 shows the results of the CPU percentage used in the query execution with
SF = 10. CockroachDB is the database with the lowest average CPU use, with 55.3%, while
MariaDB Xpand and VoltDB have an additional increase of 1.4 times compared to SF = 1,
with 80.5% and 82.6%, respectively. As mentioned before, CockroachDB has sufficient
control over its resources, despite the complexity of each query, demonstrating an almost
linear percentage of CPU use during the processing time of each query.
On the other hand, MariaDB Xpand appears in second place, with an average use
of CPU of 80.5%. During the processing time of each query, we noticed that in query
flight Q4, MariaDB Xpand exceeds the 100% scale. It is possible that one or more
processes running at the same time as the query led to an increase in resource use in the
virtual machine.
Lastly, VoltDB has the highest CPU usage. We notice a peak of CPU in
query Q3.4, which surpassed the 100% scale, meaning that during this evaluation the
benchmark used one full core of the processor. We suppose that during the execution of
this query, VoltDB had more processes running in the background, leading to an increase
in CPU usage. The remaining queries have an average CPU usage between 60% and 100%,
making VoltDB the database that used the most resources during this experiment.
Figure 14 shows the sum of query execution time for all SSB queries with SF = 1 and
SF = 10. These results demonstrate that MariaDB Xpand stands out as the database with the
highest time to execute all queries. It spent a total of 67 s with SF = 1 and 534 s with
SF = 10, which corresponds to about 8 times more. CockroachDB is the second database, with
47 s with SF = 1 and 351 s with SF = 10, about 7.5 times more. VoltDB is the fastest
database, with 32 s with SF = 1 and 66 s with SF = 10, only about 2.1 times more. Overall,
VoltDB showed satisfactory results with both SF = 1 and SF = 10. These results indicate
that the in-memory database is significantly faster than CockroachDB and MariaDB Xpand
in this benchmark.
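The growth factors quoted for Figure 14 can be rechecked with a few lines of arithmetic. The following is a minimal sketch, not part of the original experiment: it uses only the total times stated in the text, and the names TOTAL_SECONDS, growth_factor, and slowdown_vs are hypothetical helpers introduced for illustration.

```python
# Total SSB query times in seconds, as stated in the text for Figure 14.
# The dictionary layout and all names here are illustrative assumptions.
TOTAL_SECONDS = {
    "MariaDB Xpand": {"SF1": 67, "SF10": 534},
    "CockroachDB": {"SF1": 47, "SF10": 351},
    "VoltDB": {"SF1": 32, "SF10": 66},
}


def growth_factor(times):
    """Return how many times longer the SF = 10 run took than the SF = 1 run."""
    return times["SF10"] / times["SF1"]


def slowdown_vs(slower, faster, scale="SF10"):
    """Return the ratio of total times between two databases at one scale factor."""
    return TOTAL_SECONDS[slower][scale] / TOTAL_SECONDS[faster][scale]


if __name__ == "__main__":
    # Growth of total time when the dataset grows from SF = 1 to SF = 10.
    for name, times in TOTAL_SECONDS.items():
        print(f"{name}: {growth_factor(times):.2f}x from SF = 1 to SF = 10")
    # MariaDB Xpand relative to the other two databases at SF = 10.
    for other in ("CockroachDB", "VoltDB"):
        ratio = slowdown_vs("MariaDB Xpand", other)
        print(f"MariaDB Xpand vs {other} at SF = 10: {ratio:.2f}x")
```

Under these inputs, the computed factors agree with the values reported in the text once rounded (7.97 and 7.47 for the SF = 1 to SF = 10 growth of MariaDB Xpand and CockroachDB, and 1.52 and 8.1 for MariaDB Xpand relative to CockroachDB and VoltDB).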
Figure 14. Sum of queries between SF = 1 and SF = 10.
In summary, VoltDB presents the best results of query execution time for all SSB
queries. Its scalability should also be noted: when the dataset size increases by 10 times,
the query execution time has only an increment of 2.1 times.
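One way to express this observation as a single number is to assume a power-law relation between data size n and total time t, t ∝ n^α, where α = 1 would mean linear scaling. This model is an assumption made here for illustration and is not used in the original evaluation; scaling_exponent is a hypothetical helper, and only the totals stated in the text (32 s and 66 s for VoltDB) are used as input.

```python
import math


def scaling_exponent(t_small, t_large, size_ratio=10.0):
    """Exponent alpha in t ~ n**alpha implied by two timing measurements.

    Assumes a power-law model (an illustrative assumption, not the paper's method).
    alpha = 1 corresponds to linear scaling; alpha < 1 is sub-linear.
    """
    return math.log(t_large / t_small) / math.log(size_ratio)


# VoltDB totals from the text: 32 s with SF = 1 and 66 s with SF = 10.
voltdb_alpha = scaling_exponent(32, 66)

if __name__ == "__main__":
    print(f"VoltDB implied scaling exponent: {voltdb_alpha:.2f}")
```

With only two scale factors measured, such an exponent is a rough indication rather than a fitted trend; more scale factors would be needed to confirm it.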
6. Discussion
The main purpose of this paper is to explore the topic of NewSQL databases and
to provide an analysis of the best databases, using OSSpal methodology and a standard
benchmark (SSB) to evaluate their performance and scalability.
The study conducted shows that using OSSpal methodology, MariaDB Xpand achieves
4.16 points, VoltDB achieves 4.10 points, and CockroachDB achieves 3.99 points, as shown in
Figure 15. These values are determined only by analyzing and selecting key functionalities
and qualities that each database presents, which can bias the results. The results are
close, and each database has been designed to work with large volumes of data and to
scale. Therefore, having assessed each database, we conclude that MariaDB Xpand is
highlighted as the tool with the best characteristics, according to OSSpal categories.
Figure 15. OSSpal evaluation results.
In the experimental evaluation, the Star Schema Benchmark was applied to analyze
the performance and scalability of each database, given scale factors of 1 and 10,
corresponding to 1 GB and ~10 GB of data, respectively. Figure 16 illustrates the total amount
of time needed to upload 1 GB and 10 GB of data. We presumed that each database would
take 10 times longer to upload the larger dataset, but the results proved to be better. This
is due to the upload system used by each database. CockroachDB uses an import mechanism
that allows data to be processed and stored in key–value stores, passing through several
layers of its architecture directly to the storage layer, improving the efficiency of data
import, ranking first with a total of 139 s. VoltDB uses an integrated Java mechanism called
csvloader, which imports CSV files into the database. This utility allows data to be imported
efficiently into memory in 397 s, decreasing performance penalties of disk accesses. However,
we expected that, being an in-memory database, VoltDB would outperform the other two
databases. Lastly, MariaDB Xpand uses a mechanism in Python called clustrix_import, which
scans the data to know how much space it needs to allocate in storage before importing data.
It also uses a rebalancing system that distributes data throughout the existing nodes in
the cluster, bringing optimization results. This database, despite having similarities to the
MySQL storage mechanism, lagged behind CockroachDB and VoltDB.
Author Contributions: Conceptualization, J.B. and F.S.; Methodology, E.P. and J.B.; Software, E.P.;
Validation, E.P., F.S. and J.B.; Formal analysis, E.P., F.S. and J.B.; Investigation, E.P.; Resources, E.P.;
Data curation, E.P.; Writing—original draft preparation, E.P.; Writing—review and editing, J.B. and
F.S.; Supervision, J.B. and F.S.; Project administration, J.B. and F.S.; Funding acquisition, J.B. All
authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Grolinger, K.; Higashino, W.A.; Tiwari, A.; Capretz, M.A. Data management in cloud environments: NoSQL and NewSQL data
stores. J. Cloud Comput. 2013, 2, 24. [CrossRef]
2. Date, C.J. E. F. Codd and Relational Theory; Technics Publications: Basking Ridge, NJ, USA, 2022; pp. 1–404.
3. Venkatraman, S.; Fahd, K.; Kaspi, S.; Venkatraman, R. SQL versus NoSQL movement with big data analytics. Int. J. Inf. Technol.
Comput. Sci. 2016, 8, 59–66. [CrossRef]
4. Sharma, N. Overview of the Database Management System. Int. J. Adv. Res. Comput. Sci. 2017, 8, 362–369.
5. Abramova, V.; Bernardino, J.; Furtado, P. Experimental Evaluation of NoSQL Databases. Int. J. Database Manag. Syst. 2014, 6, 1–16.
[CrossRef]
6. Chaudhry, N.; Yousaf, M.M. Architectural assessment of NoSQL and NewSQL systems. Distrib. Parallel Databases 2020, 38, 881–926.
[CrossRef]
7. Stonebraker, M. Newsql: An alternative to nosql and old sql for new oltp apps. Commun. ACM 2012, 6–7.
8. Valduriez, P.; Jiménez-Peris, R.; Özsu, M.T. Distributed database systems: The case for NewSQL. In Transactions on Large-Scale
Data-and Knowledge-Centered Systems XLVIII; Hameurlain, A., Tjoa, A.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2021;
Volume 12670, pp. 1–15.
9. Matthew, A. How Will the Database Incumbents Respond to NoSQL and NewSQL. Available online: https://15799.courses.cs.c
mu.edu/fall2013/static/papers/aslett-newsql.pdf (accessed on 26 July 2022).
10. DB-Engines Ranking. Available online: https://db-engines.com/en/ranking (accessed on 13 September 2022).
Future Internet 2023, 15, 10 22 of 23
11. Cockroach Labs, The Company Building CockroachDB. Available online: https://www.cockroachlabs.com/docs/releases/v21.2.
html (accessed on 22 December 2022).
12. MariaDB Xpand: Distributed SQL Database. Available online: https://mariadb.com/docs/xpand/release-notes/mariadb-xpa
nd-6/6-0-3/#Installation_Instructions (accessed on 22 December 2022).
13. Volt Active Data: Because Milliseconds Matter. Available online: https://docs.voltactivedata.com/v11docs/UsingVoltDB/
(accessed on 22 December 2022).
14. Wasserman, A.I.; Guo, X.; McMillian, B.; Qian, K.; Wei, M.Y.; Xu, Q. OSSpal: Finding and evaluating open source software. In
Proceedings of the IFIP International Conference on Open Source Systems, Buenos Aires, Argentina, 22–23 May 2017; Volume 496,
pp. 193–203.
15. Calçada, A.; Bernardino, J. Evaluation of Couchbase, CouchDB and MongoDB using OSSpal. In KDIR; SCITEPRESS—Science
and Technology Publications: Setúbal, Portugal, 2019; pp. 427–433.
16. Leite, N.; Pedrosa, I.; Bernardino, J. Open Source Business Intelligence Platforms Assessment using OSSpal Methodology. In
Proceedings of the 15th International Joint Conference on e-Business and Telecommunications (ICETE), Porto, Portugal, 26–28 July
2018; pp. 356–362.
17. António, O.; Jorge, B. OSSPal Assessment of Self-Service BI and Analytics Software. In Proceedings of the CAPSI 2020, Porto,
Portugal, 11–12 October 2020; p. 23. Available online: https://aisel.aisnet.org/capsi2020/23 (accessed on 22 December 2022).
18. Ferreira, T.; Pedrosa, I.; Bernardino, J. Integration of Business Intelligence with e-commerce. In Proceedings of the 14th Iberian
Conference on Information Systems and Technologies (CISTI), Coimbra, Portugal, 19–22 June 2019; pp. 1–7. [CrossRef]
19. Cardoso, T.; Penela, J.; Rosa, A.; Wanzeller, C.; Martins, P.; Abbasi, M. OSSpal Qualitative and Quantitative Comparison:
Couchbase, CouchDB, and MongoDB. In Marketing and Smart Technologies; Springer: Singapore, 2022; pp. 141–150.
20. O’Neil, P.; O’Neil, E.; Chen, X.; Revilak, S. The star schema benchmark and augmented fact table indexing. In TCPEB; Springer:
Berlin/Heidelberg, Germany, 2009; pp. 237–252.
21. Astrova, I.; Koschel, A.; Wellermann, N.; Klostermeyer, P. Performance Benchmarking of NewSQL Databases with Yahoo Cloud
Serving Benchmark. In Proceedings of the Future Technologies Conference, Virtual, 5–6 November 2020; pp. 271–281.
22. Kaur, K.; Sachdeva, M. Performance evaluation of NewSQL databases. In ICISC; IEEE: Piscataway, NJ, USA, 2017; pp. 1–5.
23. Murazzo, M.; Gómez, P.; Rodríguez, N.; Medel, D. Database NewSQL Performance Evaluation for Big Data in the Public Cloud.
In Cloud Computing and Big Data; Naiouf, M., Chichizola, F., Rucci, E., Eds.; JCC&BD 2019. Communications in Computer and
Information Science; Springer: Cham, Switzerland, 2019; Volume 1050. [CrossRef]
24. Oliveira, J.; Bernardino, J. NewSQL Databases—MemSQL and VoltDB Experimental Evaluation. In Proceedings of the 9th
International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management—KEOD, Madeira,
Portugal, 1–3 October 2017; pp. 276–281, ISBN 978-989-758-272-1. [CrossRef]
25. Knob, R.; Schreiner, G.; Frozza, A.; Mello, R. Uma Análise de Soluções NewSQL. In Anais da XV Escola Regional de Banco de Dados;
SBC: Porto Alegre, Brazil, 2019; pp. 21–30. [CrossRef]
26. Hahn, S.M.L.; Chereja, I.; Matei, O. Comparison of the Performance of NewSQL Databases Based on Linux OS. In Data Science and
Intelligent Systems. CoMeSySo 2021; Silhavy, R., Silhavy, P., Prokopova, Z., Eds.; Lecture Notes in Networks and Systems; Springer:
Cham, Switzerland, 2021. [CrossRef]
27. Sanchez, J. A review of star schema benchmark. arXiv 2016, arXiv:1606.00295.
28. Taft, R.; Sharif, I.; Matei, A.; VanBenschoten, N.; Lewis, J.; Grieger, T.; Niemi, K.; Woods, A.; Birzin, A.; Poss, R.; et al. CockroachDB:
The resilient geo-distributed sql database. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of
Data, New York, NY, USA, 14–19 June 2020; pp. 1493–1509.
29. The New Stack: Meet CockroachDB, the Resilient SQL Database. Available online: https://www.cockroachlabs.com/blog/the-n
ew-stack-meet-cockroachdb-the-resilient-sql-database/ (accessed on 19 July 2022).
30. Google Spanner Inspires CockroachDB to Outrun It. Available online: https://www.nextplatform.com/2017/02/22/google-spa
nner-inspires-cockroachdb-outrun/ (accessed on 30 August 2022).
31. Start a Local Cluster (Insecure). Available online: https://www.cockroachlabs.com/docs/stable/start-a-local-cluster.html
(accessed on 30 August 2022).
32. Known Limitations in CockroachDB v22.1. Available online: https://www.cockroachlabs.com/docs/stable/known-limitations.
html#a-multi-region-table-cannot-be-restored-into-a-non-multi-region-table (accessed on 9 October 2022).
33. MariaDB Corporation. Evaluating MariaDB Xpand and Cockroach with Sysbench. [White Paper] March. 2022. Available
online: https://go.mariadb.com/22Q2-WC-GLBL-DBaaS-Xpand-vs-CockroachDB-with-Sysbench-DB1139_LP-Registration.
html (accessed on 22 December 2022).
34. Deploy Xpand Performance Topology—MariaDB. Available online: https://mariadb.com/docs/deploy/topologies/xpand-per
formance/xpand-6/ (accessed on 19 July 2022).
35. Deploy Xpand Topology—Enterprise Documentation—MariaDB. Available online: https://mariadb.com/docs/deploy/topologi
es/xpand/xpand-6/ (accessed on 19 July 2022).
36. Architecture of MariaDB Xpand. Available online: https://mariadb.com/docs/architecture/components/xpand/ (accessed on
16 July 2022).
37. MariaDB Xpand Reviews, Ratings & Features 2022—Gartner. Available online: https://www.gartner.com/reviews/market/clo
ud-database-management-systems/vendor/mariadb/product/mariadb-xpand (accessed on 18 July 2022).
Future Internet 2023, 15, 10 23 of 23
38. Stonebraker, M.; Weisberg, A. The VoltDB Main Memory DBMS. IEEE Data Eng. Bull. 2013, 36, 21–27.
39. ODBMS, VoltDB Technical Overview. Available online: http://www.odbms.org/wp-content/uploads/2013/11/VoltDBTechnic
alOverview.pdf (accessed on 19 July 2022).
40. Using the VoltDB Deployment Manager Web Interface. Available online: https://docs.voltdb.com/v7docs/AdminGuide/Depp
loyWebUI.php (accessed on 20 July 2022).
41. Almassabi, A.; Bawazeer, O.; Adam, S. Top NewSQL databases and features classification. Int. J. Database Manag. Syst. 2018,
10, 11–31. [CrossRef]
42. VoltDB Active Data Documentation Administrator’s Guide. Available online: https://docs.voltdb.com/AdminGuide/ (accessed
on 10 October 2022).