This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3006383, IEEE Access
blockchain is achieved with the Proof of Work (PoW) protocol, which requires users' CPU power to link new blocks of transactions to the existing blockchain, thereby forming a continuous record that cannot be altered without redoing all the work. In the case of a fork, this process, also known as mining [7], encourages users to always mine on top of the longest chain, since it came from the largest pool of CPU power and is therefore the most difficult to reproduce. The inherent characteristics of blockchain architecture, such as transparency, verifiability, privacy, and anonymity, have since encouraged various industries and operational domains to further explore its numerous benefits and applications [8]. Blockchain technology also has its drawbacks, with scalability [9], security, and energy consumption [10] problems being the most significant. Nonetheless, new protocols and solutions are continually being developed [11]-[14] to address these problems and to consolidate blockchain technology and the decentralized model, potentially transforming the way people choose to transact globally [15]. One such example is the Proof of Stake (PoS) protocol [16], which attempts to curb PoW's wastefulness by using tokens, instead of computational work, as a scarce and well-distributed resource that prevents cheap attacks on the blockchain. However, PoS stakeholders' incentives [17] differ from those of PoW miners in a way that may compromise the network's security. In practice, the most profitable tactic for a stakeholder is to vote on every branch of the blockchain tree², which makes it harder to identify the most reliable chain and reach a clear consensus. To tackle the so-called Nothing-at-Stake problem [18], Ethereum [19] developers created a partial consensus mechanism, called Casper [20], that combines PoS research with Byzantine Fault Tolerance (BFT) [21] consensus theory. Casper overlays an existing blockchain and offers the appropriate tools and rules to readjust participants' incentives [22], so that they always consent to the most secure chain. This technology is so recent that it has yet to be tested in a real cryptocurrency, leaving some of its associated problems still open.

Along with this troubleshooting, efforts are also being made to introduce new tools and test new approaches in blockchain technology [23]-[25], expanding its capabilities and applications. In this context, and because of the high interconnection of blockchain data, representing the blockchain as a distributed graph database is far from absurd. The relationships between its data keep the blockchain coherent and may carry information of great analytical value. Only a database that natively embraces relationships can store, process, and query those connections efficiently. While other databases compute relationships at query time through expensive JOIN operations, a graph database stores connections alongside the data in the model, allowing millions of connections per second to be traversed.

In this paper, we have developed a decentralized application model in Python that is connected to a Neo4j database [26], where blockchain data are stored. Following the basic principles of Buterin and Griffith's original paper [20], we implemented a Casper-like consensus mechanism that functions alongside the most popular block proposal mechanisms: the PoW and the PoS protocols. A major part of our work was to incorporate Neo4j into the functionality of the above mechanisms and, ultimately, to improve their performance. For that reason, we developed a versatile Graph Model for our blockchain database that allows a multilevel view of the stored data and, by extension, numerous ways of accessing them. From the Neo4j Desktop [27] application, we were able to monitor and visualize changes in the deployed graph database in various use case scenarios. Another advantage of the blockchain graph database is the ease of applying analytical methods, with graph analysis tools, to the stored data and to the relationships between them. This innovation could solidify the blockchain analytics field by facilitating the evaluation of blockchain's components and of the behavior of the network's nodes. For this reason, we ran a series of simulated experiments and, using the annotated graph model, tested the efficiency of the implemented technologies and mechanisms in preventing the most common blockchain attacks, namely the 51% Attack, Catastrophic Crashes, and the Attack from dynamic validator sets.

The rest of the paper is structured as follows. In Section II we present the theoretical background for the tools and mechanisms developed in this paper. In Section III we briefly discuss published work that relates technically to blockchain and to the ideas proposed in our paper. In Section IV we describe the architecture of the decentralized application, while details about the implementation and the functionality of its components are given in Section V. In Section VI we run our application and test the performance of the employed blockchain data model and the security of the implemented protocols against the most common blockchain attacks. Section VII concludes the paper; there we summarize our findings and suggest possible applications for the mechanisms we developed.

II. BACKGROUND

A. CASPER CONSENSUS MECHANISM
Casper is a partial consensus mechanism combining Proof of Stake algorithm research and Byzantine fault-tolerant consensus theory. Casper's operations are backed by a designated group of nodes, the validators [28], who are responsible for voting on checkpoints and finalizing transactions. A checkpoint is simply a regular block whose height in the blockchain tree is an exact multiple of a fixed number. In Ethereum, for instance, this number is set to 100, so through the resulting checkpoint tree, validators can finalize every 100 blocks at once, rather than voting on every single block.
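The following Python sketch makes the checkpoint rule concrete by mapping block heights to checkpoints. It is a minimal illustration that assumes two things: the Genesis Block has height 0 and counts as a checkpoint, and the epoch length is the value of 100 quoted above for Ethereum. The helper names (`is_checkpoint`, `checkpoint_height`, `latest_checkpoint_block`) are hypothetical and are not part of the application described in this paper.

```python
# Hypothetical helpers illustrating the checkpoint rule (not the paper's code).
# Assumption: the Genesis Block has height 0 and is itself a checkpoint.
EPOCH_LENGTH = 100  # the value quoted above for Ethereum


def is_checkpoint(block_height: int, epoch_length: int = EPOCH_LENGTH) -> bool:
    """A block is a checkpoint iff its height is an exact multiple of epoch_length."""
    return block_height % epoch_length == 0


def checkpoint_height(block_height: int, epoch_length: int = EPOCH_LENGTH) -> int:
    """Height h(c), in the checkpoint tree, of the checkpoint at or below block_height."""
    return block_height // epoch_length


def latest_checkpoint_block(block_height: int, epoch_length: int = EPOCH_LENGTH) -> int:
    """Block height of the most recent checkpoint at or below block_height."""
    return block_height - block_height % epoch_length


if __name__ == "__main__":
    for height in (0, 99, 100, 250):
        print(height, is_checkpoint(height), checkpoint_height(height),
              latest_checkpoint_block(height))
```

Running the script prints one line per height. The last line is `250 False 2 200`: block 250 is not a checkpoint, it falls under checkpoint-tree height 2, and its latest checkpoint is block 200. The epoch length is kept as a parameter because checkpoint spacing is a protocol choice rather than a fixed constant.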
² The forking of chains in the ledger results in a tree-like structure rooted at the Genesis Block.

Every node can become a validator by depositing at least the predetermined minimum amount of tokens. The number
of tokens deposited also represents the stake of the validator, which rises and falls with rewards and penalties. A node's voting power is determined by its share of the total number of tokens deposited by all validators. Hence, when we say "2/3 of validators", we are referring to the deposit-weighted fraction. To exit the validator sets and collect its share, a node must publish a withdraw message. After exiting, the node is permanently barred from re-entering the sets.

Validators can broadcast a vote message containing four pieces of information: two checkpoints $s$ and $t$ of the same subchain³, together with their respective heights $h(s)$ and $h(t)$. Therefore, a vote can be represented as a link $s \rightarrow t$ from a source checkpoint to a target checkpoint.

If at least 2/3 of the validators (by deposit) have published the same vote with source $s$ and target $t$, then $s \rightarrow t$ is called a supermajority link.

A checkpoint $c$ is called justified if (1) it is the root, or (2) there exists a supermajority link $c' \rightarrow c$ where checkpoint $c'$ is justified.

A checkpoint $c$ is called finalized if (1) it is the root, or (2) it is justified and there is a supermajority link $c \rightarrow c'$ where $c'$ is a direct child of $c$.

deposits. In case of a rule violation, Casper guarantees that all relevant evidence can be found and the offenders can be identified.

For the mathematical proof of the above proposition, we work on the checkpoint tree. Given two finalized checkpoints $x_m$ and $y_n$ on two conflicting subchains, there are two distinct chains of supermajority links from a common starting checkpoint $s$ (whether that is the Genesis Block or not) to $x_m$ and $y_n$, respectively:

$$s \rightarrow y_0 \rightarrow y_1 \rightarrow \dots \rightarrow y_n \rightarrow y_{n+1}$$
$$s \rightarrow x_0 \rightarrow x_1 \rightarrow \dots \rightarrow x_m \rightarrow x_{m+1}$$

where $x_{m+1}$ and $y_{n+1}$ are the direct children of $x_m$ and $y_n$, respectively. These children exist because $x_m$ and $y_n$ are finalized (finalization rule), so $h(x_{m+1}) = h(x_m) + 1$ and $h(y_{n+1}) = h(y_n) + 1$. The heights of all checkpoints $x_j$, $y_i$ in the above chains must be different; otherwise rule I is violated. Without loss of generality, we assume that $h(x_m) > h(y_n)$. Since $h(x_m) \neq h(y_{n+1}) = h(y_n) + 1$, it follows that $h(x_m) > h(y_{n+1})$. Let $k$ be the lowest integer such that $h(x_k) > h(y_{n+1})$. Then $h(x_{k-1}) < h(y_{n+1})$, i.e., $h(x_{k-1}) \le h(y_n)$. Because $h(x_{k-1}) = h(y_n)$ would again violate rule I, we get $h(x_{k-1}) < h(y_n)$. This implies the existence of a supermajority link $x_{k-1} \rightarrow x_k$ with

$$h(x_{k-1}) < h(y_n) < h(y_{n+1}) < h(x_k),$$

which surrounds the supermajority link $y_n \rightarrow y_{n+1}$ and thus violates rule II.

If two conflicting supermajority links $l_1$ and $l_2$ exist, we can conclude that at least 1/3 of the validators violated the slashing conditions. At least 2/3 of the validators have published $l_1$ and at least 2/3 have published $l_2$, so the two groups overlap in at least $2/3 + 2/3 - 1 = 1/3$ of the total deposit.
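The proof relies on Casper's two slashing conditions. As formulated by Buterin and Griffith [20], a validator must not publish two distinct votes $s_1 \rightarrow t_1$ and $s_2 \rightarrow t_2$ such that (I) $h(t_1) = h(t_2)$, or (II) $h(s_1) < h(s_2) < h(t_2) < h(t_1)$. The Python sketch below checks both conditions, together with the deposit-weighted 2/3 supermajority test. It is a minimal illustration under stated assumptions: the `Vote` record, its field names, and the deposit dictionary are hypothetical and are not the data structures of our application.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Vote:
    """Hypothetical vote record mirroring the fields of a vote message."""
    validator: str      # e.g. the sender's public key
    source: str         # hash of source checkpoint s
    target: str         # hash of target checkpoint t
    source_height: int  # h(s) in the checkpoint tree
    target_height: int  # h(t) in the checkpoint tree


def violates_rule_one(a: Vote, b: Vote) -> bool:
    """Rule I: the same validator publishes two distinct votes with equal target heights."""
    return a.validator == b.validator and a != b and a.target_height == b.target_height


def _surrounds(outer: Vote, inner: Vote) -> bool:
    """True if h(s_outer) < h(s_inner) < h(t_inner) < h(t_outer)."""
    return (outer.source_height < inner.source_height
            < inner.target_height < outer.target_height)


def violates_rule_two(a: Vote, b: Vote) -> bool:
    """Rule II: one vote of a validator strictly surrounds another of its votes."""
    return a.validator == b.validator and (_surrounds(a, b) or _surrounds(b, a))


def is_supermajority(voters: Iterable[str], deposits: Dict[str, float]) -> bool:
    """True if the distinct voters hold at least 2/3 of the total deposit."""
    total = sum(deposits.values())
    voted = sum(deposits.get(v, 0) for v in set(voters))
    # Multiplied-out form of voted / total >= 2/3, avoiding a division.
    return total > 0 and 3 * voted >= 2 * total
```

In the proof, the link $x_{k-1} \rightarrow x_k$ plays the role of the outer vote and $y_n \rightarrow y_{n+1}$ the inner one, so `violates_rule_two` flags any validator that signed both. With integer deposits, the comparison `3 * voted >= 2 * total` tests the 2/3 threshold exactly, without floating-point division.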
selecting the heaviest subtree rooted at each fork. By doing so, developers aim to avoid scenarios where an attacker's chain can grow longer without the attacker controlling the majority of the network's computing power. One such scenario arises when larger blocks are created that take longer to propagate through the network, which results in more forks. In that case, the Greedy Heaviest-Observed Sub-Tree (GHOST) rule proposes that blocks off the main chain can still contribute to the main chain's validity. The idea of using graphs to optimize the performance and security of distributed ledgers was further explored in the cryptocurrency space. A well-known protocol that introduces a unique graph-based model to blockchain is IOTA's Tangle. This protocol is entirely based on a directed acyclic graph (DAG), which is used to store and verify transactions by connecting them to other, already confirmed transactions. Nevertheless, this implementation differs in many ways from typical blockchain structures: it doesn't use blocks to store transactions, it combines the roles of transaction issuers and transaction approvers, and, unlike most protocols, it doesn't include monetary rewards. Acknowledging the tremendous benefits that graph solutions can provide in distributed ledgers, we propose a model that stores and connects a blockchain's digital entities in a Neo4j database.

Modeling blockchain as a graph database is not a novel idea [43]. Several studies [44]-[46] have highlighted the analytical value of blockchain data and of the relationships between them, which can be optimally exploited through a high-fidelity blockchain graph model. In [44], specifically, this was done by parsing and deserializing Bitcoin's raw binary data files into a format suitable for importing into Neo4j. The authors then ran the annotated graph through a graph-analysis framework that uses path-dependent Cypher queries to extract and summarize useful statistics. This implementation paves the way for a blockchain analytics field that focuses on identifying and even predicting the behavior of both the nodes and their published messages. In our paper, we extend the idea of a graph blockchain database by also incorporating the graph model into the core functions of blockchain and its mechanics. Furthermore, we propose a blockchain model that is both flexible and lean. Because it is stored alongside the traditional blockchain with a low memory overhead, it removes the need for a graph locking-unlocking mechanism. This implementation aims to access the data used by consensus protocols and block-proposal schemes much faster than the traditional approach, ultimately resulting in higher-performance decentralized systems.

IV. SYSTEM OVERVIEW

A. P2P NETWORK
Every decentralized application is supported by a P2P network [47] where members can interact with one another without the need for a trusted authority. In our model, we simulate such a network using the Python Flask microframework. In particular, every node is implemented as a separate Web application of the same structure to ensure equality. For the purposes of this paper, we deployed the P2P network on a single machine by having each node run on a different port of Python's local development server. This allowed us to uniquely identify each node by its port number, which functions as the node's address.

FIGURE 3. Transaction broadcasting sequence diagram.

Communication between nodes is enabled through the Flask-RESTful extension. Each node stores its peers' port addresses and can transmit messages to them simply by invoking the appropriate API Resources with a supported HTTP method. Newcomer nodes query one or more IP addresses hardcoded into their scripts, which act like DNS seeds by storing and transmitting peers' IP addresses. The procedure followed when a node broadcasts a transaction to the network is shown in the sequence diagram of Figure 3.

Lastly, every peer initializes and uses a Neo4j distributed graph database, in which blockchain data are stored and dynamically accessed.

B. BLOCKCHAIN GRAPH MODEL
Common blockchain data can be represented in a graph database in a straightforward manner: a node can be labeled either as Block or as Transaction, with dedicated attributes in each case. Two consecutive blocks are linked with a "CHILD_OF" relationship, while transactions are connected to the Blocks they belong to with an "INCLUDED_IN" relationship. Following this model, any data broadcast in the network that is stored in a Block and has attributes can be depicted as a separate node or label in the Neo4j database.
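A block and its transactions can be written into this graph model with a handful of parameterized Cypher statements. The Python sketch below only builds the statements and their parameters, so it runs without a database. It assumes the plural labels (Blocks, Transactions, Users) and the FROM/TO relationships that appear in the queries of Section VI. The property names (hash, height, amount, hex), the direction of the FROM relationship, and the function name `block_statements` are all hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical sketch: build parameterized Cypher statements that store one
# block of the graph model. Labels follow the Section VI queries; property
# names and the FROM direction are assumptions.
Statement = Tuple[str, Dict[str, Any]]

MERGE_BLOCK = "MERGE (b:Blocks {hash: $hash}) SET b.height = $height"

# Backward link from a block to its single parent block.
LINK_PARENT = (
    "MATCH (b:Blocks {hash: $hash}), (p:Blocks {hash: $parent_hash}) "
    "MERGE (b)-[:CHILD_OF]->(p)"
)

# Transaction node, its containing block, and its sender/receiver user nodes.
MERGE_TX = (
    "MATCH (b:Blocks {hash: $block_hash}) "
    "MERGE (t:Transactions {hash: $tx_hash}) SET t.amount = $amount "
    "MERGE (t)-[:INCLUDED_IN]->(b) "
    "MERGE (s:Users {hex: $sender}) MERGE (t)-[:FROM]->(s) "
    "MERGE (r:Users {hex: $receiver}) MERGE (t)-[:TO]->(r)"
)


def block_statements(block: Dict[str, Any]) -> List[Statement]:
    """Return the (query, parameters) pairs that store a block and its transactions."""
    statements: List[Statement] = [
        (MERGE_BLOCK, {"hash": block["hash"], "height": block["height"]})
    ]
    parent: Optional[str] = block.get("parent_hash")
    if parent is not None:  # the Genesis Block has no parent
        statements.append((LINK_PARENT, {"hash": block["hash"], "parent_hash": parent}))
    for tx in block.get("transactions", []):
        statements.append((MERGE_TX, {
            "block_hash": block["hash"],
            "tx_hash": tx["hash"],
            "amount": tx["amount"],
            "sender": tx["sender"],
            "receiver": tx["receiver"],
        }))
    return statements
```

Each (query, parameters) pair could then be executed in order with the official Neo4j Python driver, for example by calling `session.run(query, parameters)` on a session opened from `GraphDatabase.driver(...)`. Using MERGE rather than CREATE keeps the import idempotent when the same block arrives from several peers.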
FIGURE 4. Blockchain database Graph Model.

However, the benefits of this implementation are not limited to presenting data in an organized and comprehensible way. Neo4j offers the ability to quickly access stored information by using graph theory algorithms. It also makes it possible to apply analytics [48] to blockchain data by executing graph algorithms through complex Cypher queries. Nevertheless, not all of the information is worth storing in the Neo4j graph database. The selection criteria relate to the usability of the stored data in blockchain functions and to their analytical value. The efficiency of Neo4j is further optimized when the graph model expedites the retrieval of native data that would otherwise be hard to access, resulting in a higher-performance system.

Hence, the graph database design is not absolute. Rather, it should be considered a versatile tool, completely dedicated to its blockchain, that contains the valuable information and connects it according to the needs of the blockchain's mechanisms and protocols. One such example is calculating a user's balance, which would otherwise require crawling each block in the blockchain tree to find all transactions the user participates in. A blockchain that values such a metric should store and connect users' transactions in an optimal way. Another use case might involve the handling of smart contracts [49][50], perhaps in an e-shop application. In that case, a useful indicator could be the credibility score of a user. This score would be calculated by retrieving the contracts the user was involved in and considering both the credibility scores of the other parties and the method of each contract's resolution [51] (agreement, dispute, use of a mediator, etc.). Hence, the graph model of such an application should include smart contract nodes, link them with the users that participate in them, and store a resolution method property.

Bearing in mind the above principles, we have designed a graph model that facilitates the most common blockchain operations as well as those of Casper's consensus mechanism. It also allows path-dependent queries to be applied and information to be retrieved efficiently. The design presented in Figure 4 takes advantage of blockchain's highly interconnected data and

V. IMPLEMENTATION DETAILS

A. BLOCK PROPOSAL MECHANISMS

1) PROOF OF WORK
In Proof-of-Work blockchains, nodes compete with each other to solve a cryptographic puzzle, such as producing hashes with specific patterns. This procedure, known as mining, is implemented in our application and uses three main components: a hash function, a random number generator, and a winner verification method.

Every prospective miner first starts a subprocess for mining the next block. The procedure begins by constructing the new block for the miner's selected chain as an object of class Block. For this block to be published, the miner must first solve it by appending random numbers to its header and hashing the resulting string. When a SHA-256 hash [52] with the required number of zeros is produced, the block is considered solved and is broadcast to the network. The number of zeros required is defined by the difficulty parameter, which we can use to adjust the average block time.

After receiving and verifying the new block, the other miners terminate any ongoing mining processes and update their local data. Python's multiprocessing library allows the mining process to be executed in parallel without impeding the rest of the Flask server operations.

2) PROOF OF STAKE
The Proof of Stake block proposal system simulates the mining process by using stake, instead of computational effort, as proof of the block constructors' reliability. Here, users who possess a larger amount of tokens have a higher chance of being selected, as they have stronger incentives to protect the network and the value of the cryptocurrency. In our implementation, a node can apply to become a stakeholder by broadcasting an HTTPS message to the network. Nodes of the same chain store the received application along with the applicant's public key and stake. The election process is triggered by sending a "start_pos" HTTPS request to one of the chain's nodes and is essentially a stake-weighted random choice among applicants. If the "start_pos" requests are sent to randomly selected nodes,
eventually, the chain with the most online nodes will grow the longest. However, this affects neither the security nor the finality of the consensus protocol, since here, unlike in PoW, chain length is not a measure of difficulty. Instead, we use the stake behind each block to determine which chain was the hardest to create and thus the hardest to reproduce. The problem that emerges here is that stakeholder incentives run counter to this security measure, since their optimal tactic is to bet on the blocks of every branch. Thus, stake-wise, the weaker blocks become indistinguishable from the stronger ones, and the staking measure becomes unreliable. Casper aims to alter these incentives by assigning consensus to the validators.

B. CASPER-LIKE CONSENSUS PROTOCOL

1) DYNAMIC VALIDATOR SETS
To achieve finality in our Proof of Stake protocol, we implemented a Casper-like consensus protocol that can also work over any other block proposal scheme. This mechanism should also allow validators to change without compromising the blockchain's security. Our design incorporates the key principles [53] laid down by Casper's creators into one simple, straightforward implementation. Specifically, we propose two dynamic validator sets responsible for voting on checkpoints: the "current_front" and the "current_rear" validator sets. In addition to these two, we use the "new_front" structure to store the next state of the current_front validator set, which greatly simplifies the process. Every block stores and handles these structures, thereby recognizing its expected voters and the future state of all three validator sets. A node can become or stop being a validator by broadcasting a deposit or a withdraw HTTPS message, respectively, to the rest of the network. Every block inherits its parent's structures. Depending on the deposits and withdrawals it includes, it then adds nodes to or removes nodes from its new_front set. Changes in the voting validator sets take effect after the finalization of a checkpoint. So, if checkpoint C gets finalized, the next block B created in the same subchain, which is also a child of block A, will have its validator sets modified as shown in Figure 5.

In this way, current validators authorize the next ones by finalizing specific blocks. After a validator's withdrawal gets finalized, the validator remains in the "current_rear" set until another checkpoint gets finalized. Further, by requiring both front and rear validators to agree for a checkpoint to get finalized, we ensure that all new validators vote in line with their predecessors, achieving a continuous line of consensus throughout the blockchain. The final measure we need to adopt is preventing out-of-order checkpoint finalization. This step is crucial for Casper's safety. Otherwise, conflicting justification and finalization votes could be sent by disjoint validator sets, making it impossible to trace and punish the offenders.

2) VOTING
Given the changes in the operation of the validator sets and the new security risks arising from them, we redefine a supermajority link, a justified checkpoint, and a finalized checkpoint as follows:
An ordered pair of blocks (s, t) has a supermajority link if at least 2/3 of checkpoint t's "current_front" validator set and at least 2/3 of checkpoint t's "current_rear" validator set have published votes s → t.
Given an ordered pair of checkpoints (s, t), t is considered justified if there is a supermajority link s → t.
Given a checkpoint s' and its direct child t', s' is considered finalized if three conditions hold: s' is justified; there is a supermajority link s' → t'; and all votes justifying and finalizing checkpoint s' are included in the subchain before the creation of the next checkpoint.

As with the implementation of the validator sets, we use three auxiliary data structures that are stored in each block and facilitate the voting process. In the front_votes and rear_votes structures, we store the sent links with the
majority rates received, while in the "justified" list we store the justified checkpoints.

A node broadcasts a vote s → t via a "submit_vote" HTTPS message. For this vote to be valid, it must contain the required information: "source_hash", "source_height", "target_hash", "target_height", and "public_key". When importing such a vote into a block b, the miner/stakeholder performs the following tasks:
i) checking that blocks b and t belong to the same subchain;
ii) calculating the sender's stake participation in each of the validator sets of block t and confirming that at least one of them is greater than 0%;
iii) adding the calculated percentages to the total vote rates that the s → t link has already received in the "front_votes" and "rear_votes" sets;
iv) including t in the "justified" list if s → t is a justification link that has achieved a supermajority of at least 66%.

These data structures depict the votes included in the current block's ancestors, not in the whole blockchain tree. Hence, each block entry can be uniquely described by the block's height alone.

Finally, when creating a new checkpoint, the miner checks its stored data to determine whether the previous checkpoint in the chain got finalized. Suppose the finalization supermajority link exists and the previous checkpoint also appears justified. We can then infer that all required votes arrived in time and are stored in blocks of the same subchain. In this case, the miner notifies peers that the previous checkpoint is to be considered finalized.

VI. EVALUATION

A. RUNNING THE SYSTEM
While blockchain owes many of its advantages to the way it organizes its data, there is room for improvement in accessing them. A key point of our research was to propose a graph model that eliminates the need to crawl each block in the blockchain and allows the otherwise cumbersome information to be retrieved efficiently. In this section, we run the decentralized application and examine how the employed graph model enhances the performance of the blockchain and its components. The Neo4j Desktop application we use runs Neo4j Browser version 4.0.1 and Neo4j Server version 3.4.10, providing an environment to visualize data and work on our local Neo4j databases. Figure 6 shows the versatility of our graph model, which can be traversed in multiple ways depending on the type of information requested. We present two instances in which we recorded a significant performance improvement by using the Neo4j graph database: calculating balances and tracing offending validators.

The calculation of a user's balance in the blockchain tree can be performed rapidly by locating the user's corresponding Neo4j node under the "Users" label and parsing its "To" and "From" relationships. For the same calculation to be done along only one branch, we execute a directional path-finding algorithm from the final block b to the user node and keep all "From" and "To" relationships included in those paths. Our design allows each block to connect only to its previous block through a backward "CHILD_OF" relationship. Since each block can have only one parent block, we can isolate a single chain from block b to the Genesis Block with a path consisting of "CHILD_OF" relationships.

The following Cypher query returns the total number of tokens sent to the address given by the $address parameter. It counts only transactions included in the chain that extends from the block whose hash is given by the $hash parameter back to the Genesis Block:

// Walk back from block $hash to the Genesis Block along CHILD_OF,
// then sum the amounts of the transactions in those blocks that pay $address.
MATCH (b:Blocks {hash: $hash})-[:CHILD_OF*0..]->(ancestor)
      --(t:Transactions)-[:TO]->(u:Users {hex: $address})
RETURN SUM(t.amount)

Unlike the classic blockchain structure, the graphical model here allows for bidirectional traversal of entities in
the requested path. We can evaluate and analyze any query in Neo4j by looking at its execution plan. The PROFILE command runs the given statement while tracking how many rows pass through each operator and how much each operator interacts with the storage layer to retrieve the necessary data. Profiling the above query reveals how the Neo4j query planner takes advantage of blockchain's data model. It optimizes the match by taking node degree into account when checking for the connection, starting on the smaller side while caching internally. Specifically, it avoids crawling each block and examining the hundreds of transactions contained in them; instead, it follows only the transactions that the requested node participates in.

Casper bases its effectiveness on its ability to identify and punish malicious validators. In our Casper-like protocol, we incentivize the network's nodes to track and report those offenders by offering them financial rewards in the case of a successful slashing. Furthermore, all evidence of a rule violation can be discovered and recovered by any node, as all the sent votes are stored publicly in the blockchain.

The incorporation of Neo4j into our application, and the complex Cypher queries it allows, further facilitate the detection of such offending votes in the blockchain. Specifically, we can request all distinct pairs of conflicting votes sent by the same validator with two simple Cypher queries:

// Rule I check: votes from the same validator (r_from) with equal target heights;
// ID(v1) < ID(v2) reports each unordered pair only once.
MATCH (v1:Vote), (v2:Vote)
WHERE v1.r_from = v2.r_from
  AND v1.target_height = v2.target_height
  AND ID(v1) < ID(v2)
RETURN v1.r_from, v1, v2

This query returns all distinct pairs of votes v1, v2 sent by the same validator with targets at the same height.

The above queries apply to all published votes in the blockchain tree, not only to a specific branch. The process of tracking offenders is greatly simplified since sent votes are indexed with the "Votes" label and serial block access is no longer required.

If measured by traditional database criteria, the traditional blockchain seems poor. Throughput is only a few transactions per second and capacity is a few GB. Most importantly, it has essentially no querying abilities, which makes it unsuitable for applying statistics to its data. Several efforts [54] have been made to improve the traditional blockchain database in terms of performance, scalability, and queryability. However, these implementations follow a NoSQL approach, meaning that they store sets of disconnected aggregates, which makes them difficult to use for connected data. The most common strategy for adding relationships to such stores is to embed an aggregate's identifier inside a field belonging to another aggregate, effectively introducing foreign keys. This, however, requires joining aggregates at the application level, which, given blockchain's high interconnectivity, quickly becomes prohibitively expensive. We find that graph databases optimally exploit the benefits of blockchain's unique architecture and create versatile data structures that can be traversed in real time.

We compared the performance of our graph tool with that of the document-oriented approach. The results agree with several studies [55][56] that suggest the overall superiority of graph databases in query time for connected information. Specifically, Figure 7 presents, on a logarithmic scale, the average query execution time for two systems: our private blockchain application, which follows a document-oriented database architecture, and our Neo4j blockchain database. The query used in this case was a simple balance calculation for a specific user. To make the comparison fair, memory consumption in Neo4j should not exceed 13 GB of RAM, which is what an Ethereum full node uses. To achieve this, we set both the heap and the page cache to 4 GB each. Combined with the extra memory the JVM needs to function correctly, this ensures that Neo4j's process memory consumption will not grow beyond the desired level.

FIGURE 7. Average query execution time for Neo4j and Blockchain databases.

The graph model and the appropriate Cypher queries can simplify the procedures performed by the protocols and mechanisms that operate in the blockchain. However, the space complexity of our implementation should also be considered. The Neo4j stored record sizes are as follows: 15 B for nodes, 34 B for relationships, and 41 B for properties of nodes and relationships. In our graph model, we suggest that a Transaction consists of the
following contents: the Transaction node, 3 relationships (FROM, TO, IN) and 2 properties (amount, hash). Hence, a published Transaction occupies 1 × 15 + 3 × 34 + 2 × 41 = 199 B of disk space. The average transaction size in Bitcoin is 600 B, so storing and utilizing our graph model alongside the traditional blockchain data would mean a 33% increase in the total blockchain size.
For the computation, the memory configuration depends on how much virtual memory the JVM running Neo4j will use at runtime. Thus, the more closely the allocated memory matches the size of the database, the less data swapping will occur at runtime, ultimately resulting in higher performance. Storing the 258 GB of Bitcoin's data [57] in our graph model would require approximately 258 × 0.33 ≈ 86 GB of memory. However, the memory actually used by the Neo4j instance corresponds to the collection of data requested by the client. For path-dependent queries, a precise calculation can be complicated, since it may involve duplicate nodes and relationships. In contrast, we can make a reasonable estimate of the memory required for the two simpler queries used to spot offending validators. In that case, the query compares all N published votes pairwise, and each vote consists of one node and four properties. Hence, the memory required for this query to be executed optimally is N × N × (15 + 4 × 41) = 179N² B. To further reduce the space complexity of our implementation and improve its effectiveness, we are exploring additional graph models and tools that will entirely base their operation on them.
B. PREVENTING ATTACKS
In this section, we test the robustness of the developed mechanisms against the best-known blockchain attacks. In the case of Casper, security against several types of attacks is provided by its nature. Casper can tolerate up to 1/3 of the validators being malicious while still achieving finality; any larger share can stop the network from finalizing new checkpoints. However, its security is only compromised when the dishonest validators achieve a supermajority in both validator sets, thereby being able to fully control the finalization of new checkpoints. Sybil attacks are prevented because Casper operates in a Proof of Stake manner: the size of a validator's deposit determines its voting power. To further reduce the impact of multiple-address users, Casper requires a large token deposit to become a validator.
While we saw how Neo4j can assist in tracking Casper's offending validators, safety under other types of attacks is examined through a series of simulated experiments on our application. This section focuses on two points: the vulnerability of PoS Casper systems to 51% attacks, and the optimization of parameters for a consensus protocol that is resistant to catastrophic crashes.
The simulation process is enabled through a batch script that initializes nodes and performs the basic functions of miners and validators. This process also provides for side-chains that are created in a pseudo-random manner. An adjustable parameter determines the probability of a fork occurring at each block. The simulated results are gathered with the aid of a Python script that executes Cypher queries against the Neo4j database, where the native blockchain data are stored. Thus, we once more showcase the benefit of the blockchain graph model in quickly pinpointing and extracting high-value analytics regarding, in this case, the evaluation of the blockchain's mechanisms as well as the behavior of its participants.
FIGURE 7. Consensus with Fixed and Fluctuating Inactivity Leak.
1) CATASTROPHIC CRASHES
If more than 1/3 of the validators are disconnected from the network due to computer failures, network partitioning, or risky behavior, it is virtually impossible to finalize new blocks. In this case, the Inactivity Leak can help the blockchain recover by gradually decreasing the stake of the offline validators, thus weakening their voting power. The Inactivity Leak's value can be either fixed or fluctuating. The money deducted from the inactive validators can be either erased or returned to them some time after they get back online. In our study, we focus on optimizing the role of the Inactivity Leak in Casper's security. In other words, penalties should be adjusted so that the network can effectively and quickly overcome a catastrophic crash, while voting remains ultimately profitable for validators with short absences.
We examine how effective a fixed and a fluctuating Inactivity Leak are in achieving consensus after a Catastrophic Crash in which 50% of current validators disconnect. To simulate this, we initialized 1000 validators, of which only 500 were online and able to cast a vote. The consensus rates on justification and finalization votes can be stored in the checkpoint nodes as a separate property. Because checkpoints are sparsely distributed throughout the blockchain tree, this does not significantly affect the overall space complexity of our Neo4j implementation. In both cases of Inactivity Leak, penalties are initially set to -1% and are applied every 10 checkpoints. Should consensus not be reached during that period, the non-fixed Inactivity Leak decreases by 5% until at least one block gets finalized. In
Figure 7, we can observe the change in the percentage of active validators and of justified and finalized checkpoints for a fixed and a fluctuating Inactivity Leak.
If we now consider additional candidate main chains in the network, we can assume that the probability of a checkpoint being voted for is proportional to its subchain's relative strength. The relative strength of a checkpoint or a subchain derives from the criteria that validators use when voting. Thus, in a PoS blockchain with honest validators, the relative strength of a checkpoint X could be interpreted as the total amount of tokens staked on X compared with the amount staked on checkpoints at the same height. We now simulate the previous voting process while taking into account different probabilities of the main chain being voted for.
The results displayed in Figure 8 for both types of Inactivity Leak suggest that the fluctuating leak reduces how much the relative strengths and the number of subchains affect the speed of reaching consensus. In every case, the first finalized block appears around epoch 9. With the fixed inactivity penalty, consensus is highly dependent on the strength of the candidate blocks, delaying the first checkpoint finalization by as much as 30 epochs in some cases. However, the faster consensus that a fluctuating leak offers comes with additional risk for validators. It shrinks the time window within which validators can recover from a crash without suffering extensive losses on their deposits.
FIGURE 8. Consensus with Fixed and Fluctuating Inactivity Leaks for subchains of varying strengths.
2) 51% ATTACKS
The 51% attack refers to a blockchain attack performed by a group of miners that control more than 50% of the network's mining or computing power. Such a group could potentially control the confirmation of new transactions in order to double-spend coins. Here, we examine whether such an attack could be feasible in our Proof of Stake-Casper model. We also examine whether the inherent features of these mechanisms favor Voting Power centralization amongst stakeholders and validators, respectively.
Regarding PoS, Voting Power centralization can be checked using the PoS scheme we have implemented in our network. Under this scheme, each stakeholder's chance of being selected is proportional to its share of capital. The winning node is rewarded with a fixed amount of cryptocurrency. In our simulation, we initialized 1000 nodes with a voting power distribution similar to that of real PoS networks. We then ran the stakeholder election process for 1,000,000 simulated blocks and checked again for potential wealth accumulation. To calculate the total stake of each stakeholder in each case, we take advantage of the Cypher queries presented in Section A, which greatly simplify the process. Figure 9(a) presents the initial distribution of hashrate and the distribution after the election process. Both are shown in a detailed form and in a simplified form in which only PoS stakeholders that possess more than 3% of the total tokens appear. The two distributions appear almost identical after 1,000,000 blocks. This means that the model maintains any financial differences between the nodes of the network without expanding them percentage-wise.
Like many other BFT protocols, Casper uses 1/3 as the maximum fraction of faulty validators it can tolerate. Given n total nodes, of which f are byzantine, we need at least t nodes to agree to reach consensus. Assume that the n - f honest nodes are split into two equally sized groups of (n - f)/2. We want to make sure that the influence of the byzantine nodes, which may act arbitrarily, is not enough to achieve consensus. Hence, t > (n - f)/2 + f, which ensures that the two groups cannot decide different things and cause a safety failure. For liveness, we make sure that the n - f honest nodes can come to a consensus without the cooperation of the f byzantine nodes. Thus, n - f ≥ t. Combining the two constraints gives n - f > (n - f)/2 + f, that is, n - f > 2f. We therefore obtain f < n/3 as the fault tolerance threshold of Casper.
In this way, to control Casper's finalization process, the attackers would have to acquire at least 67% of the total deposits in both validator sets. Still, 34% in just one validator set would be enough to block the network from finalizing any new checkpoints. Another fundamental requirement is that Casper's structure should not amplify economic differences between validators, which would jeopardize the sets' decentralized character. Besides being the backbone of Casper's operation, the adopted reward-punishment system also dictates the incentives in consensus groups and hence the possibility of centralization
of power in them. To underline the significance of rewards, we tested two different approaches in our Casper-like protocol: a system that encourages consensus and another that encourages participation.
The results in each case were gathered with the aid of Cypher queries. These queries quickly pinpoint each validator's votes through the "Vote" label and retrieve their voting percentages in each set from the node's properties. The initial distribution of PoS voting power depicted previously can be set to resemble that of real PoS cryptocurrencies. The novelty of Casper's ideas, however, imposes several restrictions when selecting appropriate data inputs. Ethereum requires 1500 ETH as the minimum validator deposit, a demand that only 5000 addresses can fulfill at the moment. Combined with the risk associated with being a validator, this allows a good estimate of the sizes of the front and rear sets. By extension, the distribution of voting power can be modeled on the stake distribution of the top 5000 Ethereum addresses.
exponentially. Moreover, those powerful nodes would have a stronger influence on the rest of the validators, who would be incentivized to follow the majority to reach consensus and earn the bonus.
The danger of voting power centralization lurking in consensus-enforcing reward systems leads us to the participation-valued approach of the original Casper. This approach uses a deposit-equivalent reward, given to those who have cast at least N votes during a predetermined period with targets of height > h. Here, h is the height of the highest justified checkpoint, so that the broadcast votes are relevant to the checkpoint finalization process. By doing so, powerful validators have no advantage over the rest of the set as long as everyone participates in the voting process. This rewarding system is deceptively more similar to that of PoS than the previous one, as long as Casper is responsible for the finalization. The only difference is that in PoS, non-participating nodes face zero projected profits instead of Casper's negative penalties.
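The participation-valued reward rule above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. The `Vote` record, the function `participation_rewards`, and the parameters `n_min`, `reward_rate` and `penalty_rate` are hypothetical names. Reading the "deposit-equivalent reward" as a flat rate applied to each deposit is also our assumption, as is the penalty charged to non-participants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """Hypothetical vote record: the sending validator and its target height."""
    validator: str
    target_height: int


def participation_rewards(votes, deposits, h, n_min, reward_rate, penalty_rate):
    """Deposit changes for one predetermined period (illustrative sketch).

    A vote counts only if its target height is greater than h, the height of
    the highest justified checkpoint. Validators with at least n_min counted
    votes gain reward_rate * deposit (assumed reading of a deposit-equivalent
    reward); every other validator loses penalty_rate * deposit.
    """
    counted = {}
    for vote in votes:
        if vote.target_height > h:
            counted[vote.validator] = counted.get(vote.validator, 0) + 1

    changes = {}
    for validator, deposit in deposits.items():
        if counted.get(validator, 0) >= n_min:
            changes[validator] = reward_rate * deposit
        else:
            changes[validator] = -penalty_rate * deposit
    return changes


if __name__ == "__main__":
    deposits = {"A": 3000, "B": 1500, "C": 1500}
    votes = [Vote("A", 12), Vote("A", 13), Vote("B", 12), Vote("B", 13),
             Vote("C", 9), Vote("C", 13)]
    print(participation_rewards(votes, deposits, h=10, n_min=2,
                                reward_rate=0.01, penalty_rate=0.01))
    # {'A': 30.0, 'B': 15.0, 'C': -15.0}
```

Both the reward and the penalty scale with the deposit in this sketch. As a result, validators who all participate keep their relative shares: A stays at twice B's deposit. This is the property the text relies on when it states that powerful validators gain no advantage. Only C, whose vote targeting height 9 does not count, loses part of its deposit.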
Finally, to resolve dead-ends, the minimum validator deposit is set high, while participation rewards are gradually lowered when no consensus is reached for several periods. With this amendment, a consensus-blocking attack would be costly and less profitable for the attackers than participating honestly in the voting process.
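The dead-end rule above can be made concrete with a small Python sketch of a participation reward that shrinks while no checkpoint is finalized. The schedule is an assumption for illustration only, and the text does not specify one. The function names `reward_rate` and `cumulative_reward` are hypothetical, as are the parameters `base_rate`, `grace_periods` and `decay`. The sketch ignores the Inactivity Leak and slashing. It only shows that, under such a schedule, even a validator that keeps voting earns less while finality is blocked.

```python
def reward_rate(base_rate, periods_without_finality, grace_periods=2, decay=0.5):
    """Hypothetical schedule: the full base_rate is paid during the first
    grace_periods periods without finality; each further period without
    finality multiplies the rate by decay."""
    excess = max(0, periods_without_finality - grace_periods)
    return base_rate * decay ** excess


def cumulative_reward(deposit, finalized_flags, base_rate=0.01):
    """Total reward of a deposit that votes in every period.

    finalized_flags[i] is True when a checkpoint was finalized in period i,
    which resets the count of consecutive periods without finality.
    """
    total = 0.0
    missed = 0
    for finalized in finalized_flags:
        missed = 0 if finalized else missed + 1
        total += deposit * reward_rate(base_rate, missed)
    return total


honest = cumulative_reward(1000, [True] * 6)    # finality every period
blocked = cumulative_reward(1000, [False] * 6)  # finality blocked throughout
print(honest, blocked)  # 60.0 29.375
```

With these illustrative parameters, blocking finality for six periods roughly halves the reward of a 1000-token deposit. This matches the qualitative claim that blocking consensus is less profitable than voting honestly. The real trade-off also depends on the minimum deposit and on the penalties discussed earlier.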
penalties and rewards can prevent hashrate centralization scenarios.
Our simplified model allows for a lean and versatile blockchain implementation that exploits the benefits of graph databases over SQL approaches in both storing and accessing the interconnected blockchain data. At the same time, blockchain data analysis is enabled through graph analytics and social network analysis of numerous graph representations of the stored data. These analyses accurately evaluate the operation of the involved mechanisms as well as the behaviours of the network's agents. The promising results of our research provide a clear direction for studying and developing other memory-efficient graph models that optimally exploit the benefits of this technology at both the operational and the analytical level. However, implementing and deploying them in real blockchain applications, which is always the final measure of evaluation, is left as future research.
Concurrently, other innovations are continually being developed in the blockchain field, which may become solutions to blockchain's most critical challenges. Most notable is the "Lightning Network" [23] payment protocol, which overlays an existing blockchain and tackles cryptocurrencies' scaling problems. A graph model could be incorporated into the operation of such a micropayment system to monitor fraudulent transactions across all channels. This may eliminate the need for outsourcing trust to 'watchtower' nodes and thus overcome some of its limitations.
Future research should also examine how the proposed reference implementation can be applied in upper-level transaction consolidation frameworks such as the Lightning Network. It should explore whether the implementation can enhance the security features and the analytical methods for path-dependent queries and relationship analytics of transactions in the blocks.
REFERENCES
[1] S. Haber and W. Stornetta, “How to time-stamp a digital document,” Journal of Cryptology, vol. 3, no. 2, 1991.
[2] K. F. Buford, H. H. Yu, and E. K. Lua, P2P Networking and Applications. Amsterdam: Elsevier/Morgan Kaufmann, 2009.
[3] G. Hileman and M. Rauchs, “2017 Global Cryptocurrency Benchmarking Study,” SSRN Electronic Journal, 2017.
[4] J. Hendler, “Web 3.0 Emerging,” Computer, vol. 42, no. 1, pp. 111–113, 2009.
[5] N. Chowdhury, “Consensus Mechanisms of Blockchain,” in Inside Blockchain, Bitcoin, and Cryptocurrencies, pp. 49–60, 2019.
[6] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” Tech. Rep., 2008.
[7] W. Wang, D. T. Hoang, P. Hu, Z. Xiong, D. Niyato, P. Wang, Y. Wen, and D. I. Kim, “A survey on consensus mechanisms and mining strategy management in blockchain networks,” IEEE Access, vol. 7, pp. 22328–22370, 2018.
[8] D. Tapscott and A. Tapscott, Blockchain Revolution: How the Technology Behind Bitcoin Is Changing Money, Business and the World. UK: Portfolio Penguin, 2018.
[9] K. Croman, C. Decker, I. Eyal, A. E. Gencer, A. Juels, A. Kosba, A. Miller, P. Saxena, E. Shi, E. G. Sirer, D. Song, and R. Wattenhofer, “On Scaling Decentralized Blockchains,” in Financial Cryptography and Data Security (Lecture Notes in Computer Science), pp. 106–125, 2016.
[10] “Bitcoin Energy Consumption Index,” Digiconomist. [Online]. Available: https://digiconomist.net/bitcoin-energy-consumption.
[11] S. Kim, Y. Kwon, and S. Cho, “A survey of scalability solutions on blockchain,” in Proc. Int. Conf. Inf. Commun. Technol. Converg. (ICTC), Oct. 2018, pp. 1204–1207.
[12] J. Kwon, “Tendermint: Consensus without mining,” Tech. Rep., May 2014.
[13] A. Kosba, A. Miller, E. Shi, Z. Wen, and C. Papamanthou, “Hawk: The blockchain model of cryptography and privacy-preserving smart contracts,” in Proc. IEEE Symp. Secur. Privacy (SP), May 2016, pp. 839–858.
[14] I. Eyal, A. E. Gencer, E. G. Sirer, and R. van Renesse, “Bitcoin-NG: A scalable blockchain protocol,” in Proc. 13th USENIX Symp. Netw. Syst. Design Implement. (NSDI), 2016, pp. 45–59.
[15] M. Swan, Blockchain: Blueprint for a New Economy. Newton, MA, USA: O’Reilly Media, 2015.
[16] S. King and S. Nadal, “PPCoin: Peer-to-peer crypto-currency with proof-of-stake,” Tech. Rep., Aug. 2012.
[17] C. Buragohain, D. Agrawal, and S. Suri, “A game theoretic framework for incentives in P2P systems,” in Proc. Third International Conference on Peer-to-Peer Computing (P2P2003), Oct. 2003.
[18] V. Buterin, “On Stake,” Ethereum Blog, Jul. 2014. [Online]. Available: https://blog.ethereum.org/2014/07/05/stake/?source=post_page.
[19] G. Wood, “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum Project Yellow Paper, vol. 151, pp. 1–32, Apr. 2014.
[20] V. Buterin and V. Griffith, “Casper the Friendly Finality Gadget,” 2017, arXiv:1710.09437. [Online]. Available: https://arxiv.org/abs/1710.09437
[21] M. Castro and B. Liskov, “Practical Byzantine fault tolerance,” in Proc. OSDI, vol. 99, 1999, pp. 173–186.
[22] V. Buterin, D. Reijsbergen, S. Leonardos, and G. Piliouras, “Incentives in Ethereum’s Hybrid Casper Protocol,” in Proc. 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), 2019.
[23] J. Poon and T. Dryja, “The Bitcoin Lightning Network.” [Online]. Available: http://lightning.network/lightning-network-paper.pdf.
[24] “IBM Research: Behind the Architecture of Hyperledger Fabric,” IBM Research Blog, 08-Feb-2019. [Online]. Available: https://www.ibm.com/blogs/research/2018/02/architecture-hyperledger-fabric/.
[25] M. Samaniego and R. Deters, “Blockchain as a service for IoT,” in Proc. 2016 IEEE International Conference on Internet of Things (iThings), 2016, pp. 433–436.
[26] “Neo4j Database,” Neo4j Graph Database Platform. [Online]. Available: https://neo4j.com/neo4j-graph-database/?ref=home-banner/.
[27] “Neo4j Desktop User Interface Guide,” Neo4j Graph Database Platform. [Online]. Available: https://neo4j.com/developer/neo4j-desktop/.
[28] NKB Group, “Ethereum releases Casper v0.1: A short description for validators,” Medium, 15-May-2018. [Online]. Available: https://medium.com/@theNKBGroup/ethereum-releases-casper-v0-1-a-short-description-for-validators-3e0a7676d286
[29] V. Buterin, “Immediate message-driven GHOST as FFG fork choice rule,” Ethereum Research, 14-Jul-2018. [Online]. Available: https://ethresear.ch/t/immediate-message-driven-ghost-as-ffg-branch-choice-rule/2561.
[30] I. Robinson, J. Webber, and E. Eifrem, Graph Databases: New Opportunities for Connected Data. Sebastopol, CA: O’Reilly & Associates, 2015.
[31] O. Panzarino, Learning Cypher. Birmingham, United Kingdom: Packt Publishing, 2014.
[32] “What is SAP HANA? An unrivaled data platform for the digital age,” SAP. [Online]. Available: https://www.sap.com/products/hana.html?infl=32095c59-c617-45d7-a13d-8af08c419145.
[33] “RedisGraph,” Redis Labs. [Online]. Available: https://redislabs.com/redis-enterprise/redis-graph/.
[34] Neo4j, “openCypher,” openCypher. [Online]. Available: http://www.opencypher.org/.