BOSSA: A Decentralized System For Proofs of Data Retrievability and Replication


786 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 32, NO. 4, APRIL 2021

Dian Chen, Haobo Yuan, Shengshan Hu, Qian Wang, Senior Member, IEEE, and Cong Wang, Senior Member, IEEE

Abstract—Proofs of retrievability and proofs of replication are two cryptographic tools that enable a remote server to prove that the
users’ data has been correctly stored. Nevertheless, the literature either requires the users themselves to perform expensive
verification jobs, or relies on a “fully trustworthy” third party auditor (TPA) to execute the public verification. In addition, none of the existing
solutions consider the underlying incentive issues behind a rational server who is motivated to collect users’ data but tries to evade the
replication checking in order to save storage resources. In this article, we propose the first decentralized system for proofs of data
retrievability and replication—BOSSA, which is incentive-compatible for each party and realizes automated auditing atop off-the-shelf
blockchain platforms. We deal with issues such as proof enforcements to catch malicious behaviors, new metrics to measure the
contributions, and reward distributions to create a fair reciprocal environment. BOSSA also incorporates privacy-enhancing techniques
to prevent decentralized peers (including blockchain nodes) from inferring private information about the outsourced data. Security
analysis is presented in the context of integrity, privacy, and reliability. We implement a prototype of BOSSA using the smart
contracts of the Ethereum blockchain. Our extensive experimental evaluations demonstrate the practicality of our proposal.

Index Terms—Proofs of retrievability, proofs of replication, decentralized system, blockchain

• Dian Chen and Qian Wang are with the Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, the School of Cyber Science and Engineering, Wuhan University, Wuhan, Hubei 430072, China, and also with the State Key Laboratory of Cryptology, Beijing 100878, China. E-mail: {dianchen, qianwang}@whu.edu.cn.
• Haobo Yuan is with the School of Computer Science, Wuhan University, Wuhan, Hubei 430072, China. E-mail: yuanhaobo@whu.edu.cn.
• Shengshan Hu is with the School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China. E-mail: hushengshan@hust.edu.cn.
• Cong Wang is with the Department of Computer Science, City University of Hong Kong, Hong Kong. E-mail: congwang@cityu.edu.hk.

Manuscript received 2 Feb. 2020; revised 16 Aug. 2020; accepted 5 Oct. 2020. Date of publication 12 Oct. 2020; date of current version 10 Nov. 2020.
(Corresponding author: Qian Wang.)
Recommended for acceptance by R. Tolosana.
Digital Object Identifier no. 10.1109/TPDS.2020.3030063

1 INTRODUCTION

As a new computing paradigm, cloud computing provides a convenient approach for ordinary users to enjoy powerful computing and storage resources. Users can outsource their data to a cloud service provider to get rid of complex local data management. Despite promising prospects, outsourcing data to a remote server S also raises severe security concerns since it deprives the physical control of the data owner. One of the challenging problems is ensuring the correctness of data. A malicious server may discard the data that is rarely accessed for the purpose of saving resources, or cover up data loss accident for maintaining reputation. In addition, S usually claims that the data will be stored together with several replicas for ensuring high reliability [1], it takes limited liability in their Service Level Agreements (SLAs) [2], especially for the data loss. And recent accidents happened in Tencent Cloud [3], other clouds [4] have already implied the fact that S cannot be fully trusted.

Proofs of retrievability [5], [6], [7], [8], [9], [10], [11], [12] and proofs of replication [13], [14], [15], [16], [17], [18], [19] are two typical cryptographic methods allowing the server to prove that the original files as well as all the replicas are correctly stored. In order to avoid keeping users on-line and free them from expensive auditing tasks, the state-of-the-art solutions for proofs of retrievability and replication heavily rely on a semi-trusted third party auditor (TPA) to execute public verifications. However, such methods suffer from the following two main drawbacks:

• Collusion Attack. The literature usually assumes that no collusion happens between TPA and the server S. This strong assumption may be easily corrupted in practice driven by certain interests, and S can easily create fake proofs that pass the verifications once it colludes with TPA. Furthermore, from the users’ perspective, it is hard to tell whether or not collusion happened.
• Corruption Attack. Dependence on an “always-online” TPA is vulnerable to single point of failures caused by unpredictable accidents, e.g., regional power outages, or malicious hacking attacks, like DDoS.

We owe these vulnerabilities to the great control power of TPA which plays a centralized role. Therefore, constructing a decentralized framework for auditing can fundamentally solve the above problems. The blockchain has been widely adopted for replacing the third party in the fields like e-voting [20], auction [21] and fair exchange [22] due to its non-repudiation and non-tampering properties. Replacing the TPA with the blockchain makes the whole auditing process trackable by users. As a result, the users are able to verify
1045-9219 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Siksha O Anusandhan University. Downloaded on February 02,2021 at 05:40:08 UTC from IEEE Xplore. Restrictions apply.
CHEN ET AL.: BOSSA: A DECENTRALIZED SYSTEM FOR PROOFS OF DATA RETRIEVABILITY AND REPLICATION 787

each step of proofs of retrievability and replication. Moreover, the decentralized nature of the blockchain improves the tolerance of single point failures. In summary, the blockchain is a promising solution to defend against collusion attack and corruption attack.

In addition, for the data replica service that aims to guarantee the data reliability, the cloud may intentionally delete replicas for saving storage resources without concerns of being caught. Inspired by the idea of decentralized storage network (DSN) that tries to build a peer-to-peer network allowing peers buying and selling idle disk space, decentralizing data replication makes it much more difficult for the attackers to misbehave, as each step for the data transmission, storing and deleting is recorded and trackable.

1.1 Design Challenges and Our Solutions

In this work, we propose the Blockchain based OutSourcing Storage and Auditing (BOSSA) scheme, a brand-new general framework for proofs of retrievability and replication. BOSSA separately stores the original data on the server and the corresponding replicas on peers (also called farmers in BOSSA) of a decentralized network atop blockchain, and integrities of them will be checked respectively by carefully-designed smart contracts. BOSSA can naturally defend against the collusion attack and the corruption attack, while providing visibility-enabled data replication. BOSSA doesn’t heavily rely on a specific consensus model like [23], which gives the opportunity that BOSSA can be built on off-the-shelf blockchain systems supporting Turing-complete smart contract [24], [25]. To design BOSSA, there are several challenges we have to cope with.

First of all, simply combining existing proof of retrievability schemes with the blockchain fails to achieve our desired goals since the blockchain only supports simple functionality. The most critical problem is that the blockchain cannot actively issue challenges and reacts only based on received transactions. This property allows the adversary to escape from auditing. To prevent such lazy behaviors, we make use of the feature that blocks are appended to the chain in an approximately-fixed rate, and propose a time-restricted proof forcing both the cloud and farmers to prove data availability.

Second, our scheme organizes peers, i.e., farmers, to store the replicas of users’ data. However, farmers are not fully trusted and may leave the network arbitrarily, which causes replicas loss, damaging the reliability of replicas. To address this problem, we design an incentive mechanism to motivate peers to store and provide replicas when needed. Specifically, we define a reward mechanism where farmers are periodically rewarded if they can provide valid proofs of replicas on a regular basis. Meanwhile, part of the reward is temporarily frozen until farmers’ promised storage services expired, which discourages farmers from leaving the network. To further motivate farmers to share replicas, we define a metric called contribution rate, and connect farmers’ frozen reward with the contribution rate, such that only farmers with 100 percent contribution rate can retrieve all the frozen rewards.

Third, most of blockchain platforms like Ethereum store transactions in plaintext, so any participant is able to retrieve the information of transactions, including the inputs of smart contracts, the intermediate results, etc. For an auditing system naively built atop the blockchain, the open nature of the blockchain enables the adversary to record proofs and recover the original data. To prevent such privacy leakage, we incorporate privacy-enhancing techniques for the generation of proofs while ensuring their correctness.

1.2 Contribution

In summary, we make the following contributions:

• We propose a novel decentralized framework BOSSA for proofs of data retrievability and replication. Our design addresses security risks and caters to a more real-world scenario, which has not been considered so far in existing works.
• To guarantee the reliability of BOSSA, we propose a time-restricted proof forcing S and farmers to prove data availability, and a reward mechanism based on the new metric contribution rate to create a fair reciprocal environment. Privacy-enhancing techniques are also incorporated to protect users’ private data.
• We provide the security analysis for BOSSA with respect to integrity, privacy, and reliability. A prototype of BOSSA is implemented on Ethereum, and evaluated. The experimental evaluations show that our proposal is feasible and incurs tolerable overheads for each party.

1.3 Related Work

Proofs of Retrievability/Data Possession. Both Proof of Data Possession (PDP) and Proofs of Retrievability (PoR) aim to ensure that the outsourced data is stored on S correctly. Juels and Kaliski proposed the first PoR scheme [9] and Ateniese et al. [5] constructed the first PDP scheme. However, their schemes do not provide public verification, which requires users keeping on-line during the auditing process. Guan et al. [26] leveraged the indistinguishability obfuscation to realize public verification, but their scheme has poor performance due to the usage of the indistinguishability obfuscation. Shacham and Waters [10] then constructed a public verification scheme based on BLS signature [27], but their scheme may cause privacy-leakage during the auditing [12]. Armknecht [8], Xu [7] and Yang [6] proposed low-cost public PoR schemes by modifying the private verification scheme of [10], however, the TPA in their scheme holds the secret key for auditing, which is not suitable to public blockchain platforms.

Proofs of Replication. Different from PDP/PoR, Proofs of Replication focuses on ensuring that the replicas of original data are correctly stored on S as it claimed. Most proofs of replication schemes are based on indistinguishable encryption/encoding [14], [16], [28] or noticeable replica generation time [13], [18], [19], [29]. MR-PDP proposed in [14] requires users to encrypt replicas and upload them to S, and lets users audit the corresponding replicas. Hao et al. [28] leveraged similar methods, and their scheme supports public verification. Damgard et al. [16] generalized the indistinguishable encryption/encoding based model which utilizes a trapdoor function without which S cannot generate correct replicas on-the-fly. However, these schemes bring heavy bandwidth and computation costs to users, as


the user encrypts/encodes and uploads replicas. Other solutions let S generate replicas, but the time cost of the replicas generation is noticeable, like RSA-based puzzle combining linear feedback shift registers [13], chained sequential encryption [17], and butterfly construction [15]. But these schemes rely on an assumption that the replicas generation is time-costly and cannot be parallelized. In BOSSA, we propose a new paradigm in which a decentralized storage network atop blockchain is used to store replicas. With the openness of the blockchain, S’s action (i.e., data replication) is publicly recorded, which is visible to the users. BOSSA neither brings heavy costs to the user, nor relies on the time-assumption. In addition, this paradigm also benefits S as its storage is reduced (most of clouds’ basic storage services promise to store several replicas in the same region).

Decentralized Storage. The main purpose of decentralized storage schemes is to reuse idle storage resources of personal computers (PC) by motivating PC users to share their unused disk space. Existing decentralized storage network schemes, like Storj [30], Sia [31], utilize Merkle tree-based data auditing scheme whose proof size is related to the size of data, which causes high communication cost for auditing large data. IPFS [32] plans to offer an open global p2p storage network, however, it lacks an incentive mechanism such that the storage resources may be provided by few nodes. In addition, it also fails to provide an auditing mechanism to check the integrity of data stored by nodes. Filecoin [23] is built atop IPFS and provides the aforementioned missing features. Except for incentive mechanism, Filecoin leverages a computationally expensive proofs of replica scheme [17] to ensure the correct data processing and builds a consensus mechanism. Similar idea is also presented in [33], [34], [35]. Compared to Filecoin, BOSSA is a more general framework which is compatible with other blockchain platforms like Ethereum.

1.4 Organization

The rest of the paper is organized as follows. Section 2 presents some backgrounds. In Section 3 we define the model of our scheme, threat model and security goals, before we introduce key techniques of our scheme in Section 4. And concrete construction of our scheme is described in Section 5. We analyze our scheme in Section 6, evaluate the performance of our scheme in Section 7. We conclude the paper in Section 8.

2 BACKGROUND

2.1 Blockchain and Smart Contract

Blockchain, a cryptographic primitive derived from Bitcoin proposed by Nakamoto Satoshi [36] in 2008, has been extensively studied and adopted in both academies [37], [38], [39] and industries [40]. Blockchain can be viewed as a distributed block-wise ledger allowing any participants to access and modify (appending only). Generally speaking, it can be generalized as the following formulation:

\[
\begin{aligned}
\mathrm{Block}_n &= \mathrm{Header}_n \,\|\, \mathrm{Transactions},\\
\mathrm{Header}_n &= \mathrm{Hash}(\mathrm{Block}_{n-1}) \,\|\, \mathrm{Nonce} \,\|\, \mathrm{Timestamp} \,\|\, \mathrm{MerkleRoot}.
\end{aligned}
\]

A block consists of transactions’ data and a header which generally contains information for consensus and security: a Timestamp, a hash of predecessor’s block, a digest of transactions (i.e. MerkleRoot), and a Nonce used for solving consensus puzzles. Blocks are organized by the hash pointing to the predecessor block, forming a hierarchically expanding chain. Cryptographic mechanisms, like hashing, signature, guarantee that the blockchain is non-repudiation and non-tampering, and consensus models, e.g., proof of work, proof of stake, motivate majority participants to maintain only one correct chain.

Bitcoin, the initial blockchain implementation, provides restricted customizable scripts, which limits it to a “cryptocurrency”, instead of a general computational platform. The idea is later brought by the following implementations such as Ethereum [24], [41], HyperLedger [25], [42]. Ethereum, the first cryptocurrency supporting general computing, provides a virtual machine (EVM) and Turing-complete opcodes (EVM Bytes). The computation results of smart contracts inherit the nature of trust from the blockchain, hence Ethereum is widely applied in fields like e-voting [20], auction [21], fair exchange [22], etc.

2.2 Proofs of Retrievability

A Proofs of Retrievability (PoR) scheme typically contains three roles including a user, a remote server (also plays as a prover) and a verifier (i.e., the TPA in public verification or the user itself in the private verification scenario). To prove that data is retrievable, a challenge-response protocol is executed between the prover and the verifier, and the data retrievability can be guaranteed with a high probability if the prover presents a valid response (proof). A PoR scheme [10] can be defined as follows:

Definition 1. A PoR scheme consists of the following four algorithms:

• $\mathrm{Setup}(1^\lambda) \to (sk, pk)$: This probabilistic algorithm is run by the user. It takes the security parameter $\lambda$ as input and outputs key pair $(pk, sk)$ for the scheme setup.
• $\mathrm{Store}(D, sk) \to (D', tag)$: The user runs this algorithm to encode data and generates metadata for auditing. This algorithm takes as inputs key $sk$ and user’s data $D$, and outputs tags $tag$ and encoded data $D'$.
• $\mathrm{Prove}(D', tag, pk, chal) \to \phi$: This algorithm is run by the prover to prove data integrity. It takes as input the verifier’s challenge $chal$, outsourced data $D'$, the corresponding tags $tag$ and public key $pk$, and returns a proof $\phi$.
• $\mathrm{Verify}(\phi, chal, pk[, sk]) \to (\top, \bot)$: This algorithm is run by the verifier. It takes as input the proof $\phi$, public key $pk$ ($sk$ is needed in private verification), and determines whether the prover stores data honestly.

3 PROBLEM FORMULATION

3.1 System Model

The system model of BOSSA is depicted in Fig. 1. Our system involves the following entities: the cloud server S, the cloud user U, and a decentralized network including nodes with two kinds of roles: miners M and farmers F. M represents the nodes who maintain the blockchain and execute


smart contracts. F represents the nodes joining in the decentralized storage network for renting out their idle storage resources.

Fig. 1. System model.

In BOSSA, S is willing to collect the original data from U for accomplishing data analytics tasks such as prediction or recommendation [43], [44], [45]. Similar to existing works [7], [12], the data outsourced to S is kept in plaintext in order to maximize their potential value (e.g., data mining). To save resources and take full advantage of idle storage resources on F, S further outsources replicas to F. At the same time, U delegates M to audit the original data on S and replicas on F, respectively.

The core functionalities of BOSSA are defined below.

Definition 2. The algorithms for proofs of retrievability and proofs of replication in BOSSA are defined as follows:

• $\mathrm{Setup}(1^\lambda) \to (sk_U, pk_U, sk_S, pk_S)$: Given the security parameter $\lambda$, this probabilistic algorithm outputs private and public key pairs for U and S.
• $U.\mathrm{Store}(D, sk_U) \to (D^*, aux)$: This probabilistic algorithm takes the original data $D$ and U’s private key $sk_U$ as input, and outputs $D^*$ and the auxiliary information $aux$.
• $S.\mathrm{Seal}(D^*, ek) \to R$: This algorithm takes data $D^*$ received from the user and an encryption key $ek$ as input, and outputs a replica $R$.
• $S.\mathrm{Distribute}(R, sk_S) \to \{R_i, aux'_i\}_{i \in [0, K)}$: This algorithm, taking as input the replica $R$ and S’s secret key $sk_S$, outputs $K$ replica blocks $R_i$ with auxiliary information $aux'_i$.
• $S.\mathrm{Prove}(D^*, aux) \to \phi$: This algorithm takes as input the data $D^*$ stored by S, the auxiliary information $aux$, then outputs the proof $\phi$ in a privacy-preserving manner.
• $F.\mathrm{Prove}(R_i, aux'_i) \to \phi'_i$: F calculates and outputs the proof $\phi'_i$ based on replica $R_i$ and $aux'_i$.
• $\mathrm{Verify}_X(pk_X, \phi) \to \{\top, \bot\}$: This algorithm verifies S’s or F’s proof $\phi$ and outputs either $\top$ (TRUE) or $\bot$ (FALSE).

3.2 Threat Model

Following previous works [13], [14] we consider a rational S, who behaves correctly in most of the time unless misbehaving brings more benefits, e.g., deleting data blocks seldom accessed for resource-saving. The blockchain network can be regarded as a trusted and public entity. Concretely, most of the computational power of the blockchain is held by honest miners who will execute smart contracts faithfully [46]. Any node in the blockchain network can learn internal states and messages sent to the blockchain.

The farmers from the decentralized network rent out their idle disk space to store U’s data replicas. We assume that there are massive farmers, the number of which is enough to store all the users’ replicas. The farmers may leave the network irresponsibly if they lose the interest of storing replicas. Therefore, it is highly necessary to guarantee the availability of replicas, i.e., motivating the farmers to act honestly.

3.3 Security Goals

Storage Correctness. This property guarantees both outsourced original data and replica are retrievable. We formalize this property by adopting extraction algorithm in [10].

Definition 3. We say that the scheme guarantees storage correctness if the prover cannot forge a proof that makes the verifier accept, and there is an extraction algorithm such that for an $\epsilon$-admissible prover (S or F) who can provide $\epsilon$ fraction of a valid proof, an efficient extraction algorithm will recover data with overwhelming probability.

Privacy Preservation. This property guarantees the nodes from the decentralized network (including M and F) cannot infer any private information about the data.

Definition 4. We say the scheme achieves privacy preservation if: 1) for a probabilistic polynomial time (PPT) adversary who acts as the verifier, namely M, it is computationally hard to extract private information from the prover’s proof; 2) the replicas stored in the decentralized network do not leak any private information about the original data.

Replica Reliability. This property guarantees that the original data can be recovered from replica blocks stored by F.

4 KEY TECHNIQUES

4.1 Building Blocks and Notations

4.1.1 Building Blocks

Definition 5. For a $q$-ary alphabet $\Sigma_q$, a $(n, k, d)$ erasure code scheme consists of two algorithms $\mathrm{Encode}: \Sigma^k \to \Sigma^n$ and $\mathrm{Decode}: \Sigma^{n-d+1} \to \Sigma^k$. Typically, we say a code is a maximum distance separable (MDS) code, if $d = n - k + 1$.

Definition 6. Let $e: G_1 \times G_2 \to G_T$ be a bilinear map where $G_1$, $G_2$ and $G_T$ are three multiplicative cyclic groups of prime order $p$. It has following properties:

• Given $u \in G_1$, $v \in G_2$ and $x, y \in Z_p$, we have $e(u^x, v^y) = e(u, v)^{xy}$.
• $e$ is non-degenerate, namely, $e(g_1, g_2) \neq 1$ where $g_1$, $g_2$ are generators of $G_1$ and $G_2$ respectively.
• There is an efficient algorithm to compute $e(u, v)$ for any $u \in G_1$, $v \in G_2$.

In addition, we define the following collision-resistant hash functions:


• $h_1: \{0, 1\}^* \to G_1$, it maps arbitrary string to an element of the group $G_1$.
• $h_2: G_T \to Z_p$, it maps elements of $G_T$ to finite set $Z_p$.
• $h_3: \{0, 1\}^* \to \{0, 1\}^\lambda$, it maps arbitrary string to constant-size bits, where $\lambda$ is the security parameter.

Finally, we define $\{\mathrm{KGen}, \mathrm{Enc}, \mathrm{Dec}\}$ a symmetric encryption algorithm with chosen-plaintext-attack (CPA) security [47].

TABLE 1
Notations

Notation       Description
L              Distributed ledger of the blockchain
X.Addr         Address of party X in blockchain
name           A unique identifier for outsourced original file
name_rep       A unique identifier for outsourced replica
services       Services F has accepted
B_start        The block number when F starts its storage services
B_end          The block number when F’s services are ended
B_sel          The number of block chosen by S and F to derive challenges
B_now          The number of latest block
B_last         The block of the previous block that chosen as challenge by S and F
D              Deposit secured on the blockchain
p              The price that U is going to give to F or S
r              A radius used for relaxing strictness of the proof
Itr            A time interval before both S and F present their next proof

4.1.2 Other Notations

We summarize notations used in this paper in Table 1. For ease of expression, we utilize L to represent the distributed ledger in the blockchain system. L is stored and maintained by M. We index an entity in the blockchain by X.Addr, e.g., U.Addr. In the following discussion, $B_{now}$ represents the latest block number of the blockchain.

The dictionary L.Roster stores $(D, B_{start}, B_{end}, services)$ and is indexed by farmer’s address F.Addr. The attribute $D$ is defined as the deposit paid by F. The list $services$ records replicas stored by F. $B_{start}$ represents the time when F starts its storage services, and $B_{end}$ represents the time when the services expire.

The dictionary L.Audit is indexed by $name$, the unique identifier of U’s file (see Section 5.2). The attributes of L.Audit consist of the tuple $(B_{last}, D, pk_U, Itr, p, r)$. For a file $name$, the attribute $D$ in L.Audit($name$) denotes the deposit paid by U who outsources the file $name$. $pk_U$ is the public key of U. The rest parameters will be explained later in our construction.

L.Replica is a dictionary taking $name_{rep}$ as the key. It consists of the tuple $(D, pk_S, Itr, r, p, L)$, where $D$ represents the deposit provided by U, and $pk_S$ is the public key of S. The list $L$ consists of $(totalAsk, totalReply, D, sig_i, B_{last})$ and is indexed by F.Addr. The attribute $D$ in $L$ represents the deposit paid by the farmer when it stores replicas. The other parameters will be explained in our construction.

4.2 Design Innovations

To prove data retrievability and possession of replicas, BOSSA incorporates several auditing techniques: compact proofs of retrievability of Shacham et al. [10] and privacy-preserving data auditing of Wang et al. [12]. Meanwhile, we have also made some changes that make our design more suitable for the blockchain environment.

First, different from these schemes where the challenges are generated by the auditor, we require that the provers (i.e., S and F) generate challenges by themselves since there are no active auditors in our scheme. To ensure an unbiased challenge, we leverage the latest block of the blockchain as a publicly traceable and uncontrollable randomness source. In order to prevent lazy behaviors of provers, we design a time-restricted proof mechanism to force them to prove. Second, directly offloading replicas to the decentralized network may cause privacy leakage. Moreover, if some nodes storing a portion of replicas leave the network, it could result in the uselessness of the whole replicas. Hence, we propose an algorithm S.Seal to encode replica to be outsourced and then encrypt the private data with the aid of the symmetric encryption algorithm. Last but not least, we extend the structure of the authenticator inside each replica block. Specifically, we add an extra identifier of the replica and the indices of the replica block into authenticators, to prevent F from pretending to store all the data when it does not. More details can be found in Section 5.2.

4.3 Time-Restricted Proof

We define the time-restricted proof as follows.

Definition 7. We use $B_{last}$ to denote the number of the block from which the prover generates challenges for calculating the last valid proof, and $B_{sel}$ denotes the block number of the block picked for the new proof. Let $Itr$ be a time interval represented by the block numbers and a radius $r$ represent the maximum gap between $B_{sel}$ and $B_{now}$. Thus, during the verification procedure, we say the prover provides proof on time only if the following equation holds:

\[
B_{last} + Itr = B_{sel} \le B_{now} \le B_{last} + Itr + r.
\]

Fig. 2. Illustration of time-restricted proof. The prover must present the proof after $Itr = n$ blocks. As long as the prover’s proof is accepted in yellow blocks (i.e., $b_{n+1}, b_{n+2}$ as $r = 2$), it can be regarded as providing a proof on time.

We illustrate the time-restricted proof mechanism in Fig. 2. As M cannot actively audit the prover (S and F), we demand that the prover always generates challenges from the latest block (and set $B_{sel} = B_{now}$) and provides the proof to M periodically (after $Itr$ blocks). The radius $r$ is set for tolerating the processing delay of M.

4.4 Contribution Rate

Considering the case that some farmers may be not willing to contribute replica blocks when the data is needed, we define a new metric, contribution rate, to measure F’s contribution, which also directly affects F’s final reward.
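As a rough illustration of the time-restricted proof of Definition 7 (Section 4.3), the on-time condition can be written as a single predicate over block numbers. The sketch below is not the contract code of BOSSA: the names `AuditParams` and `is_on_time` are hypothetical, and it assumes the condition reads B_last + Itr = B_sel ≤ B_now ≤ B_last + Itr + r.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditParams:
    """Audit parameters named after the Table 1 notation (hypothetical type)."""
    b_last: int  # B_last: block used to derive challenges for the last valid proof
    itr: int     # Itr: number of blocks between two consecutive proofs
    r: int       # r: radius tolerating the processing delay


def is_on_time(params: AuditParams, b_sel: int, b_now: int) -> bool:
    """Check B_last + Itr = B_sel <= B_now <= B_last + Itr + r."""
    due = params.b_last + params.itr
    return b_sel == due and b_sel <= b_now <= due + params.r


# Example with B_last = 100, Itr = 10, r = 2: the proof is due at block 110.
params = AuditParams(b_last=100, itr=10, r=2)
accepted = is_on_time(params, b_sel=110, b_now=111)  # inside the radius
too_late = is_on_time(params, b_sel=110, b_now=113)  # beyond B_last + Itr + r
```

With these inputs `accepted` is true and `too_late` is false; a proof derived from any block other than 110 is rejected regardless of when it arrives.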


Definition 8. Let $totalAsk$ be the amount of replica requests during F’s services, and let $totalReply$ be the amount of F’s responses. Then, the contribution rate $cr$ is calculated as follows:

\[
cr =
\begin{cases}
\dfrac{totalReply}{totalAsk}, & totalAsk > 0,\\[4pt]
1, & totalAsk = 0,
\end{cases}
\qquad cr \in [0, 1].
\]
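Definition 8 translates directly into a few lines of code. The following sketch is an illustration only: the function name `contribution_rate` is hypothetical, exact rationals are used to avoid rounding, and rejecting inputs with totalReply > totalAsk is an added assumption that keeps cr within [0, 1].

```python
from fractions import Fraction


def contribution_rate(total_ask: int, total_reply: int) -> Fraction:
    """Return cr = totalReply / totalAsk, or 1 when no replica was ever requested."""
    if total_ask < 0 or total_reply < 0:
        raise ValueError("counters must be non-negative")
    if total_reply > total_ask:
        # Assumption: a farmer cannot answer more requests than were issued.
        raise ValueError("totalReply cannot exceed totalAsk")
    if total_ask == 0:
        return Fraction(1)
    return Fraction(total_reply, total_ask)


# A farmer that answered 3 of 4 requests has a contribution rate of 3/4.
example_rate = contribution_rate(total_ask=4, total_reply=3)
```

A farmer that was never asked for a replica keeps a rate of 1, so it is not penalized for requests that never happened.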

We give an example for a better explanation. When $\mathcal{S}$ gives replica blocks to $\mathcal{F}_i$, both totalAsk and totalReply are set to 0. Whenever $\mathcal{S}$ needs the replica blocks stored by $\mathcal{F}_i$, the counter totalAsk of $\mathcal{F}_i$ is incremented by 1, and only when $\mathcal{F}_i$ provides the stored data to $\mathcal{S}$ is $\mathcal{F}_i$'s counter totalReply incremented by 1. Later, when $\mathcal{F}_i$ terminates the storage service, the monetary reward it gets is its deposit multiplied by the contribution rate, and the rest of the deposit is refunded to $\mathcal{U}$.

4.5 Reward Mechanism
$\mathcal{S}$ is rewarded for proving its storage service. Suppose $\mathcal{U}$ and $\mathcal{S}$ have agreed on a price $p$; each time $\mathcal{S}$ honestly proves the integrity of the data, $\mathcal{U}$ pays $p$ to $\mathcal{S}$.
Likewise, we adopt the same reward mechanism for $\mathcal{F}$ with an extra modification. Consider the following scenario: after providing several proofs, $\mathcal{F}$'s reward far exceeds its total deposit, so it can quit the network arbitrarily without worrying about the financial loss. Therefore, we set a factor $\lambda \in (0, 1)$ to prevent this behavior. After a valid proof of $\mathcal{F}$, $\mathcal{M}$ withholds $(1 - \lambda) \cdot p$ of the monetary reward instead of giving all of it to $\mathcal{F}$. The frozen part of the monetary reward is refunded only when $\mathcal{F}$'s service expires. A proper selection of $\lambda$ (e.g., $\lambda = 0.5$) can make $\mathcal{F}$ stick to the network without damaging its long-term benefits.

5 OUR CONSTRUCTION
5.1 Overview
Before diving into the details of BOSSA, we give an overview of our idea. Recalling the model specified in Section 3, BOSSA provides a new paradigm for data integrity auditing and data replication, which uses the blockchain network to periodically audit the server and to store data replicas. Specifically, to ensure that the server stores data correctly, BOSSA adopts a conventional PoR scheme but replaces the TPA with the blockchain. As the blockchain only receives transactions passively and cannot issue challenges to the server, the time-restricted proof is used to force the server to provide proofs actively. Another important component of BOSSA is storing replicas among the peers in the blockchain network, which are called farmers. Naively sending replicas to farmers raises concerns about data privacy, reliability, and retrievability. In BOSSA, the server encodes and encrypts replicas to guarantee privacy and reliability, and the farmers are spurred to prove that they keep the replicas intact. Besides, the farmers are rewarded for providing valid proofs (i.e., storing replicas honestly), and their rewards are also determined by whether they provide replicas when needed, which is measured by the contribution rate. Farmers with a 100 percent contribution rate can fully retrieve their reward.
In Section 5.2, the PoR algorithms are presented. In Section 5.3, we walk through the details of BOSSA, including data outsourcing, data replication, proving and verification, replica retrieving, and rewarding.

5.2 Algorithm Specification
We give the specification of the auditing algorithms. For the data auditing, we leverage some cryptographic tools in [10], [12], with the modifications stated in Section 4.2.

Fig. 3. A toy example of $\mathcal{S}$.Seal. We give an example of sealing 22-block data with $(n_1 = 8, k_1 = 6)$ and $(n_2 = 6, k_2 = 4)$. The light grey blocks represent the original data with two zero-padding blocks. The dotted box in the upper right corner circles the blocks newly generated in the first phase (line 3) of Algorithm 1. Blocks generated in the second phase (line 9) are circled in the dotted box below (this may differ slightly from Algorithm 1 since they are put together). For simplicity, we omit the representation of encryption. Each column of data is later outsourced to a different farmer.

Algorithm 1. $\mathcal{S}$.Seal
Input: $D := \{m_0, m_1, \ldots, m_{|D|-1}\}$, encryption key $ek$, and $(n_1, k_1)$, $(n_2, k_2)$ for encoding
Output: sealed replica $R$
1: reshape $D$ into a $(\lceil |D|/k_1 \rceil, k_1)$ matrix and pad the last row with 0 if the width of the last row is less than $k_1$
2: for $i = 1, \ldots, \lceil |D|/k_1 \rceil$ do
3:   $\{m'_{i,j}\}_{j=0,\ldots,n_1-1} = \mathrm{Enc}_{ek}\big(\mathrm{Encode}_{(n_1,k_1)}(\tilde{m}_i)\big)$, where $\tilde{m}_i = \{m_{i,j}\}_{j=0,\ldots,k_1-1}$
4: end for
5: for $j = 0, \ldots, n_1 - 1$ do
6:   denote the $j$th column of the encoded data as $D'_j := \{m'_{0,j}, \ldots, m'_{|D'_j|-1,j}\}$, where $|D'_j| = \lceil |D|/k_1 \rceil$
7:   reshape $D'_j$ into a $(\lceil |D'_j|/k_2 \rceil, k_2)$ matrix and pad with zero
8:   for $i = 0, \ldots, \lceil |D'_j|/k_2 \rceil - 1$ do
9:     $\tilde{m}^r_{i,j} = \mathrm{Encode}_{(n_2,k_2)}\big(\{m'_{ik_2+l,j}\}_{l=0,\ldots,k_2-1}\big)$, where $\tilde{m}^r_{i,j} := (m^r_{in_2,j}, m^r_{in_2+1,j}, \ldots, m^r_{(i+1)n_2-1,j})$
10:  end for
11: end for
12: $R = \{m^r_{i,j}\}_{i=0,\ldots,\lceil |D| n_2/(k_1 k_2) \rceil - 1;\; j=0,\ldots,n_1-1}$
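The two-phase layout of Algorithm 1 ($\mathcal{S}$.Seal) can be illustrated with a short sketch that only tracks block positions. Everything named here is hypothetical: `seal_layout` and `toy_encode` are illustrative names, blocks are plain integers, encryption is omitted (as in Fig. 3), and `toy_encode` is a placeholder that merely appends XOR-parity blocks, so it is not a Reed-Solomon code and offers no real erasure tolerance. The sketch also assumes that the second phase runs over all $n_1$ columns, as the index range of $R$ in line 12 suggests.

```python
import math
from typing import Callable, List

Block = int  # stand-in for a fixed-size data block


def toy_encode(row: List[Block], n: int) -> List[Block]:
    """Placeholder for Encode_(n,k): keep the k inputs, append n-k parity blocks.

    This is NOT an MDS/Reed-Solomon code; it only reproduces the output shape.
    """
    parity = 0
    for block in row:
        parity ^= block
    return list(row) + [parity] * (n - len(row))


def seal_layout(data: List[Block], n1: int, k1: int, n2: int, k2: int,
                encode: Callable[[List[Block], int], List[Block]] = toy_encode
                ) -> List[List[Block]]:
    """Return one sealed column per farmer, following the shape of Algorithm 1."""
    # Line 1: reshape into ceil(|D|/k1) rows of width k1, zero-padding the last row.
    rows = math.ceil(len(data) / k1)
    padded = list(data) + [0] * (rows * k1 - len(data))
    matrix = [padded[r * k1:(r + 1) * k1] for r in range(rows)]

    # Lines 2-4: first encoding, each row grows from k1 to n1 blocks
    # (the symmetric encryption step is left out of this sketch).
    encoded = [encode(row, n1) for row in matrix]

    # Lines 5-11: every column is split into groups of k2 blocks (zero-padded)
    # and each group is encoded again into n2 blocks.
    columns = []
    for j in range(n1):
        column = [encoded[r][j] for r in range(rows)]
        groups = math.ceil(len(column) / k2)
        column += [0] * (groups * k2 - len(column))
        sealed: List[Block] = []
        for g in range(groups):
            sealed.extend(encode(column[g * k2:(g + 1) * k2], n2))
        columns.append(sealed)
    return columns
```

With the parameters of the toy example in Fig. 3 (22 blocks, $(n_1, k_1) = (8, 6)$, $(n_2, k_2) = (6, 4)$), `seal_layout(list(range(1, 23)), 8, 6, 6, 4)` pads two zero blocks and yields 8 columns of 6 blocks each, which matches $\lceil |D| n_2/(k_1 k_2) \rceil = 6$ rows in line 12.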


Fig. 4. State diagram of the smart contract instance on $\mathcal{M}$.

Fig. 5. State diagram of $\mathcal{F}$. Text in italic font represents active actions or events of $\mathcal{F}$, while text in sans-serif font represents messages from the smart contract (i.e., $\mathcal{M}$).

Setup. In Setup, $\mathcal{U}$ first generates a key pair $(spk_U, ssk_U)$ for signing. Then, it randomly picks the following elements: $x_U \xleftarrow{R} \mathbb{Z}_p$, $g_U \xleftarrow{R} G_1$, $\bar{g} \xleftarrow{R} G_1$, and calculates $y_U = g_2^{x_U}$ based on a generator $g_2$ of $G_2$. For $\mathcal{U}$, the public key is $pk_U = (spk_U, y_U, g_2, g_U, e(g_U, y_U), \bar{g})$, and the secret key is $sk_U = (x_U, ssk_U)$.
Besides, $\mathcal{U}$ also generates a key pair which will be sent to $\mathcal{S}$ during replica outsourcing. $\mathcal{U}$ picks a pair of signing keys $(ssk_S, spk_S)$, a random element $x_S \xleftarrow{R} \mathbb{Z}_p$, a random element $g_S \xleftarrow{R} G_1$, and computes $y_S = g_2^{x_S}$, where $g_2$ is a generator of $G_2$. The public key of $\mathcal{S}$ is $pk_S = (spk_S, y_S, g_2, g_S)$ and the secret key is $sk_S = (x_S, ssk_S)$.

Outsourcing Original Data. In the $\mathcal{U}$.Store algorithm, $\mathcal{U}$ pre-processes its data $D \in \{0, 1\}^*$ before outsourcing it to $\mathcal{S}$. First, $\mathcal{U}$ encodes $D$ with an erasure code [48]. The encoded data is interpreted as $n$ blocks, i.e., $D = (m_0, m_1, \ldots, m_{n-1})$. To identify the data, $\mathcal{U}$ randomly chooses an element $name$ from $\mathbb{Z}_p$ as the identifier. For each data block $m_i$, $\mathcal{U}$ computes a unique authenticator $\sigma_i = (h_1(W_i) \cdot g_U^{m_i})^{x_U} \in G_1$, where $W_i = name \,\|\, i$. Furthermore, $\mathcal{U}$ generates a signature of the identifier $name$: $\omega = \omega_0 \,\|\, \mathrm{SSig}_{ssk_U}(\omega_0)$, where $\omega_0 = name \,\|\, n$. Then $D$ and $aux = (\{\sigma_i\}_{0 \le i < n}, \omega)$ are sent to $\mathcal{S}$.

Sealing Replicas. To ensure reliability and protect the privacy of the replica, we propose the algorithm $\mathcal{S}$.Seal. The detailed algorithm is presented in Algorithm 1, and a toy example is depicted in Fig. 3. In this algorithm, data is first encoded using the erasure code and encrypted with the symmetric encryption algorithm (i.e., line 3 of Algorithm 1). The encrypted data is divided into pieces that will be stored by different farmers. Before being sent to a farmer, each piece of the replica is encoded again (see line 9 of Algorithm 1). The first encoding of $\mathcal{S}$.Seal guarantees that data can be retrieved even if a small fraction of the farmers are inaccessible. The second encoding ensures that each piece of the replica is also loss-tolerant, and it also reduces the verification cost (see Section 6.1). We leverage the Reed-Solomon code [48] as the erasure code because it is an MDS code.

Replicas Distribution. In this algorithm, $\mathcal{S}$ calculates authenticators for replicas. At first, the replica $R$ is divided into $K$ blocks, and each block contains $s$ sectors, i.e., $R = \{m^r_{i,j}\}_{0 \le i < K,\, 0 \le j < s}$. For each sector, $\mathcal{S}$ generates an authenticator $\bar{\sigma}_{i,j} \leftarrow (h_1(W'_{i,j}) \cdot g_S^{m^r_{i,j}})^{x_S} \in G_1$. In each authenticator, we use $h_1(W'_{i,j})$, where $W'_{i,j} = name_{rep} \,\|\, name \,\|\, i \,\|\, j$, to represent the sector $m^r_{i,j}$. Besides, for each replica block $i$, an extra signature $\Omega_i$ is generated by computing $\Omega_i = name_{rep} \,\|\, name \,\|\, i \,\|\, s \,\|\, sig$, where $sig = \mathrm{SSig}_{ssk_S}(name_{rep} \,\|\, name \,\|\, i \,\|\, s)$. Finally, $\mathcal{S}$ distributes the $K$ replica blocks $R_i = \{m^r_{i,j}\}_{j \in [0, s)}$ together with $aux'_i = (\{\bar{\sigma}_{i,j}\}_{j \in [0, s)}, \Omega_i)$ to the $i$th farmer, where $i \in [0, K)$.

Proofs Generation for the Original Data. $\mathcal{S}$ runs $\mathcal{S}$.Prove to prove that the original data is stored correctly. In this algorithm, the privacy of the original data is well protected. $\mathcal{S}$ takes the block $B_{now}$ as the randomness source, and then calculates the challenges $\{(i, \nu_i)\}_{i \in I}$, $\nu_i \xleftarrow{R} \mathbb{Z}_p$, $I \subseteq [0, n)$. Then $\mathcal{S}$ calculates $\mu' = \sum_{i \in I} \nu_i m_i$ and $\sigma = \prod_{i \in I} \sigma_i^{\nu_i} \in G_1$. To blind $\mu'$ and $\sigma$, $\mathcal{S}$ first chooses three random elements $r_\sigma$, $r_\mu$, $\rho$ from $\mathbb{Z}_p$ and calculates $T = e(\bar{g}, g_2)^{r_\sigma} \cdot e(g_U, y_U)^{r_\mu} \in G_T$. To hide $\mu'$, $\mathcal{S}$ computes $\mu = r_\mu + \gamma\mu'$, where $\gamma = h_2(T) \in \mathbb{Z}_p$. Then, $\mathcal{S}$ calculates $\Sigma = \sigma \cdot \bar{g}^{\rho}$ to hide $\sigma$, and calculates $\varsigma = r_\sigma + \gamma\rho$ to hide $\rho$. Finally, $\mathcal{S}$ sends $\phi = (\omega, \varsigma, \mu, \Sigma, T, B_{now})$ to $\mathcal{M}$.

Proofs Generation for Replicas. The farmers prove that they store replicas honestly through $\mathcal{F}$.Prove. Here, we assume that $\mathcal{F}_i$ stores the $i$th block of the replica. Similar to $\mathcal{S}$.Prove, $\mathcal{F}_i$ generates challenges $\{(j, \nu_j)\}_{j \in J}$, $\nu_j \xleftarrow{R} \mathbb{Z}_p$, $J \subseteq [0, s)$, from the newest block $B_{now}$. Then $\mathcal{F}_i$ calculates an aggregated authenticator $\bar{\sigma}_i = \prod_{j \in J} (\bar{\sigma}_{i,j})^{\nu_j} \in G_1$ and the aggregated sampled sectors $\mu_i = \sum_{j \in J} \nu_j m^r_{i,j}$. After that, $\mathcal{F}_i$ sends $\phi'_i = (\Omega_i, \mu_i, \bar{\sigma}_i, B_{now})$ to $\mathcal{M}$.

Proofs Verification. After receiving the proof from $\mathcal{S}$, $\mathcal{M}$ executes the Verify$_S$ algorithm. First, $\mathcal{M}$ verifies the validity of $\omega_0$ in $\omega$ based on the signature $\mathrm{SSig}_{ssk_U}(\omega_0)$. If the verification fails, $\mathcal{M}$ returns $\bot$; otherwise, it recovers the total block number $n$ and $name$ from $\omega_0$. Once the validation passes, $\mathcal{M}$ computes $\gamma = h_2(T)$ and utilizes the blockchain block number provided by $\mathcal{S}$ to compute the same challenges $\{(i, \nu_i)\}_{i \in I}$. Then $\mathcal{M}$ checks the verification equation:

$$T \cdot e(\Sigma^{\gamma}, g_2) \stackrel{?}{=} e\Big(\Big(\prod_{i \in I} h_1(W_i)^{\nu_i}\Big)^{\gamma} \cdot g_U^{\mu},\; y_U\Big) \cdot e(\bar{g}, g_2)^{\varsigma}. \qquad (1)$$

If the above equation holds, then $\mathcal{M}$ outputs $\top$; otherwise it returns $\bot$.
Verifying $\mathcal{F}_i$'s proof is simpler than verifying that of $\mathcal{S}$. First, $\mathcal{M}$ parses $\Omega_i$ as $name_{rep} \,\|\, name \,\|\, i \,\|\, s \,\|\, sig$. If $sig$ is an invalid signature from $\mathcal{S}$ for $(name_{rep} \,\|\, name \,\|\, i \,\|\, s)$, then $\mathcal{M}$ aborts and returns $\bot$; otherwise, $\mathcal{M}$ calculates the same challenges $\{(j, \nu_j)\}_{j \in J}$ as $\mathcal{F}_i$, and verifies the following equation:

$$e(\bar{\sigma}_i, g_2) \stackrel{?}{=} e\Big(\prod_{j \in J} h_1(W'_{i,j})^{\nu_j} \cdot g_S^{\mu_i},\; y_S\Big). \qquad (2)$$

If the above equation holds, then the verification result is $\top$, otherwise $\bot$.
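To see why an honest farmer passes the check in Equation (2), the following worked derivation may help. It is a sketch under the stated definitions only: $y_S = g_2^{x_S}$, per-sector authenticators $\bar{\sigma}_{i,j} = (h_1(W'_{i,j}) \cdot g_S^{m^r_{i,j}})^{x_S}$, aggregates $\bar{\sigma}_i = \prod_{j \in J} \bar{\sigma}_{i,j}^{\nu_j}$ and $\mu_i = \sum_{j \in J} \nu_j m^r_{i,j}$, and bilinearity of $e$. The bar notation is used here only to distinguish replica authenticators from those of the original data.

$$\begin{aligned}
e(\bar{\sigma}_i, g_2)
&= e\Big(\prod_{j \in J} \big(h_1(W'_{i,j}) \cdot g_S^{m^r_{i,j}}\big)^{x_S \nu_j},\; g_2\Big) \\
&= e\Big(\prod_{j \in J} h_1(W'_{i,j})^{\nu_j} \cdot g_S^{\sum_{j \in J} \nu_j m^r_{i,j}},\; g_2^{x_S}\Big) \\
&= e\Big(\prod_{j \in J} h_1(W'_{i,j})^{\nu_j} \cdot g_S^{\mu_i},\; y_S\Big).
\end{aligned}$$

The second step moves the exponent $x_S$ from the first argument of the pairing to the second, which is exactly what lets $\mathcal{M}$ verify with the public $y_S$ instead of the secret $x_S$.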


Fig. 6. Data outsourcing.

Fig. 7. $\mathcal{F}$ and $\mathcal{S}$ proving for reward.

5.3 Decentralized Network Construction
Based on the auditing algorithms in Section 5.2, in this part we give the detailed construction of BOSSA. In BOSSA, each party can be viewed as a state machine, and we illustrate the state diagrams of $\mathcal{M}$ and $\mathcal{F}$ in Figs. 4 and 5, respectively.

Farmer Enrollment. A farmer who wants to join the network should register before providing storage services. Specifically, the farmer $\mathcal{F}$ submits a capability $B_{end}$ representing when its services expire. To prevent $\mathcal{F}$ from leaving the network arbitrarily, it must put a deposit into the network, which is recorded by $D$ in $\mathcal{L}$.Roster($\mathcal{F}$.Addr). The list services is set to be empty during $\mathcal{F}$'s enrollment. At the same time, $\mathcal{M}$ sets $B_{start}$ to be $B_{now}$.

Outsourcing Original Data. We assume that $\mathcal{U}$ and $\mathcal{S}$ have reached an agreement on the following parameters: interval $Itr_s$, price $p_s$, and radius $r_s$. $\mathcal{U}$ initializes a smart contract instance on $\mathcal{M}$ with these parameters when it uploads the data to $\mathcal{S}$. The details can be found in Fig. 6.

Outsourcing Replicas. The process for outsourcing replicas is detailed in Fig. 6. Note that for ease of presentation, we assume that $\mathcal{S}$ is required to generate one replica of the original data.
A malicious $\mathcal{S}$ may control several dummy farmers, pretend to store replicas honestly, and generate proofs on the fly during auditing. To prevent such a sybil attack from $\mathcal{S}$, $\mathcal{M}$ selects farmers randomly. Concretely, $\mathcal{M}$ calculates the hash of a farmer's address concatenated with the latest block, and checks whether the hashed value is less than a threshold $th$. This guarantees that the selection of farmers is randomized. A farmer can calculate the hash in advance to check whether it can be selected before it sends the message to $\mathcal{M}$. Besides, when $\mathcal{F}$ wants to store replicas, $\mathcal{M}$ checks $B_{start}$ of $\mathcal{F}$, namely $B_{start} + \Delta < B_{now}$. This prevents a malicious $\mathcal{S}$ from generating dummy farmers on the fly. Clearly, $\mathcal{S}$ can perform sybil attacks only if it pre-generates numerous dummy farmers, which requires a large amount of deposit.

Proving for Reward. Both $\mathcal{S}$ and $\mathcal{F}$ prove data integrity to earn rewards through the time-restricted proof, and they are rewarded based on the reward mechanism proposed in Section 4.5. If the proof is valid, $\mathcal{M}$ sets $B_{last}$ to $B_{sel}$, then rewards the provers and updates $\mathcal{U}$'s deposit. The detailed process is depicted in Fig. 7. Note that $\mathcal{M}$ should check that $\mathcal{F}$'s service has not expired to prevent $\mathcal{F}$ from cheating.

Fetching Replica. When $\mathcal{S}$ tries to reconstruct the original data from replicas, it sends the message rep-fetch to $\mathcal{M}$ together


with $name_{rep}$. Upon receiving the message from $\mathcal{S}$, $\mathcal{M}$ increases by 1 the totalAsk of the farmers who are in $\mathcal{L}$.Replica($name_{rep}$).L, and broadcasts a message rep-fetch-request to notify the farmers. A rational farmer will share its local replica block $R_i$ with $\mathcal{S}$ to maximize its profits (see Section 6.2.3). When $\mathcal{S}$ receives enough replica blocks, it can reconstruct the original data and validate the replicas by comparing them with its previously stored hashes. For those farmers who contribute their local replica blocks, $\mathcal{S}$ notifies $\mathcal{M}$ to increase totalReply with the message rep-fetch-finalize, and $\mathcal{M}$ broadcasts rep-fetch-confirmed to notify the farmers.

Farmer Logout. When a farmer wants to leave the network and take back all its deposit (including the frozen part of the reward), the following conditions must be met: 1) it is not storing replicas, i.e., $\mathcal{L}$.Roster($\mathcal{F}$.Addr).services $= \emptyset$; 2) the storage service it promised has expired, i.e., $B_{end} \le B_{now}$; 3) it has proved correctly each time, i.e., $0 \le B_{end} - B_{last} < Itr$. The first two conditions allow the farmer to take back the deposit made during enrollment, and the last one allows it to obtain the deposit for storing replica blocks (which is refunded based on the contribution rate). Besides, a new farmer must be found to take over its replicas. Only after transferring all the replicas to the new farmer can it get back the frozen deposit. Details can be found in Fig. 8.

Fig. 8. $\mathcal{F}$ leaves the network.

6 SYSTEM ANALYSIS
6.1 Choice of Challenge Size
The challenges against both $\mathcal{S}$ and $\mathcal{F}$ are based on the probabilistic framework [5], through which the workload of the provers can be noticeably reduced. This is because, during the auditing, the provers randomly sample a subset of the stored data, as required by the verifier, instead of looking through all of the data. Meanwhile, with the aid of unbiased random challenges (usually picked by the verifier), a cheater will be caught with overwhelming probability if the data loss is beyond a bound.
Suppose that the prover tampers with $n \cdot t$, $t \in (0, 1)$, blocks out of $n$ data blocks in total, and that it is asked to sample $c$ blocks. We let $X$ denote a discrete random variable representing the number of deleted blocks chosen during the proof, and we define $P_X$ as the probability that at least one such block is chosen. Thus we have

$$\begin{aligned}
P_X &= P\{X \ge 1\} \\
&= 1 - P\{X = 0\} \\
&= 1 - \frac{n - nt}{n} \cdot \frac{n - 1 - nt}{n - 1} \cdots \frac{n - c + 1 - nt}{n - c + 1} \\
&\ge 1 - (1 - t)^c.
\end{aligned}$$

Clearly, if we fix $t$, this lower bound on $P_X$ is independent of the number of data blocks. For example, when $t = 1\%$, $P_X$ can reach 95 percent as long as the number of sampled blocks is not less than 300 (see Fig. 9).

Fig. 9. Data-loss detection probability versus challenge size. We illustrate the challenge sizes needed to achieve various loss-detection probabilities under different degrees of $t$.

The workloads of proving and verification are related to the number of challenges, namely the size of $c$. However, since both the original data and the replica are encoded with the erasure code, we can sacrifice storage space in exchange for efficient verification and proving. Recall that an $(n_2, k_2)$ encoding is applied on each column of the replica, which is individually stored by farmers, and the redundancy rate is $n_2/k_2$. If we raise the redundancy rate, we can take a less rigorous challenge with a


smaller challenge size. For example, if we take a $2\times$ redundancy-rate encoding, we can aggressively reduce the challenge size to 7, as a data loss of 50 percent can be detected with a 99 percent probability (as depicted in Fig. 9) while the whole replica can still be recovered.

6.2 Security Analysis
6.2.1 Storage Correctness
Definition 9. The Computational Diffie-Hellman assumption is defined as follows. Given a tuple $(g, g^x, g^y) \in G$, where $G$ is a multiplicative cyclic group of order $p$, $g$ is a generator of $G$, and $x, y \xleftarrow{R} \mathbb{Z}_p$, the probability that a PPT adversary $\mathcal{A}$ outputs $g^{xy}$ is no more than a negligible probability $\epsilon$:

$$\Pr[\mathcal{A}(g, g^x, g^y) = g^{xy}] < \epsilon.$$

Theorem 1. In the random oracle model, if the CDH assumption holds in bilinear groups, then BOSSA guarantees the storage correctness.

Proof (sketch). First, we prove that an honest $\mathcal{S}$ can always pass the verification. To show that, for Equation (1), we have

$$\begin{aligned}
\mathrm{LHE} &= T \cdot e(\Sigma^{\gamma}, g_2) \\
&= e(\bar{g}, g_2)^{r_\sigma}\, e(g_U, y_U)^{r_\mu} \cdot e\Big(\Big(\prod_{i \in I} \big(h_1(W_i)^{\nu_i} \cdot g_U^{m_i \nu_i}\big)^{x_U}\Big)^{\gamma} \cdot \bar{g}^{\rho\gamma},\; g_2\Big) \\
&= e(\bar{g}, g_2)^{r_\sigma + \rho\gamma} \cdot e\Big(\Big(\prod_{i \in I} h_1(W_i)^{\nu_i}\Big)^{\gamma} \cdot g_U^{r_\mu + \gamma\mu'},\; y_U\Big) \\
&= e(\bar{g}, g_2)^{\varsigma} \cdot e\Big(\Big(\prod_{i \in I} h_1(W_i)^{\nu_i}\Big)^{\gamma} \cdot g_U^{\mu},\; y_U\Big) = \mathrm{RHE}.
\end{aligned}$$

Thus, this property ensures the soundness of BOSSA, and guarantees the interests of $\mathcal{S}$. Then, we prove that for an $\varepsilon$-admissible $\mathcal{S}$ who can correctly provide an $\varepsilon$-fraction of proofs, the original data can be extracted from it.
We assume the verifier controls a random oracle $h_2$ and answers hash queries from $\mathcal{S}$. Upon receiving the proof from $\mathcal{S}$, the verifier rewinds to the point when $\mathcal{S}$ queries $h_2(T)$, and sends a different $\gamma^* = h_2(T)$ to $\mathcal{S}$. Having two different proofs $(\mu, \Sigma, \varsigma)$ and $(\mu^*, \Sigma, \varsigma^*)$, the verifier extracts $\mu'$ and $\sigma$ by calculating

$$\begin{aligned}
\mu' &= \frac{\mu - \mu^*}{\gamma - \gamma^*} = \frac{(r_\mu + \gamma\mu') - (r_\mu + \gamma^*\mu')}{\gamma - \gamma^*}, \\
\rho &= \frac{\varsigma - \varsigma^*}{\gamma - \gamma^*} = \frac{(r_\sigma + \gamma\rho) - (r_\sigma + \gamma^*\rho)}{\gamma - \gamma^*}, \\
\sigma &= \frac{\Sigma}{\bar{g}^{\rho}}.
\end{aligned}$$

The pair $(\mu', \sigma)$ is the proof for verification in [10], and based on Theorem 4.2 in [10], if $\mathcal{S}$ forges both $\mu'$ and $\sigma$, it cannot pass the verification, except with a negligible probability. Then, based on Theorem 4.3 in [10], if $\mathcal{S}$ gives at least an $\varepsilon$ fraction of valid proofs to the verifier, the data can be recovered from the proofs through an extraction algorithm.
As for an $\varepsilon$-admissible $\mathcal{F}$, the storage correctness is achieved based on the theorems in [10]. $\square$

6.2.2 Privacy Preservation
Theorem 2. In the random oracle model, if Enc is CPA-secure and the Diffie-Hellman problem is hard, BOSSA achieves the privacy preservation defined in Definition 4.

Proof (sketch). The replicas of the original data are encrypted through Enc before being distributed; hence participants of the decentralized network cannot learn private information as long as they do not have the encryption key, and the auditing procedure for replicas also leaks no private information. So, the point is that there should be no privacy leakage during the auditing of $\mathcal{S}$.
To prove that privacy preservation holds for $\mathcal{S}$, we show that a simulator without knowing $\mu'$ and $\sigma$ can provide a valid proof to $\mathcal{M}$. We assume it controls the random oracle $h_2$ and answers queries from $\mathcal{M}$. To provide a valid proof to $\mathcal{M}$, the simulator randomly picks $\gamma$ and $\mu' \in \mathbb{Z}_p$ and $\sigma \in G_1$, and calculates $\mu$, $\varsigma$ and $\Sigma$. Then, the simulator lets

$$T = e\Big(\Big(\prod_{i \in I} h_1(W_i)^{\nu_i}\Big)^{\gamma} \cdot g_U^{\mu},\; y_U\Big) \cdot \frac{e(\bar{g}, g_2)^{\varsigma}}{e(\Sigma^{\gamma}, g_2)},$$

and fills the random oracle $h_2(T)$ with $\gamma$. The simulator's proof is $(T, \varsigma, \mu, \Sigma)$. It is clear that the simulator's proof is valid if the simulator answers $\gamma$ when $\mathcal{M}$ queries $h_2(T)$. Since the simulator knows nothing about the real $\mu'$ and $\sigma$, the proof leaks no information about the data held by $\mathcal{S}$. $\square$

6.2.3 Replica Reliability
According to the analysis in Section 6.2.1, as long as $\mathcal{F}$ provides a valid proof, it can convince the verifier that the data is stored properly. According to our reward distribution mechanism, rewards from the user are given to $\mathcal{F}$. Since partial rewards are temporarily frozen in the deposit of $\mathcal{F}$, the more $\mathcal{F}$ proves, the more deposit it accumulates. When it wants to get the deposit back, the deposit is refunded based on the contribution rate $cr$, which is determined by whether $\mathcal{F}$ contributes its local replicas when there is a request for replicas from $\mathcal{S}$. Hence, a rational $\mathcal{F}$ who is eager to maximize its profits will actively contribute its local replica blocks.
Finally, we remark that as replicas are encoded by erasure codes, when $\mathcal{F}$ goes offline accidentally during the replica fetching, BOSSA can still guarantee the reliability of the replicas, as long as the geographical locations of the farmers are sufficiently dispersed.

7 IMPLEMENTATION AND EVALUATION
7.1 Implementation Setup
All the parties in our prototype are instantiated on a local Ubuntu 18.04 LTS machine with an Intel Core i5-8400 CPU, 12 GB of RAM, and a 7200 RPM Seagate 1 TB hard disk. We adopt Ethereum as the underlying blockchain system in our implementation. The smart contracts are written in Solidity and deployed to Ganache-cli (a local Ethereum simulation) and Ropsten (an official testnet). Other parties are implemented with JavaScript combined with C/C++. We utilize Keccak256 as the hash function and a Barreto-Naehrig curve with 128-bit security as the elliptic curve. We use ate-pairing [49] for the bilinear pairing since it is based on an elliptic curve suitable for Ethereum. And we use Jerasure, an


instance of Reed-Solomon coding, for implementing erasure codes. To evaluate economic costs, we take the exchange rate of $3 \times 10^{-9}$ Ether per gas (namely 3 Gwei per gas), and 150 USD per Ether (December, 2019).

7.2 Circumventing Gas Limitation of Ethereum
To prevent malicious users from launching DDoS attacks by sending infinite loops to miners, Ethereum uses gas to measure the overhead of running smart contracts, and sets a limit on it. To circumvent the gas limitation, the implementation of our prototype is slightly different from our design. In BOSSA, the main gas costs are caused by the multiplications and hashes related to the size of the challenges. In our prototype, we split those procedures into several sub-functions whose gas costs are below the gas bound. For example, if 300 $G_1$ multiplications are required, we run sub-functions 10 times, each of which performs 30 $G_1$ multiplications. This enables us to avoid exceeding the gas limitation. Note that this substitution increases the total gas cost since we must store temporary variables and trigger multiple calls of the smart contracts.

7.3 Experimental Results
Pre-Processing. We first test the time for generating key pairs (i.e., $\mathcal{U}$.Setup). The time cost is about 0.84 ms, which is negligible. In Fig. 10a, we measure the time cost of $\mathcal{U}$ for pre-processing the original data before outsourcing it to $\mathcal{S}$ (i.e., the $\mathcal{U}$.Store algorithm), and Fig. 10b presents the time cost of $\mathcal{S}$ to pre-process one replica before distributing it to farmers (i.e., the algorithms $\mathcal{S}$.Distribute and $\mathcal{S}$.Seal). We randomly generate testing data ranging from 10 to 100 MB. The result shows that the time cost increases linearly with the data size. The time costs are dominated by the generation of authenticators, while both replica encryption and replica coding incur negligible costs. Generating the authenticators for 100 MB of data costs about 164 s on $\mathcal{U}$, and about 330 s for a replica of 200 MB (as we encode the original data into double size) on $\mathcal{S}$.

Fig. 10. Time and gas costs of BOSSA. (a) Time costs for pre-processing original data using multiple threads versus data size. (b) Time costs for pre-processing one replica using multiple threads versus data size. (c) The time costs of $\mathcal{S}$'s proof generation under different challenge sizes with and without I/O. (d) The probabilities of failing to detect data corruption and the proof verification gas costs versus the number of sampled blocks. (e) Gas costs of $\mathcal{M}$'s actions when the replica is stored by $K = 10$ farmers (O.D. stands for Original Data). (f) The gas cost of replica fetching and transferring versus the number of farmers storing the replica.

Proof Generation and On-Chain Verification. The overheads of proof generation on $\mathcal{S}$ and $\mathcal{F}$ are only related to the size of the challenge. We test the overheads of sampling 7 sectors of a replica block, and of sampling both 300 and 480 (for about 99 percent loss-detection probability) blocks of original data, as discussed in Section 6.1. The results, shown in Table 2, demonstrate that our proving process is efficient and only costs tens of milliseconds. In Fig. 10c, we evaluate the I/O effect on the proof generation on $\mathcal{S}$. We can see that the data size greatly affects the time cost. This is because the time cost of proof generation is mainly dominated by the I/O cost (e.g., reading data and authenticators from hard drives).

TABLE 2
Overheads of Proof Generation and Verification for 100M Data

                      S (300 blocks)   S (480 blocks)   F
File size (MB)        100              100              100
Sample blocks         300              480              7
Proving (ms)          33.2             48.8             10.8
Verification (gas)    27,063,392       42,979,515       997,151
USD                   12.18            19.34            0.45

For the on-chain verification, the gas costs are 27,063,392 and 42,979,515 when sampling 300 blocks and 480 blocks (Fig. 10d), respectively. On the other hand, the gas cost of verifying a farmer's proof is much lower, namely 997,151 gas (i.e., about $0.45). Note that this cost is for a single farmer, and the total cost is $K$ times larger when there are $K$ farmers storing the replica. Since the underlying blockchain platform may affect the time cost of the verification process, we further deploy the smart contracts to Ropsten, an official public


Ethereum testnet. We measure the time cost from sending the proof to recording the verification result on the chain. The time cost for verifying $\mathcal{S}$'s proof is presented in Fig. 11. It can be seen that although heavily affected by the throughput of the testnet and the latency of the network, the time cost still roughly increases with the number of the sampled blocks. Note that techniques like sharding [50], [51] and state channel [52] can further improve the efficiency of verification.

Fig. 11. Time cost of verification in Ropsten versus the number of sampled blocks.

Interactions With the Smart Contract. In BOSSA, other processes, such as initialization and data fetching, also need to interact with $\mathcal{M}$ (through the smart contracts). In Fig. 10e, we evaluate the gas costs of those processes. We can see that those costs are much lower (about several cents) than that of proof verification. Meanwhile, among these processes, the costs of outsourcing original data and replicas are the highest. This is caused by the large call parameters (420 and 460 B) for storing $\mathcal{U}$'s and $\mathcal{S}$'s public keys, respectively.
Fig. 10f shows the impact of the number of farmers on the gas costs when fetching and transferring replicas. The gas costs increase linearly with the number of the farmers. This is due to the limitation of data structures supported in Ethereum. We add farmers' addresses into an array, such that it needs to traverse the whole array for searching a specific farmer. In addition, we have to simultaneously increase the counter totalAsk for each farmer by 1 when fetching replicas, and it charges an extra fee due to the changing of the blockchain states. To avoid exceeding the gas limitation, $\mathcal{S}$ can fetch replica blocks separately.

8 CONCLUSION
In this paper, we proposed BOSSA, a novel decentralized framework for proofs of data replication and retrievability. Different from existing schemes, BOSSA is built atop off-the-shelf blockchain platforms, where each participant is fairly treated and incentivized to faithfully follow the auditing protocol. We proposed a proof enforcement mechanism to catch lazy behaviors, and a new metric as well as a reward distribution mechanism to create a fair reciprocal environment. Our experiments show that BOSSA incurs tolerable costs and is feasible in practice.

ACKNOWLEDGMENTS
The work of Qian Wang was supported in part by the National Key Research and Development Program of China (No. 2020YFB1005500), in part by the NSFC under Grants 61822207 and U1636219, in part by the Outstanding Youth Foundation of Hubei Province under Grant 2017CFA047, and in part by the Fundamental Research Funds for the Central Universities under Grant 2042019kf0210. The work of Cong Wang was supported in part by Research Grants Council of Hong Kong under Grants CityU 11212717 and CityU 11217819 and by the Innovation and Technology Commission of Hong Kong under ITF Project ITS/145/19. The work of Shengshan Hu was supported by the Fundamental Research Funds for the Central Universities under Grant 2020kfyXJJS075.

REFERENCES
[1] Data protection in Amazon S3, Accessed: Jul. 31, 2019. [Online]. Available: https://docs.aws.amazon.com/AmazonS3/latest/dev/DataDurability.html
[2] H. Zhou, X. Ouyang, Z. Ren, J. Su, C. de Laat, and Z. Zhao, “A blockchain based witness model for trustworthy cloud service level agreement enforcement,” in Proc. IEEE Conf. Comput. Commun., 2019, pp. 1567–1575.
[3] Tencent cloud user claims $1.6 million compensation for data loss, 2018. [Online]. Available: https://technode.com/2018/08/06/tencent-cloud-user-claims-1–6-million-compensation-for-data-loss
[4] Google loses data as lightning strikes, 2015. [Online]. Available: https://www.bbc.com/news/technology-33989384
[5] G. Ateniese et al., “Provable data possession at untrusted stores,” in Proc. 14th ACM Conf. Comput. Commun. Secur., 2007, pp. 598–609.
[6] A. Yang, J. Xu, J. Weng, J. Zhou, and D. S. Wong, “Lightweight and privacy-preserving delegatable proofs of storage with data dynamics in cloud storage,” IEEE Trans. Cloud Comput., to be published, doi: 10.1109/TCC.2018.2851256.
[7] J. Xu, A. Yang, J. Zhou, and D. S. Wong, “Lightweight delegatable proofs of storage,” in Proc. Eur. Symp. Res. Comput. Secur., 2016, pp. 324–343.
[8] F. Armknecht, J.-M. Bohli, G. O. Karame, Z. Liu, and C. A. Reuter, “Outsourced proofs of retrievability,” in Proc. ACM Conf. Comput. Commun. Secur., 2014, pp. 831–843.
[9] A. Juels and B. S. Kaliski Jr, “PORs: Proofs of retrievability for large files,” in Proc. 14th ACM Conf. Comput. Commun. Secur., 2007, pp. 584–597.
[10] H. Shacham and B. Waters, “Compact proofs of retrievability,” J. Cryptol., vol. 26, no. 3, pp. 442–483, 2013.
[11] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li, “Enabling public auditability and data dynamics for storage security in cloud computing,” IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 5, pp. 847–859, May 2011.
[12] C. Wang, S. S. M. Chow, Q. Wang, K. Ren, and W. Lou, “Privacy-preserving public auditing for secure cloud storage,” IEEE Trans. Comput., vol. 62, no. 2, pp. 362–375, Feb. 2013.
[13] F. Armknecht, L. Barman, J.-M. Bohli, and G. O. Karame, “Mirror: Enabling proofs of data replication and retrievability in the cloud,” in Proc. 25th USENIX Conf. Secur. Symp., 2016, pp. 1051–1068.
[14] R. Curtmola, O. Khan, R. Burns, and G. Ateniese, “MR-PDP: Multiple-replica provable data possession,” in Proc. 28th Int. Conf. Distrib. Comput. Syst., 2008, pp. 411–420.
[15] I. Leontiadis and R. Curtmola, “Secure storage with replication and transparent deduplication,” in Proc. 8th ACM Conf. Data Appl. Secur. Privacy, 2018, pp. 13–23.
[16] I. Damgard, C. Ganesh, and C. Orlandi, “Proofs of replicated storage without timing assumptions,” in Proc. Annu. Int. Cryptol. Conf., 2019, pp. 355–380.
[17] J. Benet, D. Dalrymple, and N. Greco, “Proof of replication,” Tech. Rep., pp. 1–10, 2017.
[18] B. Fisch, “PoReps: Proofs of space on useful data,” Tech. Rep., pp. 1–60, 2018.
[19] B. Fisch, J. Bonneau, N. Greco, and J. Benet, “Scaling proof-of-replication for filecoin mining,” Tech. Rep., pp. 1–25, 2018.
[20] A. B. Ayed, “A conceptual secure blockchain-based electronic voting system,” Int. J. Netw. Secur. Appl., vol. 9, no. 3, pp. 1–9, 2017.
[21] S. Wu, Y. Chen, Q. Wang, M. Li, C. Wang, and X. Luo, “CReam: A smart contract enabled collusion-resistant e-auction,” IEEE Trans. Inf. Forensics Security, vol. 14, no. 7, pp. 1687–1701, Jul. 2019.
[22] S. Dziembowski, L. Eckey, and S. Faust, “FairSwap: How to fairly exchange digital goods,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2018, pp. 967–984.


[23] Filecoin, 2017. [Online]. Available: https://filecoin.io/filecoin.pdf
[24] G. Wood, “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum Project Yellow Paper, vol. 151, pp. 1–32, 2014.
[25] C. Cachin, “Architecture of the hyperledger blockchain fabric,” in Proc. Workshop Distrib. Cryptocurrencies Consensus Ledgers, 2016, vol. 310, pp. 1–4.
[26] C. Guan, K. Ren, F. Zhang, F. Kerschbaum, and J. Yu, “Symmetric-key based proofs of retrievability supporting public verification,” in Proc. Eur. Symp. Res. Comput. Secur., 2015, pp. 203–223.
[27] D. Boneh, B. Lynn, and H. Shacham, “Short signatures from the Weil pairing,” in Proc. Int. Conf. Theory Appl. Cryptol. Inf. Secur., 2001, pp. 514–532.
[28] Z. Hao and N. Yu, “A multiple-replica remote data possession checking protocol with public verifiability,” in Proc. 2nd Int. Symp. Data Privacy E-Commerce, 2010, pp. 84–89.
[29] D. Boneh, J. Bonneau, B. Bünz, and B. Fisch, “Verifiable delay functions,” in Proc. Annu. Int. Cryptol. Conf., 2018, pp. 757–788.
[30] Storj, 2018. [Online]. Available: https://storj.io/white-paper
[31] Sia: Simple decentralized storage, 2014. [Online]. Available: https://sia.tech/sia.pdf
[32] J. Benet, “IPFS-content addressed, versioned, P2P file system,” 2014, arXiv:1407.3561.
[33] A. Miller, A. Juels, E. Shi, B. Parno, and J. Katz, “Permacoin: Repurposing bitcoin work for data preservation,” in Proc. IEEE Symp. Security Privacy, 2014, pp. 475–490.
[34] B. Sengupta, S. Bag, S. Ruj, and K. Sakurai, “Retricoin: Bitcoin based on compact proofs of retrievability,” in Proc. 17th Int. Conf. Distrib. Comput. Netw., 2016, pp. 1–10.
[35] S. Ruj, M. S. Rahman, A. Basu, and S. Kiyomoto, “BlockStore: A secure decentralized storage framework on blockchain,” in Proc. IEEE 32nd Int. Conf. Adv. Inf. Netw. Appl., 2018, pp. 1096–1103.
[36] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008. [Online]. Available: https://bitcoin.org/bitcoin.pdf
[37] A. Kiayias, A. Russell, B. David, and R. Oliynykov, “Ouroboros: A provably secure proof-of-stake blockchain protocol,” in Proc. Annu. Int. Cryptol. Conf., 2017, pp. 357–388.
[38] S. Hu, C. Cai, Q. Wang, C. Wang, X. Luo, and K. Ren, “Searching an encrypted cloud meets blockchain: A decentralized, reliable and fair realization,” in Proc. IEEE Conf. Comput. Commun., 2018, pp. 792–800.
[39] S. Hu, C. Cai, Q. Wang, C. Wang, Z. Wang, and D. Ye, “Augmenting encrypted search: A decentralized service realization with enforced execution,” IEEE Trans. Dependable Secure Comput., to be published, doi: 10.1109/TDSC.2019.2957091.
[40] Libra, 2019. [Online]. Available: https://libra.org/en-US/white-paper/
[41] V. Buterin et al., “A next-generation smart contract and decentralized application platform,” White Paper, 2014.
[42] E. Androulaki et al., “Hyperledger fabric: A distributed operating system for permissioned blockchains,” in Proc. 13th EuroSys Conf., 2018, pp. 1–15.
[43] S. Hu, L. Y. Zhang, Q. Wang, Z. Qin, and C. Wang, “Towards private and scalable cross-media retrieval,” IEEE Trans. Dependable Secure Comput., to be published, doi: 10.1109/TDSC.2019.2926968.
[44] Q. Wang, M. He, M. Du, S. S. M. Chow, R. W. F. Lai, and Q. Zou, “Searchable encryption over feature-rich data,” IEEE Trans. Dependable Secure Comput., vol. 15, no. 3, pp. 496–510, May/Jun. 2018.
[45] C. Wang, Q. Wang, K. Ren, N. Cao, and W. Lou, “Toward secure and dependable storage services in cloud computing,” IEEE Trans. Services Comput., vol. 5, no. 2, pp. 220–232, Apr.–Jun. 2012.
[46] M. Saad et al., “Exploring the attack surface of blockchain: A systematic overview,” 2019, arXiv:1904.03487.
[47] J. Katz and Y. Lindell, Introduction to Modern Cryptography. London, U.K./Boca Raton, FL, USA: Chapman and Hall/CRC, 2014.
[48] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields,” J. Soc. Ind. Appl. Math., vol. 8, no. 2, pp. 300–304, 1960.
[49] Ate-pairing, Jul. 31, 2019. [Online]. Available: https://github.com/herumi/ate-pairing
[50] L. Luu, V. Narayanan, C. Zheng, K. Baweja, S. Gilbert, and P. Saxena, “A secure sharding protocol for open blockchains,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 17–30.
[51] M. Zamani, M. Movahedi, and M. Raykova, “RapidChain: Scaling blockchain via full sharding,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2018, pp. 931–948.
[52] A. Miller, I. Bentov, R. Kumaresan, and P. McCorry, “Sprites and state channels: Payment networks that go faster than lightning,” in Proc. Int. Conf. Financial Cryptogr. Data Secur., 2019, pp. 508–526.

Dian Chen received the BE degree from Wuhan University, Wuhan, China, in 2018. He is currently working toward the master’s degree with the School of Cyber Science and Engineering, Wuhan University, Wuhan, China. His research interests include information security and blockchain technology.

Haobo Yuan is currently working toward the undergraduate degree majoring in computer science and technology at Wuhan University, Wuhan, China. His research interests include information security and blockchain technology.

Shengshan Hu received the BE and PhD degrees in computer science and technology from Wuhan University, Wuhan, China, in 2014 and 2019, respectively. He is currently an associate professor with the School of Cyber Science and Engineering, Huazhong University of Science and Technology. His research interest focuses on privacy-enhancing technologies, AI security, and blockchain.

Qian Wang (Senior Member, IEEE) received the PhD degree from the Illinois Institute of Technology, Chicago, Illinois. He is currently a professor with the School of Cyber Science and Engineering, Wuhan University. His research interests include AI security, data storage, search and computation outsourcing security and privacy, wireless system security, and applied cryptography etc. He received National Science Fund for Excellent Young Scholars of China, in 2018. He is also an expert under National “1000 Young Talents Program” of China. He is a recipient of the 2018 IEEE TCSC Award for excellence in scalable computing (Early Career Researcher), and the 2016 IEEE Asia-Pacific Outstanding Young Researcher Award. He is also a co-recipient of several best paper awards from IEEE DSC’19, IEEE ICDCS’17, IEEE TrustCom’16, WAIM’14, and IEEE ICNP’11 etc. He is serving as an associate editor of the IEEE Transactions on Dependable and Secure Computing (TDSC), IEEE Transactions on Information Forensics and Security (TIFS), and IEEE Internet of Things Journal (IoT-J). He is a member of ACM.

Cong Wang (Senior Member, IEEE) is an associate professor with the Department of Computer Science, City University of Hong Kong. His current research interests include data and network security, blockchain and decentralized applications, and privacy-enhancing technologies. He is one of the founding members of the Young Academy of Sciences of Hong Kong. He received the Outstanding Researcher Award (junior faculty), in 2019, the Outstanding Supervisor Award, in 2017 and the president’s awards, in 2019 and 2016, all from the City University of Hong Kong. He is a co-recipient of the IEEE INFOCOM Test of Time Paper Award 2020, Best Student Paper Award of IEEE ICDCS 2017, and the Best Paper Award of IEEE ICPADS 2018, and MSN 2015. His research has been supported by multiple government research fund agencies, including National Natural Science Foundation of China, Hong Kong Research Grants Council, and Hong Kong Innovation and Technology Commission. He serves/has served as an associate editor of the IEEE Transactions on Dependable and Secure Computing, IEEE Internet of Things Journal, IEEE Networking Letters, and TPC co-chairs for a number of IEEE conferences/workshops. He is a member of ACM.

