DROPS: Division and Replication of Data in Cloud For Optimal Performance and Security


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation
information: DOI 10.1109/TCC.2015.2400460, IEEE Transactions on Cloud Computing
IEEE TRANSACTIONS ON CLOUD COMPUTING 2018 1

DROPS: Division and Replication of Data in Cloud for Optimal Performance and Security
Mazhar Ali, Student Member, IEEE, Kashif Bilal, Student Member, IEEE, Samee U. Khan, Senior
Member, IEEE, Bharadwaj Veeravalli, Senior Member, IEEE, Keqin Li, Senior Member, IEEE, and
Albert Y. Zomaya, Fellow, IEEE

Abstract—Outsourcing data to a third-party administrative control, as is done in cloud computing, gives rise to security concerns.
Data may be compromised through attacks by other users and nodes within the cloud. Therefore, strong security measures are
required to protect data within the cloud. However, the employed security strategy must also take into account the optimization
of the data retrieval time. In this paper, we propose Division and Replication of Data in the Cloud for Optimal Performance and
Security (DROPS), which collectively approaches the security and performance issues. In the DROPS methodology, we divide a
file into fragments and replicate the fragmented data over the cloud nodes. Each node stores only a single fragment
of a particular data file, which ensures that even in the case of a successful attack, no meaningful information is revealed to the
attacker. Moreover, the nodes storing the fragments are separated by a certain distance by means of graph T-coloring to prevent
an attacker from guessing the locations of the fragments. Furthermore, the DROPS methodology does not rely on traditional
cryptographic techniques for data security, thereby relieving the system of computationally expensive methodologies. We
show that the probability of locating and compromising all of the nodes storing the fragments of a single file is extremely low. We
also compare the performance of the DROPS methodology with ten other schemes. A higher level of security with a slight
performance overhead was observed.

Index Terms—Centrality, cloud security, fragmentation, replication, performance.

● M. Ali, K. Bilal, and S. U. Khan are with the Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND 58108-6050, USA. E-mail: {mazhar.ali,kashif.bilal,samee.khan}@ndsu.edu
● B. Veeravalli is with the Department of Electrical and Computer Engineering, The National University of Singapore. E-mail: elebv@nus.edu.sg
● K. Li is with the Department of Computer Science, State University of New York, New Paltz, NY 12561. E-mail: lik@ndsu.edu
● A. Y. Zomaya is with the School of Information Technologies, The University of Sydney, Sydney, NSW 2006, Australia. E-mail: albert.zomaya@sydney.edu.au

1 INTRODUCTION

The cloud computing paradigm has reformed the usage and management of the information technology infrastructure [7]. Cloud computing is characterized by on-demand self-services, ubiquitous network accesses, resource pooling, elasticity, and measured services [22, 8]. The aforementioned characteristics of cloud computing make it a striking candidate for businesses, organizations, and individual users for adoption [25]. However, the benefits of low-cost, negligible management (from a user's perspective), and greater flexibility come with increased security concerns [7].

Security is one of the most crucial aspects among those prohibiting the wide-spread adoption of cloud computing [14, 19]. Cloud security issues may stem from the core technology's implementation (virtual machine (VM) escape, session riding, etc.), from cloud service offerings (structured query language injection, weak authentication schemes, etc.), and from cloud characteristics (data recovery vulnerability, Internet protocol vulnerability, etc.) [5]. For a cloud to be secure, all of the participating entities must be secure. In any given system with multiple units, the highest level of the system's security is equal to the security level of the weakest entity [12]. Therefore, in a cloud, the security of the assets does not solely depend on an individual's security measures [5]. The neighboring entities may provide an opportunity to an attacker to bypass the user's defenses.

The off-site data storage cloud utility requires users to move data into the cloud's virtualized and shared environment, which may result in various security concerns. Pooling and elasticity of a cloud allow the physical resources to be shared among many users [22]. Moreover, the shared resources may be reassigned to other users at some instance of time, which may result in data compromise through data recovery methodologies [22]. Furthermore, a multi-tenant virtualized environment may result in a VM escaping the bounds of the virtual machine monitor (VMM). The escaped VM can interfere with other VMs to gain access to unauthorized data [9]. Similarly, cross-tenant virtualized network access may also compromise data privacy and integrity. Improper media sanitization can also leak customers' private data [5].

The data outsourced to a public cloud must be secured. Unauthorized data access by other users and processes (whether accidental or deliberate) must be prevented [14]. As discussed above, any weak entity can put the whole cloud at risk. In such a scenario, the security mechanism must substantially increase an attacker's effort to retrieve a reasonable amount of data even after a successful intrusion in the cloud. Moreover, the probable amount of loss (as a result of data leakage) must also be minimized.

A cloud must ensure throughput, reliability, and security [15]. A key factor determining the throughput of a cloud that stores data is the data retrieval time [21]. In large-scale systems, the problems of data reliability, data availability, and response time are dealt with through data replication strategies [3]. However, placing replicas of data over a number of nodes increases the attack surface for that particular data. For instance, storing m replicas of a file in a cloud instead of one replica increases the probability of a node holding the file being chosen as an attack victim from $1/n$ to $m/n$, where n is the total number of nodes.

From the above discussion, we can deduce that both security and performance are critical for the next generation large-scale systems, such as clouds. Therefore, in this paper, we collectively approach the issue of security and performance as a secure data replication problem. We present Division and Replication of Data in the Cloud for Optimal Performance and Security (DROPS) that judiciously fragments user files into pieces and replicates them at strategic locations within the cloud. The division of a file into fragments is performed based on given user criteria such that the individual fragments do not contain any meaningful information. Each of the cloud nodes (we use the term node to represent computing, storage, physical, and virtual machines) contains a distinct fragment to increase the data security. A successful attack on a single node must not reveal the locations of other fragments within the cloud. To keep an attacker uncertain about the locations of the file fragments and to further improve the security, we select the nodes in a manner that they are not adjacent and are at a certain distance from each other. The node separation is ensured by means of T-coloring [6]. To improve data retrieval time, the nodes are selected based on the centrality measures that ensure an improved access time. To further improve the retrieval time, we judiciously replicate fragments over the nodes that generate the highest read/write requests. The selection of the nodes is performed in two phases. In the first phase, the nodes are selected for the initial placement of the fragments based on the centrality measures. In the second phase, the nodes are selected for replication. The working of the DROPS methodology is shown as a high-level work flow in Fig. 1.

Fig. 1: The DROPS methodology

We implement ten heuristics-based replication strategies as comparative techniques to the DROPS methodology. The implemented replication strategies are: (a) A-star based searching technique for data replication problem (DRPA-star), (b) weighted A-star (WA-star), (c) A-star, (d) suboptimal A-star1 (SA1), (e) suboptimal A-star2 (SA2), (f) suboptimal A-star3 (SA3), (g) Local Min-Min, (h) Global Min-Min, (i) Greedy algorithm, and (j) Genetic Replication Algorithm (GRA). The aforesaid strategies are fine-grained replication techniques that determine the number and locations of the replicas for improved system performance. For our studies, we use three Data Center Network (DCN) architectures, namely: (a) Three tier, (b) Fat tree, and (c) DCell. We use the aforesaid architectures because they constitute the modern cloud infrastructures and the DROPS methodology is proposed to work for the cloud computing paradigm.

Our major contributions in this paper are as follows:

● We develop a scheme for outsourced data that takes into account both the security and performance. The proposed scheme fragments and replicates the data file over cloud nodes.
● The proposed DROPS scheme ensures that even in the case of a successful attack, no meaningful information is revealed to the attacker.
● We do not rely on traditional cryptographic techniques for data security. The non-cryptographic nature of the proposed scheme makes it faster to perform the required operations (placement and retrieval) on the data.
● We ensure a controlled replication of the file fragments, where each of the fragments is replicated only once for the purpose of improved security.

The remainder of the paper is organized as follows. Section 2 provides an overview of the related work in the field. In Section 3, we present the preliminaries. The DROPS methodology is introduced in Section 4. Section 5 explains the experimental setup and results, and Section 6 concludes the paper.

2 RELATED WORK

Juels et al. [10] presented a technique to ensure the integrity, freshness, and availability of data in a cloud. The data migration to the cloud is performed by the Iris file system. A gateway application is designed and employed in the organization that ensures the integrity and freshness of the data using a Merkle tree. The file blocks, MAC codes, and version numbers are stored at various levels of the tree. The proposed technique in [10] heavily depends on the user's employed scheme for data confidentiality. Moreover, the probable amount of loss in case of data tampering as a result of intrusion or access by other VMs cannot be decreased. Our proposed strategy does not depend on the traditional cryptographic techniques for data security. Moreover, the DROPS methodology does not store the whole file on a single node to avoid compromise of all of the data in case of a successful attack on the node.

The authors in [11] approached the virtualized and multi-tenancy related issues in the cloud storage by utilizing the consolidated storage and native access control. The Dike authorization architecture is proposed that combines the native access control and the tenant name space isolation. The proposed system is designed and works for object based file systems. However, the leakage of critical information in case of improper sanitization and malicious VM is not handled. The DROPS methodology handles the leakage of critical information by fragmenting the data file and using multiple nodes to store a single file.

The use of a trusted third party for providing security services in the cloud is advocated in [22]. The authors used the public key infrastructure (PKI) to enhance the level of trust in the authentication, integrity, and confidentiality of data and the communication between the involved parties. The keys are generated and managed by the certification authorities. At the user level, the use of tamper-proof devices, such as smart cards, was proposed for the storage of the keys. Similarly, Tang et al. have utilized the public key cryptography and trusted third party for providing data security in cloud environments [20]. However, the authors in [20] have not used the PKI infrastructure to reduce the overheads. The trusted third party is responsible for the generation and management of public/private keys. The trusted third party may be a single server or multiple servers. The symmetric keys are protected by combining the public key cryptography and the (k, n) threshold secret sharing schemes. Nevertheless, such schemes do not protect the data files against tampering and loss due to issues arising from virtualization and multi-tenancy.

A secure and optimal placement of data objects in a distributed system is presented in [21]. An encryption key is divided into n shares and distributed on different sites within the network. The division of a key into n shares is carried out through the (k, n) threshold secret sharing scheme. The network is divided into clusters. The number of replicas and their placement is determined through heuristics. A primary site is selected in each of the clusters that allocates the replicas within the cluster. The scheme presented in [21] combines the replication problem with security and access time improvement. Nevertheless, the scheme focuses only on the security of the encryption key. The data files are not fragmented and are handled as a single file. The DROPS methodology, on the other hand, fragments the file and stores the fragments on multiple nodes. Moreover, the DROPS methodology focuses on the security of the data within the cloud computing domain that is not considered in [21].

3 PRELIMINARIES

Before we go into the details of the DROPS methodology, we introduce the related concepts in the following for the ease of the readers.

3.1 Data Fragmentation

The security of a large-scale system, such as a cloud, depends on the security of the system as a whole and the security of individual nodes. A successful intrusion into a single node may have severe consequences, not only for data and applications on the victim node, but also for the other nodes. The data on the victim node may be revealed fully because of the presence of the whole file [17]. A successful intrusion may be a result of some software or administrative vulnerability [17]. In case of homogeneous systems, the same flaw can be utilized to target other nodes within the system. The success of an attack on the subsequent nodes will require less effort as compared to the effort on the first node. Comparatively, more effort is required for heterogeneous systems. However, compromising a single file will require the effort to penetrate only a single node. The amount of compromised data can be reduced by making fragments of a data file and storing them on separate nodes [17, 21]. A successful intrusion on a single or few nodes will only provide access to a portion of data that might not be of any significance. Moreover, if an attacker is uncertain about the locations of the fragments, the probability of finding fragments on all of the nodes is very low. Let us consider a cloud with M nodes and a file with z number of fragments. Let s be the number of successful intrusions on distinct nodes, such that s > z. The probability that s number of victim nodes contain all of the z sites storing the file fragments (represented by P(s, z)) is given as:

$$P(s, z) = \frac{\binom{s}{z}\binom{M-s}{s-z}}{\binom{M}{s}}. \tag{1}$$

If M = 30, s = 10, and z = 7, then P(10, 7) = 0.0046. However, if we choose M = 50, s = 20, and z = 15, then P(20, 15) = 0.000046. With the increase in M, the probability of such a state reduces further. Therefore, we can say that the greater the value of M, the less probable it is that an attacker will obtain the data file. In cloud systems with thousands of nodes, the probability for an attacker to obtain a considerable amount of data reduces significantly. However, placing each fragment once in the system will increase the data retrieval time. To improve the data retrieval time, fragments can be replicated in a manner that reduces retrieval time to an extent that does not increase the aforesaid probability.

3.2 Centrality

The centrality of a node in a graph provides the measure of the relative importance of a node in the network. The objective of improved retrieval time in replication makes the centrality measures more important. There are various centrality measures; for instance, closeness centrality, degree centrality, betweenness centrality, eccentricity centrality, and eigenvector centrality. We only elaborate on the closeness, betweenness, and eccentricity centralities because we are using the aforesaid three centralities in this work. For the remainder of the centralities, we encourage the readers to review [24].

3.2.1 Betweenness Centrality

The betweenness centrality of a node n is the number of the shortest paths, between other nodes, passing through n [24]. Formally, the betweenness centrality of any node v in a network is given as:

$$C_b(v) = \sum_{a \neq v \neq b} \frac{\delta_{ab}(v)}{\delta_{ab}}, \tag{2}$$

where $\delta_{ab}$ is the total number of shortest paths between a and b, and $\delta_{ab}(v)$ is the number of shortest paths between a and b passing through v. The variable $C_b(v)$ denotes the betweenness centrality for node v.

3.2.2 Closeness Centrality

A node is said to be closer with respect to all of the other nodes within a network, if the sum of the distances from all of the other nodes is lower than the sum of the distances of other candidate nodes from all of the other nodes [24]. The lower the sum of distances from the other nodes, the more central is the node. Formally, the closeness centrality of a node v in a network is defined as:

$$C_c(v) = \frac{N - 1}{\sum_{a \neq v} d(v, a)}, \tag{3}$$

where N is the total number of nodes in a network and $d(v, a)$ represents the distance between node v and node a.

3.2.3 Eccentricity

The eccentricity of a node n is the maximum distance to any node from the node n [24]. A node is more central in the network, if it is less eccentric. Formally, the eccentricity can be given as:

$$E(v_a) = \max_b d(v_a, v_b), \tag{4}$$

where $d(v_a, v_b)$ represents the distance between node $v_a$ and node $v_b$. It may be noted that in our evaluation of the strategies the centrality measures introduced above seem more meaningful and relevant than simple hop-count metrics.

3.3 T-coloring

Suppose we have a graph G = (V, E) and a set T containing non-negative integers including 0. The T-coloring is a mapping function f from the vertices of V to the set of non-negative integers, such that $|f(x) - f(y)| \notin T$, where $(x, y) \in E$. The mapping function f assigns a color to a vertex. In simple words, the distance between the colors of the adjacent vertices must not belong to T. Formulated by Hale [6], the T-coloring problem for channel assignment assigns channels to the nodes, such that the channels are separated by a distance to avoid interference.

TABLE 1: Notations and their meanings

Symbol         Meaning
M              Total number of nodes in the cloud
N              Total number of file fragments to be placed
$O_k$          k-th fragment of file
$o_k$          Size of $O_k$
$S^i$          i-th node
$s_i$          Size of $S^i$
$cen_i$        Centrality measure for $S^i$
$col_{S^i}$    Color assigned to $S^i$
T              A set containing distances by which assignment of fragments must be separated
$r_k^i$        Number of reads for $O_k$ from $S^i$
$R_k^i$        Aggregate read cost of $r_k^i$
$w_k^i$        Number of writes for $O_k$ from $S^i$
$W_k^i$        Aggregate write cost of $w_k^i$
$NN_k^i$       Nearest neighbor of $S^i$ holding $O_k$
$c(i, j)$      Communication cost between $S^i$ and $S^j$
$P_k$          Primary node for $O_k$
$R_k$          Replication schema of $O_k$
RT             Replication time

4 DROPS

4.1 System Model

Consider a cloud that consists of M nodes, each with its own storage capacity. Let $S^i$ represent the name of the i-th node and $s_i$ denote the total storage capacity of $S^i$. The communication time between $S^i$ and $S^j$ is the total time of all of the links within a selected path from $S^i$ to $S^j$, represented by $c(i, j)$.
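To make the preliminaries concrete, the following sketch computes hop distances on a small node graph by breadth-first search and derives the closeness (Eq. 3), eccentricity (Eq. 4), and betweenness (Eq. 2) of every node, together with a check of the T-coloring condition of Section 3.3. The five-node topology, the function names, and the use of unit-cost hops in place of measured link times c(i, j) are illustrative assumptions, not part of the paper; betweenness is summed over unordered node pairs, which is also an assumption about how Eq. (2) is read.

```python
from collections import deque
from itertools import combinations


def bfs(graph, source):
    """Return hop distances and shortest-path counts from `source`."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:              # w is reached for the first time
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def centralities(graph):
    """Closeness (Eq. 3), eccentricity (Eq. 4), betweenness (Eq. 2) per node.

    Assumes an undirected, connected graph {node: [neighbors]} with >= 2 nodes.
    """
    nodes = list(graph)
    info = {v: bfs(graph, v) for v in nodes}
    closeness, eccentricity, betweenness = {}, {}, {}
    for v in nodes:
        dist_v, sigma_v = info[v]
        closeness[v] = (len(nodes) - 1) / sum(dist_v[a] for a in nodes if a != v)
        eccentricity[v] = max(dist_v.values())
        total = 0.0
        for a, b in combinations([u for u in nodes if u != v], 2):
            dist_a, sigma_a = info[a]
            # v lies on a shortest a-b path iff d(a, v) + d(v, b) = d(a, b).
            if dist_a[v] + dist_v[b] == dist_a[b]:
                total += sigma_a[v] * sigma_v[b] / sigma_a[b]
        betweenness[v] = total
    return closeness, eccentricity, betweenness


def is_t_coloring(graph, color, T):
    """True if |f(x) - f(y)| is not in T for every edge (x, y)."""
    return all(abs(color[x] - color[y]) not in T for x in graph for y in graph[x])


# Hypothetical five-node topology, used only for illustration.
example_graph = {
    "S1": ["S2", "S3"],
    "S2": ["S1", "S4", "S5"],
    "S3": ["S1"],
    "S4": ["S2"],
    "S5": ["S2"],
}

if __name__ == "__main__":
    closeness, eccentricity, betweenness = centralities(example_graph)
    for v in example_graph:
        print(v, round(closeness[v], 3), eccentricity[v], betweenness[v])
    colors = {"S1": 0, "S2": 4, "S3": 8, "S4": 0, "S5": 8}
    print(is_t_coloring(example_graph, colors, {0, 1, 2, 3}))
```

In this toy topology the node with the most neighbors has the highest closeness and betweenness and the lowest eccentricity, which is the kind of node the centrality-based selection would favor.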

We consider N number of file fragments such that $O_k$ denotes the k-th fragment of a file while $o_k$ represents the size of the k-th fragment. Let the total read and write requests from $S^i$ for $O_k$ be represented by $r_k^i$ and $w_k^i$, respectively. Let $P_k$ denote the primary node that stores the primary copy of $O_k$. The replication scheme for $O_k$, denoted by $R_k$, is also stored at $P_k$. Moreover, every $S^i$ contains a two-field record, storing $P_k$ for $O_k$ and $NN_k^i$ that represents the nearest node storing $O_k$. Whenever there is an update in $O_k$, the updated version is sent to $P_k$ that broadcasts the updated version to all of the nodes in $R_k$. Let $b(i, j)$ and $t(i, j)$ be the total bandwidth of the link and the traffic between sites $S^i$ and $S^j$, respectively. The centrality measure for $S^i$ is represented by $cen_i$. Let $col_{S^i}$ store the value of the color assigned to $S^i$. The $col_{S^i}$ can have one out of two values, namely: open color and close color. The value open color represents that the node is available for storing the file fragment. The value close color shows that the node cannot store the file fragment. Let T be a set of integers starting from zero and ending on a prespecified number. If the selected number is three, then T = {0, 1, 2, 3}. The set T is used to restrict the node selection to those nodes that are at hop-distances not belonging to T. For the ease of reading, the most commonly used notations are listed in Table 1.

Our aim is to minimize the overall total network transfer time or replication time (RT), also termed as replication cost (RC). The RT is composed of two factors: (a) time due to read requests and (b) time due to write requests. The total read time of $O_k$ by $S^i$ from $NN_k^i$ is denoted by $R_k^i$ and is given by:

$$R_k^i = r_k^i \, o_k \, c(i, NN_k^i). \tag{5}$$

The total time due to the writing of $O_k$ by $S^i$ addressed to the $P_k$ is represented as $W_k^i$ and is given by:

$$W_k^i = w_k^i \, o_k \Big( c(i, P_k) + \sum_{(j \in R_k),\, j \neq i} c(P_k, j) \Big). \tag{6}$$

The overall RT is represented by:

$$RT = \sum_{i=1}^{M} \sum_{k=1}^{N} \left( R_k^i + W_k^i \right). \tag{7}$$

The storage capacity constraint states that a file fragment can only be assigned to a node if the storage capacity of the node is greater than or equal to the size of the fragment. The bandwidth constraint states that $b(i, j) \geq t(i, j) \; \forall i, \forall j$. The DROPS methodology assigns the file fragments to the nodes in a cloud such that the RT is minimized, subject to capacity and bandwidth constraints.

4.2 DROPS

In a cloud environment, a file in its totality, stored at a node, leads to a single point of failure [17]. A successful attack on a node might put the data confidentiality or integrity, or both, at risk. The aforesaid scenario can occur both in the case of intrusion or accidental errors. In such systems, performance in terms of retrieval time can be enhanced by employing replication strategies. However, replication increases the number of file copies within the cloud, thereby increasing the probability of the node holding the file to be a victim of attack as discussed in Section 1. Security and replication are essential for a large-scale system, such as a cloud, as both are utilized to provide services to the end user. Security and replication must be balanced such that one service must not lower the service level of the other.

In the DROPS methodology, we propose not to store the entire file at a single node. The DROPS methodology fragments the file and makes use of the cloud for replication. The fragments are distributed such that no node in a cloud holds more than a single fragment, so that even a successful attack on the node leaks no significant information. The DROPS methodology uses controlled replication where each of the fragments is replicated only once in the cloud to improve the security. Although the controlled replication does not improve the retrieval time to the level of full-scale replication, it significantly improves the security.

In the DROPS methodology, the user sends the data file to the cloud. The cloud manager system (a user-facing server in the cloud that entertains users' requests), upon receiving the file, performs: (a) fragmentation, (b) a first cycle of node selection that stores one fragment over each of the selected nodes, and (c) a second cycle of node selection for fragment replication. The cloud manager keeps a record of the fragment placement and is assumed to be a secure entity.

The fragmentation threshold of the data file is specified to be generated by the file owner. The file owner can specify the fragmentation threshold in terms of either percentage or the number and size of different fragments. The percentage fragmentation threshold, for instance, can dictate that each fragment will be of 5% size of the total size of the file. Alternatively, the owner may generate a separate file containing information about the fragment number and size, for instance, fragment 1 of size 5,000 Bytes, fragment 2 of size 8,749 Bytes. We argue that the owner of the file is the best candidate to generate the fragmentation threshold. The owner can best split the file such that each fragment does not contain a significant amount of information, as the owner is cognizant of all the facts pertaining to the data. The default percentage fragmentation threshold can be made a part of the Service Level Agreement (SLA), if the user does not specify the fragmentation threshold while uploading the data file. We primarily focus on the storage system security in this work with an assumption that the communication channel between the user and the cloud is secure.
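The first two steps performed by the cloud manager, fragmentation under a percentage threshold and the first cycle of node selection, can be sketched as follows. This is a hedged illustration in the spirit of the placement procedure (Algorithm 1), not the authors' implementation: the function names, the choice to pick the most central node among those still open and with enough capacity, the caller-supplied centrality values, and the decision to skip fragments that cannot be placed are all assumptions made here.

```python
from collections import deque


def fragment_by_percentage(data, percent):
    """Split `data` (bytes) into fragments of about `percent` % of its size."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    size = max(1, round(len(data) * percent / 100))
    return [data[i:i + size] for i in range(0, len(data), size)]


def hop_distances(graph, source):
    """Breadth-first hop distances from `source` in {node: [neighbors]}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def place_fragments(graph, centrality, capacity, fragments, T):
    """Greedy placement sketch: returns {node: fragment_index}.

    Every node starts with the open color. After a fragment is placed, the
    chosen node and all nodes at a hop distance belonging to T are closed.
    Fragments with no eligible node are skipped (an assumption of this sketch).
    """
    open_nodes = set(graph)
    free = dict(capacity)                    # remaining storage per node
    placement = {}
    for k, frag in enumerate(fragments):
        candidates = sorted(v for v in open_nodes if free[v] >= len(frag))
        if not candidates:
            continue
        # Most central eligible node; ties resolve to the smallest node id.
        node = max(candidates, key=lambda v: centrality[v])
        placement[node] = k
        free[node] -= len(frag)
        dist = hop_distances(graph, node)
        open_nodes -= {v for v, d in dist.items() if d in T}
        open_nodes.discard(node)             # closed even if 0 is not in T
    return placement


if __name__ == "__main__":
    # Hypothetical seven-node line topology with made-up centrality values.
    graph = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
    centrality = {i: 3 - abs(i - 3) for i in graph}
    capacity = {i: 100 for i in graph}
    fragments = fragment_by_percentage(bytes(range(100)), 25)
    print(place_fragments(graph, centrality, capacity, fragments, {0, 1}))
```

With a small graph and a non-trivial T, some fragments may find no open node, which mirrors the observation that T-coloring removes candidate nodes in exchange for separation between fragments.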

Algorithm 1 Algorithm for fragment placement
Inputs and initializations:
    O = {O_1, O_2, ..., O_N}
    o = {sizeof(O_1), sizeof(O_2), ..., sizeof(O_N)}
    col = {open_color, close_color}
    cen = {cen_1, cen_2, ..., cen_M}
    col ← open_color ∀ i
    cen ← cen_i ∀ i
Compute:
    for each O_k ∈ O do
        select S^i | S^i ← indexof(max(cen_i))
        if col_{S^i} = open_color and s_i >= o_k then
            S^i ← O_k
            s_i ← s_i − o_k
            col_{S^i} ← close_color
            S^i' ← distance(S^i, T)    ▷ returns all nodes at distance T from S^i and stores them in temporary set S^i'
            col_{S^i'} ← close_color
        end if
    end for

Once the file is split into fragments, the DROPS methodology selects the cloud nodes for fragment placement. The selection is made by keeping an equal focus on both security and performance in terms of the access time. We choose the nodes that are most central to the cloud network to provide better access time. For the aforesaid purpose, the DROPS methodology uses the concept of centrality to reduce access time. The centralities determine how central a node is based on different measures as discussed in Section 3.2. We implement DROPS with three centrality measures, namely: (a) betweenness, (b) closeness, and (c) eccentricity centrality. However, if all of the fragments are placed on the nodes based on the descending order of centrality, then there is a possibility that adjacent nodes are selected for fragment placement. Such a placement can provide clues to an attacker as to where other fragments might be present, reducing the security level of the data. To deal with the security aspects of placing fragments, we use the concept of T-coloring that was originally used for the channel assignment problem [6]. We generate a non-negative random number and build the set T starting from zero to the generated random number. The set T is used to restrict the node selection to those nodes that are at hop-distances not belonging to T. For the said purpose, we assign colors to the nodes, such that, initially, all of the nodes are given the open color. Once a fragment is placed on the node, all of the nodes within the neighborhood at a distance belonging to T are assigned close color. In the aforesaid process, we lose some of the central nodes, which may increase the retrieval time, but we achieve a higher security level. If somehow the intruder compromises a node and obtains a fragment, then the location of the other fragments cannot be determined. The attacker can only keep on guessing the location of the other fragments. However, as stated previously in Section 3.1, the probability of a successful coordinated attack is extremely low. The process is repeated until all of the fragments are placed at the nodes. Algorithm 1 represents the fragment placement methodology.

In addition to placing the fragments on the central nodes, we also perform a controlled replication to increase the data availability and reliability, and to improve data retrieval time. We place the fragment on the node that provides the decreased access cost with an objective to improve retrieval time for accessing the fragments for reconstruction of the original file. While replicating the fragment, the separation of fragments through T-coloring, as explained in the placement technique, is also taken care of. In case of a large number of fragments or a small number of nodes, it is also possible that some of the fragments are left without being replicated because of the T-coloring. As discussed previously, T-coloring prohibits storing a fragment in the neighborhood of a node storing a fragment, resulting in the elimination of a number of nodes to be used for storage. In such a case, only for the remaining fragments, the nodes that are not holding any fragment are selected for storage randomly. The replication strategy is presented in Algorithm 2.

To handle the download request from a user, the cloud manager collects all the fragments from the nodes and reassembles them into a single file. Afterwards, the file is sent to the user.

Algorithm 2 Algorithm for fragment's replication
    for each O_k in O do
        select S^i that has max(R_k^i + W_k^i)
        if col_{S^i} = open_color and s_i >= o_k then
            S^i ← O_k
            s_i ← s_i − o_k
            col_{S^i} ← close_color
            S^i' ← distance(S^i, T)    ▷ returns all nodes at distance T from S^i and stores them in temporary set S^i'
            col_{S^i'} ← close_color
        end if
    end for

4.3 Discussion

A node is compromised with a certain amount of an attacker's effort. If the compromised node stores the data file in totality, then a successful attack on a cloud node will result in compromise of an entire data file. However, if the node stores only a fragment of a file, then a successful attack reveals only a fragment of a data file. Because the DROPS methodology stores fragments of data files over distinct nodes, an attacker has to compromise a large number of nodes to obtain meaningful information. The number of compromised

TABLE 2: Various attacks handled by DROPS methodology

Attack                        Description
Data Recovery                 Rollback of VM to some previous state. May expose previously stored data.
Cross VM attack               Malicious VM attacking co-resident VM that may lead to data breach.
Improper media sanitization   Data exposure due to improper sanitization of storage devices.
E-discovery                   Data exposure of one user due to seized hardware for investigations related to some other users.
VM escape                     A malicious user or VM escapes from the control of VMM. Provides access to storage and compute devices.
VM rollback                   Rollback of VM to some previous state. May expose previously stored data.
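
The T-coloring placement rule described in Section 4.2 can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the graph is an adjacency dictionary, the centrality values are supplied by the caller, ties are broken by the smallest node id, and the names `hop_distances` and `place_fragments` are hypothetical.

```python
from collections import deque


def hop_distances(adj, src):
    """Breadth-first search: hop distance from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def place_fragments(adj, centrality, num_fragments, T):
    """Place one fragment per node, most central open node first.

    After each placement, every node whose hop distance from the chosen
    node belongs to T receives the close color (is removed from the open
    set). Tie-breaking by smallest node id is an assumption of this sketch.
    """
    open_nodes = set(adj)  # initially every node has the open color
    placement = []
    for _ in range(num_fragments):
        if not open_nodes:
            break  # no admissible node is left
        # max() keeps the first maximal element, so sorting breaks ties by id
        node = max(sorted(open_nodes), key=lambda v: centrality[v])
        placement.append(node)
        dist = hop_distances(adj, node)
        open_nodes -= {v for v, d in dist.items() if d in T}
        open_nodes.discard(node)  # a node never stores two fragments
    return placement


# Usage: a path of seven nodes with hand-picked centrality values and T = {0, 1}
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
cent = {0: 1, 1: 2, 2: 3, 3: 4, 4: 3, 5: 2, 6: 1}
chosen = place_fragments(path, cent, 3, {0, 1})
```

With T = {0, 1}, no two chosen nodes are adjacent, at the price of skipping nodes 2 and 4 even though they are more central than the nodes selected after node 3.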

nodes must be greater than n because not every compromised node yields a fragment in the DROPS methodology, as the nodes are separated based on the T-coloring. Alternatively, an attacker has to compromise the authentication system of the cloud [23]. The effort required by an attacker to compromise a node (in systems dealing with fragments/shares of data) is given in [23] as:

$$E_{Conf} = \min(E_{Auth},\; n \times E_{BreakIn}), \qquad (8)$$

where $E_{Conf}$ is the effort required to compromise the confidentiality, $E_{Auth}$ is the effort required to compromise authentication, and $E_{BreakIn}$ is the effort required to compromise a single node. Our focus in this paper is on the security of the data in the cloud and we do not take into account the security of the authentication system. Therefore, we can say that to obtain n fragments, the effort of an attacker increases by a factor of n. Moreover, in the case of the DROPS methodology, the attacker must correctly guess the nodes storing the fragments of the file. Therefore, in the worst case scenario, the set of nodes compromised by the attacker will contain all of the nodes storing the file fragments. From Equation (1), we observe that the probability of the worst case occurring is very low. The probability that some of the machines storing the file fragments will be selected (average case) is high in comparison to the worst case probability. However, the compromised fragments will not be enough to reconstruct the whole data. In terms of probability, the worst, average, and best cases depend on the number of nodes storing fragments that are selected for an attack. Therefore, all of the three cases are captured by Equation (1).
Besides the general attack of a compromised node, the DROPS methodology can handle attacks in which the attacker gets hold of user data by avoiding or disrupting security defenses. Table 2 presents some of the attacks that are handled by the DROPS methodology, together with a brief description of each. The presented attacks are cloud-specific and stem from the cloud's core technologies. It is noteworthy that even in the case of the successful attacks mentioned, the DROPS methodology ensures that the attacker gets only a fragment of the file, as the DROPS methodology stores only a single fragment on a node. Moreover, the successful attack has to be on the node that stores the fragment.

5 EXPERIMENTAL SETUP AND RESULTS
The communicational backbone of cloud computing is the Data Center Network (DCN) [2]. In this paper, we use three DCN architectures, namely: (a) Three tier, (b) Fat tree, and (c) Dcell [1]. The Three tier is the legacy DCN architecture. However, to meet the growing demands of cloud computing, the Fat tree and Dcell architectures were proposed [2]. Therefore, we use the aforementioned three architectures to evaluate the performance of our scheme on legacy as well as state-of-the-art architectures. The Fat tree and Three tier architectures are switch-centric networks. The nodes are connected to the access layer switches. Multiple access layer switches are connected using aggregate layer switches. Core layer switches interconnect the aggregate layer switches. The Dcell is a server-centric network architecture that uses servers in addition to switches to perform the communication process within the network [1]. A server in the Dcell architecture is connected to other servers and a switch. The lower level dcells recursively build the higher level dcells. The dcells at the same level are fully connected. For details about the aforesaid architectures and their performance analysis, the readers are encouraged to read [1] and [2].

5.1 Comparative techniques
We compared the results of the DROPS methodology with fine-grained replication strategies, namely: (a) DRPA-star, (b) WA-star, (c) Aε-star, (d) SA1, (e) SA2, (f) SA3, (g) Local Min-Min (LMM), (h) Global Min-Min (GMM), (i) Greedy algorithm, and (j) Genetic Replication Algorithm (GRA). The DRPA-star is a data replication algorithm based on the A-star best-first search algorithm. The DRPA-star starts from the null solution, which is called the root node. The communication cost at each node n is computed as: cost(n) = g(n) + h(n), where g(n) is the path cost for reaching n and h(n) is called the heuristic cost and is the estimate of the cost from n to the goal node. The DRPA-star searches all of the solutions of allocating a fragment to a node. The solution that minimizes the cost within the constraints is explored while the others are discarded. The selected solution is inserted into a list called

the OPEN list. The list is kept in ascending order so that the solution with the minimum cost is expanded first. The heuristic used by the DRPA-star is given as $h(n) = \max(0, mmk(n) - g(n))$, where mmk(n) is the least cost replica allocation or the max-min RC. Readers are encouraged to see the details about DRPA-star in [13]. The WA-star is a refinement of the DRPA-star that implements a weighted function to evaluate the cost. The function is given as: $f(n) = g(n) + h(n) + \epsilon (1 - d(n)/D)\, h(n)$. The variable d(n) represents the depth of the node n and D denotes the expected depth of the goal node [13]. The Aε-star is also a variation of the DRPA-star that uses two lists, OPEN and FOCAL. The FOCAL list contains only those nodes from the OPEN list whose f does not exceed the lowest f by more than a factor of 1 + ε. The node expansion is performed from the FOCAL list instead of the OPEN list. Further details about WA-star and Aε-star can be found in [13]. The SA1 (sub-optimal assignments), SA2, and SA3 are DRPA-star based heuristics. In SA1, at level R or below, only the best successors of node n having the least expansion cost are selected. The SA2 selects the best successors of node n only the first time it reaches the depth level R. All other successors are discarded. The SA3 works similarly to the SA2, except that all of the nodes are removed from the OPEN list apart from the one with the lowest cost. Readers are encouraged to read [13] for further details about SA1, SA2, and SA3. The Local Min-Min (LMM) can be considered as a special case of the bin packing algorithm. The LMM sorts the file fragments based on the RC of the fragments to be stored at a node. The LMM then assigns the fragments in ascending order. In case of a tie, the file fragment with the minimum size is selected for assignment (the name local Min-Min is derived from this policy). The Global Min-Min (GMM) selects the file fragment with the global minimum of all the RC associated with a file fragment. In case of a tie, the file fragment is selected at random. The Greedy algorithm first iterates through all of the M cloud nodes to find the best node for allocating a file fragment. The node with the lowest replication cost is selected. The second node for the fragment is selected in the second iteration. However, in the second iteration, the node selected is the one that produces the lowest RC in combination with the node already selected. The process is repeated for all of the file fragments. Details of the greedy algorithm can be found in [18]. The GRA consists of chromosomes representing various schemes for storing file fragments over cloud nodes. Every chromosome consists of M genes, each representing a node. Every gene is an N-bit string. If the k-th file fragment is to be assigned to S^i, then the k-th bit of the i-th gene holds the value of one. Genetic algorithms perform the operations of selection, crossover, and mutation. The value for the crossover rate (µ_c) was selected as 0.9, while for the mutation rate (µ_m) the value was 0.01. The use of these values for µ_c and µ_m is advocated in [16]. The best chromosome represents the solution. GRA utilizes a mix and match strategy to reach the solution. More details about GRA can be obtained from [16].

5.2 Workload
The sizes of the files were generated using a uniform distribution between 10 Kb and 60 Kb. The primary nodes were randomly selected for the replication algorithms. For the DROPS methodology, the S^i's selected during the first cycle of node selection by Algorithm 1 were considered as the primary nodes.
The capacity of a node was generated using a uniform distribution between $(\frac{1}{2} CS)\,C$ and $(\frac{3}{2} CS)\,C$, where $0 \le C \le 1$. For instance, for CS = 150 and C = 0.6, the capacities of the nodes were uniformly distributed between 45 and 135. The mean value of g in the OPEN and FOCAL lists was selected as the value of ε for WA-star and Aε-star, respectively. The value for level R was set to ⌊d/2⌋, where d is the depth of the search tree (number of fragments).
The read/write (R/W) ratio for the simulations that used a fixed value was selected to be 0.25 (an R/W ratio reflecting 25% reads and 75% writes within the cloud). The reason for choosing a high workload (lower percentage of reads and higher percentage of writes) was to evaluate the performance of the techniques under extreme cases. The simulations that studied the impact of change in the R/W ratio used various workloads in terms of R/W ratios. The R/W ratios selected were in the range of 0.10 to 0.90. The selected range covered the effect of high, medium, and low workloads with respect to the R/W ratio.

5.3 Results and Discussion
We compared the performance of the DROPS methodology with the algorithms discussed in Section 5.1. The behavior of the algorithms was studied by: (a) increasing the number of nodes in the system, (b) increasing the number of objects while keeping the number of nodes constant, (c) changing the nodes' storage capacity, and (d) varying the read/write ratio. The aforesaid parameters are significant as they affect the problem size and the performance of algorithms [13].

5.3.1 Impact of increase in number of cloud nodes
We studied the performance of the placement techniques and the DROPS methodology by increasing the number of nodes. The performance was studied for the three discussed cloud architectures. The numbers of nodes selected for the simulations were 100, 500, 1,024, 2,400, and 30,000. The number of nodes in the Dcell architecture increases exponentially [2]. For a Dcell architecture with two nodes in the Dcell_0, the architecture consists of 2,400 nodes. However, adding a single node to the Dcell_0 increases the total number of nodes to 30,000 [2]. The number of file fragments

Fig. 2: (a) RC versus number of nodes (Three tier) (b) RC versus number of nodes (Fat tree)
[x-axis: No. of nodes; y-axis: RC savings (%)]
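
The evaluation functions of the search-based techniques in Section 5.1 can be written out as a short sketch. The names `drpa_cost`, `drpa_heuristic`, `wa_star_cost`, and `focal_list` are hypothetical. The sketch assumes the usual dynamic-weighting form in which the weighted function starts from g(n) + h(n), and the usual reading of the FOCAL list as the OPEN entries whose f lies within a factor of 1 + ε of the minimum; neither detail is taken verbatim from [13].

```python
def drpa_cost(g, h):
    """cost(n) = g(n) + h(n): path cost so far plus estimated cost to the goal."""
    return g + h


def drpa_heuristic(mmk, g):
    """h(n) = max(0, mmk(n) - g(n)), with mmk(n) the least cost replica allocation."""
    return max(0, mmk - g)


def wa_star_cost(g, h, depth, goal_depth, eps):
    """Weighted cost: the heuristic term is inflated more near the root
    (small depth) and less as the search approaches the expected goal depth."""
    return g + h + eps * (1.0 - depth / goal_depth) * h


def focal_list(open_list, eps):
    """Return the OPEN entries whose f is within (1 + eps) of the best f.

    open_list is a list of (f, label) pairs; the labels are illustrative only.
    """
    f_min = min(f for f, _ in open_list)
    return [(f, label) for f, label in open_list if f <= (1 + eps) * f_min]


# Usage with made-up numbers
weighted = wa_star_cost(g=10, h=4, depth=1, goal_depth=4, eps=0.5)
focal = focal_list([(10, "a"), (11, "b"), (13, "c")], eps=0.2)
```

A larger ε widens the FOCAL list, which trades solution quality for the freedom to expand nodes that are cheaper to evaluate.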

Fig. 3: (a) RC versus number of nodes (Dcell) (b) RC versus number of nodes for DROPS variations with maximum available capacity constraint (Three tier)
[x-axis: No. of nodes; y-axis: RC savings (%)]
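
The chromosome layout that Section 5.1 describes for GRA (one gene per node, one bit per fragment) can be sketched as follows. This is an assumed, simplified encoding for illustration only: indices are 0-based, `encode`, `decode`, and `mutate` are hypothetical names, and the bit-flip mutation is a generic genetic-algorithm operator rather than the specific operator of [16].

```python
import random


def encode(assignment, num_nodes, num_fragments):
    """Build a chromosome: gene i is the bit string of node i, and bit k of
    gene i is 1 when fragment k is stored on node i."""
    chromosome = [[0] * num_fragments for _ in range(num_nodes)]
    for fragment, node in assignment:
        chromosome[node][fragment] = 1
    return chromosome


def decode(chromosome):
    """Recover the sorted list of (fragment, node) pairs from a chromosome."""
    return sorted(
        (fragment, node)
        for node, gene in enumerate(chromosome)
        for fragment, bit in enumerate(gene)
        if bit
    )


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [
        [bit ^ 1 if rng.random() < rate else bit for bit in gene]
        for gene in chromosome
    ]


# Usage: fragment 0 on node 2, fragment 1 on node 0, mutation rate 0.01
scheme = encode([(0, 2), (1, 0)], num_nodes=3, num_fragments=2)
mutated = mutate(scheme, 0.01, random.Random(0))
```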

Fig. 4: RC versus number of nodes for DROPS variations with maximum available capacity constraints (a) Fat tree (b) Dcell
[x-axis: No. of nodes; y-axis: RC savings (%)]
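
The workload rules of Section 5.2 can be reproduced with a few lines of code. The sketch below is an assumption-laden illustration, not the simulator used in the paper: the function names, the seeded generator, and the treatment of file sizes as plain numbers between 10 and 60 are all choices made here.

```python
import random


def capacity_bounds(cs, c):
    """Lower and upper capacity limits: (1/2 * CS) * C and (3/2 * CS) * C."""
    return 0.5 * cs * c, 1.5 * cs * c


def sample_capacities(num_nodes, cs, c, seed=0):
    """Draw one capacity per node uniformly between the two limits."""
    rng = random.Random(seed)
    low, high = capacity_bounds(cs, c)
    return [rng.uniform(low, high) for _ in range(num_nodes)]


def sample_file_sizes(num_files, low=10, high=60, seed=0):
    """Draw file sizes uniformly between 10 and 60, in the paper's size unit."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(num_files)]


# Usage: the example from the text, CS = 150 and C = 0.6
low, high = capacity_bounds(150, 0.6)
capacities = sample_capacities(1000, 150, 0.6)
sizes = sample_file_sizes(100)
```

For CS = 150 and C = 0.6 the limits evaluate to 45 and 135, which matches the range quoted in the text.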

was set to 50. For the first experiment we used C = 0.2. Fig. 2 (a), Fig. 2 (b), and Fig. 3 (a) show the results for the Three tier, Fat tree, and Dcell architectures, respectively. The reduction in network transfer time for a file is termed as RC. In the figures, BC stands for betweenness centrality, CC for closeness centrality, and EC for eccentricity centrality. The interesting observation is that although all of the algorithms showed a similar trend in performance within a specific architecture, the performance of the algorithms was better in the Dcell architecture than in the Three tier and Fat tree architectures. This is because the Dcell architecture exhibits better inter-node connectivity and robustness [2]. The DRPA-star gave the best solutions compared to the other techniques and registered consistent performance with the increase in the number of nodes. Similarly, WA-star, Aε-star, GRA, Greedy, and SA3 showed almost consistent performance across the various numbers of nodes. The performance of LMM and GMM gradually increased with the number of nodes, since the increase in the number of nodes increased the number of bins. The SA1 and SA2 also showed almost constant performance in all of the three architectures. However, it is important to note that SA2 ended up with a decrease in performance

Fig. 5: (a) RC versus number of file fragments (Three tier) (b) RC versus number of file fragments (Fat tree)
[x-axis: No. of fragments; y-axis: RC savings (%)]

as compared to the initial performance. This may be due to the fact that SA2 only expands the node with minimum cost when it reaches a certain depth for the first time. Such pruning, applied the first time that depth is reached, might have purged nodes that provide better global access time. The DROPS methodology did not employ full-scale replication. Every fragment is replicated only once in the system. The smaller number of replicas of any fragment and the separation of nodes by T-coloring decreased the probability of an attacker finding that fragment. Therefore, the increase in the security level of the data is accompanied by a drop in performance as compared to the comparative techniques discussed in this paper. It is important to note that the DROPS methodology was implemented using three centrality measures, namely: (a) betweenness, (b) closeness, and (c) eccentricity. However, Fig. 2(a) and Fig. 2(b) show only a single plot. Due to the inherent structure of the Three tier and Fat tree architectures, all of the nodes in the network are at the same distance from each other or exist at the same level. Therefore, the centrality measure is the same for all of the nodes. This results in the selection of the same node for storing the file fragment. Consequently, the performance showed the same value and all three lines lie on the same points. However, this is not the case for the Dcell architecture. In the Dcell architecture, nodes have different centrality measures, resulting in the selection of different nodes. It is noteworthy that in Fig. 3(a), the eccentricity centrality performs better than the closeness and betweenness centralities because the nodes with higher eccentricity are located closer to all other nodes within the network. To check the effect of closeness and betweenness centralities, we modified the heuristic presented in Algorithm 1. Instead of selecting the node on the criterion of maximum centrality alone, we selected the node with: (a) maximum centrality and (b) maximum available storage capacity. The results are presented in Fig. 3 (b), Fig. 4 (a), and Fig. 4 (b). It is evident that the eccentricity centrality resulted in the highest performance while the betweenness centrality showed the lowest performance. The reason for this is that nodes with higher eccentricity are closer to all other nodes in the network, which results in a lower RC value for accessing the fragments.

5.3.2 Impact of increase in number of file fragments
The increase in the number of file fragments can strain the storage capacity of the cloud, which in turn may affect the selection of the nodes. To study the impact on performance of an increase in the number of file fragments, we set the number of nodes to 30,000. The numbers of file fragments selected were 50, 100, 200, 300, 400, and 500. The workload was generated with C = 45% to observe the effect of an increased number of file fragments with a fairly reasonable amount of memory and to discern the performance of all the algorithms. The results are shown in Fig. 5 (a), Fig. 5 (b), and Fig. 6 (a) for the Three tier, Fat tree, and Dcell architectures, respectively. It can be observed from the plots that the increase in the number of file fragments reduced the performance of the algorithms, in general. However, the greedy algorithm showed the most improved performance. The LMM showed the highest loss in performance, which is a little above 16%. The loss in performance can be attributed to the storage capacity constraints that prohibited the placement of some fragments at nodes with optimal retrieval time. As discussed earlier, the DROPS methodology produced similar results in the Three tier and Fat tree architectures. However, from the Dcell architecture, it is clear that the DROPS methodology with eccentricity centrality maintains its superiority over the other two centralities.

5.3.3 Impact of increase in storage capacity of nodes
Next, we studied the effect of change in the nodes' storage capacity. A change in the storage capacity of the nodes may affect the number of replicas on a node due to storage capacity constraints. Intuitively, a lower node storage capacity may eliminate some optimal nodes from selection for replication because of violation of storage capacity constraints. The elimination of some nodes may degrade the performance to some extent because a node giving lower access time might be pruned due to non-availability

Fig. 6: (a) RC versus number of file fragments (Dcell) (b) RC versus nodes storage capacity (Three tier)
[x-axes: No. of fragments (a), Node storage capacity (b); y-axis: RC savings (%)]
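
The modified selection rule of Section 5.3.1, which picks the node with maximum centrality and maximum available storage capacity, can be sketched as below. The sketch makes several assumptions that the paper does not state: closeness centrality is used as the example measure, the graph is connected with at least two nodes, available capacity acts as a tie-breaker among equally central nodes, and the names `bfs_distances`, `closeness`, and `pick_node` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, src):
    """Hop distance from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """Closeness centrality: (n - 1) divided by the sum of hop distances.
    Assumes a connected graph with at least two nodes."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}


def pick_node(candidates, centrality, free_capacity, fragment_size):
    """Choose the most central candidate that can hold the fragment,
    preferring the larger free capacity among equally central nodes."""
    feasible = [v for v in candidates if free_capacity[v] >= fragment_size]
    if not feasible:
        return None
    # Rounding keeps floating-point noise from splitting genuine ties
    return max(feasible, key=lambda v: (round(centrality[v], 12), free_capacity[v]))


# Usage: a star with hub 0 and three leaves
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
cent = closeness(star)
best_leaf = pick_node([1, 2, 3], cent, {1: 10, 2: 30, 3: 20}, fragment_size=5)
```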

Fig. 7: (a) RC versus nodes storage capacity (Fat tree) (b) RC versus nodes storage capacity (Dcell)
[x-axis: Node storage capacity; y-axis: RC savings (%)]

Fig. 8: (a) RC versus R/W ratio (Three tier) (b) RC versus R/W ratio (Fat tree)
[x-axis: R/W ratio; y-axis: RC savings (%)]

Fig. 9: RC versus R/W ratio (Dcell)
[x-axis: R/W ratio; y-axis: RC savings (%)]

of enough storage space to store the file fragment. Higher node storage capacity allows full-scale replication of fragments, increasing the performance gain. However, node capacity above a certain level will not change the performance significantly, as replicating the already replicated fragments will not produce a considerable performance increase. If the storage nodes have enough capacity to store the allocated file fragments, then a further increase in the storage capacity of a node cannot cause the fragments to be stored again. Moreover, the T-coloring allows only a single replica to be stored on any node. Therefore, after a certain point, the increase in storage capacity might not affect the performance.
We increased the nodes' storage capacity incrementally from 20% to 40%. The results are shown in Fig. 6 (b), Fig. 7 (a), and Fig. 7 (b). It is observable from

the plots that initially, all of the algorithms showed a significant increase in performance with an increase in the storage capacity. Afterwards, the marginal increase in the performance reduces with the increase in the storage capacity. The DRPA-star, Greedy, WA-star, and Aε-star showed nearly similar performance and recorded higher performance. The DROPS methodology did not show any considerable change in results when compared to the previously discussed experiments (change in number of nodes and files). This is because the DROPS methodology does not perform full-scale replication of file fragments; rather, they are replicated only once and a single node stores only a single fragment. Single-time replication does not require high storage capacity. Therefore, the change in the nodes' storage capacity did not affect the performance of DROPS to a notable extent.

5.3.4 Impact of increase in the read/write ratio
The change in R/W ratio affects the performance of the discussed comparative techniques. An increase in the number of reads would lead to a need for more replicas of the fragments in the cloud. The increased number of replicas decreases the communication cost associated with the reading of fragments. However, the increased number of writes demands that the replicas be placed closer to the primary node. The presence of replicas closer to the primary node results in decreased RC associated with updating replicas. The higher write ratios may increase the traffic on the network for updating the replicas.
Fig. 8 (a), Fig. 8 (b), and Fig. 9 show the performance of the comparative techniques and the DROPS methodology under varying R/W ratios. It is observed that all of the comparative techniques showed an increase in the RC savings up to the R/W ratio of 0.50. The decrease in the number of writes caused the reduction of cost associated with updating the replicas of the fragments. However, all of the comparative techniques showed some decrease in RC savings for R/W ratios above 0.50. This may be attributed to the fact that an increase in the number of reads caused more replicas of fragments, resulting in increased cost of updating the replicas. Therefore, the increased cost of updating replicas offsets the advantage of decreased cost of reading with a higher number of replicas at R/W ratios above 0.50. It is also important to mention that even at higher R/W ratio values the DRPA-star, WA-star, Aε-star, and Greedy algorithms almost maintained their initial RC saving values. The high performance of the aforesaid algorithms is due to the fact that these algorithms focus on the global RC value while replicating the fragments. Therefore, the global perception of these algorithms resulted in high performance. Alternatively, LMM and GMM did not show substantial performance due to their local RC view while assigning a fragment to a node. The SA1, SA2, and SA3 suffered due to their restricted search tree that probably ignored some globally high performing nodes during expansion. The DROPS methodology maintained almost consistent performance, as is observable from the plots. The reason for this is that the DROPS methodology replicates the fragments only once, so varying R/W ratios did not affect the results considerably. However, slight changes in the RC value are observed. This might be because different nodes generate high cost for R/W of fragments with different R/W ratios.
As discussed earlier, the comparative techniques focus on the performance and try to reduce the RC as much as possible. The DROPS methodology, on the other hand, is proposed to collectively approach the security and performance. To increase the security level of the data, the DROPS methodology sacrifices the performance to a certain extent. Therefore, we see a drop in the performance of the DROPS methodology as compared to the discussed comparative techniques. However, the drop in performance is accompanied by a much needed increase in security level.
Moreover, it is noteworthy that the difference in performance level between the DROPS methodology and the comparative techniques is least with the reduced storage capacity of the nodes (see Fig. 6 (b), Fig. 7 (a), and Fig. 7 (b)). The reduced storage capacity prevents the comparative techniques from placing as many replicas as required for optimized performance. A further reduction in the storage capacity will tend to lower the performance of the comparative techniques even more. Therefore, we conclude that the difference in performance level between the DROPS methodology and the comparative techniques is least when the comparative techniques reduce the extensiveness of replication for any reason.

Fig. 10: Fault tolerance level of DROPS
[x-axis: No. of nodes; y-axis: Failed nodes (%); series: Three tier, Fat tree, Dcell]

Because the DROPS methodology reduces the number of replicas, we have also investigated the fault tolerance of the DROPS methodology. If two nodes storing the same file fragment fail, the result will be an incomplete or faulty file. We randomly picked and failed the nodes to check what percentage of failed nodes will result in loss of data or

TABLE 3: Average RC (%) savings for increase in number of nodes

Architecture  DRPA-star  LMM    WA-star  GMM    Aε-star  SA1    SA2    SA3    Greedy  GRA    DROPS-BC  DROPS-CC  DROPS-EC
Three tier    74.70      36.23  72.55    45.62  71.82    59.86  49.09  64.38  69.1    66.1   24.41     24.41     24.41
Fat tree      76.76      38.95  75.22    45.77  73.33    60.89  52.67  68.33  71.64   70.54  23.28     23.28     23.28
Dcell         79.6       44.32  76.51    46.34  76.43    62.03  54.90  71.53  73.09   72.34  23.06     25.16     30.20

TABLE 4: Average RC (%) savings for increase in number of fragments

Architecture  DRPA-star  LMM    WA-star  GMM    Aε-star  SA1    SA2    SA3    Greedy  GRA    DROPS-BC  DROPS-CC  DROPS-EC
Three tier    74.63      40.08  69.69    48.67  68.82    60.29  49.65  62.18  71.25   64.44  23.93     23.93     23.93
Fat tree      75.45      44.33  70.90    52.66  70.58    61.12  51.09  64.64  71.73   66.90  23.42     23.42     23.42
Dcell         76.08      45.90  72.49    52.78  72.33    62.12  50.02  64.66  70.92   69.50  23.17     25.35     28.17

TABLE 5: Average RC (%) savings for increase in storage capacity

Architecture  DRPA-star  LMM    WA-star  GMM    Aε-star  SA1    SA2    SA3    Greedy  GRA    DROPS-BC  DROPS-CC  DROPS-EC
Three tier    72.37      28.26  71.99    40.63  71.19    59.29  48.67  61.83  72.09   63.54  19.89     19.89     19.89
Fat tree      69.19      28.34  70.73    41.99  66.20    60.28  51.29  61.83  69.33   62.16  21.60     21.60     21.60
Dcell         73.57      31.04  71.37    42.41  67.70    60.79  50.42  63.78  69.64   64.03  21.91     22.88     24.68

TABLE 6: Average RC (%) savings for increase in R/W ratio

Architecture  DRPA-star  LMM    WA-star  GMM    Aε-star  SA1    SA2    SA3    Greedy  GRA    DROPS-BC  DROPS-CC  DROPS-EC
Three tier    77.28      32.54  76.32    53.20  75.38    55.13  49.61  59.74  73.64   58.27  24.08     24.08     24.08
Fat tree      76.29      31.47  74.81    52.08  73.37    53.33  49.35  57.87  71.61   57.47  23.68     23.68     23.68
Dcell         78.72      33.66  78.03    55.82  76.47    57.44  52.28  61.94  74.54   60.16  23.32     23.79     24.23
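
The fault tolerance experiment, in which nodes are failed at random until both nodes holding some fragment are down, can be approximated with a small Monte Carlo sketch. It rests on simplifying assumptions that differ from the paper's setup: the two copies of each fragment are placed on distinct nodes chosen uniformly at random (the T-coloring and centrality rules are ignored), the topology plays no role, and `failures_until_loss` and `mean_failed_percentage` are hypothetical names. It requires at least two nodes per fragment.

```python
import random


def failures_until_loss(num_nodes, num_fragments, rng):
    """Fail nodes in random order and return how many failures occur
    before both copies of some fragment are lost."""
    holders = rng.sample(range(num_nodes), 2 * num_fragments)  # distinct nodes
    fragment_on = {node: idx // 2 for idx, node in enumerate(holders)}
    lost_copies = [0] * num_fragments
    order = list(range(num_nodes))
    rng.shuffle(order)
    for count, node in enumerate(order, start=1):
        fragment = fragment_on.get(node)
        if fragment is not None:
            lost_copies[fragment] += 1
            if lost_copies[fragment] == 2:
                return count
    return num_nodes  # not reached when num_fragments >= 1


def mean_failed_percentage(num_nodes, num_fragments, trials=200, seed=0):
    """Average percentage of nodes that fail before data is lost."""
    rng = random.Random(seed)
    total = sum(
        failures_until_loss(num_nodes, num_fragments, rng) for _ in range(trials)
    )
    return 100.0 * total / (trials * num_nodes)


# Usage: 50 fragments, as in the experiment, on 500 and 2,400 nodes
small = mean_failed_percentage(500, 50)
large = mean_failed_percentage(2400, 50)
```

Because placement here is random rather than T-colored, the percentages produced by this sketch are not expected to match Fig. 10; the sketch only shows how such a measurement can be set up.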

selection of two nodes storing the same file fragment. The numbers of nodes used in the aforesaid experiment were 500, 1,024, 2,400, and 30,000. The number of file fragments was set to 50. The results are shown in Fig. 10. As can be seen in Fig. 10, the increase in the number of nodes increases the fault tolerance level. When the number of nodes was sufficiently large, a reasonably high percentage of nodes had to fail at random before data was lost.
We report the average RC (%) savings in Table 3, Table 4, Table 5, and Table 6. The averages are computed over all of the RC (%) savings within a certain class of experiments. Table 3 reveals the average results of all of the experiments conducted to observe the impact of an increase in the number of nodes in the cloud for all of the three discussed cloud architectures. Table 4 depicts the average RC (%) savings for the increase in the number of fragments. Table 5 and Table 6 describe the average results for the increase in the storage capacity and the R/W ratio, respectively. It is evident from the average results that the Dcell architecture showed better results due to its higher connectivity ratio.

6 CONCLUSIONS
We proposed the DROPS methodology, a cloud storage security scheme that collectively deals with the security and performance in terms of retrieval time. The data file was fragmented and the fragments were dispersed over multiple nodes. The nodes were separated by means of T-coloring. The fragmentation and dispersal ensured that no significant information was obtainable by an adversary in case of a successful attack. No node in the cloud stored more than a single fragment of the same file. The performance of the DROPS methodology was compared with full-scale replication techniques. The results of the simulations revealed that the simultaneous focus on the security and performance resulted in an increased security level of data accompanied by a slight performance drop.
Currently with the DROPS methodology, a user has to download the file, update the contents, and upload it again. It is strategic to develop an automatic update mechanism that can identify and update the required fragments only. The aforesaid future work will save

the time and resources utilized in downloading, updating, and uploading the file again. Moreover, the implications of TCP incast for the DROPS methodology, which are relevant to distributed data storage and access, need to be studied.
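The placement rule summarized above (one fragment per node, with the fragment-holding nodes kept apart by T-coloring) can be illustrated with a minimal sketch. It is not the authors' implementation: the names `hop_distances`, `place_fragments`, and `ring`, the toy topology, and the forbidden distance set are assumptions made for illustration, and the `min` tie-break merely stands in for whatever node selection policy DROPS actually applies.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def place_fragments(graph, num_fragments, forbidden):
    """Assign each fragment to a distinct node, skipping nodes whose hop
    distance to an already chosen node lies in `forbidden` (the set T)."""
    open_nodes = set(graph)  # nodes still eligible to receive a fragment
    placement = {}
    for fragment in range(num_fragments):
        if not open_nodes:
            raise ValueError("not enough separated nodes for all fragments")
        # Hypothetical tie-break: lowest node id (assumption, not from the paper).
        chosen = min(open_nodes)
        placement[fragment] = chosen
        open_nodes.discard(chosen)
        # Close every node that sits at a forbidden distance from the chosen node.
        for node, d in hop_distances(graph, chosen).items():
            if d in forbidden:
                open_nodes.discard(node)
    return placement


# Toy 6-node ring; T = {0, 1} forbids reusing a node or its direct neighbors.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
placement = place_fragments(ring, 3, forbidden={0, 1})
print(placement)  # prints {0: 0, 1: 2, 2: 4}
```

In this toy run every pair of fragment-holding nodes is two hops apart, so compromising one node and its immediate neighbors exposes at most one fragment. Requesting a fourth fragment on the same ring raises `ValueError`, which reflects the trade-off between separation distance and the number of fragments a topology can hold.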
Mazhar Ali is currently a PhD student at North Dakota State University, Fargo, ND, USA. His research interests include information security, formal verification, modeling, and cloud computing systems.

Kashif Bilal did his PhD in Electrical and Computer Engineering from the North Dakota State University, USA. His research interests include data center networks, distributed computing, and energy efficiency.

Samee U. Khan is an assistant professor at the North Dakota State University. His research interests include topics such as sustainable computing, social networking, and reliability. He is a senior member of IEEE, and a fellow of IET and BCS.

Bharadwaj Veeravalli is an associate professor at the National University of Singapore. His mainstream research interests include scheduling problems, Cloud/Cluster/Grid computing, Green Storage, and Multimedia computing. He is a senior member of the IEEE.

Keqin Li is a SUNY distinguished professor. His research interests are mainly in the areas of design and analysis of algorithms, parallel and distributed computing, and computer networking. He is a senior member of the IEEE.

Albert Y. Zomaya is currently the chair professor of high performance computing and networking in the School of Information Technologies, The University of Sydney. He is a fellow of IEEE, IET, and AAAS.
