DROPS: Division and Replication of Data in Cloud For Optimal Performance and Security
Abstract—Outsourcing data to a third-party administrative control, as is done in cloud computing, gives rise to security concerns.
The data compromise may occur due to attacks by other users and nodes within the cloud. Therefore, high security measures are
required to protect data within the cloud. However, the employed security strategy must also take into account the optimization
of the data retrieval time. In this paper, we propose Division and Replication of Data in the Cloud for Optimal Performance and
Security (DROPS) that collectively approaches the security and performance issues. In the DROPS methodology, we divide a
file into fragments and replicate the fragmented data over the cloud nodes. Each node stores only a single fragment
of a particular data file, which ensures that even in the case of a successful attack, no meaningful information is revealed to the
attacker. Moreover, the nodes storing the fragments are separated by a certain distance by means of graph T-coloring to prevent
an attacker from guessing the locations of the fragments. Furthermore, the DROPS methodology does not rely on the traditional
cryptographic techniques for the data security; thereby relieving the system of computationally expensive methodologies. We
show that the probability to locate and compromise all of the nodes storing the fragments of a single file is extremely low. We
also compare the performance of the DROPS methodology with ten other schemes. A higher level of security with a slight
performance overhead was observed.
If M = 30, s = 10, and z = 7, then P(10, 7) = 0.0046. However, if we choose M = 50, s = 20, and z = 15, then P(20, 15) = 0.000046. With the increase in M, the probability of a state reduces further. Therefore, we can say that the greater the value of M, the less probable it is that an attacker will obtain the data file. In cloud systems with thousands of nodes, the probability for an attacker to obtain a considerable amount of data reduces significantly. However, placing each fragment once in the system will increase the data retrieval time. To improve the data retrieval time, fragments can be replicated in a manner that reduces retrieval time to an extent that does not increase the aforesaid probability.

TABLE 1: Notations and their meanings
Symbols      Meanings
M            Total number of nodes in the cloud
N            Total number of file fragments to be placed
O_k          k-th fragment of file
o_k          Size of O_k
S^i          i-th node
s_i          Size of S^i
cen_i        Centrality measure for S^i
col_{S^i}    Color assigned to S^i
T            A set containing distances by which assignment of fragments must be separated
r_k^i        Number of reads for O_k from S^i
R_k^i        Aggregate read cost of r_k^i
w_k^i        Number of writes for O_k from S^i
W_k^i        Aggregate write cost of w_k^i
NN_k^i       Nearest neighbor of S^i holding O_k
c(i,j)       Communication cost between S^i and S^j
P_k          Primary node for O_k
R_k          Replication schema of O_k
RT           Replication time

3.2 Centrality
The centrality of a node in a graph provides the measure of the relative importance of a node in the network. The objective of improved retrieval time in replication makes the centrality measures more important. There are various centrality measures; for instance, closeness centrality, degree centrality, betweenness centrality, eccentricity centrality, and eigenvector centrality. We only elaborate on the closeness, betweenness, and eccentricity centralities because we are using the aforesaid three centralities in this work. For the remainder of the centralities, we encourage the readers to review [24].

3.2.1 Betweenness Centrality
The betweenness centrality of a node n is the number of the shortest paths, between other nodes, passing through n [24]. Formally, the betweenness centrality of any node v in a network is given as:

\[ C_b(v) = \sum_{a \neq v \neq b} \frac{\delta_{ab}(v)}{\delta_{ab}}, \tag{2} \]

where \delta_{ab} is the total number of shortest paths between a and b, and \delta_{ab}(v) is the number of shortest paths between a and b passing through v. The variable C_b(v) denotes the betweenness centrality for node v.

3.2.2 Closeness Centrality
A node is said to be closer with respect to all of the other nodes within a network, if the sum of the distances from all of the other nodes is lower than the sum of the distances of other candidate nodes from all of the other nodes [24]. The lower the sum of distances from the other nodes, the more central is the node. Formally, the closeness centrality of a node v in a network is defined as:

\[ C_c(v) = \frac{N - 1}{\sum_{a \neq v} d(v, a)}, \tag{3} \]

where N is the total number of nodes in a network and d(v, a) represents the distance between node v and node a.

3.2.3 Eccentricity
The eccentricity of a node n is the maximum distance to any node from a node n [24]. A node is more central in the network, if it is less eccentric. Formally, the eccentricity can be given as:

\[ E(v_a) = \max_b d(v_a, v_b), \tag{4} \]

where d(v_a, v_b) represents the distance between node v_a and node v_b. It may be noted that in our evaluation of the strategies the centrality measures introduced above seem more meaningful and relevant than simple hop-count metrics.

3.3 T-coloring
Suppose we have a graph G = (V, E) and a set T containing non-negative integers including 0. The T-coloring is a mapping function f from the vertices of V to the set of non-negative integers, such that |f(x) - f(y)| ∉ T, where (x, y) ∈ E. The mapping function f assigns a color to a vertex. In simple words, the distance between the colors of the adjacent vertices must not belong to T. Formulated by Hale [6], the T-coloring problem for channel assignment assigns channels to the nodes, such that the channels are separated by a distance to avoid interference.

4 DROPS
4.1 System Model
Consider a cloud that consists of M nodes, each with its own storage capacity. Let S^i represent the name of the i-th node and s_i denote the total storage capacity of S^i. The communication time between S^i and S^j is the total time of all of the links within a selected path
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation
information: DOI 10.1109/TCC.2015.2400460, IEEE Transactions on Cloud Computing
IEEE TRANSACTIONS ON CLOUD COMPUTING 5
from S^i to S^j represented by c(i, j). We consider N file fragments such that O_k denotes the k-th fragment of a file while o_k represents the size of the k-th fragment. Let the total read and write requests from S^i for O_k be represented by r_k^i and w_k^i, respectively. Let P_k denote the primary node that stores the primary copy of O_k. The replication scheme for O_k, denoted by R_k, is also stored at P_k. Moreover, every S^i contains a two-field record, storing P_k for O_k and NN_k^i that represents the nearest node storing O_k. Whenever there is an update in O_k, the updated version is sent to P_k, which broadcasts the updated version to all of the nodes in R_k. Let b(i,j) and t(i,j) be the total bandwidth of the link and traffic between sites S^i and S^j, respectively. The centrality measure for S^i is represented by cen_i. Let col_{S^i} store the value of the color assigned to S^i. The col_{S^i} can have one out of two values, namely: open color and close color. The value open color represents that the node is available for storing the file fragment. The value close color shows that the node cannot store the file fragment. Let T be a set of integers starting from zero and ending on a prespecified number. If the selected number is three, then T = {0, 1, 2, 3}. The set T is used to restrict the node selection to those nodes that are at hop-distances not belonging to T. For the ease of reading, the most commonly used notations are listed in Table 1.

Our aim is to minimize the overall total network transfer time or replication time (RT), also termed replication cost (RC). The RT is composed of two factors: (a) time due to read requests and (b) time due to write requests. The total read time of O_k by S^i from NN_k^i is denoted by R_k^i and is given by:

\[ R_k^i = r_k^i \, o_k \, c(i, NN_k^i). \tag{5} \]

The total time due to the writing of O_k by S^i addressed to the P_k is represented as W_k^i and is given by:

\[ W_k^i = w_k^i \, o_k \Big( c(i, P_k) + \sum_{(j \in R_k),\, j \neq i} c(P_k, j) \Big). \tag{6} \]

The overall RT is represented by:

\[ RT = \sum_{i=1}^{M} \sum_{k=1}^{N} \left( R_k^i + W_k^i \right). \tag{7} \]

The storage capacity constraint states that a file fragment can only be assigned to a node if the storage capacity of the node is greater than or equal to the size of the fragment. The bandwidth constraint states that b(i, j) ≥ t(i, j) ∀i, ∀j. The DROPS methodology assigns the file fragments to the nodes in a cloud such that the RT is minimized, subject to capacity and bandwidth constraints.

4.2 DROPS
In a cloud environment, a file in its totality, stored at a node, leads to a single point of failure [17]. A successful attack on a node might put the data confidentiality or integrity, or both, at risk. The aforesaid scenario can occur both in the case of intrusion or accidental errors. In such systems, performance in terms of retrieval time can be enhanced by employing replication strategies. However, replication increases the number of file copies within the cloud, thereby increasing the probability of the node holding the file to be a victim of attack, as discussed in Section 1. Security and replication are essential for a large-scale system, such as cloud, as both are utilized to provide services to the end user. Security and replication must be balanced such that one service must not lower the service level of the other.

In the DROPS methodology, we propose not to store the entire file at a single node. The DROPS methodology fragments the file and makes use of the cloud for replication. The fragments are distributed such that no node in a cloud holds more than a single fragment, so that even a successful attack on the node leaks no significant information. The DROPS methodology uses controlled replication where each of the fragments is replicated only once in the cloud to improve the security. Although the controlled replication does not improve the retrieval time to the level of full-scale replication, it significantly improves the security.

In the DROPS methodology, the user sends the data file to the cloud. The cloud manager system (a user-facing server in the cloud that entertains the user's requests), upon receiving the file, performs: (a) fragmentation, (b) the first cycle of nodes selection, storing one fragment on each of the selected nodes, and (c) the second cycle of nodes selection for fragment replication. The cloud manager keeps a record of the fragment placement and is assumed to be a secure entity.

The fragmentation threshold of the data file is specified by the file owner. The file owner can specify the fragmentation threshold in terms of either a percentage or the number and size of different fragments. The percentage fragmentation threshold, for instance, can dictate that each fragment will be 5% of the total size of the file. Alternatively, the owner may generate a separate file containing information about the fragment number and size, for instance, fragment 1 of size 5,000 Bytes, fragment 2 of size 8,749 Bytes. We argue that the owner of the file is the best candidate to generate the fragmentation threshold. The owner can best split the file such that each fragment does not contain a significant amount of information, as the owner is cognizant of all the facts pertaining to the data. The default percentage fragmentation threshold can be made a part of the Service Level Agreement (SLA), if the user does not specify the fragmentation threshold while uploading the data file. We primarily focus on the storage system security in this work with an assumption that the communication channel between user and the cloud
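
The two ways of specifying the fragmentation threshold described in Section 4.2 (a percentage of the file size, or an explicit list of fragment sizes) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `fragment_sizes_by_percentage` and `split_by_sizes` are hypothetical, and rounding the per-fragment size up and letting the last fragment be smaller are assumptions that the text does not specify.

```python
import math
from typing import List


def fragment_sizes_by_percentage(file_size: int, percent: float) -> List[int]:
    """Return fragment sizes when each fragment is `percent`% of the file.

    Assumption: the per-fragment size is rounded up, and the final
    fragment holds whatever remains (it may be smaller than the others).
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    fragment_size = max(1, math.ceil(file_size * percent / 100))
    sizes = []
    remaining = file_size
    while remaining > 0:
        size = min(fragment_size, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def split_by_sizes(data: bytes, sizes: List[int]) -> List[bytes]:
    """Split `data` into consecutive fragments with the given sizes.

    Assumption: the owner-supplied sizes must cover the file exactly.
    """
    if any(size <= 0 for size in sizes):
        raise ValueError("fragment sizes must be positive")
    if sum(sizes) != len(data):
        raise ValueError("fragment sizes must sum to the file size")
    fragments = []
    offset = 0
    for size in sizes:
        fragments.append(data[offset:offset + size])
        offset += size
    return fragments
```

Under these assumptions, a 5% threshold on a 100,000-byte file gives 20 fragments of 5,000 bytes each, while the explicit sizes from the text (5,000 and 8,749 Bytes) split a 13,749-byte file into two fragments.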
Attack                        Description
Data Recovery                 Rollback of VM to some previous state. May expose previously stored data.
Cross VM attack               Malicious VM attacking co-resident VM that may lead to data breach.
Improper media sanitization   Data exposure due to improper sanitization of storage devices.
E-discovery                   Data exposure of one user due to seized hardware for investigations related to some other users.
VM escape                     A malicious user or VM escapes from the control of VMM. Provides access to storage and compute devices.
VM rollback                   Rollback of VM to some previous state. May expose previously stored data.
nodes must be greater than n because each compromised node may not yield a fragment in the DROPS methodology, as the nodes are separated based on the T-coloring. Alternatively, an attacker has to compromise the authentication system of the cloud [23]. The effort required by an attacker to compromise a node (in systems dealing with fragments/shares of data) is given in [23] as:

\[ E_{Conf} = \min(E_{Auth},\; n \times E_{BreakIn}), \tag{8} \]

where E_{Conf} is the effort required to compromise the confidentiality, E_{Auth} is the effort required to compromise authentication, and E_{BreakIn} is the effort required to compromise a single node. Our focus in this paper is on the security of the data in the cloud and we do not take into account the security of the authentication system. Therefore, we can say that to obtain n fragments, the effort of an attacker increases by a factor of n. Moreover, in case of the DROPS methodology, the attacker must correctly guess the nodes storing fragments of the file. Therefore, in the worst case scenario, the set of nodes compromised by the attacker will contain all of the nodes storing the file fragments. From Equation (1), we observe that the probability of the worst case to be successful is very low. The probability that some of the machines (average case) storing the file fragments will be selected is high in comparison to the worst case probability. However, the compromised fragments will not be enough to reconstruct the whole data. In terms of the probability, the worst, average, and best cases are dependent on the number of nodes storing fragments that are selected for an attack. Therefore, all of the three cases are captured by Equation (1).

Besides the general attack of a compromised node, the DROPS methodology can handle the attacks in which an attacker gets hold of user data by avoiding or disrupting security defenses. Table 2 presents some of the attacks that are handled by the DROPS methodology. The presented attacks are cloud specific and stem from the cloud's core technologies. Table 2 also provides a brief description of the attacks. It is noteworthy that even in case of successful attacks (that are mentioned), the DROPS methodology ensures that the attacker gets only a fragment of the file, as the DROPS methodology stores only a single fragment on the node. Moreover, the successful attack has to be on the node that stores the fragment.

5 EXPERIMENTAL SETUP AND RESULTS
The communicational backbone of cloud computing is the Data Center Network (DCN) [2]. In this paper, we use three DCN architectures, namely: (a) Three tier, (b) Fat tree, and (c) Dcell [1]. The Three tier is the legacy DCN architecture. However, to meet the growing demands of cloud computing, the Fat tree and Dcell architectures were proposed [2]. Therefore, we use the aforementioned three architectures to evaluate the performance of our scheme on legacy as well as state of the art architectures. The Fat tree and Three tier architectures are switch-centric networks. The nodes are connected with the access layer switches. Multiple access layer switches are connected using aggregate layer switches. Core layer switches interconnect the aggregate layer switches. The Dcell is a server-centric network architecture that uses servers in addition to switches to perform the communication process within the network [1]. A server in the Dcell architecture is connected to other servers and a switch. The lower level dcells recursively build the higher level dcells. The dcells at the same level are fully connected. For details about the aforesaid architectures and their performance analysis, the readers are encouraged to read [1] and [2].

5.1 Comparative techniques
We compared the results of the DROPS methodology with fine-grained replication strategies, namely: (a) DRPA-star, (b) WA-star, (c) Aε-star, (d) SA1, (e) SA2, (f) SA3, (g) Local Min-Min (LMM), (h) Global Min-Min (GMM), (i) Greedy algorithm, and (j) Genetic Replication Algorithm (GRA). The DRPA-star is a data replication algorithm based on the A-star best-first search algorithm. The DRPA-star starts from the null solution that is called a root node. The communication cost at each node n is computed as: cost(n) = g(n) + h(n), where g(n) is the path cost for reaching n and h(n) is called the heuristic cost and is the estimate of cost from n to the goal node. The DRPA-star searches all of the solutions of allocating a fragment to a node. The solution that minimizes the cost within the constraints is explored while others are discarded. The selected solution is inserted into a list called
the OPEN list. The list is ordered in ascending order so that the solution with the minimum cost is expanded first. The heuristic used by the DRPA-star is given as h(n) = max(0, (mmk(n) − g(n))), where mmk(n) is the least cost replica allocation or the max-min RC. Readers are encouraged to see the details about DRPA-star in [13]. The WA-star is a refinement of the DRPA-star that implements a weighted function to evaluate the cost. The function is given as: f(n) = g(n) + h(n) + ε(1 − (d(n)/D))h(n). The variable d(n) represents the depth of the node n and D denotes the expected depth of the goal node [13]. The Aε-star is also a variation of the DRPA-star that uses two lists, OPEN and FOCAL. The FOCAL list contains only those nodes from the OPEN list that have f within a factor of 1 + ε of the lowest f. The node expansion is performed from the FOCAL list instead of the OPEN list. Further details about WA-star and Aε-star can be found in [13]. The SA1 (sub-optimal assignments), SA2, and SA3 are DRPA-star based heuristics. In SA1, at level R or below, only the best successors of node n having the least expansion cost are selected. The SA2 selects the best successors of node n only for the first time when it reaches the depth level R. All other successors are discarded. The SA3 works similarly to the SA2, except that all nodes are removed from the OPEN list except the one with the lowest cost. Readers are encouraged to read [13] for further details about SA1, SA2, and SA3. The LMM can be considered as a special case of the bin packing algorithm. The LMM sorts the file fragments based on the RC of the fragments to be stored at a node. The LMM then assigns the fragments in ascending order. In case of a tie, the file fragment with minimum size is selected for assignment (the name local Min-Min is derived from such a policy). The GMM selects the file fragment with the global minimum of all the RC associated with a file fragment. In case of a tie, the file fragment is selected at random. The Greedy algorithm first iterates through all of the M cloud nodes to find the best node for allocating a file fragment. The node with the lowest replication cost is selected. The second node for the fragment is selected in the second iteration. However, in the second iteration the node selected is the one that produces the lowest RC in combination with the node already selected. The process is repeated for all of the file fragments. Details of the greedy algorithm can be found in [18]. The GRA consists of chromosomes representing various schemes for storing file fragments over cloud nodes. Every chromosome consists of M genes, each representing a node. Every gene is an N-bit string. If the k-th file fragment is to be assigned to S^i, then the k-th bit of the i-th gene holds the value of one. Genetic algorithms perform the operations of selection, crossover, and mutation. The value for the crossover rate (µ_c) was selected as 0.9, while for the mutation rate (µ_m) the value was 0.01. The use of these values for µ_c and µ_m is advocated in [16]. The best chromosome represents the solution. GRA utilizes a mix and match strategy to reach the solution. More details about GRA can be obtained from [16].

5.2 Workload
The sizes of files were generated using a uniform distribution between 10 Kb and 60 Kb. The primary nodes were randomly selected for replication algorithms. For the DROPS methodology, the S^i's selected during the first cycle of the nodes selection by Algorithm 1 were considered as the primary nodes.

The capacity of a node was generated using a uniform distribution between ((1/2)CS)C and ((3/2)CS)C, where 0 ≤ C ≤ 1. For instance, for CS = 150 and C = 0.6 the capacities of the nodes were uniformly distributed between 45 and 135. The mean value of g in the OPEN and FOCAL lists was selected as the value of ε for WA-star and Aε-star, respectively. The value for level R was set to ⌊d/2⌋, where d is the depth of the search tree (number of fragments).

The read/write (R/W) ratio for the simulations that used a fixed value was selected to be 0.25 (the R/W ratio reflecting 25% reads and 75% writes within the cloud). The reason for choosing a high workload (lower percentage of reads and higher percentage of writes) was to evaluate the performance of the techniques under extreme cases. The simulations that studied the impact of change in the R/W ratio used various workloads in terms of R/W ratios. The R/W ratios selected were in the range of 0.10 to 0.90. The selected range covered the effect of high, medium, and low workloads with respect to the R/W ratio.

5.3 Results and Discussion
We compared the performance of the DROPS methodology with the algorithms discussed in Section 5.1. The behavior of the algorithms was studied by: (a) increasing the number of nodes in the system, (b) increasing the number of objects keeping the number of nodes constant, (c) changing the nodes storage capacity, and (d) varying the read/write ratio. The aforesaid parameters are significant as they affect the problem size and the performance of algorithms [13].

5.3.1 Impact of increase in number of cloud nodes
We studied the performance of the placement techniques and the DROPS methodology by increasing the number of nodes. The performance was studied for the three discussed cloud architectures. The numbers of nodes selected for the simulations were 100, 500, 1,024, 2,400, and 30,000. The number of nodes in the Dcell architecture increases exponentially [2]. For a Dcell architecture with two nodes in the Dcell0, the architecture consists of 2,400 nodes. However, on adding a single node to the Dcell0, the total number of nodes increases to 30,000 [2]. The number of file fragments
[Figure 2 plots omitted. Both panels plot RC savings (%) against No. of nodes (100, 500, 1024, 2400, 30000) for DRPA-star, LMM, WA-star, GMM, Aε-star, SA1, SA2, SA3, Greedy, GRA, DROPS-BC, DROPS-CC, and DROPS-EC.]
Fig. 2: (a) RC versus number of nodes (Three tier) (b) RC versus number of nodes (Fat tree)
[Figure 3 plots omitted. Both panels plot RC savings (%) against No. of nodes (100, 500, 1024, 2400, 30000). Panel (a) shows DRPA-star, LMM, WA-star, GMM, Aε-star, SA1, SA2, SA3, Greedy, GRA, DROPS-BC, DROPS-CC, and DROPS-EC; panel (b) shows DROPS-BC, DROPS-CC, and DROPS-EC only.]
Fig. 3: (a) RC versus number of nodes (Dcell) (b) RC versus number of nodes for DROPS variations with
maximum available capacity constraint (Three tier)
[Figure 4 plots omitted. Both panels plot RC savings (%) against No. of nodes (100, 500, 1024, 2400, 30000) for DROPS-BC, DROPS-CC, and DROPS-EC.]
Fig. 4: RC versus number of nodes for DROPS variations with maximum available capacity constraints (a) Fat
tree (b) Dcell
was set to 50. For the first experiment we used C = 0.2. Fig. 2 (a), Fig. 2 (b), and Fig. 3 (a) show the results for the Three tier, Fat tree, and Dcell architectures, respectively. The reduction in network transfer time for a file is termed as RC. In the figures, the BC stands for the betweenness centrality, the CC stands for closeness centrality, and the EC stands for eccentricity centrality. The interesting observation is that although all of the algorithms showed similar trend in performance within a specific architecture, the performance of the algorithms was better in the Dcell architecture as compared to three tier and fat tree architectures. This is because the Dcell architecture exhibits better inter node connectivity and robustness [2]. The DRPA-star gave best solutions as compared to other techniques and registered consistent performance with the increase in the number of nodes. Similarly, WA-star, Aε-star, GRA, greedy, and SA3 showed almost consistent performance with various number of nodes. The performance of LMM and GMM gradually increased with the increase in number of nodes since the increase in the number of nodes increased the number of bins. The SA1 and SA2 also showed almost constant performance in all of the three architectures. However, it is important to note that SA2 ended up with a decrease in performance
[Figure 5 plots omitted. Both panels plot RC savings (%) against No. of fragments (50, 100, 200, 300, 400, 500) for DRPA-star, LMM, WA-star, GMM, Aε-star, SA1, SA2, SA3, Greedy, GRA, DROPS-BC, DROPS-CC, and DROPS-EC.]
Fig. 5: (a) RC versus number of file fragments (Three tier) (b) RC versus number of file fragments (Fat tree)
as compared to the initial performance. This may be due to the fact that SA2 only expands the node with minimum cost when it reaches a certain depth for the first time. Such pruning may have purged nodes that would have provided a better global access time. The DROPS methodology did not employ full-scale replication. Every fragment is replicated only once in the system. The smaller number of replicas of any fragment and the separation of nodes by T-coloring decreased the probability of finding that fragment by an attacker. Therefore, the increase in the security level of the data is accompanied by a drop in performance as compared to the comparative techniques discussed in this paper. It is important to note that the DROPS methodology was implemented using three centrality measures, namely: (a) betweenness, (b) closeness, and (c) eccentricity. However, Fig. 2(a) and Fig. 2(b) show only a single plot. Due to the inherent structure of the Three tier and Fat tree architectures, all of the nodes in the network are at the same distance from each other or exist at the same level. Therefore, the centrality measure is the same for all of the nodes. This results in the selection of the same node for storing the file fragment. Consequently, the performance showed the same value and all three lines are on the same points. However, this is not the case for the Dcell architecture. In the Dcell architecture, nodes have different centrality measures, resulting in the selection of different nodes. It is noteworthy that in Fig. 3(a), the eccentricity centrality performs better as compared to the closeness and betweenness centralities because the nodes with higher eccentricity are located closer to all other nodes within the network. To check the effect of closeness and betweenness centralities, we modified the heuristic presented in Algorithm 1. Instead of selecting the node with the criterion of only maximum centrality, we selected the node with: (a) maximum centrality and (b) maximum available storage capacity. The results are presented in Fig. 3 (b), Fig. 4 (a), and Fig. 4 (b). It is evident that the eccentricity centrality resulted in the highest performance while the betweenness centrality showed the lowest performance. The reason for this is that nodes with higher eccentricity are closer to all other nodes in the network, which results in a lower RC value for accessing the fragments.

5.3.2 Impact of increase in number of file fragments
The increase in the number of file fragments can strain the storage capacity of the cloud, which in turn may affect the selection of the nodes. To study the impact on performance due to the increase in the number of file fragments, we set the number of nodes to 30,000. The numbers of file fragments selected were 50, 100, 200, 300, 400, and 500. The workload was generated with C = 45% to observe the effect of an increased number of file fragments with a fairly reasonable amount of memory and to discern the performance of all the algorithms. The results are shown in Fig. 5 (a), Fig. 5 (b), and Fig. 6 (a) for the Three tier, Fat tree, and Dcell architectures, respectively. It can be observed from the plots that the increase in the number of file fragments reduced the performance of the algorithms, in general. However, the greedy algorithm showed the most improved performance. The LMM showed the highest loss in performance, which is a little above 16%. The loss in performance can be attributed to the storage capacity constraints that prohibited the placement of some fragments at nodes with optimal retrieval time. As discussed earlier, the DROPS methodology produced similar results in three tier and fat tree architectures. However, from the Dcell architecture, it is clear that the DROPS methodology with eccentricity centrality maintains its supremacy over the other two centralities.

5.3.3 Impact of increase in storage capacity of nodes
Next, we studied the effect of change in the nodes storage capacity. A change in storage capacity of the nodes may affect the number of replicas on the node due to storage capacity constraints. Intuitively, a lower node storage capacity may result in the elimination of some optimal nodes to be selected for replication because of violation of storage capacity constraints. The elimination of some nodes may degrade the performance to some extent because a node giving lower access time might be pruned due to non-availability
[Figure 6 plots omitted. Panel (a) plots RC savings (%) against No. of fragments (50, 100, 200, 300, 400, 500); panel (b) plots RC savings (%) against Node storage capacity (20, 25, 30, 35, 40, 45). Both panels show DRPA-star, LMM, WA-star, GMM, Aε-star, SA1, SA2, SA3, Greedy, GRA, DROPS-BC, DROPS-CC, and DROPS-EC.]
Fig. 6: (a) RC versus number of file fragments (Dcell) (b) RC versus nodes storage capacity (Three tier)
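
The modified selection rule discussed in the text, (a) maximum centrality and (b) maximum available storage capacity, can be illustrated with a minimal sketch. The `Node` class, its fields, and `select_node` are hypothetical names introduced here, and treating storage capacity as a tie-breaker among equally central nodes is an assumption; the source does not state how the two criteria are combined.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    centrality: float     # any centrality score (eccentricity, closeness, betweenness)
    free_storage: int     # remaining storage capacity, arbitrary units
    allowed: bool = True  # hypothetical flag: True while T-coloring permits placement


def select_node(nodes, fragment_size):
    """Pick the eligible node with maximum centrality.

    Assumption: ties in centrality are broken by larger free storage.
    Returns None when no node can host the fragment.
    """
    eligible = [n for n in nodes if n.allowed and n.free_storage >= fragment_size]
    if not eligible:
        return None
    return max(eligible, key=lambda n: (n.centrality, n.free_storage))


nodes = [
    Node("n1", 0.9, 5),
    Node("n2", 0.9, 20),
    Node("n3", 0.7, 50),
    Node("n4", 1.0, 2),
]
print(select_node(nodes, fragment_size=4).name)
```

In this toy input the most central node, n4, lacks the space for a fragment of size 4, so the choice falls to n2, which ties with n1 on centrality but has more free storage.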
[Plots: y-axis RC savings (%); x-axis Node storage capacity in both panels; curves: DRPA-star, LMM, WA-star, GMM, Aε-star, SA1, SA2, SA3, Greedy, GRA, DROPS-BC, DROPS-CC, DROPS-EC]
Fig. 7: (a) RC versus nodes storage capacity (Fat tree) (b) RC versus nodes storage capacity (Dcell)
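
The saturation effect argued for in this subsection (extra capacity stops helping once every fragment already fits) can be reproduced with a small sketch. It assumes, as the text states, that a node holds at most one fragment; the function name, the greedy smallest-fit order, and the sample sizes are illustrative assumptions, not the placement procedure of DROPS.

```python
def count_placements(node_capacities, fragment_sizes):
    """Count fragments placed when each node may hold at most one fragment.

    Greedy sketch: the smallest fragment goes to the smallest node that fits it.
    """
    free_nodes = sorted(node_capacities)
    placed = 0
    for size in sorted(fragment_sizes):
        for i, capacity in enumerate(free_nodes):
            if capacity >= size:
                free_nodes.pop(i)  # the node is now occupied
                placed += 1
                break
    return placed


fragments = [4, 6, 8]  # hypothetical fragment sizes
for capacity in (2, 4, 6, 8, 10, 12):
    # five identical nodes with the given capacity
    print(capacity, count_placements([capacity] * 5, fragments))
```

With these sample values the number of placed fragments grows until the capacity reaches 8 and then stays at 3, which mirrors the flattening of the curves at higher storage capacities.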
[Plots: y-axis RC savings (%); x-axis R/W ratio in both panels; curves: DRPA-star, LMM, WA-star, GMM, Aε-star, SA1, SA2, SA3, Greedy, GRA, DROPS-BC, DROPS-CC, DROPS-EC]
Fig. 8: (a) RC versus R/W ratio (Three tier) (b) RC versus R/W ratio (Fat tree)
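
The trade-off behind the R/W ratio experiments, cheaper reads against costlier updates as replicas are added, can be made concrete with a toy model. Everything below is an assumption for illustration: nodes sit on a line, a read goes to the nearest replica, and a write travels to the primary, which then forwards it to every other replica. It is a simplified stand-in, not the RC formulation used in the evaluation.

```python
def replication_cost(positions, replicas, primary, reads, writes, size=1):
    """Total read plus write cost on a line topology (toy model).

    positions: integer coordinates of all requesting nodes
    replicas:  coordinates of nodes holding a copy (must include primary)
    reads, writes: requests issued by every node
    """
    total = 0
    for node in positions:
        # a read is served by the nearest replica
        nearest = min(abs(node - r) for r in replicas)
        # a write goes to the primary, which updates all other replicas
        fanout = sum(abs(primary - r) for r in replicas if r != node)
        total += reads * size * nearest
        total += writes * size * (abs(node - primary) + fanout)
    return total


positions = list(range(9))  # nodes at coordinates 0..8
for reads, writes in ((9, 1), (1, 9)):
    single = replication_cost(positions, [4], 4, reads, writes)
    triple = replication_cost(positions, [1, 4, 7], 4, reads, writes)
    print(reads, writes, single, triple)
```

Under these assumptions three replicas lower the total cost when reads dominate (122 versus 200) and raise it when writes dominate (618 versus 200).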
[Plot: y-axis RC savings (%); x-axis R/W ratio; curves: DRPA-star, LMM, WA-star, GMM, Aε-star, SA1, SA2, SA3, Greedy, GRA, DROPS-BC, DROPS-CC, DROPS-EC]
Fig. 9: RC versus R/W ratio (Dcell)

of enough storage space to store the file fragment. Higher node storage capacity allows full-scale replication of fragments, increasing the performance gain. However, node capacity above a certain level will not change the performance significantly, as replicating the already replicated fragments will not produce a considerable performance increase. If the storage nodes have enough capacity to store the allocated file fragments, then a further increase in the storage capacity of a node cannot cause the fragments to be stored again. Moreover, the T-coloring allows only a single replica to be stored on any node. Therefore, after a certain point, the increase in storage capacity might not affect the performance.

We increase the node storage capacity incrementally from 20% to 40%. The results are shown in Fig. 6 (b), Fig. 7 (a), and Fig. 7 (b). It is observable from
more replicas of fragments resulting in increased cost of updating the replicas. Therefore, the increased cost of updating replicas undermines the advantage of decreased cost of reading with higher number of replicas

[Plot: series Three tier, Fat tree, and Dcell; axis label Failed nodes (%)]
Architecture  DRPA   LMM    WA-star  GMM    Aε-star  SA1    SA2    SA3    Greedy  GRA    DROPS-BC  DROPS-CC  DROPS-EC
Three tier    74.70  36.23  72.55    45.62  71.82    59.86  49.09  64.38  69.1    66.1   24.41     24.41     24.41
Fat tree      76.76  38.95  75.22    45.77  73.33    60.89  52.67  68.33  71.64   70.54  23.28     23.28     23.28
Dcell         79.6   44.32  76.51    46.34  76.43    62.03  54.90  71.53  73.09   72.34  23.06     25.16     30.20

Architecture  DRPA   LMM    WA-star  GMM    Aε-star  SA1    SA2    SA3    Greedy  GRA    DROPS-BC  DROPS-CC  DROPS-EC
Three tier    74.63  40.08  69.69    48.67  68.82    60.29  49.65  62.18  71.25   64.44  23.93     23.93     23.93
Fat tree      75.45  44.33  70.90    52.66  70.58    61.12  51.09  64.64  71.73   66.90  23.42     23.42     23.42
Dcell         76.08  45.90  72.49    52.78  72.33    62.12  50.02  64.66  70.92   69.50  23.17     25.35     28.17

Architecture  DRPA   LMM    WA-star  GMM    Aε-star  SA1    SA2    SA3    Greedy  GRA    DROPS-BC  DROPS-CC  DROPS-EC
Three tier    72.37  28.26  71.99    40.63  71.19    59.29  48.67  61.83  72.09   63.54  19.89     19.89     19.89
Fat tree      69.19  28.34  70.73    41.99  66.20    60.28  51.29  61.83  69.33   62.16  21.60     21.60     21.60
Dcell         73.57  31.04  71.37    42.41  67.70    60.79  50.42  63.78  69.64   64.03  21.91     22.88     24.68

Architecture  DRPA   LMM    WA-star  GMM    Aε-star  SA1    SA2    SA3    Greedy  GRA    DROPS-BC  DROPS-CC  DROPS-EC
Three tier    77.28  32.54  76.32    53.20  75.38    55.13  49.61  59.74  73.64   58.27  24.08     24.08     24.08
Fat tree      76.29  31.47  74.81    52.08  73.37    53.33  49.35  57.87  71.61   57.47  23.68     23.68     23.68
Dcell         78.72  33.66  78.03    55.82  76.47    57.44  52.28  61.94  74.54   60.16  23.32     23.79     24.23
the time and resources utilized in downloading, updating, and uploading the file again. Moreover, the implications of TCP incast over the DROPS methodology, which are relevant to distributed data storage and access, need to be studied.

Mazhar Ali is currently a PhD student at North Dakota State University, Fargo, ND, USA. His research interests include information security, formal verification, modeling, and cloud computing systems.

Kashif Bilal did his PhD in Electrical and Computer Engineering from the North Dakota State University, USA. His research interests include data center networks, distributed computing, and energy efficiency.

Samee U. Khan is an assistant professor at the North Dakota State University. His research interests include topics such as sustainable computing, social networking, and reliability. He is a senior member of IEEE, and a fellow of IET and BCS.

Bharadwaj Veeravalli is an associate professor at the National University of Singapore. His mainstream research interests include scheduling problems, cloud/cluster/grid computing, green storage, and multimedia computing. He is a senior member of the IEEE.

Keqin Li is a SUNY distinguished professor. His research interests are mainly in the areas of design and analysis of algorithms, parallel and distributed computing, and computer networking. He is a senior member of the IEEE.

Albert Y. Zomaya is currently the chair professor of high performance computing and networking in the School of Information Technologies, The University of Sydney. He is a fellow of IEEE, IET, and AAAS.