Adaptive Fault Tolerant QoS Control Algorithms for Maximizing System Lifetime
of Query-Based Wireless Sensor Networks
Ing-Ray Chen*, Anh Phan Speer* and Mohamed Eltoweissy+
*Department of Computer Science
+Department of Electrical and Computer Engineering
Virginia Tech
{irchen, nphan, toweissy}@vt.edu
Abstract - Data sensing and retrieval in wireless sensor systems have a widespread application in areas
such as security and surveillance monitoring, and command and control in battlefields. In query-based
wireless sensor systems, a user would issue a query and expect a response to be returned within the
deadline. While the use of fault tolerance mechanisms through redundancy improves query reliability
in the presence of unreliable wireless communication and sensor faults, it could cause the energy of
the system to be quickly depleted. Therefore, there is an inherent tradeoff between query reliability vs.
energy consumption in query-based wireless sensor systems. In this paper, we develop adaptive fault
tolerant quality of service (QoS) control algorithms based on hop-by-hop data delivery utilizing
source and path redundancy, with the goal to satisfy application QoS requirements while
prolonging the lifetime of the sensor system. We develop a mathematical model for the lifetime of the
sensor system as a function of system parameters including the source and path redundancy levels
utilized. We discover that there exists optimal source and path redundancy under which the
lifetime of the system is maximized while satisfying application QoS requirements. Numerical data are
presented and validated through extensive simulation, with physical interpretations given, to
demonstrate the feasibility of our algorithm design.
Keywords - Wireless sensor networks, reliability, timeliness, query processing, redundancy, energy
conservation, QoS, mean time to failure.
1 Introduction
Over the last few years, we have seen a rapid increase in the number of applications for wireless
sensor networks (WSNs). WSNs can be deployed in battlefield applications, and a variety of vehicle
health management and condition-based maintenance applications on industrial, military, and space
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING ,VOL. 8, NO. 2, MARCH-APRIL 2011
platforms. For military users, a primary focus has been area monitoring for security and surveillance
applications.
A WSN can be either source-driven or query-based depending on the data flow. In source-driven
WSNs, sensors initiate data transmission for observed events to interested users, including possibly
reporting sensor readings periodically. An important research issue in source-driven WSNs is to
satisfy QoS requirements of event-to-sink data transport while conserving energy of WSNs. In query-
based WSNs, queries and data are forwarded to interested entities only. In query-based WSNs, a user
would issue a query with QoS requirements in terms of reliability and timeliness.
Retrieving sensor data such that QoS requirements are satisfied is a challenging problem and has not
been studied until recently [4, 5, 6, 7, 8, 9]. The general approach is to apply redundancy to satisfy the
QoS requirement. In this paper we are also interested in applying redundancy to satisfy application
specified reliability and timeliness requirements for query-based WSNs. Moreover, we aim to
determine the optimal redundancy level that could satisfy QoS requirements while prolonging the
lifetime of the WSN. Specifically, we develop the notion of path and source level redundancy.
When given QoS requirements of a query, we identify optimal path and source redundancy such that
not only QoS requirements are satisfied, but also the lifetime of the system is maximized. We develop
adaptive fault tolerant QoS control (AFTQC) algorithms based on hop-by-hop data delivery to achieve
the desired level of redundancy and to eliminate energy expended for maintaining routing paths in the
WSN.
The rest of the paper is organized as follows. In Section 2 we survey related work. In Section 3 we
discuss the WSN system model and assumptions used in the paper. In Section 4 we develop
probability models for computing the lifetime of a query-based WSN as a function of path and
source redundancy being employed, defined as the number of queries that the system is able to
execute successfully in terms of QoS satisfaction before failure. We also discuss extensions to the
mathematical model developed to deal with software faults, data aggregation, and concurrent query
processing which a query-based WSN might experience. In Section 5 we analyze the effect of
redundancy on the system lifetime of WSNs, and identify the optimal level of path and source
redundancy that could maximize the system lifetime while satisfying the QoS requirements of queries
before failure. Section 6 presents simulation validation. Finally Section 7 concludes the paper and
discusses future work.
2 Related Work
Existing research efforts related to applying redundancy to satisfy QoS requirements in query-based
WSNs fall into three categories: traditional end-to-end QoS, reliability assurance, and application-
specific QoS [4]. Traditional end-to-end QoS solutions are based on the concept of end-to-end QoS
requirements. The problem is that it may not be feasible to implement end-to-end QoS in WSNs due to
the complexity and high cost of the protocols for resource constrained sensors. An example is
Sequential Assignment Routing (SAR) [5] that utilizes path redundancy from a source node to the sink
node. Each sensor uses a SAR algorithm for path selection. It takes into account the energy and QoS
factors on each path, and the priority level of a packet. For each packet routed through the network, a
weighted QoS metric is computed as the product of the additive QoS metric and a weight coefficient
associated with the priority level of that packet. The objective of the SAR algorithm is to minimize the
average weighted QoS metric throughout the lifetime of the network. The algorithm does not consider
the reliability issue.
ESRT [12] has been proposed to address this issue with reliability as the QoS metric. ReInForM has
been proposed [6] to address end-to-end reliability issues. ReInForM considers information awareness
and adaptability to channel errors along with a differentiated allocation strategy of network resources
based on the criticality of data. The protocol sends multiple copies of a packet along multiple paths
from the source to the sink such that data is delivered with the desired reliability. It uses the concept of
dynamic packet state to control the number of paths required for the desired reliability using local
knowledge of the channel error rate and topology. The protocol observes that for uniform unit disk
graphs, the number of edge-disjoint paths between nodes is equal to the average node degree with a
very high probability. This protocol results in the use of the disjoint paths existing in a thin band
between the source and the sink. However, the protocol addresses QoS only in terms of reliability.
In [7], M. Perillo et al. provide application QoS with the goal of maximizing the lifetime of WSNs
while satisfying a minimum level of reliability. This maximization is achieved through the joint
optimization of scheduling active sensor sets and finding paths for data routing. The lifetime is defined
as the sum of the time that all sensor sets are used. The approach uses the strategy of turning off
redundant sensors for periods of time to save energy while considering the tradeoff between energy
consumption and reliability. This approach can extend the lifetime of a network considerably
compared with approaches that do not use intelligent scheduling. However, this approach is not
scalable and QoS is limited to application reliability only.
Recently, a multi-path and multi-speed routing protocol called MMSPEED is proposed in [8] which
takes into account both timeliness and reliability as QoS requirements. The goal is to provide QoS
support that allows packets to choose the most proper combination of service options depending on
their timeliness and reliability requirements. For timeliness, multiple QoS levels are supported by
providing multiple data delivery speed options. For reliability, multiple reliability requirements are
supported by probabilistic multi-path forwarding. The protocol provides end-to-end QoS provisioning
by employing localized geographic forwarding using immediate neighbor information without end-to-
end path discovery and maintenance. It utilizes dynamic compensation which compensates for
inaccuracy of local decision as a packet travels towards its destination. The protocol adapts to network
dynamics. However, MMSPEED does not consider energy issues. Our work considers energy
consumption, in addition to reliability and timeliness requirements as in MMSPEED. Further, we also
consider network dynamics due to sensor failures, energy depletion and sensor connectivity. Utilizing
hop-by-hop data delivery, the AFTQC algorithm developed in our work specifically forms $m_p$
redundant paths for path redundancy and $m_s$ sensors for source redundancy to satisfy the imposed
QoS requirements, facilitating the determination of the best ($m_p$, $m_s$) that would maximize the lifetime
of the WSN.
In [9], QoS is defined as the optimum number of sensors that should be sending information to the
sinks at any given time. The protocol utilizes the base station to communicate QoS information to each
of the sensors using a broadcast channel. It exploits the mathematical paradigm of the Gur Game to
dynamically adjust to the optimum number of sensors. The objective is to maximize the lifetime of the
sensor network by having sensors periodically powered down to conserve energy, and at the same time
having enough sensors powered-up and sending packets to the sinks to collect enough data. The
protocol allows the base station to dynamically adjust the QoS resolution. This solution requires the
determination of the number of sensors that should be powered up a priori to maintain a resolution.
QoS metrics for data delivery such as reliability and timeliness are not considered.
Clustering SNs prolongs the system lifetime of a WSN [1, 2] because clustering reduces contention
on wireless channels [13] and supports data aggregation and forwarding at cluster heads (CHs). HEED
[1] increases energy efficiency by periodically rotating the role of CH among SNs with equal
probability such that the SN with the highest residual energy and node proximity to its neighbors
within a cluster area is selected as a CH. In LEACH [2], the key idea is to reduce the number of nodes
communicating directly with the base station by forming a small number of clusters in a self-
organizing manner. LEACH uses randomization with equal probability in cluster head selection to
achieve energy balance. REED [14] considers the use of redundancy to cope with failures of SNs in
hostile environments. We also consider cluster-based WSNs for energy reasons.
Our approach of satisfying application reliability and timeliness requirements while maximizing the
system lifetime is to determine the optimal level of redundancy at the source and path levels. The
source level redundancy refers to the use of multiple sensors to return the requested sensor reading.
The path level redundancy refers to the use of multiple paths to relay the reading to the sink node.
Since WSNs are constrained with resources, the AFTQC algorithm developed in this paper utilizes
hop-by-hop data delivery and dynamically forms multiple paths for data delivery, without incurring
extra overhead to first formulate multiple paths before data delivery. Our contribution is that we
identify the best level of redundancy to be used to answer queries to satisfy their QoS requirements
while prolonging the lifetime of query-based WSNs.
3 System Model
Figure 1: Cluster-based WSN Architecture.
ACRONYMS
MTTF Mean time to failure, defined as the mean number of queries that the sensor system
is able to execute successfully with QoS satisfaction before failure
SN Sensor node
PC Processing center
CH Cluster head
WSN Wireless sensor network
NOTATION
A                 Length of each side of a square sensor area (meter)
$n_b$             Size of a data packet (bit)
$E_{elec}$        Energy dissipation to run the transmitter and receiver circuitry (J/bit)
$E_{amp}$         Energy used by the transmit amplifier to achieve an acceptable signal to noise ratio (J/bit/m^2)
$E_o$             Initial energy per SN (Joule)
$E_{initial}$     Initial energy of the WSN (Joule)
$E_{clustering}$  Energy for executing the clustering algorithm (Joule)
$E_{threshold}$   Energy threshold below which the WSN depletes its energy (Joule)
$E_q$             Average energy consumption per query (Joule)
$R_q$             Probability that a query reply is delivered successfully within the deadline
r                 Wireless radio communication range (meter)
p                 Probability of a SN becoming a CH
q                 SN hardware failure probability
$q_s$             SN reading software failure probability
$e_j$             Transmission failure probability of a SN with index j
n                 Number of SNs in the WSN
$n_s$             Number of SNs per cluster
$N_c$             Number of clusters in the WSN
$m_p$             Number of paths from a source CH in response to a query
$m_s$             Number of SNs per cluster in response to a query
f                 Fraction of neighbor SNs that will forward data
$\lambda$         SN population density (nodes/meter^2)
$\lambda_q$       Query arrival rate (times/sec)
$d_{inter}$       Distance between a source CH and the processing center (meter)
$d_{intra}$       Distance between a SN and its CH (meter)
$N^h_{inter}$     Average number of hops between a source CH and the processing center
$N^h_{intra}$     Average number of hops between a SN and its CH
i                 Index to a path
j                 Index to a node
k                 Index to a neighbor node
$S_{jk}$          Progressive transmission speed between two SNs with indexes j and k (meter/sec)
$T_{clustering}$  Time interval for executing the clustering algorithm (sec)
$T_{req}$         Query deadline requirement (sec)
$R_{req}$         Query reliability requirement
A WSN consists of a set of low-power sensor nodes (SNs) typically deployed through air-drop into
a geographical area. We make the following assumptions regarding the structure and operation of a
query-based WSN:
1. SNs are homogeneous and indistinguishable with the same initial energy level $E_o$.
2. SNs are deployed into a geographical area of size $A^2$ with each side of length A. This assumption
has been used in the literature [1, 2, 11] to simplify the analysis although the method developed in
this paper can deal with other geographical shapes.
3. SNs are distributed according to a homogeneous spatial Poisson process with intensity $\lambda$. We
assume the domain is relatively free of obstacles and the WSN is dense enough so that the length of
a path connecting two SNs can be approximated by the straight line distance divided by r. We
assume that the WSN deployed is sufficiently dense to satisfy the connectivity condition [10] so that
sensors are well connected.
4. The failure behavior of a SN due to environment conditions (i.e., harsh environments causing
hardware failure) and software faults is characterized by a hardware failure probability parameter q
(where 0 < q < 1) and a software failure probability $q_s$ (where 0 < $q_s$ < 1). Both parameters are assumed
to be constant.
5. A clustering algorithm (e.g., [1, 2]) that aims to fairly rotate SNs to take the role of CHs has been
used to organize sensors into clusters for energy conservation purposes, as illustrated in Figure 1. A
CH is elected in each cluster. The function of a CH is to manage the network within the cluster,
gather sensor reading data from the SNs within the cluster, and relay data in response to a query.
The clustering algorithm is executed periodically by all SNs in iterations in which:
   - A SN announces its role as a CH candidate with probability p.
   - The announcement message carrying the candidate CH's residual energy information is
     broadcast with the time to live (TTL) field set to the number of hops bounded by the cluster area
     size predetermined at design time.
   - Any non-CH SN overhearing the announcement can select a CH with the highest residual
     energy to join a cluster. The SN also reports its location to the candidate CH.
   - This announcement and join process is executed in iterations such that a tentative CH can
     change its role to a SN if it overhears a CH candidate having a higher residual energy in a
     subsequent iteration.
   - If a non-CH SN does not hear any CH announcement, p is doubled in the next iteration.
A clustering algorithm as described above can be proven to converge within a finite number of
iterations and in effect could randomly rotate the role of a CH among SNs in a cluster so that
sensors consume their energy evenly [1]. With random rotation the cluster size, $n_s$, would be
equal to 1/p [11]. Note that to deal with uneven SN distribution, this CH-rotating probability
doubles in a subsequent iteration until it becomes 1, so in the worst case when a SN cannot find
any CH to join a cluster, it will eventually form a cluster by itself with probability 1. This
unbalanced clustering behavior occurs rarely when the WSN is dense. When the WSN is
sufficiently dense and the target cluster area size is the same, it is shown that clusters are
balanced in practice [1]. The total energy expended by the system depends on the period
($T_{clustering}$) over which the clustering algorithm is executed and the energy expended per
execution ($E_{clustering}$). The clustering algorithm is assumed to be executed as often as possible
(with the rate of 1/$T_{clustering}$) to balance energy consumption of SNs within a cluster. We
determine the clustering interval $T_{clustering}$ for satisfying the assumption of fair rotation among
SNs by simulation. Note that by our clustering protocol, a SN will select another SN to be the
CH only if they are connected possibly through multiple hops, so for the case in which a SN is
2r apart, but it is still connected to a candidate CH because there are intermediate SNs around,
the candidate CH can still be the CH for the SN. In the worst case in which there is no candidate
CH around, a SN will elect itself as the CH. However this case is extremely rare because of the
massive deployment of SNs with high density.
6. To save energy, the transmission power of a SN even when it is a CH is reduced to a minimum level
to enable the SN to communicate with its neighbor SNs within one-hop radio range denoted by r.
Thus, every SN needs to use a multi-hop route (i.e., passing through a number of other SNs) for it to
communicate with another SN some distance away. When the WSN becomes less dense as time
progresses due to sensor node failures, the one-hop radio range can be increased dynamically to
allow the WSN to continue its function at the expense of energy consumption.
7. The unreliable transmission failure behavior of the wireless medium in WSNs due to noise and
interference is characterized by a transmission failure parameter. This parameter varies among
sensors, depending on the node density and the packet transmission frequency of SNs within radio
range. Let $e_j$ denote the transmission failure probability of $SN_j$. Note that $e_j$ varies dynamically in
response to network dynamics.
8. Users on a flying airplane or a moving vehicle can issue queries through any CH, which we call a
processing center (PC) as labeled in Figure 1. Assume that queries arrive at the system in
accordance with a Poisson process with rate $\lambda_q$. A query may involve all or a subset of clusters, say,
k clusters, to respond to the query for data sensing and retrieval. These requested clusters are termed
source clusters. The CH of a source cluster receives $m_s$ packets carrying the same data content from
$m_s$ SNs within its cluster because of source redundancy but it will only relay the first packet it
receives to the PC. The CH could also aggregate data and return summarized information in terms of
the min, average, or max of sensor readings received from $m_s$ SNs. We assume queries are issued by
the user who is on the move. Thus, the timeliness requirement may be tight, i.e., on the order of
seconds. The WSN does not have a base station. Also, sensors in a cluster will rotate to be the CH in
their cluster. As a result, the notion of higher energy consumption by critical nodes [3] for relaying
messages to a base station or to a CH does not exist.
9. Routing in the WSN is based on geographic forwarding (e.g., [8]). No path information needs to be
maintained by individual SNs to conserve energy. Essentially only the location information of the
destination SN needs to be known by a forwarding SN for any source-destination communication.
We note that when a CH is elected periodically, the location information is broadcast to the WSN to
let other CHs know its location. Also, SNs within a cluster know the location of their CH, and vice
versa, as part of the election process.
10. A source CH must relay sensor data information to the PC in response to a user query, and thus
can consume more energy than a SN within its cluster. The energy consumed by the system for data
forwarding in response to a query depends on the total length (in terms of the number of hops) of
the paths connecting $m_s$ SNs within a cluster to the source CH for source redundancy, and the total
length of the $m_p$ paths connecting the source CH and the processing center (the destination CH) for
path redundancy. As the clustering algorithm in effect rotates sensor nodes within a cluster fairly
evenly to assume the role of the CH, each sensor node would consume energy at about the same
rate. Thus, instead of considering each individual sensor energy level, we can consider the system
energy whose initial energy level is given by $E_{initial} = n E_o$. When the energy level of the system falls
below a threshold value, say $E_{threshold}$, the WSN is considered as having depleted its energy.
11. To save energy, SNs operate in power-saving mode. In this mode, a SN operates either in active
mode, i.e., transmitting or receiving, or in sleep mode. The radio module of a modern sensor [15],
[16] allows it to shut off while in sleep mode. Essentially, in sleep mode an analog block stays
awake and acts as the radio detector. When the analog block detects a radio signal, the signal is
converted to a control signal which in turn is sent to power control electronics to wake up the radio
module. With state-of-the-art technology, energy consumed by the analog block is very small. Also
the current technology can achieve the transient time between active and sleep mode of 5s [15].
The energy consumed for turning on/off radio while a SN is in power-saving mode is also very
small. Thus, we only consider the energy consumed while a SN is transmitting or receiving in
active mode. For the energy model, we adopt the radio model in [1]. The energy dissipation to run
the transmitter and receiver circuitry is denoted as $E_{elec}$. The energy used by the transmit amplifier
to achieve an acceptable signal to noise ratio is denoted as $E_{amp}$. Also there is an $r^2$ energy loss due
to channel transmission under the assumption that the WSN is relatively free of obstacles where r is
radio range. Thus the energy spent by a SN to transmit a data packet of length $n_b$ bits a distance r is
given by:
$$E_T = n_b \left( E_{elec} + E_{amp}\, r^2 \right) \qquad (1)$$
The energy spent to receive a message is given by:
$$E_R = n_b\, E_{elec} \qquad (2)$$
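The radio model of Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function names and all numeric values (packet size, circuit and amplifier energies, radio range) are assumptions chosen for the example, not parameters reported in this paper.

```python
def transmit_energy(n_b, e_elec, e_amp, r):
    """Energy (J) to transmit n_b bits over distance r, following Equation (1)."""
    return n_b * (e_elec + e_amp * r ** 2)


def receive_energy(n_b, e_elec):
    """Energy (J) to receive n_b bits, following Equation (2)."""
    return n_b * e_elec


# Illustrative values only (assumptions, not taken from the paper).
N_B = 400        # packet size in bits
E_ELEC = 50e-9   # circuitry energy in J/bit
E_AMP = 10e-12   # amplifier energy in J/bit/m^2
R = 40.0         # radio range in meters

e_t = transmit_energy(N_B, E_ELEC, E_AMP, R)
e_r = receive_energy(N_B, E_ELEC)
print(f"E_T = {e_t:.3e} J, E_R = {e_r:.3e} J, one hop = {e_t + e_r:.3e} J")
```

With these assumed values, a single hop costs the sender $E_T$ and the receiver $E_R$, so the sum is a rough per-hop cost for one forwarded packet.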
We define the system lifetime or the mean time to failure (MTTF) as the total number of queries the
system can answer correctly until it fails to deliver query results either due to channel or sensor
faults, or when the system energy reaches the energy threshold level $E_{threshold}$. We define a query's
QoS requirements in terms of its reliability and timeliness requirements, denoted as $R_{req}$ and $T_{req}$. The
system must deliver query results within $T_{req}$ and the reliability of data delivery must be at least $R_{req}$.
Our objective is to determine the best path and source redundancy levels to satisfy QoS while
maximizing MTTF.
4 Probability Model
The adaptive fault tolerant QoS control (AFTQC) algorithm developed in this paper takes two forms
of redundancy. The first form is path redundancy. That is, instead of using a single path to connect a
source cluster to the processing center, $m_p$ disjoint paths may be used. The second is source
redundancy. That is, instead of having one sensor node in a source cluster return requested sensor data,
$m_s$ sensor nodes may be used to return readings to cope with data transmission and/or sensor faults.
Figure 1 illustrates a scenario in which $m_p$ = 2 (two paths going from the CH to the processing center)
and $m_s$ = 5 (five SNs returning sensor readings to the CH).
Below we derive analytical expressions for $R_q$ (query reliability) and $E_q$ (energy consumption per
query) resulting from the use of AFTQC. We first derive MTTF for the case in which only one source
cluster is required to answer a query and only the reverse traffic is considered. Later we generalize the
result to the case in which the forward traffic for query dissemination is considered and in which
multiple source clusters are required to answer a query.
4.1 Query Reliability
Let $d_{inter}$ be a random variable denoting the distance between a source CH and the PC and $d_{intra}$ be a
random variable denoting the distance between a SN and the CH. Then, the number of hops between the
PC and the source CH, denoted by h, is given by:
$$h = \left\lceil \frac{d_{inter}}{r} \right\rceil \qquad (3)$$
With the user being mobile, a query can be issued from the user to any CH which serves as the PC
for that query. Thus, the location of the processing center varies on a query-by-query basis. For
derivation convenience without loss of generality, let the PC be located in the center of the sensor area
with the coordinate at (0, 0) and the source CH be randomly located at ($X_i$, $Y_i$) in the square sensor area
with $-A/2 \le X_i \le A/2$ and $-A/2 \le Y_i \le A/2$. Then, the expected value of $d_{inter}$ is given by:
$$E[d_{inter}] = \int_{-A/2}^{A/2} \int_{-A/2}^{A/2} \sqrt{X_i^2 + Y_i^2} \left(\frac{1}{A}\right) \left(\frac{1}{A}\right) dX_i \, dY_i = 0.3825\,A \qquad (4)$$
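As a sanity check on the constant in Equation (4), the following Python sketch compares a closed form for the mean distance from the center of a square to a uniformly random point with a Monte Carlo estimate. The closed form $(A/6)(\sqrt{2} + \ln(1+\sqrt{2}))$ is a standard geometric result that is not stated in this paper, and the function names, sample count, and seed are illustrative assumptions.

```python
import math
import random


def mean_center_distance_closed_form(a):
    """Mean distance from the center of an a-by-a square to a uniform random point.

    Standard geometric result (not taken from the paper).
    """
    return a / 6.0 * (math.sqrt(2.0) + math.log(1.0 + math.sqrt(2.0)))


def mean_center_distance_monte_carlo(a, samples=200_000, seed=1):
    """Estimate the same quantity by sampling source CH locations uniformly."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = rng.uniform(-a / 2.0, a / 2.0)
        y = rng.uniform(-a / 2.0, a / 2.0)
        total += math.hypot(x, y)
    return total / samples


closed = mean_center_distance_closed_form(1.0)
estimate = mean_center_distance_monte_carlo(1.0)
print(f"closed form: {closed:.4f} A, Monte Carlo: {estimate:.4f} A")
```

Both values should land close to the 0.3825 A used in the text, which supports reading the double integral as an average over a uniformly placed source CH.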
The same final expression for E[$d_{inter}$] would result if we had taken the coordinate of the processing
center to be ($X_c$, $Y_c$) in the square sensor area and put two more integrals, one for $X_c$ and the other for
$Y_c$ with $-A/2 \le X_c \le A/2$ and $-A/2 \le Y_c \le A/2$, because of symmetric properties. For notational
convenience, let $N^h_{inter}$ represent the average number of hops to forward sensor data from a source CH
to the processing center.
$$N^h_{inter} = E[h] = \left\lceil \frac{0.3825\,A}{r} \right\rceil \qquad (5)$$
Since a sensor becomes a CH with probability p and all the sensors are distributed in the area in
accordance with a spatial Poisson process with intensity $\lambda$, the CH and non-CH sensors will also be
distributed in accordance with a spatial Poisson process with rates $p\lambda$ and $(1-p)\lambda$, respectively.
Non-cluster-head sensors thus would join the cluster of the closest CH to form a Voronoi cell [4]
corresponding to a cluster in the WSN. It has been shown [5, 11] that the average number of
non-cluster-head sensors in each Voronoi cell is (1-p)/p and the expected distance from a non-cluster-head
sensor to the CH is given by:
$$E[d_{intra}] = \frac{1}{2\,(p\lambda)^{1/2}} \qquad (6)$$
If this distance is more than per-hop distance r, a sensor will take a multi-hop route to transmit
sensor data to the CH. The average number of intermediate sensors (including the sensor itself) is the
quantity above divided by per-hop distance r. Let $N^h_{intra}$ denote the average number of hops to forward
sensor data from a SN responsible for a reading to its CH. Then $N^h_{intra}$ is given by:
$$N^h_{intra} = \left\lceil \frac{1}{2\,r\,(p\lambda)^{1/2}} \right\rceil \qquad (7)$$
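The two hop-count expressions can be evaluated with a short Python sketch. It assumes that hop counts are rounded up to whole hops and that the SN density is $\lambda = n / A^2$. The function names and the numeric values are illustrative assumptions rather than settings from this paper.

```python
import math


def inter_cluster_hops(a, r):
    """Average hops from a source CH to the PC: ceil(0.3825 * A / r)."""
    return math.ceil(0.3825 * a / r)


def intra_cluster_hops(p, lam, r):
    """Average hops from a sensing SN to its CH: ceil(1 / (2 r sqrt(p * lambda)))."""
    return math.ceil(1.0 / (2.0 * r * math.sqrt(p * lam)))


# Illustrative values only (assumptions, not taken from the paper).
A = 400.0            # side of the square area in meters
N = 1000             # number of SNs
R = 40.0             # radio range in meters
P = 0.01             # probability of a SN becoming a CH
LAM = N / (A * A)    # SN density in nodes per square meter

n_inter = inter_cluster_hops(A, R)
n_intra = intra_cluster_hops(P, LAM, R)
print(f"N_inter = {n_inter} hops, N_intra = {n_intra} hops")
```

For these assumed values the sketch yields 4 inter-cluster hops and 2 intra-cluster hops, which shows how a larger area or a shorter radio range lengthens each path.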
A query response is transmitted from a SN performing sensing to the PC through the CH hop-by-hop.
The total delay must be lower than the imposed deadline requirement $T_{req}$ for the user to accept
the query result. To achieve this, as a query response is relayed along a path, we choose a forwarding
node that satisfies the minimum transmission speed requirement. Since the distance separating a
sensing SN from the PC is $d_{inter} + d_{intra}$ and the maximum time for the query result to reach the PC is
$T_{req}$, the minimum transmission speed, denoted by $X_{set}$, to satisfy the imposed deadline requirement is
given by:
$$X_{set} = \frac{d_{inter} + d_{intra}}{T_{req}} \qquad (8)$$
Plugging in the expected values of $d_{inter}$ and $d_{intra}$, the expected minimum transmission speed is
given by:
$$E[X_{set}] = \frac{0.3825\,A + \dfrac{1}{2\,(p\lambda)^{1/2}}}{T_{req}} \qquad (9)$$
As a query result is forwarded hop-by-hop through geographical routing, the expression above
represents the minimum per-hop transmission speed to transmit the query results from a SN to the PC
in order not to miss the deadline. Here we note that queuing delay is ignored because not much
cross-traffic is anticipated in a query-based WSN so queuing delay is considered small compared with
transmission delay.
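A minimal Python sketch of the expected minimum per-hop speed follows. It divides the sum of the expected inter-cluster distance (0.3825 A) and the expected intra-cluster distance ($1/(2\sqrt{p\lambda})$) by the deadline $T_{req}$. The function name and the numeric values are illustrative assumptions.

```python
import math


def expected_min_speed(a, p, lam, t_req):
    """Expected minimum transmission speed (m/s) needed to meet deadline t_req."""
    d_inter = 0.3825 * a                        # expected CH-to-PC distance
    d_intra = 1.0 / (2.0 * math.sqrt(p * lam))  # expected SN-to-CH distance
    return (d_inter + d_intra) / t_req


# Illustrative values only (assumptions, not taken from the paper).
A = 400.0                 # side of the square area in meters
P = 0.01                  # probability of a SN becoming a CH
LAM = 1000 / (A * A)      # SN density in nodes per square meter
T_REQ = 1.0               # query deadline in seconds

speed = expected_min_speed(A, P, LAM, T_REQ)
print(f"E[X_set] = {speed:.2f} m/s")
```

Halving the deadline doubles the required speed, which is one way to see why tight deadlines shrink the set of usable forwarding neighbors.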
Let $Q_{t,jk}$ be the probability that if a packet is forwarded to $SN_k$ from $SN_j$, the speed requirement
would be violated. To calculate $Q_{t,jk}$ we need to know the transmission speed $S_{jk}$ from $SN_j$ to $SN_k$. This
can be dynamically measured by $SN_j$ following the approach described in [8]. If $S_{jk}$ is above E[$X_{set}$]
then $Q_{t,jk}$ = 0; otherwise, $Q_{t,jk}$ = 1. In general $S_{jk}$ is not known until runtime. If $S_{jk}$ is uniformly
distributed within a range [a, b], then $Q_{t,jk}$ can be computed as:
$$Q_{t,jk} = \mathrm{cdf}\left(S_{jk} \le E[X_{set}]\right) = \frac{E[X_{set}] - a}{b - a} \qquad (10)$$
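The uniform-speed case can be sketched in Python as below. The clamping to [0, 1] for a required speed outside [a, b] is an added assumption that follows from the definition of a cumulative distribution function. It is not spelled out in the text, and the function name and numbers are illustrative.

```python
def speed_violation_probability(required_speed, a, b):
    """P(S_jk <= required_speed) when S_jk is uniform on [a, b].

    Values outside [a, b] are clamped to 0 or 1 (assumption based on the
    definition of a CDF).
    """
    if b <= a:
        raise ValueError("the speed range must satisfy a < b")
    raw = (required_speed - a) / (b - a)
    return min(1.0, max(0.0, raw))


# Illustrative values only: speeds in m/s.
print(speed_violation_probability(250.0, 200.0, 400.0))  # inside the range
print(speed_violation_probability(150.0, 200.0, 400.0))  # requirement always met
print(speed_violation_probability(450.0, 200.0, 400.0))  # requirement never met
```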
Other than speed violation failure, a node may also fail to relay sensor data because of either a
sensor failure or a transmission failure, or both. Let $Q_{r,j}$ be this failure probability of a SN, say, $SN_j$.
Then $Q_{r,j}$ is given by:
$$Q_{r,j} = 1 - \left[ (1 - q)(1 - e_j) \right] \qquad (11)$$
Here for sensor failure, we have only considered hardware failure. Later in Section 5 we will extend
this to the case in which SN software faults are possible.
Figure 2: Hop-by-Hop Data Delivery in AFTQC (diagram labels: cluster head, processing center, paths 1, 2, ..., m).
We develop a hop-by-hop data delivery scheme to implement the desired level of redundancy to
achieve QoS. For path redundancy, we want to form $m_p$ paths from a source CH to the processing
center, as illustrated in Figure 2. This is achieved by having $m_p$ nodes on the first hop relay the data,
and only one single node relay the data per receiving group in all subsequent hops. For source
redundancy, we want each of the $m_s$ sensors to communicate with the source CH through a distinct
path. Here we note that a WSN is inherently broadcast based. However, a SN can specify a set of SNs
in the next hop (that is, $m_p$ in the first hop and 1 in a subsequent hop) as the intended receivers and
only those SNs will forward data.
It has been reported that the number of edge-disjoint paths between nodes is equal to the average
node degree with a very high probability [6]. Therefore, when the density is sufficiently high such that
the average number of one-hop neighbors, $n_k$, calculated as $\lambda \pi r^2$, is sufficiently larger than $m_p$ and $m_s$,
this hop-by-hop data delivery scheme can effectively result in $m_p$ redundant paths for path redundancy
and $m_s$ distinct paths from $m_s$ sensors for source redundancy.
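The density condition can be made concrete with a small Python sketch that computes the average number of one-hop neighbors as the SN density times the area of the radio disk, $\lambda \pi r^2$. The function name, the numeric values, and the choice to also print the number of forwarding neighbors for a forwarding fraction of 1/4 are illustrative assumptions.

```python
import math


def average_one_hop_neighbors(lam, r):
    """Average number of SNs within radio range r for density lam (nodes/m^2)."""
    return lam * math.pi * r ** 2


# Illustrative values only (assumptions, not taken from the paper).
LAM = 1000 / (400.0 * 400.0)   # 1000 SNs in a 400 m x 400 m area
R = 40.0                       # radio range in meters
F = 0.25                       # fraction of neighbors that forward data

n_k = average_one_hop_neighbors(LAM, R)
forwarders = F * n_k
print(f"n_k = {n_k:.1f} neighbors, about {forwarders:.1f} forwarding candidates")
```

Comparing the number of forwarding candidates with the intended redundancy levels gives a quick feel for whether a deployment is dense enough to form the desired number of distinct paths.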
s s
The probability of $SN_j$ failing to relay a broadcast packet to a one-hop neighbor $SN_k$ because of
either sensor/channel failures, or speed violation, denoted by $Q_{rt,jk}$, is given by:
$$Q_{rt,jk} = 1 - \left[ (1 - Q_{r,j})(1 - Q_{t,jk}) \right] \qquad (12)$$
The probability that at least one next-hop SN (among the one-hop neighbors) of SN$_j$ along the direction of the destination node is able to satisfy the speed requirement and receive the broadcast message is given by:
$$\theta_j = 1 - \prod_{k=1}^{f n_k} Q_{rt,jk} \tag{13}$$
Here $n_k$ is the number of neighbors and $f$ is the fraction of neighbors that would forward data based on geographical routing; e.g., $f = 1/4$ means that only the sensors in the quadrant toward the target node will forward data. Note that while SN$_j$ forwards data to its one-quadrant neighbors SN$_k$, if one of the SN$_k$ is the destination node, then the probability that the destination SN fails to receive the message due to sensor/channel failures or speed violation is exactly equal to $Q_{rt,jk}$ as given in Equation (12).
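As an illustration, Equations (11) through (13) can be evaluated numerically. The following Python sketch is not part of AFTQC; the function names and all input values (including the speed-violation probability 0.2 and the three forwarding neighbors) are hypothetical and chosen only for demonstration.

```python
from math import prod

def relay_failure(q, e_j):
    """Equation (11): SN j fails because of hardware or transmission failure."""
    return 1.0 - (1.0 - q) * (1.0 - e_j)

def hop_failure(q_r_j, q_t_jk):
    """Equation (12): relay from SN j to neighbor k fails
    (sensor/channel failure or speed violation)."""
    return 1.0 - (1.0 - q_r_j) * (1.0 - q_t_jk)

def hop_success(q_rt_list):
    """Equation (13): at least one forwarding neighbor receives the packet.
    q_rt_list holds one Equation (12) value per forwarding neighbor."""
    return 1.0 - prod(q_rt_list)

# Hypothetical inputs: q = 1e-6, e_j = 0.001, speed-violation probability 0.2,
# and 3 forwarding neighbors (e.g., f = 1/4 of 12 one-hop neighbors).
q_r = relay_failure(1e-6, 0.001)
q_rt = hop_failure(q_r, 0.2)
theta = hop_success([q_rt] * 3)
```

Because the per-hop success probability is one minus a product of failure probabilities, it approaches 1 quickly as the number of forwarding neighbors grows.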
Below we derive the probability that a path is successfully formed for hop-by-hop data delivery between the source CH and the processing center. Since there are $N_h^{inter}$ hops between the source CH (the first SN, with index 1) and the processing center (the last SN, with index $N_h^{inter}+1$), a path is formed for data delivery if in each hop at least one next-hop sensor along the direction of the target node is able to satisfy the speed requirement and receive the broadcast message, and the destination node is also able to satisfy the speed requirement and receive the message. Thus, the probability that a path of length $N_h^{inter}$ is formed successfully for hop-by-hop data delivery is given by:
$$\Theta(N_h^{inter}) = \Big[\prod_{j=1}^{N_h^{inter}-1} \theta_j\Big]\Big(1 - Q_{rt,\,N_h^{inter}(N_h^{inter}+1)}\Big) \tag{14}$$
where $\theta_j$ is the per-hop success probability of Equation (13) and $Q_{rt,\,N_h^{inter}(N_h^{inter}+1)}$ is from Equation (12) for the probability that the PC node (the last SN, with index $N_h^{inter}+1$) fails to receive the message due to sensor/channel failures or speed violation. Here we adopt the convention that if the upper bound is smaller than the lower bound in the product term, then the product term evaluates to 1.
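A minimal sketch of Equation (14), assuming the per-hop success probabilities and the final-hop failure probability have already been computed. The function name and the numeric inputs are hypothetical.

```python
from math import prod

def path_success(per_hop_success, q_rt_last):
    """Equation (14) for a path of N hops.

    per_hop_success: Equation (13) values for hops 1 .. N-1 (empty when N == 1)
    q_rt_last: Equation (12) failure probability of the final hop into the destination
    """
    # prod([]) evaluates to 1, matching the empty-product convention in the text.
    return prod(per_hop_success) * (1.0 - q_rt_last)

# Hypothetical 5-hop path: four intermediate hops with success probability 0.99,
# and a final hop that fails with probability 0.2.
p_path = path_success([0.99] * 4, 0.2)
```

For a one-hop path the list of intermediate hops is empty, so the result reduces to the success probability of the final hop alone.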
We create $m_p$ paths between the source CH and the PC based on the hop-by-hop data delivery scheme discussed earlier. The source cluster will fail to deliver data to the PC if one of the following happens:
1. None of the SNs in the first hop receives the message. The probability for this case is $1 - \theta_1$.
2. In the first hop, $i$ ($1 \le i < m_p$) SNs receive the message, and each of them attempts to form a path for data delivery. However, all $i$ paths fail to deliver the message because the subsequent hops fail to receive the broadcast message. The failure probability for this case is:
$$\sum_{1 \le |I| < m_p} \Big\{\Big[\prod_{i \in I}(1 - Q_{rt,1i})\Big]\Big[\prod_{i \notin I} Q_{rt,1i}\Big]\Big\}\Big\{\prod_{i \in I}\big[1 - \Theta_i(N_h^{inter} - 1)\big]\Big\}$$
where $I$ stands for a set consisting of first-hop SNs that receive the message and $|I|$ is the cardinality of set $I$. The first term is the probability that exactly the SNs in $I$, out of the $f n_k$ nodes in the first hop, successfully receive the message, and the second term is the probability that all $|I|$ paths fail to deliver data. Note that a subscript $i$ has been used to label $\Theta_i$ to refer to path $i$ (i.e., the path that starts from a particular SN with index $i$). Also the argument to $\Theta_i$ is only $N_h^{inter} - 1$ because there is one less hop to be considered in each path.
3. In the first hop, at least $m_p$ SNs receive the broadcast message from the source CH, from which $m_p$ SNs are randomly selected to forward data, but all $m_p$ paths fail to deliver the message because the subsequent hops fail to receive the broadcast message. The probability for this case is:
$$\sum_{|I| \ge m_p} \Big\{\Big[\prod_{i \in I}(1 - Q_{rt,1i})\Big]\Big[\prod_{i \notin I} Q_{rt,1i}\Big]\Big\}\Big\{\prod_{i \in M,\; M \subseteq I,\; |M| = m_p}\big[1 - \Theta_i(N_h^{inter} - 1)\big]\Big\}$$
where $M$ is a subset of $I$ with cardinality $m_p$. The second term in the above expression is the probability that all $m_p$ paths fail to deliver data.
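Thus, the probability of the source cluster failing to deliver data to the processing center is given by:
$$Q_{fp}^{m_p} = 1 - \theta_1 + \sum_{1 \le |I| < m_p} \Big\{\Big[\prod_{i \in I}(1 - Q_{rt,1i})\Big]\Big[\prod_{i \notin I} Q_{rt,1i}\Big]\Big\}\Big\{\prod_{i \in I}\big[1 - \Theta_i(N_h^{inter} - 1)\big]\Big\} + \sum_{|I| \ge m_p} \Big\{\Big[\prod_{i \in I}(1 - Q_{rt,1i})\Big]\Big[\prod_{i \notin I} Q_{rt,1i}\Big]\Big\}\Big\{\prod_{i \in M,\; M \subseteq I,\; |M| = m_p}\big[1 - \Theta_i(N_h^{inter} - 1)\big]\Big\} \tag{15}$$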
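The three failure cases combined in Equation (15) can be checked by brute-force enumeration over the set of first-hop SNs that receive the message. The sketch below is illustrative only: the function name is hypothetical, and averaging over all equally likely choices of the $m_p$ forwarders in the third case is an assumption made here to model the random selection. The enumeration is exponential in the number of first-hop candidates, so it is suitable only for small examples.

```python
from itertools import combinations
from math import prod

def path_redundancy_failure(q_rt_first, rest_success, m_p):
    """Failure probability of the source cluster in the spirit of Equation (15).

    q_rt_first[i]:  probability that first-hop candidate i misses the message
    rest_success[i]: probability that the remaining path from candidate i succeeds
    """
    n = len(q_rt_first)
    total = 0.0
    for size in range(n + 1):
        for received in combinations(range(n), size):
            # Probability that exactly the SNs in `received` get the message.
            p_received = prod(
                (1.0 - q_rt_first[i]) if i in received else q_rt_first[i]
                for i in range(n)
            )
            if size <= m_p:
                # Cases 1 and 2: every receiver forwards; all of their paths must fail.
                p_fail = prod(1.0 - rest_success[i] for i in received)
            else:
                # Case 3: m_p forwarders are picked at random (assumption: uniformly).
                subsets = list(combinations(received, m_p))
                p_fail = sum(
                    prod(1.0 - rest_success[i] for i in chosen) for chosen in subsets
                ) / len(subsets)
            total += p_received * p_fail
    return total

# Hypothetical homogeneous example: 3 candidates, first-hop failure 0.2,
# remaining-path success 0.9.
q_fp_one_path = path_redundancy_failure([0.2] * 3, [0.9] * 3, m_p=1)
q_fp_three_paths = path_redundancy_failure([0.2] * 3, [0.9] * 3, m_p=3)
```

In this example, moving from one path to three paths lowers the failure probability from about 0.107 to about 0.022.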
For source redundancy, instead of using one SN, we assign $m_s$ SNs in each cluster to return sensor readings to their CH to cope with channel/sensor faults. To implement source redundancy, SNs also use hop-by-hop data delivery based on geographical routing to send sensor data to their CH. For a path of $N_h^{intra}$ hops from a SN to the CH, again assign an index of 1 to the SN and an index of $N_h^{intra}+1$ to the CH. Then, following a similar derivation, the probability that a path is formed successfully from the SN to the CH for data delivery is given by:
$$\Theta(N_h^{intra}) = \Big[\prod_{j=1}^{N_h^{intra}-1} \theta_j\Big]\Big(1 - Q_{rt,\,N_h^{intra}(N_h^{intra}+1)}\Big) \tag{16}$$
For source redundancy, $m_s$ SNs are used for returning sensor readings. So the probability that all $m_s$ SNs within a cluster fail to return sensor readings to the CH is given by:
$$Q_{fs}^{m_s} = \prod_{i=1}^{m_s}\big[1 - \Theta_i(N_h^{intra})\big] \tag{17}$$
Note that on each of the $m_s$ paths, distinct $e_j$ and $S_{jk}$ exist, depending on that path's traffic condition. Combining the results above, the failure probability of a source cluster not being able to return a correct response, because of either path or source failure, or both, is given by:
$$Q_f = 1 - (1 - Q_{fp}^{m_p})(1 - Q_{fs}^{m_s}) \tag{18}$$
Therefore, the query success probability is given by:
$$R_q = 1 - Q_f \tag{19}$$
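Equations (17) through (19) combine the source and path failure probabilities into the query success probability. The following is a small sketch with hypothetical function names and inputs.

```python
from math import prod

def source_redundancy_failure(intra_path_success):
    """Equation (17): all m_s SN-to-CH paths fail.
    intra_path_success holds one Equation (16) value per source SN."""
    return prod(1.0 - p for p in intra_path_success)

def query_reliability(q_fp, q_fs):
    """Equations (18) and (19): the query succeeds only if neither
    path redundancy nor source redundancy fails."""
    q_f = 1.0 - (1.0 - q_fp) * (1.0 - q_fs)
    return 1.0 - q_f

# Hypothetical values: two source SNs with path success 0.9 each,
# and a path-redundancy failure probability of 0.02.
q_fs = source_redundancy_failure([0.9, 0.9])
r_q = query_reliability(0.02, q_fs)
```

Since the two failure modes act in series, $R_q$ equals the product of the two success probabilities.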
4.2 Query Processing Energy Consumption
Next we calculate the energy consumed per query. For source redundancy, in response to a query, an assigned SN would transmit a data packet to its source CH. Since the average number of hops between a SN and its CH is given by $N_h^{intra}$ as derived above, and a query requires the use of $m_s$ SNs for source redundancy, the total energy required for these $m_s$ SNs to forward sensor readings to the CH is given by:
$$E_s = m_s N_h^{intra}\big[E_T + (\lambda \pi r^2) E_R\big] \tag{20}$$
For path redundancy, let $E_{ch}$ be the total energy consumed by the WSN to transmit sensor data from the source CH to the PC with $m_p$ paths connecting the CH to the processing center. The source CH would broadcast a copy of the data packet and all first-hop neighbors would receive it. Then, among the first-hop neighbors, $m_p$ nodes would broadcast again and all second-hop neighbors would receive. In each of the subsequent hops on a path, only one node would broadcast and the neighbors on the next hop would receive. Consequently, $E_{ch}$ is given by:
$$E_{ch} = E_T + (\lambda \pi r^2) E_R + m_p (N_h^{inter} - 1)\big[E_T + (\lambda \pi r^2) E_R\big] \tag{21}$$
The total amount of energy spent by the system, $E_q$, to answer a query that demands a source cluster to respond, using $m_s$ SNs for source redundancy and $m_p$ paths for path redundancy, is given by:
$$E_q = E_{ch} + E_s \tag{22}$$
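The per-query energy can be sketched as follows, assuming each broadcast costs one transmission plus one reception by each of the $\lambda \pi r^2$ one-hop neighbors, which is how Equations (20) and (21) are read here. The function name and the numeric inputs are hypothetical.

```python
from math import pi

def energy_per_query(m_p, m_s, n_h_inter, n_h_intra, e_t, e_r, density, r):
    """Energy to answer one query, following Equations (20) to (22)."""
    n_k = density * pi * r ** 2        # average number of one-hop neighbors
    per_broadcast = e_t + n_k * e_r    # one transmission heard by n_k neighbors
    e_s = m_s * n_h_intra * per_broadcast                         # Equation (20)
    e_ch = per_broadcast + m_p * (n_h_inter - 1) * per_broadcast  # Equation (21)
    return e_s + e_ch                                             # Equation (22)

# Hypothetical unit-free example chosen so that n_k = 1.
e_q = energy_per_query(m_p=3, m_s=2, n_h_inter=4, n_h_intra=2,
                       e_t=2.0, e_r=1.0, density=1.0 / pi, r=1.0)
```

The result grows linearly in both $m_p$ and $m_s$, which is the energy side of the reliability versus energy tradeoff.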
4.3 Energy Consumption due to Clustering
For clustering, the system would consume energy for broadcasting the announcement message and for the cluster-join process. Since $p$ is the probability of becoming a CH, there will be $pn$ SNs broadcasting the announcement message. This announcement message will be received and retransmitted by each SN to the next hop until the TTL of the message reaches 0, i.e., until the number of hops equals $N_h^{intra}$. Thus, the energy required for broadcasting is $pn[N_h^{intra}(\lambda \pi r^2)(E_T + E_R)]$. The cluster-join process requires a SN to send a message to the CH to announce that it will join the cluster, and the CH to send an acknowledgement to the SN. Since there are $pn$ CHs and $(n - pn)$ SNs in the system, the energy for this is $n(E_T + E_R)$. Let the size of the message exchanged be $n_l$. $E_R$ and $E_T$ will be calculated from Equations (1) and (2) with $n_l$ in place of $n_b$. Let $N_{iteration}$ be the number of iterations required to execute the clustering algorithm. Then the energy required for each execution of the clustering algorithm, $E_{clustering}$, is given by:
$$E_{clustering} = pn N_{iteration}\big[N_h^{intra}(\lambda \pi r^2)(E_T + E_R)\big] + n(E_T + E_R) \tag{23}$$
4.4 System Lifetime Mean Time to Failure
Our objective is to find the best redundancy level, represented by $m_p$ and $m_s$, that would satisfy the query reliability and timeliness requirements while maximizing MTTF, when given a set of system parameter values characterizing the application and network conditions. That is, if $T_{req}$ and $R_{req}$ are the timing and reliability requirements of a query, then we determine the best combination of $(m_p, m_s)$ such that the MTTF is maximized, subject to the constraint:
$$R_q \ge R_{req} \tag{24}$$
Note that the constraint given above implies that the timing requirement $T_{req}$ is also satisfied, because we consider the probability of the minimum transmission speed being satisfied when we derive $R_q$ in Equation (19).
From a user's perspective, if the user does not see a response returned within the specified real-time constraint, the system is considered as having failed. We define a metric called the mean time to failure (MTTF) of the sensor system that considers this failure definition. Specifically, we define the MTTF of a sensor data system as the average number of queries that the system is able to answer correctly before it fails, with the failure caused by either channel or sensor faults (such that a response is not delivered within the real-time deadline), or energy depletion.
When $m_p$ paths and $m_s$ SNs are used to achieve $R_q$ in order to satisfy Condition (24), the amount of energy consumed per query is given by $E_q$ in Equation (22) above. Consider for the time being that the system fails due to energy depletion only. Then the system fails when the system's energy falls below $E_{threshold}$. Let the potential maximum lifetime of the system be denoted by $T_{life}$. There are two sources of energy consumption: query processing and periodic clustering. Also consider the case in which queries arrive at the system as a Poisson process with rate $\lambda_q$. The energy consumed due to query processing is given by $E_q \lambda_q T_{life}$, where $\lambda_q T_{life}$ is the maximum number of queries the system can possibly process during its lifetime. On the other hand, the energy expended due to the execution of the periodic clustering algorithm is given by $E_{clustering} T_{life}/T_{clustering}$, where $T_{life}/T_{clustering}$ is the number of times the clustering algorithm is executed during the system lifetime. Thus, $T_{life}$ can be calculated as follows:
$$E_q \lambda_q T_{life} + E_{clustering}\frac{T_{life}}{T_{clustering}} = E_{initial} - E_{threshold} \tag{25}$$
The maximum number of queries that the system is able to sustain before running out of energy, denoted by $N_q$, is given by:
$$N_q = \lambda_q T_{life} = \frac{\lambda_q (E_{initial} - E_{threshold})}{E_q \lambda_q + E_{clustering}/T_{clustering}} \tag{26}$$
Since the system is able to answer $N_q$ queries before energy depletion, each with the reliability of $R_q$, the MTTF of the system is the expected number of queries that the system can answer without experiencing a failure, with the upper bound of $N_q$, i.e.,
$$MTTF = \sum_{i=1}^{N_q - 1} i\, R_q^{\,i} (1 - R_q) + N_q R_q^{\,N_q} \tag{27}$$
This MTTF metric can be translated into a more classic system lifetime metric with the unit of time, i.e., mean lifetime to failure (MLTF), as follows:
$$MLTF = \frac{MTTF}{\lambda_q} \tag{28}$$
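The chain from Equation (25) to Equation (28) can be sketched in a few lines. The function name is hypothetical, and truncating $N_q$ to a whole number of queries is an assumption of this sketch rather than something stated in the text.

```python
def lifetime_metrics(r_q, e_q, lambda_q, e_clustering, t_clustering,
                     e_initial, e_threshold=0.0):
    """Return (N_q, MTTF, MLTF) following Equations (25) to (28)."""
    # Equation (25) solved for T_life.
    t_life = (e_initial - e_threshold) / (e_q * lambda_q + e_clustering / t_clustering)
    # Equation (26), truncated to whole queries (assumption of this sketch).
    n_q = int(lambda_q * t_life)
    # Equation (27): expected number of queries answered before the first failure,
    # capped at n_q by energy depletion.
    mttf = sum(i * r_q ** i * (1.0 - r_q) for i in range(1, n_q)) + n_q * r_q ** n_q
    # Equation (28): convert queries to time.
    mltf = mttf / lambda_q
    return n_q, mttf, mltf

# Hypothetical values: perfectly reliable queries, so MTTF equals N_q.
n_q, mttf, mltf = lifetime_metrics(r_q=1.0, e_q=0.5, lambda_q=1.0,
                                   e_clustering=0.0, t_clustering=1.0,
                                   e_initial=50.0)
```

When $R_q = 1$ the sum vanishes and the MTTF is limited only by energy; when $R_q < 1$ the geometric terms dominate and extra energy budget yields diminishing returns.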
4.4. Generalization
Certain assumptions have been made in the paper to simplify the mathematical analysis. Below we
discuss how these assumptions can be relaxed to generalize the model.
4.4.1 Query Involving Multiple Clusters for a Response
The analysis can be easily extended to the case where multiple source clusters are demanded. Let $P_q(k)$ be the probability that a query requires $k$ source clusters to respond. Let $R_q(k)$ be the query success probability for a query that requires $k$ source clusters to respond, and $E_q(k)$ be the energy consumption of the system to answer a query that requires $k$ source clusters. The expressions for $R_q(k)$ and $E_q(k)$ can be easily derived from those based on a single source cluster, i.e., through Equations (19) and (22), respectively, based on the application requirements (e.g., the query is considered successful only if all $k$ source clusters return sensor readings). Then $E_q$ would be given by the expected value of $E_q(k)$ as:
$$E_q = \sum_{k=1}^{np} E_q(k) P_q(k) \tag{29}$$
The success probability of a query, $R_q$, would be given as:
$$R_q = \sum_{k=1}^{np} R_q(k) P_q(k) \tag{30}$$
The same analytical expression for the MTTF as given by Equation (27), with the new $E_q$ and $R_q$ given in Equations (29) and (30), can then be used to analyze the effect of $P_q(k)$.
4.4.2 Concurrent Query Processing with Distinct QoS Requirements
Our analysis can also be extended to scenarios in which queries arrive at the system concurrently, say, from multiple users, by simply measuring $e_j$ (transmission failure probability experienced by SN$_j$) and $S_{jk}$ (progressive speed if the packet is forwarded from SN$_j$ to SN$_k$) to properly account for the interference and noise introduced due to simultaneous transmission of data packets by SNs. The reason is that the MTTF metric is based on the number of queries that the system is able to service before it fails, so it does not matter whether queries are processed sequentially or concurrently, as long as the interference and noise introduced due to simultaneous transmission of data packets by SNs have been properly accounted for in calculating the query success probability ($R_q$) and energy consumption ($E_q$).
In reality, queries may be in different service classes and thus have different QoS requirements. The analysis can be extended to handle this more general case by considering the probability of a query being in a particular QoS class and computing the weighted $R_q$ and $E_q$ of a query, and consequently the MTTF of the system. For example, the timeliness requirement can be one of (1 sec, 5 sec, 10 sec) and the reliability requirement can be one of (0.999, 0.99, 0.9), so there will be nine QoS classes. For each QoS class, say, class $i$, we apply the analysis method to calculate $R_{q,i}$ and $E_{q,i}$ for class $i$ only. Then, given knowledge of the probability that a query is in class $i$, $PQoS_i$, we can calculate the expected reliability and energy consumption per query, $\bar{R}_q$ and $\bar{E}_q$, as:
$$\bar{R}_q = \sum_i PQoS_i\, R_{q,i} \tag{31}$$
$$\bar{E}_q = \sum_i PQoS_i\, E_{q,i} \tag{32}$$
Then the MTTF calculation can use $\bar{R}_q$ and $\bar{E}_q$ instead of $R_q$ and $E_q$.
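The class-weighted averages of Equations (31) and (32) amount to two expectations over the QoS class distribution. The sketch below uses three hypothetical classes with made-up probabilities, reliabilities, and energies; none of these numbers come from the evaluation.

```python
def expected_per_query(class_probs, r_by_class, e_by_class):
    """Equations (31) and (32): expected reliability and energy per query."""
    if abs(sum(class_probs) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    r_bar = sum(p * r for p, r in zip(class_probs, r_by_class))
    e_bar = sum(p * e for p, e in zip(class_probs, e_by_class))
    return r_bar, e_bar

# Hypothetical three-class mix.
r_bar, e_bar = expected_per_query(
    class_probs=[0.5, 0.3, 0.2],
    r_by_class=[0.999, 0.99, 0.9],
    e_by_class=[0.03, 0.02, 0.01],
)
```

The resulting pair can be fed to the MTTF expression of Equation (27) in place of the single-class values.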
4.4.3 Software Fault
For source redundancy, $m_s$ SNs are used for returning sensor readings. If we consider both hardware and software failures of SNs, the system will fail if the majority of SNs do not return sensor readings (due to hardware failure), or if the majority of SNs return sensor readings incorrectly (due to software failure). Assume that all SNs have the same software failure probability, denoted by $q_s$. Also assume that all sensors that sense a given event make the same measurements. Then, to account for software failure, Equation (17) can be replaced with Equation (33) below.
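$$Q_{fs}^{m_s} = \sum_{|I| < m_s/2} \Big\{\prod_{i \in I}\Theta_i(N_h^{intra})\Big\}\Big\{\prod_{i \notin I}\big[1 - \Theta_i(N_h^{intra})\big]\Big\} + \sum_{|I| \ge m_s/2} \Big\{\prod_{i \in I}\Theta_i(N_h^{intra})\Big\}\Big\{\prod_{i \notin I}\big[1 - \Theta_i(N_h^{intra})\big]\Big\}\Big\{1 - \Big[\sum_{j \ge m_s/2}^{|I|} \binom{|I|}{j}(1 - q_s)^{j} q_s^{(|I| - j)}\Big]\Big\} \tag{33}$$
where $I$ is the set of SNs whose sensor readings reach the CH.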
Here the first expression is the probability that the majority of the $m_s$ SNs fail to return sensor readings due to hardware failure, and the second expression is the probability that a majority of the $m_s$ SNs return sensor readings but no majority of them agrees on the same sensor reading as the output because of software failure. We note that sensor reading errors may result from software faults or from falsified readings by a compromised sensor node, and that the proposed majority voting mechanism can cope with both types of sensor reading errors.
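The majority-voting failure probability can be illustrated for the homogeneous case in which every SN-to-CH path succeeds with the same probability. This is a simplifying assumption of the sketch, as are the function name and the default threshold of at least $m_s/2$ readings; a strict majority can be modeled by passing a different `need`.

```python
from math import ceil, comb

def source_failure_with_voting(path_success, q_s, m_s, need=None):
    """Source-redundancy failure with software faults, homogeneous paths.

    path_success: probability that one SN's reading reaches the CH
    q_s: software failure probability of a SN
    need: number of readings required for a majority (assumed ceil(m_s / 2))
    """
    if need is None:
        need = ceil(m_s / 2)
    q_fs = 0.0
    for k in range(m_s + 1):  # k = number of readings that reach the CH
        p_k = comb(m_s, k) * path_success ** k * (1 - path_success) ** (m_s - k)
        if k < need:
            q_fs += p_k  # too few readings returned
        else:
            # Probability that at least `need` of the k readings are correct.
            p_agree = sum(
                comb(k, j) * (1 - q_s) ** j * q_s ** (k - j)
                for j in range(need, k + 1)
            )
            q_fs += p_k * (1 - p_agree)  # enough readings, but no correct majority
    return q_fs

# Hypothetical check: with no software faults, three SNs, and a threshold of two,
# failure means fewer than two readings arrive.
q_fs_hw_only = source_failure_with_voting(0.9, 0, 3, need=2)
```

With a nonzero $q_s$ the second term becomes active, which is why a larger $m_s$ can pay off when software faults are possible.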
4.4.4 Data Aggregation
The analysis performed thus far assumes that a source CH does not aggregate data. The CH may receive up to $m_s$ redundant sensor readings due to source redundancy but will just forward the first one received to the PC. Thus, the data packet size is the same. For more sophisticated scenarios, conceivably the CH could also aggregate data for query processing, and the size of the aggregate packet may be larger than the average data packet size. We extend the analysis to deal with data aggregation in two ways. The first is to set a larger size for the aggregated packet that would be transmitted from a source CH to the PC. This will have the effect of favoring the use of a smaller number of redundant paths (i.e., $m_p$) because more energy would be expended to transmit aggregate packets from the source CH to the PC. The second is for the CH to collect a majority of sensor readings from its sensors before data are aggregated and transmitted to the PC. The analysis of data aggregation thus is in effect the same as the one we have performed for SN software faults in Section 4.4.3, requiring a majority of sensors to return correct sensor readings.
4.4.5 Forward Traffic
The analysis performed in the paper considers only the reverse traffic for response propagation from SNs to the PC but neglects the forward traffic for query dissemination from the sink to the CH and SNs. The reliability and energy consumption of the forward traffic due to hop-by-hop query delivery can be calculated by following a similar analysis as for the reverse traffic. The success probability ($R_q$) would be adjusted by considering the forward traffic and reverse traffic together as a series system. The energy consumption of a query ($E_q$) would be used to calculate the maximum number of queries the system can possibly process. This, along with $R_q$, would allow MTTF to be calculated.
5 Evaluation
In this section we present numeric data to demonstrate the tradeoff between $R_q$ and $E_q$ and that there exists an optimal $(m_p, m_s)$ set that would maximize the MTTF of the sensor system while satisfying Condition (24). Table 1 lists the parameters used along with their default values. Our WSN consists of 1000 sensor nodes distributed according to a Poisson process with density $\lambda$ in a square area of 400 m by 400 m. Each SN has a radio transmission range of 40 m. The initial bandwidth of the wireless channel is 200 Kb/s. Each SN has an initial energy of 10 Joule. The energy parameters used by the radio module are adopted from [1, 2]. The energy cost to run the transmitter/receiver radio circuitry per bit processed ($E_{elec}$) is chosen to be 50 nJ/bit. The energy used by the transmit amplifier to achieve an acceptable signal to noise ratio ($\varepsilon_{amp}$) is chosen to be 10 pJ/bit/m$^2$.
While in reality $e_j$ (transmission failure probability experienced by SN$_j$) and $S_{jk}$ (progressive speed if the packet is forwarded to SN$_k$ from SN$_j$) vary depending on network traffic, we consider $e_j = e$ and $S_{jk}$ being uniformly distributed with parameters $[a, b]$ to simplify the analysis. We vary other key parameters to study their effect on the optimal $(m_p, m_s)$ and MTTF.
Parameter            Default Value              Parameter            Default Value
$m_p$                [1-10]                     $A$                  400 m
$m_s$                [1-10]                     $n_b$                50 bytes
$n$                  1000                       $E_{elec}$           50 nJ/bit
$n_s$                100                        $\varepsilon_{amp}$  10 pJ/bit/m$^2$
$q$                  $10^{-6}$                  $E_o$                10 Joule
$e$                  [0.0001-0.1]               $E_{threshold}$      0 Joule
$r$                  40 m                       $N_{iteration}$      3
$f$                                             $T_{clustering}$     [5-20] sec
$\lambda$            10 nodes/(40 x 40 m$^2$)   $T_{req}$            [0.3-1.0] sec
$\lambda_q$          1 query/min                $B$                  200 Kb/s
Table 1: Parameter Default Values.
5.1 MTTF Analysis
Table 2 summarizes the optimal $(m_p, m_s)$ set that would maximize the MTTF of the sensor system under the environment characterized by the set of parameter values listed in Table 1. Other parameter values may generate different $(m_p, m_s)$ but the trend remains the same. We see that as the wireless transmission failure probability $e$ increases (i.e., as transmission reliability decreases), the system tends to use more redundancy to satisfy Condition (24) and to maximize the MTTF of the sensor system. Also, as the real-time deadline increases, the system tends to use less redundancy. In the special case in which the network is extremely reliable and the deadline is not stringent, the optimal $(m_p, m_s)$ is at (1, 1). We observe that there always exists an optimal $(m_p, m_s)$ set that would maximize the MTTF of the sensor system.
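Finding the optimal set is a small constrained search: among all redundancy levels that satisfy Condition (24), pick the one with the largest MTTF. The sketch below assumes the caller supplies two hypothetical callables, `rq_fn` and `mttf_fn`, that evaluate $R_q$ and MTTF for a given $(m_p, m_s)$; the toy functions in the example are made up and do not reproduce the paper's numbers.

```python
def best_redundancy(mttf_fn, rq_fn, r_req, max_m=10):
    """Exhaustive search over (m_p, m_s) in [1, max_m] x [1, max_m].

    Returns (mttf, m_p, m_s) for the feasible pair with the largest MTTF,
    or None when no pair satisfies rq_fn(m_p, m_s) >= r_req.
    """
    best = None
    for m_p in range(1, max_m + 1):
        for m_s in range(1, max_m + 1):
            if rq_fn(m_p, m_s) < r_req:
                continue  # violates the reliability constraint
            value = mttf_fn(m_p, m_s)
            if best is None or value > best[0]:
                best = (value, m_p, m_s)
    return best

# Toy stand-ins: reliability improves with redundancy, lifetime shrinks with it.
toy_rq = lambda m_p, m_s: 1.0 - 0.5 ** (m_p + m_s)
toy_mttf = lambda m_p, m_s: -(m_p + m_s)
result = best_redundancy(toy_mttf, toy_rq, r_req=0.9)
```

With at most 10 levels per dimension the grid has only 100 points, so exhaustive search is adequate here.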
Table 2: Optimal $(m_p, m_s)$ with varying $e$ and $T_{req}$.
$T_{req}$     $e$=0.0001    $e$=0.001    $e$=0.01
0.4 sec       5,5           5,5          5,6
0.5 sec       3,3           4,4          4,4
1.0 sec       2,2           3,3          4,4
2 sec         1,1           2,1          2,3
5 sec         1,1           1,1          2,3

Figure 3: MTTF vs. $(m_p, m_s)$ with $T_{req}$ = 1 sec, $e$ = [0.0001-0.001]. (3-D plot; axes: $m_s$, $m_p$, MTTF; surfaces: $e$=0.0001, $T_{req}$=1.0 and $e$=0.001, $T_{req}$=1.0.)
Figure 3 shows a snapshot of the MTTF of the sensor system as a function of $(m_p, m_s)$ with $T_{req}$=1.0. Two 3-D graphs are shown in Figure 3 to show the effect of $e$. The top 3-D graph is for the case in which $e$=0.0001, where the optimal $(m_p, m_s)$ set is (2, 2), at which the MTTF is maximized. The bottom 3-D graph is for the case in which $e$=0.001, for which the optimal $(m_p, m_s)$ set is (3, 3). We see from these two 3-D diagrams that either inadequate or excessive redundancy is detrimental to the MTTF of the sensor system.
The existence of the optimal $(m_p, m_s)$ set can be best understood by seeing the tradeoff between $R_q$ and $E_q$ as a function of $(m_p, m_s)$. Figure 4 shows $R_q$ as a function of $(m_p, m_s)$. When either $m_p$ or $m_s$ increases, $R_q$ increases. In particular, $R_q$ is more sensitive to $m_p$ because, in the environment tested, the distance between the processing center and a CH ($N_h^{inter}$) is longer than that between the CH and a SN within a cluster ($N_h^{intra}$). Consequently, incorporating path redundancy (represented by $m_p$) greatly improves $R_q$ compared with source redundancy (represented by $m_s$).
Figure 4: $R_q$ vs. $(m_p, m_s)$ with $T_{req}$ = 1 sec, $e$ = [0.0001-0.001]. (3-D plot; axes: $m_s$, $m_p$, $R_q$; surfaces: $e$=0.0001, $T_{req}$=1 and $e$=0.001, $T_{req}$=1.)
Correspondingly, Figure 5 shows the energy consumption as a function of $(m_p, m_s)$. We see that the energy consumption per query is monotonically increasing as either $m_p$ or $m_s$ increases. Therefore, if more redundancy is used to answer a query, on one hand the MTTF would increase due to a higher $R_q$ (to satisfy Condition (24)), but on the other hand the MTTF would decrease due to a higher $E_q$. As a result, an optimal redundancy level in terms of the optimal $(m_p, m_s)$ exists.
Next we test the effect of the real-time deadline on MTTF. Figure 6 shows a snapshot of the MTTF of the sensor system as a function of $(m_p, m_s)$ with $e$=0.0001 and varying $T_{req}$. The top 3-D graph is for the case in which $T_{req}$=1.0, for which the optimal $(m_p, m_s)$ set is (2, 2), at which the MTTF is maximized. The bottom 3-D graph is for the case in which $T_{req}$=0.5, for which the optimal $(m_p, m_s)$ set is (4, 4). In general, we observe that as $T_{req}$ increases (less stringent real-time deadline constraints), the MTTF increases. Also, the system would select less redundancy to maximize the MTTF of the system.
Figure 5: $E_q$ vs. $(m_p, m_s)$. (3-D plot; axes: $m_s$, $m_p$, $E_q$.)

Figure 6: MTTF with $e$ = 0.0001, $T_{req}$ = [0.5-1.0] sec. (3-D plot; axes: $m_s$, $m_p$, MTTF; surfaces: $e$=0.0001, $T_{req}$=1.0 and $e$=0.0001, $T_{req}$=0.5.)
5.2 Comparison of AFTQC vs. Baseline
We compare our design with a baseline design in which there is no redundancy and the classic
acknowledgement and retransmission on timeout mechanism is used for data transmission.
Figures 7 and 8 show a snapshot of the MTTF of the sensor system as a function of $(m_p, m_s)$ in logarithmic scale in order to more vividly show the baseline design case. Figure 7 is for the case in which the channel transmission reliability is relatively high, i.e., $e$=0.0001. The top 3-D graph shows the MTTF under AFTQC. The bottom 3-D graph shows the MTTF using the baseline design (labeled as $m_p$ = 1, $m_s$ = 1 with ACK). We observe that AFTQC (without ACK) greatly increases the MTTF compared with the baseline design under this set of parameter values characterizing the WSN. We also observe that when the WSN is extremely reliable, i.e., when $e$ is extremely small, so that the optimal $(m_p, m_s)$ is at (1, 1), AFTQC still yields a higher MTTF than the baseline system because AFTQC uses no acknowledgements, which saves energy.
Next we consider a case in which the channel transmission reliability is relatively low, i.e., $e$=0.1 (Figure 8). We observe that when the network is not reliable, the baseline scheme performs only marginally better than AFTQC when $(m_p, m_s)$ is set to (1, 1) to run AFTQC. In all other settings, AFTQC significantly outperforms the baseline scheme, the effect of which is especially pronounced at the optimal $(m_p, m_s)$ = (7, 7). Summarizing the results observed from Figures 7 and 8, we conclude that AFTQC operating under the optimal $(m_p, m_s)$ set always outperforms the baseline scheme and that properly utilizing redundancy would prolong the system lifetime while satisfying the QoS requirements of queries.
Figure 7: AFTQC vs. Baseline with $T_{req}$ = 1 sec, $e$ = 0.0001 in Logarithmic Scale. (3-D plot; axes: $m_s$, $m_p$, log(MTTF); surfaces: $e$=0.0001, $T_{req}$=1.0, AFTQC without ACK and $m_p$=1, $m_s$=1, baseline with ACK.)

Figure 8: AFTQC vs. Baseline with $T_{req}$ = 1 sec, $e$ = 0.1 in Logarithmic Scale. (3-D plot; axes: $m_s$, $m_p$, MTTF; surfaces: $e$=0.1, $T_{req}$=1.0, AFTQC without ACK and $e$=0.1, $T_{req}$=1.0, baseline with ACK.)
5.3 Effect of Clustering on MTTF
In this section we analyze the effect of clustering on the proposed algorithm. We also analyze
the effect of different clustering intervals on the system MTTF.
Figure 9 shows a snapshot of the MTTF of the WSN system as a function of $(m_p, m_s)$ with $T_{req}$=1.0, $e$ = 0.0001 to show the effect of clustering. All 3-D graphs show the optimal $(m_p, m_s)$ set of (2, 2), at which the MTTF is maximized. The top 3-D graph shows the ideal baseline case in which the energy used for clustering is zero, i.e., $E_{clustering}$=0. The second 3-D graph is for the case when the clustering interval $T_{clustering}$ = 20 sec. The third 3-D graph is for the case when the clustering interval $T_{clustering}$ = 5 sec. The energy $E_{clustering}$ consumed in the last two cases is calculated by Equation (23).
We see that when the clustering interval is short ($T_{clustering}$ = 5 sec), the MTTF values are lower than those under the ideal baseline case. This is because the energy consumption by the clustering algorithm is significant in this case. When the clustering interval is sufficiently long ($T_{clustering}$ = 20 sec), the system achieves about the same MTTF value as the ideal baseline case. In this case, the energy consumption by the clustering algorithm is small and does not significantly affect the system MTTF.
Figure 9: Effect of Clustering Intervals on MTTF with $e$ = 0.0001, $T_{req}$ = 1.0 sec. (3-D plot; axes: $m_s$, $m_p$, MTTF; surfaces: baseline, $T_{clust}$=20 sec, and $T_{clust}$=5 sec, all with $e$=0.0001, $T_{req}$=1.0.)
Finally, we note that the MTTF curves for all three cases show the same trend with respect to $(m_p, m_s)$, with the optimal set at (2, 2), and that the optimal $(m_p, m_s)$ set is relatively insensitive to the energy used by the clustering algorithm. This is due to the assumption that clustering is executed frequently enough to maintain perfect rotation of CHs, so the frequency of clustering will only affect the total energy consumed but will not affect the optimal $(m_p, m_s)$ set selected. In Section 6 we will conduct a simulation study to identify the frequency of clustering under which the assumption is justified, and compare simulation vs. analytical results.
5.4 AFTQC with Software Failure
Finally, we analyze the effect of software faults on MTTF. Figures 10 and 11 show a snapshot of the MTTF of the sensor system as a function of $(m_p, m_s)$ with $T_{req}$=1.0 after applying Equation (33), derived in Section 4.4.3 for modeling software failure, in the calculation. Figures 10 and 11 show the shift of the optimal $(m_p, m_s)$ when software failure is included, compared with the case when there is no software failure. Figure 10 is for the case in which $e$ = 0.0001, $T_{req}$ = 1.0. The top 3-D graph is for the case when we do not include software failure in the analysis. For this case, the optimal $(m_p, m_s)$ set is (2, 2), at which the MTTF is maximized. The bottom 3-D graph is for the case when we include software failure in the analysis. For this case, the optimal $(m_p, m_s)$ set is (2, 3). We see that when software failure is included in the analysis, the optimal $(m_p, m_s)$ changes from (2, 2) to (2, 3). This reflects the fact that when software faults are possible, the system tends to choose a larger number of sensor nodes to increase the probability that the majority agrees on the same sensor reading, e.g., in this case the optimal $m_s$ changes from 2 to 3. Figure 11 is for the case in which $e$ = 0.001, $T_{req}$ = 1.0. In this case, the optimal set changes from (3, 3) to (3, 4). Again, we see that the system chooses a larger number of sensor nodes to cope with software failure.
Figure 10: AFTQC with/without Software Failure with $e$ = 0.0001, $T_{req}$ = 1.0 sec. (3-D plot; axes: $m_s$, $m_p$, MTTF; surfaces: no software failure and with software failure.)

Figure 11: AFTQC with/without Software Failure with $e$ = 0.001, $T_{req}$ = 1.0 sec. (3-D plot; axes: $m_s$, $m_p$, MTTF; surfaces: no software failure and with software failure.)
6 Simulation
Parameter        Value           Parameter            Value
$m_p$            [1-4]           $n_b$                50 bytes
$m_s$            [1-4]           $n_q$                10 bytes
$N$              600             $E_T$                0.0000264 J
$n_s$            100             $E_R$                0.00002 J
$q$              0.0001          $E_o$                0.05 J
$e$              0.0001          $E^s_{threshold}$    0.0000264 J
$r$              40 m            $T_{clustering}$     5-20 sec
$f$                              $T_{req}$            1.0 sec
$\lambda_q$      1 query/sec     $B$                  200 Kb/s
$A$              400 m
Table 3: Parameter Values in Simulation.
In this section, we present simulation results to compare with analytical results for the purpose of validation. Table 3 lists a default set of parameter values used in the simulation. We use J-Sim as our simulation framework. We consider a small-scale WSN so that we can obtain simulation results with statistical significance. In our simulation environment, SNs are distributed in a square terrain area of size $A^2$ in accordance with a population distribution function. We consider two population distribution functions, uniform distribution vs. homogeneous Poisson, and analyze the sensitivity of simulation results with respect to SN population distributions. SNs use stateless non-deterministic geographic routing as described in [17]. To simulate geographic routing, we utilize the Node Position Tracker implemented in J-Sim. We use the S-MAC protocol [18] in our simulation of the sensor MAC layer. A query is considered as not being executed successfully if one of the following conditions happens:
- All $m_s$ SNs fail to deliver sensor readings to the source CH, due to a combination of link failure, SN energy depletion, or SN hardware/software failures;
- All paths between the CH and the PC are broken, due to a combination of link failure, SN energy depletion, and SN hardware failure;
- The query result is not returned within the deadline requirement $T_{req}$. We accumulate the time it takes to propagate the results back based on the progressive speeds of the SNs chosen to forward data. For each segment (from a SN to the CH and from the CH to the PC) we use the transmission time of the first path that returns the query result to get the total response time. If SN measurement software faults are considered, the transmission time for all the SN-CH paths to return sensor readings to the CH is considered instead.
The simulation runs in rounds. In each round, we record the number of queries processed
successfully, which serves as one observation of the system MTTF. We use the batch means analysis
technique to obtain the MTTF, treating the MTTF obtained from each simulation run as a data point, in
order to obtain the average MTTF within a specified confidence interval and accuracy. We run the
simulation until we achieve a 95% confidence level and 10% accuracy. To achieve this, we collect
observations in batches of 1000 observations each and compute one batch mean per batch. We run at
least 10 batches to get a minimum of 10 batch means, from which we calculate the grand mean and
estimate the difference of the grand mean from the true mean with 95% confidence. If the estimated
error exceeds 10%, we run more batches and collect more observations until the specified 10%
accuracy requirement is met. We run the simulation for the optimal (m_p, m_s) and other non-optimal
(m_p, m_s) values. The results are used to draw a 3-D graph of MTTF against m_p and m_s, against
which analytical results are compared and validated. Below we compare simulation results with
analytical results obtained under identical parameter value sets.
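The stopping rule described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used in the study: run_once is a hypothetical stand-in for one simulation round that returns an MTTF observation, "10% accuracy" is interpreted as the half-width of the 95% confidence interval being at most 10% of the grand mean, and max_batches is an added safety cap.

```python
import math
import statistics

# Two-sided 95% Student-t critical values for 9 to 30 degrees of freedom.
_T95 = [2.262, 2.228, 2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101,
        2.093, 2.086, 2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048,
        2.045, 2.042]


def t_crit_95(dof):
    """Critical value for dof >= 9; the dof = 30 value is reused (conservatively) beyond 30."""
    if dof < 9:
        raise ValueError("this sketch needs at least 10 batch means")
    return _T95[min(dof, 30) - 9]


def batch_means_mttf(run_once, batch_size=1000, min_batches=10,
                     rel_accuracy=0.10, max_batches=200):
    """Return (grand_mean, half_width, batches_used) once the accuracy target is met."""
    batch_means = []
    while len(batch_means) < max_batches:
        # One batch: collect batch_size observations and keep only their mean.
        batch = [run_once() for _ in range(batch_size)]
        batch_means.append(statistics.fmean(batch))
        n = len(batch_means)
        if n < min_batches:
            continue
        grand_mean = statistics.fmean(batch_means)
        # Half-width of the 95% confidence interval around the grand mean.
        half_width = t_crit_95(n - 1) * statistics.stdev(batch_means) / math.sqrt(n)
        if half_width <= rel_accuracy * abs(grand_mean):
            return grand_mean, half_width, n
    raise RuntimeError("accuracy target not reached within max_batches")
```

A caller would pass a function that runs one simulation round, for example batch_means_mttf(lambda: simulate_round(m_p, m_s)), where simulate_round is again a hypothetical name.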
[Figure 12 plot. Axes: m_s, m_p (1 to 4 each), MTTF. Legend: Analytical, Simulation.]
Figure 12: Comparison of Analytical and Simulation Results for MTTF vs. (m_p, m_s).
Figure 12 compares simulation results with analytical results for a query-based WSN
operating under the set of parameter values listed in Table 3. We see that the simulation and analytical
MTTF curves correlate very well, with the same optimal (m_p, m_s) at (2, 2). We have also conducted a
simulation study that considers changes in the network conditions. Specifically, in addition to
simulating transmission failure and transmission speed violation, we also simulate sensor node
hardware failure and energy depletion. Figure 13 compares simulation results with analytical results
when such network dynamics are considered. Again the results show good correlation. Both
simulation and analytical results confirm that in a WSN characterized by the set of parameter values in
Table 3, the system can better tolerate sensor failures due to hardware faults or energy depletion when
proper source and path redundancy is employed, especially at the optimal (m_p, m_s) identified.
[Figure 13 plot. Axes: m_s, m_p (1 to 4 each), MTTF. Legend: Analytical, Simulation.]
Figure 13: Simulation vs. Analytical Results in MTTF vs. (m_p, m_s) when Network Dynamics are
Considered.
[Figure 14 plot. Axes: m_s, m_p (1 to 4 each), MTTF. Legend: Analytical, T_clustering = 20 sec;
Simulation, T_clustering = 5 min; Simulation, T_clustering = 20 sec; Simulation, T_clustering = 5 sec.]
Figure 14: Simulation Results for the Effect of Clustering Intervals on MTTF vs. (m_p, m_s).
Next we conduct simulation experiments to determine the minimum clustering interval under which
the assumption of fair rotation among SNs as the CH is justified, and also to validate analytical results
for the effect of clustering intervals on MTTF. The simulation results (bottom three curves) shown in
Figure 14 confirm that using a shorter clustering interval (T_clustering = 5 sec vs. 20 sec vs. 5 min)
results in a smaller MTTF, since more energy is consumed when the clustering algorithm is
executed more often. The simulation results also reveal that the system can achieve perfect rotation
with T_clustering = 5 sec and near-perfect rotation with T_clustering = 20 sec under the given workload
condition. The analytical results at T_clustering = 20 sec are also shown in Figure 14 (top curve) and
correlate well with the simulation results. Consequently, we conclude that the assumption of fair
rotation in the analytical model is justified.
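The energy side of this tradeoff can be illustrated with a deliberately simplified budget. The sketch below is not the analytical model of this paper: it assumes a fixed energy budget, a fixed energy cost per query and per clustering execution, and a constant query arrival rate, and every name and number in it is hypothetical. It shows only why running the clustering algorithm more often leaves energy for fewer queries. It says nothing about rotation fairness, which is the reason a very long interval is not automatically preferable.

```python
def queries_before_depletion(e_total, e_query, e_clustering,
                             query_rate, t_clustering):
    """Number of queries served before a fixed energy budget runs out.

    e_total       total energy budget (hypothetical units)
    e_query       energy spent per query
    e_clustering  energy spent per execution of the clustering algorithm
    query_rate    queries per second
    t_clustering  clustering interval in seconds
    """
    # On average, one clustering execution is amortized over
    # query_rate * t_clustering queries.
    clusterings_per_query = 1.0 / (query_rate * t_clustering)
    return e_total / (e_query + e_clustering * clusterings_per_query)


if __name__ == "__main__":
    # Made-up values; the three intervals mirror those compared in Figure 14.
    for interval in (5, 20, 300):
        n = queries_before_depletion(10000.0, 1.0, 5.0, 1.0, interval)
        print(f"T_clustering = {interval:>3} sec -> about {n:.0f} queries")
```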
Lastly, we have conducted simulation studies to compare the case when SNs are distributed
according to a homogeneous Poisson process vs. the case when SNs are distributed uniformly, in order
to test the sensitivity of simulation results with respect to SN population distribution. Figure 15
shows that the simulation results are insensitive to which of the two distributions is used, with the
mean percentage difference between them being only 0.69%.
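To make the comparison concrete, the sketch below shows one way to generate the two kinds of SN placement and to compute a mean percentage difference between two sets of MTTF results. It rests on assumptions not stated in the text: the deployment area is taken to be a square, the function names are hypothetical, and the percentage difference is taken relative to the second (for example, uniform) result at each (m_p, m_s) point.

```python
import random


def uniform_placement(n, side, rng):
    """Place exactly n SNs uniformly at random in a side x side square."""
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


def poisson_placement(intensity, side, rng):
    """Homogeneous Poisson process: the SN count is itself random.

    The count is Poisson with mean intensity * side^2, drawn here by counting
    exponential inter-arrival times that fall inside a unit interval.
    """
    mean_count = intensity * side * side
    count = 0
    t = rng.expovariate(mean_count)
    while t < 1.0:
        count += 1
        t += rng.expovariate(mean_count)
    # Given the count, the points of a homogeneous process are uniform.
    return uniform_placement(count, side, rng)


def mean_percentage_difference(results_a, results_b):
    """Average of |a - b| / b * 100 over paired results."""
    diffs = [abs(a - b) / b * 100.0 for a, b in zip(results_a, results_b)]
    return sum(diffs) / len(diffs)
```

With the same expected number of SNs, the two placements differ only in that the Poisson count fluctuates from run to run, which is consistent with the small difference reported above.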
[Figure 15 plot. Axes: m_s, m_p (1 to 4 each), MTTF. Legend: Simulation - Uniform, Simulation - Poisson.]
Figure 15: Comparison of Simulation Results between Poisson and Uniform Distribution of SNs.
7 Applicability and Future Work
In this paper we have developed an adaptive fault tolerant QoS control (AFTQC) algorithm which
incorporates path and source redundancy mechanisms to satisfy query QoS requirements while
maximizing the lifetime of query-based sensor networks. We discussed how these mechanisms can be
realized using hop-by-hop packet data delivery and derived the probability of successful data delivery
within a real-time constraint (R_q), as well as the amount of energy consumed (E_q) per query. When
given a set of parameter values characterizing the operating and workload conditions of the
environment, we identified the optimal (m_p, m_s) setting that would maximize the MTTF while
satisfying the application QoS requirements.
To apply the results derived in this paper, one could build a table at design time listing MTTF as a
function of (m_p, m_s) covering a perceivable set of parameter values. Dynamic parameter values such as
e_j and the distribution of S_jk can be predicted by using local measurements or, alternatively, collected
either proactively or reactively by the CHs at the expense of energy consumption. Then, a simple table
lookup could be performed at runtime to determine the optimal (m_p, m_s) that could satisfy the QoS
requirements and maximize the MTTF.
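A minimal sketch of such a design-time table and runtime lookup is given below. The layout is an assumption made for illustration: the table is keyed by a measured value of e_j only, each entry maps (m_p, m_s) to a pair (MTTF, R_q), and the lookup uses the nearest tabulated e_j. The function name and all numbers are made up and are not results from this paper.

```python
from typing import Dict, Optional, Tuple

Redundancy = Tuple[int, int]                  # (m_p, m_s)
Entry = Tuple[float, float]                   # (MTTF, R_q)
DesignTable = Dict[float, Dict[Redundancy, Entry]]


def lookup_optimal(table: DesignTable, measured_e_j: float,
                   min_reliability: float) -> Optional[Redundancy]:
    """Return the (m_p, m_s) with the largest MTTF whose R_q meets the requirement."""
    # Pick the tabulated operating condition closest to the measurement.
    nearest = min(table, key=lambda e: abs(e - measured_e_j))
    feasible = {k: v for k, v in table[nearest].items()
                if v[1] >= min_reliability}
    if not feasible:
        return None  # no redundancy level satisfies the QoS requirement
    return max(feasible, key=lambda k: feasible[k][0])


# Illustrative, made-up entries for two values of e_j.
EXAMPLE_TABLE: DesignTable = {
    0.01: {(1, 1): (3000.0, 0.90), (2, 2): (5200.0, 0.99), (3, 3): (4100.0, 0.999)},
    0.05: {(1, 1): (2500.0, 0.70), (2, 2): (4000.0, 0.93), (3, 3): (3600.0, 0.98)},
}
```

A real table would be indexed by every dynamic parameter of interest (for example, the distribution of S_jk as well as e_j), at the cost of a larger design-time computation.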
In the future, we plan to provide a more detailed analysis of the effect of network dynamics on
MTTF, such as the possibility that some SNs consume more energy than others or fail earlier
than others. This affects the number of SNs in a cluster as time progresses and makes several key
parameters such as r, p, e_j and S_jk functions of time. Finally, we plan to consider the use of
acknowledgement and timeout mechanisms in our hop-by-hop data delivery scheme at various levels,
such as hop-by-hop or end-to-end, and identify the optimal (m_p, m_s) that maximizes MTTF, as well as
conditions under which no-ACK is better than ACK-based data delivery schemes, or vice versa.
References
[1] O. Younis and S. Fahmy, HEED: A Hybrid Energy Efficient, Distributed Clustering Approach for Ad Hoc Sensor
Networks, IEEE Transactions on Mobile Computing, Vol. 3, No. 3, October-December 2004, pp. 366-379.
[2] W. Heinzelman, A. Chandrakasan and H. Balakrishnan, An Application-Specific Protocol Architecture for Wireless
Microsensor Networks, IEEE Transactions on Wireless Communications, Vol. 1, No. 4, 2002, pp. 660-670.
[3] P. Mhatre, et al. A Minimum Cost Heterogeneous Sensor Network with a Lifetime Constraint, IEEE Transactions
on Mobile Computing, Vol. 4, No. 1, 2005, pp. 4-15.
[4] D. Chen, and P. Varshney, QoS Support in Wireless Sensor Networks: A Survey, International Conference on
Wireless Networks, Las Vegas, Nevada, USA, June 21-24, 2004.
[5] K. Sohrabi, J. Gao, V. Ailawadhi, and G. Pottie, Protocols for Self-Organization of a Wireless Sensor Network, IEEE
Personal Communications, October 2000, pp. 16-27.
[6] B. Deb, S. Bhatnagar and B. Nath, ReInForM: Reliable Information Forwarding using Multiple Paths in Sensor
Networks, 28th Annual IEEE Conference on Local Computer Networks, Bonn, Germany, Oct. 2003.
[7] M. Perillo and W. Heinzelman, Providing Application QoS through Intelligent Sensor Management, 1st IEEE
International Workshop on Sensor Network Protocols and Applications, May 2003.
[8] E. Felemban, C. G. Lee, and E. Ekici, MMSPEED: Multipath Multi-SPEED Protocol for QoS Guarantee of
Reliability and Timeliness in Wireless Sensor Networks, IEEE Transactions on Mobile Computing, Vol. 5, No. 6,
June 2006, pp. 738-754.
[9] R. Iyer, and L. Kleinrock, QoS Control for Sensor Networks, IEEE Conference on Communications, May 2003.
[10] P. Gupta and P.R. Kumar, Critical Power for Asymptotic Connectivity in Wireless Networks, Stochastic Analysis,
Control, Optimizations, and Applications, W.M. McEneaney, G. Yin, and Q. Zhang (Eds.), 1998.
[11] S. Bandyopadhyay, and E. Coyle, An Energy Efficient Hierarchical Clustering Algorithm for Wireless Sensor
Networks, IEEE INFOCOM, April 2003, pp. 1713-1723.
[12] Y. Sankarasubramaniam, O. B. Akan, and I. F. Akyildiz, ESRT: Event-to-sink Reliable Transport in Wireless Sensor
Networks, 4th ACM International Symposium on Mobile Ad Hoc Networking and Computing, Annapolis, MD, USA,
June 2003, pp. 177-188.
[13] J. A. Gutierrez, E. H. Callaway, Jr., and R. L. Barrett, Jr., Low-Rate Wireless Personal Area Networks, IEEE Press,
New York, NY, 2004.
[14] O. Younis, S. Fahmy and P. Santi, Robust Communication for Sensor Networks in Hostile Environments, 12th IEEE
International Workshop on Quality of Service, June 2004, pp. 10-19.
[15] G. Bravos, A. Kanatas, Energy Consumption and Trade-offs on Wireless Sensor Networks, IEEE 16th International
Symposium on Personal, Indoor and Mobile Radio Communications, Vol. 2, pp. 1279-1283, September 2005.
[16] Q. Shi, Power Management in Networked Sensor Radios - A Network Energy Model, IEEE Sensors Applications
Symposium, pp. 1-5, San Diego, CA, February 2007.
[17] T. He, J. Stankovic, C. Lu, T. Abdelzaher, SPEED: A Stateless Protocol for Real-Time Communication in Sensor
Networks, Proceedings of 23rd International Conference on Distributed Computing Systems, pp. 46-55, May 2003.
[18] V. Tippanagoudar, I. Mahgoub, A. Badi, Implementation of the Sensor-MAC Protocol for the JIST/SWANS
Simulator, IEEE/ACS Conference on Computer Systems and Applications, Amman, Jordan, pp. 225-232, May 2007.