Networked Wireless Sensor Data Collection: Issues, Challenges, and Approaches
Abstract: Wireless sensor networks (WSNs) have been applied to many applications since they emerged. Among them, one of the most important is sensor data collection, where sensed data are collected at all or some of the sensor nodes and forwarded to a central base station for further processing. In this paper, we present a survey of recent advances in this research area. We first highlight the special features of sensor data collection in WSNs by comparing it with both wired sensor data collection networks and other WSN applications. With these features in mind, we then discuss the issues and prior solutions in using WSNs for sensor data collection. Based on the different focuses of previous research, we describe the basic taxonomy and propose to break networked wireless sensor data collection down into three major stages, namely, the deployment stage, the control message dissemination stage and the data delivery stage. For each stage, we discuss the issues and challenges, followed by a review and comparison of previously proposed approaches and solutions, striving to identify the research and development trends behind them. In addition, we discuss the correlations among the three stages and outline possible directions for future research on networked wireless sensor data collection.

Index Terms: Wireless sensor network, sensor data collection, deployment, data gathering, message dissemination.

I. INTRODUCTION

Wireless sensor networks have been applied to many applications since they emerged [1]. Among them, one of the most important applications is sensor data collection, where sensed data are continuously collected at all or some of the sensor nodes and forwarded through wireless communications to a central base station for further processing. In a WSN, each sensor node is powered by a battery and uses wireless communications. This results in the small size of a sensor node and makes it easy to attach at any location with little disturbance to the surrounding environment. Such flexibility greatly reduces the cost and effort of deployment and maintenance and makes wireless sensor networks a competitive approach for sensor data collection compared with their wired counterpart. In fact, a wide range of real-world deployments have been witnessed in the past few years. Examples span wildlife habitat monitoring [2], environmental research [3][4], volcano monitoring [5][6], water monitoring [7], civil engineering [8][9] and wildland fire forecast/detection [10], to name but a few.

The unique features of WSNs, however, also bring many new challenges. For instance, the lifetime of a sensor node is constrained by the battery attached to it, and the network lifetime in turn depends on the lifetime of the sensor nodes. Thus, to further reduce the costs of maintenance and redeployment, energy efficiency is often a primary consideration in WSN design [11]. Moreover, these challenges are complicated by the wireless losses and collisions that occur when sensor nodes communicate with each other.

On the other hand, the requirements specified by sensor data collection applications also raise issues that need to be considered in the network design. First of all, the deployed sensors may need to cover the full area that the sensor data collection application is interested in. To acquire data accurately, sensors may also be required to be placed at specific locations. In addition, different types of data (temperature, light, vibration) may be obtained by different sensors with different sampling rates. These issues may cause unbalanced energy consumption over a WSN and significantly shorten the network lifetime if not handled carefully. Furthermore, since data are required to be delivered to the base station without any information loss, data aggregation/fusion operations [12] are hard to apply, which calls for novel solutions for enhancing the network performance.

In this paper, we present a survey of recent advances in tackling these challenges. By comparing with both wired sensor data collection networks and other applications of WSNs, we first highlight the special features of sensor data collection in WSNs. With these features in mind, we then discuss issues and prior solutions on sensor network deployment and data delivery protocols. In addition, we discuss different approaches for control message dissemination, which acts as an indispensable component for network control and management and can greatly affect the overall performance of WSNs for sensor data collection.

The remainder of this paper is organized as follows. In Section II, we compare WSNs for sensor data collection with wired sensor data collection networks and with WSNs for other applications, aiming to highlight the special features to be considered in the network design. Section III presents a detailed investigation of different deployment strategies, and Section IV discusses issues and solutions in data delivery protocol design. Prior mechanisms for message dissemination for network management and control are investigated in Section V. Finally, Section VI concludes the paper and gives further discussion on directions for future work.

Feng Wang and Jiangchuan Liu are with the School of Computing Science, Simon Fraser University, British Columbia, Canada (email: {fwa1, jcliu}@cs.sfu.ca).
2 SUBMITTED TO IEEE COMMUNICATIONS SURVEYS & TUTORIALS FOR POSSIBLE PUBLICATION
II. OVERVIEW

A. Wireless Sensor Networks

As a newly emerged type of network, a WSN has many special features compared with traditional networks such as the Internet, wireless mesh networks and wireless mobile ad hoc networks. First of all, a sensor node, once deployed, is expected to work for days, weeks or even years without further intervention. Since it is powered by the attached battery, highly efficient energy utilization is necessary. This differs from the Internet as well as from wireless mesh and mobile ad hoc networks, where either constant power sources are available or the expected lifetime is several orders of magnitude lower than it is for WSNs.

Although a sensor node is expected to work for a long time, it is often not required to work all the time, i.e., it senses the ambient environment, processes and transmits the collected data, and then idles for a while until the next sensing-processing-transmitting cycle. To support fault tolerance, a location is often covered by several sensor nodes. To avoid duplicate sensing, while one node is performing the sensing-processing-transmitting cycle, the other nodes are kept in the idle state. In these cases, the energy consumption can be further reduced by letting the idle nodes enter a dormant state, where most of the components in a sensor node (e.g., the wireless radio, sensing component and processing unit) are turned off (instead of being kept in operation as in the idle state). When the next cycle comes (indicated by some mechanism such as an internal timer), these components are woken up and returned to the normal (active) state. Define the duty cycle as the ratio of the active period to the full active/dormant period. A low duty-cycle WSN clearly enjoys a much longer operating lifetime. This feature has been exploited in quite a few research works [13][14]. However, as will be shown later in this paper, the new working pattern also brings challenges to the network design.

Another special feature related to energy consumption is control of the transmission range of a sensor node. Previous research has shown that one of the major energy costs in a sensor node comes from wireless communication, where the main cost grows with the transmission distance raised to a power between 2 and 6 [15][16]. As a result, the transmission range of a sensor node is often preferred to be adjustable and may be dynamically adjusted to achieve better performance and lower energy consumption.

B. Sensor Data Collection

In a sensor data collection application, sensors are often deployed at the locations specified by the application requirements to collect sensing data. The collected sensing data are then forwarded back to a central base station for further processing. Traditionally, these sensors are connected by wires, which are used for data transmission and power supply. However, the wired approach is found to need great effort for deployment and maintenance. To avoid disturbing the ambient environment, the deployment of the wires has to be carefully designed. A breakdown in any wire may put the whole network out of service, and enormous time and effort may be needed to find and replace the broken line. In addition, the sensing environment itself may make the wired deployment and its maintenance very difficult, if not impossible. Examples include the environment near a volcano [5][6] or a wildfire scene [10], where hot gases and steam can easily damage a wire. Indeed, even in a less harsh environment like a wild habitat [2][3][4] or a building [17][9][7], the threat from rodents is still critical and makes the protection of wires much more difficult than that of sensors. All these issues make wireless sensor networks an attractive choice as they emerge with technology advances.

On the other hand, although much research effort has been devoted to WSNs, and quite a few prototype or preliminary systems have been deployed, sensor data collection in WSNs is still in its early stage, and its special features call for novel approaches and solutions different from those of other applications.

For example, a common work pattern in most other applications, such as target tracking [18], is that sensing data or information are locally processed and stored at some nodes and may be queried later by some other nodes [19]. Sensor data collection, in contrast, requires that all sensing data be correctly and accurately collected and forwarded to the base station, since the processing of these data needs global knowledge and is much more complex than that in other applications like target tracking. This feature also prevents using data aggregation/fusion techniques to enhance the network performance. As a result, the major traffic in sensor data collection is the reported data from each sensor to the base station. Such a many-to-one traffic pattern, if not carefully handled, will cause highly unbalanced and inefficient energy consumption in the whole network. As a concrete example, the energy hole problem was reported and discussed in [16], where sensor nodes close to the base station are depleted quickly due to traffic relaying and create a hole-shaped area that leaves the remaining network disconnected from the base station. One possible solution to alleviate such issues is to use mobile entities that proactively move around and collect data in the sensing field [20][21]. However, due to the harshness of the sensing environment as well as the need to minimize disturbances, such a solution is often infeasible in the context of sensor data collection.

In addition, unlike in other WSNs, the sensors used in sensor data collection are often large in number and of different types [2][4][8][9], from traditional thermometers and hygrometers to very specialized accelerometers and strain sensors. These sensors work at their own sampling rates specified by the applications, and the rates may differ from one sensor to another, e.g., a typical sampling rate of an accelerometer is 100 Hz, while the frequency for sampling temperature is much lower. Such differences in turn lead to different transmission rates for relaying data from different types of sensors, which may further aggravate the imbalance of the traffic pattern and energy consumption and thus result in performance inefficiencies.

C. Taxonomy

In practice, using WSNs for sensor data collection can be broken into three major stages, namely, the deployment
WANG et al.: DATA COLLECTION IN WIRELESS SENSOR NETWORKS: ISSUES, CHALLENGES AND APPROACHES 3
[Figure 1: taxonomy diagram. Recoverable labels: Sensor data collection; Deployment (Area-Coverage, Location-Coverage); Control message dissemination (Flooding-based, Gossiping-based).]
Fig. 1. Major stages and taxonomy of using wireless sensor networks for sensor data collection.
[Figure 5: deployment patterns for k = 3, 4, 5 and 6. Each pattern is annotated with expressions for its angles in terms of arcsin and arccos of $R_c / (2 R_s)$ and with $d_1 = d_2 = \min(R_c, \sqrt{3} R_s)$. Legend: * Conjectured Globally Optimal (proving or disproving is unknown); ** -Optimal (Conjectured Globally Optimal); *** Globally Optimal.]
Fig. 5. A complete set of optimal patterns achieving full coverage and k-connectivity with k = 1, ..., 6, respectively (where the sensing range $R_s$ is invariant and the communication range $R_c$ varies) [27]. These patterns are specific forms of the universal elemental pattern (shown in Fig. 4) defined by expressions of $\theta_1$, $\theta_2$ ($\theta_3 = 2\pi - \theta_1 - \theta_2$), $d_1$ and $d_2$ on the right side of the above deployment patterns. Note that there are one and two vertical lines of nodes for global connectivity in the 1- and 2-connectivity patterns, respectively. They are not shown for the sake of simplicity.
all regular deployments¹.

Even with connectivity considered at the initial stage, as time goes on, some sensor nodes may consume more energy than others due to relaying more traffic. This leads to unbalanced energy costs, and the network may be partitioned prematurely while a great number of nodes still have a large amount of energy. To alleviate this problem, the authors of [32] proposed to deploy additional relay nodes so as to take the burden of traffic relaying from sensor nodes and prolong the lifetime of the whole network. In addition, they proposed a hybrid approach to deploy relay nodes while considering connectivity and network lifetime simultaneously. Specifically, the sensing field is divided into three parts based on the distance from the base station. The inner part is the part closest to the base station, where relay nodes can reach the base station by one-hop communication. The outer part is the part farthest from the base station, where no traffic from other relay nodes needs to be relayed and the relay nodes only relay traffic directly from the sensor nodes. The medium part is the part remaining between the inner and outer parts, where relay nodes need to relay traffic from both the sensor nodes and the relay nodes one hop farther from the base station. A different relay node density is then derived for each of the three parts. Based on the results, the authors of [32] suggested dividing the relay nodes into two portions. The first portion is distributed proportionally to the derived density for each part, and the remaining relay nodes are then deployed to compensate for connectivity, i.e., to guarantee that a relay node is within the communication range of a sensor node with high probability.
Another typical coverage requirement for sensor data collection is that sensors are manually attached to specified locations that are carefully chosen by applications. One example is the project conducted on TsingMa Bridge in Hong Kong [23], where the bridge is equipped with a large number of accelerometers, thermometers and strain sensors to monitor its working conditions. Another recent project, which is still ongoing, is on the Guangzhou New TV Tower [24] in Guangzhou, China, where the tower will be fitted with similar sensors for real-time monitoring and analysis. In these systems, sensors are deployed at specified locations to fulfill the civil engineering requirements. Since the locations selected by applications do not necessarily take into account networking requirements such as connectivity and energy efficiency, additional relay nodes are often placed in the sensing field to meet these requirements and facilitate sensing data delivery from sensor nodes to the base station. The issue is then how many relay nodes are required and where to deploy them.

In [33], the authors modeled the relay node placement problem for connectivity as the Steiner Minimum Tree with Minimum number of Steiner Points and bounded length (SMT-MSP) problem [34] and proposed a 3-approximation algorithm. Specifically, consider a graph in which sensor nodes are the given vertices and relay nodes are Steiner points². The problem of using the minimum number of relay nodes to connect all sensor nodes then becomes that of using the minimum number of Steiner points to connect all the given vertices, under the constraint that the edge length can only be less than or equal to the wireless communication range. The main idea of the proposed 3-approximation algorithm is to run the minimum spanning tree algorithm on the given vertices and insert an intermediate stage when the remaining edges between the given vertices are longer than the communication range. In the inserted stage, a Steiner point (with three edges) is added to connect three connected components into one if the Steiner point can connect each component with one edge whose length is less than or equal to the communication range. In addition, if an edge longer than the communication range is selected by the algorithm, the minimum number of Steiner points is also added on the edge to break it into smaller ones with length less than or equal to the communication range.

² The concept of Steiner points originates from the Steiner Minimum Tree problem. To minimize the total length of the edges that connect some given points, additional points may be introduced as intermediate points to connect other points and reduce the total length of the used edges. Here, the Steiner points serve similar purposes, such as optimizing the total length of the used edges or reducing the length of each single edge to meet the edge length constraint.

[Figure 6: two panels, (a) and (b).]
Fig. 6. An example of relay node deployment: (a) connectivity-based deployment; (b) traffic-aware deployment [38]. s1, s2 are sources with data rates of 0.6 and 0.3. s0 is the base station. Given N relay nodes, by scheme (a), which only considers connectivity, nodes relaying the traffic from v to s0 will die much earlier than those relaying from s1 and s2 to v, while by strategically deploying more nodes (N ) on section (v, s0) (from the less busy section (s2, v)), the network lifetime is prolonged.

Later, to enable fault tolerance, a series of approximation algorithms [35][36][37] were proposed to place the minimum number of relay nodes while achieving k-connectivity with k ≥ 2. The core idea in these papers is to compute a k-connected spanning subgraph from the fully connected graph containing all sensor nodes as vertices. In addition, the edge between each pair of vertices is assigned a weight equal to the minimum number of relay nodes required to put any two neighboring nodes on the edge within each other's wireless communication range. Besides, some relay nodes are duplicated to avoid using sensor nodes to relay traffic, and redundant relay nodes are removed to reduce the costs.

Recently, it has been noticed that for sensor data collection applications, considering only connectivity for relay node deployment may not always lead to the best performance in terms of energy efficiency and network lifetime [38]. For example, in Fig. 6, under connectivity-based deployment (Fig. 6a), which is traffic oblivious, the optimal solution to maximize the network lifetime is to evenly distribute relay nodes along the minimum Steiner tree topology. However, given the sensing data traffic from each sensor node to the base station, a better solution that considers such traffic patterns and moves some relay nodes from the low-traffic edge to the high-traffic one (Fig. 6b) can further extend the network lifetime with more efficient energy utilization.

Motivated by this, the authors of [38] proposed a traffic-aware deployment strategy. In particular, given the number of relay nodes and the average sensing data rate at each sensor node, the authors modeled the traffic-aware deployment problem as a generalized Euclidean Steiner Minimum Tree (ESMT) problem [39], where sensor nodes are vertices and a number of Steiner points are introduced so as to minimize the total length of the edges weighted by the rate of the aggregate traffic flowing through each edge. The authors proposed a hybrid algorithm to compute the number of required Steiner points and their positions. On each edge, a number of relay nodes can then be assigned proportionally to the amount of traffic passing through the edge.
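
Two of the counting steps described above lend themselves to a short illustration: breaking an over-long edge into hops no longer than the communication range, and sharing a fixed budget of relay nodes among edges in proportion to their traffic. The following Python sketch is only illustrative and is not the algorithm of [33] or [38]. The function names, the straight-line edge model and the largest-remainder rounding rule are assumptions made here for the example.

```python
import math


def relays_on_edge(length, comm_range):
    """Fewest relay nodes that split an edge into hops of at most comm_range."""
    if comm_range <= 0:
        raise ValueError("comm_range must be positive")
    # ceil(length / comm_range) hops are needed; relays sit between the hops.
    return max(0, math.ceil(length / comm_range) - 1)


def allocate_by_traffic(total_relays, traffic):
    """Split total_relays across edges in proportion to per-edge traffic.

    Rounding uses the largest-remainder rule (an assumption of this sketch)
    so that the integer counts sum to total_relays.
    """
    total_traffic = sum(traffic.values())
    if total_traffic <= 0:
        raise ValueError("total traffic must be positive")
    shares = {e: total_relays * t / total_traffic for e, t in traffic.items()}
    counts = {e: math.floor(s) for e, s in shares.items()}
    leftover = total_relays - sum(counts.values())
    # Hand the remaining relays to the edges with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda e: shares[e] - counts[e], reverse=True)
    for edge in by_remainder[:leftover]:
        counts[edge] += 1
    return counts


if __name__ == "__main__":
    # An edge of length 10 with communication range 3.
    print(relays_on_edge(10.0, 3.0))
    # Rates from Fig. 6 scaled by 10; the load on (v, s0) is assumed to be
    # the sum of the two source rates.
    print(allocate_by_traffic(10, {("s1", "v"): 6, ("s2", "v"): 3, ("v", "s0"): 9}))
```

The allocation here ignores edge lengths, following the wording of the text; a deployment that also has to keep every hop within range would need to combine both functions.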
[Figure 8 (DMAC): staggered per-node schedules, each of the form Rx, Tx, sleep, Rx, Tx.]
[Figure 9 (example topologies rooted at u0): (a) linear graph; (b) simple binary tree; (c) another tree.]
as energy efficiency as well as reliability. Fig. 7 illustrates a generic architecture for data delivery approaches. To collect data from sensor nodes, two mandatory components are topology maintenance and the transmission scheduler. The topology maintenance component constructs a connected topology and maintains the connectivity during network dynamics and link quality variations. The transmission scheduler then schedules packet transmissions based on the information from other components so as to reduce collisions and energy waste. Given different QoS requirements such as throughput, latency and reliability, different optional components may be added. A more challenging issue is that sensor nodes operate autonomously, so the transmission scheduling algorithm needs to be designed to work in a distributed manner. In the following subsections, we discuss recently proposed approaches, categorized by their major QoS considerations.

A. Reliability

One of the prior works [17] designed a WSN system named Wisden that adopted a data delivery approach with an emphasis on reliability and exploited a hybrid scheme for reliable data delivery using both hop-by-hop and end-to-end recoveries. Specifically, each node keeps track of the sequence numbers of the packets it receives from a source node. A gap in the sequence numbers of received packets indicates packet loss. The sequence number of the missing packet and its source node ID are then stored in a missing list and piggy-backed when a packet is forwarded. The node that previously relayed the missing packet will then schedule a retransmission when it overhears the piggy-backed information. To make retransmission possible in the hop-by-hop recovery, each newly received packet is cached for a short period. However, if heavy packet loss happens or the network topology changes due to dynamics such as link quality variations, the hop-by-hop recovery may fail due to the temporary overflow of missing lists or lost connections to previous forwarders. Thus an end-to-end recovery scheme is necessary for such situations. In particular, if a node overhears a piggy-backed missing list and finds some missing packets in the list sharing the same sources as the packets in its own packet cache, it adds these packets to its own missing list and goes on to piggy-back their information in its transmissions. By this means, missing packet information is traced back hop by hop until it reaches the sources. The sources then re-send the packets and complete the cycle of end-to-end recovery.
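
The bookkeeping behind this hybrid recovery can be sketched in a few lines of Python. The sketch below is not Wisden's implementation. The class and method names are hypothetical, sequence numbers are assumed to increase by one per packet from each source, and cache expiry and missing-list overflow are deliberately left out.

```python
class RecoveryState:
    """Per-node loss bookkeeping for a Wisden-like hybrid recovery scheme."""

    def __init__(self):
        self.highest_seq = {}   # source id -> highest sequence number seen
        self.missing = set()    # (source id, sequence number) still missing
        self.cache = {}         # (source id, sequence number) -> payload

    def on_receive(self, source, seq, payload):
        """Cache the packet and record any sequence gap it reveals."""
        self.cache[(source, seq)] = payload
        self.missing.discard((source, seq))   # a retransmission fills the hole
        last = self.highest_seq.get(source)
        if last is None or seq > last:
            if last is not None:
                # Every number skipped between last and seq was lost.
                for lost in range(last + 1, seq):
                    self.missing.add((source, lost))
            self.highest_seq[source] = seq

    def piggyback_list(self):
        """Missing list to attach to the next forwarded packet."""
        return sorted(self.missing)

    def on_overhear(self, overheard):
        """Handle an overheard missing list.

        Returns the packets this node can retransmit from its cache
        (hop-by-hop recovery). Entries it cannot serve, but whose source
        it has relayed before, are adopted into its own missing list so the
        request keeps moving toward the source (end-to-end recovery).
        """
        cached_sources = {src for src, _ in self.cache}
        resend = []
        for key in overheard:
            if key in self.cache:
                resend.append(key)
            elif key[0] in cached_sources:
                self.missing.add(key)
        return resend


if __name__ == "__main__":
    node = RecoveryState()
    for seq in (1, 2, 5):
        node.on_receive("s1", seq, b"data")
    print(node.piggyback_list())
```

Keeping the missing list as a set of (source, sequence number) pairs makes both the gap check and the merge of an overheard list simple set operations.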
[Figure 10: sensors, the funnel, the pure CSMA region, the hybrid TDMA/CSMA intensity region, the choke point and the sink.]
Fig. 10. An illustration of funneling-MAC [44].

[Figure 11: a grid with axes Time and Channel offset, showing a time slot and a superframe.]
Fig. 11. An illustration of the matrix that divides the wireless space along both the time and channel dimensions [45]. The shaded slots are assigned to links by the centralized schedule. The schedule repeats as superframes and can be updated at the beginning of a superframe.
B. Latency
Since wireless communications consume a significant portion of the energy budget on sensor nodes, MAC protocols have been proposed to reduce idle listening and turn the radio of the sensor node to sleep mode to save more energy. Such general designs, however, if used for sensor data collection without careful consideration, may introduce extra latency and even more energy cost. For example, if the next-hop neighbor is still sleeping, a node has to wait some extra time (called sleeping latency) until the neighbor turns active. On the other hand, to reduce sleeping latency, one approach is to let a node overhear possible transmissions so as to temporarily increase its active duration for potential incoming packets. However, this would make all nodes that overhear a transmission spend extra time being active and consume more energy, while only a few of them really participate in the traffic relaying.

To reduce sleeping latency as well as energy costs, the authors of [42] proposed DMAC to enhance sensor data collection. The main idea is shown in Fig. 8. Based on the network topology, sensor nodes along a delivery path from a source node to the base station turn to receiving, sending and sleep mode one after another in sequential order. If there are more packets to send, a More Data Flag is piggy-backed on each previous packet to indicate the next transmission. The receiver then turns back to receiving mode, instead of sleep mode, to listen for the following packet. For the case that a receiver has more than one sender, on receiving a packet from one sender, the receiver predicts that there are packets from other senders and turns to receiving mode. If nothing is heard, it turns back to sleep mode. In addition, within a transmission time slot, a contention-based mechanism (CSMA) is used for several senders to compete for one receiver, and another small time slot is reserved after each transmission slot for the failed sender to send a small More To Send packet, so as to make the receiver listen for its retransmission instead of turning to sleep mode.

Another work named STREE was proposed in [43], which also targets minimizing latency and reducing energy costs. Assuming global synchronization, a time slot is defined to be the duration for successfully transmitting a maximum transmission unit. Within one time unit, a sensor node can sleep to save energy, or perform only one task of either sending or receiving. Given that each sensor node has one packet to report to the base station during each round, for a linear topology as shown in Fig. 9(a), one optimal schedule to minimize the time duration of one round of data collection is to let the even-level links and odd-level links be active alternately, which is called wavelike forwarding. If there is any branch in the topology, as shown in Fig. 9(b), the optimal schedule can be achieved by letting one path (e.g., from $u_{t+k}$ to $u_0$) do wavelike forwarding first; then, after the branch (from $u_{t+k}$ to $u_{t+1}$) of the path is finished, the remaining part together with the other branch (from $u_{t+k+r}$ to $u_{t+k+1}$) forms a new path and goes on to do wavelike forwarding. In general, for any tree topology, an optimal schedule can be achieved by recursively applying wavelike forwarding to each branch. Let $N(u)$ denote the total number of nodes in the tree rooted at $u$. The authors showed that the time duration for all packets from the tree rooted at $u$ to be forwarded up is $2N(u) - 1$. Furthermore, since the base station does not need to forward packets, it can collect packets from two subtrees alternately at the same time, e.g., in Fig. 9(c), if $u_0$ is the base station, the links from $u_1$ to $u_0$ and from $u_{k+1}$ to $u_0$ can be active alternately to send packets to $u_0$. Thus the optimal schedule can be achieved by letting all the subtrees of the base station do wavelike forwarding simultaneously and the base station collect packets from its children alternately in descending order of subtree size. The time duration for one round of data collection of the whole network is then derived as $\max(2N(u_1) - 1, N(u_0) - 1)$, where $u_0$ is the base station and $u_1$ is the child rooting the largest subtree.

Recently, it has been noticed that a single piece of sensing data may be quite small and multiple pieces of data can still fit and be transmitted in one packet so as to reduce the transmission overhead [46]. Such batch transmission is different from traditional data aggregation/fusion techniques, where multiple data are combined into a smaller size at the price of losing original information. Since waiting for enough data from one sensor node to form a packet is quite time-consuming and thus increases the latency, the authors of [46] proposed an approach named TIGRA to batch small sensing data from different sensor nodes into packets while these data are gathered along the collection tree. TIGRA uses a gathering
Gathering Approach                   | Network Topology | Synchronization              | Loss Recovery           | Congestion Control | Sleep Mode | Rate Control | Source Rate | Main QoS Consideration
-------------------------------------|------------------|------------------------------|-------------------------|--------------------|------------|--------------|-------------|------------------------
Wisden [17]                          | Tree             | Not required                 | Hop-by-hop, End-to-end  | No                 | No         | No           | Any         | Reliability
DMAC [42]                            | Tree             | Local                        | Link layer              | No                 | Yes        | No           | Any         | Latency
STREE [43]                           | Tree             | Global                       | Link layer              | No                 | Yes        | No           | Single      | Latency
TIGRA [46]                           | Tree             | Global                       | Link layer              | No                 | Yes        | No           | Single      | Latency
Funneling-MAC [44]                   | Any              | Area covered by Base Station | Link layer              | Partial            | No         | Partial      | Any         | Throughput
Congestion Control and Fairness [47] | Tree             | Not required                 | Link layer              | Yes                | No         | Yes          | Single      | Throughput and Fairness
Dozer [48]                           | Tree             | Local                        | Hop-by-hop              | Yes                | Yes        | Yes          | Single      | Energy Consumption
TSMP [45]                            | Any              | Global                       | Hop-by-hop              | Yes                | Yes        | Yes          | Any         | Energy Consumption

TABLE II
DIFFERENT DATA DELIVERY APPROACHES.
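
For the STREE entry in Table II, the round duration stated in the Latency discussion, max(2N(u1) - 1, N(u0) - 1) with u0 the base station and u1 the child rooting its largest subtree, can be evaluated directly from a collection tree. The Python sketch below only computes that bound and does not construct the wavelike schedule itself. The dictionary-of-children tree encoding and the function names are assumptions made for illustration.

```python
def subtree_sizes(children, root):
    """Return N(u), the number of nodes in the tree rooted at u, for every u."""
    sizes = {}
    stack = [(root, False)]
    while stack:
        node, visited = stack.pop()
        kids = children.get(node, [])
        if visited:
            # All children have been sized by now (post-order visit).
            sizes[node] = 1 + sum(sizes[k] for k in kids)
        else:
            stack.append((node, True))
            stack.extend((k, False) for k in kids)
    return sizes


def round_duration(children, base):
    """Slots for one collection round: max(2*N(u1) - 1, N(u0) - 1)."""
    sizes = subtree_sizes(children, base)
    kids = children.get(base, [])
    if not kids:
        return 0  # no sensor nodes, nothing to collect
    largest = max(sizes[k] for k in kids)
    return max(2 * largest - 1, sizes[base] - 1)


if __name__ == "__main__":
    # Linear topology in the style of Fig. 9(a): u3 -> u2 -> u1 -> u0.
    chain = {"u0": ["u1"], "u1": ["u2"], "u2": ["u3"]}
    print(round_duration(chain, "u0"))
    # Star topology: four sensor nodes attached directly to the base station.
    star = {"u0": ["a", "b", "c", "d"]}
    print(round_duration(star, "u0"))
```

The two terms of the maximum reflect two limits: the largest subtree needs 2N(u1) - 1 slots to drain, and the base station can receive at most one of the N(u0) - 1 packets per slot.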
consumption. On the other hand, in gossiping, received messages are forwarded only with some pre-defined probability (see footnote 4). Theoretical analysis shows that, for a given topology and wireless communication loss, there exists a threshold probability above which the whole network is covered with high probability. Thus, by setting the pre-defined probability just above this threshold, a large number of duplicate messages can be avoided. In practice, however, the pre-defined probability is very sensitive to changes in the network topology and wireless communication loss, which often leads to unsatisfactory reliability of message delivery.

Footnote 4: In wired networks such as the Internet, gossiping was originally designed to forward a received message to one randomly selected neighboring node. Due to the broadcast nature of wireless communication, gossiping in WSNs has evolved into the version described above.

Ideally, in the absence of wireless communication loss, every sensor node needs to receive and forward the broadcast message at most once. Thus, although the basic forms of flooding and gossiping are known to be inefficient, significant efforts have been made to enhance their efficiency while retaining their robustness in the presence of error-prone transmissions.

B. Different Enhancements

The author of [54] proposed a protocol named LM-PB (Lifetime Maximizing Protocol for Broadcasting), which uses a timing heuristic to reduce redundant message forwardings in basic flooding and to extend the network lifetime. To suppress duplicate forwardings, a node schedules a forwarding only when it receives a broadcast message for the first time. In addition, a short latency named FDL (Forwarding-node Declaration Latency) is introduced before a node forwards a message; if a forwarding of the same message is overheard during this latency, the node cancels its own forwarding, which further reduces duplicates. To extend the network lifetime, the FDL of a node u is computed from its residual energy $E_t(u)$ by the following equation:

$$FDL(u) = T \left(1 - \frac{E_t(u)}{E_{ref}(u)}\right) + t_D(u), \qquad (7)$$

where $T$ is a timing constant, $t_D(u)$ is the maximum delay related to signal processing, transceiver switching and so forth at the potential forwarding nodes other than u, and $E_{ref}$ is the maximum energy capacity of a battery. As a result, whenever several neighboring nodes receive a broadcast message, only the node with the highest residual energy, and thus the shortest FDL, forwards the message. The other nodes overhear this forwarding and suppress their own, which saves energy and extends the network lifetime.

Smart Gossip [55], on the other hand, extends basic gossip to minimize forwarding overhead while still keeping reasonable reliability. Unlike basic gossip, which uses the same static forwarding probability for all sensor nodes, the authors proposed to dynamically adapt the forwarding probability on each node to its local topology and the originator of the broadcast message. Specifically, based on where a forwarded broadcast message comes from and who its last forwarder is, a node's neighbors are divided into three sets, namely parent, child and sibling. The neighbors in the parent set are those that the node depends on to receive the first forwarded message; the neighbors in the child set are those that depend on the node to receive the first forwarded message; and the remaining neighbors are in the sibling set. Given an expected network delivery ratio $\tau$, the required per-hop delivery ratio $\tau_{hop}$ can be estimated by the equation

$$(\tau_{hop})^{\delta} = \tau, \qquad (8)$$

where $\delta$ is the estimated diameter of the network. Thus, for a node with $K$ neighbors in its parent set, the required forwarding probability $p_{required}$ for each parent neighbor can be estimated using the equation

$$(1 - p_{required})^{K} < (1 - \tau_{hop}). \qquad (9)$$

Each node then collects $p_{required}$ from all its child neighbors and uses the maximum as its own forwarding probability. The three sets and $p_{required}$ on each node are also recomputed periodically based on recent message forwarding history, so that the forwarding probability adapts to network dynamics (e.g., node failure).

A more recent work is RBP (Robust Broadcast Propagation) [56], which extends the flooding-based approach and targets high-reliability broadcast. Each node forwards the broadcast message when receiving it for the first time. Then, by overhearing, a node can quickly identify the percentage of its neighbors that have successfully received the message. Based on this percentage and the local density (the number of neighbors), the node determines whether to retransmit the message. The principle is that at low density the message is retransmitted until a high receiving percentage is achieved, while at high density a moderate percentage is enough. To counter wireless loss, explicit ACKs are sent to nodes that are heard rebroadcasting a message several times. In addition, if a node finds itself highly dependent on another node to receive broadcast messages, the link between them is deemed an important link. The downstream node then notifies its upstream node to increase the number of retransmissions, so as to improve the probability of message delivery.

To enhance reliability one step further, the authors of [57] proposed an approach named Trickle, which achieves perfect broadcast reliability (i.e., all sensor nodes receive the broadcast message) for code redistribution and update propagation. To keep codes up to date, each sensor node transmits a summary of its code if it has not heard a few other sensor nodes do so. When receiving a code summary from a neighbor, a node compares the received summary with its own. If the neighbor's summary is older, the node sends its new code to the neighbor. If the neighbor's summary is newer, the node retransmits its own summary so as to trigger the neighbor to send the new code. Otherwise, the node counts the number of summaries received within one time interval; if the number exceeds a threshold, the node suppresses its own transmission so as to save energy. To balance energy costs, within each time interval a node picks its summary transmission time at random, following a uniform distribution. Moreover, the length of a time interval is set to a lower bound when a summary of new codes is
received, so as to accelerate code updates. After that, each subsequent interval is twice as long as the current one until the length reaches an upper bound, which further helps to reduce energy costs.

C. Integrated with Duty-Cycle

The above approaches, though designed with different emphases, such as reducing energy consumption or assuring high reliability, all implicitly assume that all network nodes are active during the broadcast process (referred to as the all-node-active assumption). This assumption is valid for wired networks and for many conventional multi-hop wireless networks. It may, however, fail to capture the uniqueness of energy-constrained applications in wireless sensor networks. In these applications, sensor nodes often alternate between dormant and active states [13][14]: in the former, they go to sleep and thus consume little energy, while in the latter, they actively perform sensing tasks and communications, consuming significantly more energy (e.g., 56 mW for the IEEE 802.15.4 radio plus 6 to 15 mW for the Atmel ATmega 128L micro-controller and possible sensing devices on a MicaZ mote). Define the duty-cycle as the ratio between the active period and the full active/dormant period. A low duty-cycle WSN clearly has a much longer operating lifetime, but it breaks the all-node-active assumption. More importantly, duty-cycles are often optimized for the given application or deployment, and a broadcast service that accommodates these schedules is thus expected for cross-layer optimization of the overall system.

To accommodate low duty-cycles in WSNs, the authors of [58] proposed RBS (Reliable Broadcast Service), in which each node dynamically schedules message forwardings by adapting to its neighbors' active-dormant patterns and forwarding schedules. The core idea is to let a node issue a message forwarding only if it finds that otherwise some neighbor may miss the message and cannot be contacted until the next time that neighbor becomes active. In addition, when a broadcast message is received for the first time, a node also schedules a transmission of the message so that it can be quickly delivered among the node's active neighbors. Also, when forwarding a broadcast message, a node piggy-backs the list of neighbors that it knows have received the message. By overhearing, other nodes can then quickly learn which neighbors have received the message, even if some forwarded messages were missed due to dormancy or wireless loss. To further reduce energy costs, after a forwarding, a node assumes that all its active neighbors have received the forwarded message and will not try to forward the same message to these neighbors again unless, after a timeout, their receipt of the message is still not confirmed by overhearing.

D. Summary

The control message dissemination mechanisms discussed in this section are summarized in Tab. III. Although these works are all based on either flooding or gossiping, their enhancements cover a broad spectrum. Due to the broadcast nature of wireless communication, messages forwarded by a sensor node may be received by multiple nodes. Topology information can thus be exploited to avoid sending duplicate messages to the same node. In addition, a topology may contain critical positions that other nodes rely on to receive messages, and carefully considering these positions may greatly increase the reliability of the whole dissemination. Furthermore, reducing the total message cost alone may not always be effective in extending the network lifetime; the energy consumption also needs to be balanced among nodes so that no node is over-burdened and runs out of energy too early. There is also an implicit tradeoff among reliability, delay and message costs: higher reliability or lower delay may introduce more message costs, and higher reliability sometimes also lengthens the time needed for the dissemination process to finish.

Another observation is that although many mechanisms have been proposed, most of them, with the exception of RBS [58], did not consider low duty-cycle WSNs. Along this new direction, many efforts are still required. First, theoretical models are needed to understand more clearly how the duty-cycle and the active-dormant patterns affect message dissemination. Second, RBS is designed to achieve perfect broadcast reliability. This is not mandatory in some scenarios, where it may be preferable to sacrifice a small portion of reliability in exchange for lower message costs; for such scenarios, a gossiping-based approach may be a better fit for the system design. Moreover, in a low duty-cycle WSN, although the topology of active nodes changes frequently, the physical topology containing all nodes is relatively stable. How to apply topology-aware techniques such as those used in [55][56] to message dissemination in low duty-cycle WSNs is thus also an interesting topic.

VI. CONCLUSION

Wireless sensor networks have been applied to many applications since emerging, and sensor data collection is one of
the most important applications among them. In a WSN for sensor data collection, sensed data are continuously collected at all or some of the sensor nodes and forwarded through wireless communications to a central base station for further processing. This makes it different from other applications of WSNs as well as from traditional sensor data collection using wired networks. In this paper, we presented an in-depth survey on recent advances in networked wireless sensor data collection. Specifically, we first highlighted the special features of sensor data collection in WSNs by comparing it with both wired sensor data collection networks and other applications using WSNs. Bearing these features in mind, we discussed issues on using WSNs for sensor data collection, which in general can be broken into the deployment stage, the control message dissemination stage and the data delivery stage.

In the deployment stage, based on whether the coverage requirement is area-coverage or location-coverage, different strategies have been proposed to achieve different levels of coverage and network connectivity while minimizing the required number of nodes or maximizing the network lifetime, subject to the physical limits (such as the sensing range and communication range) of the sensor nodes.

In the data delivery stage, different approaches have been proposed to deliver sensing data from sensor nodes to the base station. Each approach optimizes its own main QoS consideration while balancing the tradeoffs among other QoS requirements, such as improving throughput while considering rate/congestion control and fairness, balancing energy consumption and latency, or enforcing better reliability at the cost of more transmission overhead.

The control message dissemination stage, on the other hand, strives to reliably disseminate control messages over the network with low time and transmission costs. Here, different mechanisms, such as forwarding based on the residual energy of sensor nodes or on network topology information, have been used to enhance basic flooding or gossiping and to achieve a good balance among reliability, delay and message costs.

Although these stages have their own issues to address, it has been shown that better performance can be achieved by considering them jointly. One example is to be aware of the traffic pattern in the data delivery stage while designing the strategy for the deployment stage, as discussed in Section III. Other examples include considering the multi-path data delivery enabled by the deployment stage (with k-connectivity), as mentioned in Section IV, and supporting duty-cycle in the control message dissemination stage in a way similar to the data delivery stage, as investigated in Section V.

In the future, many issues still need to be further explored and possibly considered jointly so as to lead to a more efficient and long-lifetime sensor data collection system. One direction is to consider the special many-to-one traffic pattern in the data delivery stage as well as the one-to-many traffic pattern in the control message dissemination stage. Also, the sensing environment in practice may be more complicated than a regular 2-D sensing field: obstacles and elevation differences may reduce the capacity of wireless communication, resulting in various deployment designs and thus complicated network topologies for data delivery and control message dissemination, which therefore need to be specifically considered during performance optimization. Low duty-cycle operation is considered an effective way to extend the network lifetime of a WSN, yet an interesting topic is to explore how its use in networked wireless sensor data collection interacts with other design issues. Another direction is to further optimize the system performance by combining the designs of the deployment, data delivery and control message dissemination stages.

REFERENCES

[1] M. Tubaishat and S. Madria, "Sensor Networks: an Overview," IEEE Potentials, vol. 22, no. 2, pp. 20–23, April/May 2003.
[2] G. Tolle, J. Polastre, R. Szewczyk, D. Culler, N. Turner, K. Tu, S. Burgess, T. Dawson, P. Buonadonna, D. Gay, and W. Hong, "A Macroscope in the Redwoods," in ACM SenSys, 2005.
[3] L. Selavo, A. Wood, Q. Cao, T. Sookoor, H. Liu, A. Srinivasan, Y. Wu, W. Kang, J. Stankovic, D. Young, and J. Porter, "LUSTER: Wireless Sensor Network for Environmental Research," in ACM SenSys, 2007.
[4] G. Barrenetxea, F. Ingelrest, G. Schaefer, and M. Vetterli, "SensorScope: Out-of-the-Box Environmental Monitoring," in ACM/IEEE IPSN, 2008.
[5] G. Werner-Allen, K. Lorincz, J. Johnson, J. Lees, and M. Welsh, "Fidelity and Yield in a Volcano Monitoring Sensor Network," in USENIX OSDI, 2006.
[6] W.-Z. Song, R. Huang, M. Xu, A. Ma, B. Shirazi, and R. LaHusen, "Air-dropped Sensor Network for Real-time High-fidelity Volcano Monitoring," in ACM MobiSys, 2009.
[7] Y. Kim, T. Schmid, Z. M. Charbiwala, J. Friedman, and M. B. Srivastava, "NAWMS: Nonintrusive Autonomous Water Monitoring System," in ACM SenSys, 2008.
[8] S. Kim, S. Pakzad, D. Culler, J. Demmel, G. Fenves, S. Glaser, and M. Turon, "Health Monitoring of Civil Infrastructures Using Wireless Sensor Networks," in ACM/IEEE IPSN, 2007.
[9] M. Ceriotti, L. Mottola, G. P. Picco, A. L. Murphy, S. Guna, M. Corra, M. Pozzi, D. Zonta, and P. Zanon, "Monitoring Heritage Buildings with Wireless Sensor Networks: The Torre Aquila Deployment," in ACM/IEEE IPSN, 2009.
[10] C. Hartung, R. Han, C. Seielstad, and S. Holbrook, "FireWxNet: A Multi-Tiered Portable Wireless System for Monitoring Weather Conditions in Wildland Fire Environments," in ACM MobiSys, 2006.
[11] N. A. Pantazis and D. D. Vergados, "A Survey on Power Control Issues in Wireless Sensor Networks," IEEE Communications Surveys and Tutorials, vol. 9, no. 4, pp. 86–107, 2007.
[12] R. Rajagopalan and P. K. Varshney, "Data-Aggregation Techniques in Sensor Networks: A Survey," IEEE Communications Surveys and Tutorials, vol. 8, no. 4, pp. 48–63, 2006.
[13] Y. Gu, J. Hwang, T. He, and D. H.-C. Du, "µSense: A Unified Asymmetric Sensing Coverage Architecture for Wireless Sensor Networks," in IEEE ICDCS, 2007.
[14] X. Wang, G. Xing, Y. Zhang, C. Lu, R. Pless, and C. Gill, "Integrated Coverage and Connectivity Configuration in Wireless Sensor Networks," in ACM SenSys, 2003.
[15] W. B. Heinzelman, A. P. Chandrakasan, and H. Balakrishnan, "An Application-Specific Protocol Architecture for Wireless Microsensor Networks," IEEE Transactions on Wireless Communications, vol. 1, no. 4, pp. 660–670, October 2002.
[16] S. Olariu and I. Stojmenovic, "Design Guidelines for Maximizing Lifetime and Avoiding Energy Holes in Sensor Networks with Uniform Distribution and Uniform Reporting," in IEEE INFOCOM, 2006.
[17] N. Xu, S. Rangwala, K. K. Chintalapudi, D. Ganesan, A. Broad, R. Govindan, and D. Estrin, "A Wireless Sensor Network For Structural Monitoring," in ACM SenSys, 2004.
[18] C. Gui and P. Mohapatra, "Power Conservation and Quality of Surveillance in Target Tracking Sensor Networks," in ACM MobiCom, 2004.
[19] C. Intanagonwiwat, R. Govindan, and D. Estrin, "Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks," in ACM MobiCom, 2000.
[20] R. C. Shah, S. Roy, S. Jain, and W. Brunette, "Data MULEs: Modeling a Three-tier Architecture for Sparse Sensor Networks," in IEEE International Workshop on Sensor Network Protocols and Applications, 2003.
[21] M. Ma and Y. Yang, "SenCar: An Energy-Efficient Data Gathering Mechanism for Large-Scale Multihop Sensor Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 18, no. 10, pp. 1476–1488, October 2007.
[22] S. Meguerdichian, F. Koushanfar, M. Potkonjak, and M. B. Srivastava, "Coverage Problems in Wireless Ad-hoc Sensor Networks," in IEEE INFOCOM, 2001.
[23] J. Ko, Y. Ni, H. Zhou, J. Wang, and X. Zhou, "Investigation Concerning Structural Health Monitoring of an Instrumented Cable-Stayed Bridge," Structure and Infrastructure Engineering, 2008.
[24] "Structural Health Monitoring for Guangzhou New TV Tower using Sensor Networks." [Online]. Available: http://www.cse.polyu.edu.hk/benchmark/
[25] R. Iyengar, K. Kar, and S. Banerjee, "Low-coordination Topologies for Redundancy in Sensor Networks," in ACM MobiHoc, 2005.
[26] X. Bai, Z. Yun, D. Xuan, T. H. Lai, and W. Jia, "Deploying Four-Connectivity and Full-Coverage Wireless Sensor Networks," in IEEE INFOCOM, 2008.
[27] X. Bai, D. Xuan, Z. Yun, T. H. Lai, and W. Jia, "Complete Optimal Deployment Patterns for Full-Coverage and k-Connectivity (k ≤ 6) Wireless Sensor Networks," in ACM MobiHoc, 2008.
[28] H. Zhang and J. Hou, "On Deriving the Upper Bound of α-Lifetime for Large Sensor Networks," in ACM MobiCom, 2004.
[29] H. Zhang and J. C. Hou, "Is Deterministic Deployment Worse than Random Deployment for Wireless Sensor Networks?" in IEEE INFOCOM, 2006.
[30] G. Xing, X. Wang, Y. Zhang, C. Lu, R. Pless, and C. Gill, "Integrated Coverage and Connectivity Configuration in Wireless Sensor Networks," ACM Transactions on Sensor Networks, vol. 1, no. 1, pp. 36–72, 2005.
[31] X. Bai, S. Kumar, D. Xuan, Z. Yun, and T. H. Lai, "Deploying Wireless Sensors to Achieve Both Coverage and Connectivity," in ACM MobiHoc, 2006.
[32] K. Xu, H. Hassanein, and G. Takahara, "Relay Node Deployment Strategies in Heterogeneous Wireless Sensor Networks: Multiple-Hop Communication Case," in IEEE SECON, 2005.
[33] X. Cheng, D. Z. Du, L. Wang, and B. Xu, "Relay Sensor Placement in Wireless Sensor Networks," Springer Wireless Networks, vol. 14, no. 3, pp. 347–355, 2008.
[34] G. Lin and G. Xue, "Steiner Tree Problem with Minimum Number of Steiner Points and Bounded Edge-Length," Information Processing Letters, vol. 69, pp. 53–57, 1999.
[35] J. L. Bredin, E. D. Demaine, M. Hajiaghayi, and D. Rus, "Deploying Sensor Networks with Guaranteed Capacity and Fault Tolerance," in ACM MobiHoc, 2005.
[36] A. Kashyap, S. Khuller, and M. Shayman, "Relay Placement for Higher Order Connectivity in Wireless Sensor Networks," in IEEE INFOCOM, 2006.
[37] W. Zhang, G. Xue, and S. Misra, "Fault-Tolerant Relay Node Placement in Wireless Sensor Networks: Problems and Algorithms," in IEEE INFOCOM, 2007.
[38] F. Wang, D. Wang, and J. Liu, "Traffic-Aware Relay Node Deployment for Data Collection in Wireless Sensor Networks," in IEEE SECON, 2009.
[39] G. Xue, T. P. Lillys, and D. E. Dougherty, "Computing the Minimum Cost Pipe Network Interconnecting One Sink and Many Sources," SIAM Journal of Optimization, vol. 10, no. 1, pp. 22–42, October 1999.
[40] N. Aitsaadi, N. Achir, K. Boussetta, and G. Pujolle, "Potential Field Approach to Ensure Connectivity and Differentiated Detection in WSN Deployment," in IEEE International Conference on Communications, 2009.
[41] N. Aitsaadi, N. Achir, K. Boussetta, and G. Pujolle, "Multi-Objectives WSN Deployment: Quality of Monitoring, Connectivity and Lifetime," in IEEE International Conference on Communications, 2010.
[42] G. Lu, B. Krishnamachari, and C. S. Raghavendra, "An Adaptive Energy-Efficient and Low-Latency MAC for Data Gathering in Wireless Sensor Networks," in IEEE IPDPS, 2004.
[43] W.-Z. Song, F. Yuan, and R. LaHusen, "Time-Optimum Packet Scheduling for Many-to-One Routing in Wireless Sensor Networks," in IEEE MASS, 2006.
[44] G.-S. Ahn, E. Miluzzo, A. T. Campbell, S. G. Hong, and F. Cuomo, "Funneling-MAC: A Localized, Sink-Oriented MAC For Boosting Fidelity in Sensor Networks," in ACM SenSys, 2006.
[45] K. S. J. Pister and L. Doherty, "TSMP: Time Synchronized Mesh Protocol," in IASTED International Symposium on Distributed Sensor Networks (DSN), 2008.
[46] L. Paradis and Q. Han, "TIGRA: Timely Sensor Data Collection Using Distributed Graph Coloring," in IEEE PerCom, 2008.
[47] C. T. Ee and R. Bajcsy, "Congestion Control and Fairness for Many-to-One Routing in Sensor Networks," in ACM SenSys, 2004.
[48] N. Burri, P. von Rickenbach, and R. Wattenhofer, "Dozer: Ultra-Low Power Data Gathering in Sensor Networks," in ACM/IEEE IPSN, 2007.
[49] S. Lindsey, C. Raghavendra, and K. M. Sivalingam, "Data Gathering Algorithms in Sensor Networks using Energy Metrics," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 9, pp. 924–935, September 2002.
[50] S. Deering, "Scalable Multicast Routing Protocol," Ph.D. dissertation, Stanford University, 1989.
[51] S. Floyd, V. Jacobson, C. Liu, S. McCanne, and L. Zhang, "A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing," IEEE/ACM Transactions on Networking, vol. 5, no. 6, pp. 784–803, 1997.
[52] S.-Y. Ni, Y.-C. Tseng, Y.-S. Chen, and J.-P. Sheu, "The Broadcast Storm Problem in a Mobile Ad Hoc Network," in ACM MobiCom, 1999.
[53] K. Akkaya and M. Younis, "A Survey on Routing Protocols for Wireless Sensor Networks," Ad Hoc Networks, vol. 3, no. 3, pp. 325–349, May 2005.
[54] X. Guo, "Broadcasting for Network Lifetime Maximization in Wireless Sensor Networks," in IEEE SECON, 2004.
[55] P. Kyasanur, R. R. Choudhury, and I. Gupta, "Smart Gossip: An Adaptive Gossip-based Broadcasting Service for Sensor Networks," in IEEE MASS, 2006.
[56] F. Stann, J. Heidemann, R. Shroff, and M. Z. Murtaza, "RBP: Robust Broadcast Propagation in Wireless Networks," in ACM SenSys, 2006.
[57] P. Levis, N. Patel, D. Culler, and S. Shenker, "Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks," in USENIX NSDI, 2004.
[58] F. Wang and J. Liu, "RBS: A Reliable Broadcast Service for Large-Scale Low Duty-Cycled Wireless Sensor Networks," in IEEE International Conference on Communications, 2008.
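
As a numerical companion to the dissemination mechanisms reviewed in Section V-B, the following Python sketch evaluates three of the rules described there: the energy-based forwarding latency of LM-PB in Eq. (7), the per-hop delivery ratio and required forwarding probability of Smart Gossip in Eqs. (8) and (9), and the interval adaptation of Trickle. It is only a minimal sketch under stated assumptions. All function names, parameter names, units and example values are hypothetical and are not taken from [54], [55] or [57]. Eq. (9) is a strict inequality, so the sketch returns its boundary value, and any forwarding probability strictly above that value satisfies it.

```python
def fdl(residual_energy, ref_energy, timing_const, max_delay):
    """Eq. (7): FDL(u) = T * (1 - E_t(u) / E_ref(u)) + t_D(u)."""
    if ref_energy <= 0 or not 0 <= residual_energy <= ref_energy:
        raise ValueError("need ref_energy > 0 and 0 <= residual_energy <= ref_energy")
    return timing_const * (1.0 - residual_energy / ref_energy) + max_delay


def first_forwarder(nodes, timing_const):
    """Return the id of the node whose FDL expires first.

    `nodes` maps a (hypothetical) node id to a tuple (E_t, E_ref, t_D).
    The other nodes are assumed to overhear this forwarding and cancel theirs.
    """
    return min(
        nodes,
        key=lambda u: fdl(nodes[u][0], nodes[u][1], timing_const, nodes[u][2]),
    )


def per_hop_ratio(network_ratio, diameter):
    """Eq. (8): per-hop ratio whose `diameter`-th power equals the network ratio."""
    return network_ratio ** (1.0 / diameter)


def required_probability(hop_ratio, num_parents):
    """Boundary of Eq. (9): the p at which (1 - p)^K equals 1 - hop_ratio."""
    return 1.0 - (1.0 - hop_ratio) ** (1.0 / num_parents)


def next_trickle_interval(current, lower, upper, new_code_heard):
    """Reset to the lower bound on new code, otherwise double up to the upper bound."""
    if new_code_heard:
        return lower
    return min(2 * current, upper)


if __name__ == "__main__":
    # Two neighbors that received the same broadcast; "a" has more energy left.
    nodes = {"a": (90.0, 100.0, 0.01), "b": (40.0, 100.0, 0.01)}
    print("first forwarder:", first_forwarder(nodes, timing_const=0.1))

    hop = per_hop_ratio(network_ratio=0.81, diameter=2)
    print("per-hop ratio:", hop)
    print("probability bound for K=2:", required_probability(hop, num_parents=2))

    print("next interval:", next_trickle_interval(4, lower=1, upper=60, new_code_heard=False))
```

In the first example, the node with more residual energy obtains the shorter latency and therefore forwards first, which matches the energy-balancing intent described for LM-PB. The Smart Gossip helpers show how a network-wide delivery target translates into a per-hop target and then into a per-parent forwarding probability.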