Prediction-Based Sensor Nodes
Computer Communications
journal homepage: www.elsevier.com/locate/comcom
Article info

Article history:
Received 26 January 2010
Received in revised form 30 September 2010
Accepted 5 October 2010
Available online 12 October 2010

Keywords:
Wireless sensor networks
Data collection protocol
Data aggregation
Grey model
Kalman Filter

Abstract

In many environmental monitoring applications, the data periodically sensed by wireless sensor networks usually have high temporal redundancy, so prediction-based data aggregation is an important approach for reducing redundant data communications and saving sensor node energy. In this paper, a novel prediction-based data collection protocol is proposed, in which a double-queue mechanism is designed to synchronize the prediction data series of the sensor node and the sink node, and therefore, the cumulative error of continuous predictions is reduced. Based on this protocol, three prediction-based data aggregation approaches are proposed: Grey-Model-based Data Aggregation (GMDA), Kalman-Filter-based Data Aggregation (KFDA) and Combined Grey model and Kalman Filter Data Aggregation (CoGKDA). By integrating the merit of the grey model in quick modeling with the advantage of the Kalman Filter in processing data series noise, CoGKDA presents high prediction accuracy, low communication overhead, and relatively low computational complexity. Experiments are carried out based on a real data set of a temperature and humidity monitoring application in a granary. The results show that the proposed approaches significantly reduce communication redundancy and evidently improve the lifetime of wireless sensor networks.

© 2010 Elsevier B.V. All rights reserved.
0140-3664/$ - see front matter 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.comcom.2010.10.003
G. Wei et al. / Computer Communications 34 (2011) 793–802
some environmental sensing applications. For example, granary monitoring must continuously gather temperature and humidity data from every sensor node with a relatively small tolerated error. Xin et al. [21] analyze some more complicated aggregation approaches, including data mining and multiple-source-query routing. These approaches can provide higher accuracy. However, they consume a large amount of computational power and storage resources, because their pre-processing stages require $O(n^2 d)$ transmissions, where $n$ is the number of nodes and $d$ is the diameter of the network [21]. Therefore, these complex approaches are infeasible for most environmental monitoring applications.

This paper proposes a novel prediction-based data collection protocol to reduce redundant data transmission. A double-queue mechanism is designed to synchronize the predicted data series in the sensor node and the sink node, and therefore, the mechanism avoids the cumulative error of continuous predictions. Based on this protocol, we design three prediction-based data aggregation approaches (GMDA, KFDA, and CoGKDA). The proposed approaches are used to predict the data of the next period at both the sensor and sink ends based on the same small number of recent data items. When the data of the next period is sensed, the sensor node compares the predicted data with the sensed data. The sensor node does not forward the sensed data to the sink node when the prediction error is less than a pre-configured threshold value. In this case, the sink node considers the predicted data as the sensed data in the current sensing period. Therefore, unnecessary transmission is eliminated and energy is saved. The sensor node must send the sensed data to the sink node when the prediction error is out of the pre-configured threshold. The pre-configured threshold is a tunable parameter for users to control the accuracy of predicted data: a larger threshold yields lower data accuracy, and a smaller threshold yields higher data accuracy. Experiments and evaluations demonstrate that the proposed approaches can significantly reduce communication redundancy and improve the network lifetime in environmental monitoring applications.

Our contribution can be summarized as follows:

• A prediction-based data collection protocol is proposed to specify the cooperative processes between the sensor node and the sink node, in which a novel double-queue mechanism is designed to synchronize the prediction data series in the sensor node and the sink node, hence cumulative error in continuous predictions is avoided.
• By integrating the merits of the grey model in quick modeling with the advantages of the Kalman Filter in processing data series noise, we have designed the CoGKDA algorithm for environmental monitoring wireless sensor networks. CoGKDA exhibits high data accuracy, low communication overhead, and relatively low computational complexity. Furthermore, CoGKDA can extend the sensor node's lifetime by reducing redundant data transmissions and conserving power during continuous data collection.

The rest of the paper is organized as follows: In Section 2, state-of-the-art methods in data aggregation are reviewed. Section 3 presents a novel prediction-based data collection protocol. Section 4 describes the Grey-Model-based Data Aggregation approach. Another data aggregation approach based on the Kalman Filter is given in Section 5. In Section 6, a combined data aggregation approach and its concrete algorithm are presented in detail. Experiments and performance evaluation are presented in Section 7 and concluding remarks are made in Section 8.

2. Related work

There has been a lot of work done in the field of data-driven techniques for energy conservation in wireless sensor networks. Anastasi et al. [4] present a systematic and comprehensive taxonomy of the energy conservation schemes in wireless sensor networks. Prediction-based data aggregation approaches are overviewed and classified into three types: stochastic approaches, time series forecasting, and algorithmic approaches.

Stochastic approaches exploit the probabilistic and statistical properties of sensed data. Deshpande et al. [5] propose a data prediction scheme based on a probabilistic model to reduce data transmission and reduce the quantity of data acquisition. A representative stochastic approach, named KEN [6], uses a dynamic probabilistic model to minimize communication from the sensor node to the base station. The data aggregation process does not require communication between the sensor node and the base station except when the sensor node senses anomalous data. KEN naturally accommodates applications that are based on event reporting or anomaly detection. An extension of KEN is presented in [7], where a Dynamic Probabilistic Model (DPM) is exploited to implement a probabilistic database view. The main drawback of this class of techniques is that they inherently have relatively high computational cost. To improve compression of the data communicated, some stochastic models exploit sophisticated spatial correlations of data in neighboring nodes. However, the more sophisticated the model, the more communications are required among sensor nodes themselves for coordination [2]. Therefore, possible improvements in this direction may focus on deriving simplified distributed models for obtaining the desired trade-off between energy efficiency and data accuracy according to users' requirements.

The most representative time series methods include Moving Average (MA), Auto-Regressive (AR) and Auto-Regressive Moving Average (ARMA) models. These models are quite simple, and can be used in many practical cases. The Probabilistic Adaptable Query system (PAQ) [8] uses a combination of AR models to probabilistically answer queries. This model is used globally to predict the readings of individual sensors at the sink node, and locally to detect when sensor nodes produce outlier readings or when the model ceases to properly fit the data at a sensor node. The Similarity-based Adaptive Framework (SAF) [9] uses a simple linear time series model that consists of a time-varying function, also called the trend component, and a stationary AR component representing the divergence of the phenomenon from the time-varying function over time. SAF can detect both outliers and inconsistent data. Le-Borgne et al. [10] propose an adaptive multi-model selection mechanism, which uses a lightweight, online algorithm that allows a sensor node to autonomously determine a satisfactory model from a set of candidate models. As sensed data are collected, based on a weight metric, it is possible to select the model that offers at each instant the highest achievable communication savings. Time series forecasting methods can provide sufficient accuracy, and their implementation in sensor devices is simple and lightweight. However, it is difficult to find an appropriate model that can tackle the long-term trend and short-term noise of data sequences simultaneously while providing a tunable trade-off between energy efficiency and data accuracy.

Algorithmic approaches aggregate data by exploiting the heuristic or behavioral characteristics of the sensing phenomena. PREMON [10] views a snapshot of the sensor network as an image, with the readings of individual sensors corresponding to the intensity values of pixels in the image. Monitoring operations are considered as receiving a sequence of the snapshots on a continuous basis. When the sink node gets the initial reading from a sensor node, it computes the model by evaluating correlations between macro-blocks and deriving a motion vector relative to each block. After obtaining the model, the sink node sends the model back to the sensor node. From this time on, the sensor node compares each sample with the prediction derived from the model. When sensed
data are close to the prediction within a user-specified tolerance, the sensor node does not transmit the data to the sink node. The model is periodically updated. Goel et al. [11] propose a buddy protocol to extend the PREMON approach by establishing a collaborative buddy relationship between sensor and sink nodes. It is suitable for cluster-structured wireless sensor networks. By including a periodic polling scheme in cluster operations, the proposed buddy protocol can guarantee that each node in the network is reachable within the specified maximum delay constraints. Han et al. [12] present an Energy Efficient Data Collection (EEDC) mechanism for data prediction. EEDC is effective in active inquiry-based applications, in which each node associates an upper and a lower bound, whose difference represents the accuracy of the sensed data. These bounds are sent to the sink node, which stores them for each sensor node in the network. These bounds can be updated according to source-initiated and sink-initiated requests. However, the algorithmic techniques are too complex in computation and may also incur a great deal of communication overhead [2].

Compared to the above-mentioned data aggregation methods, the data collection protocol and data aggregation approaches proposed in this paper have the following advantages. (1) They can provide high prediction accuracy without a large amount of training data or a priori knowledge of the distribution of sensed data, and eliminate more redundant transmissions. (2) They are more adaptive to dynamic changes in the distribution of sensed data. In addition, they are more scalable and structure-free; therefore, they can be coupled with other route-based or topology-based data aggregation protocols. (3) They are relatively lightweight in terms of computational complexity for resource-constrained sensor nodes.

3. Prediction-based data collection protocol

In the application layer of a wireless sensor network, data collection can be classified into three schemes: Pull, Push, and Integration of Pull and Push. In the Pull scheme, the sensor node acquires data from the physical layer and caches it locally. The cached data is collected only when the sensor node receives a query from the sink node. In this case, the sensor network looks like a database. In the Push scheme, the sensor node periodically senses data and immediately delivers it to the sink node. The sink node acts as a passive data collector. The Integration scheme provides capabilities of active data pushing and passive data acquisition by integrating the Pull scheme with the Push scheme.

In this paper, a prediction-based data collection protocol is proposed for the Push scheme. The proposed protocol is different from data collection protocols in the MAC layer, since it only focuses on the prediction-based cooperation between the sensor node and the sink node without taking into consideration network topology, node density, link quality and radio transceiver parameters. In general, the main challenges in designing a prediction-based data collection protocol include: (1) how to keep the data series at the sink node and the sensor node synchronous. In our approaches, both the sensor node and the sink node must use the same data series and the same prediction algorithm. However, the sensor node has the real sensed data while the sink node does not, because sensed data that were predicted successfully have never been sent to the sink node; (2) how to avoid cumulative error in continuous predictions. Since the data used for performing predictions may contain predicted values, cumulative error will inherently be produced; and (3) how to differentiate a successful prediction from data loss when the sink node does not receive the sensed data. When the sensed data is out of threshold, it must be sent to the sink node. Nonetheless, the sink node may fail to receive the sensed data due to packet loss induced by unreliable communication. From the viewpoint of the sink node, this case is very similar to the successful prediction scenario.

To solve the above problems, the proposed cooperative data collection protocol is presented in detail as follows.

Prerequisites:

(1.1) Each sensor node's lifetime is divided into equal periods. A sensor node produces only one sensed data item in one period.
(1.2) Both the sink node and the sensor node use the same prediction algorithm. The sink node is assumed to have sufficient computing power, storage, and energy.
(1.3) A reliable data delivery is defined as an end-to-end data intercommunication in which the receiver must send an acknowledgement message back to the sender.

Initialization:

(2.1) The sink node broadcasts its acceptable prediction error threshold $\varepsilon$ and cumulative error threshold $\theta$ to all sensor nodes according to the requirements of the specific application by using reliable data deliveries. $\varepsilon$ and $\theta$ are tunable parameters, pre-configured at the sink node. When their values are modified, the fresh $\varepsilon$ and $\theta$ must be re-broadcast to all sensor nodes.
(2.2) Each sensor node constructs two data queues, the actual data queue (ADQ) and the predicted value queue at the sensor end ($PVQ_{sensor}$). ADQ stores the actual data series and is used to control cumulative error. $PVQ_{sensor}$ stores the data series that is used to do the same predictions in both the sensor node and the sink node. $PVQ_{sensor}$ may contain predicted data values. This is called the Double Queue Mechanism. The lengths of ADQ and $PVQ_{sensor}$ are equal and both are specified by the applied prediction algorithm (denoted as $l$). The sink node constructs a corresponding queue for each sensor node, called $PVQ_{sink}$, with $PVQ_{sink}(i) = PVQ_{sensor}(i)$ for sensor node $i$.
(2.3) Each sensor node stores the first $l$ sensed data items into its ADQ and $PVQ_{sensor}$, and sends them to the sink node to construct $PVQ_{sink}$ via reliable delivery. Let $x_j$ denote a data item in a queue. In the initial stage, $ADQ(i) = PVQ_{sensor}(i) = PVQ_{sink}(i) = \{x_1, x_2, \ldots, x_l\}$ for an arbitrary sensor node $i$.

Prediction:

(3) Let $x_{l+1}$, $x'_{l+1}$ and $x''_{l+1}$ denote the actual sensed data, the value predicted using ADQ, and the value predicted using $PVQ_{sensor}(i)$, respectively. Note that the sink node can also obtain $x''_{l+1}$ from the $PVQ_{sink}(i)$ queue. If $|x''_{l+1} - x_{l+1}| < \varepsilon$, the prediction error is considered in threshold; otherwise it is out of threshold. If $|x''_{l+1} - x'_{l+1}| < \theta$, the cumulative error is considered in threshold; otherwise it is out of threshold. When the prediction error and the cumulative error are in their thresholds simultaneously, the prediction of this period is considered successful. For a successful prediction, the sensor node does not need to send $x_{l+1}$ to the sink node. The sink node considers the predicted value $x''_{l+1}$ as $x_{l+1}$ in this period. After a successful prediction, the queues are updated by the following rules: (a) $ADQ(i) = \{x_2, x_3, \ldots, x_{l+1}\}$; (b) $PVQ_{sensor}(i) = \{x_2, x_3, \ldots, x''_{l+1}\}$; and (c) $PVQ_{sink}(i) = \{x_2, x_3, \ldots, x''_{l+1}\}$.

Exceptions:

(4.1) The actual sensed data $x_{l+1}$ must be sent to the sink node using reliable delivery in the following cases: (a) a failed prediction occurs; and (b) the number of continuous successful predictions exceeds a pre-configured number.
(4.2) After an exceptional data delivery, the queues are updated by the following rules: (a) $ADQ(i) = \{x_2, x_3, \ldots, x_{l+1}\}$, (b) $PVQ_{sensor}(i) = \{x_2, x_3, \ldots, x_{l+1}\}$, and (c) $PVQ_{sink}(i) = \{x_2, x_3, \ldots, x_{l+1}\}$.
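The double-queue rules (2.2) to (4.2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the `simulate` helper, the mean-of-queue predictor, and the `max_run` cap (standing in for the pre-configured number in rule (4.1b)) are all assumptions made for illustration, and reliable delivery is abstracted as a direct function call.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def mean_predictor(window: Sequence[float]) -> float:
    """Hypothetical stand-in predictor: the mean of the queue contents."""
    return sum(window) / len(window)


class SensorNode:
    def __init__(self, first_items: Sequence[float], predict: Predictor,
                 eps: float, theta: float, max_run: int) -> None:
        size = len(first_items)
        self.adq: Deque[float] = deque(first_items, maxlen=size)  # actual data queue
        self.pvq: Deque[float] = deque(first_items, maxlen=size)  # PVQ_sensor
        self.predict, self.eps, self.theta = predict, eps, theta
        self.max_run = max_run  # assumed cap on consecutive suppressed readings
        self.run = 0

    def step(self, x: float) -> Optional[float]:
        """Process one sensed value; return it if it must be sent, else None."""
        x_adq = self.predict(self.adq)  # x'  (predicted from actual data)
        x_pvq = self.predict(self.pvq)  # x'' (predicted from the shared series)
        ok = (abs(x_pvq - x) < self.eps            # prediction error in threshold
              and abs(x_pvq - x_adq) < self.theta  # cumulative error in threshold
              and self.run < self.max_run)
        self.adq.append(x)          # ADQ always receives the actual value
        if ok:
            self.run += 1
            self.pvq.append(x_pvq)  # rule (3b): keep the predicted value
            return None
        self.run = 0
        self.pvq.append(x)          # rule (4.2b): keep the actual value
        return x


class SinkNode:
    def __init__(self, first_items: Sequence[float], predict: Predictor) -> None:
        self.pvq: Deque[float] = deque(first_items, maxlen=len(first_items))
        self.predict = predict

    def step(self, received: Optional[float]) -> float:
        """Use the received value, or the local prediction when nothing arrived."""
        value = self.predict(self.pvq) if received is None else received
        self.pvq.append(value)
        return value


def simulate(readings: Sequence[float], queue_len: int, eps: float,
             theta: float, max_run: int) -> Tuple[List[float], int, bool]:
    """Run both ends; return (values seen by sink, messages sent, queues in sync)."""
    sensor = SensorNode(readings[:queue_len], mean_predictor, eps, theta, max_run)
    sink = SinkNode(readings[:queue_len], mean_predictor)
    recon: List[float] = []
    sent, in_sync = 0, True
    for x in readings[queue_len:]:
        msg = sensor.step(x)
        if msg is not None:
            sent += 1
        recon.append(sink.step(msg))
        in_sync = in_sync and list(sensor.pvq) == list(sink.pvq)
    return recon, sent, in_sync
```

Because both ends apply the same predictor to identical queues, every value the sink reconstructs differs from the true reading by less than `eps`, and the two predicted value queues never diverge. Any predictor with the same signature could be swapped in for `mean_predictor`.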
4. Grey model based data aggregation (GMDA)

A system is called a white system if all information about it is known, and a black system if no information about it is known. A grey system lies between the white system and the black system: only poor, incomplete, or uncertain data is provided [1]. The grey model provides a powerful tool for modeling discrete series with a few data items and for forecasting based on determination of an exponential pattern. A sensor node can be treated as an uncertain grey system in the data aggregation process, since only a small sample and poor information is stored and provided. In this paper, the single-variable first-order grey model GM(1, 1) [1] is used to capture the long-term trend of the sensed data sequence by exploring and extracting valuable information from recently sensed data.

Before predicting, a few historical sensed data should be stored in the sensor node to construct the initial data sequence for the GM(1, 1) model, denoted as $Y^{(0)}$.

$$Y^{(0)} = \left(y^{(0)}(1), y^{(0)}(2), \ldots, y^{(0)}(t)\right). \tag{1}$$

In Eq. (1), $y^{(0)}(j)$, $j = 1, 2, \ldots, t$, represents a data element, and $t$ denotes the number of elements in the sequence. $t$ is an invariant, which represents the length of the data sequence. The GM(1, 1) model uses the data of the most recent $t$ periods. To eliminate the influence of oscillation in the initial data sequence, the natural logarithm and the exponential function are used to get the adjusted sequence for the GM(1, 1) model, as described in Eq. (2).

$$\left(\ln Y^{(0)}\right)^{1/M} = \left\{\left(\ln y^{(0)}(1)\right)^{1/M}, \left(\ln y^{(0)}(2)\right)^{1/M}, \ldots, \left(\ln y^{(0)}(t)\right)^{1/M}\right\}. \tag{2}$$

In Eq. (2), $M$ is an integer invariant. In general, $1 < M < 10$. Let $x^{(0)}(j) = \left(\ln y^{(0)}(j)\right)^{1/M}$ and let $X^{(0)}$ denote the prediction data sequence, as described in Eq. (3).

$$X^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(t)\right). \tag{3}$$

Let $X^{(1)}$ be the 1-AGO (accumulated generating operator) sequence of $X^{(0)}$, that is, $x^{(1)}(k) = \sum_{j=1}^{k} x^{(0)}(j)$, as described in Eq. (4).

$$X^{(1)} = \left(x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(t)\right). \tag{4}$$

Therefore, the GM(1, 1) model can be established as Eq. (5) (a differential equation).

$$\frac{dx^{(1)}}{dt} + a x^{(1)} = b. \tag{5}$$

Let

$$A = \begin{pmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(t) \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(t) & 1 \end{pmatrix},$$

where $z^{(1)}(k) = \frac{1}{2}\left(x^{(1)}(k) + x^{(1)}(k-1)\right)$ for $k = 2, \ldots, t$. Therefore, $(\hat{a}, \hat{b})^{T} = (B^{T}B)^{-1}B^{T}A$. Using the Least Squares Method, the values of the parameters $a$ and $b$ can be obtained. Therefore, $\hat{x}^{(1)}(k+1)$ can be obtained by using Eq. (6).

$$\hat{x}^{(1)}(k+1) = e^{-\hat{a}k}\left(x^{(1)}(1) - \frac{\hat{b}}{\hat{a}}\right) + \frac{\hat{b}}{\hat{a}}. \tag{6}$$

$$\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k). \tag{7}$$

Finally, the final predicted data $\hat{y}^{(0)}(t+1)$ can be obtained by Eq. (8).

$$\hat{y}^{(0)}(t+1) = e^{\left(\hat{x}^{(0)}(t+1)\right)^{M}}. \tag{8}$$

Let $\Delta(t+1) = \left|\hat{y}^{(0)}(t+1) - y^{(0)}(t+1)\right|$ and $\varepsilon$ represent the prediction error and the threshold of the prediction error, respectively. For simplicity, cumulative error is not taken into consideration here. After obtaining the predicted data and the prediction error, the sensor node compares the error with $\varepsilon$. If $\Delta(t+1) < \varepsilon$, the sensor node does not need to transmit $y^{(0)}(t+1)$ to the sink node. Otherwise, it must send $y^{(0)}(t+1)$ to the sink node. At the other end, the sink node runs the same prediction program with the same prediction data sequence. Therefore, it obtains the same predicted data as the sensor node. However, the sink node cannot compute the prediction error since it does not have $y^{(0)}(t+1)$. If there is no data coming from the sensor node within a fixed time $T_0$, the sink node assumes $\Delta(t+1) < \varepsilon$ and considers $\hat{y}^{(0)}(t+1)$ as $y^{(0)}(t+1)$ in the current sensing period. $T_0$ should be longer than the maximum transmission latency, but shorter than the length of a sensing period. It is important to synchronize the prediction data sequences, $PVQ_{sensor}$ and $PVQ_{sink}$. When $\Delta(t+1) < \varepsilon$, the sensor node must use the predicted data $\hat{y}^{(0)}(t+1)$ as the data of the $(t+1)$th period in its next prediction sequence, because the sink node does not have $y^{(0)}(t+1)$.

5. Kalman-Filter-based Data Aggregation (KFDA)

The Kalman Filter [13] is an efficient recursive filter that estimates the state of a linear dynamic system from a series of noisy measurements. It presents high prediction accuracy based on a small quantity of information. It has been used to design adaptive routing mechanisms in mobile wireless sensor networks [14,15]. Olfati-Saber [17] proposes a peer-to-peer continuous-time distributed Kalman Filter that uses local aggregation of the sensor data but attempts to reach a consensus on estimates with other nodes in the network. Yu et al. [18] also design a distributed consensus filter, in which each sensor can communicate with the neighboring sensors, and filtering can be distributed among nodes. By using a pinning control scheme, only a small fraction of sensors need to measure the target information. In this paper, the Kalman Filter is used to estimate the data sequence for each sensor node rather than to choose sensor nodes.

5.1. Kalman-Filter-based prediction model

In a sensor node, continuous data forms a discrete-time data sequence, which can be modeled by the following Linear Stochastic Difference equation:

$$X(k) = A(k)X(k-1) + B(k)U(k) + W(k). \tag{9}$$

$X(k)$ represents the predicted data at the period $k$. $A(k)$ represents the state transition model which is applied to the data of the previous period $(k-1)$. $B(k)$ represents the control-input model applied to the control vector $U(k)$. $W(k)$ represents the noise of the prediction period, which is assumed to follow a zero-mean multivariate normal distribution with the covariance $Q(k)$. Let $Z(k)$ denote the actual sensed data sequence at the period $k$:

$$Z(k) = H(k)X(k) + V(k). \tag{10}$$

$H(k)$ is the observation model which maps the predicted data space into the actual sensed data sequence. $V(k)$ is the noise, which is assumed to be zero-mean Gaussian white noise with covariance $R(k)$.
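As an illustration of how the state and observation models above drive a one-step-ahead predictor, the following Python sketch applies the textbook Kalman predict/update recursion to a scalar series. It is a sketch under stated assumptions rather than the KFDA procedure itself: it fixes $A(k) = 1$, $B(k) = 0$ and $H(k) = 1$, and the function name `kalman_track` and the default values of `q`, `r` and `p0` are hypothetical choices made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def kalman_track(z: Sequence[float], q: float = 1e-3, r: float = 0.25,
                 x0: Optional[float] = None,
                 p0: float = 1.0) -> Tuple[List[float], float, float]:
    """Scalar Kalman filter with A = 1, B = 0, H = 1 (assumed for illustration).

    Returns the one-step-ahead predictions made before each measurement,
    the final state estimate, and the final error covariance.
    """
    x = z[0] if x0 is None else x0  # initial state estimate
    p = p0                          # initial error covariance
    preds: List[float] = []
    for zk in z:
        # Predict: with A = 1 and B = 0 the state carries over unchanged,
        # while the uncertainty grows by the process noise covariance q.
        x_prior = x
        p_prior = p + q
        preds.append(x_prior)
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p_prior / (p_prior + r)
        x = x_prior + gain * (zk - x_prior)
        p = (1.0 - gain) * p_prior
    return preds, x, p
```

In a prediction-based scheme, each entry of `preds` would play the role of the predicted value that is compared against the sensed value and the error threshold; a larger `r` makes the filter trust its own prediction more and smooths measurement noise more strongly.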
sensed data, the data predicted by using the prediction value queue in the sensor node ($PVQ_{sensor}$), and the data predicted by using the actual data queue (ADQ), respectively. In the $(t+1)$th period, $\Delta(t+1) = |\hat{y}(t+1) - y(t+1)|$ is the prediction error and $\Delta'(t+1) = |\hat{y}'(t+1) - \hat{y}(t+1)|$ is the cumulative error. The sensor node checks these errors against the pre-configured global thresholds. If $\Delta(t+1) < \varepsilon$ and $\Delta'(t+1) < \theta$, the sensor node does not send the actual $y(t+1)$ to the sink node. The sink node considers $\hat{y}(t+1)$ as $y(t+1)$. The sensor node must set $y(t+1) = \hat{y}(t+1)$ to keep the prediction data sequence $PVQ_{sensor}$ synchronized with the sink node's $PVQ_{sink}$ for future predictions. Otherwise, the sensor node sends $y(t+1)$ to the sink node. The CoGKDA algorithm in the sensor node is described in Table 1.

According to the protocol described in Section 3, when continuous and successful predictions are made in a sensor node, the sink node will not receive any data from the sensor node for a long time. In this case, the sensor node is very similar to a dysfunctional (or failed) sensor node. In addition, as the number of continuous and successful predictions increases, the cumulative error will increase correspondingly. To distinguish the two different cases and avoid excessive cumulative error, we use another threshold $v$ for the number of continuous and successful predictions. When the number of continuous and successful predictions reaches $v$, the sensor node must send the actual data to the sink node. Therefore, when the sink node receives actual data from a sensor node within $v$ periods, it knows the sensor node is functioning. Otherwise, the sensor node will be temporarily treated as a dysfunctional node since it has not sent back data in the pre-configured time. In CoGKDA, $v$ is a pre-configured parameter, which should be determined by the trade-off between reducing cumulative error and increasing communication overhead. It is reasonable to deduce that as $v$ decreases, CoGKDA decreases cumulative error but increases communication overhead.

Table 1
The CoGKDA algorithm.

Input:
  $Y(i)$: current prediction data sequence, $Y(i) = (\hat{y}_{i-t+1}, \hat{y}_{i-t+2}, \ldots, \hat{y}_{i})$, $i \geq t$;
  $W$: static variable, the current optimal weight vector;
  $y_{i+1}$: the sensed data of the $(i+1)$th period;
  $r$: static variable, the number of the continuous and successful predictions;
  $u$: static variable, the number of periods for re-computing weight vectors;
  $v$: static variable, a threshold for variable $r$. If $r \geq v$, the sensor must send the currently sensed data to the sink node;
  $s$: static variable, the age of the current weight vector;
  $\varepsilon$: static variable, the threshold of prediction error;
  $\theta$: static variable, the threshold of cumulative error;
Output:
  $Y(i+1)$: next prediction data sequence;

CoGKDA ($Y(i)$, $W$, $y_{i+1}$, $r$, $u$, $v$, $s$, $\varepsilon$, $\theta$)
{
  Perform GMDA prediction and obtain the predicted data $\hat{y}_g$ and its error $e_g$;
  Perform KFDA prediction and obtain the predicted data $\hat{y}_k$ and its error $e_k$;
  if $s < u - 1$ {
    $s = s + 1$;
    Perform the combination and obtain the predicted data $\hat{y}_c$, prediction error $pe_c$, and cumulative error $ce_c$;
  }
  else {
    Compute new optimal weight vector $W_{new}$;
    Send the new weight vector to the sink node;
    $W = W_{new}$;
    $s = 0$;
  }
  if $pe_c < \varepsilon$ and $ce_c < \theta$ and $r < v - 1$ {
    $r = r + 1$;
    $\hat{y}_{i+1} = \hat{y}_c$; // Synchronize the next period prediction data sequence with the sink node.
    $Y(i+1) = (\hat{y}_{i-t+2}, \hat{y}_{i-t+3}, \ldots, \hat{y}_{i+1})$; // Refresh the prediction data sequence for future predictions.
  }
  else {
    Send $y_{i+1}$ to the sink node;
    $r = 0$;
    $Y(i+1) = (\hat{y}_{i-t+2}, \hat{y}_{i-t+3}, \ldots, \hat{y}_{i}, y_{i+1})$;
  }
  return $Y(i+1)$;
}

7. Experiment and performance evaluation

7.1. Experiment setup

In this paper, experiments are based on an environmental monitoring system in a granary. Since grain is liable to mildew when the humidity and temperature in the storehouse are too high, it is very important to monitor humidity and temperature in real time. The data used in our experiments are derived from a real deployed sensor network. The sensor network is used to collect the temperature and humidity of the grain in a large granary, which consists of 30 storehouses. Each storehouse is a detached building, which is divided into 24 volumes. The grain is stored in the volumes. In each volume, there are four sensor nodes buried in the grain. The sensor field of a volume is divided into four zones: top, middle-top, middle-bottom, and bottom. All sensor nodes in a storehouse form a tree-structured network with three layers: sensor layer, intermediate layer and sink layer. Nodes in the intermediate layer and the sink layer are externally powered, while nodes in the sensor layer are powered only by battery. Each intermediate node receives data from four sensor nodes in one volume and sends them to the sink node (intermediate nodes just relay data between sensor and sink nodes). One sink node is deployed in one storehouse to collect data from intermediate nodes. Sensor nodes sense and return temperature and humidity data every thirty minutes. This system has worked for three years and has collected a large volume of data.

To evaluate the proposed data aggregation approaches, these approaches are implemented in our test bed system, in which all sensor nodes are designed based on TinyOS 2 and the IEEE 802.15.4 protocol. All experiments are carried out based on a real data set. Since these prediction-based data aggregation approaches are structure-free and topology-free, a sensor node was randomly chosen for the experiments and only its temperature data is used. A data sequence (denoted as $D$) that includes 720 continuous data items (data for half a month) was randomly extracted from the original temperature data stream for the following experiments.

In the proposed approaches, $\varepsilon$ and $\theta$ represent users' requirements on data accuracy. They are application-specific parameters. In the temperature monitoring of a granary, the tolerated error of the predicted data is relatively small, since the grain is sensitive to temperature changes. Therefore, in our experiments, we let $\varepsilon = 0.5$ and $\varepsilon = 1$ represent users' high and low data accuracy requirements, respectively. For simplicity, we let $\varepsilon = \theta$.

7.2. GMDA

Compared to other prediction-based approaches, GMDA is very lightweight. The predicted data sequence of GMDA can be represented as $Y^{(0)} = (y^{(0)}(1), y^{(0)}(2), \ldots, y^{(0)}(t))$, where parameter $t$ denotes the length of the sequence used for predictions. In general, longer sequences lead to more accurate predictions. However, greater length consumes more sensor storage and leads to higher computational complexity. To choose a suitable $t$ value, we evaluate the growth rate of prediction accuracy while $t$ changes from 3 to 9. The results are shown in Fig. 1. First, we randomly extracted three sub-sequences from the original data set. Each sub-sequence
Fig. 1. Evaluation of parameter $t$ in GMDA. (x-axis: length of data queue used for predicting, $t$.)

Fig. 3. Cumulative distribution functions of the prediction errors of GMDA, KFDA and CoGKDA when $\varepsilon = 1$. (x-axis: prediction error; y-axis: CDF; series: GMDA, KFDA, CoGKDA.)
KFDA. The CDF value of the prediction errors produced by CoGKDA is higher than those of KFDA and GMDA for both $\varepsilon = 0.5$ and
Fig. 5. Communication overhead as $v$ changes from 2 to 16. (x-axis: $v$; axis labels: overhead in communication energy consumption (%), the number of successful predictions; series: $\varepsilon = 0.5$, $\varepsilon = 1$.)

Fig. 6. The comparison of the success rates as $\varepsilon$ changes from 0.2 to 1. (x-axis: prediction threshold; series: AR(3), CoGKDA, SAF, PAQ.)
Fig. 7. The comparison of the mean square errors of all successful predictions as e changes from 0.2 to 1. (Plot omitted; x-axis: prediction threshold; curves: CoGKDA, PAQ, SAF, AR(3).)

7.5. Complexity and scalability

The main computation in GMDA derives from the GM(1, 1) algorithm. According to Eqs. (1)-(8), it can be deduced that the computational complexity of GMDA is $O(t)$, where t is the number of data items for the prediction algorithm. For most applications, the computational complexity of GMDA is very low when t = 5.

In the KFDA approach, as described in [16], the order of the Kalman Filter is $O(2m^2 n) + O(2mn^2) + O(m^3) + O(n^3)$. In this paper, m is the number of dimensions of the data prediction sequence at the prediction phase, and n is the number of predicted data items for one recursion. If n = am, where a > 1, then the number of computations can be transformed to $O[(1 + 2a + 2a^2 + a^3)m^3]$. As described in Eqs. (9)-(15), the computational complexity becomes $O(3m^3)$ when using the equivalent system with A(k) = 1, B(k) = Q(k) = 0, R(k) = H(k) = I and P(0|0) = 1. Therefore, the computational complexity of KFDA is $O\left(\frac{3}{1 + 2a + 2a^2 + a^3}\right)$ times that of the normal KF algorithm. For example, with a = 2 the ratio is 3/(1 + 4 + 8 + 8) = 3/21, so the computational complexity of KFDA is only about 0.14 times that of the normal KF algorithm.

By analyzing the CoGKDA algorithm in Table 1, the computation of the CoGKDA approach mainly consists of three parts: GMDA, KFDA and computing the weight vector W. As described in Eqs. (16)-(20), the complexity for computing W is $O(mn^2)$, in which m is the number of meta prediction models and n is the number of data items in the prediction data sequence. In CoGKDA, the computational complexity for computing W is $O(2n^2)$. To unify measurements, we let m represent the number of data items in the prediction data sequence. Therefore, adding the three parts, the computational complexity of CoGKDA is $O(m) + O(3m^3) + O(6m^2)$, and the order of CoGKDA's computational complexity is approximately $O(m^3)$.

It can be seen that CoGKDA is the most complex, KFDA is simpler, and GMDA is the simplest. Although the computational complexity of CoGKDA seems high, in practice it is acceptable for most applications when the length of the data queues (ADQ and PVQ) is small, e.g. m = 5.

In the proposed approaches, all computations for data aggregation are performed only in the sensor node and the sink node. Data processing in intermediate nodes is not needed. Therefore, the proposed approaches are independent of network scale. Their performance is determined only by users' requirements, which are jointly controlled by parameters e, h, t, u and v. Furthermore, the proposed approaches can be used in tree-structured, cluster-structured and peer-structured wireless sensor networks, since the data collection

Prediction-based data aggregation is a fundamental data-driven energy conservation approach. The prediction-based approach saves energy by reducing redundant data communications. Since the prediction-based approach is structure-free, it can be coupled with other routing-based or topology-based data aggregation approaches. By analyzing energy efficiency and data accuracy, a novel prediction-based data collection protocol is proposed to specify the cooperation between the sensor node and the sink node. In the proposed protocol, a double-queue mechanism is designed to synchronize predicted data at both the sensor node and the sink node to avoid cumulative error in continuous predictions. Based on this novel data collection protocol, three prediction-based data aggregation approaches are proposed: GMDA, KFDA and CoGKDA. Experiments have been carried out based on a real data set collected from a temperature and humidity monitoring application in a granary. The results demonstrate that the proposed approaches can reduce energy consumption caused by redundant communications with minimally increased overhead. Experiments also show that CoGKDA achieves better performance than traditional prediction-based approaches (including SAF, PAQ and AR).

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grant No. 60803161.

References

[1] J.L. Deng, Introduction to Grey system theory, Journal of Grey System 1 (1) (1989) 1-24.
[2] R. Rajagopalan, P.K. Varshney, Data aggregation techniques in sensor networks: a survey, IEEE Communications Surveys Tutorials 8 (4) (2006) 48-63.
[3] S. Ozdemir, Y. Xiao, Secure data aggregation in wireless sensor networks: a comprehensive overview, Computer Networks 53 (2009) 2022-2037.
[4] G. Anastasi, M. Conti, M.D. Francesco, A. Passarella, Energy conservation in wireless sensor networks: a survey, Ad Hoc Networks 7 (2009) 537-568.
[5] A. Deshpande, C. Guestrin, S. Madden, J.M. Hellerstein, W. Hong, Model-driven data acquisition in sensor networks, in: Proceedings of the 30th International Conference on Very Large Data Bases, 2004, pp. 588-599.
[6] D. Chu, A. Deshpande, J.M. Hellerstein, W. Hong, Approximate data collection in sensor networks using probabilistic models, in: Proceedings of the 22nd International Conference on Data Engineering, 2006, pp. 48-59.
[7] B. Kanagal, A. Deshpande, Online filtering, smoothing and probabilistic modeling of streaming data, in: Proceedings of the 24th International Conference on Data Engineering, 2008, pp. 1160-1169.
[8] D. Tulone, S. Madden, PAQ: time series forecasting for approximate query answering in sensor networks, in: Proceedings of the Third European Conference on Wireless Sensor Networks, 2006, pp. 21-37.
[9] D. Tulone, S. Madden, An energy-efficient querying framework in sensor networks for detecting node similarities, in: Proceedings of the Ninth International ACM Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems, 2006, pp. 291-300.
[10] Y. Le-Borgne, S. Santini, G. Bontempi, Adaptive model selection for time series prediction in wireless sensor networks, Signal Processing 87 (12) (2007) 3010-3020.
[11] S. Goel, A. Passarella, T. Imielinski, Using buddies to live longer in a boring world, in: Proceedings of 2006 IEEE International Workshop on Sensor Networks and Systems for Pervasive Computing, 2006, pp. 342-346.
[12] Q. Han, S. Mehrotra, N. Venkatasubramanian, Energy efficient data collection in distributed sensor environments, in: Proceedings of the 24th IEEE International Conference on Distributed Computing Systems, 2004, pp. 590-597.
[13] R.E. Kalman, A new approach to linear filtering and prediction problems, Transactions of the ASME Journal of Basic Engineering 82 (Series D) (1960) 35-45.
[14] B. Pasztor, M. Musolesi, C. Mascolo, Opportunistic mobile sensor data collection with SCAR, in: Proceedings of MASS 2007, 2007, pp. 1-12.
[15] M. Musolesi, S. Hailes, C. Mascolo, Adaptive routing for intermittently connected mobile ad hoc networks, in: Proceedings of 2005 IEEE WoWMoM, 2005, pp. 183-189.
[16] M.J. Goris, D.A. Gray, I.M.Y. Mareels, Reducing the computational load of a Kalman filter, Electronics Letters 33 (18) (1997) 1539-1541.
[17] R. Olfati-Saber, Distributed Kalman filtering for sensor networks, in: The 46th IEEE Conference on Decision and Control, 2007, pp. 5492-5498.
[18] W. Yu, G. Chen, Z. Wang, W. Yang, Distributed consensus filtering in sensor networks, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 39 (6) (2009) 1568-1577.
[19] R. Kay, F. Mattern, The design space of wireless sensor networks, IEEE Wireless Communications 11 (6) (2004) 54-61.
[20] Y. Yao, B.B. Giannakis, Energy-efficient scheduling for wireless sensor networks, IEEE Transactions on Communications 53 (8) (2005) 54-61.
[21] Q. Xin, L. Gasieniec, C. Su, P. Wong, Routing via single-source and multiple-source queries in static sensor networks, in: Proceedings of IPDPS 2005, 2005, pp. 183-189.
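
Supplementary note on Section 7.5. As a rough numerical check of the complexity expressions in Section 7.5, the following Python sketch evaluates them for small queue lengths. It is an illustration only, not part of the original paper: the function names (kf_ops, kfda_ops, kfda_ratio, cogkda_ops) are hypothetical, and the big-O terms are treated as literal operation counts, which is an assumption made purely to reproduce the arithmetic (for instance the 0.14 ratio at a = 2).

```python
# Hypothetical helpers: they read the big-O terms of Section 7.5 as
# literal operation counts (an assumption made only for illustration).

def kf_ops(m: int, n: int) -> int:
    """Leading-order cost of the normal Kalman Filter:
    2*m^2*n + 2*m*n^2 + m^3 + n^3."""
    return 2 * m**2 * n + 2 * m * n**2 + m**3 + n**3


def kfda_ops(m: int) -> int:
    """Cost of the simplified KFDA system quoted in the text: 3*m^3."""
    return 3 * m**3


def kfda_ratio(a: float) -> float:
    """Ratio of KFDA cost to normal KF cost when n = a*m:
    3 / (1 + 2a + 2a^2 + a^3)."""
    return 3.0 / (1 + 2 * a + 2 * a**2 + a**3)


def cogkda_ops(m: int) -> int:
    """Sum of the three CoGKDA parts quoted in the text:
    m (GMDA) + 3*m^3 (KFDA) + 6*m^2 (weight vector W)."""
    return m + 3 * m**3 + 6 * m**2


if __name__ == "__main__":
    m, a = 5, 2
    n = a * m
    print("normal KF ops:", kf_ops(m, n))
    print("KFDA ops:     ", kfda_ops(m))
    print("ratio (a=2):  ", round(kfda_ratio(a), 2))
    print("CoGKDA ops:   ", cogkda_ops(m))
```

With m = 5 and a = 2, the direct quotient kfda_ops(m) / kf_ops(m, a*m) agrees with the closed-form ratio, since both reduce to 3/21.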