

A New Framework for Smartphone Sensor-based Human Activity Recognition using Graph Neural Network

Riktim Mondal, Debadyuti Mukherjee, Pawan Kumar Singh, Member, IEEE, Vikrant Bhateja, Senior Member, IEEE, and Ram Sarkar, Senior Member, IEEE

Riktim Mondal, Debadyuti Mukherjee and Ram Sarkar are with the Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, India (e-mail: riktim97@gmail.com, debadyuti23@gmail.com, rsarkar@ieee.org).
Pawan Kumar Singh is with the Department of Information Technology, Jadavpur University, Kolkata-700106, India (e-mail: pawankrsingh@ieee.org).
Vikrant Bhateja is with the Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges (SRMGPC), Lucknow-226028, Uttar Pradesh, India (e-mail: bhateja.vikrant@ieee.org), and Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India.

Abstract— Automatic human activity recognition (HAR) through computing devices is a challenging research topic in the domain of computer vision. It has widespread applications in various fields such as sports, healthcare, criminal investigation and so on. With the advent of smart devices like smartphones, built-in inertial sensors like the accelerometer and gyroscope can easily be used to track our daily physical movements. State-of-the-art deep neural network models like the Convolutional Neural Network (CNN) need no additional feature extraction for such applications. However, they require a huge amount of training data, which is time consuming to collect and demands ample resources. Another limiting factor of the CNN is that it considers only the features of an individual sample for learning, without considering any structural information among the samples. To address these issues, we propose an end-to-end fast Graph Neural Network (GNN) which captures not only the individual sample information efficiently but also its relationships with other samples in the form of an undirected graph structure. To the best of our knowledge, this is the first work where time series data are transformed into a structural graph representation for the purpose of HAR using sensor data. The proposed model has been evaluated on 6 publicly available datasets, and it achieves nearly 100% recognition accuracy on all of them. The source code of this work is available at https://github.com/riktimmondal/HAR-Sensor.

Index Terms— Human Activity Recognition, Graph Neural Network (GNN), Message Passing, Smartphone sensors, Deep learning.

I. INTRODUCTION

IN recent years, with the availability of smart devices like smartphones, our society has witnessed a tremendous change in the field of multimedia communication. Not only this, these devices are also used to monitor our daily activities. Sensors like accelerometers, gyroscopes, proximity sensors and magnetometers, which come built into smart devices, can help to measure our physical activities. This has led to an interesting domain of research called Human Activity Recognition (HAR), which finds applications in healthcare [1], sports activities [2], smart home systems [3], and criminal surveillance [4].

HAR can be defined as the task of identifying a physical activity performed by an individual, given a trace of movement within a specific domain. Daily activities, for example Walking, Sitting, Standing, and Climbing stairs, are considered as conventional physical events and build our class of actions. To record these events, or changes in events, sensors such as tri-axial accelerometers and gyroscopes capture information while the movement is being performed. Different physiological signals (for example, heartbeat and breathing) and environmental signals (for example, temperature, time and pressure) are also recorded, which help to develop a better HAR system. An important application of HAR is in the healthcare domain, particularly in supporting rehabilitation, physiotherapy, elderly care management and the monitoring of cognitive impairment.

Traditional HAR strategies have achieved significant improvement in extracting noteworthy information from scores of low-level sensor readings. However, such models are fruitful only for data gathered in controlled environments and for a limited number of activities. Recognition of complex human activities is restricted due to naive feature extraction procedures and limited domain awareness. Simple feature extraction procedures lower the performance of the learning algorithms and do not work well for some closely related human activities, such as Walking and Jogging. On the other hand, deep learning models have the ability to automatically learn features from the raw data with higher precision [5].

In the earlier days, traditional machine learning models like the Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) were used for HAR, where additional time and frequency domain features needed to be extracted from the raw sensor data. Micucci et al. [6] used SVM, and obtained an accuracy of 98.71% on the UniMiB-SHAR dataset. KNN was used in the work reported in [7] and Random Forest (RF) was


used in [8]. Both works considered the WISDM dataset [9] for experimentation, and achieved accuracies of 96.2% and 98.09% respectively. Dadkhahi and Marlin [10] introduced a tree-structured cascade classifier and achieved 97% accuracy on the PAMAP2 dataset [11]. A Variational Deep Embedding (VaDE) clustering model was introduced in [12] to cluster different activities, obtaining an accuracy of 84.46% on the HHAR dataset [13]. For the USC-HAD dataset, a classical SVM model was used in [14] to produce an accuracy of 95%, where feature extraction was followed by SVM classification to predict the motion category of each subject.
Later, with the availability of computational resources, Deep Neural Network (DNN) based models like the Convolutional Neural Network (CNN) [15] and the Recurrent Neural Network (RNN) [16] became state-of-the-art models for various classification tasks including HAR. The authors of [17] used a CNN based approach to achieve an accuracy of 91.7% by extracting features from 3-D raw accelerometer data. A Deep RNN (DRNN) was introduced in [18] to study different types of sequential models on various datasets. The authors of [19] used a combination of a multi-channel Deep CNN (DCNN) and multilayer backpropagation on the PAMAP2 dataset to extract features, and obtained an accuracy of 90.8%. A new model based on hierarchical deep long short-term memory units, named H-LSTM, was introduced in [20] to perform HAR on the HHAR dataset [13], where 90% accuracy was achieved. Although satisfactory results are obtained using such networks, they need a huge amount of labeled data, which is time consuming and hard to annotate correctly. Besides, most of these models require heavy computational power and take a very long time to train. Another important factor which limits the success of these models is that, at the time of learning, they learn only from the input features of each sample independently, without considering any structural and relational information among the samples. This is one of the key reasons why conventional DNN based models perform poorly when human activities show close structural resemblance.

A. Motivation

The motivation of this work is to represent such unstructured time-series sensor data in the form of a graph (i.e., a structured way of representing the data), where each node corresponds to a single sample of activity, with undirected edges connecting other samples belonging to the same activity. This structural representation of nodes, with edges connecting other nodes, helps to capture information from both the individual samples and the neighboring samples. To the best of our knowledge, this is the first attempt where the concept of the Graph Neural Network (GNN) is applied to the task of HAR using sensor data. GNNs are Artificial Neural Network (ANN) or connectionist models which can capture the structural dependence in a graph through sub-graphs and nodes, via a message passing technique between the nodes of the graph. Although the GNN [21] is a very recent concept, researchers have started applying it in various domains like recommender systems, image classification and language modeling.

B. Contributions

In this paper, a new end-to-end trainable GNN based framework is proposed for the sensor-based HAR problem. The schematic diagram of the framework is shown in Fig. 1. Our work also gives a complete perspective on the HAR framework and describes different issues related to the GNN structure. It further endeavors to decrease the computational requirements and improve the performance by using graph based strategies. The data are classified at the graph level, i.e., each node of a graph belongs to the same activity performed by a particular user. The main contributions of this paper are listed below:

1) A generic dataset preparation framework is proposed to transform any sensor based HAR data into a graph structure for training with graph based models.
2) A very fast and effective GNN model built upon the GraphConv message passing technique [22] is developed, which learns from neighborhood features along with a node's own features, corresponding to both local and global feature learning for graph based classification.
3) Extensive experiments are performed on 6 standard datasets to produce state-of-the-art results with very fast training and evaluation on low-resource systems.

Fig. 1: Generic framework for transforming time-series based sensor data into a graph structure for HAR using a GNN based approach.

II. DATA PREPARATION

In this section, we describe the datasets considered here for experimentation. We have considered 6 publicly available HAR datasets, namely MobiAct [23], WISDM [9], MHEALTH [24], PAMAP2 [11], HHAR [13] and USC-HAD [25], for evaluating our proposed GNN based model. A tabular description of the HAR datasets on the basis of the number of activity classes, the range of the number of graphs obtained per activity, and the sensors used for data collection is provided in Table I.

The data pre-processing and preparation steps for the 6 HAR datasets are discussed below.

1) At first, we divide each dataset user-wise, and for each user we further divide the samples on the basis of the activities performed. Hereafter, such sub-divisions will be referred to as slices.


2) The features are normalized within each slice so that all values of a feature are centered: each feature value is reduced by the mean of all values of that feature in the slice, and then divided by their standard deviation.
3) Since each slice contains a very large number of samples, we further pre-process the samples into non-overlapping windows. Each window has a maximum size of 128 samples, i.e., we consider a window of 6.4 seconds duration for the WISDM and MobiAct datasets, 2.56 seconds for the MHEALTH dataset, and 1.28 seconds for the PAMAP2, HHAR and USC-HAD datasets (128 samples at the respective sampling rates of 20 Hz, 50 Hz and 100 Hz).
4) For each window, we take as the aggregate feature value the mean of all sample values present in the window for the corresponding feature.
5) Next, we consider each window as a node and create a graph by taking a maximum of 600 nodes per graph. Since each slice contains fewer than 600 nodes, there is one graph per slice. From the given nodes, the edges of the graph are obtained as follows.

To predict an activity, we need to know its nature, i.e., how the feature values change with time. Two consecutive windows can therefore be used as an estimate of two consecutive time periods. Following this logic, we connect every two consecutive nodes by an undirected edge. Therefore, for each graph, the nodes corresponding to the first and last windows have degree 1, and the rest of the nodes have degree 2. Some samples of the generated graphs of the MobiAct, WISDM, MHEALTH, PAMAP2, HHAR and USC-HAD datasets are shown in Fig. 2. The source code for the generation of the graph attributes, i.e., nodes, edges, node feature values, and node and graph labels, can be found at https://github.com/riktimmondal/HAR-Sensor; a code sketch of the windowing and graph-construction steps is given after Table I.

Fig. 2: One sample graph showing an activity class generated for each of the 6 HAR datasets.

TABLE I: DESCRIPTION OF THE HAR DATASETS USED IN THIS EXPERIMENT (Here A, G, M and E denote Accelerometer, Gyroscope, Magnetometer and ECG respectively, which are the sensors used to collect data)

Dataset | Number of activity classes | Activities present | Graph range per activity | Sensor(s) used
MobiAct | 13 | Standing, Walking, Jogging, Jumping, Stairs Up, Stairs Down, Sit Chair, Car Step In, Car Step Out, Forward Lying, Front Knees Lying, Sideward Lying, Back Sitting Chair | 49-200 | A
WISDM | 6 | Walking, Jogging, Upstairs, Downstairs, Sitting, Standing | 23-26 | A
MHEALTH | 12 | Standing still, Sitting and relaxing, Lying down, Walking, Climbing stairs, Waist bends forward, Frontal elevation of arms, Knees bending (Crouching), Cycling, Jogging, Running, Jump front and back | 10 | A, G, M, E
PAMAP2 | 12 | Lying, Sitting, Standing, Ironing, Vacuum cleaning, Descending stairs, Ascending stairs, Walking, Nordic walking, Cycling, Running and Rope jumping | 4-8 | A, G, M
HHAR | 6 | Biking, Sitting, Standing, Walking, Stair Up, Stair Down | 27-33 | A
USC-HAD | 12 | Walking Forward, Walking Left, Walking Right, Walking Upstairs, Walking Downstairs, Running Forward, Jumping Up, Sitting, Standing, Sleeping, Elevator Up, Elevator Down | 14 | A, G
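To make steps 1)-5) concrete, here is a minimal sketch of the window-and-graph construction for a single slice, assuming the slice is already loaded as a NumPy array of shape [num_samples, num_features]; the function and variable names are ours, while the 128-sample windows, the 600-node cap and the chain of undirected edges follow the description above.

```python
import numpy as np

WINDOW = 128      # max samples per window (Sec. II, step 3)
MAX_NODES = 600   # max windows (nodes) per graph (Sec. II, step 5)

def slice_to_graph(slice_data):
    """Convert one slice [num_samples, num_features] into graph attributes.

    Returns node features of shape [V, D] and an undirected edge list of
    shape [2, L] in which consecutive windows are chained together.
    """
    # Step 2: z-score normalization within the slice.
    mean = slice_data.mean(axis=0)
    std = slice_data.std(axis=0) + 1e-8       # guard against zero variance
    normalized = (slice_data - mean) / std

    # Steps 3-4: non-overlapping windows, each summarized by its mean.
    n_windows = min(len(normalized) // WINDOW, MAX_NODES)
    nodes = np.stack([
        normalized[i * WINDOW:(i + 1) * WINDOW].mean(axis=0)
        for i in range(n_windows)
    ])                                        # shape [V, D]

    # Step 5: chain edges between consecutive windows; both directions
    # are stored so the graph behaves as undirected.
    src = np.arange(n_windows - 1)
    dst = src + 1
    edge_index = np.concatenate(
        [np.stack([src, dst]), np.stack([dst, src])], axis=1)  # [2, L]
    return nodes, edge_index
```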


III. GRAPH NEURAL NETWORK (GNN)

A graph G is represented by nodes (V) and edges (E) as G = (V, E), where each node has a feature vector $X_v$ for v ∈ V and each edge may have a feature vector $X_e$ for e ∈ E. Here, E represents the connections between closest neighboring nodes. Classification using a GNN is performed at two levels. The first is node classification, where each node v ∈ V of G is labeled as $y_v$ and one needs to predict the label of each node v by finding an embedding vector $h_v$ such that $y_v = f(h_v)$, where f is a differentiable approximation function. f maps the similarity relations among the actual nodes of the real graph into a simple embedding space; in this embedding space, much information is stored in a compressed manner through $h_v$. The second is graph classification, where, besides the node information, the structure of the graph is also considered. Here, a set of graphs $g_1, g_2, \ldots, g_n \in G$ with their respective labels $y_1, y_2, \ldots, y_n \in Y$ is provided, and a representation vector $h_g$ needs to be found for every g, predicting each graph label through $y_g = f_1(h_g)$. Here $f_1$ is similar to f, but it encodes the entire graph representation instead of individual nodes. In some situations, the feature vectors at the edges, i.e., $X_e$, are also considered, where directed graphs with additional edge attributes are present. In our case, we have considered a graph with undirected edges containing only node feature attributes.

A GNN uses the node features $X_v$ and the graph structure to learn the embedding vector $h_v$ of a node and the embedding vector $h_g$ of the entire graph. Most GNNs work on the principle of the message passing technique to collect information from the neighbors of a node and pass it along. It is a neighborhood strategy where the representation of a node is updated iteratively using the aggregated representations of all of its neighboring nodes. So, after k iterations, a node representation $h_v^{(k)}$ contains the structural information of all the neighboring nodes which are k hops away. The k-th iteration also corresponds to the k-th layer, so at layer k > 0 the representation vector of a node is:

$h_v^{(k)} = \mathrm{NONLINEARITY}\left(\mathrm{COMBINE}\left(h_v^{(k-1)}, \mathrm{AGGREGATE}\left(\{h_u^{(k-1)} : u \in N(v)\}\right)\right)\right)$    (1)
Here, NON-LINEARITY is a nonlinear function like Sigmoid, ReLU or ELU, and N(v) represents all the neighboring nodes of v directly connected to it by an edge. We initialize $h_v^{(0)} = X_v$. The COMBINE and AGGREGATE functions can be "add", "mean", "max" or "concatenation", according to the type of GNN model.

In the present work, we have used GraphConv as the message passing layer, where the representation vector at layer k > 0 is given by

$h_v^{(k)} = \sigma\left(h_v^{(k-1)} W_1^{(k)} + \sum_{u \in N(v)} h_u^{(k-1)} W_2^{(k)}\right)$    (2)

where $W_1^{(k)}$ and $W_2^{(k)}$ are the weight matrices to be learned by the model, $\sigma$ is the component-wise NON-LINEARITY function, and $\sum$ is the AGGREGATE function.
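To make equation (2) concrete, the following is a minimal NumPy sketch of one GraphConv-style message passing step on a small chain graph (as produced in Section II), assuming sum aggregation and ReLU as the non-linearity; the weight matrices here are random stand-ins for the learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Chain graph of 4 nodes, 3 input features per node.
H = rng.normal(size=(4, 3))                    # h_v^(k-1) for all v
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

D_in, D_out = 3, 8
W1 = rng.normal(size=(D_in, D_out))            # weight for the node itself
W2 = rng.normal(size=(D_in, D_out))            # weight for aggregated neighbors

relu = lambda x: np.maximum(x, 0)

# Equation (2): h_v^(k) = sigma(h_v W1 + sum_{u in N(v)} h_u W2)
H_next = np.stack([
    relu(H[v] @ W1 + sum(H[u] for u in neighbors[v]) @ W2)
    for v in range(len(H))
])
print(H_next.shape)   # (4, 8): one updated embedding per node
```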
For node classification, this $h_v^{(K)}$ vector at the final K-th layer can be used directly for label prediction. For entire-graph classification, the embedding vectors $h_v^{(K)}$ of all nodes v present in the graph G are aggregated to obtain the graph-level prediction using a READOUT function:

$h_G = \mathrm{READOUT}\left(\{h_v^{(K)} : v \in G\}\right)$    (3)

Here, READOUT is the final aggregation function used to combine the embedding vectors of all nodes v ∈ G through operations like "add", "mean", etc. For our task, we have considered both the "mean" and the "max" functions as the aggregation functions.

A. Proposed GNN based HAR model

Our network consists of a block of a GraphConv layer with two inputs: (a) X, the feature matrix of the nodes of the graph, of shape [V, D], where V is the total number of nodes in the graph and D is the dimensionality of the input feature vectors; and (b) the adjacency matrix A of shape [V, V], or the edge list E of shape [2, L], consisting of all edges present in the entire graph in the form of pairs $(V_1, V_2)$, where $V_1$ and $V_2$ are two nodes connected by an edge and L is the total number of edges in the entire graph.

The output of the GraphConv layer is passed through a ReLU function to introduce non-linearity. It is followed by a TopKPooling layer, which takes X and E as inputs to reduce the number of graph nodes, converting the graph into a sub-graph: a coarse version of the entire graph that is still capable of representing a high-dimensional view of the same. This output ($O_1$) is passed on to another block of the same GraphConv-ReLU-TopKPooling layers, whose output ($O_2$) is added to the output $O_1$. The final $O_1 + O_2$ output is passed through a linear layer with ReLU activation and a dropout layer (dropout rate = 0.5). Then, a linear layer with dimension equal to the number of classes of the problem under consideration, with LogSoftmax as the activation function, is used to produce the final log-probability vector Z:

$\mathrm{LogSoftmax}(Z_i) = \log\left(\frac{e^{Z_i}}{\sum_{j=1}^{C} e^{Z_j}}\right)$    (4)

where $Z_i$ is the value of the i-th element of the last linear layer's output vector, and $\sum_{j=1}^{C} e^{Z_j}$ is the sum over all j elements (including i) of the Z vector for C classes. Our proposed GNN model is shown in Fig. 3. We have used the Negative Log Likelihood (NLL) function as our classification objective, which needs to be minimized and can be represented as follows:

$\mathrm{NLL}(Z) = -\sum_{i=1}^{C} y_i \cdot \mathrm{LogSoftmax}(Z_i)$    (5)

where $y_i$ represents the ground truth label indicator for the i-th class of the graph.


IV. EXPERIMENTATION

For each of the 6 HAR datasets mentioned earlier, we have split the dataset into train and test sets in the ratio of 70% to 30%; the number of graphs obtained for training and testing for each dataset is shown in Table II. We have performed our experiments 5 times, and each time the dataset is randomly shuffled and split into the said train (70%) and test (30%) sets (a brief sketch of this protocol is given after Table III). The model performance is then evaluated on the test set. We display the plots of training accuracy and training loss with respect to each epoch for a single experiment for all datasets in Fig. 4.

Fig. 3: Proposed GNN model applied for HAR using sensor data.

TABLE II: DISTRIBUTION OF TRAINING AND TESTING SETS AND TIME TAKEN BY OUR MODEL FOR TRAINING AND TESTING FOR EACH DATASET

Dataset | No. of graphs in train set | No. of graphs in test set | Time to train (sec) | Time to test (sec)
MobiAct | 712 | 306 | 5.3675 | 0.1599
WISDM | 125 | 54 | 1.5330 | 0.0483
MHEALTH | 84 | 36 | 0.7232 | 0.0257
PAMAP2 | 58 | 26 | 2.02928 | 0.0264
HHAR | 122 | 53 | 6.8431 | 0.0781
USC-HAD | 117 | 51 | 3.4871 | 0.0764

TABLE III: BEST ACCURACY ON TRAIN AND TEST SETS OBTAINED ON A SINGLE EXPERIMENT OUT OF 5 EXPERIMENTS FOR ALL 6 HAR DATASETS

Dataset | Best train accuracy of a single experiment | Best test accuracy (30% data) of a single experiment
MobiAct | 99.86% | 100%
WISDM | 100% | 100%
MHEALTH | 100% | 100%
PAMAP2 | 100% | 100%
HHAR | 100% | 100%
USC-HAD | 100% | 100%
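A small sketch of this evaluation protocol, assuming the graphs from Section II are held in a Python list; train_model and evaluate are hypothetical helpers (a training loop is sketched later in Section IV-B).

```python
import random

def run_experiments(graphs, n_runs=5, train_frac=0.7):
    """Shuffle the graph list and split 70/30, repeated n_runs times."""
    accuracies = []
    for seed in range(n_runs):
        random.Random(seed).shuffle(graphs)      # fresh shuffle per run
        n_train = int(train_frac * len(graphs))
        train_set, test_set = graphs[:n_train], graphs[n_train:]
        model = train_model(train_set)           # hypothetical helper
        accuracies.append(evaluate(model, test_set))
    return accuracies                            # mean/std as reported in Table IV
```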

Fig. 4: Plot showing the variation of training accuracy and training loss against epochs for the 6 HAR datasets used in our experiment. This plot is from one particular experiment of our setup, out of a total of 5 experiments with randomly shuffled datasets.

A. Setting of model parameters

We have trained the model with the mini-batch gradient descent algorithm [26] with a batch size of 16. The Adam optimizer [27] is used with an initial learning rate of 0.01 and a weight decay of 5e-4. All the experiments are performed on a machine with a Xeon octa-core processor, an 8 GB GPU and 32 GB of system memory.

B. Model Efficiency

The model is trained for a maximum of 30 epochs for each dataset. As we claim that the model is very fast to train and evaluate, Table II displays the time taken for training and testing for each dataset for a single iteration of the experiment. A sketch of the training loop with these settings follows.
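A minimal training loop with the stated settings (batch size 16, Adam with learning rate 0.01 and weight decay 5e-4, NLL loss, at most 30 epochs), assuming the HARGNN sketch from Section III-A and a list of torch_geometric.data.Data graphs with y labels; this is an illustrative reconstruction, not the released training script.

```python
import torch
import torch.nn.functional as F
from torch_geometric.loader import DataLoader

def train_model(train_graphs, num_features, num_classes, device="cpu"):
    model = HARGNN(num_features, num_classes).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
    loader = DataLoader(train_graphs, batch_size=16, shuffle=True)

    model.train()
    for epoch in range(30):                      # maximum 30 epochs (Sec. IV-B)
        total_loss = 0.0
        for batch in loader:
            batch = batch.to(device)
            optimizer.zero_grad()
            out = model(batch.x, batch.edge_index, batch.batch)
            loss = F.nll_loss(out, batch.y)      # NLL on LogSoftmax outputs, Eq. (5)
            loss.backward()
            optimizer.step()
            total_loss += float(loss)
        print(f"epoch {epoch:02d}  loss {total_loss / len(loader):.4f}")
    return model
```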


C. Results and Discussion

Here, we show that such a simple GNN based model is capable of classifying human activities with very high accuracy, even when considering every activity (from each subject) and every data sample. Most researchers have claimed to achieve high accuracy but considered fewer activity classes. The results of one particular experiment, out of the total of 5 experiments, in which we obtained the best results for the 70% train and 30% test split, are shown in Table III. We have also calculated the mean accuracy with standard deviation over all 5 experiments for both 70-30 and 50-50 train-test splits of the said datasets, as shown in Table IV.

TABLE IV: MEAN TEST ACCURACY CONSIDERING 5 EXPERIMENTS FOR BOTH 30% AND 50% TEST DATASETS

Dataset | Mean accuracy for (70-30) split | Mean accuracy for (50-50) split
MobiAct | 1.0 ± 0.0 | 0.9992 ± 0.0015
WISDM | 1.0 ± 0.0 | 1.0 ± 0.0
MHEALTH | 1.0 ± 0.0 | 0.9933 ± 0.0081
PAMAP2 | 0.9615 ± 0.0486 | 0.9523 ± 0.0451
HHAR | 0.9962 ± 0.0075 | 0.9931 ± 0.0090
USC-HAD | 1.0 ± 0.0 | 1.0 ± 0.0

To establish the robustness of our model across datasets, where the number of activities varies from a minimum of 6 to a maximum of 13, we show the effect of progressively adding training data to the model in order to obtain a better classification model (see Fig. 5). We considered a single experiment, out of the total of 5 experiments, in which we iterated the train-set split ratio from 10% to 90% (testing on the remaining 90% to 10% of the data accordingly) in steps of 10%, to show the improvement of our model with increasing training data. The model also performs well on imbalanced data, for which we considered two cases: (1) when the imbalance is in the number of graphs per class (obtained mean accuracy of 99.37%), and (2) when the imbalance is in the number of nodes in the graphs of a class (obtained mean accuracy of 99.09%).

Fig. 5: Progressive accuracies of train and test sets with increasing training data given to the GNN model for the 6 HAR datasets.

D. Comparison with existing HAR methods

We have also compared our method with some recent methods. From Table V, it is evident that our model performs better in comparison to existing methods which are considered as state-of-the-art for HAR on sensor data.

TABLE V: COMPARISON OF PROPOSED MODEL WITH EXISTING MODELS FOR 6 HAR DATASETS

Dataset | Author | Year | Methodology or Classifier Used | Accuracy (%)
MobiAct | Zheng et al. [28] | 2018 | TASG applied with SVM | 90.5
MobiAct | Xu et al. [29] | 2019 | CNN with LSTM model | 98.9
MobiAct | Zerkouk et al. [30] | 2020 | Autoencoder with CNN and LSTM | 98
MobiAct | Proposed | 2020 | GNN model | 100
WISDM | Quispe et al. [7] | 2018 | KNN | 96.2
WISDM | Zhang et al. [31] | 2019 | U-Net | 96.4
WISDM | Burns et al. [32] | 2020 | Deep triplet embedding | 91.3
WISDM | Proposed | 2020 | GNN model | 100
MHEALTH | Gumaei et al. [33] | 2019 | Hybrid deep learning model | 99.6
MHEALTH | Chen et al. [34] | 2019 | Recurrent convolutional & attention | 94.05
MHEALTH | Burns et al. [32] | 2020 | Deep triplet embedding | 99.9
MHEALTH | Proposed | 2020 | GNN model | 100
PAMAP2 | Dadkhahi et al. [10] | 2017 | Tree structured cascaded network | 97
PAMAP2 | Priyadharshini et al. [19] | 2019 | MCDCNN | 90.8
PAMAP2 | Zhang et al. [35] | 2019 | HMVAN | 94.4
PAMAP2 | Proposed | 2020 | GNN model | 100
HHAR | Qin et al. [36] | 2019 | SSUI | 96.41
HHAR | Gupta et al. [37] | 2020 | FECM | 89.9
HHAR | Qin et al. [38] | 2020 | GASF concatenation | 96.74
HHAR | Proposed | 2020 | GNN model | 100
USC-HAD | Mohammed et al. [39] | 2018 | LSTM | 96.31
USC-HAD | Tahir et al. [40] | 2020 | SMO | 73.33
USC-HAD | Singh et al. [41] | 2020 | Deep ConvLSTM | 94.06
USC-HAD | Proposed | 2020 | GNN model | 100

V. CONCLUSION

Sensor-based HAR has been an active research field over the last decade. There are certain perspectives of this HAR problem which can be used to improve the framework and alter the way people interact with their smartphones. To this end, we have designed a GNN based model with a message passing technique to use the structural information obtained from the time-series sensor data through a graph representation. We have set a new benchmark result on 6 standard, publicly available sensor-based HAR datasets. We have also contributed a generic framework to convert time-series sensor data into graph data that can be used for training any GNN model on HAR datasets. We have also verified the robustness of the model on imbalanced data. For real-life applications in the future, the framework can be used in an online mode, where new nodes and new graphs can be created from newly collected sensor data for real-time prediction. Also, by using a transfer learning approach, knowledge can be transferred from one data source to another. Since we have used a simple GNN model, which does not work for input graphs whose embeddings cannot be represented by an injective function, and since the depth and width of the graph may affect the results, in future we plan
width of the graph may affect in results, so in future, we plan


to use other graph models like spline convolution, attention based convolution, etc. Besides, as we have considered only node feature attributes here, in our future work we will work on edge feature attributes as well.

REFERENCES

[1] V. Osmani, S. Balasubramaniam, and D. Botvich, "Human activity recognition in pervasive health-care: Supporting efficient remote collaboration", Journal of Network and Computer Applications, vol. 31, no. 4, pp. 628–655, 2008.
[2] Y.-L. Hsu, S.-C. Yang, H.-C. Chang, and H.-C. Lai, "Human daily and sport activity recognition using a wearable inertial sensor network", IEEE Access, vol. 6, pp. 31715–31728, 2018.
[3] S. Ramasamy Ramamurthy and N. Roy, "Recent trends in machine learning for human activity recognition—a survey", Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4, e1254, 2018.
[4] M. Ehatisham-ul-Haq, M. A. Azam, J. Loo, K. Shuang, S. Islam, U. Naeem, and Y. Amin, "Authentication of smartphone users based on activity recognition and mobile sensing", Sensors, vol. 17, no. 9, p. 2043, 2017.
[5] L. Shi, Y. Zhang, J. Cheng, and H. Lu, "Skeleton-based action recognition with multi-stream adaptive graph convolutional networks", arXiv preprint arXiv:1912.06971, 2019.
[6] D. Micucci, M. Mobilio, and P. Napoletano, "UniMiB SHAR: A dataset for human activity recognition using acceleration data from smartphones", Applied Sciences, vol. 7, no. 10, p. 1101, 2017.
[7] K. G. Montero Quispe, W. Sousa Lima, D. Macêdo Batista, and E. Souto, "MBOSS: A symbolic representation of human activity recognition using mobile sensors", Sensors, vol. 18, no. 12, p. 4354, 2018.
[8] K. Walse, R. Dharaskar, and V. Thakare, "Performance evaluation of classifiers on WISDM dataset for human activity recognition", in Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, 2016, pp. 1–7.
[9] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers", ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.
[10] H. Dadkhahi and B. M. Marlin, "Learning tree-structured detection cascades for heterogeneous networks of embedded devices", in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 1773–1781.
[11] A. Reiss and D. Stricker, "Introducing a new benchmarked dataset for activity monitoring", in 2012 16th International Symposium on Wearable Computers, IEEE, 2012, pp. 108–109.
[12] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou, "Variational deep embedding: An unsupervised and generative approach to clustering", arXiv preprint arXiv:1611.05148, 2016.
[13] A. Stisen, H. Blunck, S. Bhattacharya, T. S. Prentow, M. B. Kjærgaard, A. Dey, T. Sonne, and M. M. Jensen, "Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition", in Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, 2015, pp. 127–140.
[14] O. Politi, I. Mporas, and V. Megalooikonomou, "Human motion detection in daily activity tasks using wearable sensors", in 2014 22nd European Signal Processing Conference (EUSIPCO), IEEE, 2014, pp. 2315–2319.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[16] T. Mikolov, S. Kombrink, L. Burget, J. Černocký, and S. Khudanpur, "Extensions of recurrent neural network language model", in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2011, pp. 5528–5531.
[17] W. Xu, Y. Pang, Y. Yang, and Y. Liu, "Human activity recognition based on convolutional neural network", in 2018 24th International Conference on Pattern Recognition (ICPR), IEEE, 2018, pp. 165–170.
[18] A. Murad and J.-Y. Pyun, "Deep recurrent neural networks for human activity recognition", Sensors, vol. 17, no. 11, p. 2556, 2017.
[19] J. M. H. Priyadharshini, S. Kavitha, and B. Bharathi, "Comparative analysis of multilayer backpropagation and multichannel deep convolutional neural network for human activity recognition", in AIP Conference Proceedings, AIP Publishing LLC, vol. 2095, 2019, p. 030014.
[20] L. Wang and R. Liu, "Human activity recognition based on wearable sensor using hierarchical deep LSTM networks", Circuits, Systems, and Signal Processing, vol. 39, no. 2, pp. 837–856, 2020.
[21] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, "Graph neural networks: A review of methods and applications", arXiv preprint arXiv:1812.08434, 2018.
[22] C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe, "Weisfeiler and Leman go neural: Higher-order graph neural networks", in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 4602–4609.
[23] G. Vavoulas, C. Chatzaki, T. Malliotakis, M. Pediaditis, and M. Tsiknakis, "The MobiAct dataset: Recognition of activities of daily living using smartphones", in ICT4AgeingWell, 2016, pp. 143–151.
[24] O. Banos, R. Garcia, J. A. Holgado-Terriza, M. Damas, H. Pomares, I. Rojas, A. Saez, and C. Villalonga, "mHealthDroid: A novel framework for agile development of mobile health applications", in International Workshop on Ambient Assisted Living, Springer, 2014, pp. 91–98.
[25] M. Zhang and A. A. Sawchuk, "USC-HAD: A daily activity dataset for ubiquitous activity recognition using wearable sensors", in Proceedings of the 2012 ACM Conference on Ubiquitous Computing, 2012, pp. 1036–1043.
[26] S. Ruder, "An overview of gradient descent optimization algorithms", arXiv preprint arXiv:1609.04747, 2016.
[27] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization", arXiv preprint arXiv:1412.6980, 2014.
[28] Z. Zheng, J. Du, L. Sun, M. Huo, and Y. Chen, "TASG: An augmented classification method for impersonal HAR", Mobile Information Systems, vol. 2018, 2018.
[29] J. Xu, Z. He, and Y. Zhang, "CNN-LSTM combined network for IoT enabled fall detection applications", in Journal of Physics: Conference Series, IOP Publishing, vol. 1267, 2019, p. 012044.
[30] M. Zerkouk and B. Chikhaoui, "Spatio-temporal abnormal behavior prediction in elderly persons using deep learning models", Sensors, vol. 20, no. 8, p. 2359, 2020.
[31] Y. Zhang, Z. Zhang, Y. Zhang, J. Bao, Y. Zhang, and H. Deng, "Human activity recognition based on motion sensor using U-Net", IEEE Access, vol. 7, pp. 75213–75226, 2019.
[32] D. M. Burns and C. M. Whyne, "Personalized activity recognition with deep triplet embeddings", arXiv preprint arXiv:2001.05517, 2020.
[33] A. Gumaei, M. M. Hassan, A. Alelaiwi, and H. Alsalman, "A hybrid deep learning model for human activity recognition using multimodal body sensing data", IEEE Access, vol. 7, pp. 99152–99160, 2019.
[34] K. Chen, L. Yao, D. Zhang, X. Wang, X. Chang, and F. Nie, "A semisupervised recurrent convolutional attention model for human activity recognition", IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 5, pp. 1747–1756, 2019.


[35] X. Zhang, Y. Wong, M. S. Kankanhalli, and W. Geng, "Hierarchical multi-view aggregation network for sensor-based human activity recognition", PLOS ONE, vol. 14, no. 9, e0221390, 2019.
[36] Z. Qin, L. Hu, N. Zhang, D. Chen, K. Zhang, Z. Qin, and K.-K. R. Choo, "Learning-aided user identification using smartphone sensors for smart homes", IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7760–7772, 2019.
[37] A. Gupta, H. P. Gupta, B. Biswas, and T. Dutta, "A fault-tolerant early classification approach for human activities using multivariate time series", IEEE Transactions on Mobile Computing, 2020.
[38] Z. Qin, Y. Zhang, S. Meng, Z. Qin, and K.-K. R. Choo, "Imaging and fusing time series for wearable sensor-based human activity recognition", Information Fusion, vol. 53, pp. 80–87, 2020.
[39] S. Ashry, R. Elbasiony, and W. Gomaa, "An LSTM-based descriptor for human activities recognition using IMU sensors", in Proceedings of the 15th International Conference on Informatics in Control, Automation and Robotics (ICINCO), vol. 1, 2018, pp. 494–501.
[40] S. B. ud din Tahir, A. Jalal, and M. Batool, "Wearable sensors for activity analysis using SMO-based random forest over smart home and sports datasets", in 2020 3rd International Conference on Advancements in Computational Sciences (ICACS), IEEE, 2020, pp. 1–6.
[41] S. P. Singh, A. Lay-Ekuakille, D. Gangwar, M. K. Sharma, and S. Gupta, "Deep ConvLSTM with self-attention for human activity decoding using wearables", arXiv preprint arXiv:2005.00698, 2020.

Riktim Mondal received his B.E. degree from the Department of Computer Science and Engineering, Jadavpur University. His areas of interest are Computer Vision, Pattern Recognition, Natural Language Processing and Human Computer Interfaces.

Debadyuti Mukherjee is currently an undergraduate student in the Department of Computer Science and Engineering, Jadavpur University. His areas of interest are Computer Vision, Pattern Recognition and Machine Learning.

Pawan Kumar Singh (email: pawankrsingh@ieee.org) received his B.Tech degree in Information Technology from West Bengal University of Technology in 2010. He received his M.Tech in Computer Science and Engineering and Ph.D. (Engineering) degrees from Jadavpur University (J.U.) in 2013 and 2018 respectively. He also received the RUSA 2.0 fellowship for pursuing his post-doctoral research at J.U. in 2019. He is currently working as an Assistant Professor in the Department of Information Technology at J.U. He has published more than 50 research papers in peer-reviewed journals and international conferences. His areas of current research interest are Computer Vision, Pattern Recognition, Handwritten Document Analysis, Image and Video Processing, Machine Learning and Artificial Intelligence. He is a member of the IEEE (U.S.A.), The Institution of Engineers (India) and the Association for Computing Machinery (ACM), as well as a life member of the Indian Society for Technical Education (ISTE, New Delhi) and the Computer Society of India (CSI).

Vikrant Bhateja (email: bhateja.vikrant@ieee.org) is working as an Associate Professor in the Department of ECE, SRMGPC, Lucknow. His areas of research include digital image and video processing, computer vision, medical imaging, machine learning, pattern analysis and recognition. He has around 155 quality publications in various international journals and conference proceedings. He is an associate editor of IJSE and IJACI. He has edited more than 22 volumes of conference proceedings with Springer Nature and is presently Editor-in-Chief of the IGI Global journal IJNCR.

Ram Sarkar (email: rsarkar@ieee.org) received his B.Tech degree in Computer Science and Engineering from the University of Calcutta in 2003. He received his M.E. degree in Computer Science and Engineering and PhD (Engineering) degree from Jadavpur University in 2005 and 2012 respectively. He joined the Department of Computer Science and Engineering of Jadavpur University as an Assistant Professor in 2008, where he is now working as an Associate Professor. He received the Fulbright-Nehru Fellowship (USIEF) for post-doctoral research at the University of Maryland, College Park, USA in 2014-15. His areas of current research interest are Image Processing, Pattern Recognition, and Machine Learning. He is a senior member of the IEEE, U.S.A.
