Expert Systems With Applications: Review

Expert Systems With Applications 105 (2018) 233–261
Contents lists available at ScienceDirect
Expert Systems With Applications

journal homepage: www.elsevier.com/locate/eswa
Review
Deep learning algorithms for human activity recognition using mobile

and wearable sensor networks: State of the art and research
challenges
Henry Friday Nweke a,b, Ying Wah Teh a,∗, Mohammed Ali Al-garadi a, Uzoma Rita Alo b
a
Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur 50603, Malaysia
b
Computer Science Department, Ebonyi State University, Abakaliki, Ebonyi State P.M.B 053, Nigeria
a r t i c l e i n f o a b s t r a c t
Article history: Human activity recognition systems are developed as part of a framework to enable continuous moni-
Received 29 May 2017 toring of human behaviours in the area of ambient assisted living, sports injury detection, elderly care,
Revised 26 March 2018
rehabilitation, and entertainment and surveillance in smart home environments. The extraction of rel-
Accepted 27 March 2018
evant features is the most challenging part of the mobile and wearable sensor-based human activity
Available online 1 April 2018
recognition pipeline. Feature extraction influences the algorithm performance and reduces computation
Keywords: time and complexity. However, current human activity recognition relies on handcrafted features that are
Deep learning incapable of handling complex activities especially with the current influx of multimodal and high di-
Mobile and wearable sensors mensional sensor data. With the emergence of deep learning and increased computation powers, deep
Human activity recognition learning and artificial intelligence methods are being adopted for automatic feature learning in diverse
Feature representation areas like health, image classification, and recently, for feature extraction and classification of simple and
Review
complex human activity recognition in mobile and wearable sensors. Furthermore, the fusion of mo-
bile or wearable sensors and deep learning methods for feature learning provide diversity, offers higher
generalisation, and tackles challenging issues in human activity recognition. The focus of this review is
to provide in-depth summaries of deep learning methods for mobile and wearable sensor-based human
activity recognition. The review presents the methods, uniqueness, advantages and their limitations. We
not only categorise the studies into generative, discriminative and hybrid methods but also highlight their
important advantages. Furthermore, the review presents classification and evaluation procedures and dis-
cusses publicly available datasets for mobile sensor human activity recognition. Finally, we outline and
explain some challenges to open research problems that require further research and improvements.
© 2018 Elsevier Ltd. All rights reserved.
1. Introduction actions of users using different multimodal data generated by vari-

ety of sensors (Cao, Wang, Zhang, Jin, & Vasilakos, 2017; Ordonez &
Human activity recognition is an important area of research Roggen, 2016). Previous studies in human activity recognition can
in ubiquitous computing, human behaviour analysis and human- be broadly categorised based on diverse devices, sensor modali-
computer interaction. Research in these areas employ different ma- ties and data utilised for detection of activity details. These in-
chine learning algorithms to recognise simple and complex activi- clude video based, wearable and mobile phone sensors, social net-
ties such as walking, running, cooking, etc. Particularly, recognition work sensors and wireless signals. Video-based sensors are utilised
of daily activities is essential for maintaining healthy lifestyle, pa- to capture images, video or surveillance camera features to recog-
tient rehabilitation and activity shifts among the elderly citizens nise daily activity (Cichy, Khosla, Pantazis, Torralba, & Oliva, 2016;
that can help to detect and diagnose serious illnesses. Therefore, Onofri, Soda, Pechenizkiy, & Iannello, 2016). With the introduction
human activity recognition framework provides mechanism to de- of mobile phones and other wearable sensors, inertial sensor data
tect both postural and ambulatory activities, body movements and (Bhattacharya & Lane, 2016; Bulling, Blanke, & Schiele, 2014b) are
collected using mobile or wearable embedded sensors placed at
different body positions in order to infer human activities details
∗
Corresponding author. and transportation modes. Alternatively, the use of social network
E-mail addresses: henrynweke@siswa.um.edu.my (H.F. Nweke), methods (Y. Jia et al., 2016) that exploit appropriate users’ informa-
tehyw@um.edu.my (Y.W. Teh), mohammedali@siswa.um.edu.my (M.A. Al-garadi), tion from multiple social network sources to understand user be-
auzomarita@yahoo.com (U.R. Alo).
https://doi.org/10.1016/j.eswa.2018.03.056
0957-4174/© 2018 Elsevier Ltd. All rights reserved.
234 H.F. Nweke et al. / Expert Systems With Applications 105 (2018) 233–261
haviour and interest have also been proposed recently. In addition, finally classification of activity details. Pre-processing involves the
wireless signal based human activity recognition (Savazzi, Rampa, removal and representation of the raw sensor data. Different meth-
Vicentini, & Giussani, 2016) takes advantages of signal propagated ods such as nonlinear, low pass and high pass filter, and Laplacian
by the wireless devices to categorise human activity. However, the and Gaussian filter have been utilised for pre-processing. The seg-
use of sensor data generated using smartphones and other wear- mentation procedure divides the signal into different window sizes
able devices have dominated the research landscape in human mo- to extract useful features. Generally, sensor data segmentation is
tion analysis, activity monitoring and detection due to their ob- achieved using methods ranging from sliding windows, events or
vious advantages over other sensor modalities (Cornacchia, Ozcan, energy based activities (Bulling, Blanke, & Schiele, 2014a). Next,
Zheng, & Velipasalar, 2017). relevant feature vectors are extracted from the segmented data
Generally, mobile phones and wearable based sensors for hu- to determine lower set of features to minimise classification er-
man activity identification are driven by their ubiquity, unobtru- rors and reduce computation time. In addition, the extracted fea-
siveness, cheap installation procedure and ease of usability. Mobile tures are often further reduced through feature selection methods
phones have become part of our daily life and can be found in ev- to the most discriminative features for recognition tasks. Feature
ery homes and carried everywhere we go. In this context, mobile vectors for human activity recognition can be broadly categorised
phones and wearable sensors are popular alternative methods of into statistical and structural features (Bulling et al., 2014a; Figo,
inferring activity details. For instance, while the video sensor ex- Diniz, Ferreira, Jo, et al., 2010). Statistical features (mean, median,
tract features such as the Histogram of Oriented Gradient (HOG), time domain, frequency domain, standard deviation, etc.) extract
Spatio-temporal interest Point (STIP) and Region of Interest (ROI), quantitative properties of sensor data while structural features use
mobile sensors utilise statistical and frequency based features to the relationship among the mobile sensor data for feature extrac-
recognise activity details. Statistical features provide less computation. Likewise, dimensionality reduction reduces the dimension of
tion time and complexity (Figo, Diniz, Ferreira, & Cardoso, 2010). the extracted features to decrease the computational time. The di-
Furthermore, vision based techniques intrude on user privacy, re- mensionality reductions widely used in human activity recognition
quire fixed location implementations and capture non-target infor- are principal component analysis (PCA), linear discriminate anal-
mation (Yang, Nguyen, San, Li, & Krishnaswamy, 2015). In addi- ysis (LDA) and empirical cumulative distribution functions (ECDF)
tion, video sensors based human activity recognition are affected (Abidine, Fergani, Fergani, & Oussalah, 2016). The activity recogni-
by lighting variability leading to decrease in performances due to tion and classification phases help to map extracted features into
visual disturbanes (Wang, 2016). On the other hand, mobile and sets of activities using machine learning or pattern recognition
wearable sensor-based methods provide better advantages for real- methods (Bulling et al., 2014b). Large varieties of machine learning
time implementation of human activity recognition systems. More- techniques have played prominent roles in inferring activity details.
over, mobile phone and wearable devices are not location depen- These include the Support Vector Machine (Anguita, Ghio, Oneto,
dents, cost effective, easy to deploy and do not pose any health Parra, & Reyes-Ortiz, 2012; Kim & Ling, 2009), Hidden Markov
hazard caused by radiation (Alsheikh et al., 2015) unlike wireless Model (Safi, Mohammed, Attal, Khalil, & Amirat, 2016), Decision
signals based method. Considering the obvious advantages of mo- Tree, K-Nearest Neighbour (KNN) (Shoaib, Bosch, Incel, Scholten,
bile and wearable sensor based implementation of human activity, & Havinga, 2016) and Gaussian Mixture Model (Rodriguez, Or-
number of studies have been proposed by leveraging on the data rite, Medrano, & Makris, 2016). Studies by Bulling et al. (2014b),
generated using these devices (Morales & Akopian, 2017). Incel, Kose, and Ersoy (2013) and Pires, Garcia, Pombo, and Flórez-
The explosion of smartphones era embedded with multi-sensor Revuelta, 2016) provide excellent information on the human activ-
systems that enable researchers to collect human physiological sig- ity recognition process using handcrafted features with mobile and
nal for monitoring of activity of daily living, have made human wearable sensor data.
motion analysis integral part of our daily life. Smartphones pro- Recently, to overcome the challenges associated with single
vide access to wide range of sensor such as accelerometer, gy- sensor modalities and increase generalization, many studies have
roscope, magnetometer, Bluetooth, Wi-Fi, microphones, proximity proposed information fusion strategies that combine multiple sen-
and light sensor and cellular radio sensors that can be exploited sors modalities or classifiers to increase robustness, reliabilities,
to infer activity details. Sensors such as accelerometer, gyroscope, derive confidence measures among different classifiers and re-
magnetometer, heart rate, GPS can be deployed for coarse grain duce the complexity of recognition system (Pires et al., 2016). In-
and context activity recognition, user location and social inter- formation fusion in human activity recognition are necessitated
action between users. Motion sensors (Accelerometers, gyroscope by increase in sensor of different modalities (Gravina, Alinia,
magnetometer) provide important information that facilitate recog- Ghasemzadeh, & Fortino, 2017). Information fusion techniques
nition and monitoring of users’ movement such as walking, stand- are prevalent in both handcrafted features and automatic feature
ing or running. Similarly, proximity and light sensors embedded learning using deep learning (Habib, Makhoul, Darazi, & Couturier,
in mobile devices to enhance user experiences can also be de- 2016; Shoaib, Bosch, Incel, Scholten, & Havinga, 2014; Zhu & Sheng,
ployed to determine whether the user is in light or dark places 20 09; Zouba et al., 20 09). In this review, recent works on informa-
(Incel, 2015). Other sensors such as barometers, thermometers, tion fusion for human activity recognition using automatic feature
air humidity and pedometers have also been applied to maintain representation were also analysed.
healthy status of elderly citizens and for assisted living (Gong, Cui, Of all the different phases of human activity recognition
Xiao, & Wang, 2012). For instance, the pedometer found in the framework, feature extraction is the most important stage
Samsung Galaxy smartphones and exercises tracking wearable de- (Domingos, 2012). This is because of the correlation between per-
vices are essential for step counts, heart rate and pulse monitoring. formances of activity recognition system and extraction of relevant
These are effective for important health conditions identifications and discriminative feature vectors. Therefore, extensive works have
which may interfere with user activities (Kanaris, Kokkinis, Liotta, been done on how to improve human activity recognition system
& Stavrou, 2017; Natarajasivan & Govindarajan, 2016; Zouba, Bre- through extraction of expert-driven features (Figo, Diniz, Ferreira,
mond, & Thonnat, 2009). Jo, et al., 2010). However, expert-driven features extraction meth-
In human activity recognition, data collection with varieties of ods depend on the knowledge of the experts or guess and applica-
sensors installed in mobile phone and wearable devices is pre- bility of the feature vectors in the problem domains. Even though,
ceded by other data analytic phases such as pre-processing, data conventional handcrafted features learning methods are easy to
segmentation, extraction of salient and discriminative features, and understand and have been widely utilised for activity recognition,
H.F. Nweke et al. / Expert Systems With Applications 105 (2018) 233–261 235
feature vectors extracted using such techniques are tasks or ap- levels hierarchically. It has become a critical research area in im-
plications dependent, and cannot be transferred to similar activ- age and object recognition, natural language processing, machine
ity tasks. Furthermore, hand-engineered features cannot represent translation and environmental monitoring (Y. Guo et al., 2016).
the salient characteristics of complex activities, and involve time- More recently, various deep learning methods have been proposed
consuming feature selection techniques to select the optimal fea- for mobile and wearable sensor based human activity recogni-
tures (Yang et al., 2015). Also, there are no universal procedures tion. These methods include restricted Boltzmann machine, au-
for selecting appropriate features leading to many studies resort to toencoder, sparse coding, convolutional neural network and recur-
heuristic means using feature engineering knowledge approach. In rent neural network. These deep learning methods can be stacked
the nutshell, the major challenges of conventional handcrafted fea- into different layers to form deep learning models that provide en-
tures for mobile and wearable sensor based human activity recog- hanced system performance, flexibility, robustness and remove the
nition are summarised below: need to depend on conventional handcrafted features. The essence
of this study is to review different human activity recognition and
• Feature representation techniques in current human activity
health monitoring systems in mobile and wearable sensors that
recognition approaches for mobile and wearable sensors use
utilise deep neural network for feature representations. We pro-
carefully engineered feature extraction and selections methods
vide an extensive review of the recent developments in the field
that are manually extracted using expert domain knowledge.
of human activity recognition for mobile and wearable sensors us-
However, such feature extraction approach are task or applica-
ing deep learning. Specifically, we present comprehensive review
tions dependent and cannot be transferred to activity of similar
of deep learning methods; taxonomy of the recent studies in deep
patterns. Furthermore, carefully engineered features vectors are
learning based activity recognition, their advantages, training pro-
challenging to model complex activity details and involve time
cedure and popular deep learning software frameworks. Based on
consuming feature selections (Ronao & Cho, 2016; Yang et al.,
the reviewed papers, open research issues were derived, and future
2015);
research directions are suggested.
• There are no universal procedures for selecting appropriate fea-
Deep learning and human activity recognition or activity of
tures but many studies resort to extensive heuristic knowledge
daily living as a separate research areas have been progressive ar-
to develop and select appropriate tasks for a given human ac-
eas for years. A good number of surveys and reviews have been
tivity recognition system (Zdravevski et al., 2017);
published. However, these reviews either focus on deep learning
• Moreover, the current statistical features such as time or fre-
and their applications or activity recognition using conventional
quency domain features for human activity recognition are un-
features learning methods. Furthermore, these reviews have be-
able to model and support the dynamic nature of the cur-
come outdated and require urgent research to analyse the high
rent seamless and ubiquitous collection of mobile and wearable
volume of papers published in the area lately. In deep learn-
senor streams (Hasan & Roy-Chowdhury, 2015);
ing methods, reviews by Angermueller, Parnamaa, Parts, and Ste-
• Also, human activity recognition using expert driven features
gle (2016), Benuwa, Zhan, Ghansah, Wornyo, and Kataka (2016),
require large amount of labelled training sensor data to obtain
Dolmans, Loyens, Marcq, and Gijbels (2016), Gawehn, Hiss, and
accurate recognition performance. The experimental protocol to
Schneider (2016), LeCun et al. (2015), W. Liu, Ma, Qi, Zhao,
collect large amount of labelled training data require extensive
and Chen (2017), W. Liu et al. (2016), Mamoshina, Vieira, Putin,
infrastructural setup that are time consuming. On the contrary,
and Zhavoronkov (2016), Ravì, Wong, Deligianni, et al. (2017),
unlabelled data are easy to obtain leveraging Internet of Things
Schmidhuber (2015) provide comprehensive knowledge of the
(IoT), smart homes and mobile crowdsourcing from transporta-
development and historical perspective. While studies such as
tion modes (Song-Mi, Sang Min, & Heeryon, 2017);
(Ahmad, Saeed, Saleem, & Kamboh, 2016; Attal et al., 2015; Bulling
• Other challenges of handcrafted features are the issues both-
et al., 2014b; Cornacchia et al., 2017; Gravina, Alinia, et al., 2017;
ering on intra-class variability and inter-class similarities
O. D. Incel et al., 2013; Kumari, Mathew, & Syal, 2017; Onofri
(Bulling et al., 2014b). In this case, same activities may be per-
et al., 2016; Pires et al., 2016; Turaga, Chellappa, Subrahmanian,
formed differently by different individuals or different activities
& Udrea, 2008) discussed the human activity and action recogni-
appear to have same pattern of executions. Developing generic
tion based on handcrafted features, sensor fusion techniques to in-
expert driven features that can accurately model these issues
crease the robustness of recognition algorithms and developmen-
are challenging;
tal trends on wearable sensors for the collection of activity data.
• Furthermore, human activities are hierarchical and inherently
Others presented the use of handcrafted and deep learning based
translational in nature with ambiguity in temporal segmenta-
features for human activity recognition in video sensor and images
tion of sub-activities that constitute the main activity. There-
(Aggarwal & Xia, 2014; Sargano, Angelov, & Habib, 2017; Xu et al.,
fore, capturing spatial and temporal variation of activities are
2013; F. Zhu, Shao, Xie, & Fang, 2016). Recently, authors (Gamboa,
important for accurate detection of complex activity details
2017; Langkvist, Karlsson, & Loutfi, 2014) reviewed deep learning
(Kautz et al., 2017);
for time series analysis; another closely related area in human ac-
• To achieve diversity and robust features for human activity
tivity recognition. However, the author took a broader view on the
recognition performance generalisation across heterogeneous
applications of deep learning in time series that comprises speech
domain, approaches such as multimodal fusion and decision fu-
recognition, sleep stage classification and anomaly detection but
sion are utilised. However, there still exist, uncertainties on the
this review focused on deep learning based human activity recog-
best fusion techniques to achieve higher generalisation with re-
nition using sensor data generated by mobile or wearable devices.
duced computation time for mobile and wearable sensor imple-
From the available literature, there are no studies on review or sur-
mentation.
vey of deep learning based feature representation and extraction
To solve the above problems, studies have delved into tech- for mobile and wearable sensors based on human activity recogni-
niques that involve automatic features extraction with less hu- tion. To fill this gap, this review is a timely exploration of the pro-
man efforts (LeCun, Bengio, & Hinton, 2015) using deep learning cesses for developing deep learning based human activity recogni-
techniques. Deep learning, a new branch of machine learning that tion and provide in-depth tutorial on the techniques, implementa-
models high-level features in data, has become an important trend tion procedure and feature learning process.
in human activity recognition. Deep learning comprises multiple The remainder of this paper is organised as follows:
layers of neural networks that represent features from low to high Section 2 discusses Comparison of deep learning feature rep-
resentation and conventional handcrafted feature learning ap- sionality reduction commonly used. For instance, principal com-
proach. Section 3 discusses the deep learning methods and their ponent analysis (PCA) treat each dimensionality as statistically in-
subdivisions. Section 4 review different representative studies in dependent and extract features based on sensor appearance, but
deep learning for human activity recognition using mobile and activities are performed based on activity windows, and this have
wearable sensors. The section is subdivided into generative feature been found to affect recognition accuracy (Plötz et al., 2011).
extraction techniques such as Deep Belief Network (DBN), Deep Clearly, there is need for appropriate techniques to extract dis-
Boltzmann Machine (DBM), sparse coding, and discriminative criminative features to achieve optimal performance accuracy. Re-
feature extraction with Convolutional Neural Network (CNN), Re- cent studies in human activity recognition have observed there
current Neural Network (RNN) and hybrid methods that combine are no universally best discriminative feature that accurately rep-
generative and discriminative deep learning methods. The descrip- resent across dataset and applications (Capela, Lemaire, & Bad-
tion, advantages and weakness of these studies are also discussed dour, 2015). Therefore, automatic feature representations are re-
in details. Section 5 discusses the training procedure, classification quired to enable extraction of translational invariant feature vec-
and evaluation of deep learning for human activity recognition. tors without reliance on domain expert knowledge. Deep learning
Section 6 reviews common benchmark datasets for human activity methods for automatic feature representation provide the ability
recognition using deep learning. Section 7 includes the software to learn features from raw sensor data with little pre-processing
frameworks for implementation of deep learning algorithms. (LeCun et al., 2015). Using multiple layer of abstraction, deep learn-
Section 8 provides the open research challenges requiring fur- ing methods learn intricate features representation from raw sen-
ther improvements and attention while Section 9 concludes the sor data and discover the best pattern to improve recognition per-
review. formance. Recently, studies have indicated the incredible results of
deep learning over conventional handcrafted features for human
2. Comparison of deep learning feature representation and activity recognition (Ordóñez & Roggen, 2016; S. Yao, Hu, Zhao,
conventional feature learning Zhang, & Abdelzaher, 2017). Also, the use of automatic feature rep-
resentation helps to capture local dependencies and scale invari-
Feature extraction is a vital part of the human activity recog- ants features. Thus, deep learning provide effective means to solve
nition process as it helps to identify lower sets of features from the problem of intra-class variabilities and inter-class similarities
input sensor data to minimise classification errors and computa- that are fundamental challenges for implementing human activity
tional complexity. Effective performance of Human activity recog- recognition with handcrafted features (Bulling et al., 2014b). Fur-
nition system depends on appropriate and efficient feature repre- thermore, deep learning methods apply unsupervised pre-training
sentation (Abidine et al., 2016). Therefore, extraction of efficient to learn structure of high dimensional sensor data to prevent over-
feature vectors from mobile and wearable sensor data helps to re- fitting. With the current influx of unlabelled sensor streams from
duce computation time and provide accurate recognition perfor- Internet of Things (IoT), crowdsourcing and cyber-physical sys-
mance. Feature extraction can be performed manually or automat- tems, implementing efficient human activity recognition would be
ically based on expert knowledge. Manually engineered features very challenging without automatic feature representation from
follow bottom-up approaches that consist of data collection, sig- raw sensor data (Gravina et al., 2017). In Table 1, we summarised
nal pre-processing and segmentation, handcrafted features extrac- the comparison of the two approaches in terms of strengths and
tion and selection, and classification. Manually engineered feature weaknesses for mobile and wearable sensor based human activ-
processes utilise appropriate domain knowledge and expert-driven ity recognition. The comparisons are summarised using five char-
approach to extract time domain, frequency domain and Hulbert- acteristics. These include feature representation method, generali-
Huang features using Empirical mode decomposition to represent sation, data preparation, changes in activity details and execution
signal details (Z. L. Wang, Wu, Chen, Ghoneim, & Hossain, 2016; time.
Zdravevski et al., 2017). Then, appropriate feature selection meth-
ods such as Minimal Redundancy Maximal Relevance, correlation 3. Automatic feature extraction using deep learning methods
based features selection method and RELIEF F are employed to re-
duce computation time and memory usage due to inability of mo- Deep learning as a machine learning method and artificial in-
bile and wearable devices to support computational intensive ap- telligence techniques for feature extraction has come a long way
plications (Bulling et al., 2014b). Also, data dimensionality reduc- since its resurgence in 2006 with the work of Hinton, Osindero,
tion approach such Principal Component analysis (PCA), Linear Dis- and Teh (2006). The upsurge in deep learning research is fuelled
criminative analysis (LDA), Independent Component analysis (ICA) by its ability to extract salient features from raw sensor data with-
and Empirical Cumulative Distribution Function (ECDF) (Abidine out relying on laboriously handcrafted features. Furthermore, in
et al., 2016; Plötz, Hammerla, & Olivier, 2011) are utilised to fur- the area of human activity recognition, for instance, complex hu-
ther reduce features dimensionality and produce compact feature man activities are translational invariant and hierarchical in nature,
vectors representations. and the same activities can be performed in different ways by the
However, it is very challenging to measure the efficient perfor- same participants. In some cases, activities can be a starting point
mances of manually engineered features across different applica- for other complex activities; running and jogging might not be dis-
tions and also require time consuming features selection and di- tinguishable depending on the age and health condition for the
mensionality reduction methods specified above to obtain accept- person performing the activity.
able results (X. Li et al., 2017; Ronao & Cho, 2016). Moreover, the Deep learning (Bengio, 2009; Hinton et al., 2006; Hollensen
use of feature selection are often arbitrary and lacks generalizabil- & Trappenberg, 2015) is a machine learning technique that uses
ity or ability to model complex activity details. It is highly ac- representational learning to discover feature representation in
knowledged that activity in natural environments are abstracts, hi- raw sensor data automatically. Unlike classical machine learn-
erarchical and translational in nature with temporal and spatial ing (support vector machine, k-nearest neighbour, k-mean, etc.)
information (X. Li et al., 2017). In order to consider these mobile that require a human engineered feature to perform optimally
and wearable sensor data characteristics for human activity recog- (LeCun et al., 2015). Over the years, deep learning has provided
nition, require intensive feature extraction and selection especially extensive applications in image recognition (Szegedy et al., 2015),
for continuous sensor streams (Ordóñez & Roggen, 2016). Another speech recognition (G. Hinton et al., 2012), medicine and pharmacy
pertinent issues with handcrafted features are based on the dimen- (J. Ma, Sheridan, Liaw, Dahl, & Svetnik, 2015), natural language pro-
Table 1
Comparison of deep learning feature representation and conventional feature learning.
Characteristics Deep learning based feature representation Conventional feature learning approach
Feature extraction and Representation Ability to learn features from raw sensors data and Use manually engineered feature vectors that are
discover the most efficient patterns to improve applications dependent, and unable to model complex
recognition accuracy activity details
Generalisation and Diversity Helps to automatically capture spatial, temporal Require labelled sensor data and use arbitrary feature
dependencies and scale invariant features from selection, and dimensionality reduction approaches that
unlabelled raw sensor data are hardly generalizable
Data preparations Data pre-processing and normalisation is not compulsory Extract features based on sensor appearance but activities
in deep learning features to obtain improved results are performed within activity windows. Furthermore,
manually engineered features require extensive data
pre-processing and normalization to produce improved
results
Temporal and Spatial changes in The use of hierarchical and translational invariant features Handcrafted features are inefficient at handling inter-class
Activities helps to solve the problem of intra-class variabilities and variabilities and inter-class similarities.
inter-class similarities inherent in handcrafted features.
Model Training and Execution time Require large amount of sensor dataset to avoid overfitting Require small training data with less computation time
and high computation intensive system, therefore require and memory usage.
Graphical Processing Unit (GPU) to speed up training
Fig. 1. Different architecture of deep learning algorithms.
cessing (Bordes, Chopra, & Weston, 2014; Sutskever, Vinyals, & Le, Sutskever, & Salakhutdinov, 2012). Restricted Boltzmann Machine is
2014) and recently in human activity recognition (Y. Q. Chen, Xue, composed of the visible unit and hidden units that are restricted to
& Ieee, 2015; L. Lin et al., 2016; Rahhal et al., 2016; Ronao and Cho, form bipartite graph for effective algorithm implementation. There-
2016; Vollmer, Gross, & Eggert, 2013a). fore, weights connecting the neurons between visible units and
Extensive number of deep learning methods (LeCun et al., 2015; hidden units are conditionally independent without visible-visible
Schmidhuber, 2015) have been proposed recently, and these meth- or hidden-hidden connections. To provide efficient feature extrac-
ods can be broadly classified into Restricted Boltzmann Machine, tion, several RBMs are stacked to form visible to hidden units,
Deep Autoencoder, Sparse Coding, Convolutional Neural Network and the top layers are fully connected or embedded with classi-
and Recurrent Neural Networks (Fig. 1). These methods are re- cal machine learning to discriminate features vectors (Fischer &
viewed in the subsection below, outlining the characteristics, ad- Igel, 2014). Although, issues like inactive hidden neuron, class vari-
vantages and drawbacks of each method. ation, intensity and sensitivity to larger dataset make training RBM
difficult. Recently, methods such as regularisation using noisy rec-
tified linear unit (Nair & Hinton, 2010) and temperature based Re-
3.1. Restricted Boltzmann Machine stricted Boltzmann Machine (G. Li et al., 2016) have been proposed
to resovle the issue. Restricted Boltzmann Machine has been exten-
Restricted Boltzmann Machine (Fischer & Igel, 2014; Hinton & sively studied in feature extraction and dimensionality reduction
Sejnowski, 1986) is a generative model that serves as a building (G. E. Hinton & Salakhutdinov, 2006), modelling high dimensional
block in greedy layer by layer feature learning and training of deep data in video and motion sensors (Taylor, Hinton, & Roweis, 2007),
neural network. The model is trained with contrastive divergence movie rating (Salakhutdinov, Mnih, & Hinton, 2007) and speech
(CD) to provide unbiased estimates of maximum likelihood learn- recognition (Mohamed & Hinton, 2010). Two well know Restricted
ing. However, Restricted Boltzmann Machine is difficult to converge Boltzmann Machine methods in literature are Deep Belief Network
to local minimal and variant of data representation. Furthermore, it and Deep Boltzmann Machine (See Fig. 2).
is challenging to know how automatic adaptation parameters set- Deep Belief Network (Hinton et al., 2006) is a deep learning al-
tings such as learning rate, weight decay, momentum, the size of gorithm trained in a greedy-wise layer manner by stacking sev-
mini-batch and sparsity can be specified to achieve optimal re- eral Restricted Boltzmann to extract hierarchical features from raw
sults (Cho, Raiko, & Ihler, 2011; G. E. Hinton, Srivastava, Krizhevsky,
Fig. 2. Representation of restricted Boltzmann machine: (a) Deep belief network (b) Deep Boltzmann machine.
sensor data. Deep Belief Network has directed connection between

the lower layer and undirected connection at the top layer that
helps to model observed distribution between the vectors space
and hidden layers. Likewise, training involves layer by layer at a
time with weight fine-tuning using contrastive convergence (CD).
Then, the conditional probability distribution of the data is com-
puted to learn robust features that are invariant to transformation,
noise and displacement (G. E. Hinton et al., 2006).
Deep Boltzmann Machine (DBM) (Salakhutdinov & Hinton, 2009;
Salakhutdinov & Larochelle, 2010) is a generative model with sev-
eral hidden layers in undirected connection in the entire network
layers. DBM hierarchically learns features from data in which fea-
tures learned in the first layer are used as latent variables in
the next layer. Similar to deep belief network (DBN), Deep Boltz-
mann machine deploys Markov random field for layer by layer
Fig. 3. Deep Autoencoder encoding and decoding process.
pre-training of massive unlabelled data and provide feedback us-
ing bottom-up pass approach. Furthermore, the algorithm is fined
through back propagation approach. Fine-tuning allows variation to provide a lower dimensional discriminative feature for recogni-
inference and the algorithm to be deployed in specific classification tion of activities with reduced computation time (Ravì, Wong, Deli-
or activity recognition task. Training RBM (Salakhutdinov & Hin- gianni, et al., 2017). Moreover, autoencoder algorithm uses multi-
ton, 2012; Salakhutdinov & Larochelle, 2010) involves maximising ple layer of encoder units to transform high dimensional data into
the lower bound of likelihood with stochastic maximum likelihood the low dimensional feature vectors. Autoencoder algorithm is pre-
algorithms (Younes, 1999). In this case, training strategies need to trained using restricted Boltzmann machine due to its complexity
adopt a way to determine the training statistics, weight initializa- (Hinton & Salakhutdinov, 2006) and then obtains higher feature
tion and update after each mini-batch by replacing stochastic bi- representations by stacking several level of autoencoder algorithms
nary values with deterministic real probabilities. The major draw- (Zhang, Shan, Kan, & Chen, 2014). Generally, different variations of
back that has been observed in DBM is the time complexity with autoencoder have been proposed to ensure robust features repre-
higher optimisation parameters. In Montavon and Müller (2012), sentation for machine learning applications. These include denois-
a centring optimisation method was proposed for stable learning ing autoencoder, sparse autoencoder and contractive autoencoder.
algorithms and Midsized DBM for faster and good generative and Denoising autoencoder was first introduced by
discriminative model Vincent, Larochelle, Bengio, and Manzagol (2008) as method
to stochastically learn robust feature representation from cor-
3.2. Deep Autoencoder rupted version of data (e.g. sensor values) by partial destruction
of the raw input sample. Thus, denoising autoencoder is trained to
The autoencoder method replicates the copies of the input reconstruct sample input data from corrupted version by forcing
value as output as shown in Fig. 3. Using encoder and decoding random sample values of the data to zero through stochastic map-
units, autoencoder methods produces the most discriminative fea- ping. Similar to other unsupervised deep learning model, denoising
tures from unlabeled sensor data by projecting them to lower di- autoencoder is trained through layer to layer initialisation. Each
mensional space. The encoder transforms the sensor data input layer of the network is trained to produce input data of the next
into hidden features which are then reconstructed by the decoder higher level layer representation. The layer to layer training ensure
to approximate values to minimise error rates (Liou, Cheng, Liou, & that autoencoder network is able to capture robust structure and
Liou, 2014; Lukun Wang, 2016). The method provides data-driven observed statistical dependencies and regularities about input
learning feature extraction techniques to avoid problems inherent data distributions. Moreover, stacked denoising autoencoder can
in handcrafted features. Training autoencoder is done in such a be stacked to learn useful representation of corrupted version of
way that the hidden units are smaller than the inputs or outputs input sample data which have been found to give less classifica-
tion error (Vincent, Larochelle, Lajoie, Bengio, & Manzagol, 2010), captures the feature maps with different kernel sizes and strides
and this was recently applied to recognise complex activities and then pooled the features maps together in order to reduce the
(Oyedotun & Khashman, 2016). number of connections between the convolutional layer and the
Sparse autoencoder (Marc’Aurelio Ranzato, et al., 2007) is un- pooling layer. The pooling layer reduces the feature maps, num-
supervised deep learning model developed for sparse and over- ber of parameters and makes the network translational invariant
complete feature representation from input data by forcing sparsity to changes and distortion. In the past, different pooling strategies
term to the model loss function and set some of the active units have been proposed for Convolutional Neural Network implemen-
close to zero. Sparse autoencoder is highly applicable in tasks that tation in various area of applications. These include max pool-
require analysis of high dimensional and complex input data such ing, average pooling, stochastic pooling and spatial pooling units
as motion sensors, images and videos. Generally, the use of spar- (Y. Guo et al., 2016). Recently, theoretical analysis and performance
sity term allow the model to learn feature representation that are evaluations of these pooling strategies have shown superior per-
robust, linearly separable and invariant to changes, distortion, dis- formance of max pooling strategies. Thus, max pooling strategy
placements and learning applications (Zhou et al., 2015). Therefore, is extensively applied in deep learning training (Boureau, Ponce,
sparse autoencoder model is very efficient for extraction of low di- & LeCun, 2010; Scherer, Müller, & Behnke, 2010). Moreover, recent
mensional features from high dimensional input data and compact studies human activity recognition also applies max pooling strate-
interpretation of complex input data using supervised learning ap- gies due to its robustness to small changes (Kautz et al., 2017;
proach (Liu & Taniguchi, 2014). Liu, Liang, Lan, Hao, & Chen, 2016). However, studies in time series
Recently, Rifai, Vincent, Muller, Glorot, and Bengio (2011) pro- analysis with deep learning observed reduction in discriminative
pose contrative autoencoder by introducing penalty term of par- ability of max pooling strategies (Abdel-Hamid, Deng, & Yu, 2013).
tial derivatives for efficient feature representation. The use of sum Therefore, further experimental analysis and evaluation is required
of square of all partial derivatives for the feature vectors with to ascertain the effectives of these pooling strategies in human ac-
respect to size of input data, force the features within neigh- tivity recognition and time series applications.
bourhood of the input data (Dauphin et al., 2012). Furthermore, The fully connected layer is fused with the inference engine
penalty term reduces the dimensional feature space with the train- such as SoftMax, Support Vector Machine or Hidden Markov Model
ing datasets and makes it invariant to changes and distortion. Con- that takes the features vectors from sensor data for activity recog-
tractive autoencoder is similar to denoising autoencoder as both nition (Erfani, Rajasegarar, Karunasekera, & Leckie, 2016; Ronao &
apply penalty term to the small corrupted data sample. However, Cho (2016, 2015). In CNN, activation unit values are computed for
unlike the denoising autoencoder, the contractive autoencoder ap- each region of the network in order to learn patterns across the
plies an analytic penalty to the whole data instead of the encod- input data (Ordóñez & Roggen, 2016). The output of convolutional
ing input sample (Mesnil et al., 2012). Section 4.1.3 discusses the 1. j
M
l. j l−1. j
operation is computed as Ci = α (blj + wm xi+m−1 ), where l is
applications of autoencoder in mobile and wearable sensor based m=1
human activity recognition and health monitoring. the layer index, σ is the activation function, b is the bias term
for the feature map, M is the kernel/filter size, W is the weight
3.3. Sparse coding of the feature map. The weight may be shared to reduce com-
plexity and make the network easy to train. Generally, idea of
Sparse coding was first proposed by Olshausen and convolutional neural network (CNN) was inspired by Hubel and
Field (1997) as a machine learning technique for learning over- Wiesel (1962) which noted that the human visual cortex con-
complete basis in order to produce efficient data representation. sists of maps of the local receptive field that decrease in granu-
Sparse coding provides an effective means of reducing the dimen- larity as the cortex move along the receptive fields. Since the pro-
sionality of data and dynamically represent the data as a linear posal, a number of other CNN architectures have been developed
combination of basis vectors. This enable sparse coding model by researchers. These include the AlexNet (Krizhevsky, Sutskever,
captures the data structure and determines correlations between & Hinton, 2012), VGG (Krizhevsky et al., 2012) and GoogleNet
various input vectors (Y. Guo et al., 2016). Recently, some studies (Szegedy et al., 2015).
have proposed sparse coding methods to learn data representation Recently, CNN architectures that combine other deep learning
particularly in human activity recognition. These include the shift- techniques or fusion of different CNN architectures (Jing, Wang,
invariant method (Vollmer, Gross, & Eggert, 2013b) and sparse Zhao, & Wang, 2017; Ordóñez & Roggen, 2016) were also pro-
fusion (Ding, Lei, & Rao, 2016). These algorithms provide feature posed. For instance, (Ordóñez & Roggen, 2016) proposes DeepCon-
dimensionality reduction strategies to reduce computational com- vLSTM, an architecture that replaces the pooling layer of the con-
plexities for implementation of human activity recognition system volutional neural network with Long Short Term Memory (LSTM)
using mobile phone and wearable devices. of the recurrent neural network. Also, convolutional deep be-
lief networks (CDBN) was developed by Lee, Grosse, Ranganath,
3.4. Convolutional Neural Network and Ng (2009) which exploit the power of discriminative CNN
and pre-training technique of Deep Belief Network. Furthermore,
Convolutional Neural Network (CNN) (LeCun, Huang, & Bottou, Masci, Meier, Cireşan, and Schmidhuber (2011) proposed deep con-
2004) is a Deep Neural Network with interconnected structures. A volutional autoencoder for feature learning by integrating convolu-
convolutional neural network performs convolution operations on tion neural network and autoencoder trained with online stochas-
raw data (e.g. sensor values) and is one of the most researched tic gradient descent optimisation. The architecture of Convolutional
deep learning techniques which has found extensive applications neural network is shown in Fig. 4.
in image classification, sentence modelling, speech recognition and
recently in mobile and wearable sensors based human activity 3.5. Recurrent neural network
recognition (Y. Guo, et al., 2016; Karpathy, Johnson, & Fei-Fei, 2015;
Ronao & Cho, 2016). Generally, convolutional neural network model Recurrent neural network (RNN) was developed to model se-
is composed of convolutional layer, pooling layer and fully con- quential data such as time series or raw sensor data (Fig. 5). RNN
nected layer. These layers are stacked to form deep architecture incorporates a temporal layer to capture sequential information
for automatic feature extraction in raw sensor data (Ordóñez & and then learns complex changes using the hidden unit of the re-
Roggen, 2016; Wang, Qiao, & Tang, 2015). The convolutional layer current cell. The hidden unit cells can change based on the infor-
Fig. 4. Deep convolutional neural network.
nism (Valipour, Siam, Jagersand, & Ray, 2016). While LSTM updates
by summation operation, GRU updates the next hidden state by
taking correlation based on the amount of time needed to keep
such information in the memory. Moreover, recent comparative
analysis of the performance of LSTM and GRU shown that GRU
slightly outperformed LSTM in most of machine learning applica-
tions (Chung, Gulcehre, Cho, & Bengio, 2014). An attempt has also
been made to improve on GRU by reducing the number of gates
in the network and introduce only multiplicative gates to control
the flow of information (Gao & Glowacka, 2016). The algorithm
was compared with LSTM and GRU, and it outperformed them in
terms of memory requirement and computational time. Recently,
Chung, Gülçehre, Cho, and Bengio (2015) proposed Gated Feedback
Fig. 5. Simple recurrent neural network. Recurrent Neural Network (GF-RNN) to solve the problem of learn-
ing at multiplicative scale. This learning process is very challeng-
ing in application area such as language modelling and program-
mation available to the network, and this information is constantly
ming language sequence evaluation. Specifically, Gated Feedback
updated to reflect the current status of the network. RNN com-
Recurrent Neural Networks is developed by stacking multiple re-
putes the current hidden state by estimating the next hidden state
current layers and allow control of the signal flowing from upper
as activation of the previously hidden state. However, the model is
layer to the lower layer. The mechanism is done by adaptively con-
difficult to train and suffer from vanishing or exploding gradients
trolling based on the previously hidden state and assign different
limiting its application for modelling long time activity sequence
layer with different timescale. However, GF-RNN is not popular in
and temporal dependencies in sensor data (Guan & Ploetz, 2017).
human activity recognition. For all the studies review, we find no
Variations of RNN such as Long Short Term Memory (LSTM) and
specific work that apply GF-RNN for human activity. Therefore, the
Gated Recurrent Unit (GRU) integrate varieties of gates and mem-
model is omitted in our review of deep learning based human ac-
ory cells to capture temporal activity sequence (Graves, 2013). Long
tivity recognition in Section 4.2.2.
Short Term Memory (Hochreiter & Schmidhuber, 1997) incorpo-
rated memory cell to store contextual information, thereby con-
trol flow of information into the network. With the inclusion of 3.6. Strengths and weaknesses of different deep learning methods
memory cells such input gate, function gate and output gate along-
side learnable weights, allow LSTM model temporal dependencies In this section, we compare these methods discussed above not-
in sequential data and adequately capture global features to boost ing their strengths and weaknesses for mobile and wearable based
recognition accuracy (Zaremba, 2015). human activity recognition. The different deep learning methods
Despite the advantages inherent in LSTM, Cho et al. (2014) ob- discussed in this review have produce state-of-arts performances
served that issues of too many parameters that need to be up- in mobile and wearable sensor based human activity recognition
dated during training increases computational complexity of LSTM. (Section 4). The main advantage of deep learning is the ability
To reduce parameter update, they introduced Gated recurrent units to automatically learn from unlabelled raw sensor data. However,
with fewer parameters that make it faster and less complex to im- these methods provide different capabilities for sensor stream pro-
plement. LSTM and Gated Recurrent Unit (GRU) differ in the way cessing. For instance, Restricted Boltzmann machine algorithms are
the next hidden state are updated and contents exposure mecha- efficient for automatic and efficient unsupervised transformation
of sensor data into feature vector using layer by layer training inative model and hybrid model (Deng, 2014). These subdivi-
leveraging unlabelled data. Also, the methods allow robust fea- sions are presented in Fig. 6. The generative models are graphi-
ture vectors extraction. Nevertheless, Restricted Boltzmann ma- cal models that represent independent or dependent distributions
chine presents major drawback such as high parameter initialisa- in sensor data where graphs node represent the random variable
tion that make training computationally expensive. Considering the of the given sensor data and arc represent the relationship be-
computation capabilities of mobile and wearable sensor devices, tween variables. Generative models capture higher order corre-
it is difficult to support on-board and real-time activity recogni- lation by identifying joint statistical distributions with associated
tion (Yalçın, 2016). On the other hand, Deep autoencoder are ef- class. Moreover, generative models use unlabeled datasets that are
ficient for unsupervised feature transformation into lower feature pre-trained with greedy layer by layer approach and then fine-
vectors automatically from raw sensor data. Specifically, deep au- tuned with labelled data which is then classified with classical
toencoder methods are trained using greedy layer by layer ap- machine learning such as Support Vector Machine (SVM) or HMM
proach for unsupervised feature learning from continuous sensor (Bengio, 2009; Hodo, Bellekens, Hamilton, Tachtatzis, & Atkinson,
streams. Deep autoencoder algorithms are robust to noisy sensor 2017; Mamoshina et al., 2016). Among deep learning methods
data with ability to learn hierarchical and complex features from in these categories are Restricted Boltzmann, Autoencoder, Sparse
sensor data. However, the major drawbacks of deep autoencoder Coding and Deep Gaussian Mixture. In the case of the discrim-
are the inability to search for optimal solutions and high com- inative models, the posterior distribution provides discriminative
putation time due to high parameter tuning. While sparse cod- power in classification and modelling of label sensor data. A con-
ing methods are efficient for reduction of high dimensional sensor volutional neural network is an important category of discrimina-
data into linear combination feature vectors and ensure compact tive deep learning model (Mamoshina et al., 2016). Others are Re-
representation of features. Moreover, sparse coding is invariant to current Neural Network, Artificial Hydrocarbon and Deep Neural
sensor transformation and orientation, and effective for modelling Model. Conversely, hybrid models are used to classify data by de-
changes in activity progression (Zhang & Sawchuk, 2013). Change ploying the feature output generated by generative models. This in-
in sensor orientation is big challenges in human activity recogni- volves pre-training of the data to enhance computational time and
tion system especially for smartphone accelerometers (Incel, 2015). then classify with classical machine learning algorithms. The gen-
In this, accelerometer signal produce by smartphone or wearable erative model reinforces hybrid models through optimisation and
devices change with variations in orientation and placement po- regularisation procedures (Deng, 2014). In this review, the stud-
sitions. Nevertheless, it is still challenging to effectively perform ies categorised as a hybrid models are those that combine gen-
unsupervised features learning with sparse coding. Convolutional erative and discriminative or both methods for human activity
Neural Network are capable of learning deep feature vectors from recognition. Notable examples in this area are Convolutional Re-
sensor data for modelling complex and high dimensional sensor stricted Boltzmann Machine (Sarkar, Reddy, Dorgan, Fidopiastis, &
data. The main advantage of CNN is the ability to use pooling layer Giering, 2016), Convolutional Recurrent Neural Network (Ordóñez
to reduce training data dimensions and make it translational in- & Roggen, 2016) and an ensemble of homogenous convolutional
variant to changes and distortion (Ronao & Cho, 2016). The algo- neural network features (Ijjina & Mohan, 2016).
rithms is capable of learning long range and repetitive activities In human activity recognition, deep learning is used in diverse
through multi-channel approach (Zeng et al., 2014). Convolutional tasks such as estimating changes in the movement pattern for the
Neural Networks are more inclined for image processing, therefore, elderly (Yi, Cheng, & Xu, 2017), labelling of human activity se-
sensor data are converted to image description to support extrac- quence (Yao, Lin, Shi, & Ranasinghe, 2017), recognition of emotion
tion of discriminative features (Sathyanarayana et al., 2016b). Con- in people in need using electroencephalogram (EEG) (Yanagimoto
volutional Neural Network are deployed to solve the problem of & Sugimoto, 2016) and health anomaly detection using physiolog-
uncertainty in sensor measurement and conflicting correlation in ical signals. To efficiently achieve these, require automatic feature
high dimensional sensor data. However, CNN require high num- representation. Therefore, deep learning methods provide effective
ber of hyper-parameter tuning to achieve optimal features. Further- features representation approach to improve classification errors
more, it is challenging to support effective on-board recognition of and reduce computational complexity in human activity recogni-
complex activity details. Section 4.2.1 provide comprehensive re- tion. For instance, the variants of Restricted Boltzmann Machine
view of Convolutional Neural Networks implementation for human methods play vital role in features dimension reduction and au-
activity recognition. Finally, Recurrent Neural Networks are applied tomatically discover discriminative features using a layer by layer
to model temporal dynamics in sensor data, thus enable modelling pre-training to increase recognition accuracy. Restricted Boltzmann
of complex activity details. RNN such as Long Short Term Mem- Machine provides an excellent method for learning improved fea-
ory are efficient at creating global temporal dependencies in sen- tures from unlabeled data and then pre-trained for complex ac-
sor data. The major issue in Recurrent Neural Networks especially tivity recognition. The high-order dependencies and localisation
long short term memory is the high computation time due to large among group activities features are extracted with different deep
number of parameter update. Techniques such as high throughput learning methods (Alsheikh et al., 2015).
parameter update approach may help to reduce computation time Sensor data processing are classical time series learning and re-
(Inoue, Inoue, & Nishida, 2016). quire high input sensor data adaptation to enable efficient process-
Table 2 summarises the recent applications domain in mobile ing. Mobile and wearable sensor data generate time series sen-
and wearable sensor based human activity recognition, strength sor data in one dimension (1D) (Zeng et al., 2014). It is chal-
and weakness of each deep learning methods, placing emphasis lenging to processing motion sensor with high dimensional deep
on sensor data processing. Furthermore, the categorisation of each learning architectures. Two approaches have been proposed to con-
method for human activity recognition is presented in Section 4. vert the sensor streams to fit into deep learning algorithms. These
include channel or model based approaches. Channel based ap-
4. Deep learning approaches for human activity recognition proach utilise the sensor dimension as the dimension of the net-
using mobile and wearable sensor data work architecture and extract features from each axis for activ-
ity recognition and fall detection (Khan & Taati, 2017; Ordóñez &
Research on the use of deep learning for feature representa- Roggen, 2016). The sensor axes are used to perform 1D convolu-
tions and classification is growing rapidly. Generally, deep learn- tion for extraction of salient feature and then combined at the
ing methods can be subdivided into generative model, discrim- fully connected layers (Sathyanarayana et al., 2016a). Model based
Table 2
Deep learning methods.
Methods Descriptions Strengths Weaknesses Recent application in human

activity monitoring and
detection
Deep Belief Network Has directed connection at the Unsupervised training with Mobile and wearable sensor Activity of daily living (ADL)
lower layer and undirected unlabelled sensor streams on-board training of the localisation, detection of
connection at two topmost which is naturally available network is computationally posture and hand gestures
layer through cyber-physical systems complex due to extensive activities in Alzheimer.
and Internet of Things and parameters initialization
initialisation prevent process
convergence at local minima
Deep Boltzmann Has undirected connection at Allow feedback mechanism for Due to resource constraint Diagnosis of emotional state in
Machine every layer of the network more robust feature extraction nature of mobile devices, Joint elderly and detection of
through unsupervised training. optimisations are required to irregular heartbeats during
reduce operation overhead and intensive exercise.
execution cost. However, DBM
joint optimisation is practically
difficult to achieve
Denoising autoencoder Enable correct reconstruction Robust to corrupted sensor High computational time, lack Automatic detection of activity
of corrupted input values data streams of scalability to high of daily living (ADL).
dimensional data, rely on
iterative and numerical
optimisation and high
parameter tuning (M. Chen, Xu,
Weinberger, & Sha, 2012)
Sparse Autoencoder Impose sparsity term to the Produce more linearly High computational time due Health rate analysis during
loss function to produce robust separable features to numerous forward pass for intensive sports activities and
features that are invariant to every example of the data health monitoring
learning applications sample (Ng, 2011)
Contractive Add analytic penalty instead of Reduced dimensional features Difficult to optimise and greedy Activity of daily living (ADL),
autoencoder the stochastic penalty to the space and is invariant to pre-training does not find user location and activity
reconstruction error functions changes and local dependencies stable nonlinear features context recommendations
especially for one layer
autoencoder (Schulz, Cho,
Raiko, & Behnke, 2015)
Sparse Coding Over-complete basis for The use of sparse coding Efficient handling and Representation of energy
reducing the dimensionality of method for dimensionality computation of feature vectors related and health monitoring
data as linear combination of reduction of input data helps are non-trivial smart homes and Activity of
basis vector to minimise computational (Harandi, Sanderson, Hartley, & daily living (ADL)
complexity Lovell, 2012). It is also difficult
to develop deep architecture
with sparse coding
(He, Kavukcuoglu, Wang,
Szlam, & Qi, 2014)
Convolutional Neural Deep neural network with Widely implemented in deep Require large dataset and high Predict relationship between
Network interconnected structure learning with a lot of training number of hyper-parameter exercises and sleep patterns,
inspired by biological visual strategies proposed. tuning to achieve optimal automatic pain recognition
cortex Automatically learn features features. Maybe difficult to during strenuous sports
from raw sensor data. support effective on-board activities, energy expenditure
Moreover, CNN is invariant to recognition of complex activity estimation and tracking of
sensor data orientation and details. personal activities.
change in activity details.
Recurrent Neural Neural network for modelling Used to model time Difficult to train and suffer Model temporal patterns in
Network sequential time series data. dependencies in data from vanishing or exploding activity of daily living (ADL),
Incorporate temporal layer to gradients. In case of LSTM, progressive detection of activity
learn complex changes in data require too many parameter levels, fall and heart failures in
updates. Large parameter elderly.
update is challenging for
real-time activity predictions.
methods use temporal correlation of sensor data to convert the adaptation approaches to automatically extract relevant features
sensor data into 2-D image descriptions and apply 2-D convolu- from raw sensor data.
tion operation to extract features. These are common in Convolu- In this section, we discuss recent studies for deep learning im-
tional Neural Network for human activity recognition (Jiang & Yin, plementation of human activity recognition for mobile and wear-
2015; Ravì, Wong, Lo, & Yang, 2017). For instance, Ravì, Wong, Lo, able sensors. In Fig. 6, these methods are depicted while subse-
et al. (2017) propose spectrogram representation to transform the quent sections outline their uniqueness for feature extraction in
motion sensor data (accelerometer and gyroscope) into local tem- mobile and wearable sensor based human activity recognition.
poral convolution to reduce computational complexity. The types
of input adaptation employ for motion sensor in human activ- 4.1. Generative deep learning methods
ity recognition depends application domains. Other works mod-
ified the convolutional kernel of Convolutional Neural Network As stated earlier, generative deep learning methods model in-
to capture temporal dependencies from multiple sensors (Chen & dependent or dependent distributions in data and high order cor-
Xue, 2015). Therefore, previous studies on deep learning imple- relation by identifying the joint statistical distribution with asso-
mentation for human activity recognition adopt these input data ciated classes. In the past decade, various studies have been con-
Fig. 6. Taxonomy of recent deep learning methods for human activity recognition.
ducted using generative feature extraction models for human activ- (Zhang, Wu, & Luo, 2015a,b,c,d). However, the use of large window
ity recognition. Here, we analysed these and their implementation size and storing previous data to provide contextual information in
advantages some of the studies aid increased computational time and mem-
ory usage. Deep Belief Network has also provided excellent means
to model temporal dependencies and observable posterior distri-
4.1.1. Deep restricted Boltzmann machine methods bution in sensor data with Hidden Markov model for diagnosis
Pioneering the use of deep learning based generative fea- and recognition of emotions state in elderly using wearable sen-
ture extraction for human activity recognition was started by sor worn on the patients’ scalp (X. Jia, Li, Li, & Zhang, 2014; Zhang
Plötz et al. (2011) when they proposed the performance evalu- et al., 2015b,d). Also, Z. Y. Wu, Ding, and Zhang (2016) proposed
ation of different generative feature extraction and dimensional- unsupervised feature extraction and recognition of irregular heart
ity reduction techniques such as autoencoder, principal compo- beat during intensive exercise by stacking various layers of Re-
nent analysis, empirical cumulative distribution function and sta- stricted Boltzmann machine. The stacked layers enable hierarchical
tistical features. An extensive experiment using sensor based on extraction of discriminative features that clearly describe complex
public datasets showed that autoencoder outperforms other fea- activity details. The objective is to provide automatic health mon-
ture extraction techniques including handcrafted features. A num- itoring in special cases such as brain activity detection (Electroen-
ber of other deep learning methods for human activity recognition cephalogram), eye movement (Electrocochleogram), skeletal mus-
have since followed suit. For instance, the deep belief network pro- cle activity (Electromyogram) and heart rate (Electrocardiogram).
posed by Hinton et al. (2006) was used to extract hierarchical fea- This will ensure appropriate independent living and overall health
tures from motion sensor data and then model stochastic tempo- status for the elderly (Längkvist, Karlsson, & Loutfi, 2012; Z. Y. Wu
ral activity sequence using Hidden Markov Model (Alsheikh, Niy- et al., 2016; H. Xu & Plataniotis, 2016).
ato, Lin, Tan, & Han, 2016). The work was later extended for on- Zhao and He (2014) explored implementation of deep Restricted
board mobile phone implementation using mobile Spark platform Boltzmann Machine for detection of hand activity in elderly with
(Alsheikh et al., 2016). Also, studies by Yalçın (2016) and L. Zhang, Alzheimer’s disease using Electroencephalogram dataset collected
Wu, & Luo (2015a) introduce deep belief network for online and with wearable devices worn by patients. They leverage on in-
real-time feature extraction for human activity recognition. How- cremental learning and support vector machine to classify what
ever, due to the computationally intensive nature of deep learning, features may lead to accurate diagnosis of the disease. In re-
the algorithm was trained offline with generative backpropagation cent study, Bhattacharya and Lane (2016) investigated smartwatch-
initialized parameters and activity classification done with SoftMax centric activity recognition and the possibility of implementing
Regression. Deep learning has also provided feature representation deep learning in wearable devices. They concluded that GPU-
for the online classification task, contextual information provision enabled smartwatch could provide deep learning implementation.
for sensor and real-time recognition of simple to complex activi- The framework implemented on Snapdragon 400 SoC wristwatch
ties details using datasets collected with the aid of mobile devices
achieved high accuracy for common daily activity such as hand thermore, Zhou et al. (2015) proposed stacked autoencoder for fea-
gesture, indoor/outdoor localisation, and transport model using ture extraction for Android smartphone based motion recognition
public datasets. Another key study was presented by Fang and using sensor data modalities with high-performance accuracy. In
Hu (2014), to learn automatic features for recognition of human addition to checking human activity to promote a healthy life, mo-
activities in constrained environment. The dataset was gathered for bile sensor data can further help in the diagnosis of lifestyle re-
a fifty (50) day period, leveraging on current and previous activity, lated illnesses. Related work for such application was recently pro-
and the duration of the activity to ascertain the individual activi- posed by Unger, Bar, Shapira, and Rokach (2016) using stacked au-
ties. The problem of recognising interleaved and overlapped activi- toencoder. The proposed stacked autoencoder was developed for
ties was examined by Radu et al. (2016), for multimodal and Deep recognition and recommendation of online based activity leverag-
Boltzmann Machine based human activity recognition using pat- ing mobile sensor data. The deep learning method helped to re-
tern mining. With this, the unannotated activity can be discovered duce the dimensionality of the data and select the feature that
by deploying sensors of different modalities. best provides the context-aware recommendation, user location
and users preference. Stacked autoencoder has also been extended
4.1.2. Deep Autoencoder methods to generate a sequence of time series to characterise human move-
Autoencoder, another generative feature learning technique has ment pattern based on time elapse window properties (Munoz-
also dominated human activity recognition landscape. For instance, Organero & Ruiz-Blazquez, 2017). Related implementation for fall
Plötz et al. (2011) had earlier argued the superiority of autoen- detection using sensor data generated by radar was presented in
coder over PCA, ECDF and statistical feature extraction methods. Jokanovic, Amin, and Ahmad (2016). The stacked autoencoder pro-
Other researchers have also developed autoencoder techniques for vides mechanism to reduce the dimensionality of the data into
human activity recognition. Recent studies by Hasan and Roy- lower dimensional features that are feed into SoftMax regression
Chowdhury (2014, 2015) propose the use of sparse autoencoder for fall identification. The use of dimensionality reduction strate-
for human activity recognition. The algorithm was proposed to gies helps to reduce computational complexity notably for mobile
learn features from continuous data streams and then activity de- based implementation.
tails were classified using multi-logistic regression classifier (Soft- Stacked denoising autoencoder when combined with active
Max). Learning of features in stream sensors are very challeng- learning provide excellent means for automatic labelling and fea-
ing due to the scarcity of label data, class invariant and con- ture extraction for activity recognition and heart rate analysis dur-
cept drift. However, with incremental learning and sparse autoen- ing intensive exercise. Moreover, stacked denoising autoencoder
coder, they automatically learn features without relying on man- implementation are important for morbidity rate prediction (Al
ually annotated data. Performance evaluation of sparse autoen- Rahhal et al., 2016; Song, Zheng, Xue, Sheng, & Zhao, 2017). There
coder, deep autoencoder and principal component analysis was ex- is a great need to enable independent living for elderly in different
amined by Liu and Taniguchi (2014). They observed that the use parts of the world due to the high rate of ageing populations. With
high depth deep sparse autoencoder enable extraction of more such assistance, elderly citizens can function optimally by utilis-
discriminative features compared to deep autoencoder and PCA ing sensor-equipped smart homes. One major challenge is how to
using a dataset from CMU Lab. In Li, Shi, Ding, and Liu (2014), increase the performance of the algorithm and automatically ex-
three basic autoencoder methods were evaluated for human activ- tract feature vectors. More so, obtaining labelled data that will
ity recognition from data collected using smartphones. They con- be exploited by features engineers is difficult. To solve the prob-
cluded that sparse autoencoder outperformed other feature learn- lem and improve the performance of human activity recognition
ing techniques in terms of accuracy. However, due to the small in the smart home environment, Wang, Chen, Shang, Zhang, and
size of the smartphone dataset and computational platform used in Liu (2016) proposed denoising autoencoder techniques to learn un-
the study, the performance cannot be accurately generalised. Sim- derlying feature representation in sensor data and then integrate it
ilarly, Harasimowicz (2014) evaluated effects of pre-processing on with a classifier trained into single architecture to obtain powerful
the performance of generative models for feature extraction, exam- recognition model. In general, autoencoder methods have demon-
ining algorithms comparatively using sparse autoencoder and con- strated excellent approaches for automatic feature representation
cluded that pre-processing has a strong influence on the perfor- to learn latent feature representation for human activity monitor-
mance of activity classification especially normalisation techniques. ing and detection approach. Generally, stacked autoencoder pro-
Besides works that parameters evaluation of autoencoder for vide compact feature representation from continuous unlabelled
and preprocessing for human activity recogniton, other studies sensor streams to enable robust and seamless implementation of
have further examined mobile based implementation of stacked human activity recognition system.
autoencoder for human motion analysis using motion sensors (ac-
celerometer, gyroscope, gravity sensors etc.) with high perfor- 4.1.3. Sparse coding methods
mance accuracy (Zhou et al., 2015). Similarly, Wang (2016) extracts Sparse coding proposed in Olshausen and Field (1997) provides
features from the accelerometer and magnetic sensors using con- a means to reduce sensor data dimension and represent them as
tinuous autoencoder for the development of automatic human ac- an efficient linear combination of basis vectors. Due to the efficient
tivity recognition. The proposed continuous autoencoder adds ran- data representation ability of sparse coding, a number of stud-
domness and converts the high dimension inputs into low dimen- ies have used it to develop feature extraction and representations
sional vectors by encoding and decoding process at the hidden for human activity recognition. For instance, sparse coding method
layers. To increase the learning rate of the algorithm, stochastic was presented Zhu, Zhao, Fu, and Liu (2010) to convert feature
gradient descent optimisation was introduced in the hidden layer, in activity recognition into linear combination vector trained with
and the algorithm was compared with statistical features with en- dictionary algorithm. Additionally, Bhattacharya, Nurmi, Hammerla,
hanced performance obtained. Shared-based autoencoder for sep- and Plötz (2014) examined the use of sparse coding algorithm
aration of multiple input modalities sensors into hierarchical com- trained on self-taught theorem and codebook basis for combi-
ponent was proposed by Shahroudy, Liu, Ng, and Wang (2016). In nation of feature vectors. The sensor data were converted into
the study, factorised input modalities were stacked to convert com- a linear combination, and the dimension was reduced to gener-
plex and nonlinear input representation into linear vectors for clas- ated movement patterns computed from raw sensor signals. The
sification. The main advantage of this method is its robustness to algorithm outperformed other well-known dimensionality reduc-
noise and ability to extract hierarchical and complex features. Fur- tion feature learning algorithms such as PCA and semi-supervised
En-co Training. Sparse Coding was also used to pre-process and and Cho (2015). In their study, Convolutional Neural Network was
learn basic function that captures high representation in sen- deployed to extract hierarchical and translational invariant fea-
sor data. Then, activity details were classified using neural net- tures from accelerometer and gyroscope sensor data and activ-
work classifier for wireless sensor network based health moni- ity details classified using Multinomial Logistic regression (Soft-
toring (Guo, Xie, Bie, & Sun, 2014). A major problem in activ- Max). However, the method failed to capture temporal variance
ity recognition is how to solve the problem of intra-class and and change in complex activity detail and generalisation to differ-
inter-class variation and complex nature of human body move- ent activity models. Furthermore, intra-class and inter-class varia-
ment (Bulling et al., 2014b). To minimize intra-class and inter-class tions can be solved by incorporating time-frequency convolution
variation, Zhang and Sawchuk (2013) proposed sparse representa- which was not implemented in the study. In study by Yuqing
tion techniques that employ the use of an over-complete dictio- Chen and Xue (2015), instead of developing new CNN architecture
nary to represent the human signal as a sparse linear combination modified the convolutional kernel using transfer learning to suit
of activity classes. In the algorithm, class membership was deter- the tri-axial characteristics of acceleration signal for human activ-
mined by solving the L1 minimisation problem. The authors com- ity recognition. While Charalampous and Gasteratos (2016) exam-
pare the technique with other established classical machine learn- ined the use of the convolutional neural network for online deep
ing method (logistic regression, multinomial regression and deci- learning feature extraction using the whole data sequence. More-
sion tree) with impressive results obtain with sparse coding. Sparse over, they introduce Viterbi algorithm using optimisation criterion
coding methods provide the possibility for constrained linear cod- and a network of computational nodes in hierarchical form to in-
ing representation of energy-related activities in smart home en- crease performance of the network. However, the proposed ap-
vironments using sensor streams. Therefore, sparse coding inher- proach applied entire sample of the sensor dataset to implement
ently apply sparse dictionary to reduce manual annotation of data the CNN and this may increase the computation time for mobile
(Q. Zhu, Chen, & Soh, 2015). and wearable devices implementation. On the other hand, Ha, Yun,
and Choi (2015) proposed a 2-D kernel convolutional neural net-
4.1.4. Stacked deep Gaussian methods work to capture local dependencies over time and spatial depen-
Recently, various studies have developed deep learning model dencies over sensors and this is important where multiple sensors
by stacking a classical generative model to form a deep archi- are attached to different part of the body. When using 1-D ker-
tecture. Typical examples are Gaussian process classifier (X. M. nel convolution, it will be difficult to capture features from differ-
Wang et al., 2016), molecular complex detection method (Lu et al., ent sensor modalities. The use of a convolutional neural network
2016), and the Deep Gaussian Model. The Gaussian process can also predict the relationship between physical exercises and
model provides unsupervised feature extraction by stacking sev- sleep pattern using accelerometer and gyroscope sensors. In re-
eral layers of Gaussian processes to produce robust features. cent study, Sathyanarayana et al. (2016b) observed that convolu-
Lu et al. (2016) explored the issue of gathering huge amount tional neural network outperformed handcrafted features in terms
of sensor data, complex and diverse activities by proposing of robust feature generation, high dimensional data and classifica-
the molecular complex detection method. The technique was tion accuracy when applied to predict the link between exercises
first introduced to study protein interaction by Bader and and sleep. Furthermore, similar studies have comparatively ex-
Hogue (2003) and the authors extended the algorithm for effec- plore the performances of convolutional neural network and hand-
tive recognition and detection daily activity, product recommenda- crafted features (Egede, Valstar, & Martinez, 2017; Gjoreski, Bizjak,
tion and sports activity using accelerometer data. Recent work by Gjoreski, & Gams, 2015). The experimental analysis showed convo-
Feng, Yuan, and Lu (2017), proposed Deep Gaussian Mixture Model lutional neural network conveniently outperform handcrafted fea-
that adaptively uses multilayer nonlinear input transformation to tures using sensor data generated by wearable devices attached to
extract salient features from motion sensors for human activity the wrist for human activity recognition and automatic pain detec-
recogniton. tion during intensive sports activities. However, wrist sensor place-
However, majority of the generative models have fully con- ment produce irregular movement pattern and it is challenging to
nected layer and cannot capture local and temporal dependencies ascertain best feature combinations to achieve higher performance
in sensor data. In general, generative models have difficult opti- accuracy (Gjoreski, Gjoreski, Luštrek, & Gams, 2016) for such loca-
misation procedures, computationally expensive training processes tion placement. Therefore, the results obtain by the comparative
and suffer from vanishing gradient problem (G. E. Hinton et al., analysis cannot be active generalised.
2012). Table 3 summarises the different generative deep learning Implementation of deep learning algorithm on low-power wear-
methods for feature extraction in human activity recognition. able devices was recently reported in Ravi, Wong, Lo, and Yang
(2016a,b). They proposed a temporal convolutional neural network
4.2. Discriminative deep learning methods that limits the number of hidden layer connections with few in-
put nodes to avoid computational complexity and enable real-
Discriminative feature learning algorithms are modelled with time activity recognition. Furthermore, the authors applied spec-
posterior distribution classes to provide discriminative powers for tral representation of the inertial sensor to achieve invariance to
activity classification and recognition. In recent years, there has sensor placement, orientation and data collection rate. The au-
been a tremendous growth in the amount of activity recognition thors later reported successive implementation combined hand-
that deploys the use of discriminative deep learning methods. The crafted features to reduce computation time and enhance on-board
methods traverse from Convolutional Neural Network to Recurrent wearable devices implementation (Ravì, Wong, Lo, et al., 2017).
Neural Networks. Researchers in ubiquitous sensing have proposed In other way, scale invariant features and local dependencies can
different algorithms in this regard. In this section of the review, also be achieved through weight sharing in convolutional layer
we discuss these implementations for human activity recognition (Zeng et al., 2014). Weight sharing helps to reduce the number of
using mobile and wearable sensor data. training parameters and computational complexity as closely re-
lated filters share similar weights. The issue of computational com-
4.2.1. Convolutional Neural Networks plexity of convolutional neural network algorithm implemented on
A comprehensive implementation of Convolutional Neural Net- low power devices was also analysed by Jiang and Yin (2015). The
work (CNN) for human activity recognition using mobile phone sensor data were transferred and transformed into activity image
sensor data was reported by Ronao and Cho (2016) and Ronaoo that has descriptive information about the data. The activity im-
Table 3
Generative deep learning methods for human activity recognition.
References Methods Description Advantages
(Alsheikh et al., 2016; Alsheikh et al., 2015; Deep Belief Network Generative model that learn greedy Generate feature from unlabelled sensor
Erfani et al., 2016; Fang & Hu, 2014; X. Jia layer-wise compact representation of data that are invariant to irrelevant
et al., 2014; Längkvist et al., 2012; Z. Y. Wu sensor data and learn high-dimensional variation. Used for nonlinear
et al., 2016; Yalçın, 2016; Zhang, Wu, & manifold from unlabelled data dimensionality reduction of high
Luo, 2015a,b,c,d) dimensional sensor data
(Bhattacharya & Lane, 2016; Radu et al., Deep Boltzmann Machine Generative undirected bipartite graphs Use sparse representation techniques to
2016; Zhao & He, 2014) composed of stochastic visible and hidden reduce data sensitivity. Allow
units. The layers are stacked into deep cross-correlation feature extraction and
layers for extracting salient features from sensor fusion for innate feature
sensor observations representation
(Al Rahhal et al., 2016; Jokanovic et al., Deep Autoencoder Unsupervised feature algorithm that Reduce feature dimensionality, minimise
2016; Munoz-Organero & Ruiz-Blazquez, discovers correlation between features and undesirable activities and extract
2017; Plötz et al., 2011; Shahroudy, Ng, extracts low dimensional representation hierarchical features. Learn identity
Gong, & Wang, 2016; Shimizu et al., 2016; using backpropagation to reconstruct approximation and compressed version to
Unger et al., 2016; Zhou et al., 2015) sensor sample select the most suitable feature vectors
(Song, Zheng, Xue, Sheng, & Zhao, 2017; A. Denoising Autoencoder Generative model for partial reconstruction Learn robust and compressed
Wang et al., 2016) of raw sensor input corrupted by adding representation of features from raw sensor
stochastic mapping term data
(Harasimowicz, 2014; Hasan & Sparse Autoencoder Introduce sparsity penalty to Autoencoder Extract high-level features from
Roy-Chowdhury, 2015; Y. Li et al., 2014; Liu hidden units to extract robust and high-dimensional sensor data and select
& Taniguchi, 2014; Wang, 2016) compressed features from the visible units the most suitable feature by sparsity
penalty to the reconstructed inputs sensor
(Bhattacharya et al., 2014; J. Guo et al., Sparse Coding The techniques help to extract salient Enable location of optimal feature, reduce
2014; Zhang & Sawchuk, 2013; Q. Zhu features and convert feature vectors for computational complexity and time, and
et al., 2015; Y. Zhu et al., 2010) human activity recognition from raw speed up data annotation from unlabelled
sensor data into linear vectors data
(Feng et al., 2017; Jänicke, Tomforde, & Stacked Deep Gaussian Deep fusion of generative and probabilistic Reduce number of parameters and model
Sick, 2016; L. Liu, Cheng, Liu, Jia, & models models for nonlinear transformation and complexity during feature extraction.
Rosenblum, 2016; X. M. Wang et al., 2016) adaptive extraction of salient and robust Furthermore, helps to convert high
features from sensor data. dimensional vectors to enhance complex
activity detection
age is then transferred to the deep convolutional neural network typical motor movement (SMM) is challenging due to intra-subject
to extract discriminative features. They noted that to reduce com- and inter-subject variability, and may portray different degree of
putational complexity, there is a need to adopt carefully chosen mental and physical health behaviour. For this, the convolutional
techniques such as feature selection and extraction, sensor selec- neural network has been utilised to learn movement such as hand
tion and use of frequency reduction. tapping, body rocking or simultaneous combination of body move-
For full implement of automate activity recognition techniques ment to detect stereotypical motor movement (Rad et al., 2015;
for wearable, Vepakomma, De, Das, and Bhansali (2015) proposed Rad & Furlanello, 2016). In the same way, studies conducted by
“A-Aristocracy”, a wristband platform to recognise simple and com- Castro et al. (2015) and Singh, Arora, and Jawahar (2016) developed
plex activity using a Deep Neural Network (DNN) classifier for the the first person and egocentric activity recognition using the wear-
elderly health monitoring. The propose platform was tested for its able sensor. They combined contextual information and egocentric
performance on detection of daily living and instrumental activity cues to capture human motion and extract robust and discrimina-
of daily living (cooking, washing plates, doing laundry) (ADL/IADL). tive features using the convolutional neural network. The incorpo-
The use of wearable sensors ensures the privacy of the elderly are ration of cues and contextual information enable the techniques to
maintained, which is a big issue when camera-based sensors are capture time-dependent activities and variation in viewpoints.
deployed for activity recognition. Moreover, the work employed af- Conversely, J. Zhu, Pande, Mohapatra, and Han (2015) exam-
fordable wearable devices and multimodal information such as lo- ined how features extracted by a convolutional neural network
comotion sensing, environmental condition and contextual location can lead to the high estimation of energy expenditure during in-
signal sensing to achieve high recognition accuracy. However, the tensive physical exercises. Energy expenditure estimations enable
study only used a Deep Neural Network with two layers for classi- tracking of personal activity to prevent chronic diseases common
fication and extracted statistical and manual features defeating the in individuals living a sedentary lifestyle. Combining accelerom-
purpose of automatic feature extraction. Sheng et al. (2016) pro- eter sensor and heart rate data, they developed online mecha-
posed quick and short time activity recognition using convolutional nisms to track daily living activity. Energy expenditure prediction
neural network for wearable devices. Long time activities comprise was done on the feature extracted using a backpropagation neu-
series of short-term activity which is segmented using short win- ral network. However, the dataset used for prediction were col-
dow length. Therefore, by constructing an over-complete pattern lected from sensors placed at the waist which does not indicate
library of long time activities into short time activities using slid- movement location. Therefore, there is need to test data collected
ing window techniques, feature extraction was implemented of- from sensors placed on the wrist, chest or ankle that accurately de-
fline and learning for recognition was performed online to ensure tect and monitor total body movements. G. Liu et al. (2016) mod-
real-time and continuous activity description. However, the use of elled binary sensor based human activity recognition by converting
short time window length may result in loss of vital information the sensor value into a binary number and extracting discrimina-
for complex activity recognition (O. Banos et al., 2015). tive features with convolutional neural network. The far-reaching
Autism Spectrum Disorder can affect the functional ability and effect of the study is the ability to reduce computational time
activity performance by individuals, social interaction and com- using fewer binary values during feature extraction from sensor
munication ability. Recognition of such activities can help seam- data. Gait assessment based Convolutional Neural Network in a pa-
less management of the condition. However, detection of stereo- tient with Sclerosis was presented by J. Q. Gong, Goldman, and
Lach (2016) with body-worn sensors. Convolutional Neural Net- from overfitting as it performed better on training data than on
works were implemented to learn the temporal and spectral asso- testing data. Therefore, the use improve regularisation techniques,
ciation among the multichannel time series motion data and learn increase the training datasets and use batch normalisation (Ioffe
holistic gait patterns for robust and efficient feature representation. & Szegedy, 2015) may enhance the performance of the proposed
In related study, Eskofier et al. (2016) propose deep learning al- model. Moreover, adding artificial noise to the data may also im-
gorithm for assessment of movement disorders for patients with prove the prediction accuracy (G. E. Hinton et al., 2012).
idiopathic Parkinson diseases. Patients were attached with inertial
measurement unit sensor nodes to collect accelerometer data and 4.2.2. Recurrent Neural Networks
extract salient features with two convolutional neural network lay- Human activity recognition is a classical time series classifi-
ers and achieved 90.9% accuracy. However, due to the limited num- cation problem made up of complex motor movements and vary
ber of sensor data used for training the Convolutional Neural net- with time. Capturing the temporal dynamic in movement pattern
work, it may be challenging to generalise the performances accu- will help to model complex activity details and enhance the per-
racy achieved. formance of recognition algorithms. Convolutional neural network
In some cases, convolutional neural network are optimised with architecture can only extract translational invariant local features
classical machine learning techniques such as meta-heuristic al- but become ineffective when modelling global temporal dependen-
gorithms to model hyper-parameter tuning to obtain higher ac- cies in sensor data. However, Recurrent Neural Network (RNN) is
curacy. This techniques were recently implemented for detection naturally designed for time series data in which sensor data is a
of Parkinson disease and measurement of calories consumption to prominent part.
combat obesity and recommend physical activities (Pereira, Pereira, Recently various studies have explored different recurrent neu-
Papa, Rosa, & Yang, 2016; Pouladzadeh, Kuhad, Peddi, Yassine, ral network models for modelling human activity recognition. For
& Shirmohammadi, 2016). In a related research for the elderly, instance, studies such as (Chen, Zhong, Zhang, Sun, & Zhao, 2016;
Yin, Yang, Zhang, and Oki (2016) proposed the cascade convolu- X. Ma, Tao, Wang, Yu, & Wang, 2015) proposed long short term
tional neural network for monitoring of heart-related diseases us- memory (LSTM) for feature extraction to recognise activity of daily
ing impulse radio ultra-wideband radar data. Different convolu- living using WISDM data, a publicly available dataset by Wire-
tional neural network modules were implemented to extract ro- less Sensor Data Mining Lab (Kwapisz, Weiss, & Moore, 2011)
bust ECG features and impulse radio ultra-wideband radar feature, and achieved a classification accuracy of 95.1%. Despite the high
which are then combined to form a cascade to distinguish normal performance obtained, the result cannot be generalised due to
heart bits from abnormal ones. The essence of the cascade is to the simplicity of the specified activities and small sample sizes
take care of the different sampling rate and dimensionality of the of the dataset. Therefore, larger datasets are required to improve
various data source. Also, Zhang and Wu (2017) proposed the use the robustness of the algorithm. Large-scale study on the pre-
of the convolutional neural network for automatic stage sleep clas- diction of activity of daily living was examined by Moon and
sification using electrocardiography data. Hamm (2016) with Long Short Term Memory to capture the ran-
Other similar Convolutional Neural networks approach were domness in activity patterns and model the temporal dependencies
lately implemented for automatic data labelling, variable sliding using multi-step look ahead approach. Long short memory pro-
window segmentation and multi-sensor and multi-channel time vides the possibility to automatically detect and characterise eat-
series fusion. For instance, Zebin, Scully, and Ozanyan (2016) intro- ing pattern using the wearable necklace, and early or progressive
duce multichannel sensor time series to acquire sensor data from detection of activities (S. Ma, Sigal, & Sclaroff, 2016; Nguyen, Co-
body-worn inertial sensors. The authors modelled feature extrac- hen, Pourhomayoun, & Alshurafa, 2016). However, issues on the
tion using a convolutional neural network and monitored different modelling of motion movement of head and neck are difficult
hyperparameter setting at the pooling layer, rectified linear units as piezoelectric sensors do not detect such motions. Furthermore,
and max pooling to achieve high accuracy. R. Yao et al. (2017) pro- Long short term memory methods provide technique to rank activ-
posed the use of CNN for dense labelling in human activity recog- ity progression and penalise incorrect activity prediction that may
nition. The use of dense labelling provides an approach to avoid lead to serious consequence especially for detection of fall in el-
missing information, and the algorithm was implemented us- derly (S. Ma et al., 2016).
ing publicly available datasets with an overall accuracy of 91.2%. Inoue et al. (2016) investigated the use of the deep recurrent
Another important applications of convolutional neural network neural network for human activity recognition in real time sce-
is in multi-sensor fusion for human activity detection. Fusion nario. They looked at the best combination of architecture and
of multiple sensor are essential for enhanced activity recogni- optimal parameter values for increased performance. The authors
tion rate (Gravina, Alinia, et al., 2017). However, many issues noted that, increasing the layer of deep RNN will greatly increase
are yet unresolved, such as imprecision and uncertainty in mea- computational time and memory usage and recommend a three-
surement, noise and conflicting correlation, high data dimensions layer architecture for optimal performance. To reduce memory us-
and the best techniques to select the fusion level. To that effect, age, (Edel & Köppe, 2016) developed optimised binary version of
Jing et al. (2017) propose adaptive multi-sensor fusion using the Bidirectional LSTM for human activity recognition in a resource
deep convolutional neural network. The proposed techniques learn constrained environment such as mobile or wearable devices.
features and optimise the combination of sensor fusion level such The extended version of Bidirectional LSTM (Graves & Schmidhu-
as extraction, selection, data, features, and decision fusion levels to ber, 2005) achieved real-time and online activity recognition by
build complex recognition patterns for higher activity detections. applying binary values to the network weight and activation pa-
These processes go through from the lower layer of the network to rameters.
the higher layer and implement the robust feature extraction pro- Subsequent studies introduced other aspects of the recur-
cess. rent neural network. Notably, Palumbo, Gallicchio, Pucci, and
Automatic feature extraction in wearable sensors with the con- Micheli (2016) proposed the Recurrent Neural Network for real-
volutional neural network provide means to monitor beach vol- time human activity recognition trained with echo state network
ley ball players’ skills from a tri-axial accelerometer (Kautz et al., leveraging smartphones and Reciprocal Received Signal Strength
2017). To achieve that, the authors deploy data collected from 30 (RSS). Echo State Network is a Recurrent Neural Network with a
subjects wearing sensors attached to the right hand with a thin non-trainable reservoir and linear readout in which the weights
wristband. However, the proposed architecture of the CNN suffered are randomly generated during training (Rodan & Tino, 2011).
However, a number of issues have deterred the practical applica- recognition tasks due to the limited amount of training and test-
tion of the Echo State Network. These include the unclear prop- ing data. Recently, other studies incorporated the convolutional
erties of the reservoir and lack of training strategies to achieve neural network and sparse coding to produce sparse representa-
optimal performance but rely on a game of chance. Furthermore, tion and reduce computational time. This can be seen in recent
Choi, Schuetz, Stewart, and Sun (2016) develop the Gated Recur- work by Bhattacharya and Lane (2016), which proposed sparse
rent Unit Model (Cho et al., 2014) to detect heart failure from clin- coding-based convolutional neural network for mobile based ac-
ical time series data. Gated recurrent unit is an RNN model that tivity recognition. To reduce computation time, memory and pro-
is similar in structure to LSTM but with simple parameter update cessor usage, they introduced sparsification of the fully connected
and recently achieved superior results in similar classification tasks layer and separation of the convolutional kernel. The techniques
(Zaremba, 2015). ensure full optimisation of CNN to be implemented for mobile de-
vices.
Another work for hybridization of deep learning methods for
4.2.3. Other discriminative deep learning models
robust features extraction was reported in G. Ma et al., (2016). In
Various studies have also proposed other discriminative feature
the work, the authors proposed the fusion of features extracted
extraction methods for human activity recognition. For instance,
with deep autoencoder to obtain more abstract features. While
studies in Ponce, de Lourdes Martínez-Villaseñor, and Miralles-
Khan and Taati (2017)proposed a channel-wise ensemble of au-
Pechúan (2015) and Ponce, Martínez-Villaseñor, and Miralles-
toencoder to detect unseen falls using wearable devices. In the
Pechuán (2016) proposed and analysed the use of Artificial Hy-
study, stacked autoencoder was used to learn accelerometer and
drocarbon Network (AHN) for human activity recognition. Arti-
gyroscope data separately, using interquartile range and then train-
ficial Hydrocarbon Network is an algorithm inspired by an or-
ing a new autoencoder on data with no outliers to accurately iden-
ganic chemistry that use heuristic mechanism to generate or-
tify unseen fall. Ijjina and Mohan (2016) developed ensemble deep
ganise structure to ensure modularity and stability in activity
learning approach based on Convolutional Neural network by al-
recognition. The algorithm is tolerant to noisy sensor data. How-
tering the inputs and weights of network of each convolutional
ever, it needs to be combined with heuristic feature extraction
neural network to create network structures variabilities and then
and selection techniques to increase recognition time. Similarly,
combined the results with different ensemble fusion techniques.
Rogers, Kelleher, and Ross (2016) exploited deep neural language
Recently, an ensemble of diverse long short term memory (Guan
model for the discovery of interleaved and overlapping activities.
& Ploetz, 2017) was evaluated on publicly available datasets for
The model builds hierarchical activities and captures the inher-
human activity recognition. The proposed method outperformed
ent complexities in activity details. Similarly, Hongqing Fang, He,
other methods in real life activity prediction.
Si, Liu, and Xie (2014) initiated backpropagation techniques to
To recognise and detect complex activity details, there is a need
train feedforward neural for complex human activity recognition
to capture spatial and temporal dependencies involve in human ac-
in smart home environment. Although the algorithm outperformed
tivity recognition. The convolutional neural network and recurrent
the Hidden Markov Model and Naïve Bayes, it requires combined
neural network are important deep learning methods in this re-
handcrafted feature extraction for high-performance accuracy. Y.-
gard. The techniques are common in multimodal and multi-sensor
L. Chen et al. (2016) proposed manifold elastic network for feature
activity recognition frameworks. X. Li et al. (2017) investigated the
extraction and dimensionality reduction by mapping motion sensor
use of CNN and LSTM for recognition of concurrent activities. The
data from high dimensional to low dimensional subspace through
authors introduced encoder to output binary code prediction that
minimization algorithm. Table 4 summarises recently discrimina-
denotes whether the activity is in progress or not in progress. Fur-
tive model for human activity recognition and their advantages.
thermore, the architecture can accept input from the sensor of dif-
ferent modalities. Similarly, Ordóñez and Roggen (2016) proposed a
4.3. Hybrid deep learning methods convolutional neural network and long short term memory to au-
tomatically learn translational invariant features and model tem-
Various research efforts have been geared toward obtaining ro- poral dependencies in multimodal sensor comprise of accelerom-
bust and effective features for human activity recognition by com- eter and gyroscope sensor. The pooling layer in the network was
bining generative, discriminative or both methods. From the avail- replaced with a recurrent layer (LSTM) that models the temporal
able literature on hybrid implementation, the convolutional neural sequence, whereas the final layer is the SoftMax regression that
network seems to be the best choice method for many studies to produces the class prediction. The technique was compared with
be hybridised with other generative or discriminative models for baseline CNN using OPPORTUNITY and Skoda datasets with 0.61F1
human activity recognition. For instance, Convolutional Neural Net- score performance. The ensemble of Convolutional neural network
work and Denoising Autoencoder (G. Ma, Yang, Zhang, & Shi, 2016), and bidirectional long short term memory (BLSTM) were proposed
Convolutional Neural Network and Sparse Coding (Bhattacharya & for health monitoring using the accelerometer and acoustic emis-
Lane, 2016), Convolutional Neural Network and Recurrent Neural sion data. CNN extract local features, and while BLSTM encodes
Network (Ordóñez & Roggen, 2016; Sathyanarayana et al., 2016b), temporal dependencies and model sequential structure, past and
Convolutional Neural Network and Restricted Boltzmann Machine present contextual information (R. Zhao, Yan, Wang, & Mao, 2017).
(J. Gao, Yang, Wang, & Li, 2016). Furthermore, other authors have also proposed fusion
In most of these studies, the convolutional neural network is along multimodal and multi-sensor lines. For instance,
incorporated to produce hierarchical and translational invariant Song et al. (2016) proposed the fusion of the video and accelerom-
features. To reduce the source of instability and extract transla- eter sensor model using the convolutional neural network and long
tional invariant features, J. Gao et al. (2016) introduce the cen- short term memory. CNN extract spatial-temporal features from
tred factor Convolutional Restricted Boltzmann Machine (CRBM) video data while the LSTM models temporal dependencies fea-
while in Sarkar et al. (2016), a combination of Deep Belief Net- tures from the accelerometer and gyroscope. These feature vectors
work and convolutional neural network were examined for activity were integrated using a two-level fusion approach for egocentric
recognition in prognostic and health monitoring related services. activity recognition. However, the result obtained in multimodal
The authors compare the performance using electroencephalogram fusion performed below expectation due to the small number of
sensor data with deep learning outperforming handcrafted fea- training examples. In Neverova et al. (2016), the authors proposed
tures. However, the result deteriorated when it was tested on four the recurrent neural network and convolutional neural network
Table 4
Discriminative deep learning methods for human activity recognition.
References Methods Description Advantages
(Castro et al., 2015; Charalampous & Convolutional Neural Multilayer neural network that combines Extract hierarchical and translational
Gasteratos, 2016; Chen & Xue, 2015; Network convolution and pooling operations to invariant features from sensor data with or
Eskofier et al., 2016; M. Gjoreski et al., extract translation invariant, temporally without pre-processing to enhance
2016; J. Q. Gong et al., 2016; Ha et al., correlated and hierarchical feature vectors performance and recognition accuracy
2015; Jiang & Yin, 2015; Jing et al., 2017; from sensor data. The architecture use
Kautz et al., 2017; G. Liu et al., 2016; Page convolutional operation to handle and
et al., 2015; Pereira et al., 2016; extract local features and cancel the effect
Pouladzadeh et al., 2016; Rad et al., 2015; of translation and displacement in sensor
Ravi et al., 2016a,b; C. A. Ronao & S.-B. data
Cho, 2016; Ronaoo & Cho, 2015;
Sathyanarayana et al., 2016b; Sheng et al.,
2016; Singh et al., 2016; Vepakomma et al.,
2015; Yang et al., 2015; R. Yao et al., 2017;
Yin et al., 2016; Zhang & Wu, 2017; Zheng,
Ling, & Xue, 2014; J. Zhu et al., 2015)
(Y. Chen et al., 2016; Inoue et al., 2016; S. Long Short Term Memory Recurrent neural network (RNN) that Capture temporal dependencies and
Ma et al., 2016; X. Ma et al., 2015; Moon & incorporate memory block to overcome complex activities dynamic in raw sensor
Hamm, 2016; Nguyen et al., 2016) backpropagation problem and detect data
activities with long-term temporal
dependencies
(Edel & Köppe, 2016) Binarise-Bidirectional Long Recurrent Neural Network in which the Has low computational complexity and
Short Term Memory network parameters are binary values applicable in resource constrained
trained and evaluated with bits logics environment such as mobile and wearable
devices with low energy resources. The
extracted features are invariant to
distortion and transformation
(Choi et al., 2016) Gated Recurrent Unit Recurrent Neural Network with reduced Gated Recurrent unit has fewer parameters
parameter for detection and recognition of and easy to train
time sensitive events
(Ponce, Miralles-Pechuán, & Artificial Hydrocarbon Nature inspired meta-heuristic and Ability to model noisy and unlabelled data
Martínez-Villaseñor, 2016) Network chemical organic algorithm that organise and also robust to sensor data
activity details in modules characteristics and data point
(Rogers et al., 2016) Deep Neural Model A form of deep learning for modelling Can handle problem of multiple activities
natural language problem. The algorithm is occurring in parallel (interleaved activities)
trained to approximate model distribution
by taking encoding of sensor distribution
and produce posterior distribution of all
possible values
(Y.-L. Chen et al., 2016) Manifold Elastic Network Dimensionality reduction methods that Minimise error mechanisms to select
encode local geometry to find best feature appropriate feature subspace
representation in raw sensor data
to extract feature vectors optimised with shift-invariant dense sensor, merged into global interaction and then extract temporal
mechanism to reduce computation complexity. In order to develop interaction to model signal dynamics.
effective deep learning fusion approach, Hammerla, Halloran, and Various studies have proposed fusion of deep learning model
Ploetz (2016) explored the effect of hyper-parameter setting such and handcrafted features for human activity recognition. Fusion
as regularisation, learning process, the number of architecture of handcrafted features and deep learning are effective for in-
on the performance of deep learning for human activity recog- creased recognition accuracy, real time and on-board human ac-
nition. The authors concluded that hyper-parameters have great tivity recognition in wearable devices. Furthermore, the techniques
impact on the performance of deep architectures and recommend allow extraction of interpretable feature vectors using spectrogram
extensive hyper-parameter tuning strategies to obtain enhance and to capture intensity among data points (Ravì, Wong, Lo, et al.,
activity recognition rate. To develop a multi-fusion architecture 2017). Interestingly, some studies have also found that such fu-
of CNN and LSTM, Morales and Roggen (2016) examined the sion are important means to model lateral and temporal varia-
effect of transfer learning at the network kernel between users, tion in activity details by adaptively decomposing complex activ-
applications domains, sensor modalities and sensor placements ity into simpler activity details and then train the algorithm us-
in human activity recognition. They noted that transfer learning ing radius margin bound for network regularisation and improve
greatly reduced training time and are sensitive to sensor charac- performance generalisation (Liang Lin et al., 2015). In recent work,
teristics, placement and motion dynamic. They utilised the above Alzantot, Chakraborty, and Srivastava (2017) explored generation of
automatic feature representation method to develop a hybrid of artificial activity data by fusion of mixture density network and
CNN and LSTM for extraction of robust features for human activity long short term memory. The approach was proposed to resolve
recognition in a wearable device. In Sathyanarayana et al. (2016b), the issue of lack of training data using mobile phones and discrim-
CNN-LSTM was used to model the impact of sleep on physical inate robust feature vectors. Developing protocol to collect large
activity detection with actigraphy dataset. CNN models robust training data for human activity recognition project is very tedious
feature extraction while LSTM was used to build sleep prediction. and may result to privacy violations. Therefore, the study gener-
Alternatively, a convolutional neural network with Gated Recur- ated synthetic data to augment the training sensor data generated
rent Unit (GRU) was proposed by S. Yao, Hu, Zhao, Zhang, and using mobile phone. Moreover, the developed fusion of mixture
Abdelzaher (2016) for activity recognition and car tracking using density networks and long short term memory will help to re-
accelerometer, gyroscope and magnetometer data. CNN and GRU duce reliance on real training data for evaluation of deep learning.
were integrated to extract local interaction among identical mobile
Table 5
Hybrid deep learning methods for human activity recognition.
References Methods Descriptions Advantages
(J. Gao et al., 2016; Sarkar et al., 2016) CNN, RBM Propose integration of Deep Belief Network Provide automatic feature extraction and
and Convolutional Neural Network for selection without extensive pre-processing
real-time multimodal feature extraction in procedure
unconstrained environment
(Bhattacharya & Lane, 2016) Sparse coding and Automatically produce compact The use of sparse coding helps to reduce
Convolutional Neural representation of features vectors from raw computation time and memory usage by
Networks sensor data for mobile based activity utilising sparsification approach to separate
recognition. fully connected layer and convolutional
kernel.
(Khan & Taati, 2017) Ensemble of Channel-wise Channel-wise autoencoder algorithms Automatically learn generic features from
Autoencoder fusion of autoencoder trained separately raw sensor data.
with accelerometer and gyroscope sensor
data and combine with reconstruction
error values
(Ijjina & Mohan, 2016) Ensemble of Deep Develop fusion of extracted features of Achieve high model diversity and enhance
Convolutional Neural homogenous CNN architecture built by performance generalisation
Networks alternating the initialisation of the network
parameters.
(Guan & Ploetz, 2017; X. Li et al., 2017; Convolutional Neural Propose multimodal and spatial-temporal Suitable for multimodal, Multi-feature and
Morales & Roggen, 2016; Neverova et al., Network (CNN) and feature extraction with CNN and LSTM for multi-sensory for recognition of complex
2016; Ordóñez & Roggen, 2016; Recurrent Neural Networks concurrent activity recognition and concurrent activity details
Sathyanarayana et al., 2016b; Song et al., (RNN)
2016; Zhao et al., 2017)
(S. Yao et al., 2016) CNN, Gated Recurrent Unit Integrate convolutional neural network and Provide low energy consumption and low
(GRU) Gated recurrent unit that exploits local latency services for implementation in
interaction within activities and merges mobile and wearable devices. Gated
them into global interaction to extract recurrent unit has expressible terms with
temporal relationship reduce network complexity for mobile
based implementation
(Lin et al., 2015; Ravi et al., 2016a,b CNN, Conventional feature Combine deep feature learned with CNN Enable real-time on-board implementation
and statistical feature for real-time mobile with reduced feature vectors. The method
based implementation of activity can handle optimal decomposition of
recognition. Also, the fusion provides complex activity details and enhance
effective means of decomposing complex generalisation ability deep learning
activity into sub activities by modelling algorithms for human activity recognition.
temporal variation and extract transition
invariant features.
(Alzantot et al., 2017) LSTM, Mixture Density Deep stacked long short term memory for Distinguish between real and synthetic
Network generation and discriminating artificial data set to improve privacy in data
sensory data in human activity recognition collection
Table 5 summarises the different hybrid deep learning based fea- recognition system with deep learning methods are explained. We
ture extraction techniques for human activity recognition. begin by presenting the training of both deep learning methods
and classification inference algorithm and then the performance
5. Classification algorithms and performance evaluation of evaluation metrics for human activity recognition.
human activities
5.1. Training
Classification is a vital part of human activity recognition pro-
cesses. Classification involves training, testing and use of evalua- Early works using deep neural networks were trained with gra-
tion metrics to measure the performance of the proposed algo- dient descent optimisation where the weights and biases are ad-
rithms. Over the years, different classifiers have been implemented justed to obtain low-cost function. However, training neural net-
in human activity recognition to categorise activity details dur- work with such strategies will cause its output to get stuck in lo-
ing training and testing. The commonly used classifiers are the cal minima due to the high number of parameters involve. To solve
Support Vector Machine (SVM), Hidden Markov Model (HMM), the problem, Hinton et al. (2006) introduced the greedy layer-wise
K-Nearest Neighbour (KNN), and Decision Tress, Neural Network unsupervised pre-training techniques in which the neural network
(NN). In deep learning based human activity recognition, most algorithm is trained one layer at a time then the deep architecture
studies favour multinomial logistic regression (SoftMax) (Ordóñez is fine-tuned in a supervised way with gradient optimisation. In
& Roggen, 2016; Ravi et al., 2016a,b; Song et al., 2016) or Hidden his work, Hinton (2010) showed how to train deep learning algo-
Markov Model (Alsheikh et al., 2015) trained with the deep neu- rithm and set the different hyperparameter settings. Deep learning
ral network for activity recognition. The training process extracts researchers adopt these strategies when validating their methods.
the feature vectors that are fed to the classifiers through fully con- In training deep learning algorithms, the main aim is to find
nected layers to yield probability distribution classes for every sin- network parameters that minimise reconstruction errors between
gle time step of the sensor data (Ravi et al., 2016a,b). The perfor- inputs and outputs (Erfani et al., 2016). Using the pre-training and
mance of the extracted feature vectors is evaluated with pre-set fine-tuning, the networks will learn to extract salient features from
evaluation metrics and access the recognition accuracy and com- sensor data which is then passed to multi-linear logistic regression
putational complexity. Performance metrics such as accuracy, pre- (SoftMax Regression) or any other classifiers to discriminate the
cision, recall and F-measure provide essential information to access activity details. Therefore, numerous regularisation methods have
recognition ability of the features vectors. In this section, training, been proposed to modify the learning algorithm to reduce gen-
classifiers and performance evaluation metrics of human activity eralisation errors by applying hyper-parameter settings to control
Table 6
Sample hyper-parameter setting and optimisation for deep learning training for human activity recognition.
Settings (Ordóñez & (C. A. Ronao & S.-B. (Castro et al., (Jing et al., (Eskofier et al., (Kautz et al., (S. Ma et al.,
Roggen, 2016) Cho, 2016) 2015) 2017) 2016) 2017) 2016)
Learning Rate 0.001 0.01 0.0 0 01 0.05 0.01 0.01 0.1

Momentum 0.9 0.5-0.99 0.9 0.5 0.9-0.999 0.9-0.999 0.9
Size of Mini-batch 100 128 100 20 500 200 100
√ √ √ √ √ √
Dropout
Activation Function ReLU, Tanh ReLU ReLU ReLU ReLU ReLU Tanh
Decay Rate 0.9 0.0 0 0 05 0.0 0 05 0.04 1E-8 1E-8 0.05
Optimisation RMSProp SGD SGD SGD ADAM SGD
Training Epoch 50 0 0 10 0 0 0 0 200 30
Method CNN-LSTM CNN CNN CNN CNN CNN LSTM
the network behaviour. According to Hinton (2010), these hyper- the most popular and is utilised by the majority of the studies re-
parameters include the values of learning rate, momentum, weight viewed (Alsheikh et al., 2015; Jing et al., 2017; Ordóñez & Roggen,
decay, initial values of the weight and weight update mechanism. 2016) with a probability of dropout ranging from 0.5 to 0.8.
Others are pre-training and fine-tuning parameter values, optimi- In addition to dropout, weight decay techniques such as L1/L2
sation procedures, activation functions, sizes of mini-batch, train- regularisations prevent overfitting by introducing penalty term for
ing epochs, network depth and pooling procedure to use when large weights and this help to improve generalisation and shrink
training convolutional neural networks. In deep learning based hu- useless weights. Studies apply different weight decaying terms
man activity recognition, different studies specify varying values with varying values. Also, optimisation techniques such as batch
of these hyper-parameters relying on the network and size of the normalisation that compute gradients on whole datasets, stochas-
training sensor data. Different hyper-parameter settings that were tic gradient descent (SGD) using each training examples or mini-
recently implemented for mobile and wearable sensor based hu- batch gradient descent that compute update on every mini-batch
man activity recognition is shown in Table 6. Here we present brief will further help to reduce invariance of the parameter update
explanations of these hyper-parameters with examples of value (Ruder, 2016). However, batch normalisation is slow and does not
settings in recent works. allow online weight update. Stochastic gradient provides faster
Learning rate provides the value that shows how much the net- convergence and helps to choose proper learning rate. It is widely
work has learned during neural network training iterations. The applied in deep learning based human activity recognition (Ravì,
learning rates need to be initialised in such a way that it is not Wong, Lo, et al., 2017; Vepakomma et al., 2015; Wang, 2016).
too large or small. A large value will cause the network weight Other optimisation algorithms have also been implemented for
to explode; a value between 0.0 0 01 multiplied by the weight is deep learning training. For instance, Adagrad (Duchi, Hazan, &
recommended. Past studies in human activity recognition using Singer, 2011) apply adaptive learning rate to the network param-
mobile and wearable sensor implement varying values that range eter to improve robustness to Stochastic gradient descent, while
from 0.0 0 01 (Castro et al., 2015), 0.001 (Alsheikh et al., 2015; Kautz (Zeiler, 2012) proposed ADADelta that applied adaptive methods
et al., 2017), 0.01 (Eskofier et al., 2016; Ronao & Cho, 2016), 0.0 5 to decrease the learning rate. Furthermore, to solve the problem
(Jing et al., 2017) to as high as 0.1 (S. Ma et al., 2016). of diminishing weights, algorithms such as RMSProp (Tieleman &
Momentum (Qian, 1999) increases the velocity of learning and Hinton, 2012) and Adaptive Moment Estimation (ADAM) (Kingma
the rate of convergence of deep neural networks. Previous studies & Ba, 2014) were proposed. RMSProp adopts adaptive learning rate
in deep learning based human activity recognition adopted the rec- to solve the diminishing weights issues by adapting different step
ommended values between 0.5 and 0.99 (Kautz et al., 2017; Ronao size for each neural network weights. ADAM applies an exponen-
& Cho, 2016). The size of mini-batch is another important parame- tially decaying average of past square gradient with default values
ter used to avoid overfitting. The mini-batch size divides the train- ranging from 0.9 to 0.999 and momentum of 8E-10. Adaptive op-
ing data into small size of 10 to 100 training set, and then total timisation is important and widely used because of its ability to
gradients are computed using these sizes. When the network is adapt to learning rate and momentum without manual interven-
trained with stochastic gradient descent, there is need to maintain tion. Furthermore, Q. Song et al. (2017) proposed an evolutionary
relative sizes to reduce sampling bias. In activity recognition, too based optimisation algorithm called Ecogeography Based Optimi-
large mini-batches will be the equivalent of using large window sation (EBO) that adaptively optimises the autoencoder algorithm
size, and therefore may increase computation time and miss im- layer by layer to achieve optimal performance. Another important
portant activity details. Therefore, factors such as the size of data optimisation technique is the use of early stopping criteria that
and implementation platform play vital roles in choosing the size monitor errors on each validation set and stop when the validation
of mini-batch (Ronao & Cho, 2016). error stops increasing. Table 6 shows some of the training tech-
Another key insight for improving deep learning model is the niques in some of reviewed studies with their value settings.
use of weight regularisation. Regularising large weight in deep
learning to avoid overfitting is imperative during training due to 5.2. Classification
large parameter updates. Overfitting is monitored by measuring
the free energy of training data (Hinton, 2010). Previous stud- Deep learning algorithms are applied on sensor data to ex-
ies have proposed various regularisation techniques for training tract discriminative and salient features and then flattened and
deep neural networks. For instance, Dropout (Srivastava, Hinton, pass to an inference engine to recognise activities classes. The out-
Krizhevsky, Sutskever, & Salakhutdinov, 2014) randomly deletes puts of the deep neural network model feature at the fully con-
half of the feature values to prevent complex co-adaptation and nected layer of the model are connected with classifiers. The most
increase generalisation ability of the model. Dropout regularisation commonly used classifiers are Multinomial Regression (SoftMax)
technique were recently improved by Wan, Zeiler, Zhang, Cun, and (Alvear-Sandoval & Figueiras-Vidal, 2018; Alzantot et al., 2017;
Fergus (2013) into DropConnect by randomly dropping weight vec- Guan & Ploetz, 2017; Ordóñez & Roggen, 2016; Ronao & Cho, 2016),
tors instead of the activation function. However, Dropout is still Support Vector Machine (Erfani et al., 2016) or Hidden Markov
Model (Alsheikh et al., 2015) and provide probability distribution (ROC) curve. Therefore, the activity can be classified as True Pos-
classes over activity details. Most of the studies reviewed use Soft- itive (TP), True Negative (TN) when correctly recognised or False
Max to model the probability of the activity classes. Positive (FP) or False Negative (FN) when incorrectly classified.
SoftMax is a variant of logistic regression that model Multi- Other performance metrics are derived with True positive or True
class classification (J. Gao et al., 2016; O’Donoghue & Roantree, Negative. These metrics are discussed below:
2015) using cost minimization approach. Therefore, given training Accuracy provides the overall correctly classified instances. It is
sets {(x(i) ,y(i) ), (x(i) ,y(i) ), ...............(x(m) ,y(m) )} with corresponding m la- the sum of correct classification divide by the total number of clas-
bel examples, where y(i) ∈ {1, 2, 3......, k} and x is the input feature sification.
space. The SoftMax parameters are trained by minimising the cost TP + TN
function and then fine-tuned to minimise the likelihood function (4)
TP + FP + TN + FN
and improve adaptability. The cost function with the decay terms
Precision (Specificity) measures the accuracy and provides the
is as stated below.
value based on the fraction of the negative instance that are clas-
θ j x
T (i )
1
m k
λk
n sified as negative.
J (θ )= 1{y(y − j )} log k T (i) + θ jk2 (λ > 0 ) TP
m i x 2 (5)
i=1 j=1 i=1 i=1 j=0
TP + FP
(1)
Recall measures the performance of correctly predicted in-
The fine-tuned algorithm through backpropagation to improve stances as positive instances.
performance is given as:
TP
(6)
θ (k)T x(i)
exp TP + FN
p y(i ) = k/x(i ) ; θ = k (2)
j=1 exp θ
( j )T x(i ) F-Measure (Score), F-Measure is mainly applied in unbalanced
datasets and provides a geometric mean of sensitivity and speci-
The above equation provides the probability of the activity ficity. F-measure
classes with possible values of labels (Yan et al., 2015). Also, (Ronao
& Cho, 2016) noted that the last layer of the convolutional neural Pr ecison.Recall
2. (7)
network that infers activity classes is given as: Pr ecision + Recall
Confusion Matrices: Confusion matrices are important perfor-
exp pL−1 wL + bL
p(c/p) = arg max exp NC (3) mance measure, and the matrix provide the overall misclassifica-
L−1 wk
c=C k=1 exp p tions rate in human activity recognition (Hammerla, 2015). The
known classes are represented with rows while the columns corre-
Where c is the activity class, L is the last layer index of the con-
spond to the predicted classes made by the classifiers. The use of
volutional neural network (CNN), and NC is the total number of
confusion matrices allows the analysis of Null class which is com-
activity classes.
mon in Human Activity Recognition and further enables visualisa-
5.3. Evaluation metrics tion of the recognition performance of the system.
Receiver Operating Characteristics (ROC) Curve: The ROC curve
The performance of features representation for human activity is also known as precision-recall rate and provides mechanism to
recognition using mobile and wearable sensors is evaluated with analyse the true positive rate against the true negative rate give as
pre-set evaluation techniques. Criteria such as accuracy, compu- (FPR). However, the ROC curve is only suitable for detection model
tation time and complexity, robustness, diversity, data size, scal- as it depends on the number of True Negative classes and may not
ability, types of sensor, users and storage requirements are used be used in imbalance dataset which is common in deep learning
to evaluate how the features extracted, and classifiers perform based human activity recognition. Metrics such as Equal Error Rate
in relation to other studies. Alternatively, deep learning meth- that show the values at which precision is equal to recall, aver-
ods can also be evaluated on how varying the hyper-parameters age precision and Area Under the Curve (AUC) the show the over-
affect their performances during training, filter size, pre-training all performance of classifiers and probability that chosen positive
and fine-tuning, pooling layers and number of temporal sequences instances will be ranked higher than negative instances (Bulling
(Alsheikh et al., 2015; Ordóñez & Roggen, 2016; Ronao & Cho, et al., 2014b; Hammerla, 2015).
2016). These parameters evaluation is still an open research chal- Accuracy, precision and recall are suitable for two classes and
lenge to establish their effects on deep learning network per- balance datasets. For imbalance data, average accuracy, precision
formance (Erfani et al., 2016; Munoz-Organero & Ruiz-Blazquez, and recall are computed for the overall activities. These values are
2017). averages of the summation of their individual values.
Like the handcrafted features based human activity recogni-
1
N
T Pi
tion methods, deep learning features are evaluated with differ- Average accuracy = (8)
ent performance metrics. Hold-out cross-validation techniques are
N ( T P + F P )i
i=1
utilised to test the performance of features representation on dif-
1 T Pi
N
ferent datasets. Hold-out cross-validation techniques include leave- Precision = (9)
one-out, leave one person out when testing the performance of N T Ii
i=1
single-user, 10-fold cross validation, or leave one day out when us-
1 T Pi
N
ing data collected for a specific number of days for activity details
(Hammerla et al., 2015). These different hold-outs cross-validation Average Recall = (10)
N T Ti
i=1
techniques allow the deep learning training to be repeated a num-
ber of times to ensure generalisation across datasets. Different per- where N is the number of classes, TI, the total number of inferred
formance evaluation metrics used in the studies review is pre- label and TT is the ground truth label. However, it has become an
sented in Table 7 below. issue of contention in deep learning as most of the data are un-
The most common performance metrics are accuracy, precision, labelled data and ground truth labels are missing in most cases.
recall, confusion matrices and Receiver Operating Characteristics The use of average precision and recall require manual annotation
Table 7
Evaluation metrics of deep learning methods for human activity recognition.
References Accuracy Precision Recall Confusion Matrix F1 -Score ROC/AUC

√
(Plötz et al., 2011) – – – – –
√ √
(Bhattacharya et al., 2014) – – – –
√ √ √
(Al Rahhal et al., 2016) – – –
√
(Jokanovic et al., 2016) – – – – –
√ √ √
(Munoz-Organero & Ruiz-Blazquez, 2017) – – –
√
(Jing et al., 2017) – – – – –
√
(Alsheikh et al., 2015) – – – – –
√
(Erfani et al., 2016) – – – – –
√
( Ravi et al., 2016a,b) – – – – –
√ √ √
(Zhang et al., 2015a,c) – – –
√ √ √
(Ravì, Wong, Lo, et al., 2017) – – –
√
(Q. Song et al., 2017) – – – – –
√
(Wang, 2016) – – – – –
√ √ √ √
(Kautz, et al., 2017) – –
√
(Guan & Ploetz, 2017) – – – –
√
(Ronao & Cho, 2016) – – – – –
√ √ √ √ √
(Sathyanarayana et al., 2016b) –
√ √ √
(X. Li et al., 2017) – – –
√ √
(Ordóñez & Roggen, 2016) – – – –
√ √
(Song et al., 2016) – – – –
√ √ √
(Yang et al., 2015) – – –
of data which is tedious and laborious especially for mobile based repetitions in each gesture. The activities considered are “Write on
and real time human activity recognition ( Ravi et al., 2016a,b; notepad”, “Open hood”, “Close hood”, Check steering wheel” etc.
Ravì, Wong, Lo, et al., 2017). Studies adopting deep learning meth- using on-body sensors placed on the right and left arms.
ods test for precision and recall instead. Daily and Sports Activity (Barshan & Yüksek, 2014) was collected
at Bilkent University in Turkey for human activity classification us-
6. Common datasets for deep learning based human activity ing on-body sensors placed on different parts of the body. The
recognition dataset involved five inertial measurement unit sensors by eight
((8) subjects and performed nineteen (19) different ambulatory ac-
Benchmark datasets are important for human activity recogni- tivities. The IMU collected multimodal data: accelerometers, gyro-
tion with deep learning methods. With benchmark datasets, re- scope and magnetometer for activities involving walking, climbing
searchers can test the performance of their proposed methods and stairs, standing, walking on the treadmill etc. It was made public
how the results compare with previous studies. Some studies used after their research with intra-subject variability. It is a challeng-
datasets collected purposely for their research while others rely on ing dataset for human activity recognition.
public datasets to evaluate and validate their methods which are WISDM dataset (Kwapisz et al., 2011) by Wireless Sensor Data
the most popular procedure among researchers in human activity Mining Lab Fordham University describes a dataset collected for
recognition. human activity recognition using Android based mobile phone ac-
The main advantages of benchmark dataset are the ability to celerometer sensors. The data was collected from twenty-nine (29)
provide varieties of activity details both ambulatory, ambient liv- users with single mobile phones doing simple ambulatory activi-
ing, daily, gesture and skill assessment activities (Hammerla, et al., ties such as working, jogging, sitting, standing, etc.
2015). The most widely used benchmark datasets and the number PAMAP2 Reiss & Stricker, 2012), Physical Activity monitoring for
of sensors, activities and subjects are shown in Table 8. Aging People comprises daily activity dataset collected with three
OPPORTUNITY Dataset (Roggen et al., 2010) is a set of com- inertial measurement (IMU) and heart rate monitor sensors for a
plex, hierarchical and interleaved dataset for activity of daily liv- 10 hour period using nine (9) subjects. The sensors were placed
ing (ADL) collected with multiple sensors of different modalities at different body positions (dominant arm, ankle and chest region)
in naturalistic environments. During the data collection, the sen- and measured activities ranging sitting, jogging, watching TV to us-
sors were integrated into objects, environments and on-body that ing the computers.
ensure multimodal data fusion and activity modelling. The OPPOR- mHealth (Oresti Banos et al., 2014) comprises 12 daily activ-
TUNITY dataset is composed of sessions, daily living activities and ity dataset collected using accelerometer, gyroscope, magnetometer
drills. In the daily living activity section, the subjects were asked to and electrocardiogram sensor for health monitoring applications.
perform different kitchen-related activities such as preparing and It uses diverse mobile and wearable biomedical devices to collect
drinking coffee, eating sandwich, cleaning up, etc. while in the drill sensor data. The architecture of the mobile app includes compo-
session, the subjects were asked to perform 20 set of repeated ac- nents such as data collection, storage, data processing and classi-
tivities like “Opening and close the fridge”, “Open and close the fication, data visualisation and service enablers that provide com-
dishwasher”, “Open and close the door”, “Clean the table” etc. for plete health monitoring systems.
a period of 6 hours. All the datasets were gathered with Inertia
Measurement Unit (IMU) sensors with different modalities inform 7. Deep learning implementation frameworks
of accelerometers, gyroscope and magnetometer. In a total of sev-
enteen (17) activities were performed with twelve (12) subjects. Deep learning has come a long way and has become an im-
The Skoda Mini Checkpoint Dataset (Zappi et al., 2008) was col- portant area of research. A number of software and hardware im-
lected to check quality assurance checkpoint among assembly lines plementation platforms have been developed that exploit high-
workers in car production environment. In the study, one subject performance computing platforms to extract discriminative fea-
wore twenty (20) 3D sensors on both arms and performed dif- tures for activity recognitions and other application areas. Some
ferent manipulative gestures recorded for 3hours for seventy (70) of these deep learning frameworks are open source, and others are
Table 8
Benchmark dataset for human activity recognition methods evaluation.
Authors Dataset Sensor modalities Number of # Activities

sensors Participant
(Roggen et al., OPPORTUNITY Accelerometer, 19 4 Open and close door, open and close fridge, open and close
2010) gyroscope, dishwasher, open and close drawer, clean table, drink from cup, Toggle
magnetometer switch, Groom, prepare coffee, Drink coffee, prepare Sandwich, eat
sandwich, Clean up
(Zappi et al., 2008) Skoda Accelerometer, 20 1 Write on Notepad, open hood, close hood, check Gap door, open door,
gyroscope, check steering wheel, open and close trunk, close both doors, close
magnetometer doors, check trunks
(Barshan & Daily and Sports Accelerometer, 5 8 Sitting, standing, lying on back, lying on right side, ascending stair
Yüksek, 2014) Activities gyroscope, descending stairs, standing in an elevator still, moving around in an
magnetometer elevator, walking in a parking lot, walking on a treadmill with a speed
of 4 km/h in flat, walking on a treadmill with a speed of 4 km/h and 15
degree inclined positions, running on a treadmill with a speed of
8 km/h, exercising on a stepper, exercising on a cross trainer, cycling on
an exercise bike in horizontal positions, cycling on an exercise bike in
vertical position, rowing, jumping and playing basketball
(Kwapisz et al., WISDM v2 Accelerometer 1 29 Walking, Jogging, Upstairs, Downstairs, Sitting, Standing
2011)
PAMAP2 Accelerometer, 4 18 Lying, sitting, standing, walking, running, cycling, Nordic walking,
gyroscope and Watching TV, Computer work, Car driving, Ascending stairs, Vacuum
magnetometer cleaning, descending stairs, ironing, folding laundry, house cleaning,
playing soccer, rope jumping
(Oresti Banos et al., mHealth Accelerometer, 4 10 Standing still, sitting and relaxing, lying down, walking, climbing stairs,
2014) gyroscope, waist bends forward, frontal elevation of arms, knees bending, cycling,
magnetometer, jogging, running, jumping front and back
electrocardiogram
proprietary developed by different organisations for use in cutting- Theano has a wide range of learning resources and is still used
edge technological development. NVidia1 has become a driving by many researchers and developers.
force in the development of hardware technologies such as Graph- • Caffe (Y. Jia et al., 2014) is a framework for expressing algo-
ical Processing Unit (GPU) and other processors that accelerate rithms in modular form. It provides C++ core language and
learning and improve the performance of deep learning meth- binding support in Python and MATLAB. Caffe provides a com-
ods. Recently, the organisation developed deep learning purpose- plete architecture for training, testing and deployment of the
built microprocessors such as NVidia Tesla 40 GPU acceleration, deep learning model. Moreover, NVidia GPU provides Caffe sup-
Tesla M4 Hyperscale Accelerator and DGX-1 deep learning sys- port for accelerated learning of deep learning.
tem (NVidia-Corps, 2017). Other companies like Mathematica, Wol- • Pylearn2 (Goodfellow et al., 2013) Pylearn2 was proposed in
fram, Nervana Systems, IBM and Intel Curie have followed suit in 2013 as machine learning library composed of several compo-
the development of deep learning implementation hardware (Ravì, nents that can be combined to form complete machine learn-
Wong, Deligianni, et al., 2017). ing algorithms with deep learning models such as Autoencoder,
One important aspect of the NVidia GPU is their support for the Deep Belief Network, Deep Boltzmann machine implementation
majority of the Machine learning and deep learning implementa- module. It is built on top of Theano and provides CPU and
tion tools and packages. Below, we discussed some of these tools GPU support for intensive machine learning implementation.
and frameworks for implementation of deep learning and their var- The major drawback of Pylearn is its low-level API that requires
ious characteristics as shown in Table 9. Although the parameters expert knowledge to implement any deep learning method.
used in the discussion were presented in Ravì, Wong, Deligianni, • Torch (Collobert, Kavukcuoglu, & Farabet, 2011), scientific com-
et al. (2017), the frameworks were updated to reflect the current puting framework that provides model for machine learning
development in the area. implementation. The framework was developed to extend Lua
programming Language and provide the flexibility needed to
• TensorFlow (Abadi et al., 2016) is an open source framework de- design and train machine learning algorithms. It is equipped
veloped by Google Research Team for Numerical computation with tensor; standard MATLAB and Neural Network model
using data flow graph. TensorFlow has the highest number of functionalities that describe neural network architectures.
community support for implementation of deep learning mod- • Cognitive Network Toolkit (Microsoft, 2017) was developed by
els. TensorFlow is very popular in deep learning research due to Microsoft Research to provide a unified framework for well-
its flexibility for a variety of algorithms, portability and can run known deep learning algorithms. It provides multi-GPU paral-
inference on mobile phones devices. Furthermore, it provides lelisation of learning techniques and implements stochastic gra-
support for low level and high-level network training with mul- dient descent and automatic differentiation. The toolkit was re-
tiple GPU, robust and provides consistency of parameter up- leased in 2015 and still has high community contribution in
dates. GitHub.
• Theano (Bergstra et al., 2010) is a Python library used to define, • Lasagne (Dieleman et al., 2015) provides a light library for im-
optimise and evaluate the mathematical expression for multi- plementation of deep learning algorithms such as convolutional
dimensional array. Theano provides high network modelling ca- neural network and recurrent neural network in Theano. It al-
pability, dynamic code generation and speed with multiple GPU lows multiple input architectures with many popular optimi-
support. However, Theano provides low-level API and involves sation techniques such as RMSprop and ADAM. The algorithm
a lot of complex compilations that are often slow. Meanwhile, also provides CPU and Multiple GPU support for the implemen-
tation of deep learning methods.
1
www.nvidia.co.uk
Table 9
Software frameworks for deep learning implementation.
Name Organisation Licence Platform Language Support OpenMP Support Techniques Cloud
Support Computing
Support
RNN CNN DBN

√ √ √
Theano Universite de Montreal BSD Cross Platform Python – –
√ √ √
TensorFlow Google Research Apache 2.0 Linux, OSX Python – –
√
Caffe Berkeley Vision and FreeBSD Linux, Win, OSX, C++,Python, MATLAB – – – –
Learning Centre Android,
√ √ √
Torch Ronan Collobert et al. BSD Linux, Win, OSX, Lua, LuaJIT, C – –
Android, iOS
√ √ √
CNTK Microsoft MIT Linux, Window C++, Python, C#, – –
Command Line
√ √ √ √
Deeplearning4jK Skymind Apache 2.0 Linux, Win, OSX, Java, Scala, Clojure, –
Android Spark
√ √
Keras Francois Chollet MIT Licence Linux, Win, OSX Python –
√ √ √ √ √
Neon Nervana Systems Apache 2.0 OSX, Linux Python
√ √ √ √
Lasagne Universite de Montreal BSD Linux, Win, OSX, Python –
Android
√ √
MXNet Chen et al Apache 2.0 Linux, Win, Andriod Python, R, C++, Julia – – –
√ √ √ √
Pylearn LISA Lab Universite de BSD Cross Platform Python –
Montreal
√ √ √ √
PyTorch Facebook BSD Linux Python
√ √ √ √
CuDNN NVIDIA Free BSD Linux, Win, Android, C –
OSX
• Keras (Chollet, 2015) was developed for deep learning imple- the context based API of CuDNN allows for multithreading and
mentation in Theano and TensorFlow written in Python pro- evaluation of complete deep learning algorithms.
gramming language. It enables high-level neural network API
for speedy implementation of deep learning algorithms. The Various other frameworks are still being developed that will
main key point of Keras is its support for Theano and Tensor- simplify deep learning implementation across platforms and het-
Flow, popular deep learning implementation framework and al- erogeneous devices. For instance, frameworks such as DIGIT, Con-
lows modular, extensible and user platform using Python. vnet and MATLAB based CNN toolbox for feature extraction, Cud-
• MXNet (T. Chen et al., 2015) combines symbolic and imperative anet, CUDA and C++ implementation of CNN and others are being
programming to enable deep neural network implementation fine-tuned to enable deep learning development. There are a num-
on heterogeneous devices (Mobile or GPU clusters). It automat- ber of evaluations of these frameworks that were reported recently
ically derives neural network gradients and graph optimisation (Bahrampour, Ramakrishnan, Schott, & Shah, 2015a, 2015b; Erick-
layer to provide fast and memory efficient execution. son et al., 2017) using parameters such as language support, doc-
• Deeplearning4j (Nicholson and Gibson, 2017) developed by Sky- umentation, development environment, extension speed, training
mind is an open source, distributed and commercial machine speed, GPU support, maturity level, model library, etc. From these,
learning toolkits for deep learning implementation. The frame- TensorFlow has the highest GitHub interest and contribution, sur-
work integrates Hadoop and Spark, with CPU and GPU-enabled passing Caffe and CNTK. Also, some of the frameworks support
for easy and quick prototyping of deep neural network imple- GPU or have limited support in which the GPU has to be resident
mentation. on the workstation (e.g., MXNet).
• Neon (Nervana-Systems, 2017) is developed for cross-platform With the development of deep learning based human activity
implementation in all hardware with support for popular deep recognition, these frameworks have become dominant choices for
learning methods, convolutional neural network and recurrent developers and researcher for mobile and wearable sensor based
neural network. Once codes are written in Neon, it can be de- applications. With different implementation frameworks and vary-
ployed on different hardware platforms, and it provides the ing programming support, the choice of the framework depends
best performance among deep learning libraries. on the programming and technical ability of the users. The soft-
• Pytorch (Erickson, Korfiatis, Akkus, Kline, & Philbrick, 2017) was ware frameworks recently used for mobile-based human activity
recently developed at Facebook and is a front-end integra- recognition are TensorFlow (Eskofier et al., 2016; Kautz et al., 2017),
tion of Torch for high performance deep learning development Theano (Ordóñez & Roggen, 2016; C. A. Ronao & S.-B. Cho, 2016),
with excellent GPU support. It provides Python front-end that Caffe (Yin et al., 2016), Keras (X. Li et al., 2017), Torch ( Ravi et al.,
enables dynamic neural network construction. However, the 2016a,b) and Lasagne (Guan & Ploetz, 2017). Other studies de-
toolkit was recently released and does not have a lot of com- velop the algorithm using programming platforms such as MATLAB
munity support, learning resources and evaluation for its per- (Bhattacharya & Lane, 2016; Erfani et al., 2016; Sheng et al., 2016;
formance. Zebin et al., 2016) and C++ (Ding et al., 2016).
• CuDNN (Chetlur et al., 2014) was developed as GPU-accelerated
library for implementation of common deep learning meth- 8. Open research challenges
ods. The framework with developing with the same intent
as BLAS for optimised high-performance computing, to ease In this section, we present some research challenges that re-
development, training and implementation of deep learning quire further discussion. Many open research issues in the area of
such as convolutional layer, recurrent neural network and back- sensor fusion, real-time and on-board implementation on mobile
propagation techniques. CuDNN supports both GPU and other and wearable devices, data pre-processing and evaluation, collec-
platforms and provides straightforward integration with other tion of large dataset and class imbalance problems are some of
frameworks such as TensorFlow, Caffe, Theano and Keras. Also, the areas that required further research. Here, we discuss these re-
search directions in seven important themes:
• Real-time and on-board implementation of deep learning algo- new domains and a critical issue in activity recognition. Further
rithm on mobile and wearable devices: On-board implementa- research in area related to kernel, convolutional layer, inter-
tion of deep learning algorithms on mobile and wearable de- location and inter-modalities transferability will improve im-
vices will help to reduce computation complexity on data stor- plementation of deep learning based human activity recogni-
age and transfer. However, this technique is hampered by data tion (Ordonez & Roggen, 2016). Moreover, transfer learning in
acquisition and memory constrained in the current mobile and mobile wearable sensor based human activity recognition will
wearable devices. Furthermore, a high number of parameters minimize source, target and environment specific applications
tuning and initialisation in deep learning increases computa- implementation which have not received the needed attention.
tional time and is not suitable for low energy mobile de- • Implementation of deep learning based decision fusion for human
vices. Therefore, utilising methods such as optimal compression activity recognition in mobile and wearable devices: Decision fu-
and use of mobile phone enabled GPU to minimise computa- sion is an essential step to improve the performance and diver-
tion time and resources consumptions is highly needed. Other sity of human activity recognition systems by combining sev-
methods that may provide enabling techniques for real-time eral architectures, sensors and classifiers into a single decision.
implementation is leveraging mobile cloud computing plat- Typical areas that require further researches are heterogeneous
forms for training to reduce training time and memory usage. sensor fusion, combining expert knowledge with deep learning
With this type of implementation, the system can become self- algorithm and combination of different unsupervised feature
adaptive and require minimal user inputs for a new source of learning methods to improve performance of activity recogni-
information. tion systems.
• Comprehensive evaluation of pre-processing and hyper-parameter • Solving the class imbalance problem for deep learning in mobile
settings on learning algorithms: Pre-processing and dimension- and wearable based human activity recognition: Class imbalance
ality reduction is an important aspect of the human activity issues can be found in datasets for human activity recogni-
recognition process. Dimensionality reduction provide mecha- tion and detection of abnormal activities. Class imbalance prob-
nism to minimize computational complexity especially in mo- lem is vital in healthcare monitoring especially fall detection
bile and wearable devices with limited computation powers in which what constitute actual fall is difficult. For mobile and
and memory by projecting high dimensional sensor data into wearable sensor based human activity recognition, class im-
lower dimensional vectors. However, the method and extent of balance maybe as a result of a distortion in the dataset and
pre-processing on the performance of deep learning is an open sensor data calibration which reduce performance generalisa-
research challenge. A number of pre-processing techniques such tion (Edel & Köppe, 2016). Existing studies have proposed a
as normalisation, standardisation and different dimensionality range of solutions such as mixed kernel based weighted ex-
reduction methods need to be experimented with, to know the treme learning machine and cost sensitive learning strategies
effects on performances, computational time and accuracy of (D. Wu, Wang, Chen, & Zhao, 2016). However, there are no stud-
deep-learning methods. Issues such as learning rate optimisa- ies on how class imbalance affect deep learning implementa-
tion to accelerate computation and reduce model and data size, tion especially for mobile wearable sensors. Therefore, strate-
kernel reuse, filter size, computation time, memory analysis and gies to reduce class imbalance will significantly improve human
learning process still require further research as current studies activity recognition using deep learning methods.
depend on heuristics method to apply these hyper-parameters. • Augmentation of mobile and wearable sensor data to enhance
Moreover, the use of grid search and evolutionary optimisation deep learning performance: Another aspect of open research
methods on mobile based deep learning methods that support challenge is the use of data augmentation techniques to im-
lower energy consumption, dynamic and adaptive applications, prove the performance of deep learning methods for motion
and new techniques that enable mobile GPUs to reduce compu- sensors (accelerometer, gyroscopes, etc.) based human activity
tational time are very significant research directions (Ordonez & recognition with the convolutional neural network. Data aug-
Roggen, 2016). mentation methods exploit limited amount of mobile and wear-
• A collection of large sensor datasets for evaluation of deep learnable sensor data by transforming the existing training sensor
ing methods: Training and evaluation of deep learning tech- data to generate new data. These processes are important as
niques require large datasets that abound through different sen- it help to generate enough training data to avoid overfitting,
sor based Internet of Thing (IoT) devices and technologies. The improve translation invariance to sensor orientation, distortion
current review indicates that most studies on deep learning and changes especially in convolutional neural network (CNN)
implementation of mobile and wearable based human activity model. In image classification, data augmentation is a common
recognition depend on benchmark dataset from conventional training strategy (Y. Guo et al., 2016). However, there is need to
machine learning algorithms such as OPPORTUNITY, Skoda and evaluate the impacts and performances of data augmentation
WSDM for evaluation. Data collection methods through cyber- in mobile and wearable sensor-based human activity recogni-
physical systems and mobile crowdsourcing to leverage data tion to generate more training examples and prevent overfit-
collected through the smart home and mobile location data for ting resulting from small datasets. Different data augmentation
transportation mode, smart home environment for elderly care approaches such as change of sensor placements, arbitrary rota-
and monitoring, GPS data for context aware location recognitions, permutation of locations with sensor events, time warp-
tion and other important applications. Therefore, collection of ing and scaling will provide effective means to enhance perfor-
large dataset through the synergy of these technologies are im- mance of deep learning based human activity recognition (Um
portant for performance improvements. et al., 2017).
• Transfer learning for mobile and wearable devices implementa-
tion of deep learning algorithms: Transfer learning based activity 9. Conclusion
recognition is a challenging task to accomplish. Transfer learn-
ing leverage experience acquired in different domains to im- Automatic feature learning in human activity recognition is in-
prove the performance of new areas yet to be experienced by creasing in momentum. This is as results of the steady rise in com-
the system. The main vital reasons for application of trans- putation facilities and large datasets available through mobile and
fer learning are to reduce training time, provide robust and wearable sensing, Internet of Things (IoT) and crowd sourcing. In
versatile activity details and reuse of existing knowledge into this paper, we reviewed various deep learning methods that en-
able automatic feature extraction in human activity recognition. Ahmad, M., Saeed, M., Saleem, S., & Kamboh, A. M. (2016). Seizure detection us-
Deep learning methods such as Restricted Boltzmann Machine, Au- ing EEG: A survey of different techniques. In Emerging technologies (ICET), 2016
International conference on (pp. 1–6). IEEE.
toencoder, and Convolutional Neural Networks and Recurrent neu- Al Rahhal, M., Bazi, Y., AlHichri, H., Alajlan, N., Melgani, F., & Yager, R. (2016). Deep
ral network were presented and their characteristics, advantages learning approach for active classification of electrocardiogram signals. Informa-
and drawback were equally exposed. Deep learning methods can tion Sciences, 345, 340–354.
Alsheikh, M. A., Niyato, D., Lin, S., Tan, H.-P., & Han, Z. (2016). Mobile big data ana-
be classified as generative, discriminative and hybrid methods. We lytics using deep learning and apache spark. IEEE Network, 30, 22–29.
utilise the categorisations to review and outline deep learning im- Alsheikh, M.A., Selim, A., Niyato, D., Doyle, L., Lin, S., & Tan, H.-P. (2015).
plementation of human activity recognition. Those in the gener- Deep activity recognition models with triaxial accelerometers. arXiv preprint
arXiv:1511.04664.
ative categories are the Restricted Boltzmann Machine, autoen-
Alvear-Sandoval, R. F., & Figueiras-Vidal, A. R. (2018). On building ensembles of
coder, sparse coding and deep mixture model while the discrim- stacked denoising auto-encoding classifiers and their further improvement. In-
inative approaches include the convolutional neural network, re- formation Fusion, 39, 41–52.
Alzantot, M., Chakraborty, S., & Srivastava, M.B. (2017). SenseGen: A deep
current neural network, deep neural model and hydrocarbon. Simi-
learning architecture for synthetic sensor data generation. arXiv preprint
larly, hybrid methods combine generative and discriminative model arXiv:1701.08886.
to enhance feature learning and such combination dominant re- Angermueller, C., Parnamaa, T., Parts, L., & Stegle, O. (2016). Deep learning for com-
search landscape of deep learning for human activity recognition putational biology. Molecular Systems Biology, 12.
Anguita, D., Ghio, A., Oneto, L., Parra, X., & Reyes-Ortiz, J. L. (2012). Human activity
lately. Hybrid methods incorporate diverse generative model such recognition on smartphones using a multiclass hardware-friendly support vec-
as autoencoder, Restricted Boltzmann Machine with the convolu- tor machine. In International workshop on ambient assisted living (pp. 216–223).
tional neural network or combine discriminative models such as Springer.
Attal, F., Mohammed, S., Dedabrishvili, M., Chamroukhi, F., Oukhellou, L., & Ami-
convolutional neural network and long short term memory. These rat, Y. (2015). Physical human activity recognition using wearable sensors. Sen-
approaches are an important step to achieving automatic feature sors, 15, 31314–31338.
learning and enhancing performance generalisation across datasets Bader, G. D., & Hogue, C. W. (2003). An automated method for finding molecular
complexes in large protein interaction networks. BMC bioinformatics, 4, 2.
and activities. Bahrampour, S., Ramakrishnan, N., Schott, L., & Shah, M. (2015a). Comparative
On the other hand, the implementation of deep learning meth- study of caffe, neon, theano, and torch for deep learning. arXiv preprint
ods is driven by the availability of high-performance computing arXiv:1511.06435.
Bahrampour, S., Ramakrishnan, N., Schott, L., & Shah, M. (2015b). Comparative study
GPU and software frameworks. A number of these software frame-
of deep learning software frameworks. arXiv preprint arXiv:1511.06435.
works were recently released to the research community as open Banos, O., Galvez, J. M., Damas, M., Guillen, A., Herrera, L. J., Pomares, H.,
sources projects. These software frameworks were discussed, tak- et al. (2015). Multiwindow fusion for wearable activity recognition. In I. Ro-
jas, G. Joya, & A. Catala (Eds.). In Advances in computational intelligence: 9095
ing into cognizance their characteristics and what inform develop-
(pp. 290–297). Pt Ii.
ers’ choice in using particular frameworks. Also, training, classifica- Banos, O., Garcia, R., Holgado-Terriza, J. A., Damas, M., Pomares, H., Rojas, I.,
tion and evaluation of deep learning algorithm for human activity et al. (2014). mHealthDroid: A novel framework for agile development of mo-
recognition is not always a trivial case. To provide the best compar- bile health applications. In International workshop on ambient assisted living
(pp. 91–98). Springer.
ison and categorisations of recent events in the research commu- Barshan, B., & Yüksek, M. C. (2014). Recognizing daily and sports activities in two
nity, we reviewed the training and optimisation strategies adopted open source machine learning environments using body-worn sensor units. The
by different studies recently proposed for mobile and wearable Computer Journal, 57, 1649–1667.
Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends® in
based human activity recognition. Furthermore, classification and Machine Learning, 2, 1–127.
performance metrics with different validation techniques are im- Benuwa, B., Zhan, Y. Z., Ghansah, B., Wornyo, D. K., & Kataka, F. B. (2016). A review
portant to ensure generalisation across datasets. These approaches of deep machine learning. International Journal of Engineering Research in Africa,
24, 124–136.
are adopted to avoid overfitting the model on the training set. Also, Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G.,
we provide some of the publicly available benchmark datasets for et al. (2010). Theano: A CPU and GPU math compiler in Python. In Proceedings
modelling and testing deep learning algorithms for human activ- of the 9th python in science conference (pp. 1–7).
Bhattacharya, S., & Lane, N. D. (2016). From smart to deep: Robust activity recog-
ity recognition. Some of these datasets that are widely used for
nition on smartwatches using deep learning. In 2016 IEEE International confer-
evaluation are OPPORTUNITY, Skoda, and PAMAP2 which are also ence on pervasive computing and communication workshops (PerCom Workshops)
popular with classical machine learning algorithms. (pp. 1–6).
Bhattacharya, S., Nurmi, P., Hammerla, N., & Plötz, T. (2014). Using unlabeled data in
To provide further insight on the directions of the research
a sparse-coding framework for human activity recognition. Pervasive and Mobile
progress, we presented the open research challenges that require Computing, 15, 242–262.
the attention of researchers. For instance, areas such as deep learn- Bordes, A., Chopra, S., & Weston, J. (2014). Question answering with subgraph em-
ing based decision fusion, implementation of deep learning on- beddings. arXiv preprint arXiv:1406.3676.
Boureau, Y.-L., Ponce, J., & LeCun, Y. (2010). A theoretical analysis of feature pool-
board mobile devices, transfer learning and class imbalance prob- ing in visual recognition. In Proceedings of the 27th international conference on
lems that enable implementation of human activity recognition machine learning (ICML-10) (pp. 111–118).
for enhanced performance accuracy. With further development of Bulling, A., Blanke, U., & Schiele, B. (2014a). A tutorial on human activity
recognition using body-worn inertial sensors. Acm Computing Surveys, 46,
high computational resources that increase the online and real- 1–33.
time deep learning implementation on mobile and wearable de- Bulling, A., Blanke, U., & Schiele, B. (2014b). A tutorial on human activity recognition
vices, such machine learning techniques are projected to improve using body-worn inertial sensors. ACM Computing Surveys (CSUR), 46, 33.
Cao, L., Wang, Y., Zhang, B., Jin, Q., & Vasilakos, A. V. (2017). GCHAR: An efficient
human activity recognition researches. Group-based Context–aware human activity recognition on smartphone. Journal
of Parallel and Distributed Computing.
Capela, N. A., Lemaire, E. D., & Baddour, N. (2015). Feature selection for wearable
References smartphone-based human activity recognition with able bodied, elderly, and
stroke patients. Plos One, 10, e0124414.
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., et al. (2016). Tensor- Castro, D., Hickson, S., Bettadapura, V., Thomaz, E., Abowd, G., Christensen, H.,
flow: Large-scale machine learning on heterogeneous distributed systems. arXiv et al. (2015). Predicting daily activities from egocentric images using deep learn-
preprint arXiv:1603.04467. ing. In Proceedings of the 2015 ACM international symposium on wearable comput-
Abdel-Hamid, O., Deng, L., & Yu, D. (2013). Exploring convolutional neural network ers (pp. 75–82). ACM.
structures and optimization techniques for speech recognition. In Interspeech Charalampous, K., & Gasteratos, A. (2016). On-line deep learning method for action
(pp. 3366–3370). recognition. Pattern Analysis and Applications, 19, 337–354.
Abidine, B. M. H., Fergani, L., Fergani, B., & Oussalah, M. (2016). The joint use of Chen, M., Xu, Z., Weinberger, K., & Sha, F. (2012). Marginalized denoising autoen-
sequence features combination and modified weighted SVM for improving daily coders for domain adaptation. arXiv preprint arXiv:1206.4683.
activity recognition. Pattern Analysis and Applications, 1–20. Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., et al. (2015). Mxnet: A flexible
Aggarwal, J. K., & Xia, L. (2014). Human activity recognition from 3D data: A review. and efficient machine learning library for heterogeneous distributed systems.
Pattern Recognition Letters, 48, 70–80. arXiv preprint arXiv:1512.01274.
Chen, Y.-L., Wu, X., Li, T., Cheng, J., Ou, Y., & Xu, M. (2016). Dimensionality re- Figo, D., Diniz, P. C., Ferreira, D. R., Jo, #227, & Cardoso, o. M. (2010). Preprocessing
duction of data sequences for human activity recognition. Neurocomputing, 210, techniques for context recognition from accelerometer data. Personal and Ubiq-
294–302. uitous Computing, 14, 645–662.
Chen, Y., & Xue, Y. (2015). A deep learning approach to human activity recognition Fischer, A., & Igel, C. (2014). Training restricted Boltzmann machines: An introduc-
based on single accelerometer. In Systems, man, and cybernetics (SMC), 2015 IEEE tion. Pattern Recognition, 47, 25–39.
international conference on (pp. 1488–1492). IEEE. Gamboa, J.C.B. (2017). Deep learning for time-series analysis. arXiv preprint
Chen, Y., Zhong, K., Zhang, J., Sun, Q., & Zhao, X. (2016). LSTM networks for mobile arXiv:1701.01887.
human activity recognition. Gao, J., Yang, J., Wang, G., & Li, M. (2016). A novel feature extraction method for
Chen, Y. Q., Xue, Y., & Ieee (2015). A deep learning approach to human activity scene recognition based on centered convolutional restricted Boltzmann ma-
recognition based on single accelerometer. In 2015 IEEE international conference chines. Neurocomputing, 214, 708–717.
on systems, man and cybernetics (pp. 1488–1492). Los Alamitos: IEEE Computer Gao, Y., & Glowacka, D. (2016). Deep gate recurrent neural network. arXiv preprint
Soc. arXiv:1604.02910.
Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., Gawehn, E., Hiss, J. A., & Schneider, G. (2016). 6 Deep learning in drug discovery.
et al. (2014). CuDNN: Efficient primitives for deep learning. arXiv preprint Molecular Informatics, 35, 3–14.
arXiv:1410.0759. Gjoreski, H., Bizjak, J., Gjoreski, M., & Gams, M. (2015). Comparing deep and clas-
Cho, K., Raiko, T., & Ihler, A. T. (2011). Enhanced gradient and adaptive learning rate sical machine learning methods for human activity recognition using wrist ac-
for training restricted Boltzmann machines. In Proceedings of the 28th interna- celerometer.
tional conference on machine learning (ICML-11) (pp. 105–112). Gjoreski, M., Gjoreski, H., Luštrek, M., & Gams, M. (2016). How accurately can your
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., wrist device recognize daily activities and detect falls? Sensors, 16, 800.
et al. (2014). Learning phrase representations using RNN encoder-decoder for Gong, J., Cui, L., Xiao, K., & Wang, R. (2012). MPD-Model: A distributed multipref-
statistical machine translation. arXiv preprint arXiv:1406.1078. erence-driven data fusion model and its application in a WSNs-based health-
Choi, E., Schuetz, A., Stewart, W. F., & Sun, J. (2016). Using recurrent neural network care monitoring system. International Journal of Distributed Sensor Networks, 8,
models for early detection of heart failure onset. Journal of the American Medical 602358.
Informatics Association ocw112. Gong, J.Q., Goldman, M.D., & Lach, J. (2016). DeepMotion: A deep convolutional neu-
Chollet, F. Keras: Deep learning library for theano and tensorflow URL: https://keras. ral network on inertial body sensors for gait assessment in multiple sclerosis.
io/k. 2016 Ieee Wireless Health (Wh), 164–171.
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated Goodfellow, I.J., Warde-Farley, D., Lamblin, P., Dumoulin, V., Mirza, M., Pascanu,
recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. R., et al. (2013). Pylearn2: A machine learning research library. arXiv preprint
Chung, J., Gülçehre, C., Cho, K., & Bengio, Y. (2015). Gated feedback recurrent neural arXiv:1308.4214.
networks. In ICML (pp. 2067–2075). Graves, A. (2013). Generating sequences with recurrent neural networks. arXiv
Cichy, R.M., Khosla, A., Pantazis, D., Torralba, A., & Oliva, A. (2016). Deep neural net- preprint arXiv:1308.0850.
works predict hierarchical spatio-temporal cortical dynamics of human visual Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidi-
object recognition. arXiv preprint arXiv:1601.02970. rectional LSTM and other neural network architectures. Neural Networks, 18,
Collobert, R., Kavukcuoglu, K., & Farabet, C. (2011). Torch7: A matlab-like environ- 602–610.
ment for machine learning. BigLearn, NIPS workshop. Gravina, R., Alinia, P., Ghasemzadeh, H., & Fortino, G. (2017). Multi-sensor fusion
Cornacchia, M., Ozcan, K., Zheng, Y., & Velipasalar, S. (2017). A survey on activ- in body sensor networks: State-of-the-art and research challenges. Information
ity detection and classification using wearable sensors. IEEE Sensors Journal, 17, Fusion, 35, 68–80.
386–403. Gravina, R., Ma, C., Pace, P., Aloi, G., Russo, W., Li, W., et al. (2017). Cloud-based
Dauphin, G. M. Y., Glorot, X., Rifai, S., Bengio, Y., Goodfellow, I., Lavoie, E., Activity-aaService cyber–physical framework for human activity monitoring in
et al. (2012). Unsupervised and transfer learning challenge: A deep learning ap- mobility. Future Generation Computer Systems, 75, 158–171.
proach. In G. Isabelle, D. Gideon, L. Vincent, T. Graham, & S. Daniel (Eds.). In Pro- Guan, Y., & Ploetz, T. (2017). Ensembles of deep LSTM learners for activity recogni-
ceedings of ICML workshop on unsupervised and transfer learning: 27 (pp. 97–110). tion using wearables. arXiv preprint arXiv:1703.09370.
Proceedings of Machine Learning Research: PMLR. Guo, J., Xie, X., Bie, R., & Sun, L. (2014). Structural health monitoring by using a
Deng, L. (2014). A tutorial survey of architectures, algorithms, and applications for sparse coding-based deep learning algorithm with wireless sensor networks.
deep learning. APSIPA Transactions on Signal and Information Processing, 3, e2. Personal and Ubiquitous Computing, 18, 1977–1987.
Dieleman, S., Schlüter, J., Raffel, C., Olson, E., Sønderby, S. K., Nouri, D., et al. (2015). Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., & Lew, M. S. (2016). Deep learning for
Lasagne: first release. Geneva, Switzerland: Zenodo 3. visual understanding: A review. Neurocomputing, 187, 27–48.
Ding, X., Lei, H., & Rao, Y. (2016). Sparse codes fusion for context enhance- Ha, S., Yun, J. M., & Choi, S. (2015). Multi-modal convolutional neural networks for
ment of night video surveillance. Multimedia Tools and Applications, 75, activity recognition. In 2015 IEEE International conference on systems, man, and
11221–11239. cybernetics (pp. 3017–3022).
Dolmans, D., Loyens, S. M. M., Marcq, H., & Gijbels, D. (2016). Deep and surface Habib, C., Makhoul, A., Darazi, R., & Couturier, R. (2016). Multisensor data fusion and
learning in problem-based learning: A review of the literature. Advances in decision support in wireless body sensor networks. In S. O. Badonnel, M. Ulema,
Health Sciences Education, 21, 1087–1112. C. Cavdar, L. Z. Granville, & C. R. P. DosSantos (Eds.), Noms 2016 - 2016 Ieee/Ifip
Domingos, P. (2012). A few useful things to know about machine learning. Commu- network operations and management symposium (pp. 708–712).
nications of the ACM, 55, 78–87. Hammerla, N.Y. (2015). Activity recognition in naturalistic environments using body-
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online worn sensors.
learning and stochastic optimization. Journal of Machine Learning Research, 12, Hammerla, N. Y., Fisher, J., Andras, P., Rochester, L., Walker, R., & Plötz, T. (2015). PD
2121–2159. Disease state assessment in naturalistic environments using deep learning. In
Edel, M., & Köppe, E. (2016). Binarized-BLSTM-RNN based Human Activity Recogni- AAAI (pp. 1742–1748).
tion. In 2016 International conference on indoor positioning and indoor navigation Hammerla, N.Y., Halloran, S., & Ploetz, T. (2016). Deep, convolutional, and recur-
(IPIN) (pp. 1–7). rent models for human activity recognition using wearables. arXiv preprint
Egede, J., Valstar, M., & Martinez, B. (2017). Fusing deep learned and hand-crafted arXiv:1604.08880.
features of appearance, shape, and dynamics for automatic pain estimation. Harandi, M. T., Sanderson, C., Hartley, R., & Lovell, B. C. (2012). Sparse coding and
arXiv preprint arXiv:1701.04540. dictionary learning for symmetric positive definite matrices: A kernel approach.
Erfani, S. M., Rajasegarar, S., Karunasekera, S., & Leckie, C. (2016). High-dimen- In Computer Vision–ECCV 2012 (pp. 216–229). Springer.
sional and large-scale anomaly detection using a linear one-class SVM with Harasimowicz, A. (2014). Comparison of data preprocessing methods and the impact
deep learning. Pattern Recognition, 58, 121–134. on auto-encoder’s performance in activity recognition domain.
Erickson, B. J., Korfiatis, P., Akkus, Z., Kline, T., & Philbrick, K. (2017). Toolkits and Hasan, M., & Roy-Chowdhury, A. K. (2014). Continuous learning of human ac-
libraries for deep learning. Journal of Digital Imaging, 1–6. tivity models using deep nets. In European conference on computer vision
Eskofier, B. M., Lee, S. I., Daneault, J.-F., Golabchi, F. N., Ferreira-Carvalho, G., Ver- (pp. 705–720). Springer.
gara-Diaz, G., et al. (2016). Recent machine learning advancements in sen- Hasan, M., & Roy-Chowdhury, A. K. (2015). A continuous learning framework for
sor-based mobility analysis: Deep learning for Parkinson’s disease assessment. activity recognition using deep hybrid feature models. IEEE Transactions on Mul-
In Engineering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual In- timedia, 17, 1909–1922.
ternational Conference of the (pp. 655–658). IEEE. He, Y., Kavukcuoglu, K., Wang, Y., Szlam, A., & Qi, Y. (2014). Unsupervised feature
Fang, H., He, L., Si, H., Liu, P., & Xie, X. (2014). Human activity recognition based on learning by deep sparse coding. In Proceedings of the 2014 SIAM international
feature selection in smart home using back-propagation algorithm. ISA Transac- conference on data mining (pp. 902–910). SIAM.
tions, 53, 1629–1638. Hinton, G. (2010). A practical guide to training restricted Boltzmann machines. Mo-
Fang, H., & Hu, C. (2014). Recognizing human activity in smart home using mentum, 9, 926.
deep learning algorithm. In Proceedings of the 33rd chinese control conference Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-R., Jaitly, N., et al. (2012). Deep
(pp. 4716–4720). neural networks for acoustic modeling in speech recognition: The shared views
Feng, Y., Yuan, Y., & Lu, X. (2017). Learning deep event models for crowd anomaly of four research groups. IEEE Signal Processing Magazine, 29, 82–97.
detection. Neurocomputing, 219, 548–556. Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep
Figo, D., Diniz, P. C., Ferreira, D. R., & Cardoso, J. M. (2010). Preprocessing techniques belief nets. Neural computation, 18, 1527–1554.
for context recognition from accelerometer data. Personal and Ubiquitous Com- Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data
puting, 14, 645–662. with neural networks. Science, 313, 504–507.
Hinton, G. E., & Sejnowski, T. J. (1986). Learning and releaming in Boltzmann ma- Li, G., Deng, L., Xu, Y., Wen, C., Wang, W., Pei, J., et al. (2016). Temperature based
chines. Parallel Distributed Processing: Explorations in the Microstructure of Cogni- Restricted Boltzmann Machines. Scientific reports, 6.
tion, 1, 282–317. Li, X., Zhang, Y., Zhang, J., Chen, S., Marsic, I., Farneth, R.A., et al. (2017). Concur-
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R.R. (2012). rent activity recognition with multimodal CNN-LSTM structure. arXiv preprint
Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1702.01638.
arXiv preprint arXiv:1207.0580. Li, Y., Shi, D., Ding, B., & Liu, D. (2014). Unsupervised feature learning for human
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computa- activity recognition using smartphone sensors. In Mining intelligence and knowl-
tion, 9, 1735–1780. edge exploration (pp. 99–107). Springer.
Hodo, E., Bellekens, X., Hamilton, A., Tachtatzis, C., & Atkinson, R. (2017). Shallow Lin, L., Wang, K., Zuo, W., Wang, M., Luo, J., & Zhang, L. (2015). A deep structured
and deep networks intrusion detection system: A taxonomy and survey. arXiv model with radius–margin bound for 3D human activity recognition. Interna-
preprint arXiv:1701.02145. tional Journal of Computer Vision, 1–18.
Hollensen, P., & Trappenberg, T. P. (2015). An introduction to deep learning. In Lin, L., Wang, K. Z., Zuo, W. M., Wang, M., Luo, J. B., & Zhang, L. (2016). A deep
D. Barbosa, & E. Milios (Eds.). Advances in artificial intelligence: 9091. Berlin: structured model with radius-margin bound for 3D human activity recognition.
Springer-Verlag Berlin. International Journal of Computer Vision, 118, 256–273.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and Liou, C.-Y., Cheng, W.-C., Liou, J.-W., & Liou, D.-R. (2014). Autoencoder for words.
functional architecture in the cat’s visual cortex. The Journal of physiology, 160, Neurocomputing, 139, 84–96.
106–154. Liu, G., Liang, J., Lan, G., Hao, Q., & Chen, M. (2016). Convolution neutral network
Ijjina, E. P., & Mohan, C. K. (2016). Hybrid deep neural network model for human enhanced binary sensor network for human activity recognition. In SENSORS,
action recognition. Applied Soft Computing, 46, 936–952. 2016 IEEE (pp. 1–3). IEEE.
Incel, O. (2015). Analysis of movement, orientation and rotation-based sensing for Liu, H., & Taniguchi, T. (2014). Feature extraction and pattern recognition for human
phone placement recognition. Sensors, 15, 25474. motion by a deep sparse autoencoder. In Computer and information technology
Incel, O. D., Kose, M., & Ersoy, C. (2013). A review and taxonomy of activity recog- (CIT), 2014 IEEE international conference on (pp. 173–181). IEEE.
nition on mobile phones. BioNanoScience, 3, 145–171. Liu, L., Cheng, L., Liu, Y., Jia, Y., & Rosenblum, D. S. (2016). Recognizing complex
Inoue, M., Inoue, S., & Nishida, T. (2016). Deep recurrent neural network for activities by a probabilistic interval-based model. In AAAI (pp. 1266–1272).
mobile human activity recognition with high throughput. arXiv preprint Liu, W., Ma, H. D., Qi, H., Zhao, D., & Chen, Z. N. (2017). Deep learning hashing for
arXiv:1611.03607. mobile visual search. EURASIP Journal on Image and Video Processing.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network train- Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., & Alsaadi, F. E. (2016). A survey of deep
ing by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. neural network architectures and their applications. Neurocomputing.
Jänicke, M., Tomforde, S., & Sick, B. (2016). Towards self-improving activity recogni- Lu, Y., Wei, Y., Liu, L., Zhong, J., Sun, L., & Liu, Y. (2016). Towards unsupervised phys-
tion systems based on probabilistic, generative models. In Autonomic computing ical activity recognition using smartphone accelerometers. Multimedia Tools and
(ICAC), 2016 IEEE international conference on (pp. 285–291). IEEE. Applications, 1–19.
Jia, X., Li, K., Li, X., & Zhang, A. (2014). A novel semi-supervised deep learning frame- Ma, G., Yang, X., Zhang, B., & Shi, Z. (2016). Multi-feature fusion deep networks.
work for affective state recognition on eeg signals. In Bioinformatics and bioengi- Neurocomputing, 218, 164–171.
neering (BIBE), 2014 IEEE international conference on (pp. 30–37). IEEE. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E., & Svetnik, V. (2015). Deep neural nets
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). as a method for quantitative structure–activity relationships. Journal of Chemical
Caffe: Convolutional architecture for fast feature embedding. In Proceedings of Information and Modeling, 55, 263–274.
the 22nd ACM international conference on multimedia (pp. 675–678). ACM. Ma, S., Sigal, L., & Sclaroff, S. (2016). Learning activity progression in LSTMs for ac-
Jia, Y., Song, X., Zhou, J., Liu, L., Nie, L., & Rosenblum, D. S. (2016). Fusing social net- tivity detection and early detection. In Proceedings of the IEEE conference on com-
works with deep learning for volunteerism tendency prediction. Thirtieth AAAI puter vision and pattern recognition (pp. 1942–1950).
conference on artificial intelligence. Ma, X., Tao, Z., Wang, Y., Yu, H., & Wang, Y. (2015). Long short-term memory neu-
Jiang, W., & Yin, Z. (2015). Human activity recognition using wearable sensors by ral network for traffic speed prediction using remote microwave sensor data.
deep convolutional neural networks. In Proceedings of the 23rd ACM international Transportation Research Part C: Emerging Technologies, 54, 187–197.
conference on multimedia (pp. 1307–1310). ACM. Mamoshina, P., Vieira, A., Putin, E., & Zhavoronkov, A. (2016). Applications of deep
Jing, L., Wang, T., Zhao, M., & Wang, P. (2017). An adaptive multi-sensor data fu- learning in biomedicine. Molecular pharmaceutics, 13, 1445–1454.
sion method based on deep convolutional neural networks for fault diagnosis Marc’Aurelio Ranzato, C. P., Chopra, S., & LeCun, Y. (2007). Efficient learning of
of planetary gearbox. Sensors, 17, 414. sparse representations with an energy-based model. In Proceedings of NIPS.
Jokanovic, B., Amin, M., & Ahmad, F. (2016). Radar fall motion detection using deep Masci, J., Meier, U., Cireşan, D., & Schmidhuber, J. (2011). Stacked convolutional au-
learning. In Radar conference (RadarConf), 2016 IEEE (pp. 1–6). IEEE. to-encoders for hierarchical feature extraction. In International conference on ar-
Kanaris, L., Kokkinis, A., Liotta, A., & Stavrou, S. (2017). Fusing bluetooth beacon data tificial neural networks (pp. 52–59). Springer.
with Wi-Fi radiomaps for improved indoor localization. Sensors, 17, 812. Mesnil, G., Dauphin, Y., Glorot, X., Rifai, S., Bengio, Y., Goodfellow, I. J., et al. (2012).
Karpathy, A., Johnson, J., & Fei-Fei, L. (2015). Visualizing and understanding recur- Unsupervised and Transfer Learning Challenge: A deep learning approach. ICML
rent networks. arXiv preprint arXiv:1506.02078. Unsupervised and Transfer Learning, 27, 97–110.
Kautz, T., Groh, B. H., Hannink, J., Jensen, U., Strubberg, H., & Eskofier, B. M. (2017). Microsoft. (2017). Microsoft Cognitive Toolkit. In.
Activity recognition in beach volleyball using a Deep Convolutional Neural Net- Mohamed, A.-R., & Hinton, G. (2010). Phone recognition using restricted boltzmann
work. Data Mining and Knowledge Discovery, 1–28. machines. In Acoustics speech and signal processing (ICASSP), 2010 IEEE interna-
Khan, S. S., & Taati, B. (2017). Detecting unseen falls from wearable devices using tional conference on (pp. 4354–4357). IEEE.
channel-wise ensemble of autoencoders. Expert Systems with Applications, 87, Montavon, G., & Müller, K.-R. (2012). Deep Boltzmann machines and the centering
280–290. trick. In Neural Networks: Tricks of the Trade (pp. 621–637). Springer.
Kim, Y., & Ling, H. (2009). Human activity classification based on micro-Doppler Moon, G.E., & Hamm, J. (2016). A large-scale study in predictability of daily activities
signatures using a support vector machine. IEEE Transactions on Geoscience and and places.
Remote Sensing, 47, 1328–1337. Morales, F. J. O., & Roggen, D. (2016). Deep convolutional feature transfer across mo-
Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv bile activity recognition domains, sensor modalities and locations. In Proceed-
preprint arXiv:1412.6980. ings of the 2016 ACM international symposium on wearable computers (pp. 92–99).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep ACM.
convolutional neural networks. In Advances in neural information processing sys- Morales, J., & Akopian, D. (2017). Physical activity recognition by smartphones, A
tems (pp. 1097–1105). survey. Biocybernetics and Biomedical Engineering.
Kumari, P., Mathew, L., & Syal, P. (2017). Increasing trend of wearables and multi- Munoz-Organero, M., & Ruiz-Blazquez, R. (2017). Time-elastic generative model
modal interface for human activity monitoring: A review. Biosensors and Bioelec- for acceleration time series in human activity recognition. Sensors, 17,
tronics, 90, 298–307. 319.
Kwapisz, J. R., Weiss, G. M., & Moore, S. A. (2011). Activity recognition using cell Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann
phone accelerometers. ACM SigKDD Explorations Newsletter, 12, 74–82. machines. In Proceedings of the 27th international conference on machine learning
Langkvist, M., Karlsson, L., & Loutfi, A. (2014). A review of unsupervised feature (ICML-10) (pp. 807–814).
learning and deep learning for time-series modeling. Pattern Recognition Letters, Natarajasivan, D., & Govindarajan, M. (2016). Filter based sensor fusion for activ-
42, 11–24. ity recognition using smartphone. International Journal of Computer Science and
Längkvist, M., Karlsson, L., & Loutfi, A. (2012). Sleep stage classification using unsu- Telecommunications, 7, 26–31.
pervised feature learning. Advances in Artificial Neural Systems,, 2012, 5. Nervana-Systems. (2017). Neon. In.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444. Neverova, N., Wolf, C., Lacey, G., Fridman, L., Chandra, D., Barbello, B., et al. (2016).
LeCun, Y., Huang, F. J., & Bottou, L. (2004). Learning methods for generic object Learning human identity from motion patterns. IEEE Access, 4, 1810–1820.
recognition with invariance to pose and lighting. In Computer vision and pat- Ng, A. (2011). Sparse autoencoder. CS294A Lecture notes, 72, 1–19.
tern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society Nguyen, D.T., Cohen, E., Pourhomayoun, M., & Alshurafa, N. (2016). SwallowNet: Re-
conference on: 2. IEEE pp. II-104. current Neural network detects and characterizes eating patterns.
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief Nicholson, A.C., & Gibson, A. (2017). Deeplearning4j: Open-source, Distributed Deep
networks for scalable unsupervised learning of hierarchical representations. Learning for the JVM. Deeplearning4j.org.
In Proceedings of the 26th annual international conference on machine learning NVidia-Corps. (2017). NVidia DGX-1. In.
(pp. 609–616). ACM.
O’Donoghue, J., & Roantree, M. (2015). A framework for selecting deep learning hy- Ronao, C. A., & Cho, S.-B. (2016). Human activity recognition with smartphone sen-
per-parameters. In British international conference on databases (pp. 120–132). sors using deep learning neural networks. Expert Systems with Applications, 59,
Springer. 235–244.
Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: Ronaoo, C. A., & Cho, S.-B. (2015). Evaluation of deep convolutional neural network
A strategy employed by V1? Vision Research, 37, 3311–3325. architectures for human activity recognition with smartphone sensors. In Pro-
Onofri, L., Soda, P., Pechenizkiy, M., & Iannello, G. (2016). A survey on using domain ceedings of the KIISE Korea Computer Congress (pp. 858–860).
and contextual knowledge for human activity recognition in video streams. Ex- Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv
pert Systems with Applications, 63, 97–111. preprint arXiv:1609.04747.
Ordóñez, F. J., & Roggen, D. (2016). Deep convolutional and LSTM recurrent neural Safi, K., Mohammed, S., Attal, F., Khalil, M., & Amirat, Y. (2016). Recognition of differ-
networks for multimodal wearable activity recognition. Sensors, 16, 115. ent daily living activities using hidden Markov model regression. In Biomedical
Oyedotun, O. K., & Khashman, A. (2016). Deep learning in vision-based static hand engineering (MECBME), 2016 3rd middle east conference on (pp. 16–19). IEEE.
gesture recognition. Neural Computing and Applications, 1–11. Salakhutdinov, R., & Hinton, G. (2012). An efficient learning procedure for deep
Page, A., Sagedy, C., Smith, E., Attaran, N., Oates, T., & Mohsenin, T. (2015). A flexi- Boltzmann machines. Neural computation, 24, 1967–2006.
ble multichannel EEG feature extractor and classifier for seizure detection. IEEE Salakhutdinov, R., & Hinton, G. E. (2009). Deep Boltzmann machines. In AISTATS: 1
Transactions on Circuits and Systems II: Express Briefs, 62, 109–113. (p. 3).
Palumbo, F., Gallicchio, C., Pucci, R., & Micheli, A. (2016). Human activity recognition Salakhutdinov, R., & Larochelle, H. (2010). Efficient learning of deep Boltzmann ma-
using multisensor data fusion based on reservoir computing. Journal of Ambient chines. In AISTATs (pp. 693–700).
Intelligence and Smart Environments, 8, 87–107. Salakhutdinov, R., Mnih, A., & Hinton, G. (2007). Restricted Boltzmann machines for
Pereira, C. R., Pereira, D. R., Papa, J. P., Rosa, G. H., & Yang, X.-S. (2016). Convolu- collaborative filtering. In Proceedings of the 24th international conference on ma-
tional neural networks applied for Parkinson’s disease identification. In Machine chine learning (pp. 791–798). ACM.
learning for health informatics (pp. 377–390). Springer. Sargano, A. B., Angelov, P., & Habib, Z. (2017). A comprehensive review on hand-
Pires, I. M., Garcia, N. M., Pombo, N., & Flórez-Revuelta, F. (2016). From data acqui- crafted and learning-based action representation approaches for human activity
sition to data fusion: A comprehensive review and a roadmap for the identifi- recognition. Applied Sciences, 7, 110.
cation of activities of daily living using mobile devices. Sensors, 16, 184. Sarkar, S., Reddy, K., Dorgan, A., Fidopiastis, C., & Giering, M. (2016). Wearable
Plötz, T., Hammerla, N. Y., & Olivier, P. (2011). Feature learning for activity recogni- EEG-based activity recognition in PHM-related service environment via deep
tion in ubiquitous computing. In IJCAI proceedings-international joint conference learning. international Journal of Prognostics and Health Management, 7, 10.
on artificial intelligence: 22 (p. 1729). Sathyanarayana, A., Joty, S., Fernandez-Luque, L., Ofli, F., Srivastava, J., Elma-
Ponce, H., de Lourdes Martínez-Villaseñor, M., & Miralles-Pechúan, L. (2015). Com- garmid, A., et al. (2016). Sleep quality prediction from wearable data using deep
parative analysis of artificial hydrocarbon networks and data-driven approaches learning. JMIR mHealth and uHealth, 4.
for human activity recognition. In International conference on ubiquitous comput- Sathyanarayana, A., Joty, S., Fernandez-Luque, L., Ofli, F., Srivastava, J., Elmagarmid,
ing and ambient intelligence (pp. 150–161). Springer. A., et al. (2016b). Impact of physical activity on sleep: A deep learning based
Ponce, H., Martínez-Villaseñor, M. D. L., & Miralles-Pechuán, L. (2016). A novel wear- exploration. arXiv preprint arXiv:1607.07034.
able sensor-based human activity recognition approach using artificial hydrocar- Savazzi, S., Rampa, V., Vicentini, F., & Giussani, M. (2016). Device-free human sens-
bon networks. Sensors, 16, 1033. ing and localization in collaborative human–robot workspaces: A case study.
Ponce, H., Miralles-Pechuán, L., & Martínez-Villaseñor, M. d. L. (2016). A flexible IEEE Sensors Journal, 16, 1253–1264.
approach for human activity recognition using artificial hydrocarbon networks. Scherer, D., Müller, A., & Behnke, S. (2010). Evaluation of pooling operations in con-
Sensors, 16, 1715. volutional architectures for object recognition. In International conference on ar-
Pouladzadeh, P., Kuhad, P., Peddi, S. V. B., Yassine, A., & Shirmohammadi, S. (2016). tificial neural networks (pp. 92–101). Springer.
Food calorie measurement using deep learning neural network. In Instrumenta- Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Net-
tion and measurement technology conference proceedings (I2MTC), 2016 IEEE inter- works, 61, 85–117.
national (pp. 1–6). IEEE. Schulz, H., Cho, K., Raiko, T., & Behnke, S. (2015). Two-layer contractive encodings
Qian, N. (1999). On the momentum term in gradient descent learning algorithms. for learning stable nonlinear features. Neural Networks, 64, 4–11.
Neural Networks, 12, 145–151. Shahroudy, A., Liu, J., Ng, T. T., & Wang, G. (2016). NTU RGB+D: A large scale dataset
Rad, N.M., Bizzego, A., Kia, S.M., Jurman, G., Venuti, P., & Furlanello, C. (2015). Convo- for 3D human activity analysis. In 2016 IEEE conference on computer vision and
lutional neural network for stereotypical motor movement detection in autism. pattern recognition (CVPR) (pp. 1010–1019).
arXiv preprint arXiv:1511.01865. Shahroudy, A., Ng, T.-T., Gong, Y., & Wang, G. (2016). Deep multimodal feature anal-
Rad, N. M., & Furlanello, C. (2016). Applying deep learning to stereotypical motor ysis for action recognition in RGB+ D videos. arXiv preprint arXiv:1603.07120.
movement detection in autism spectrum disorders. International conference on Sheng, M., Jiang, J., Su, B., Tang, Q., Yahya, A. A., & Wang, G. (2016). Short-time ac-
data mining (ICDM 2016). IEEE. tivity recognition with wearable sensors using convolutional neural network. In
Radu, V., Lane, N. D., Bhattacharya, S., Mascolo, C., Marina, M. K., & Kawsar, F. (2016). Proceedings of the 15th ACM SIGGRAPH conference on virtual-reality continuum
Towards multimodal deep learning for activity recognition on mobile devices. and its applications in industry-volume 1 (pp. 413–416). ACM.
In Proceedings of the 2016 ACM international joint conference on pervasive and Shimizu, R., Yanagawa, S., Monde, Y., Yamagishi, H., Hamada, M., Shimizu, T.,
ubiquitous computing: Adjunct (pp. 185–188). ACM. et al. (2016). Deep learning application trial to lung cancer diagnosis for
Rahhal, M. M. A., Bazi, Y., AlHichri, H., Alajlan, N., Melgani, F., & Yager, R. R. (2016). medical sensor systems. In SoC Design conference (ISOCC), 2016 international
Deep learning approach for active classification of electrocardiogram signals. In- (pp. 191–192). IEEE.
formation Sciences, 345, 340–354. Shoaib, M., Bosch, S., Incel, O. D., Scholten, H., & Havinga, P. J. (2014). Fusion
Ravì, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., et al. (2017). of smartphone motion sensors for physical activity recognition. Sensors, 14,
Deep learning for health informatics. IEEE journal of Biomedical and Health In- 10146–10176.
formatics, 21, 4–21. Shoaib, M., Bosch, S., Incel, O. D., Scholten, H., & Havinga, P. J. (2016). Complex
Ravi, D., Wong, C., Lo, B., & Yang, G.-Z. (2016a). A deep learning approach to on-node human activity recognition using smartphone and wrist-worn motion sensors.
sensor data analytics for mobile or wearable devices. IEEE journal of Biomedical Sensors, 16, 426.
and Health Informatics. Singh, S., Arora, C., & Jawahar, C. (2016). First person action recognition using deep
Ravi, D., Wong, C., Lo, B., & Yang, G. Z. (2016b). Deep learning for human activity learned descriptors. In Proceedings of the IEEE conference on computer vision and
recognition: A resource efficient implementation on low-power devices. In 2016 pattern recognition (pp. 2620–2628).
IEEE 13th international conference on wearable and implantable body sensor net- Song-Mi, L., Sang Min, Y., & Heeryon, C. (2017). Human activity recognition from ac-
works (BSN) (pp. 71–76). celerometer data using Convolutional Neural Network. In 2017 IEEE International
Ravì, D., Wong, C., Lo, B., & Yang, G. Z. (2017). A deep learning approach to on-node conference on big data and smart computing (BigComp) (pp. 131–134).
sensor data analytics for mobile or wearable devices. IEEE journal of Biomedical Song, Q., Zheng, Y.-J., Xue, Y., Sheng, W.-G., & Zhao, M.-R. (2017). An evolutionary
and Health Informatics, 21, 56–64. deep neural network for predicting morbidity of gastrointestinal infections by
Reiss, A., & Stricker, D. (2012). Introducing a new benchmarked dataset for ac- food contamination. Neurocomputing, 226, 16–22.
tivity monitoring. In 2012 16th International symposium on wearable computers Song, S., Chandrasekhar, V., Mandal, B., Li, L., Lim, J.-H., Sateesh Babu, G.,
(pp. 108–109). et al. (2016). Multimodal multi-stream deep learning for egocentric activity
Rifai, S., Vincent, P., Muller, X., Glorot, X., & Bengio, Y. (2011). Contractive auto-en- recognition. In Proceedings of the IEEE conference on computer vision and pattern
coders: Explicit invariance during feature extraction. In Proceedings of the 28th recognition workshops (pp. 24–31).
international conference on machine learning (ICML-11) (pp. 833–840). Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014).
Rodan, A., & Tino, P. (2011). Minimum complexity echo state network. IEEE Transac- Dropout: A simple way to prevent neural networks from overfitting. Journal of
tions on Neural Networks, 22, 131–144. Machine Learning Research, 15, 1929–1958.
Rodriguez, M., Orrite, C., Medrano, C., & Makris, D. (2016). One-shot learning of hu- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning
man activity with an MAP adapted GMM and simplex-HMM (pp. 1–12). with neural networks. In Advances in neural information processing systems
Rogers, E., Kelleher, J.D., & Ross, R.J. (2016). Towards a deep learning-based activity (pp. 3104–3112).
discovery system. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Go-
Roggen, D., Calatroni, A., Rossi, M., Holleczek, T., Förster, K., Tröster, G., et al. (2010). ing deeper with convolutions. In Proceedings of the IEEE conference on computer
Collecting complex activity datasets in highly rich networked sensor environ- vision and pattern recognition (pp. 1–9).
ments. In Networked sensing systems (INSS), 2010 seventh international conference Taylor, G. W., Hinton, G. E., & Roweis, S. T. (2007). Modeling human motion us-
on (pp. 233–240). IEEE. ing binary latent variables. Advances in Neural Information Processing Systems,
19, 1345.
Tieleman, T., & Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a run- Yi, Y., Cheng, Y., & Xu, C. (2017). Mining human movement evolution for complex
ning average of its recent magnitude. COURSERA: Neural Networks for Machine action recognition. Expert Systems with Applications, 78, 259–272.
Learning, 4. Yin, W., Yang, X., Zhang, L., & Oki, E. (2016). ECG Monitoring system integrated with
Turaga, P., Chellappa, R., Subrahmanian, V. S., & Udrea, O. (2008). Machine recogni- IR-UWB radar based on CNN. IEEE Access, 4, 6344–6351.
tion of human activities: A survey. Ieee Transactions on Circuits and Systems for Younes, L. (1999). On the convergence of Markovian stochastic algorithms with
Video Technology, 18, 1473–1488. rapidly decreasing ergodicity rates. Stochastics: An International Journal of Prob-
Um, T.T., Pfister, F.M.J., Pichler, D., Endo, S., Lang, M., Hirche, S., et al. (2017). Data ability and Stochastic Processes, 65, 177–228.
augmentation of wearable sensor data for Parkinson’s disease monitoring using Zappi, P., Lombriser, C., Stiefmeier, T., Farella, E., Roggen, D., Benini, L., &
convolutional neural networks. arXiv preprint arXiv:1706.00527. Tröster, G. (2008). Activity recognition from on-body sensors: Accuracy-power
Unger, M., Bar, A., Shapira, B., & Rokach, L. (2016). Towards latent context-aware trade-off by dynamic sensor selection. In Wireless sensor networks (pp. 17–33).
recommendation systems. Knowledge-Based Systems, 104, 165–178. Springer.
Valipour, S., Siam, M., Jagersand, M., & Ray, N. (2016). Recurrent fully convolutional Zaremba, W. (2015). An empirical exploration of recurrent network architectures.
networks for video segmentation. arXiv preprint arXiv:1606.00487. Zdravevski, E., Lameski, P., Trajkovik, V., Kulakov, A., Chorbev, I., Goleva, R.,
Vepakomma, P., De, D., Das, S. K., & Bhansali, S. (2015). A-Wristocracy: Deep learning et al. (2017). Improving activity recognition accuracy in ambient-assisted living
on wrist-worn sensing for recognition of user complex activities. In 2015 IEEE systems by automated feature engineering. IEEE Access, 5, 5262–5280.
12th International conference on wearable and implantable body sensor networks Zebin, T., Scully, P. J., & Ozanyan, K. B. (2016). Human activity recognition with in-
(BSN) (pp. 1–6). ertial sensors using a deep learning approach. In SENSORS, 2016 IEEE (pp. 1–3).
Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008). Extracting and com- IEEE.
posing robust features with denoising autoencoders. In Proceedings of the 25th Zeiler, M.D. (2012). ADADELTA: An adaptive learning rate method. arXiv preprint
international conference on machine learning (pp. 1096–1103). ACM. arXiv:1212.5701.
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P.-A. (2010). Stacked de- Zeng, M., Nguyen, L. T., Yu, B., Mengshoel, O. J., Zhu, J., Wu, P., et al. (2014). Con-
noising autoencoders: Learning useful representations in a deep network with volutional neural networks for human activity recognition using mobile sen-
a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408. sors. In 6th International conference on mobile computing, applications and ser-
Vollmer, C., Gross, H.-M., & Eggert, J. P. (2013a). Learning features for activity recog- vices (pp. 197–205).
nition with shift-invariant sparse coding. In International conference on artificial Zhang, J., Shan, S., Kan, M., & Chen, X. (2014). Coarse-to-fine auto-encoder networks
neural networks (pp. 367–374). Springer. (cfan) for real-time face alignment. In European conference on computer vision
Vollmer, C., Gross, H. M., & Eggert, J. P. (2013b). Learning features for activity recog- (pp. 1–16). Springer.
nition with shift-invariant sparse coding. In V. Mladenov, P. KoprinkovaHristova, Zhang, J., & Wu, Y. (2017). Automatic sleep stage classification of single-channel
G. Palm, A. E. P. Villa, B. Appollini, & N. Kasabov (Eds.). In Artificial neural net- EEG by using complex-valued convolutional neural network. Biomedical Engi-
works and machine learning – ICANN 2013: 8131 (pp. 367–374). neering/Biomedizinische Technik.
Wan, L., Zeiler, M., Zhang, S., Cun, Y. L., & Fergus, R. (2013). Regularization of neural Zhang, L., Wu, X., & Luo, D. (2015a). Human activity recognition with HMM-DNN
networks using dropconnect. In Proceedings of the 30th international conference model. In 2015 IEEE 14th International conference on cognitive informatics & cog-
on machine learning (ICML-13) (pp. 1058–1066). nitive computing (ICCI∗ CC) (pp. 192–197).
Wang, A., Chen, G., Shang, C., Zhang, M., & Liu, L. (2016). Human activity recognition Zhang, L., Wu, X., & Luo, D. (2015b). Improving activity recognition with context in-
in a smart home environment with stacked denoising autoencoders. In Interna- formation. In 2015 IEEE International conference on mechatronics and automation
tional conference on web-age information management (pp. 29–40). Springer. (ICMA) (pp. 1241–1246).
Wang, L. (2016). Recognition of human activities using continuous autoencoders Zhang, L., Wu, X., & Luo, D. (2015c). Real-time activity recognition on smartphones
with wearable sensors. Sensors, 16, 189. using deep neural networks. In Ubiquitous intelligence and computing and 2015
Wang, L., Qiao, Y., & Tang, X. (2015). Action recognition with trajectory-pooled deep- IEEE 12th Intl conf on autonomic and trusted computing and 2015 IEEE 15th intl
-convolutional descriptors. In Proceedings of the IEEE conference on computer vi- conf on scalable computing and communications and its associated workshops
sion and pattern recognition (pp. 4305–4314). (UIC-ATC-ScalCom), 2015 IEEE 12th intl conf on (pp. 1236–1242). IEEE.
Wang, X. M., Zhang, B., Zhang, F. P., Teng, G. W., Sun, Z. L., & Wei, J. M. (2016). To- Zhang, L., Wu, X., & Luo, D. (2015d). Recognizing human activities from raw ac-
ward robust activity recognition: Hierarchical classifier based on Gaussian Pro- celerometer data using deep neural networks. In Machine learning and applica-
cess. Intelligent Data Analysis, 20, 701–717. tions (ICMLA), 2015 IEEE 14th international conference on (pp. 865–870). IEEE.
Wang, Z. L., Wu, D. H., Chen, J. M., Ghoneim, A., & Hossain, M. A. (2016). A triaxial Zhang, M., & Sawchuk, A. A. (2013). Human daily activity recognition with sparse
accelerometer-based human activity recognition via EEMD-based features and representation using wearable sensors. IEEE Journal of Biomedical and Health In-
game-theory-based feature selection. IEEE Sensors Journal, 16, 3198–3207. formatics, 17, 553–560.
Wu, D., Wang, Z., Chen, Y., & Zhao, H. (2016). Mixed-kernel based weighted extreme Zhao, R., Yan, R., Wang, J., & Mao, K. (2017). Learning to monitor machine health
learning machine for inertial sensor based human activity recognition with im- with convolutional bi-directional lstm networks. Sensors, 17, 273.
balanced dataset. Neurocomputing, 190, 35–49. Zhao, Y., & He, L. (2014). Deep learning in the EEG diagnosis of Alzheimer’s disease.
Wu, Z. Y., Ding, X. Q., & Zhang, G. R. (2016). A novel method for classification of ECG In Asian conference on computer vision (pp. 340–353). Springer.
arrhythmias using deep belief networks. International Journal of Computational Zheng, Y.-J., Ling, H.-F., & Xue, J.-Y. (2014). Ecogeography-based optimization: En-
Intelligence and Applications, 15. hancing biogeography-based optimization with ecogeographic barriers and dif-
Xu, H., & Plataniotis, K. N. (2016). EEG-based affect states classification using ferentiations. Computers & Operations Research, 50, 115–127.
Deep belief networks. In Digital media industry & academic forum (DMIAF) Zhou, X., Guo, J., & Wang, S. (2015). Motion recognition by using a stacked autoen-
(pp. 148–153). IEEE. coder-based deep learning algorithm with smart phones. In International confer-
Xu, X., Tang, J. S., Zhang, X. L., Liu, X. M., Zhang, H., & Qiu, Y. M. (2013). Exploring ence on wireless algorithms, systems, and applications (pp. 778–787). Springer.
techniques for vision based human activity recognition: Methods, systems, and Zhu, C., & Sheng, W. (2009). Multi-sensor fusion for human daily activity recogni-
evaluation. Sensors, 13, 1635–1650. tion in robot-assisted living. In 2009 4th ACM/IEEE International conference on
Yalçın, H. (2016). Human activity recognition using deep belief networks. In human-robot interaction (HRI) (pp. 303–304).
2016 24th Signal processing and communication application conference (SIU) Zhu, F., Shao, L., Xie, J., & Fang, Y. (2016). From handcrafted to learned representa-
(pp. 1649–1652). tions for human action recognition: A survey. Image and Vision Computing, 55,
Yan, Y., Qin, X., Wu, Y., Zhang, N., Fan, J., & Wang, L. (2015). A restricted Boltzmann 42–52.
machine based two-lead electrocardiography classification. In Wearable and im- Zhu, J., Pande, A., Mohapatra, P., & Han, J. J. (2015). Using deep learning for energy
plantable body sensor networks (BSN), 2015 IEEE 12th international conference on expenditure estimation with wearable sensors. In E-health networking, applica-
(pp. 1–9). IEEE. tion & services (HealthCom), 2015 17th international conference on (pp. 501–506).
Yanagimoto, M., & Sugimoto, C. (2016). Recognition of persisting emotional valence IEEE.
from EEG using convolutional neural networks. In Computational intelligence and Zhu, Q., Chen, Z., & Soh, Y. C. (2015). Using unlabeled acoustic data with locality–
applications (IWCIA), 2016 IEEE 9th international workshop on (pp. 27–32). IEEE. constrained linear coding for energy-related activity recognition in buildings. In
Yang, J. B., Nguyen, M. N., San, P. P., Li, X. L., & Krishnaswamy, S. (2015). Deep convo- Automation science and engineering (CASE), 2015 IEEE international conference on
lutional neural networks on multichannel time series for human activity recog- (pp. 174–179). IEEE.
nition. In Proceedings of the 24th international joint conference on artificial intel- Zhu, Y., Zhao, X., Fu, Y., & Liu, Y. (2010). Sparse coding on local spatial-temporal
ligence (IJCAI) (pp. 25–31). volumes for human action recognition. In Asian conference on computer vision
Yao, R., Lin, G., Shi, Q., & Ranasinghe, D. (2017). Efficient dense labeling of human (pp. 660–671). Springer.
activity sequences from wearables using fully convolutional networks. arXiv Zouba, N., Bremond, F., & Thonnat, M. (2009). Multisensor fusion for monitoring
preprint arXiv:1702.06212. elderly activities at home. In Advanced video and signal based surveillance, 2009.
Yao, S., Hu, S., Zhao, Y., Zhang, A., & Abdelzaher, T. (2016). DeepSense: A Unified AVSS’09. Sixth IEEE international conference on (pp. 98–103). IEEE.
deep learning framework for time-series mobile sensing data processing. arXiv
preprint arXiv:1611.01942.
Yao, S., Hu, S., Zhao, Y., Zhang, A., & Abdelzaher, T. (2017). Deepsense: A unified
deep learning framework for time-series mobile sensing data processing. In In-
ternational World Wide Web Conferences Steering Committee (pp. 351–360).

Expert Systems With Applications: Review

Uploaded by

Copyright:

Available Formats

Expert Systems With Applications: Review

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Expert Systems With Applications: Review

Uploaded by

Copyright:

Available Formats

Expert Systems With Applications 105 (2018) 233–261

Contents lists available at ScienceDirect

Expert Systems With Applications

Deep learning algorithms for human activity recognition using mobile

1. Introduction actions of users using different multimodal data generated by vari-

Fig. 1. Different architecture of deep learning algorithms.

sensor data. Deep Belief Network has directed connection between

Fig. 4. Deep convolutional neural network.

Methods Descriptions Strengths Weaknesses Recent application in human

References Methods Description Advantages

References Methods Description Advantages

References Methods Descriptions Advantages

Learning Rate 0.001 0.01 0.0 0 01 0.05 0.01 0.01 0.1

References Accuracy Precision Recall Confusion Matrix F1 -Score ROC/AUC

Authors Dataset Sensor modalities Number of # Activities

RNN CNN DBN

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.