TSP CMC 48061
DOI: 10.32604/cmc.2024.048061
ARTICLE
ABSTRACT
Human activity recognition (HAR) from smartphone sensor data plays an important role in the field of
health to prevent chronic diseases. Daily and weekly physical activities are recorded on the smartphone and tell
the user whether he or she is moving well or not. Typically, smartphones and their associated sensing devices operate
in distributed and unstable environments. Therefore, collecting their data and extracting useful information is a
significant challenge. In this context, the aim of this paper is twofold: The first is to analyze human behavior based on
the recognition of physical activities. Using the results of physical activity detection and classification, the second
part aims to develop a health recommendation system to notify smartphone users about their healthy physical
behavior related to their physical activities. This system is based on the calculation of calories burned by each user
during physical activities. In this way, conclusions can be drawn about a person’s physical behavior by estimating
the number of calories burned after evaluating data collected daily or even weekly following a series of physical
workouts. To identify and classify human behavior, our methodology is based on artificial intelligence models,
specifically deep learning techniques like Long Short-Term Memory (LSTM), stacked LSTM, and bidirectional
LSTM. Since human activity data contains both spatial and temporal information, we proposed, in this paper, to
use an architecture that extracts the two types of information simultaneously. While Convolutional
Neural Networks (CNN) have an architecture designed for spatial information, our idea is to combine CNN with
LSTM to increase classification accuracy by extracting both spatial and temporal
data. The results obtained achieved an accuracy of 96%. On the other hand, the data learned by these algorithms
is prone to error and uncertainty. To overcome this constraint and improve on this performance (96%), we proposed to
use fusion mechanisms, which combine deep learning classifiers to model inaccurate and ambiguous data
and obtain synthetic information to aid in decision-making. The Voting and Dempster-Shafer (DS) approaches are
employed. The results showed that fused classifiers based on DS theory outperformed individual classifiers (96%)
with the highest accuracy level of 98%. Also, the findings disclosed that participants engaging in physical activities
are healthy, showcasing a disparity in the distribution of physical activities between men and women.
This work is licensed under a Creative Commons Attribution 4.0 International License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited.
352 CMC, 2024, vol.79, no.1
KEYWORDS
Human physical activities; smartphone sensors; deep learning; distributed monitoring; recommendation system;
uncertainty; healthy; calories
1 Introduction
Human physical activities such as walking, running, cycling, doing sports, etc., are all defined as
body movements produced by the activation of skeletal muscles and resulting in a substantial release
of energy.
Human activity recognition is a field of study related to the spontaneous detection of daily
activities performed by individuals, based on recordings of time series using sensors. The use of sensors
for the recognition of these human actions represents a major advancement in the field of computer
science and technology. Significant progress has been made in interconnected sensor technology, the
Internet of Things (IoT), cloud computing, and edge computing. Sensors are cost-effective devices
that can be easily integrated or incorporated into portable and non-portable devices. Wearable devices
equipped with sensors, an omnipresent application of the IoT, record valuable data in a broader and
distributed manner, capturing information such as movements, variations in pressure, or visual images,
depending on their type.
Studies focus on activity recognition using video sequences collected by standard cameras and
surveillance cameras [1,2]. Recognizing activity with regular cameras can be challenging due to low
light conditions or darkness. In recent years, the widespread adoption of wearable devices, particularly
smartphones, featuring sensors like accelerometers and gyroscopes, has facilitated the identification
and localization of human movements, paving the way for the development of numerous applications
in the field of human activity recognition. Accelerometers and gyroscopes are indispensable tools for
human activity recognition, providing a rich and detailed source of information on body movements.
Their use significantly contributes to a variety of applications, ranging from personal health tracking
to the design of interactive interfaces and physical rehabilitation [3–7].
Specifically in the field of health, human activity recognition proves to be particularly promising,
encompassing the detection, classification, and interpretation of physical activities. Through this
approach, computer systems gain the ability to precisely understand human behavior, monitor and
assess the mobility of patients, track prescribed physical exercises, and facilitate physical rehabilitation.
Recognizing these physical activities is highly beneficial for heart, body, and mind health,
contributing to the prevention of diseases such as cardiovascular diseases, cancer, and diabetes. The
distributed monitoring of health by studying an individual’s activities is considered effective and
beneficial for improving the physical and mental well-being of individuals, enabling them to safeguard
against any potential danger that could lead to serious illnesses. In this context, private companies
develop medical devices based on a system of human physical activity recognition to predict health
indicators such as heart rate, pulse analysis, blood pressure, and calorie consumption. These
devices often require patients to complete a series of well-defined exercises as part of their treatment
[8–12]. Typically, those medical devices are designed to operate in different distributed environments.
As a result, recognizing patients’ daily activities becomes quite valuable in providing information on
the patient’s physical and even mental behavior to the healthcare team. Physical activity recognition
systems can provide information on physical health as well as mental health if coupled with facial
recognition systems used to track facial expressions and assess an individual’s emotional state [13]. By
combining the two systems, it becomes possible to create more sophisticated systems for understanding
how people interact in various situations.
Machine and deep learning algorithms are commonly utilized in the recognition of human physical
activity based on smartphone sensors. Several researchers have attempted to classify human physical
activity using typical machine learning techniques, which produce less accurate results [4,5,14,15].
Furthermore, several authors have separately applied deep learning algorithms based on CNN [16–20] and
Recurrent Neural Network (RNN) architectures. The results obtained in this respect suffer from
limitations, since CNN architectures are designed for extracting spatial information,
whereas RNNs and their derivatives such as LSTM architectures are designed for extracting temporal data [21–24]. In
addition, some works report accuracy that can still be improved: they have difficulty classifying
certain types of activities, and the set of activities studied in the data is limited.
To overcome these limitations, some researchers have suggested that human activity data contains both
spatial and temporal information [25–28]. In this case, an architecture capable of extracting both types
of information at the same time is required [8,19,22]. Inspired by this idea, we proposed,
in this paper, to combine CNN with LSTM to identify and classify physical activities based on
smartphone sensor data and to obtain better accuracy.
In this setting, our contribution focuses on two dimensions:
• The first involves employing deep learning techniques such as CNN, LSTM which is a variation
of RNN, and bidirectional LSTM to classify human behaviors. The CNN-LSTM and the
convolutional LSTM are used to increase classification accuracy by taking into account the
extraction of both spatial and temporal data. The first is a hybrid of CNN and LSTM,
whereas the second is a CNN-LSTM variation in which the internal matrix multiplications
are substituted with convolution operations, thereby making state-to-state and input-to-state
transitions between the cell and its neighbors easier. A comparison between these different
architectures was conducted to show the effectiveness of the combined architecture providing
the best accuracy, especially when the approach is used in distributed domains. Higher accuracy
leads to better health-related decisions. However, accuracy depends on the
smartphone data, which is always subject to some ambiguity. The uncertainty stems
from faulty readings and/or imperfect sensors. Most existing
publications in the literature suffer from this constraint because they base their research on
all available data without taking the uncertainty into account, which is considered
a limitation. To improve accuracy and reduce the negative influence of imperfection and
uncertainty in the data of the smartphone sensors, the DS theory of belief functions and the
voting method are utilized in this research effort [29–31]. These two strategies help to increase
the fundamental data quality and dependability. Their principle is to combine data from several
sources or classifiers.
• Based on the human activity classification results obtained in the first axis, the second
axis is to create a health recommendation system that informs smartphone users about their health
behavior in relation to their physical activities. This information is obtained by computing the
number of calories each user burns during a physical activity, using mathematical
equations. Thus, conclusions can be drawn about a person’s physical behavior by computing
the number of calories burned after evaluating data collected daily or even weekly following a
series of physical workouts. Given the circumstances, it is crucial to acknowledge that calorie
consumption varies from men to women, and it is influenced by a variety of factors that have
an impact on people’s health behavior, such as alcohol and smoking.
The remainder of this paper is structured as follows: Section 2 covers related research,
while Section 3 presents (i) the customized database, (ii) the above-mentioned deep learning algorithms
applied to smartphone sensor data, (iii) the computation of calories based on daily activities,
and (iv) data fusion approaches, which together define our methodology for the classification of physical activities.
The experimental results and discussions are presented in Section 4. The final section presents
the conclusion with some suggestions for further research.
2 Related Works
In the field of physical activity recognition, several studies have been conducted based on deep
learning algorithms, mainly on CNN and RNN architectures [22–27]. RNN architectures are specifically
built for extracting temporal data, whereas CNN structures are designed for spatial information retrieval.
CNN and LSTM models, the latter being derived from RNN models, are used to identify and classify
time series from numerous sources using labeled training data. These algorithms can be used to solve
a variety of problems, including recognizing human activities.
In [23], a CNN model is used to categorize two types of activities (dynamic and static), with two
3-class classifiers doing the recognition. The accuracy gained was 97.62%.
The authors suggested two deep learning approaches, a Deep Neural Network (DNN) and a feed-
forward deep neural network, to categorize physical activities [18]. Two physical movement datasets
acquired from numerous participants wearing tri-axial accelerometers are used to test these two models. The first
dataset was obtained from 16 volunteers who each wore a single tri-axial accelerometer on their wrist
and recorded 14 distinct daily activities. The second database, which contains 10 different everyday
activities, was gathered from eight participants who wore the sensors on their hips. According to the
findings, the RNN model achieved an accuracy of 82.56%.
In [3], the Bidirectional LSTM is used as a deep learning model with a bidirectional layer followed
by a dropout layer to read and extract its features using 10 epochs and 400 samples. The data used
comes from Wireless Sensor Data Mining (WISDM). A smartphone tri-axial accelerometer was used
for 36 users to characterize their actions, which included walking, jogging, moving downstairs and
upstairs, sitting, and standing. The results demonstrated that precision is good for predicting some
activities, but problems appeared in clearly identifying the other activities.
In [4], the authors proposed a descriptor-based approach for classifying activities using integrated
smartphone sensors such as accelerometers and gyroscopes. Two descriptors, namely the gradient
histogram and the Fourier descriptor based on the centroid signature, are used to extract feature sets
from these signals. For the classification, the performance of multiclass support vector machine and
k-nearest neighbor classifiers is studied on two publicly available data sets, namely the UCI
HAR data set and physical activity sensor data. The experimental results show that the average activity
classification accuracy reached 97.12%.
In [5], the authors used accelerometer data to recognize the kind of user movement
using machine learning (decision tree) and deep learning (bidirectional LSTM) algorithms. The
data comes from the Wireless Sensor Data Mining (WISDM) dataset, collected from the
smartphones of 36 users, with six attributes: user, activity, timestamp, x-acceleration,
y-acceleration, and z-acceleration. The activities included walking, jogging, going downstairs, going upstairs, sitting,
and standing. Two attributes, age and gender, are added to the data for every physical activity with
random distribution. Based on the classification results, they proposed a recommendation system
that computes the total time of calorie-consuming physical activities to assess the healthy behavior
of a person according to gender, age, and daily physical activities. The paper presents
some limitations, since the authors did not investigate a diverse range of activities in terms of type
and quantity, nor did they employ more robust algorithms. This led to relatively low precision,
yielding unreliable results for the recommendation system.
In [20], the authors proposed LSTM-based deep Recurrent Neural Networks to create models for
human activity recognition. They designed architectures using both unidirectional and bidirectional
RNNs with deep layers. Their results on the UCI-HAD dataset
showed that the unidirectional DRNN model achieved a classification accuracy
of 96.7%.
In [25], the authors developed an activity recognition system using the data of an accelerometer
and a gyroscope. The results showed that using the UCI HAR dataset and a 3-layer LSTM model
helped to reach an accuracy rate of 97.4% for the global classification of 7 activities (walking, jogging,
lying down, standing, falling, climbing, descending stairs).
Concerning human activity, some researchers used an architecture that can extract data containing
both spatial and temporal information simultaneously. Reference [21] presented a deep network with
four convolutional layers succeeded by two LSTM layers combining LSTM and CNN. The authors
of [19] constructed a hybrid model combining CNN and LSTM, in which the CNN is used to extract
spatial characteristics whereas the LSTM network is used to learn temporal information. The authors
used a dataset of 12 different kinds of human physical activity that was collected from 20 subjects using
the Kinect V2 sensor. The accuracy reached 90.89%.
In [22], the authors explored various hybrid multi-layer deep learning architectures aimed at
improving human activity recognition performance by incorporating local features and being scale-invariant
with activity dependencies. The results obtained showed an activity recognition rate of 94.7%
on the University of California at Irvine public dataset for human activity recognition, comprising 6
activities, with a hybrid 2-layer CNN-1-layer LSTM model. In addition, on the University of Texas at
Dallas multimodal human activity dataset, comprising 27 activities, with a hybrid 4-layer CNN-1-layer
LSTM model, they achieved an activity recognition rate of 88.0%.
The authors introduced a hybrid deep neural network model in [28] to detect spatial-temporal
features. This model integrates the characteristics of a 1D CNN, which can extract spatial or local
information from the raw sensor data, and a Gated Recurrent Unit (GRU), which can recognize
temporal features. Using accelerometer and gyroscope sensor data from
the WISDM data set, this model, called CNN-GRU, categorizes eighteen distinct
simple and complex human activities into three categories: activities related to walking, general
hand-oriented activities, and hand-oriented activities involving feeding. On the smartphone dataset, the
CNN-GRU model reached an accuracy of 90.44%.
3 Proposed Methodology
We organize this work in two parts to build a reliable recommendation system. The system is
intended to operate and observe physical activities in a largely distributed manner. The first part is
concerned with applying deep learning algorithms to recognize various physical activities. The second
part deals with calculating the calories burned by each user according to his or her daily physical activity,
based on the recognition results. This calculation makes it possible to build a recommendation system that
informs smartphone users about their physical health behavior. So, the proposed methodology is based
on four steps:
• Description of data.
• Classification of physical activities using deep learning algorithms.
• Computation of the calories to determine the physical behavior based on daily activities.
• DS theory to reduce the decision uncertainties.
A binary notation is assigned to these two attributes: 1 if the user consumes alcohol and smokes, 0
otherwise.
Output gates: These gates extract critical information from the new cell state for
The following set of equations represents the LSTM architecture in Fig. 1:
$$\check{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{1}$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{2}$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{3}$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{4}$$
$$C_t = \check{C}_t \odot i_t + C_{t-1} \odot f_t \tag{5}$$
$$h_t = \tanh(C_t) \odot o_t \tag{6}$$
where $W_c, W_i, W_f, W_o$ are the input weights, $U_c, U_i, U_f, U_o$ are the recurrent weights, $b_c, b_i, b_f, b_o$ are the
biases, and $\odot$ denotes element-wise multiplication.
• Stacked LSTM: It has the same architecture as LSTM except that it has two hidden layers.
• Bidirectional LSTM: Two models are trained in the bidirectional LSTM. The first model learns
the input sequence, whereas the second model learns the inverse of that sequence. The results of
the two LSTM layers are subsequently merged through various techniques, such as averaging,
summing, multiplying, and concatenating.
• CNN-LSTM: First, in the CNN model, there is a convolutional layer responsible for processing
subsequences, which necessitates the specification of both the number of filters and the kernel
size. The number of filters corresponds to the quantity of input sequence “reads” or interpretations.
The kernel size is the number of time steps contained in each input sequence “read”
operation. Following the convolution layer, a maximum pooling layer reduces the filter maps to
half their original size and highlights the most important characteristics. These structures are
then flattened into a one-dimensional vector that the LSTM layer employs as a single input time
step. The LSTM component of the model, which comprehends the interpretation provided by
the CNN model for the input sequence, can then be defined. In our proposed hybrid model, by
combining these two models, we extract spatial features via the CNN, then pass these features on
to the LSTM for temporal learning and finally into a fully connected layer for recognition. The
first two layers of the CNN have different numbers of filters: 64 in the first layer and 128
in the second. The kernel size of both layers
is 3, and both use the ReLU activation function. These two
layers are followed by the maximum pooling layer with a pooling size of 2. These features from
the CNN layers pass through two LSTM layers with the same cell size of 64 in each layer. The
LSTM layer is followed by the flattening layer and the dense layer with a softmax activation
function.
• Convolutional LSTM: This LSTM variation is closely tied to the CNN-LSTM approach, as it
incorporates the convolutional input processing directly within each LSTM unit.
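As a numerical illustration of Eqs. (1)–(6) for the basic LSTM cell, the following minimal sketch computes a single time step in plain Python. It is not the implementation used in this work: the function names, the nested-list weight layout, and the all-zero parameters are assumptions chosen only so that the result can be checked by hand.

```python
import math


def sigmoid(z):
    """Logistic function used by the input, forget, and output gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W x + U h + b, with matrices stored as lists of rows."""
    return [
        sum(w * xj for w, xj in zip(W_row, x))
        + sum(u * hj for u, hj in zip(U_row, h))
        + bi
        for W_row, U_row, bi in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (1)-(6)."""
    pre = {g: affine(p["W"][g], x_t, p["U"][g], h_prev, p["b"][g]) for g in "cifo"}
    c_cand = [math.tanh(z) for z in pre["c"]]  # Eq. (1): candidate cell state
    i_t = [sigmoid(z) for z in pre["i"]]       # Eq. (2): input gate
    f_t = [sigmoid(z) for z in pre["f"]]       # Eq. (3): forget gate
    o_t = [sigmoid(z) for z in pre["o"]]       # Eq. (4): output gate
    # Eq. (5): element-wise update of the cell state
    c_t = [cc * i + cp * f for cc, i, cp, f in zip(c_cand, i_t, c_prev, f_t)]
    # Eq. (6): element-wise computation of the hidden state
    h_t = [math.tanh(c) * o for c, o in zip(c_t, o_t)]
    return h_t, c_t


def zero_params(n_in, n_hidden):
    """Hypothetical all-zero parameters, used only to keep the example checkable."""
    gates = "cifo"
    return {
        "W": {g: [[0.0] * n_in for _ in range(n_hidden)] for g in gates},
        "U": {g: [[0.0] * n_hidden for _ in range(n_hidden)] for g in gates},
        "b": {g: [0.0] * n_hidden for g in gates},
    }


# One step on a hypothetical tri-axial accelerometer sample (x, y, z).
params = zero_params(n_in=3, n_hidden=2)
h_t, c_t = lstm_step([0.1, -0.2, 9.8], [0.0, 0.0], [2.0, -2.0], params)
```

With zero weights every gate equals 0.5 and the candidate state is 0, so the new cell state is simply half of the previous one, which makes the roles of Eqs. (5) and (6) easy to follow.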
In the tuning experiments, the models have been trained for 50 epochs and evaluated on the testing
sets using the Adam version of stochastic gradient descent to optimize the deep neural network with
a batch size of 32, a learning rate of 0.002, and a dropout of 0.03.
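To make the layer sizes of the CNN-LSTM model described above more concrete, the short sketch below traces the feature-map shape through the two convolutional layers (64 and 128 filters, kernel size 3) and the max pooling layer (pool size 2). The window length of 128 time steps, the 6 input channels, the stride of 1, and the absence of padding are assumptions made for illustration only; they are not stated in the text.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a 1D convolution without padding (assumed)."""
    return (length - kernel_size) // stride + 1


def pool1d_out_len(length, pool_size):
    """Output length of non-overlapping max pooling (stride equal to pool size)."""
    return length // pool_size


def cnn_frontend_shapes(window_len, n_channels):
    """Return (layer name, (time steps, features)) for each stage of the CNN part."""
    shapes = [("input", (window_len, n_channels))]
    length = conv1d_out_len(window_len, kernel_size=3)
    shapes.append(("conv1: 64 filters, kernel 3", (length, 64)))
    length = conv1d_out_len(length, kernel_size=3)
    shapes.append(("conv2: 128 filters, kernel 3", (length, 128)))
    length = pool1d_out_len(length, pool_size=2)
    shapes.append(("max pooling: size 2", (length, 128)))
    return shapes


# Hypothetical window: 128 time steps of 6 sensor channels.
shapes = cnn_frontend_shapes(window_len=128, n_channels=6)
for name, shape in shapes:
    print(name, shape)
```

Under these assumptions, the pooling layer halves the number of time steps, and the resulting sequence of 128-dimensional feature vectors is what the LSTM layers would receive.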
Fig. 2 represents the convolutional LSTM architecture with as the convolutional operator.
In these equations, h is the number of heartbeats per minute, w is the weight in kg, a is the age in
years and t is time in hours.
belief functions, which serve as tools for gauging subjective probability, we can assess the degree
of truthfulness of a statement. By introducing masses of evidence, coefficients of their weakening,
and employing the rule of combination, the theory enables the handling of information from diverse
sources and fields, aiming to determine their reliability.
The DS theory allows for the generalization of additive probability measures. Belief functions can be
thought of as providing lower and upper bounds on probabilities that are unknown. The theory’s principle
is given in the following steps [4]:
• Each source of information provides a belief function that expresses the confidence or degree
of belief associated with different hypotheses or propositions. We begin by representing the
information provided by each source as a mass function. Given the frame of discernment Ω, the mass function
m is defined as follows:
$$m : 2^{\Omega} \to [0, 1] \quad \text{with} \quad \sum_{A \subseteq \Omega} m(A) = 1 \tag{9}$$
• The information is then corrected using the mass function m, also known as belief
mass, which represents the strength of belief for each set of hypotheses, and the degree of belief μ in
the source’s credibility, yielding the revised function:
$${}^{\mu}m(A) = \mu \, m(A), \quad \forall A \neq \Omega \tag{10}$$
• Finally, we combine the masses from different sources to form a combined mass
and reach an informed decision. Consider two sources represented, respectively, by the mass
functions m1 and m2. This involves the use of combination rules, such as Dempster’s rule, which
takes into account the intersection of the sets carrying these masses. The following new mass function
results from merging the two sources:
$$(m_1 \oplus m_2)(C) = \sum_{A, B \,:\, A \cap B = C} m_1(A) \, m_2(B) \tag{11}$$
• The final belief function is derived from the combined mass and reflects the synthesis of the
various sources of information to assess overall confidence in the various hypotheses. It is
given by the pignistic transformation, a probability distribution
formulated as follows:
$$\mathrm{BetP}(\omega) = \sum_{A \subseteq \Omega,\; \omega \in A} \frac{m(A)}{(1 - m(\emptyset)) \, |A|} \tag{12}$$
The decision is made based on the pignistic transformation by selecting the element x with the
highest probability:
$$R_p(x) = \arg\max \, \mathrm{BetP}(\omega)(x) \tag{13}$$
The DS fusion theory offers a notable benefit by enabling decision-making even in cases where
a classifier may falter. Additionally, despite employing diverse learning algorithms, classifiers can
approach the same problem from multiple perspectives, resulting in more precise decision outcomes.
This approach has proven effective in several domains, including HAR, where it
improved on the accuracies obtained without fusion [5,37,38].
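The steps of Eqs. (9)–(13) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the fusion code used in this work: mass functions are represented as dictionaries from `frozenset` to mass, the activity labels and mass values are hypothetical, and the discounting step moves the removed mass to the whole frame Ω, which is the usual convention but is not spelled out in Eq. (10).

```python
from itertools import product


def discount(m, mu, frame):
    """Eq. (10): scale every focal set except the frame by the reliability mu.
    Assumption: the remaining mass is assigned to the frame (total ignorance)."""
    out = {A: mu * v for A, v in m.items() if A != frame}
    out[frame] = 1.0 - sum(out.values())
    return out


def combine(m1, m2):
    """Eq. (11): sum the products m1(A) * m2(B) over all pairs with A & B == C.
    Conflicting mass ends up on the empty set."""
    out = {}
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        out[C] = out.get(C, 0.0) + a * b
    return out


def pignistic(m):
    """Eq. (12): share each focal set's mass equally among its elements,
    renormalizing by 1 - m(empty set)."""
    conflict = m.get(frozenset(), 0.0)
    bet = {}
    for A, v in m.items():
        for w in A:  # the empty set contributes nothing
            bet[w] = bet.get(w, 0.0) + v / ((1.0 - conflict) * len(A))
    return bet


def decide(m):
    """Eq. (13): select the hypothesis with the highest pignistic probability."""
    bet = pignistic(m)
    return max(bet, key=bet.get)


# Hypothetical outputs of two classifiers on one sample.
frame = frozenset({"walking", "running", "sitting"})
m1 = {frozenset({"walking"}): 0.6, frozenset({"running"}): 0.3, frame: 0.1}
m2 = {frozenset({"walking"}): 0.5, frozenset({"sitting"}): 0.4, frame: 0.1}

m12 = combine(m1, m2)
bet = pignistic(m12)
decision = decide(m12)
m1_discounted = discount(m1, mu=0.9, frame=frame)
```

In this example the two sources disagree on roughly half of their mass, yet the fused decision is still well defined because the pignistic transformation renormalizes by the non-conflicting mass.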
The first section of the project was devoted to computing the accuracy by calculating the error
rate for each deep learning classifier on its own. The fusion error rate is then computed by combining
the mass functions of classifiers.
For the deep learning algorithms, we denote by C1, C2, C3, C4, and C5 the classifiers obtained, respectively, from LSTM,
stacked LSTM, bidirectional LSTM, CNN-LSTM, and convolutional LSTM.
We used the confusion matrix to measure classification skills when it comes to physical activities.
It compares the observed and expected values. The confusion matrix is used to define the following
measures:
• True Positives (TP): The number of positive instances that have been classified as positive.
• True Negatives (TN): The number of negative instances that have been classified as negative.
• False Positives (FP): The number of negative instances that are mistakenly labeled as positive.
• False Negatives (FN): The number of positive instances that are incorrectly labeled as
negative.
The following metrics are defined based on the above values:
$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{14}$$
The precision is determined by the ratio of correctly classified positive instances to the total
number of instances classified as positive.
$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{15}$$
The Recall is determined as the ratio of correctly categorized positive instances to the total
number of positive instances.
$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{16}$$
F1 is a single measure that combines precision and recall.
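As a worked example of Eqs. (14)–(16), the sketch below derives TP, FP, and FN for one class from a multi-class confusion matrix and computes the three metrics. The 3×3 matrix is hypothetical, and the convention that rows hold actual classes and columns hold predicted classes is an assumption.

```python
def class_counts(confusion, k):
    """Return (TP, FP, FN) for class k.
    Assumption: confusion[i][j] counts samples of actual class i predicted as j."""
    tp = confusion[k][k]
    fp = sum(row[k] for row in confusion) - tp  # predicted k, actually another class
    fn = sum(confusion[k]) - tp                 # actually k, predicted another class
    return tp, fp, fn


def precision(tp, fp):
    """Eq. (14)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Eq. (15)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p, r):
    """Eq. (16)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical confusion matrix for three activities.
confusion = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 5, 44],
]
tp, fp, fn = class_counts(confusion, 0)
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
```

For class 0 this gives TP = 50, FP = 5, and FN = 5, so precision, recall, and F1 all equal 50/55.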
As a backend, we used Python (version 3.6.5) with the Anaconda distribution on Ubuntu 16.04.6 LTS
(Xenial Xerus), Keras (version 2.1.6), and TensorFlow (version 1.7.0) in this research.
with fewer gates. However, the databases used in each reference and in our own work are different, especially
in terms of the activities studied. It is also worth mentioning that the accuracy values are obtained
without taking into account the uncertain aspect, which is considered a constraint for this task.
on their degree of belief, allowing for the consideration of situations where certain classifiers are more
reliable than others.
Based on the above steps, the error rate obtained by the DS theory is equal to 0.02, which equates
to 98% accuracy. This value indicates the effectiveness of the DS approach, as it outperforms the
results obtained using each classifier alone (96%) as well as the voting approach. As can be seen, the
belief function method produced more robust fusion results than the voting method. We can explain
the contribution of this method as follows: The voting method generally combines the predictions
of the different classifiers, assigning equal weight to each vote and then selecting the majority class.
However, it does not take into account the uncertainty associated with each prediction, nor the
respective reliability of the classifiers. In contrast, the strength of the belief function method lies in its ability to model
uncertainty in a more refined way. It offers a formal approach for taking into account uncertainty,
confusion, and discrepancies between the predictions of different classifiers. It assigns belief masses to
sets of classes, taking into account the degree of confidence accorded to each classifier. These belief
masses are then combined to calculate an overall belief mass for each class.
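The voting method described above can be sketched as follows. This is an illustrative example only: the predictions attributed to the five classifiers C1 to C5 are hypothetical, and ties are resolved in favor of the label encountered first, which is an assumption rather than a rule stated in the text.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; every vote has equal weight.
    Ties go to the label that appears first among the votes."""
    return Counter(labels).most_common(1)[0][0]


def fuse_by_voting(predictions_per_classifier):
    """Fuse sample by sample; the input holds one list of labels per classifier."""
    return [majority_vote(votes) for votes in zip(*predictions_per_classifier)]


# Hypothetical predictions of the five classifiers on three samples.
predictions = [
    ["walking", "sitting", "running"],  # C1
    ["walking", "sitting", "walking"],  # C2
    ["running", "sitting", "running"],  # C3
    ["walking", "eating", "running"],   # C4
    ["walking", "sitting", "running"],  # C5
]
fused = fuse_by_voting(predictions)
```

Unlike the belief function method, this scheme discards how confident each classifier was and how reliable it is, which is the limitation discussed above.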
Fig. 6 presents the confusion matrix of the combined classifiers.
From Table 3, the distribution of physical activities reveals that all participants (male and female)
are healthy. The fact that these participants produce more energy than the threshold value shown in
Table 3 justifies this outcome.
Also, Table 4 clearly illustrates that the distribution differs between men and women, particularly for
low- and high-effort activities. Among men, the largest share (48%) goes to large screen usage,
a high-effort activity. Among women, the same percentage (48%) goes to sitting,
a low-effort activity.
Table 4: Distribution of activities and calorie values without alcohol and smoking

| Activity | Men: activity distribution | Men: calorie values | Women: activity distribution | Women: calorie values |
|---|---|---|---|---|
| Sleeping (−) | 33% | 2200 | 47% | 1950 |
| Laying down (−) | 40% | 2300 | 45% | 2000 |
| Sitting (−) | 42% | 2350 | 48% | 2100 |
| Eating (+) | 40% | 2600 | 35% | 2500 |
| Light movement (slow walk) (+) | 46% | 2650 | 44% | 2550 |
| Small screen usage (+) | 40% | 2530 | 42% | 2450 |
| Caffeinated drink consumption (+) | 47% | 2500 | 46% | 2400 |
| Medium (fast walk) (++) | 45% | 3100 | 41% | 2900 |
| Heavy (running) (++) | 43% | 3300 | 39% | 3100 |
| Large screen usage (++) | 48% | 3000 | 30% | 2800 |
To demonstrate the detrimental impact of alcohol and tobacco consumption on human health,
even with regular exercise, we integrated these two factors as attributes in the database and studied their
influence on calorie consumption. Table 5 presents the obtained results. It reveals that all individuals
are unhealthy, since the obtained calorie values are lower than the threshold values, demonstrating the
harmful impact of drinking and smoking on human health.
Table 5: Distribution of activities and calorie values with alcohol and smoking

| Activity | Men: activity distribution | Men: calorie values | Women: activity distribution | Women: calorie values |
|---|---|---|---|---|
| Sleeping (−) | 17% | 1800 | 3% | 1600 |
| Laying down (−) | 10% | 1900 | 5% | 1700 |
| Sitting (−) | 8% | 2000 | 2% | 1800 |
| Eating (+) | 10% | 2350 | 15% | 2200 |
| Light movement (slow walk) (+) | 4% | 2430 | 6% | 2320 |
| Small screen usage (+) | 10% | 2400 | 8% | 2300 |
| Caffeinated drink consumption (+) | 3% | 2300 | 4% | 2260 |
| Medium (fast walk) (++) | 5% | 2800 | 9% | 2600 |
| Heavy (running) (++) | 7% | 2900 | 11% | 2700 |
| Large screen usage (++) | 2% | 2700 | 20% | 2500 |
These limitations cannot negate the contribution of smartphones in the field of physical activity
recognition. However, progressive advances in sensor technology and in Artificial Intelligence (AI)
tools, in terms of the algorithms and methods used, may help to alleviate some of these
limitations.
5 Conclusion
Detecting human activity through smartphone sensor data is crucial in the health domain for
preventing chronic diseases. The smartphone records daily and weekly physical activities, providing
feedback to users on their movement patterns and determining a person’s health state. It is considered
effective since it can be applied in a widely distributed manner.
This paper had a dual objective. Firstly, we analyzed human behavior by recognizing physical
activities. In this context, we considered and categorized low-, medium-, and high-effort activities for
a total of 22 participants (men and women). Based on these data, the methodology relies on artificial
intelligence models, specifically deep learning techniques such as LSTM, stacked LSTM, and bidirectional
LSTM. Given that our smartphone sensor data, dedicated to physical activity recognition,
contains both spatial and temporal information, we proposed an approach combining both CNN and
LSTM (or its derivatives). In this combined architecture, spatial characteristics are derived from the
CNN layers and subsequently inputted into the LSTM for learning temporal information. The hybrid
architecture CNN-LSTM has demonstrated superior accuracy in discerning intricate human activities
when contrasted with the accuracies achieved by individual architecture types. The convolutional
LSTM had the best accuracy of 96%, according to data.
Because the data learned by these algorithms are susceptible to errors and uncertainty,
the paper introduces fusion mechanisms to enhance performance. Although the best individual
classifier already reaches 96% accuracy, the proposed solution additionally incorporates Voting and
Dempster-Shafer (DS) approaches. The fused classifiers based on DS theory surpass the individual
classifiers, achieving the highest accuracy, 98%. This strategy mitigates inaccuracies and
ambiguities in the data and provides synthesized information for decision-making.
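As an illustration of the two fusion schemes named above, the following minimal Python sketch implements majority voting and Dempster's rule of combination from Dempster-Shafer (DS) theory. It is a sketch under stated assumptions rather than the implementation used in this work: the mass construction (each classifier's class probabilities are discounted by a reliability factor and the remainder is assigned to the whole frame as ignorance), the function names, the activity labels, and all numeric values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def to_mass(probs, reliability):
    """Turn class probabilities into a mass function (assumed construction).

    Each class keeps reliability * p as a singleton mass; the remaining
    mass is assigned to the whole frame and represents ignorance.
    """
    frame = frozenset(probs)
    mass = {frozenset([c]): reliability * p for c, p in probs.items()}
    mass[frame] = mass.get(frame, 0.0) + 1.0 - reliability * sum(probs.values())
    return mass


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Normalize by the non-conflicting mass
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def decide(mass):
    """Pick the single class with the largest combined mass."""
    singletons = {next(iter(s)): v for s, v in mass.items() if len(s) == 1}
    return max(singletons, key=singletons.get)


# Hypothetical outputs of two classifiers for one window of sensor data
cnn_lstm = {"walking": 0.6, "running": 0.3, "sitting": 0.1}
stacked_lstm = {"walking": 0.3, "running": 0.6, "sitting": 0.1}

# Hypothetical reliability factors (not measured values)
fused = dempster_combine(to_mass(cnn_lstm, 0.96), to_mass(stacked_lstm, 0.90))
print(decide(fused))
print(majority_vote(["walking", "running", "walking"]))
```

Unlike voting, which only counts labels, the DS combination keeps an explicit mass for ignorance and renormalizes by the conflict between sources, so a less reliable classifier weighs less in the fused decision.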
Secondly, based on the results of activity detection and classification, a health recommendation
system was created. The system is based on the calculation of the calories burned during each
physical activity. Assessing daily or weekly data allows conclusions to be drawn about an
individual’s physical behavior and makes it possible to decide whether smartphone users’ physical
behavior is healthy. Calories were computed using two mathematical equations, one for men and one
for women. The findings revealed that the participants are healthy, with a difference in the
distribution of physical activities between men and women. It was also shown that individuals who
consume alcohol and smoke are not healthy, even if they take part in physical activities,
underscoring the impact of these two factors on human health.
In future work, we propose using a larger database that includes more physical activities of
daily living as well as a greater number of complex activities. In addition, to improve the
accuracy obtained, we propose to explore advanced deep learning techniques such as reinforcement
learning, transfer learning, and attention mechanisms for activity recognition.
Acknowledgement: The authors extend their acknowledgment to all the researchers and reviewers
who helped improve the quality of the idea, the concept, and the paper overall.
Funding Statement: The authors extend their appreciation to the Deputyship for Research &
Innovation, Ministry of Education in Saudi Arabia for funding this research work through the
Project Number 223202.
Author Contributions: The authors confirm contribution to the paper as follows: Study conception and
design: Ameni Ellouze, Nesrine Kadri; data collection: Alaa Alaerjan, Mohamed Ksantini; analysis
and interpretation of results: Ameni Ellouze, Nesrine Kadri; draft manuscript preparation: Alaa
Alaerjan, Mohamed Ksantini; review and improve the draft. All authors reviewed the results and
approved the final version of the manuscript.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the
present study.
References
[1] A. Ullah, K. Muhammad, W. Ding, V. Palade, I. Haq and S. W. Baik, “Efficient activity recognition
using lightweight CNN and DS-GRU network for surveillance applications,” Appl. Soft Comput., vol. 103,
pp. 107102, 2021.
[2] A. Ullah, K. Muhammad, T. Hussain, and S. W. Baik, “Conflux LSTMs network: A novel approach for
multi-view action recognition,” Neurocomputing, vol. 435, no. 7, pp. 321–329, 2021.
[3] N. Kadri, A. Ellouze, and M. Ksantini, “Recommendation system for human physical activities using
smartphones,” in 2020 2nd Int. Conf. Comput. Inf. Sci. (ICCIS), Sakaka, Saudi Arabia, Oct. 2020.
[4] A. Jain and V. Kanhangad, “Human activity classification in smartphones using accelerometer and
gyroscope sensors,” IEEE Sens. J., vol. 18, no. 3, pp. 1169–1177, 2018.
[5] N. Kadri, A. Ellouze, and M. Ksantini, “Fusion of classifiers based on physical activities data from
smartphone user,” in 17th Int. Multi-Conf. Syst. Signals Dev. (SSD), Monastir, Tunisia, Jul. 2020.
[6] M. Shoaib, S. Bosch, O. D. Incel, H. Scholten, and P. J. M. Havinga, “Complex human activity recognition
using smartphone and wrist-worn motion sensors,” Sens., vol. 16, no. 4, pp. 426, 2016.
[7] I. D. Luptáková, M. Kubovčík, and J. Pospíchal, “Wearable sensor-based human activity recognition with
transformer model,” Sens., vol. 22, no. 5, pp. 1911, 2022.
[8] S. Gupta, “Deep learning based human activity recognition (HAR) using wearable sensor data,” Int. J. Inf.
Manag. Data Insights, vol. 1, no. 2, pp. 100046, 2021. doi: 10.1016/j.jjimei.2021.100046.
[9] S. C. Mukhopadhyay, “Wearable sensors for human activity monitoring: A review,” IEEE Sens. J., vol. 15,
no. 3, pp. 1321–1330, 2015. doi: 10.1109/JSEN.2014.2370945.
[10] W. Z. Tee, R. Dave, N. Seliya, and M. Vanamala, “A close look into human activity recognition models
using deep learning,” Int. Conf. on Comput., Netw. and Internet of Things (CNIOT), pp. 201–206, Qingdao,
China, Jul. 2022. doi: 10.1109/CNIOT55862.2022.00043.
[11] M. A. Mousse, C. Motamed, and E. C. Ezin, “Percentage of human-occupied areas for fall detection from
two views,” Vis. Comput., vol. 33, no. 12, pp. 1529–1540, 2017. doi: 10.1007/s00371-016-1296-y.
[12] Y. Zhou, Z. Yang, X. Zhang, and Y. Wang, “A hybrid attention-based deep neural network for simultaneous
multi-sensor pruning and human activity recognition,” IEEE Internet Things J., vol. 9, no. 24, pp. 25363–
25372, 2022. doi: 10.1109/JIOT.2022.3196170.
[13] V. A. Saeed, “A framework for recognition of facial expression using HOG features,” Int. J. Math. Stat.
Comput. Sci., vol. 2, no. 18, pp. 1–8, 2023. doi: 10.59543/ijmscs.v2i.7815.
[14] M. Abid et al., “Physical activity recognition based on a parallel approach for an ensemble of machine
learning and deep learning classifiers,” Sens., vol. 21, no. 14, pp. 1–11, Jul. 2021.
CMC, 2024, vol.79, no.1 371
[36] I. Bloch, “Some aspects of Dempster-Shafer evidence theory for classification of muti-modality medical
images taking partial volume effect into account,” Pattern Recognit. Lett., vol. 17, no. 8, pp. 905–919, 1996.
doi: 10.1016/0167-8655(96)00039-6.
[37] N. Triki, M. Ksantini, and M. Karray, “Traffic sign recognition system based on belief functions theory,”
in 13 th Int. Conf. on Agents Artif. Intell. (ICAART), Vienna, Austria, 2021.
[38] A. Ellouze, O. Kahouli, M. Ksantini, H. Alsaif, A. Aluoi and B. Kahouli, “Artificial intelligence-
based diabetes diagnosis with belief functions theory,” Symmetry, vol. 14, no. 10, pp. 2197, 2022. doi:
10.3390/sym14102197.