Active Machine Learning For Heterogeneity Activity
ABSTRACT Smartwatches with cutting-edge sensors are becoming commonplace in our daily lives. Despite their widespread use, it remains challenging to interpret accelerometer and gyroscope data efficiently for Human Activity Recognition (HAR). An effective remedy is the incorporation of active learning strategies. This study explores this junction, intending to maximize the use of smartwatch technology across a range of applications. Previous research on the dataset used in this article did not report high accuracy, which limits the reliability of activity predictions. This paper proposes a novel approach to predicting human activity from the Heterogeneity Human Activity Recognition (HHAR) dataset that combines active learning with machine learning models, namely Random Forest (RF), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), Decision Tree (DT), Gradient Boosting (GB), and the Light Gradient Boosting Machine (LGBM) classifier, to predict heterogeneous activities accurately. We evaluated these models on the HHAR dataset, which was generated using the accelerometer and gyroscope present in smartwatches. The evaluation was carried out over three iterations; the evaluation measures demonstrate that human activity can be predicted with a highest accuracy and F1-score of 99.99%. The results indicate that this approach is more accurate and effective than conventional machine learning approaches.
INDEX TERMS Active Learning, Machine Learning, Wristwatch, Activity Detection, Gyroscope, Accelerometer
I. INTRODUCTION
Human Activity Recognition (HAR) has proven to be a crucial problem in the constantly changing field of machine learning [1]. Employing efficient methodologies is essential for creating precise and trustworthy predictive models, given the explosion in the amount of data available and the complexity of real-world situations. Active learning, which selects the most useful data points iteratively, can greatly enhance model performance [2]. This article explores a detailed study that aims to improve classification accuracy by utilizing active learning techniques and thorough model comparisons. Smartwatches are developing into personalized health companions that offer real-time biometric data analysis and health trend tracking in addition to their sensor capabilities [3]. They conveniently interact with smartphone apps, enabling users to set wellness objectives, track sleep habits, and get prompt notifications [4]. Additionally, these gadgets promote community among users by encouraging people to participate in social fitness challenges and collectively adopt healthier lifestyles [5]. Our research intends to take advantage of these developments by utilizing the rich data from accelerometers and gyroscopes, improving the accuracy of predictive models and enabling users to make knowledgeable decisions about their daily activities and health.

Smartwatches are not as accurate as other monitoring tools such as cameras, radar, and infrared sensors: their sensor range is restricted, they rely on line of sight, and their specialized functions are limited. Still, the widespread adoption of smartwatches in our daily lives creates a compelling potential for improving activity prediction and personal health management [6]. Realizing wristwatch technology's full potential is critical since more people rely on these gadgets to track their daily activities, evaluate their fitness, and control health factors [7]. These gadgets are data-rich platforms that can record minute details of human movements and interactions. They include accelerometers, gyroscopes, and cutting-edge communication protocols like Bluetooth and Wi-Fi [8]. However, the best use of this abundance of data for precise predictive modeling is still a considerable barrier. The urgent need to close this gap between data collection and insightful conclusions serves as the driving force for our research. Our work intends to revolutionize how these devices perceive user actions by examining cutting-edge machine-learning approaches adapted to wristwatch sensor data, providing accurate health predictions and individualized suggestions. By doing this, we enable people to actively manage their health and contribute to continuous developments in wearable technology, paving the way for a healthier and more informed society.

In addition to meeting people's immediate needs, our research advances bigger societal objectives. Preventive healthcare practices and individualized health interventions are becoming more important as the demands on the world's healthcare systems rise [9]. As portable health companions, smartwatches have the potential to be a game-changer in this paradigm shift [10]. These technologies can enable proactive health management, potentially decreasing the strain on healthcare infrastructures and improving the standard of care by precisely forecasting behaviors, finding patterns, and providing timely health recommendations [11]. This drive emphasizes the societal impact of our research, which aims not only to provide individuals with practical knowledge but also to advance the conversation on preventive healthcare practices in a society that is becoming more digitally and globally linked. This paper makes the following contributions.
• This paper significantly advances the field of human activity recognition using active learning approaches. The main contribution of this research work is a comprehensive evaluation of six popular machine learning algorithms integrated with active learning: RF, XGBoost, DT, GB, KNN, and the LGBM classifier. This investigation focuses on the interaction between active learning and several classifiers, which aids in selecting the best approach.
• We use cutting-edge active learning, based on uncertainty sampling, to improve activity recognition accuracy. Our method deliberately includes difficult cases and shows a considerable improvement in accuracy over conventional techniques.
• After evaluating the results over three iterations, the evaluation measures demonstrate that activity can be predicted with a highest accuracy and F1-score of 99.99%. The final results indicate that our active learning approach is the most effective and accurate; it increases the recognition rate over conventional and state-of-the-art methods.

The article is distributed in five main parts. Section II reviews previous and related work on human activity recognition and its advancement. Section III describes the approach used to create better results. Section IV contains the experimental analysis and results, and the final Section V concludes the article and describes future work.

II. LITERATURE REVIEW
Authors in [12] proposed a unique and portable deep learning framework for heterogeneous human activity recognition (HHAR) in sophisticated Internet of Things applications. The framework addresses issues with HHAR, including the variety of sensors, the complexity of human actions, and the constrained computing power of IoT devices. The hierarchical multiscale extraction (HME) module employs a collection of residually connected shuffle group convolutions (SG-Convs) that extract and learn representations from various receptive fields. As a result, the framework can extract local and global properties from the sensor data. Meanwhile, the ISCA module concentrates on the most informative features, so the framework's accuracy increases while its computing complexity decreases. The suggested approach beat numerous cutting-edge techniques in terms of accuracy and efficiency. When tested against three publicly available HHAR datasets, the framework attained an average accuracy of 99.5% on the WISDM dataset.
Authors in [13] suggested a new time series data imaging and fusing framework for wearable sensor-based HAR. The framework handles HAR's difficulties, including the large dimensionality of sensor data, the difficulty of separating features from time series data, and the requirement for real-time recognition. The sensor data is converted into images by the time series imaging component, making it simpler to do feature extraction and recognition. The framework creates images from the sensor data using a sliding window method. Each image represents a series of sensor readings taken inside the window. The framework does picture fusion and recognition using a deep neural network (DNN), and a labeled collection of sensor data and images is used to train the DNN model. The suggested framework was tested on two publicly available HAR datasets, and in terms of accuracy and efficiency, it outperformed various cutting-edge techniques, including long short-term memory (LSTM) networks, recurrent neural networks (RNNs), and convolutional neural networks (CNNs) trained on raw sensor data. For instance, the proposed framework attained an average accuracy of 97.0% on the HHAR dataset.

Authors in [14] suggested KU-HAR, an open dataset for heterogeneous human activity recognition. Its collection included information from 90 users who used their smartphone's accelerometer and gyroscope sensors to conduct 18 different tasks. The exercises included standing, sitting, walking, jogging, jumping, and other indoor and outdoor activities. Participants were asked to complete each exercise for a set period to gather the data, which was collected at a 100 Hz sampling rate. Raw activity samples and subsamples were used to partition the dataset into two halves. The raw activity samples represented the initial information gathered from the participants. The subsamples, with 3 seconds of data for each activity, were taken from the raw activity samples. The dataset has 20,750 subsamples and a total of 1,945 raw activity samples. The dataset is balanced since there was an equal number of samples for each activity. The dataset has been used to test several HAR algorithms and has been proven to work well for HAR in several contexts. The authors trained an RF classifier on the dataset, and the classifier had a subsample accuracy of 89.7%.

Authors in [15] proposed a deep learning method for complex activity recognition (CAR) using various wearable device sensors. It is an end-to-end model that systematically collects characteristics and learns the sequential details of complex activities, dubbed DEBONAIR (Deep learning-based multimodal complex human Activity Recognition). Firstly, temporal and spatial features are extracted from the sensor data using recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Then, sensor fusion is used to understand the connections between the various features; it uses a fully connected neural network. A fully connected neural network layer is used to discover the connections between the fused features and the various activity classes. The research authors used two open datasets of complicated activities to evaluate DEBONAIR. On both datasets, the results demonstrated that DEBONAIR outperformed cutting-edge CAR techniques. For instance, DEBONAIR attained an accuracy of 98.7% on the WISDM dataset.

Authors in [16] suggested a change point-based data segmentation integration heterogeneous ensemble strategy for activity recognition, known as HEA-DSeg (Heterogeneous Ensemble Approach with Data Segmentation), to increase the precision and robustness of activity recognition. It combines the strengths of various machine learning techniques with change point detection. The sensor data is divided into various activity segments using a change point detection method, and then the features are extracted from the segmented sensor data. After that, an ensemble classification module divides the segmented sensor data into various activity classes by combining the predictions of various machine learning methods. The paper's authors used two open activity recognition datasets to evaluate HEA-DSeg. The outcomes demonstrated that, on both datasets, HEA-DSeg beats cutting-edge activity recognition techniques.

A resource-constrained federated learning (FL) framework for HAR with diverse labels and models was suggested by [17], known as Resource-Constrained FL with Heterogeneous Labels and Models for HAR (RCFLA-HAR). It addresses the difficulties given by statistical and model heterogeneities across users and the constrained resources of mobile devices. The distillation process transforms the local models of each user into student models while minimizing the size and complexity of the student model; it then combines all users' distilled student models into one big model. In this module, the student models are aggregated using a weighted averaging technique, where the relative performance of each student model determines the weights. The authors assessed RCFLA-HAR, and the findings showed that RCFLA-HAR performs better than state-of-the-art FL techniques for HAR, even in contexts with limited resources. RCFLA-HAR, for example, achieved 95.1% accuracy on the Heterogeneity HAR (HHAR) dataset.

Authors in [18] suggested a framework for recognizing HAR using several smartphone sensors, known as DailyHAR; it effectively recognizes various daily activities by fusing sensor data with machine learning algorithms. The data from the smartphone's sensors is cleaned, prepared, and combined with the preprocessed data from the various sensors. The relevance of each sensor for HAR is considered using a weighted average technique to combine the sensor data. That data is divided into many activity classes based on the fused features. It uses machine learning approaches like support vector machines (SVMs) or RFs to categorize the features. The paper's authors used two open datasets of HAR activities to evaluate DailyHAR. On both datasets, the results demonstrated that DailyHAR outperformed cutting-edge HAR techniques. For instance, DailyHAR achieved a 99.2% accuracy rate on the WISDM dataset.
Authors in [19] proposed a new DiamondNet neural-network-based architecture for heterogeneous sensor attentive fusion for HAR. DiamondNet greatly enhances HAR performance by utilizing the power of several sensor modalities and attention mechanisms. The feature extraction module derives robust characteristics from each sensor modality by denoising and extracting the most pertinent features. After that, the attention-based GCN creates new heterogeneous multisensor modalities by adaptively utilizing the possible connections between various sensors. The attentive fusion subnet combines a global attention mechanism and shallow features to calibrate various feature levels across many sensor modalities efficiently. This guarantees that all the features are fairly weighted and combined, leading to a more reliable and precise HAR model. The authors analyzed DiamondNet using three open HAR datasets. The outcomes demonstrated that DiamondNet beats cutting-edge HAR techniques on all three datasets. For instance, DiamondNet achieved a 99.5% accuracy rate on the WISDM dataset.

Authors in [20] suggested a unique context-aware HAR heterogeneous hyper-graph neural network (HHGNN) model. The HHGNN model can use contextual information from sensor data, user profiles, and social network data to increase HAR accuracy. There are two primary parts to the HHGNN model: a heterogeneous hyper-graph neural network and a feature extraction module. A dataset of labeled HAR actions trains the HHGNN model. The association between the features from various data sources and the user's behavior is learned by the model during training. After training, the model can forecast user behavior based on fresh, unlabeled data. A public dataset of HAR actions was used to evaluate the HHGNN model. According to the results, the HHGNN model beats cutting-edge HAR techniques on the dataset. For instance, the HHGNN model has a 93.1% accuracy rate.

Authors in [21] suggested a Heterogeneous Clustering Approach for Human Activity Recognition (HCA-HAR); it tackles the difficulties of data heterogeneity and the requirement for interpretable outcomes in HAR. HCA-HAR comprises three primary parts: feature extraction, heterogeneous data clustering, and activity interpretation. Data from various sensors are grouped into several activity groupings using the heterogeneous data clustering module. It uses a new heterogeneous data clustering technique that considers the various feature types and their associations, and the activity interpretation module interprets the activity clusters to provide activity labels that are understandable to humans. Based on an open dataset of HAR activities, the research authors evaluated HCA-HAR. The outcomes demonstrated that, on the dataset, HCA-HAR beats cutting-edge HAR clustering techniques. For instance, HCA-HAR achieved a 95.1% accuracy rate.

III. PROPOSED APPROACH
The steps of our proposed approach are described in this part, including dataset information, features, and the machine learning algorithms used with active learning approaches. Figure 1 visually represents our approach.

A. DATASET SELECTION
The Heterogeneity Activity Recognition (HHAR) dataset is a dataset of human activity recognition information gathered from smartphones and smartwatches. It was developed to reflect the various sensor types and authentic environments found in actual deployments. Data on activity recognition and data from still experiments make up the dataset. The accelerometer and gyroscope, two motion sensors frequently present in smartphones and smartwatches, provide the readings for the activity recognition data. Data was gathered from 9 participants who used smartphones and smartwatches while engaging in 6 activities: biking, sitting, standing, walking, climbing stairs up, and climbing stairs down. The accelerometer records from stationary devices are included in the still experiment data. The dataset classes are shown in Figure 2, and they are the same in both the accelerometer and gyroscope datasets.

The HHAR dataset is exceptional in that it accurately depicts the sensor heterogeneities that can be anticipated in actual deployments. This means that a wide range of smartphone and smartwatch models were used in a wide range of use cases to acquire the data, which increases the difficulty of the dataset for machine learning algorithms while increasing its realism. A variety of research papers on human activity recognition have made use of the HHAR dataset. It has been employed to assess various machine learning algorithms, create fresh feature extraction and classification techniques, and look at how sensor heterogeneities affect the effectiveness of human activity recognition. 8 smartphones and 4 smartwatches contributed to the data. Here, we focused on the data collected from smartwatches made by LG and Samsung. The environment and scenario for activity recognition were created to generate numerous activity primitives realistically. Users took two separate routes for bicycling and walking, and two sets of stairs were used for climbing and descending.

B. FEATURES
The Heterogeneity Activity Recognition (HHAR) dataset's smartwatch data samples are in the Watch_accelerometer.csv and Watch_gyroscope.csv files. Each column in a file holds a separate data value, and each row represents a single sample. A more thorough description of each column is provided in Table 1. The user's activity when the sample was collected is indicated by the ground-truth activity labels 'Biking,' 'Sitting,' 'Standing,' 'Walking,' 'Stair Up,' and 'Stair Down.'
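As a rough illustration (not the authors' code), the smartwatch samples could be loaded and label-encoded along the lines below; the column name "gt" for the ground-truth activity label is an assumption about the CSV layout, since Table 1 is not reproduced here.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

ACTIVITIES = ["Biking", "Sitting", "Standing", "Walking", "Stair Up", "Stair down"]

def load_watch_data(path="Watch_accelerometer.csv", label_column="gt"):
    # Each row is one sensor sample; the label column holds the ground-truth activity.
    df = pd.read_csv(path)

    # Encode the string activity labels (plus any missing/"null" class) as integers,
    # matching the data-encoding step described in Section IV.
    df[label_column] = df[label_column].fillna("null")
    encoder = LabelEncoder()
    df["label"] = encoder.fit_transform(df[label_column])
    return df, encoder

# Example (hypothetical usage):
# accel_df, accel_encoder = load_watch_data("Watch_accelerometer.csv")
# gyro_df, gyro_encoder = load_watch_data("Watch_gyroscope.csv")
```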
C. ACTIVE LEARNING
Active learning is essential when gathering labeled data requires many resources or is expensive. Active learning works by incrementally improving the machine learning model's performance with less labeled input [22]. Active learning takes an initial set of labeled data and an unlabeled dataset to start the procedure. At each iteration, it carefully chooses the most confusing or uncertain data point from the unlabeled dataset [23], based on the decision function's prediction probabilities.
The dataset is split into training and testing sets using the train_test_split function. A machine learning model is initialized by the training_model function in preparation for further training. The fit method trains the model using the available labeled data, and predict_proba forecasts class probabilities for the test set samples. The algorithm chooses the most ambiguous sample in the test set, uses X_test.drop to remove it from the test set, and appends it to the labeled training data. The accuracy_score, f1_score, precision_score, and recall_score functions are used to assess the model's predictions.

Algorithm 1 Active Learning with Machine Learning Classifier and Evaluation Metrics
Require: X (Features), Y (Labels)
1: Split the dataset into training and testing sets: X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.25)
2: Initialize empty lists: accuracies, f1_measure_list, precision_list, recall_list
3: Number of Iterations = 3
4: for i in range(Number of Iterations) do
5:   Initialize a model: model = training_model()
6:   Train the model using the current labeled data: model.fit(X_train, y_train)
7:   Get the uncertainty scores for the samples in the test set: uncertainty = model.predict_proba(X_test).max(axis = 1)
8:   Get the index of the most uncertain sample: uncertain_sample = uncertainty.argmax()
9:   Add the uncertain sample to the labeled data:
10:    X_train = concat(X_train, [X_test.loc[uncertain_sample]])
11:    y_train = concat(y_train, [y_test.loc[uncertain_sample]])
12:  Remove the uncertain sample from the test set:
13:    X_test = X_test.drop(index = uncertain_sample)
14:    y_test.pop(uncertain_sample)
15:  y_pred = model.predict(X_test)
16:  Calculate evaluation metrics:
17:    accuracy = accuracy_score(y_test, y_pred)
18:    f1_measure = f1_score(y_test, y_pred)
19:    precision = precision_score(y_test, y_pred)
20:    recall = recall_score(y_test, y_pred)
21:  Append the metrics to the respective lists
22:  Print the results:
23:    print(Iteration i + 1: Accuracy, F1-Score, Precision, Recall)
24: end for
25: Get the iteration with the best accuracy, F1 measure, precision, and recall
26: Print the best accuracy, F1 measure, precision, and recall
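For concreteness, the following is a minimal Python sketch of the loop summarized in Algorithm 1, assuming scikit-learn and pandas and a least-confidence selection rule (the test sample whose highest predicted class probability is lowest is treated as the most uncertain). The training_model factory and the feature/label frames X and y are placeholders standing in for the paper's setup.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

def training_model():
    # Placeholder factory; any classifier exposing predict_proba would work here.
    return RandomForestClassifier()

def active_learning_loop(X, y, n_iterations=3):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    results = []
    for i in range(n_iterations):
        model = training_model()
        model.fit(X_train, y_train)

        # Least-confidence uncertainty: a low top probability means an uncertain sample.
        top_prob = model.predict_proba(X_test).max(axis=1)
        uncertain_idx = X_test.index[top_prob.argmin()]

        # Move the uncertain sample from the unlabeled pool into the labeled set.
        X_train = pd.concat([X_train, X_test.loc[[uncertain_idx]]])
        y_train = pd.concat([y_train, y_test.loc[[uncertain_idx]]])
        X_test = X_test.drop(index=uncertain_idx)
        y_test = y_test.drop(index=uncertain_idx)

        # Evaluate on the remaining unlabeled pool, as in steps 15-20 of Algorithm 1.
        y_pred = model.predict(X_test)
        results.append({
            "iteration": i + 1,
            "accuracy": accuracy_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred, average="weighted"),
            "precision": precision_score(y_test, y_pred, average="weighted"),
            "recall": recall_score(y_test, y_pred, average="weighted"),
        })
    return results
```

Note that with a multi-class target the F1, precision, and recall calls need an averaging strategy (weighted averaging is assumed here), and that retraining on the growing labeled pool each iteration mirrors steps 5 and 6 of Algorithm 1.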
1) Random Forest with Active Learning
The RF classifier with an active learning approach is a flexible ensemble learning method that iteratively improves the model's accuracy. This procedure begins by separating the available labeled data (X_train and y_train) from the unlabeled data (X_test and y_test). The code uses prediction probabilities across several iterations to intelligently choose the data point with the greatest uncertainty. The labeled dataset is then supplemented with this sample, allowing the model to concentrate on difficult cases. The questionable sample is removed from the test set to avoid repeat evaluations. Following training, performance metrics, including accuracy, F1 score, precision, and recall, are computed by comparing the model's predictions with the actual labels. The active learning loop continues, improving the model's comprehension of intricate patterns. Equation (1) represents the prediction mechanism utilized by Random Forest.

\hat{y} = \arg\max_{k} \sum_{i=1}^{N_{\mathrm{trees}}} \mathbb{I}(y_i = k) \qquad (1)

The Random Forest model's ability to handle high-dimensional data and identify complex correlations within the dataset is one of its key benefits. Furthermore, it has integrated feature importance measures that help with feature selection and understanding the underlying data dynamics by highlighting each feature's significance in the prediction process. The research's active learning-based activity detection methodology heavily relies on it. Important hyperparameters that impact the model's performance are the number of estimators, maximum tree depth, splitting criterion, and minimum sample requirements for splitting nodes. Using the Random Forest model's ensemble diversity, we can easily combine it with active learning techniques, especially uncertainty sampling.
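As a hedged illustration of the feature-importance measures mentioned above, the snippet below fits a RandomForestClassifier with scikit-learn defaults and ranks the input columns; the column names in the usage comment are hypothetical stand-ins for the smartwatch attributes.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rank_feature_importance(X_train: pd.DataFrame, y_train: pd.Series) -> pd.Series:
    # Default hyperparameters (n_estimators, max_depth, criterion, min_samples_split)
    # are kept, matching the paper's stated use of library defaults.
    model = RandomForestClassifier()
    model.fit(X_train, y_train)

    # feature_importances_ gives the impurity-based importance of each column.
    importance = pd.Series(model.feature_importances_, index=X_train.columns)
    return importance.sort_values(ascending=False)

# Example (hypothetical sensor columns):
# rank_feature_importance(X_train[["x", "y", "z"]], y_train)
```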
2) XGBoost with Active Learning
The active learning approach with the XGBoost classifier, a powerful GB technique, iteratively improves model accuracy. The iterative approach starts with separating the labeled data (X_train and y_train) from an unlabeled dataset (X_test and y_test). Using the prediction probabilities derived from the XGBoost model, it chooses the test set predictions that are most uncertain throughout each iteration. The training set is enriched by adding the sample with the highest level of uncertainty to the labeled data. This unsure sample is subsequently eliminated from the test set to prevent duplicates. On the new test set, the model makes predictions, and accuracy measures, including accuracy, F1 score, precision, and recall, are computed. This active learning cycle continues, testing the model's understanding with difficult examples. The XGBoost classifier ensures that the model learns from the most instructive data points, resulting in higher accuracy over iterations. The XGBoost classifier is renowned for its robustness and speed in processing complicated datasets. The technique effectively takes advantage of the XGBoost classifier's strengths, making it useful in active learning settings when data labeling requires many resources. The prediction function for XGBoost is represented in equation (2).
model, harnessing the boosting power of GB. GB improves performance and reduces labeling work by incorporating active learning principles, making it an effective technique when labeled data is expensive or rare. The softmax function for GB is represented in equation (5); it is mostly used for multi-class classification problems:

\hat{y}_{i,k} = \frac{e^{\eta \cdot h_k(x_i)}}{\sum_{j=1}^{K} e^{\eta \cdot h_j(x_i)}} \qquad (5)

The n_estimators parameter, which is kept at its default value, indicates how many boosting steps will be conducted. By not overstretching this parameter, the model determines the optimal number of stages for effectively minimizing mistakes. Furthermore, the max_depth parameter determines the maximum depth of every tree. While a shallower tree avoids overfitting, a deeper tree can better capture intricate patterns. The default value of this parameter is maintained to ensure a balanced performance of the model. Every iteration employs the concept of uncertainty sampling by selecting test set examples with the greatest uncertainty scores.
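The following minimal sketch, assuming scikit-learn's GradientBoostingClassifier with the default n_estimators and max_depth discussed above, shows one way the per-sample uncertainty scores used for this selection could be computed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def gb_uncertainty_scores(X_train, y_train, X_pool):
    # scikit-learn defaults (n_estimators=100, max_depth=3) are kept, as in the paper.
    model = GradientBoostingClassifier()
    model.fit(X_train, y_train)

    # Softmax-style class probabilities for every unlabeled sample (cf. equation (5)).
    proba = model.predict_proba(X_pool)

    # Least-confidence score: 1 minus the top class probability; larger means more uncertain.
    uncertainty = 1.0 - proba.max(axis=1)
    most_uncertain_position = int(np.argmax(uncertainty))
    return uncertainty, most_uncertain_position
```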
6) LGBMClassifier with Active Learning
Due to its exceptional efficiency and accuracy in active learning contexts, this approach implements the LightGBM algorithm. LightGBM is a gradient-boosting system that uses a histogram-based learning strategy, making it especially suitable for big datasets. The LGBMClassifier excels in active learning by efficiently managing the selection of ambiguous samples. The learning process is considerably accelerated by its capacity to build decision trees leaf-wise, facilitating quick adaptation to difficult situations. The fundamental idea of active learning, emphasizing instructive data points, fits in perfectly with LightGBM's strengths. The iterative process of sample selection and model refinement becomes particularly effective when the LGBMClassifier is used inside an active learning framework.

\hat{y}_{i,k} = \frac{e^{\sum_{k=1}^{K} w_k \cdot h_k(x_i)}}{\sum_{j=1}^{K} e^{\sum_{k=1}^{K} w_k \cdot h_k(x_i)}} \qquad (6)

Equation (6) represents the softmax function, a crucial part of the LGBM classifier. The number of boosting stages, or n_estimators, is kept at its default value, enabling the model to dynamically determine the optimal number of stages for effective error minimization. To balance model performance and avoid overfitting, the maximum depth of each tree is specified by the max_depth parameter, which is set to its default value.

IV. EXPERIMENTAL ANALYSIS AND RESULTS
The dataset used in our experiment consists of data from 7 different classes collected from 2 sensors, an accelerometer and a gyroscope, using wristwatches. We used different models. The classes include walking, sitting, standing, biking, stairs-up, stairs-down, and null. Various machine learning models were trained on the dataset, which contains 3,540,962 rows and 10 columns, using active learning approaches. Data encoding was applied after the dataset underwent preprocessing. Uncertainty sampling was used in the active learning strategy, where samples were chosen according to predicted probabilities. A typical 75%-25% split was used to divide the dataset into training and test sets. Random Forest (RF), XGBoost, K-Nearest Neighbours (KNN), Decision Tree (DT), Gradient Boosting (GB), and the LightGBM Classifier (LGBM) were the six machine learning models used with active learning. The models were trained iteratively over 3 iterations, with samples for labeling chosen each time via uncertainty sampling. We used each classifier's default hyperparameters, which varied accordingly. The objective of this procedure was to evaluate the effect of active learning on model performance for the various methods.

A. PERFORMANCE METRICS
Performance metrics are critical indicators that measure how well machine learning algorithms work. They enable comparison and the choice of the best algorithms for particular tasks by providing numerical insights into how well a model is performing. Metrics for measuring classification performance include accuracy, precision, recall, and F1 score. These indicators direct model improvement, assisting with the ongoing refinement of machine learning solutions and guaranteeing alignment with expected results. The performance of the models was evaluated on accuracy, precision, recall, F1-score, and the confusion matrix. The confusion matrix provides a clear overview of the model's ability to classify instances.

B. EXPERIMENTAL RESULTS
The trained models are evaluated by the true positives, true negatives, false positives, and false negatives in the confusion matrix, as seen in Figure 3 and Figure 5. The target column consists of 7 classes: stand, null, sit, walk, stairup, stairdown, and bike, referred to in the confusion matrix as 0, 1, 2, 3, 4, 5, and 6, respectively. We used accuracy and F1-score in the classification tasks to identify instances. Our study used accuracy to assess how well our models performed over the 7 classes, producing the range of accuracy and F1-score values shown in Table 2 and Table 3.

1) Accelerometer
An accelerometer is a sensor that measures the rate at which an object's velocity changes over time. It can track velocity changes across various dimensions, including the x, y, and z axes. Accelerometers are used in smartphones and wearable technology to provide functions like screen rotation, step counting, and gesture recognition. Accelerometers gather vital information for analyzing motion, keeping track of physical activity, and improving user experiences in contemporary technology by detecting the acceleration of an object or device. Table 2 provides the results on accelerometer data. We applied the 6 models to the Watch accelerometer dataset, obtaining high accuracy and F1-score rates of 99.98% and 99.98% with RF.
Meanwhile, XGBoost, DT, KNN, GB, and the LGBMClassifier gave accuracies of 98.26%, 99.97%, 99.35%, 94.26%, and 98.30%, respectively.

TABLE 2: Evaluation Results on Accelerometer Data
Iteration   | Model             | Accuracy | F1-Score
Iteration 1 | Random Forest     | 99.98%   | 99.98%
Iteration 1 | XGBoost           | 98.26%   | 98.26%
Iteration 1 | KNN               | 99.35%   | 99.35%
Iteration 1 | Decision Tree     | 99.97%   | 99.97%
Iteration 1 | Gradient Boosting | 94.26%   | 94.24%
Iteration 1 | LGBMClassifier    | 98.30%   | 98.30%
Iteration 2 | Random Forest     | 99.98%   | 99.98%
Iteration 2 | XGBoost           | 98.26%   | 98.26%
Iteration 2 | KNN               | 99.35%   | 99.35%
Iteration 2 | Decision Tree     | 99.97%   | 99.97%
Iteration 2 | Gradient Boosting | 94.26%   | 94.24%
Iteration 2 | LGBMClassifier    | 98.30%   | 98.30%
Iteration 3 | Random Forest     | 99.98%   | 99.98%
Iteration 3 | XGBoost           | 98.26%   | 98.26%
Iteration 3 | KNN               | 99.35%   | 99.35%
Iteration 3 | Decision Tree     | 99.97%   | 99.97%
Iteration 3 | Gradient Boosting | 94.26%   | 94.24%
Iteration 3 | LGBMClassifier    | 98.30%   | 98.30%

TABLE 3: Evaluation Results on Gyroscope Data
Iteration   | Model             | Accuracy | F1-Score
Iteration 1 | Random Forest     | 99.99%   | 99.99%
Iteration 1 | XGBoost           | 98.14%   | 98.14%
Iteration 1 | KNN               | 99.26%   | 99.26%
Iteration 1 | Decision Tree     | 99.98%   | 99.98%
Iteration 1 | Gradient Boosting | 94.79%   | 94.76%
Iteration 1 | LGBMClassifier    | 98.18%   | 98.18%
Iteration 2 | Random Forest     | 99.99%   | 99.99%
Iteration 2 | XGBoost           | 98.14%   | 98.14%
Iteration 2 | KNN               | 99.26%   | 99.26%
Iteration 2 | Decision Tree     | 99.99%   | 99.99%
Iteration 2 | Gradient Boosting | 94.79%   | 94.76%
Iteration 2 | LGBMClassifier    | 98.18%   | 98.18%
Iteration 3 | Random Forest     | 99.98%   | 99.98%
Iteration 3 | XGBoost           | 98.14%   | 98.14%
Iteration 3 | KNN               | 99.26%   | 99.26%
Iteration 3 | Decision Tree     | 99.99%   | 99.99%
Iteration 3 | Gradient Boosting | 94.79%   | 94.76%
Iteration 3 | LGBMClassifier    | 98.18%   | 98.18%
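Since the per-class behaviour behind these scores is reported through the confusion matrices of Figures 3 and 5, the short sketch below shows how such a matrix can be produced with scikit-learn; the 0-6 label ordering (stand, null, sit, walk, stairup, stairdown, bike) follows the encoding described in Section IV-B.

```python
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

CLASS_NAMES = ["stand", "null", "sit", "walk", "stairup", "stairdown", "bike"]

def plot_confusion(y_test, y_pred):
    # Rows are true classes, columns are predicted classes, indexed 0-6.
    cm = confusion_matrix(y_test, y_pred, labels=range(len(CLASS_NAMES)))
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=CLASS_NAMES)
    disp.plot()  # requires matplotlib to be installed
    return cm
```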
gave us somewhat lower accuracies of 98.26%, 99.35%, 94.79%, and 98.30%. The other models showed decent accuracy, including XGBoost, KNN, GB, and the LGBMClassifier, which ranged from 94.26% to 99.26%.
With RF and DT in particular, the active learning approach demonstrated good efficacy, obtaining the best accuracy of 99.99% and an F1-score of 99.99%.

Our active learning approach to human activity recognition differentiates us from other methods that rely on SVMs, deep learning algorithms, and neural networks.
This work focuses on utilizing data from wearable sensors to make better predictions and close the gap left by earlier work. This is crucial due to the unique challenges posed by wearable sensors, such as the diverse range of human actions and real-world situations, making our method invaluable for practical applications.

V. CONCLUSION AND FUTURE SCOPE
This paper proposed an active learning approach to identify human behavior from the data collected through an accelerometer and gyroscope on the HHAR dataset. It gave much better results than the baseline approach, with an increase of 5.74% in F1-score. We looked over several research studies on human activity recognition to get insights and potential benefits for our investigation. We applied various machine-learning models with an active learning approach to the HHAR dataset. The highest results achieved were 99.99% from RF and DT on the gyroscope dataset, and on the accelerometer dataset, they were 99.98% with RF. The other models on the gyroscope, including XGBoost, KNN, GB, and LGBM, provided accuracies of 98.14%, 99.26%, 94.79%, and 98.18%, and F1-scores of 98.14%, 99.26%, 94.76%, and 98.18%, respectively. Meanwhile, on the accelerometer, the other models, including DT, XGBoost, KNN, GB, and LGBM, provided accuracies of 99.97%, 98.26%, 99.35%, 94.26%, and 98.30%, and F1-scores of 99.97%, 98.26%, 99.35%, 94.24%, and 98.30%, respectively. The results imply that our approach can generate the most accurate results. In the future, we intend to fuse multiple datasets into one to provide a generalized model to predict activities. Further, we intend to collect data from more sensors.

ACKNOWLEDGEMENT
This work was funded by the Deanship of Scientific Research at Jouf University under grant No. (DSR-2021-02-0383).

REFERENCES
[1] S. B. Kotsiantis, I. D. Zaharakis, and P. E. Pintelas, "Machine learning: a review of classification and combining techniques," Artificial Intelligence Review, vol. 26, pp. 159–190, 2006.
[2] S. Budd, E. C. Robinson, and B. Kainz, "A survey on active learning and human-in-the-loop deep learning for medical image analysis," Medical Image Analysis, vol. 71, p. 102062, 2021.
[3] A. H. George, A. Shahul, and A. S. George, "Wearable sensors: A new way to track health and wellness," Partners Universal International Innovation Journal, vol. 1, no. 4, pp. 15–34, 2023.
[4] R. Jerath, M. Syam, and S. Ahmed, "The future of stress management: Integration of smartwatches and HRV technology," Sensors, vol. 23, no. 17, p. 7314, 2023.
[5] P. C. Shih, K. Han, E. S. Poole, M. B. Rosson, and J. M. Carroll, "Use and adoption challenges of wearable activity trackers," iConference 2015 Proceedings, 2015.
[6] S. Dash, S. K. Shakyawar, M. Sharma, and S. Kaushik, "Big data in healthcare: management, analysis and future prospects," Journal of Big Data, vol. 6, no. 1, pp. 1–25, 2019.
[7] M. Pantzar and M. Ruckenstein, "The heart of everyday analytics: emotional, material and practical extensions in self-tracking market," Consumption Markets & Culture, vol. 18, no. 1, pp. 92–109, 2015.
[8] A. Ometov, V. Shubina, L. Klus, J. Skibińska, S. Saafi, P. Pascacio, L. Flueratoru, D. Q. Gaibor, N. Chukhno, O. Chukhno, et al., "A survey on wearable technology: History, state-of-the-art and current challenges," Computer Networks, vol. 193, p. 108074, 2021.
[9] World Health Organization, Noncommunicable Disease and Mental Health Cluster, Innovative Care for Chronic Conditions: Building Blocks for Action: Global Report. World Health Organization, 2002.
[10] V. Patel, A. Chesmore, C. M. Legner, and S. Pandey, "Trends in workplace wearable technologies and connected-worker solutions for next-generation occupational safety, health, and productivity," Advanced Intelligent Systems, vol. 4, no. 1, p. 2100099, 2022.
[11] G. French, M. Hulse, D. Nguyen, K. Sobotka, K. Webster, J. Corman, B. Aboagye-Nyame, M. Dion, M. Johnson, B. Zalinger, et al., "Impact of hospital strain on excess deaths during the COVID-19 pandemic—United States, July 2020–July 2021," American Journal of Transplantation, vol. 22, no. 2, pp. 654–657, 2022.
[12] P. Kumar and S. Suresh, "DeepTransHHAR: Inter-subjects heterogeneous activity recognition approach in the non-identical environment using wearable sensors," National Academy Science Letters, vol. 45, no. 4, pp. 317–323, 2022.
[13] Z. Qin, Y. Zhang, S. Meng, Z. Qin, and K.-K. R. Choo, "Imaging and fusing time series for wearable sensor-based human activity recognition," Information Fusion, vol. 53, pp. 80–87, 2020.
[14] N. Sikder and A.-A. Nahid, "KU-HAR: An open dataset for heterogeneous human activity recognition," Pattern Recognition Letters, vol. 146, pp. 46–54, 2021.
[15] N. Hnoohom, A. Jitpattanakul, I. You, and S. Mekruksavanich, "Deep learning approach for complex activity recognition using heterogeneous sensors from wearable device," in 2021 Research, Invention, and Innovation Congress: Innovation Electricals and Electronics (RI2C), pp. 60–65, IEEE, 2021.
[16] Q. Ni, L. Zhang, and L. Li, "A heterogeneous ensemble approach for activity recognition with integration of change point-based data segmentation," Applied Sciences, vol. 8, no. 9, p. 1695, 2018.
[17] G. K. Gudur and S. K. Perepu, "Resource-constrained federated learning with heterogeneous labels and models for human activity recognition," in International Workshop on Deep Learning for Human Activity Recognition, pp. 57–69, Springer, 2021.
[18] M.-S. Dao, T.-A. Nguyen-Gia, and V.-C. Mai, "Daily human activities recognition using heterogeneous sensors from smartphones," Procedia Computer Science, vol. 111, pp. 323–328, 2017.
[19] Y. Zhu, H. Luo, R. Chen, and F. Zhao, "DiamondNet: A neural-network-based heterogeneous sensor attentive fusion for human activity recognition," IEEE Transactions on Neural Networks and Learning Systems, 2023.
[20] W. Ge, G. Mou, E. O. Agu, and K. Lee, "Heterogeneous hyper-graph neural networks for context-aware human activity recognition," in 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), pp. 350–354, IEEE, 2023.
[21] S. Kafle and D. Dou, "A heterogeneous clustering approach for human activity recognition," in Big Data Analytics and Knowledge Discovery: 18th International Conference, DaWaK 2016, Porto, Portugal, September 6–8, 2016, Proceedings 18, pp. 68–81, Springer, 2016.
[22] Y. Shen, H. Yun, Z. C. Lipton, Y. Kronrod, and A. Anandkumar, "Deep active learning for named entity recognition," arXiv preprint arXiv:1707.05928, 2017.
[23] J. Kasai, K. Qian, S. Gurajada, Y. Li, and L. Popa, "Low-resource deep entity resolution with transfer and active learning," arXiv preprint arXiv:1906.08042, 2019.

SIDRA ABBAS received her B.S. degree from the Department of Computer Science, COMSATS University Islamabad, Pakistan. Her research interests include, but are not limited to, computer forensics, machine learning, criminal profiling, software watermarking, intelligent systems, and data privacy protection.