
INTEGRATED CURL COUNTING AND ACTION

RECOGNITION SYSTEM

MINI PROJECT REPORT

Submitted by

Nithya Shree.V (3122213002069)


Poornima Devi.M (3122213002077)
Sujatha Natarajan (3122213002105)

UEC2604 MACHINE LEARNING

Department of Electronics and Communication Engineering
Sri Sivasubramaniya Nadar College of Engineering
(An Autonomous Institution, Affiliated to Anna University)
Rajiv Gandhi Salai (OMR), Kalavakkam – 603 110
EVEN SEM 2023-2024


Sri Sivasubramaniya Nadar College of Engineering
(An Autonomous Institution, Affiliated to Anna University)

BONAFIDE CERTIFICATE

Certified that this mini project titled “INTEGRATED CURL COUNTING AND ACTION RECOGNITION SYSTEM” is the bonafide work of “Nithya Shree.V (3122213002069), Poornima Devi.M (3122213002077) and Sujatha Natarajan (3122213002105)” of VI Semester, Electronics and Communication Engineering branch, during Even Semester 2023 – 2024 for UEC2604 Machine Learning.

Submitted for examination held on

INTERNAL EXAMINER
ABSTRACT

Human action recognition in videos is an active area of research in


computer vision and pattern recognition. Nowadays, artificial
intelligence (AI) based systems are needed for human-behavior
assessment and security purposes. Existing action recognition techniques mainly use pretrained weights of different AI architectures for the visual representation of video frames in the training stage, which affects the determination of feature discrepancies, such as the distinction between visual and temporal signs. To address this issue,
we propose a bi-directional long short-term memory (BiLSTM) based
attention mechanism with a dilated convolutional neural network
(DCNN) that selectively focuses on effective features in the input frame
to recognize the different human actions in the videos. In this diverse
network, we use the DCNN layers to extract the salient discriminative
features by using the residual blocks to upgrade the features that keep
more information than a shallow layer. Furthermore, we feed these
features into a BiLSTM to learn the long-term dependencies, which is
followed by the attention mechanism to boost the performance and
extract additional high-level selective action-related patterns and cues.
We further use the center loss with Softmax to improve the loss function
that achieves a higher performance in the video-based action
classification. The proposed system is evaluated on three benchmarks.

TABLE OF CONTENTS

CHAPTER NO TITLE PAGE NO

1 Introduction 7
2 Literature Survey 8
3 Methodology 10
3.1 Introduction 10
3.2 ML Process Flow 11
3.3 Real Time Curl Counter 12
3.3.1 Making Detections 12
3.3.2 Calculation Of Angles 13
3.3.3 Exercise Detection Logic 14
3.3.4 Determining Joints 15
3.3.5 Counting Exercise Repetitions 16
3.4 Action Recognition 18
3.4.1 CNN Workflow 18
3.4.1.1 Pooling 20
3.4.2 The Neural Network 22
3.4.3 Convolution LSTM Workflow 23
3.4.4 Data Acquisition 25
3.4.5 Data Visualization 26
3.4.6 Main Working 28

4 Results 45
4.1 Multiple Activity Prediction 45
4.2 Single Prediction on a Test Video 47
5 Conclusion 48
6 References 49

LIST OF FIGURES

Figure no Content Page no.

3.2.a Flow diagram of the process of 11


Machine Learning
3.3.1.a Making Detections 12
3.3.2.a Calculation of Angles 13
3.3.4.a Determining Joints Using Mediapipe 15
3.3.5.a Curl Counting 16
3.3.5.b Curl Counting 16
3.4.1.a CNN Workflow 17
3.4.1.b How CNN Works 17
3.4.1.c Convolution Process 18
3.4.1.1.a Pooling Process 19
3.4.1.1.b Max Pooling 19
3.4.2.a CNN Architecture 20
3.4.3.a Convolutional LSTM Workflow 21
3.4.5.a Data Visualization 23
3.4.6.a ConvLSTM Working 24
3.4.6.b ConvLSTM Model Architecture 27
3.4.6.c Loss Curve 28
3.4.6.d Accuracy Curve 28
3.4.6.e LRCN Model Architecture 32
3.4.6.f & g Loss Curve and Accuracy Curve 37
4.a Multiple Action Recognition 38
4.b Single Prediction Results 39

SYMBOLS AND ABBREVIATIONS

Symbol Expansion
ML Machine Learning
CNN Convolutional Neural Network
LSTM Long Short-Term Memory
DCNN Dilated Convolutional Neural Network
BiLSTM Bi-directional Long Short-Term Memory
LRCN Long-term Recurrent Convolutional Network
HAR Human Activity Recognition

CHAPTER 1

INTRODUCTION

Physical activity is crucial for maintaining a healthy lifestyle. Video-based action recognition is an emerging and challenging area of research, particularly for identifying and recognizing actions in a video sequence from a surveillance stream. Action recognition in video has many applications, such as content-based video retrieval, surveillance systems for security and privacy, human-computer interaction, and activity recognition. Digital content is growing exponentially day by day, so effective AI-based intelligent Internet of Things (IoT) systems are needed for surveillance to monitor and identify human actions and activities. The aim of action recognition is to detect and identify people, their behavior, and suspicious activities in videos, and to deliver appropriate information to support interactive programs and IoT-based applications. Action recognition still poses many challenges for ensuring the security and safety of residents, in settings including industrial monitoring, violence detection, person identification, virtual reality, and cloud environments, due to camera movements, occlusions, complex backgrounds, and variations in illumination.

In addition to recognizing actions in video, this work integrates a real-time bicep curl counter: pose estimation is used to locate body joints, compute the angles between them, and count exercise repetitions, providing immediate feedback during workouts.

CHAPTER 2

LITERATURE SURVEY

This section presents previous work related to our proposed


method.

• Ajay L, Vidyadevi G Biradar, Chandu M, and Bharath JB developed an AI-powered fitness trainer utilizing human pose estimation to analyze exercise movements in real time, providing personalized feedback for more effective workouts and healthier lifestyles.

• Rutuja Mhaiskar and Preeti Verma, in “Performance Analysis of Human Activity”, aim to create an AI gym assistant using Jupyter Notebook and MediaPipe, leveraging pre-trained models for accurate pose estimation and hand tracking.

• Shaikh Mohd, in “Pushup Counting and Evaluating Based on Human Keypoint Detection”, explores AI and ML integration via the MediaPipe framework for real-time push-up counting and evaluation, achieving over 90% accuracy. While some detection errors remain, expanding the dataset and refining error classification are envisioned as future enhancements.

8
• Sejal Bhatia, in “Activity Identification with Machine Learning on Wearable Tracker”, shows that ML can identify activities from wearable-tracker data, with XGBoost giving the best results. Although wearable data differs from our camera-based setting, the work demonstrates that ML for activity detection is promising.

• Litao Guang, Jiancheng Zou, and Zibo Wen investigate real-time human heart rate and blood pressure detection during exercise via MediaPipe, integrating YOLOv8-pose for monitoring. By leveraging pose estimation and hand keypoints, the work introduces a non-contact system with potential applications in fitness tracking and health monitoring.

CHAPTER 3

METHODOLOGY

3.1 INTRODUCTION

A quick overview of our project entails the following:


● Implementation of Google MediaPipe's BlazePose model
for real-time human pose estimation
● Computer vision tools (i.e., OpenCV) for color
conversion, detecting cameras, detecting camera
properties, displaying images, and custom
graphics/visualization
● Inferred 3D joint angle computation according to
relative coordinates of surrounding body landmarks
● Guided training data generation
● Data preprocessing and callback methods for efficient deep
neural network training
● Customizable LSTM and Attention-Based LSTM models
● Real-time visualization of joint angles, rep counters, and
probability distribution of exercise classification
predictions

In this chapter, we discuss the basic methodology of how ML works, followed by the methodology and working of the CNN, the convolutional LSTM, and the rep counter.

3.2 ML PROCESS FLOW

Let us look at a breakdown of the ML process:


* Training data: This is the initial data that’s fed into the machine learning algorithm. The quality and quantity of this data heavily influence the outcome of the model.
* Train ML algorithm: This step involves training the algorithm on the
provided data. During training, the algorithm learns to identify patterns
and relationships within the data.
* Model Input Data: Once trained, the model can then be used to make
predictions on new data. This new data is fed into the model as input.
* ML Algorithm: The machine learning algorithm leverages the learned
patterns and relationships from the training data to analyze the new input
data.

Fig 3.2.a: Flow diagram of the process of Machine Learning

3.3 REAL-TIME CURL COUNTER

3.3.1 MAKING DETECTIONS:

We begin by detecting and tracking human keypoints using Mediapipe.


Both YOLO and Mediapipe offer pre-trained models optimized for
human pose detection. We have decided to use Mediapipe as it best fits
our model. Keypoints are marked and the angles between them are
calculated. Once the human keypoints are detected, we proceed to
identify specific key points relevant to different exercises. These key
points typically include joints like shoulders, elbows, hips, and knees.
We then calculate the angles between these key points to accurately assess the person's posture and form during exercises.
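A minimal sketch of how these detections might be set up with MediaPipe Pose and OpenCV is shown below; the webcam index and the 0.5 confidence thresholds are illustrative assumptions, not the exact values of our implementation.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # webcam feed (index assumed)
with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV delivers BGR.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # Draw the detected landmarks back onto the original frame.
        if results.pose_landmarks:
            mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                      mp_pose.POSE_CONNECTIONS)
        cv2.imshow('Mediapipe Feed', frame)
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```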

Fig 3.3.1.a Making Detections

3.3.2 CALCULATION OF ANGLES

Angle Computation: Custom functions are developed to calculate the


angles between specific body landmarks, notably the shoulder,
elbow, and wrist.
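As an illustration, the angle at a joint can be computed with `arctan2` from three landmark coordinates; the helper below is a minimal sketch (the function name and the example coordinates are ours, not taken from the report's code).

```python
import numpy as np

def calculate_angle(a, b, c):
    """Angle at point b (e.g. the elbow) formed by points a-b-c,
    where each point is an (x, y) landmark coordinate."""
    a, b, c = np.array(a), np.array(b), np.array(c)
    radians = (np.arctan2(c[1] - b[1], c[0] - b[0])
               - np.arctan2(a[1] - b[1], a[0] - b[0]))
    angle = np.abs(np.degrees(radians))
    if angle > 180.0:          # keep the angle in the 0-180 degree range
        angle = 360.0 - angle
    return angle

# Example: shoulder, elbow, wrist coordinates (normalized [0, 1] values)
print(calculate_angle((0.5, 0.3), (0.55, 0.5), (0.5, 0.7)))  # about 152 degrees
```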

Thresholding: Threshold values are defined to identify the initiation and


completion of a curl motion based on the angle measurements. For
instance, a curl might be considered initiated when the arm is below a
certain angle threshold and completed when it surpasses another
threshold.

Incremental Counting: A logic scheme is implemented to track the


number of curls completed by individuals in real-time. This involves
incrementing a counter each time the detected angle pattern suggests the
completion of a curl motion.
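A minimal sketch of such an incremental counting scheme is given below; the 160-degree and 30-degree thresholds and the variable names are illustrative assumptions.

```python
# Simple up/down state machine for curl counting.
counter = 0
stage = None  # "down" when the arm is extended, "up" when flexed

def update_counter(elbow_angle):
    """Increment the counter once per complete extend-then-flex cycle."""
    global counter, stage
    if elbow_angle > 160:                     # arm extended: start of a rep
        stage = "down"
    if elbow_angle < 30 and stage == "down":  # arm flexed after being extended
        stage = "up"
        counter += 1                          # one full curl completed
    return counter
```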

Fig 3.3.2.a Calculation of Angles

3.3.3 EXERCISE DETECTION LOGIC

We utilize the detected keypoints and angles to implement logic for


exercise detection. This logic should be adaptable to different types of
exercises, such as push-ups, squats, lunges, etc. We define thresholds or
rules specific to each exercise to classify and identify them accurately.

The process involves leveraging the detected keypoints and angles to


discern whether the observed movement aligns with the execution of a
curl. Key elements to consider include the positioning of the elbows and
shoulders relative to the torso, as well as the trajectory of the hands
during the exercise. By analyzing the angles formed between these key
points, specific criteria can be established to differentiate between a
bicep curl and other activities or poses. For instance, the angle at the
elbow joint could be a critical factor, with thresholds set to identify the
bending and extension phases of the curl movement. Additionally,
factors such as the range of motion, consistency in movement patterns,
and temporal sequence of actions can further refine the detection logic,
enhancing its accuracy and reliability.

This entails accounting for variations in body proportions, techniques,


and equipment used during the exercise. Incorporating machine learning
techniques or adaptive algorithms can enable the system to learn and
adjust its detection criteria based on observed data, allowing for real-
time adjustments to accommodate diverse scenarios.

3.3.4 DETERMINING JOINTS

Fig 3.3.4.a Determining Joints Using Mediapipe

By leveraging convolutional neural networks (CNNs) and other


machine learning techniques, Mediapipe analyzes input data to identify
anatomical landmarks such as joints, including those for the shoulders,
elbows, wrists, hips, knees, and ankles.

These joints are crucial for understanding human pose and movement,
serving as the foundation for a wide range of applications, from fitness
tracking to gesture recognition. Through its sophisticated architecture
and training methodology, Mediapipe achieves robustness and accuracy
in joint detection across diverse body types, poses, and environmental
conditions, making it a valuable tool for researchers, developers, and
practitioners in various fields.
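For reference, individual joints can be read from MediaPipe's `PoseLandmark` enumeration; the helper below is a hypothetical sketch that pulls out the left shoulder, elbow, and wrist used for curl angles.

```python
import mediapipe as mp

mp_pose = mp.solutions.pose

def get_left_arm_points(results):
    """Return the normalized (x, y) coordinates of the left shoulder,
    elbow and wrist from a MediaPipe Pose result."""
    lm = results.pose_landmarks.landmark
    shoulder = (lm[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x,
                lm[mp_pose.PoseLandmark.LEFT_SHOULDER.value].y)
    elbow = (lm[mp_pose.PoseLandmark.LEFT_ELBOW.value].x,
             lm[mp_pose.PoseLandmark.LEFT_ELBOW.value].y)
    wrist = (lm[mp_pose.PoseLandmark.LEFT_WRIST.value].x,
             lm[mp_pose.PoseLandmark.LEFT_WRIST.value].y)
    return shoulder, elbow, wrist
```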

3.3.5 COUNTING EXERCISE REPETITIONS

Monitor the person's movements and transitions between different


exercise phases to count repetitions. Utilize techniques like state
machines or temporal analysis to track the progression of each repetition
and accurately count them. Example (counting push-up repetitions):
As the person performs push-ups, the application keeps track of the
number of repetitions completed. This is done by counting instances
where the person transitions from the starting position to the ending
position of a push-up.

Fig 3.3.5.a Curl Counting

Fig 3.3.5.b Curl Counting

3.4 ACTION RECOGNITION

3.4.1 CNN WORKFLOW

Fig 3.4.1.a: CNN Workflow

Fig 3.4.1.b: How CNN Works

A convolution is a linear operation that involves the multiplication of a


set of weights with the input. These weights are present in a smaller
matrix called a kernel or filter. Convolution is basically done to

emphasize the important or key features of an image such as borders,
edges, corners, highlighted portions, etc. The filters will contain values
that will help extract necessary portions of the image. The below image
shows how it’s done:

Fig 3.4.1.c: Convolution Process

As observed, the filter is superimposed onto the image starting from the top-left set of pixels. The weights of the filter are multiplied by the values of the pixels over which it is placed, and the products are summed to give the first value of the convolved matrix, as shown in Fig 3.4.1.c. The filter then moves right by one pixel (the stride) and repeats the operation until it reaches the last section of the image. The size of the convolved feature is given by ((n - f + 2p) / s) + 1, where n is the input size, f the filter size, p the padding and s the stride.
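The following NumPy sketch illustrates the sliding-window multiply-and-sum and confirms the output-size formula for a toy 6x6 input and a 3x3 filter; the example values and function name are ours.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid (no padding) 2-D convolution: slide the kernel over the
    image and sum the element-wise products at each position."""
    n, f = image.shape[0], kernel.shape[0]
    out_size = (n - f) // stride + 1          # (n - f + 2p)/s + 1 with p = 0
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(36).reshape(6, 6)           # toy 6x6 "image"
edge_kernel = np.array([[1, 0, -1]] * 3)      # simple vertical-edge filter
print(convolve2d(image, edge_kernel).shape)   # (4, 4) -> (6 - 3)/1 + 1 = 4
```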

3.4.1.1 POOLING

Pooling is a convolution process where the filter extracts a single value


from the area it convolves. It is done in order to summarize or reduce
the size of data to be able to make the CNN process simpler. It is a form
of image compression. It is similar to convolution but is performed differently. The diagram below shows an example:

Fig 3.4.1.1.a: Pooling Process

Pooling in our case is done via Max Pooling, where the maximum value
is chosen from the sub-matrix of the input and is used as the first value
of the pooling matrix. For the next value, calculate the maximum value
in the next sub-matrix and update the new element into the Pooling
matrix. This goes on till the pooling matrix is filled.
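A small NumPy sketch of 2x2 max pooling, using toy values of our own choosing:

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """2x2 max pooling: keep only the largest value in each sub-matrix."""
    h, w = feature_map.shape
    out = np.zeros((h // stride, w // stride))
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out[i // stride, j // stride] = feature_map[i:i+size, j:j+size].max()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 0],
                 [3, 4, 1, 8]])
print(max_pool2d(fmap))
# [[6. 4.]
#  [7. 9.]]
```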

Fig 3.4.1.1.b: Pooling Process-Max Pooling

3.4.2 THE NEURAL NETWORK

Input layer: The flattened layer that was just created acts as the input layer for the upcoming neural network. The data from the input layer is then transferred to the deeper layers.

Hidden layer: The inputs coming from the previous layer are multiplied with weights and summed up along with a bias. The weighted sum is then passed through an activation function, which decides which nodes fire for feature extraction, and finally the output is calculated. This whole process is known as forward propagation.
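A toy NumPy sketch of forward propagation (weighted sum plus bias, activation, then a softmax output); the layer sizes and random weights are illustrative only.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def forward(x, weights, biases):
    """Forward propagation, layer by layer: weighted sum + bias,
    then activation (softmax on the final layer)."""
    a = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b                       # weighted sum plus bias
        if i < len(weights) - 1:
            a = relu(z)                     # hidden-layer activation
        else:
            e = np.exp(z - z.max())         # softmax output layer
            a = e / e.sum()
    return a

# Toy example: 4 flattened inputs -> 3 hidden units -> 2 output classes
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]
biases = [np.zeros(3), np.zeros(2)]
print(forward(np.array([0.1, 0.4, 0.2, 0.7]), weights, biases))  # class probabilities
```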

Fig 3.4.2.a: CNN Architecture

3.4.3 CONVOLUTIONAL LSTM WORKFLOW

Fig 3.4.3.a: Convolutional LSTM Workflow

3.4.4 DATA ACQUISITION

Data for the various activities used to train the model were taken from an openly available benchmark. For this purpose, we use the TensorFlow module, an open-source library developed by Google. We use the UCF50 - Action Recognition Dataset (https://www.crcv.ucf.edu/data/UCF50.php), which consists of realistic videos taken from YouTube; this differentiates it from most other available action recognition datasets, which are not realistic and are staged by actors. The dataset contains:

* `50` Action Categories

* `25` Groups of Videos per Action Category

* `133` Average Videos per Action Category

* `199` Average Number of Frames per Video

* `320` Average Frames Width per Video

* `240` Average Frames Height per Video

* `26` Average Frames Per Seconds per Video

3.4.5 DATA VISUALIZATION

In the first step, we will visualize the data along with labels to get an idea
about what we will be dealing with.

For visualization, we will pick `20` random categories from the dataset
and a random video from each selected category and will visualize the
first frame of the selected videos with their associated labels written.
This way we’ll be able to visualize a subset (`20` random videos) of the dataset.
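A sketch of how this visualization might be coded with OpenCV and Matplotlib; the `UCF50` directory path and the 5x4 grid layout are assumptions.

```python
import os, random
import cv2
import matplotlib.pyplot as plt

DATASET_DIR = 'UCF50'   # assumed local path of the extracted dataset
all_classes = os.listdir(DATASET_DIR)

plt.figure(figsize=(20, 20))
for idx, class_name in enumerate(random.sample(all_classes, 20), start=1):
    # Pick one random video from this category and grab its first frame.
    video_name = random.choice(os.listdir(os.path.join(DATASET_DIR, class_name)))
    reader = cv2.VideoCapture(os.path.join(DATASET_DIR, class_name, video_name))
    ok, frame = reader.read()
    reader.release()
    if not ok:
        continue
    frame = cv2.putText(frame, class_name, (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    plt.subplot(5, 4, idx)
    plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    plt.axis('off')
plt.show()
```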

Fig 3.4.5.a: Data Visualization

3.4.6 MAIN WORKING

* Prediction: Based on the analysis of the new data, the model generates
a prediction or output.
* Accuracy: The accuracy of the model’s predictions is assessed. This is usually determined by comparing the model’s predictions against a set of known values.
* Successful Model: If the model’s accuracy meets a certain threshold, it’s considered successful.
* Testing: Single-prediction videos as well as multiple-action-recognition videos are used. For multiple-action prediction we downloaded a YouTube video, and for single-action prediction we feed in the input from the real-time curl counter implemented for exercise tracking.

Preprocessing of the dataset is performed: we extract frames from the videos, resize them, and normalize them to set parameters.
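A minimal sketch of such a preprocessing routine; the sequence length of 20 frames and the 64x64 frame size follow the model summaries later in this chapter, while the function name and the evenly spaced sampling strategy are assumptions.

```python
import cv2
import numpy as np

SEQUENCE_LENGTH = 20          # frames fed to the model (matches the summaries below)
IMAGE_HEIGHT = IMAGE_WIDTH = 64

def frames_extraction(video_path):
    """Read a video, sample SEQUENCE_LENGTH evenly spaced frames,
    resize them to 64x64 and normalize pixel values to [0, 1]."""
    frames = []
    reader = cv2.VideoCapture(video_path)
    total = int(reader.get(cv2.CAP_PROP_FRAME_COUNT))
    skip = max(total // SEQUENCE_LENGTH, 1)
    for i in range(SEQUENCE_LENGTH):
        reader.set(cv2.CAP_PROP_POS_FRAMES, i * skip)
        ok, frame = reader.read()
        if not ok:
            break
        frame = cv2.resize(frame, (IMAGE_WIDTH, IMAGE_HEIGHT))
        frames.append(frame / 255.0)       # normalize to [0, 1]
    reader.release()
    return np.asarray(frames)
```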
- Split the Data into Train and Test Sets - We split our data to create training and testing sets. We also shuffle the dataset before the split to avoid any bias and to obtain splits representing the overall distribution of the data.
- Implement the ConvLSTM Approach-
In this step, we will implement the first approach by using a combination
of ConvLSTM cells. A ConvLSTM cell is an LSTM with convolution embedded in the architecture, which makes it capable of identifying spatial features of the data while taking the temporal relation into account.

Fig 3.4.6.a ConvLSTM Working

- Construct the Model -
To construct the model, we will use Keras `ConvLSTM2D` recurrent layers. The `ConvLSTM2D` layer also takes in the number of filters and the kernel size required for applying the convolutional operations. The output of the layers is flattened at the end and fed to a `Dense` layer with softmax activation, which outputs the probability of each action category.

Model: "sequential"

Layer (type) Output Shape Param #


=======================================================
conv_lstm2d (ConvLSTM2D) (None, 20, 62, 62, 4) 1024

max_pooling3d(MaxPooling3D) (None, 20, 31, 31, 4) 0

time_distributed (TimeDistributed) (None, 20, 31, 31, 4) 0

conv_lstm2d_1 (ConvLSTM2D) (None, 20, 29, 29, 8) 3488

max_pooling3d_1 (MaxPooling 3D) (None, 20, 15, 15, 8) 0

time_distributed_1 (TimeDistributed) (None, 20, 15, 15, 8) 0

conv_lstm2d_2 (ConvLSTM2D) (None, 20, 13, 13, 14) 11144

max_pooling3d_2 (MaxPooling 3D) (None, 20, 7, 7, 14) 0

time_distributed_2 (TimeDistributed) (None, 20, 7, 7, 14) 0

conv_lstm2d_3 (ConvLSTM2D) (None, 20, 5, 5, 16) 17344

max_pooling3d_3 (MaxPooling 3D) (None, 20, 3, 3, 16) 0

flatten (Flatten) (None, 2880) 0

dense (Dense) (None, 4) 11524


========================================================
Total params: 44,524
Trainable params: 44,524
Non-trainable params: 0

Model Created Successfully!
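The summary above pins down the layer stack, so it can be reproduced in Keras roughly as follows; the 20-frame 64x64 RGB input, the 'same'-padded pooling, and the 0.2 dropout/recurrent-dropout rates are assumptions chosen to be consistent with the printed shapes and parameter counts.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (ConvLSTM2D, MaxPooling3D,
                                     TimeDistributed, Dropout, Flatten, Dense)

SEQUENCE_LENGTH, H, W, CHANNELS = 20, 64, 64, 3
NUM_CLASSES = 4

model = Sequential([
    # Each ConvLSTM2D block learns spatio-temporal features; filter counts
    # (4, 8, 14, 16) and 3x3 kernels reproduce the parameter counts above.
    ConvLSTM2D(4, (3, 3), activation='tanh', recurrent_dropout=0.2,
               return_sequences=True,
               input_shape=(SEQUENCE_LENGTH, H, W, CHANNELS)),
    MaxPooling3D(pool_size=(1, 2, 2), padding='same'),
    TimeDistributed(Dropout(0.2)),          # assumed dropout rate

    ConvLSTM2D(8, (3, 3), activation='tanh', recurrent_dropout=0.2,
               return_sequences=True),
    MaxPooling3D(pool_size=(1, 2, 2), padding='same'),
    TimeDistributed(Dropout(0.2)),

    ConvLSTM2D(14, (3, 3), activation='tanh', recurrent_dropout=0.2,
               return_sequences=True),
    MaxPooling3D(pool_size=(1, 2, 2), padding='same'),
    TimeDistributed(Dropout(0.2)),

    ConvLSTM2D(16, (3, 3), activation='tanh', recurrent_dropout=0.2,
               return_sequences=True),
    MaxPooling3D(pool_size=(1, 2, 2), padding='same'),

    Flatten(),                              # 20 * 3 * 3 * 16 = 2880 features
    Dense(NUM_CLASSES, activation='softmax')
])
model.summary()
```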

Explanation:

1. ConvLSTM2D Layers:
- There are multiple ConvLSTM2D layers in the model, each followed
by max-pooling layers.
- ConvLSTM2D layers combine convolutional and LSTM operations,
allowing them to learn spatial-temporal patterns directly from the input
sequence of images.
- The output shape of each layer indicates a sequence of 20 frames
with different spatial dimensions and depths of feature maps.
- The number of output channels (4, 8, 14, 16) in each ConvLSTM2D
layer increases gradually, indicating a progressive extraction of more
complex features.

2. MaxPooling3D Layers:
- Max-pooling layers are applied after each ConvLSTM2D layer to
reduce the spatial dimensions of the feature maps.
- MaxPooling3D layers operate over both spatial and temporal
dimensions, reducing computational complexity and focusing on the
most relevant features.

3. TimeDistributed Layers:
- TimeDistributed layers are used to apply operations (possibly
additional convolutions or transformations) independently to each time
step of the input sequence.
- In this model, TimeDistributed layers don't introduce any additional
parameters but may be used for further feature processing.

4. Flatten Layer:
- The Flatten layer is applied to convert the 3D feature maps into a 1D
vector, which can be fed into a fully connected dense layer for
classification.

5. Dense Layer:
- The final Dense layer performs classification based on the features
extracted by the ConvLSTM2D layers.
- The output shape indicates that the model predicts 4 classes.

Fig 3.4.6.b: ConvLSTM Model Architecture

- Plot the Model’s Loss & Accuracy Curves -

Fig 3.4.6.c: Loss Curve

Fig 3.4.6.d: Accuracy Curve

- Implement the LRCN Approach -

In this step, we implement the LRCN Approach by combining


Convolution and LSTM layers in a single model. The CNN model can be
used to extract spatial features from the frames in the video, and for this
purpose, a pre-trained model can be used that can be fine-tuned for the
problem. The LSTM model can then use the features extracted by the CNN to predict the action being performed in the video. The convolutional layers are used for spatial feature extraction from the frames, and the extracted spatial features are fed to LSTM layer(s) at each time step for temporal sequence modeling. This way the network
learns spatiotemporal features directly in an end-to-end training,
resulting in a robust model.

-Construct the Model-

We use time-distributed `Conv2D` layers, which will be followed by `MaxPooling2D` and `Dropout` layers. The features extracted from the `Conv2D` layers will then be flattened using the `Flatten` layer and fed to an `LSTM` layer. The `Dense` layer with softmax activation will then use the output from the `LSTM` layer to predict the action being performed.

Model: "sequential_1"

Layer (type) Output Shape Param #
============================================================
time_distributed_3 (TimeDistributed) (None, 20, 64, 64, 16) 448

time_distributed_4 (TimeDistributed) (None, 20, 16, 16, 16) 0

time_distributed_5 (TimeDistributed) (None, 20, 16, 16, 16) 0

time_distributed_6 (TimeDistributed) (None, 20, 16, 16, 32) 4640

time_distributed_7 (TimeDistributed) (None, 20, 4, 4, 32) 0

time_distributed_8 (TimeDistributed) (None, 20, 4, 4, 32) 0

time_distributed_9 (TimeDistributed) (None, 20, 4, 4, 64) 18496

time_distributed_10 (TimeDistributed) (None, 20, 2, 2, 64) 0

time_distributed_11 (TimeDistributed) (None, 20, 2, 2, 64) 0

time_distributed_12 (TimeDistributed) (None, 20, 2, 2, 64) 36928

time_distributed_13 (TimeDistributed) (None, 20, 1, 1, 64) 0

time_distributed_14 (TimeDistributed) (None, 20, 64) 0

lstm (LSTM) (None, 32) 12416

dense_1 (Dense) (None, 5) 165

====================================================
Total params: 73093 (285.52 KB)
Trainable params: 73093 (285.52 KB)
Non-trainable params: 0 (0.00 Byte)

Model Created Successfully!
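Again, the summary determines the architecture; a Keras sketch consistent with it is shown below, where the 0.25 dropout rate and the exact pooling sizes are assumptions inferred from the printed output shapes.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Dropout, Flatten, LSTM, Dense)

SEQUENCE_LENGTH, H, W, CHANNELS = 20, 64, 64, 3
NUM_CLASSES = 5

model = Sequential([
    # Spatial feature extraction applied frame by frame via TimeDistributed.
    TimeDistributed(Conv2D(16, (3, 3), padding='same', activation='relu'),
                    input_shape=(SEQUENCE_LENGTH, H, W, CHANNELS)),
    TimeDistributed(MaxPooling2D((4, 4))),
    TimeDistributed(Dropout(0.25)),         # assumed dropout rate

    TimeDistributed(Conv2D(32, (3, 3), padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((4, 4))),
    TimeDistributed(Dropout(0.25)),

    TimeDistributed(Conv2D(64, (3, 3), padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2))),
    TimeDistributed(Dropout(0.25)),

    TimeDistributed(Conv2D(64, (3, 3), padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2))),

    TimeDistributed(Flatten()),             # (20, 64) sequence of frame features
    LSTM(32),                               # temporal modeling across the 20 frames
    Dense(NUM_CLASSES, activation='softmax')
])
model.summary()
```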

Explanation:

1. Input Layer (TimeDistributed):


- The input images are processed in a time-distributed manner,
indicating that the model is designed to handle sequences of images (20
frames in this case).
- The input images have a shape of (64, 64) pixels.

2. Convolutional Layers (TimeDistributed):


- There are several convolutional layers (with ReLU activation
functions) in the model. These layers extract features from the input
images at different spatial resolutions and depths.
- The number of filters increases from 16 to 64 across these layers,
indicating a progressive extraction of more complex features.

3. MaxPooling Layers (TimeDistributed):


- Max-pooling layers are used to downsample the spatial dimensions
of the feature maps, reducing computational complexity and extracting
the most important features.

4. LSTM Layer:
- The LSTM layer processes the extracted features from the CNN
layers over time (across the sequence of 20 frames).
- LSTM networks are effective for sequence modeling tasks, as they
can capture temporal dependencies and patterns in the data.

5. Dense Layer:
- The output of the LSTM layer is fed into a dense layer with softmax
activation, which produces the final output.
- The output shape indicates that the model predicts 5 classes, likely
corresponding to different exercises or actions.

Fig 3.4.6.e LRCN Model Architecture

-Compile & Train the Model
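A sketch of how compilation and training might look; the optimizer, loss, batch size, early-stopping patience, and the `features_train`/`labels_train` names are assumptions (the log below shows training planned for 70 epochs and stopping at epoch 43, which is consistent with early stopping on validation loss).

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training when validation loss stops improving and keep the best weights;
# the patience value and batch size are assumptions.
early_stopping = EarlyStopping(monitor='val_loss', patience=15,
                               restore_best_weights=True)

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(x=features_train, y=labels_train,
                    epochs=70, batch_size=4, shuffle=True,
                    validation_split=0.2,
                    callbacks=[early_stopping])
```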

Epoch 1/70
86/86 [==============================] - 20s 202ms/step -
loss: 1.5562 - accuracy: 0.2711 - val_loss: 1.5446 - val_accuracy: 0.2558
Epoch 2/70
86/86 [==============================] - 16s 190ms/step -
loss: 1.2941 - accuracy: 0.4548 - val_loss: 1.4454 - val_accuracy: 0.5000
Epoch 3/70
86/86 [==============================] - 16s 190ms/step -
loss: 1.0564 - accuracy: 0.5539 - val_loss: 0.9310 - val_accuracy: 0.6744
Epoch 4/70
86/86 [==============================] - 16s 187ms/step -
loss: 1.0035 - accuracy: 0.5627 - val_loss: 0.8670 - val_accuracy: 0.6628
Epoch 5/70
86/86 [==============================] - 16s 191ms/step -
loss: 0.7846 - accuracy: 0.6443 - val_loss: 0.9286 - val_accuracy: 0.6512
Epoch 6/70
86/86 [==============================] - 16s 189ms/step -
loss: 0.7097 - accuracy: 0.7055 - val_loss: 0.7071 - val_accuracy: 0.7326
Epoch 7/70
86/86 [==============================] - 16s 190ms/step -
loss: 0.6251 - accuracy: 0.7318 - val_loss: 0.6335 - val_accuracy: 0.7791
Epoch 8/70
86/86 [==============================] - 19s 227ms/step -
loss: 0.5355 - accuracy: 0.7464 - val_loss: 0.6527 - val_accuracy: 0.8140
Epoch 9/70
86/86 [==============================] - 17s 198ms/step -
loss: 0.4992 - accuracy: 0.8076 - val_loss: 0.8590 - val_accuracy: 0.6512
Epoch 10/70
86/86 [==============================] - 17s 191ms/step -
loss: 0.4256 - accuracy: 0.8484 - val_loss: 0.5051 - val_accuracy: 0.8140
Epoch 11/70
86/86 [==============================] - 16s 184ms/step -
loss: 0.3412 - accuracy: 0.8746 - val_loss: 0.4942 - val_accuracy: 0.8372
Epoch 12/70
86/86 [==============================] - 17s 195ms/step -

loss: 0.2829 - accuracy: 0.8892 - val_loss: 0.4248 - val_accuracy: 0.8488
Epoch 13/70
86/86 [==============================] - 17s 195ms/step -
loss: 0.2577 - accuracy: 0.9155 - val_loss: 0.6338 - val_accuracy: 0.7791
Epoch 14/70
86/86 [==============================] - 17s 198ms/step -
loss: 0.3098 - accuracy: 0.8892 - val_loss: 0.4802 - val_accuracy: 0.8256
Epoch 15/70
86/86 [==============================] - 17s 201ms/step -
loss: 0.1505 - accuracy: 0.9563 - val_loss: 0.6157 - val_accuracy: 0.8372
Epoch 16/70
86/86 [==============================] - 16s 189ms/step -
loss: 0.1907 - accuracy: 0.9417 - val_loss: 0.5611 - val_accuracy: 0.8372
Epoch 17/70
86/86 [==============================] - 16s 189ms/step -
loss: 0.1115 - accuracy: 0.9679 - val_loss: 0.6157 - val_accuracy: 0.8256
Epoch 18/70
86/86 [==============================] - 16s 188ms/step -
loss: 0.1582 - accuracy: 0.9534 - val_loss: 0.4729 - val_accuracy: 0.8721
Epoch 19/70
86/86 [==============================] - 16s 190ms/step -
loss: 0.1523 - accuracy: 0.9329 - val_loss: 0.4059 - val_accuracy: 0.8837
Epoch 20/70
86/86 [==============================] - 16s 192ms/step -
loss: 0.0773 - accuracy: 0.9796 - val_loss: 0.3830 - val_accuracy: 0.8953
Epoch 21/70
86/86 [==============================] - 16s 191ms/step -
loss: 0.0349 - accuracy: 0.9971 - val_loss: 0.4478 - val_accuracy: 0.8605
Epoch 22/70
86/86 [==============================] - 18s 212ms/step -
loss: 0.0311 - accuracy: 0.9942 - val_loss: 0.5619 - val_accuracy: 0.8721
Epoch 23/70
86/86 [==============================] - 17s 193ms/step -
loss: 0.0609 - accuracy: 0.9796 - val_loss: 0.4596 - val_accuracy: 0.8721
Epoch 24/70
86/86 [==============================] - 16s 191ms/step -
loss: 0.1835 - accuracy: 0.9388 - val_loss: 0.5093 - val_accuracy: 0.8837

Epoch 25/70
86/86 [==============================] - 16s 189ms/step -
loss: 0.2025 - accuracy: 0.9534 - val_loss: 0.4255 - val_accuracy: 0.8721
Epoch 26/70
86/86 [==============================] - 16s 189ms/step -
loss: 0.1295 - accuracy: 0.9621 - val_loss: 0.4286 - val_accuracy: 0.8721
Epoch 27/70
86/86 [==============================] - 17s 199ms/step -
loss: 0.0509 - accuracy: 0.9883 - val_loss: 0.4392 - val_accuracy: 0.8721
Epoch 28/70
86/86 [==============================] - 16s 190ms/step -
loss: 0.0294 - accuracy: 0.9971 - val_loss: 0.3164 - val_accuracy: 0.8953
Epoch 29/70
86/86 [==============================] - 16s 192ms/step -
loss: 0.0225 - accuracy: 0.9971 - val_loss: 0.3819 - val_accuracy: 0.8953
Epoch 30/70
86/86 [==============================] - 16s 188ms/step -
loss: 0.0186 - accuracy: 0.9971 - val_loss: 0.4409 - val_accuracy: 0.8837
Epoch 31/70
86/86 [==============================] - 16s 184ms/step -
loss: 0.0191 - accuracy: 0.9971 - val_loss: 0.4313 - val_accuracy: 0.8837
Epoch 32/70
86/86 [==============================] - 16s 188ms/step -
loss: 0.0170 - accuracy: 0.9971 - val_loss: 0.3776 - val_accuracy: 0.8953
Epoch 33/70
86/86 [==============================] - 16s 186ms/step -
loss: 0.0135 - accuracy: 0.9971 - val_loss: 0.4151 - val_accuracy: 0.9070
Epoch 34/70
86/86 [==============================] - 16s 191ms/step -
loss: 0.0100 - accuracy: 0.9971 - val_loss: 0.5455 - val_accuracy: 0.8721
Epoch 35/70
86/86 [==============================] - 18s 211ms/step -
loss: 0.0040 - accuracy: 1.0000 - val_loss: 0.4936 - val_accuracy: 0.8837
Epoch 36/70
86/86 [==============================] - 18s 204ms/step -
loss: 0.0093 - accuracy: 0.9971 - val_loss: 0.4147 - val_accuracy: 0.8953
Epoch 37/70

86/86 [==============================] - 17s 202ms/step -
loss: 0.0066 - accuracy: 0.9971 - val_loss: 1.0140 - val_accuracy: 0.7907
Epoch 38/70
86/86 [==============================] - 16s 191ms/step -
loss: 0.0180 - accuracy: 0.9971 - val_loss: 0.4302 - val_accuracy: 0.9070
Epoch 39/70
86/86 [==============================] - 16s 188ms/step -
loss: 0.0072 - accuracy: 0.9971 - val_loss: 0.3273 - val_accuracy: 0.9186
Epoch 40/70
86/86 [==============================] - 16s 188ms/step -
loss: 0.0042 - accuracy: 1.0000 - val_loss: 0.3865 - val_accuracy: 0.9186
Epoch 41/70
86/86 [==============================] - 17s 194ms/step -
loss: 0.0027 - accuracy: 1.0000 - val_loss: 0.3989 - val_accuracy: 0.9186
Epoch 42/70
86/86 [==============================] - 17s 195ms/step -
loss: 0.0025 - accuracy: 1.0000 - val_loss: 0.3846 - val_accuracy: 0.9186
Epoch 43/70
86/86 [==============================] - 17s 198ms/step -
loss: 0.0021 - accuracy: 1.0000 - val_loss: 0.4022 - val_accuracy: 0.9186

Evaluation Of The Trained Model:

5/5 [==============================] - 17s 3s/step - loss:


0.4397 - accuracy: 0.8741

-Plot Model’s Loss & Accuracy Curves:

Fig 3.4.6.f & g: Loss and Accuracy Curves

CHAPTER 4

RESULTS

We conducted extensive experimentation to evaluate and verify the


efficiency of the HAR system for identifying actions in videos. We first
validated the proposed system having both temporal and spatial features
and compared it with a system that uses only sequential features. We
further evaluated and compared our system having a spatio-temporal
attention network with a spatial attention net where the attention network
is applied after the convolutional process. We achieved better results
with the suggested attention-based system than the other baseline
attention methods, which implement either spatial or spatial–temporal
information. It indicates the importance of temporal information in
sequential data that can enhance the recognition performance, such as
video-based action recognition.. The output of our system is superior to
the other deep learning techniques on these datasets. Our system
achieved a higher accuracy with the sports video, which is primarily due
to these videos containing many similar activities that are difficult to
recognize using a simple system. Our system learns deep spatial as well
as temporal information to support its judgment in correctly identifying
the actions within the sports videos.

The results and accuracy obtained from the HAR system are given
below:

Fig 4.a: Multiple Action Recognition

Further, integrating MediaPipe curl counting with human action
recognition has significantly enriched the system's capabilities,
particularly in the realm of fitness tracking and exercise monitoring.
Moreover, the real-time nature of MediaPipe's pose estimation allows for
seamless integration with live video feeds, enabling users to receive
instant feedback during their workouts. Whether it's monitoring the
number of curls performed during a bicep curl exercise or tracking the
consistency of form throughout a set, the system provides timely
guidance to help users optimize their workouts and maximize results.
We obtain an accuracy of 93% with our model. The results obtained
from mediapipe are given below:

Fig 4.b: Single Prediction Results

CHAPTER 5

CONCLUSION

Spatiotemporal features play an essential role in recognizing various


actions in surveillance video data such as human action recognition. In
this article, we proposed a unique attention-based pipeline for human
action recognition, utilizing both the spatial and the temporal features
from a sequence of frames. For this purpose, we used a CNN network to
extract the high-level salient features from the video frames, and we then
used the skip connection approach to upgrade the learned features using
the UFLBs and a dilated CNN. Furthermore, these spatial features were
fed into the CLSTM network to learn the temporal information. An
attention layer is embedded to further determine the spatiotemporal
information in more detail, which enhances the performance at each step
of the LSTM. The center and the softmax loss functions are employed to
improve the classification performance of the human actions in the
videos. We conducted extensive experiments on three standard
benchmark datasets including the UCF50, the UCF Sports, and the J-
HMDB.

CHAPTER 6

REFERENCES

[1] N. Spolaôr, et al., A systematic review on content-based video


retrieval, Eng. Appl. Artif. Intell. 90 (2020) 103557.

[2] A. Keshavarzian, S. Sharifian, S. Seyedin, Modified deep residual


network architecture deployed on serverless framework of IoT
platform based on human activity recognition application, Future
Gener. Computer System 101 (2019) 14–28.

[3] A.D. Antar, M. Ahmed, M.A.R. Ahad, Challenges in sensor-based


human activity recognition and a comparative analysis of benchmark
datasets: A review, in: 2019 Joint 8th International Conference on
Informatics, Electronics & Vision (ICIEV) and 2019 3rd
International Conference on Imaging, Vision & Pattern Recognition,
IcIVPR, IEEE, 2019.

[4] K.A. da Costa, et al., Internet of things: A survey on machine


learning-based intrusion detection approaches, Comput. Netw. 151
(2019) 147–157.

[5] J.K. Aggarwal, M.S. Ryoo, Human activity analysis: A review,


ACM Comput. Surv. 43 (3) (2011)

[6] S. Kulkarni, S. Jadhav, D. Adhikari, A survey on human group


activity recognition by analyzing person action from video sequences
using machine learning techniques, Springer, 2020.
