
Department of Computer Science and Engineering
Global Campus, Jakkasandra Post, Kanakapura Taluk, Ramanagara District, Pin Code: 562 112

2022-2023

A Project Report on

“HUMAN ACTIVITY PREDICTION USING DEEP LEARNING”

Submitted in partial fulfilment for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted by

Guruprasad P D
19BTRCS024

Manesh Suhas S M
19BTRCS040

Hrithik Krishna
19BTRCS028

Under the guidance of
Dr. Vanitha K
Professor / Associate Professor / Assistant Professor
Department of Computer Science & Engineering

Faculty of Engineering & Technology


JAIN (DEEMED-TO-BE) UNIVERSITY
Department of Computer Science and Engineering
Global Campus, Jakkasandra Post, Kanakapura Taluk, Ramanagara District, Pin Code: 562 112

CERTIFICATE

This is to certify that the project work titled “Human Activity Prediction Using Deep Learning” is carried out by Guruprasad P D (19BTRCS024), Manesh Suhas S M (19BTRCS040) and Hrithik Krishna (19BTRCS028), bonafide students of Bachelor of Technology at the Faculty of Engineering & Technology, Jain (Deemed-to-be) University, Bangalore, in partial fulfilment for the award of the degree of Bachelor of Technology in Computer Science & Engineering, during the year 2022-2023.

Dr. Vanitha K
Assistant/Associate/Professor,
Dept. of CSE,
Faculty of Engineering & Technology,
Jain (Deemed-to-be) University
Date:

Dr. Mahesh T R
Program Head,
Dept. of CSE,
Faculty of Engineering & Technology,
Jain (Deemed-to-be) University
Date:

Dr. Geetha G
Director,
School of Computer Science & Engineering,
Jain (Deemed-to-be) University
Date:

Name of the Examiner Signature of Examiner

1.

2.
DECLARATION

We, Guruprasad P D (19BTRCS024), Manesh Suhas S M (19BTRCS040), Hrithik


Krishna (19BTRCS028), are students of eighth semester B.Tech in Computer Science &
Engineering, at Faculty of Engineering & Technology, Jain (Deemed-to-be) University,
hereby declare that the project titled “Human Activity Prediction Using Deep
Learning” has been carried out by us and submitted in partial fulfilment for the award of
degree in Bachelor of Technology in Computer Science & Engineering during the
academic year 2022-2023. Further, the matter presented in the project has not been
submitted previously by anybody for the award of any degree or any diploma to any other
University, to the best of our knowledge and faith.

Signature

Name 1: Guruprasad P D
USN 1: 19BTRCS024

Name 2: Manesh Suhas S M
USN 2: 19BTRCS040

Name 3: Hrithik Krishna
USN 3: 19BTRCS028

Place: Bangalore
ACKNOWLEDGEMENT

It is a great pleasure for us to acknowledge the assistance and support of a large


number of individuals who have been responsible for the successful completion of this project
work.
First, we take this opportunity to express our sincere gratitude to Faculty of
Engineering and Technology, Jain Deemed to be University for providing us with a great
opportunity to pursue our Bachelor’s Degree in this institution.
In particular, we would like to thank Dr. Geetha G, Director, School of Computer
Science and Engineering and Dr. Mahesh T R, Program Head, Department of Computer
Science and Engineering, Jain (Deemed-to-be University) for their constant encouragement
and expert advice.

We would like to thank our guide Dr. Vanitha K, Professor / Associate Professor / Assistant Professor, Department of Computer Science and Engineering, Jain (Deemed-to-be University), for sparing their valuable time to extend help in every step of our project work, which paved the way for the smooth progress and fruitful culmination of the project.

We would like to thank our Project Coordinators Dr. Chandrasekhar V and Dr.
Rajat and all the staff members of Computer Science and Engineering for their support.
We are also grateful to our family and friends who provided us with every
requirement throughout the course.
We would like to thank one and all who directly or indirectly helped us in
completing the Project work successfully.

Signature of Students
ABSTRACT

The problem of action prediction has recently gained traction due to its many possible real-life applications. Predicting future actions is important for computer vision systems, such as monitoring or autonomous systems, that must act promptly on the basis of very little input data. In this project we propose a method to predict actions from video sequences using an LSTM architecture that leverages features extracted from the video data to classify actions into a predefined set of classes. The system allows one to input a short video sequence and obtain the likely outcome of the observed activity. Several applications of this system are discussed in the report.



TABLE OF CONTENTS

Page No

Abstract

List of Figures i
List of Tables ii
Nomenclature used iii
Chapter 1
1.1 Introduction 11
1.2 Literature Survey 12
1.3 Existing System and Disadvantages 16
1.4 Proposed System and Advantages 17
1.5 Objectives and Limitations of the Current Work 20
1.6 Limitations of current work 21
Chapter 2
2.1 System Architecture 22
2.2 Methodology 24
2.3 Hardware and Software requirements 25
Chapter 3
3.1 HARDWARE AND SOFTWARE TOOL DESCRIPTION 26
Chapter 4
4.1 Hardware Design and Implementation 28
4.1.1 Use Case diagram 28
4.1.2 Data flow diagram 29
4.1.3 Sequence diagram 30


Chapter 5
5.1 TESTING
5.1.1 Testing the performance of the Haar cascade 35
5.1.2 Testing the performance of whole project 32

5.2 Results and Screenshots 56


5.2.1 Output Screenshots 58

CONCLUSIONS AND FUTURE SCOPE 60

References v
Appendices
Appendix- I vi
Appendix- II vii
Details of Paper Publication xiii
Information regarding students xiv


LIST OF FIGURES

Fig. No.    Description of the figure    Page No.
1.4 Flow Chart of the proposed system. 19
2.1 System Architecture of the proposed system. 22
4.1.1 Use Case Diagram 28
4.1.2 Data Flow Diagram 29
4.1.3 Sequence Diagram 30
4.2.2 Execution of LBPH 38


LIST OF TABLES

Table No. Description of the Table Page No.



NOMENCLATURE USED

LSTM    Long Short-Term Memory
CNN     Convolutional Neural Network




Chapter 1
1.1. Introduction
Human activity prediction is the problem of predicting the action of a person by observing only the first few frames of a video sample. The main goal of human activity prediction is to enable early recognition of activities instead of recognizing them only after completion. This early prediction can help in detecting unauthorized or suspicious activities, for example in prisons, so that measures can be taken beforehand. Human activity prediction can be applied in many real-life situations: in autonomous cars to decide which action to take and to prevent accidents, in prisons, in hospitals, to predict the behaviour of a driver and detect drowsiness, and in various other monitored areas. Although great progress has been made in the field of action recognition, activity prediction or anticipation is still being researched and has recently become a popular topic. The main difference between predicting and recognizing an action is that in the former the class must be predicted early, from just a few frames or a small part of the video sequence. In this project we propose a deep learning system that predicts an action upon seeing just a small clip of the video, so that it can anticipate actions that will occur in the future from real-time footage. We have used a Long Short-Term Memory (LSTM) model to carry out the required function.


1.2. Literature Survey


Almeida, Aitor, and Gorka Azkune [1] work on the detection of mild cognitive impairment using a multi-level model that describes the behaviour of users on the basis of actions and activities. They use the Long Short-Term Memory (LSTM) architecture to predict the actions of large populations for smarter and more intelligent cities. The model created by the authors was tested on an extensive activity recognition dataset and was 85.89% accurate in predicting the behaviour of people.

Singh, Pulkit, et al. [2], in "End-to-end deep prototype and exemplar models for predicting human behavior", aim to extend classic prototype and exemplar models of category learning so that they learn representations of the given data directly from raw input. For the prediction model they use the CIFAR-10 dataset, on which two CNN architectures, ResNet and All-CNN, are applied, and then evaluate the resulting model. Both models proposed in the paper perform better than the neural network baselines, with the main improvement seen for the DGMM and DEM.

Battleday, Ruairidh M., Joshua C. Peterson, and Thomas L. Griffiths [3], in "Capturing human categorization of natural images by combining deep networks and cognitive models", aim to model human categorization over a very large dataset containing over 500,000 data points for 10,000 natural images from ten different classes. These data representations are crucial for capturing the categorization and hence allow simpler models that represent certain categories to perform better than the more complex memory-based models that usually dominate studies not based on natural stimuli. They use the Mahalanobis distance to measure vector distance subject to certain constraints and to unify the different models. In the results, around 34% of the images used had a perfect consensus for a single category.

Lin, Kaixiang, et al. [4], in "Efficient large-scale fleet management via multi-agent deep reinforcement learning", consider the problem of managing a large number of available vehicles for ride-sharing platforms. The main objective is to maximize the total volume of merchandise of the given platform by repositioning the available vehicles to locations with larger demand-supply gaps than the ones they are currently at. The paper proposes a novel deep reinforcement learning method to learn an efficient fleet management policy that allocates all vehicles so as to maximize utilization based on demand and supply. As for results, the cA2C model achieves the highest performance with a smaller number of repositions, around 65.37%, when compared with the cA2C-v1 model. Additionally, CCE brings several useful advantages when the cost of repositioning is taken into account.

Cai, Haoye, et al. [5], in "Deep video generation, prediction and completion of human action sequences", propose a general two-stage deep framework for generating, predicting and completing videos of humans, using dedicated models to complete a given video with high quality. They treat video generation, prediction and completion as a single problem instead of addressing them as three separate problems, as was done before. The two stages of the model are a GAN that generates actions of a given category, followed by a supervised reconstruction network. A two-step model that generates motion sequences from random noise using the human skeletal structure is proposed: in the first step an improved WGAN is applied, and in the second step a normal GAN. The final model can generate believable human motion videos of very high quality and has the highest inception score amongst all competing models when evaluated on the Human3.6M dataset.

Carreira, Joao, and Andrew Zisserman [6], in "Quo vadis, action recognition? A new model and the Kinetics dataset", aim to see whether a network trained to classify actions on a larger dataset gives better performance when it is transferred to a different temporal dataset that has fewer data points. They introduce a new Two-Stream Inflated 3D ConvNet based on 2D ConvNet inflation. After the two streams are trained they give similar performance individually, but averaging both predictions improves the results from 74.6% to 80.2%.

Sadegh Aliakbarian, Mohammad, et al. [7], in "Encouraging LSTMs to anticipate actions very early", propose a method for anticipating or predicting an action that gives high prediction accuracy even if only a small part of the video sequence is available. The proposed model has a multi-stage LSTM architecture: the authors create an LSTM with several stages that takes into account different features and encourages the model to predict the class as fast as it can. The model performs better than state-of-the-art methods in early prediction, with accuracy higher by 22.0% on JHMDB-21, 49.9% on UCF-101 and 14.0% on UT-Interaction.

Muhammad Sajjad et al. [8] explore the concept of human behaviour analysis through facial recognition. The input data consists of video clips from famous English TV series. Understanding human behaviour can prove very helpful in many areas such as entertainment and healthcare. The main steps in the proposed method are detection and tracking of facial features, face registration and facial expression recognition. After using an algorithm to identify the faces in the video data, a Support Vector Machine (SVM) is used for recognizing them; the facial expression is then detected using the CNN model proposed in the paper. The KDEF dataset is used, which consists of around 4,000 different facial expressions. The proposed CNN model achieved an accuracy of 82%, and 94% after data augmentation. A subjective evaluation of the model was also carried out to examine the performance of the method.

Neziha Jaouedi et al. [9] explore the different applications of human activity recognition, including video surveillance, prisons and human-computer interaction. Given the increasing use of deep neural networks, the authors propose a technique using gated recurrent neural networks (gated RNNs), chosen for their high computational power, to classify sequential data and video frames.

The features of the dataset play an important role here, so the best features need to be selected using feature extraction. This technique is useful when there is a huge dataset, as it reduces noise without losing important data. Since feature selection impacts the performance of the deep learning model, sufficient time must be spent on selecting the best features from the dataset.

A GMM is used to track objects in each frame of a video sequence, and a Kalman filter is used to predict the location of an object; in other words, both methods are used to track the movement of objects. A gated RNN is then used for classifying the action. The approach is evaluated over a few popular datasets (UCF101, the UCF Sports dataset and the KTH human activity dataset), each of which contains a variety of activities, and can be used in different applications.

The authors implement the human activity recognition approach in four steps:

1. GMM and Kalman filter (KF) methods were first used to track the motion in the input video.

2. Next, the K-nearest neighbours algorithm was used along with the GMM and KF techniques for human activity classification; the achieved accuracy was around 72%.

3. Finally, a gated RNN was implemented for video data recognition; it achieved its highest accuracy, 96%, on the KTH dataset.

4. A test video was used to check whether human activity recognition works properly.


1.3. Existing System and Disadvantages

The existing system uses a deep neural network based on a recurrent neural network architecture. In this behaviour-modelling problem, the class label of an activity is predicted from the actions registered earlier by a sensor. The recurrent nature of the LSTM allows the problem to be modelled while taking sequential dependencies into account. As input, the system takes raw data from the sensors and maps it to previously defined actions using certain conditions and equivalences.

The identified actions are then passed to an embedding layer. This layer takes the IDs of the actions and transforms them into embeddings that also carry some semantic meaning. The layer is made trainable, i.e. it is able to learn incrementally during training, and its weights are initialized with values obtained using the Word2Vec algorithm. The action embeddings obtained in the first module are then processed by the sequence-modelling part of the algorithm. Finally, after the LSTM layer, the prediction module uses the sequence models built by the LSTMs to predict the observed action.
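A minimal Keras sketch of such an embedding-plus-LSTM sequence model is given here for illustration only; the vocabulary size, embedding dimension, sequence length and the Word2Vec weight matrix (w2v_weights) are hypothetical placeholders rather than values from the original system.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_ACTIONS = 50        # hypothetical number of distinct sensor-level actions
EMBEDDING_DIM = 100     # hypothetical embedding size

# Placeholder for embeddings learned beforehand with Word2Vec.
w2v_weights = np.random.rand(NUM_ACTIONS, EMBEDDING_DIM)

model = models.Sequential([
    # Trainable embedding layer initialized with the Word2Vec vectors.
    layers.Embedding(input_dim=NUM_ACTIONS, output_dim=EMBEDDING_DIM,
                     embeddings_initializer=tf.keras.initializers.Constant(w2v_weights),
                     trainable=True),
    # The LSTM layer models the sequential dependencies between actions.
    layers.LSTM(128),
    # The prediction module outputs a probability for each possible action.
    layers.Dense(NUM_ACTIONS, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])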


1.4. Proposed System and Advantages.


In the proposed system we take a video input and pass it through a multi-stage LSTM architecture. The first stage consists of a Convolutional Neural Network (CNN) model that extracts features related to the action. The features are both context-based and action-based, so that the prediction is made from both for the highest accuracy. The features are extracted because the early layers of the deep neural network are convolutional layers capable of extracting the base-level features that make up the input data. Following the CNN layers, the video is passed through the LSTM architecture to learn the temporal structure of these features. The model, having been trained on entire sequences of data and having learned the base features of the different action classes, is able to predict the future action in a video from the first few frames, the entire video, or real-time data. A sketch of this pipeline is given below.
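A minimal sketch of such a CNN-plus-LSTM pipeline in Keras follows; the layer sizes, the 20-frame sequence length and the four-class output are illustrative assumptions, not the exact configuration used in this project.

from tensorflow.keras import layers, models

SEQUENCE_LENGTH = 20                 # assumed number of frames fed to the model
IMAGE_HEIGHT, IMAGE_WIDTH = 64, 64
NUM_CLASSES = 4                      # assumed number of action classes

model = models.Sequential([
    # Convolutional layers applied to every frame extract base-level spatial features.
    layers.TimeDistributed(layers.Conv2D(16, (3, 3), padding='same', activation='relu'),
                           input_shape=(SEQUENCE_LENGTH, IMAGE_HEIGHT, IMAGE_WIDTH, 3)),
    layers.TimeDistributed(layers.MaxPooling2D((4, 4))),
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), padding='same', activation='relu')),
    layers.TimeDistributed(layers.MaxPooling2D((4, 4))),
    layers.TimeDistributed(layers.Flatten()),
    # The LSTM layer learns the temporal relation across the per-frame features.
    layers.LSTM(64),
    # The output layer gives a probability for each action class.
    layers.Dense(NUM_CLASSES, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

This mirrors the flow described above: per-frame convolutional feature extraction followed by sequence modelling with an LSTM.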

Advantages:

a. Accurate Predictions: Deep learning models have demonstrated impressive capabilities in


learning complex patterns and representations from large-scale datasets. They can capture
intricate relationships between input features and human activities, leading to accurate
predictions.

b. Non-intrusive Data Collection: Deep learning models can make predictions based on various
data sources, such as wearable devices, video recordings, or motion sensors. This allows for non-
intrusive data collection, reducing the need for invasive or discomforting measurement methods.

c. Real-time Predictions: Once trained, deep learning models can make predictions in real-time,
enabling immediate responses or interventions based on the predicted activities. This is
particularly beneficial in applications such as healthcare monitoring, where prompt actions can be
critical.

d. Adaptability and Generalization: Deep learning models can adapt to new activities or scenarios
without requiring significant changes to the underlying architecture. They have the potential to
generalize well across different individuals, environments, and variations in activity patterns.


e. Scalability: Deep learning models can handle large-scale datasets, making them suitable for
analyzing extensive collections of human activity data. This scalability enables the development
of robust models that can handle diverse activity prediction tasks.

f. Automation and Efficiency: Human activity prediction using deep learning can automate the
process of activity recognition, reducing the manual effort required for analyzing and labeling
large amounts of data. This automation can lead to increased efficiency and productivity in
various domains.


Fig 1.4: Flow chart of the proposed system.

1.5. Objectives
• To examine activities from the first few frames of video sequences and correctly classify the input data into its activity category before seeing the entire video.

• To predict activities from a set of many observations of subjects regarding their actions and the conditions of the environment.

• To reduce the time taken to predict an action compared to existing models.

• To use the results of the predictions for various applications.

• To effectively use deep learning models to predict actions in highly sensitive and well-monitored areas.

• To create an algorithm for action anticipation and prediction.

1.6. Limitations of Current work
Data Availability and Quality: Deep learning models require large amounts of labeled training data to
generalize well. Acquiring such datasets, especially for specific and diverse human activities, can be
challenging and time-consuming. Additionally, the quality of the data, including noise, missing values, or
biased annotations, can affect the performance and generalization ability of the models.

Interpretability and Explainability: Deep learning models are often considered as black boxes due to
their complex architectures and the inability to provide detailed explanations for their predictions. This
lack of interpretability can limit the understanding of why a particular prediction was made, which is
crucial in applications where human interpretability and trust are required, such as healthcare or legal
domains.

Overfitting and Generalization: Deep learning models can be prone to overfitting, where they
memorize the training data but fail to generalize well to unseen data. This is particularly challenging
when dealing with human activities, as there can be significant variations and individual differences in
how activities are performed. Ensuring the generalization ability of the models beyond the training data
is a persistent challenge.
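As one common (though not project-specific) way of addressing this, dropout layers and early stopping can be combined; the sketch below assumes a generic Keras model over hypothetical frame-feature sequences and hypothetical training arrays X_train and y_train.

from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

# Dropout randomly disables a fraction of units so the network
# cannot simply memorize the training sequences.
model = models.Sequential([
    layers.LSTM(64, input_shape=(20, 1024)),   # hypothetical sequences of frame features
    layers.Dropout(0.5),
    layers.Dense(4, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Early stopping halts training once the validation loss stops improving
# and restores the best weights seen so far.
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stopping])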

Variability and Complexity: Human activities can exhibit high variability and complexity, making it difficult to capture all the possible variations in a single model. Different individuals may perform the same activity differently, and environmental factors can also influence activity patterns. Designing deep learning models that can effectively handle such variability and complexity is a non-trivial task.

Chapter 2
2.1 System Architecture

Fig 2.1: System architecture of the proposed model.

In the above system architecture, each frame of input video is passed through the
convolutional layers of the model and then they are passed through the LSTM layers and
the output layer where the video frame will be processed to predict the final activity.

MODULE DESIGN (DFD WITH LEVELS)

0 - level DFD

DFD Level 0 is also called a Context Diagram. It's a basic overview of the whole system
or process being analyzed or modeled. It shows the system as a single high-level process,
with its relationship to external entities.


1- LEVEL DFD

In the 1-level DFD, the context diagram is decomposed into multiple bubbles/processes. At this level, we highlight the main functions of the system and break down the high-level process of the 0-level DFD into subprocesses.

2-LEVEL DFD

The 2-level DFD goes one step deeper into parts of the 1-level DFD.


2.2 Methodology

 In the proposed system we take a video input and pass it through a multi-stage LSTM architecture. The first stage consists of a Convolutional Neural Network model that extracts features related to the action.

 The features are both context-based and action-based, so that the prediction is made from both for the highest accuracy. The features are extracted because the layers of the deep neural network are convolutional layers capable of extracting the base-level features that make up the input data.

 Following the CNN layers, the video is passed through the LSTM architecture to learn the features. The model, having been trained on entire sequences of data and having learned the base features of the different action classes, is able to predict the future action in a video from the first few frames, the entire video, or real-time data, as shown in the inference sketch below.
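A minimal sketch of how a trained model of this kind could be queried with only the first few frames of a clip is given below; the frame count, image size, class list and the trained_model argument are assumptions made for illustration, and the clip is assumed to contain at least SEQUENCE_LENGTH frames.

import cv2
import numpy as np

SEQUENCE_LENGTH = 20
IMAGE_HEIGHT, IMAGE_WIDTH = 64, 64
CLASSES_LIST = ["WalkingWithDog", "TaiChi", "Swing", "HorseRace"]   # assumed classes

def predict_from_clip(video_path, trained_model):
    # Read only the first SEQUENCE_LENGTH frames of the clip.
    frames = []
    video_reader = cv2.VideoCapture(video_path)
    while len(frames) < SEQUENCE_LENGTH:
        success, frame = video_reader.read()
        if not success:
            break
        # Resize and normalize each frame exactly as during training.
        frames.append(cv2.resize(frame, (IMAGE_HEIGHT, IMAGE_WIDTH)) / 255.0)
    video_reader.release()
    # Add a batch dimension and ask the model for class probabilities.
    probabilities = trained_model.predict(np.expand_dims(frames, axis=0))[0]
    return CLASSES_LIST[int(np.argmax(probabilities))]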


2.3 Hardware and Software requirements


The following are the basic hardware and software requirements for development
and deployment of the project.

Hardware Requirements:

As the application is an internet-based one, any hardware needed to connect to the internet (for example a modem, WAN/LAN, or an Ethernet cross-cable) will act as a hardware interface for the system.

A browser which supports CGI, HTML and JavaScript.

Quad-core Intel Core i7 (Skylake) or higher.

16 GB of RAM (8 GB is workable, but a higher performance rate cannot be achieved).

Software Requirements:

Since this is software, it has to run on suitable hardware and an operating system; the requirements to run it are listed below:

1. A Central Processing Unit (CPU): an Intel Core i5 6th Generation processor or higher, or an equivalent AMD processor.
2. Operating system: Ubuntu or Microsoft Windows 10.
3. GPU software:
   i. CUDA Deep Neural Network library (cuDNN)
   ii. NVIDIA CUDA 10.1
   iii. NVIDIA GPU driver
4. Deep learning software such as Neural Designer, Keras, DeepLearningKit, H2O.ai, Microsoft Cognitive Toolkit, MXNet, et cetera could be used.

Chapter 3
3.1. Hardware and Software Tool Description
Central Processing Unit (CPU):
A powerful CPU is essential for data preprocessing, model training, and evaluation. CPUs with
multiple cores and high clock speeds can significantly speed up these computationally intensive
tasks. Popular CPUs for deep learning include those from Intel (e.g., Core i7 or Xeon
processors) and AMD (e.g., Ryzen processors).

Graphics Processing Unit (GPU):


GPUs are crucial for accelerating the training and inference processes of LSTM models. Their
parallel computing capabilities enable efficient matrix operations, which are fundamental to
deep learning. GPUs from NVIDIA, such as the GeForce or Tesla series, are commonly used
due to their extensive support for deep learning frameworks.

Tensor Processing Unit (TPU):


TPUs are specialized hardware accelerators developed by Google specifically for deep learning
workloads. They provide high-speed, low-latency computation for neural network operations.
TPUs, such as those available on Google Cloud, are particularly advantageous when working
with large-scale datasets or training complex LSTM models.

Memory:
Adequate RAM (Random Access Memory) is crucial for storing and manipulating large volumes
of data during preprocessing and model training. The required memory capacity depends on the
size of the dataset and the complexity of the LSTM models. Higher RAM capacity allows for
faster processing and larger batch sizes during training.

Storage:
Deep learning projects often involve working with large datasets, so having ample storage capacity
is important. High-speed storage options, such as solid-state drives (SSDs), are beneficial for fast
data access and training speed. Additionally, network-attached storage (NAS) or cloud storage
solutions can be utilized for efficient data storage and retrieval.

Software Tools:

Deep Learning Frameworks:


Deep learning frameworks provide the necessary tools and libraries for building, training, and
deploying LSTM models. Popular frameworks such as TensorFlow, PyTorch, Keras, and Caffe
offer comprehensive support for LSTM networks. These frameworks provide pre-built LSTM
layers, optimizers, loss functions, and utilities that simplify the implementation of LSTM models.

Python:
Python is a widely used programming language for deep learning due to its simplicity, versatility,
and extensive libraries for scientific computing. Python, along with its associated packages like
NumPy, Pandas, and Scikit-learn, provides a rich ecosystem for data preprocessing, feature
extraction, and model evaluation. It also offers seamless integration with deep learning
frameworks and enables efficient prototyping and experimentation.


CUDA:
CUDA (Compute Unified Device Architecture) is a parallel computing platform developed by
NVIDIA. It allows developers to harness the computational power of NVIDIA GPUs for deep
learning tasks. Deep learning frameworks often provide CUDA integration, enabling efficient
GPU acceleration during model training. CUDA significantly speeds up LSTM computations and
improves training time.
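As a quick sanity check that CUDA-enabled GPU acceleration is actually available to TensorFlow, the following snippet can be run; it only lists the visible devices and makes no assumption about the specific GPU installed.

import tensorflow as tf

# Lists the GPUs visible to TensorFlow; an empty list means
# training will silently fall back to the CPU.
print("GPUs available:", tf.config.list_physical_devices('GPU'))
print("Built with CUDA:", tf.test.is_built_with_cuda())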

Jupyter Notebooks:
Jupyter Notebooks provide an interactive development environment for data exploration, model
prototyping, and experimentation. They allow for code execution, visualizations, and
documentation in a single interface. Jupyter Notebooks facilitate an iterative development process,
making it convenient to experiment with LSTM models, visualize results, and share code and
insights with others.

Data Visualization Libraries:


Visualizing data and model results is crucial for understanding and interpreting LSTM predictions. Libraries such as Matplotlib, Seaborn, and Plotly can be used to plot training curves, class distributions and prediction outputs, as in the sketch below.
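For example, a minimal Matplotlib sketch that plots the training and validation loss stored in a Keras History object (here a hypothetical history returned by model.fit):

import matplotlib.pyplot as plt

def plot_training_curves(history):
    # history.history holds the per-epoch metrics recorded by model.fit().
    plt.plot(history.history['loss'], label='training loss')
    plt.plot(history.history['val_loss'], label='validation loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()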


Chapter 4
4.1 Hardware Design and Implementation
4.1.1 Use Case diagram

Fig 4.1.1: Use Case diagram.


4.1.2 Data flow diagram

Fig 4.1.2: Dataflow diagram.

An activity diagram describes the behaviour of the system, i.e. the control flow from start to finish. Here the control begins with raw data being input to the system, which is data obtained from the dataset considered for our purpose of human activity prediction. This data showing human actions is then pre-processed to bring it into a standard form before being passed to the system, after which it is split into training and testing data, used respectively for training the model and testing its output.

The training data is fed to the LSTM model so that it learns the features of the data. After it has been trained, the LSTM model is evaluated and corrected using the testing data. Once the model has been finalized it is deployed and can then be used by users to make predictions on data input to it in video form, following the training, evaluation and deployment flow sketched below.
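A minimal Keras sketch of this training, evaluation and deployment flow is given here; it assumes the feature and label arrays produced by the preprocessing step (features_train, labels_train, features_test, labels_test, as in Appendix II) and an already constructed model object.

# Train the model on the training split.
history = model.fit(features_train, labels_train,
                    epochs=50, batch_size=4,
                    shuffle=True, validation_split=0.2)

# Evaluate the trained model on the held-out test split.
test_loss, test_accuracy = model.evaluate(features_test, labels_test)
print(f"Test accuracy: {test_accuracy:.3f}")

# Persist the finalized model so it can be deployed and reused for predictions.
model.save('human_activity_prediction_model.h5')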


4.1.3 Sequence diagram

Fig 4.1.3: Sequence diagram.


For the class diagram we have taken four main classes. The database holds the input to the system, which in our case is video input consisting of several image frames per video sequence, along with the number of videos passed to the model as training and testing data.

The system represents the deep learning system created for this project. It takes in the data passed from the database and splits it into training and testing data, which is passed to the LSTM model; the model has convolutional neural network layers followed by further neural network layers that finally make a prediction that is returned to the system. The system then shows this prediction, the class that a data point is a part of, as output.

This is the system built for predicting the human activity class that a particular action is expected to belong to.


Chapter 5
5.1 Testing

◦ Testing techniques are the best practices used by the testing team to assess the developed software with regard to the given requirements; that is why we have employed testing in our software as well.
◦ We have used unit testing methods to test the validity of our developed software.
◦ Unit testing is one of the many stages of software testing and looks at single units, otherwise known as components, individually. This validates that each component of the software being tested works as it is designed to.
◦ Unit testing is done during the coding phase, while the software or other product is being developed, to make sure it is clear of bugs and ready before its release.
◦ By using unit testing in our project we could see the working and failing units and hence were able to rectify the problems in the source code until the unit tests gave positive results, thus rendering our code bug free and/or error free.

Given below is the function for testing the compilation of the model:
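A minimal sketch of such a compilation test, written with Python's unittest module and assuming a hypothetical create_model() helper that builds and compiles the network, is shown here.

import unittest

class TestModelCompilation(unittest.TestCase):
    def test_model_compiles(self):
        # create_model() is a hypothetical helper that builds and compiles the network.
        model = create_model()
        # A compiled Keras model exposes its optimizer and loss function.
        self.assertIsNotNone(model.optimizer)
        self.assertIsNotNone(model.loss)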


Function for testing feature scaling:
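A comparable sketch for the feature-scaling test, assuming frames are normalized to the range [0, 1] as in the preprocessing code of Appendix II and using a hypothetical sample clip:

import unittest
import numpy as np

class TestFeatureScaling(unittest.TestCase):
    def test_frames_are_normalized(self):
        # frames_extraction() is the preprocessing function listed in Appendix II.
        frames = np.asarray(frames_extraction('sample_video.avi'))   # hypothetical clip
        # Every pixel value should lie in [0, 1] after dividing by 255.
        self.assertGreaterEqual(frames.min(), 0.0)
        self.assertLessEqual(frames.max(), 1.0)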

Function for testing model creation
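And a sketch for the model-creation test, again assuming the hypothetical create_model() helper together with the sequence length and class count used in the earlier sketches:

import unittest

class TestModelCreation(unittest.TestCase):
    def test_model_structure(self):
        model = create_model()                         # hypothetical helper
        # The model should accept a sequence of frames and output one score per class.
        self.assertEqual(model.input_shape[1], 20)     # assumed SEQUENCE_LENGTH
        self.assertEqual(model.output_shape[-1], 4)    # assumed number of classes
        self.assertGreater(len(model.layers), 0)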



5.1.2 Testing the performance of the whole project


Visualizing the video data:
For visualization, we have randomly selected 20 classes from the dataset, and we visualize the first frame of a video selected randomly from each class to see what format the data is in and what the possible actions in the videos are.
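A condensed sketch of this visualization step is shown below (the full version appears in Appendix II); the local dataset path is an assumption.

import os, random
import cv2
import matplotlib.pyplot as plt

DATASET_DIR = 'UCF50'   # assumed local path to the dataset

# Pick a random class and a random video from that class.
class_name = random.choice(os.listdir(DATASET_DIR))
video_name = random.choice(os.listdir(os.path.join(DATASET_DIR, class_name)))

# Read only the first frame of the selected video.
video_reader = cv2.VideoCapture(os.path.join(DATASET_DIR, class_name, video_name))
_, bgr_frame = video_reader.read()
video_reader.release()

# Convert BGR to RGB, label the frame with its class and display it.
rgb_frame = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
plt.imshow(rgb_frame); plt.title(class_name); plt.axis('off'); plt.show()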


5.2 Results and Discussion


Multifaceted surveillance cameras, also known as multi sensor cameras, are cameras that
use multiple lenses or sensors to capture and monitor a wider area than traditional
cameras. These cameras typically have a 180-degree or 360-degree field of view and can
monitor a larger area with fewer cameras than traditional surveillance systems.

Multifaceted surveillance cameras are becoming increasingly popular in security


applications because they offer several advantages over traditional cameras. For example,
they can provide more comprehensive coverage with fewer blind spots, which can help
improve overall security. They are also more cost-effective, as they require fewer
cameras and less installation and maintenance.

However, multifaceted surveillance cameras also raise concerns around privacy and data
protection. The cameras capture a lot of data, and it can be challenging to control who
has access to that data and how it is used. As a result, it is important to carefully consider
the use of multifaceted surveillance cameras and implement appropriate safeguards to
protect individuals' privacy.

Overall, multifaceted surveillance cameras can be a useful tool in security applications,


but it is important to weigh the benefits against the potential privacy concerns and
implement appropriate safeguards.

The Multifaceted Surveillance Camera system is designed using wireless technology. It is basically designed to provide security in different areas such as military sites, banks and industries, and also to save the power and memory required for surveillance and for surveillance-camera footage respectively. The library plays an important role in delivering knowledge and information to users whenever needed. Library resources therefore need to be secured to conserve and preserve the integrity, availability and confidentiality of information. Although traditional methods have brought some protection to the resources, with the boom in technology it has become hard for professionals to maintain the integrity of any library. Nevertheless, the deployment of ESS such as surveillance cameras, biometrics, RFID systems and different


types of alarms has managed to bring down malpractices against library resources as well
as safeguarded the library staff, patrons and the library collections at stake. It is to be
considered that these security issues will not disappear overnight or forever, but the
implementation of ESS has reduced the impacts and possible security threats or
vulnerabilities to the acceptable limits.


5.2.1 Output Screenshots

Fig 5.2.1: Starting the execution of the project.

Conclusion and Future scope

In our project, we have presented an approach used for human activity prediction using
deep learning algorithms. We collect raw video samples and observe only a part of each
video clip and use deep learning models to predict the possible action. The main aim of
this project was to enable early recognition of activities instead of detecting it after
activity completion. We have used the LSTM model for human activity prediction. It
considers different features extracted to make the prediction and can be used for several
different applications in computer vision. This early detection of activities can be useful
for several different applications such as autonomous vehicles, medical care, surveillance
systems and smart homes. Another objective was to find the right model to implement
this approach when compared with all the state-of-the-art models.
1. There is a growing need in elderly care (both physical and mental); future applications of human activity prediction ought to help prevent harm, e.g. by detecting older people's risky situations. A smartphone-based architecture can be developed for users' fall detection. Activity prediction and recognition sensors could assist elders in a proactive way, including lifestyle routine reminders (e.g. taking medication) and living-activity tracking for remote robotic assistance.
2. Children's care is another area that could benefit from activity prediction studies and future improvement. Its applications could include tracking infants' napping status and predicting their needs for food or other necessities.
3. Activity prediction techniques can also be utilized in the detection of autism spectrum disorder (ASD) in children.

REFERENCES
[1] Almeida, Aitor, and Gorka Azkune. "Predicting human behaviour with recurrent neural networks." Applied Sciences 8.2 (2018): 305.
[2] Singh, Pulkit, et al. "End-to-end deep prototype and exemplar models for predicting human behavior." arXiv preprint arXiv:2007.08723 (2020).
[3] Battleday, Ruairidh M., Joshua C. Peterson, and Thomas L. Griffiths. "Capturing human categorization of natural images by combining deep networks and cognitive models." Nature Communications 11.1 (2020): 1-14.
[4] Lin, Kaixiang, et al. "Efficient large-scale fleet management via multi-agent deep reinforcement learning." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.
[5] Cai, Haoye, et al. "Deep video generation, prediction and completion of human action sequences." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[6] Carreira, Joao, and Andrew Zisserman. "Quo vadis, action recognition? A new model and the Kinetics dataset." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[7] Sadegh Aliakbarian, Mohammad, et al. "Encouraging LSTMs to anticipate actions very early." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[8] Sajjad, Muhammad, et al. "Human behavior understanding in big multimedia data using CNN based facial expression recognition." Mobile Networks and Applications 25.4 (2020): 1611-1621.
[9] Jaouedi, Neziha, Noureddine Boujnah, and Med Salim Bouhlel. "A new hybrid deep learning model for human action recognition." Journal of King Saud University - Computer and Information Sciences 32.4 (2020): 447-453.
[10] Ronao, Charissa Ann, and Sung-Bae Cho. "Human activity recognition with smartphone sensors using deep learning neural networks." Expert Systems with Applications 59 (2016): 235-244.
[11] Zhou, Xiaokang, et al. "Deep-learning-enhanced human activity recognition for Internet of healthcare things." IEEE Internet of Things Journal 7.7 (2020): 6429-6438.
[12] Chen, Kaixuan, et al. "Deep learning for sensor-based human activity recognition: Overview, challenges, and opportunities." ACM Computing Surveys (CSUR) 54.4 (2021): 1-40.
[13] Luvizon, Diogo, David Picard, and Hedi Tabia. "Multi-task deep learning for real-time 3D human pose estimation and action recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence (2020).
[14] Liciotti, Daniele, et al. "A sequential deep learning application for recognising human activities in smart homes." Neurocomputing 396 (2020): 501-513.
[15] Nweke, Henry Friday, Ying Wah Teh, et al. "Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: State of the art and research challenges."
[16] Yang, Jian Bo, Minh Nhut Nguyen, Phyo Phyo San, Xiao Li Li, and Shonali Krishnaswamy. "Deep convolutional neural networks on multichannel time series for human activity recognition."
[17] Singh, Deepika, Erinc Merdivan, Ismini Psychoula, Johannes Kropf, Sten Hanke, Matthieu Geist, and Andreas Holzinger. "Human activity recognition using recurrent neural networks."
[18] Zeng, Ming, Tong Yu, Xiao Wang, Le T. Nguyen, Ole J. Mengshoel, and Ian Lane. "Semi-supervised convolutional neural networks for human activity recognition."
[19] Gowda, Shreyank N. "Human activity recognition using combinatorial Deep Belief Networks."
[20] Martinez, Julieta, Michael J. Black, and Javier Romero. "On human motion prediction using recurrent neural networks."

APPENDIX - I

PHOTOGRAPHS (WITH STUDENT AND GUIDE INFORMATION)

Team member Photo

Guru prasad P D

USN: 19BTRCS024

Email: guruprasad1pounarkar@gmail.com

Dept. of Computer Science and Engineering

Manesh Suhas S M

USN: 19BTRCS040

Email: manishsuhas098@gmail.com

Dept. of Computer Science and Engineering

Hrithik Krishna

USN: 19BTRCS028

Email: 19btrcs028@jainuniversity.ac.in

Dept. of Computer Science and Engineering

Dr. Vanitha K

Guide

Email: k.vanitha@jainuniversity.ac.in

Dept. of Computer Science and Engineering

APPENDIX - II
SOURCE CODE

We will start by installing and importing the required libraries.


"""

# Discard the output of this cell.
# %%capture

# Install the required libraries.


!pip install tensorflow opencv-contrib-python youtube-dl moviepy pydot
!pip install git+https://github.com/TahaAnwar/pafy.git#egg=pafy
!pip install pytube

# Commented out IPython magic to ensure Python compatibility.
# Import the required libraries.
import os
import cv2
import pafy
import math
import random
import numpy as np
import datetime as dt
import tensorflow as tf
from collections import deque
import matplotlib.pyplot as plt
from pytube import YouTube
from moviepy.editor import *
# %matplotlib inline

from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import *
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.utils import plot_model

"""And will set `Numpy`, `Python`, and `Tensorflow` seeds to get consistent results on every execution."""

seed_constant = 27
np.random.seed(seed_constant)
random.seed(seed_constant)
tf.random.set_seed(seed_constant)

"""## *<font style="color:rgb(134,19,348)">Step 1: Visualize the Data with its Labels</font>*

In the first step, we will visualize the data along with labels to get an idea about what we will be dealing with. We will be using the
[UCF50 - Action Recognition Dataset](https://www.crcv.ucf.edu/data/UCF50.php), consisting of realistic videos taken from youtube
which differentiates this data set from most of the other available action recognition data sets as they are not realistic and are staged
by actors. The Dataset contains:

* *`50`* Action Categories

* *`25`* Groups of Videos per Action Category


* *`133`* Average Videos per Action Category

* *`199`* Average Number of Frames per Video

* *`320`* Average Frames Width per Video

* *`240`* Average Frames Height per Video

* *`26`* Average Frames Per Seconds per Video

For visualization, we will pick `20` random categories from the dataset and a random video from each selected category and will
visualize the first frame of the selected videos with their associated labels written. This way we’ll be able to visualize a subset ( `20`
random videos ) of the dataset.
"""

from google.colab import drive
drive.mount('/content/drive')

"""For Visualization, we wil pick 20 random categories from the Dataset and a random video from each selected category and
will visualize the first frame of the selected videos with their associated labels written. This way we'll be able to visualize a subset
(20 random videos) of the dataset.

"""

# Create a Matplotlib figure and specify the size of the figure.
plt.figure(figsize = (20, 20))

# Get the names of all classes/categories in UCF50.
all_classes_names = os.listdir('/content/drive/MyDrive/Colab Notebooks/UCF50')

# Generate a list of 20 random values. The values will be between 0-50,
# where 50 is the total number of classes in the dataset.
random_range = random.sample(range(len(all_classes_names)), 20)

# Iterating through all the generated random values.
for counter, random_index in enumerate(random_range, 1):

    # Retrieve a Class Name using the Random Index.
    selected_class_Name = all_classes_names[random_index]

    # Retrieve the list of all the video files present in the randomly selected Class Directory.
    video_files_names_list = os.listdir(f'/content/drive/MyDrive/Colab Notebooks/UCF50/{selected_class_Name}')

    # Randomly select a video file from the list retrieved from the randomly selected Class Directory.
    selected_video_file_name = random.choice(video_files_names_list)

    # Initialize a VideoCapture object to read from the video File.
    video_reader = cv2.VideoCapture(f'/content/drive/MyDrive/Colab Notebooks/UCF50/{selected_class_Name}/{selected_video_file_name}')

    # Read the first frame of the video file.
    _, bgr_frame = video_reader.read()

    # Release the VideoCapture object.
    video_reader.release()

    # Convert the frame from BGR into RGB format.
    rgb_frame = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)

    # Write the class name on the video frame.
    cv2.putText(rgb_frame, selected_class_Name, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

    # Display the frame.
    plt.subplot(5, 4, counter);plt.imshow(rgb_frame);plt.axis('off')

"""## *<font style="color:rgb(134,19,348)">Step 2: Preprocess the Dataset</font>*

Next, we will perform some preprocessing on the dataset. First, we will read the video files from the dataset and resize the frames of
the videos to a fixed width and height, to reduce the computations and normalized the data to range `[0-1]` by dividing the pixel
values with `255`, which makes convergence faster while training the network.

But first, let's initialize some constants.
"""

# Specify the height and width to which each video frame will be resized in our dataset.
IMAGE_HEIGHT , IMAGE_WIDTH = 64, 64

# Specify the number of frames of a video that will be fed to the model as one sequence.
SEQUENCE_LENGTH = 20

# Specify the directory containing the UCF50 dataset.
DATASET_DIR = "/content/drive/MyDrive/Colab Notebooks/UCF50"

# Specify the list containing the names of the classes used for training. Feel free to choose any set of classes.
CLASSES_LIST = ["WalkingWithDog", "TaiChi", "Swing", "HorseRace"]

"""*Note:* The *`IMAGE_HEIGHT`*, *`IMAGE_WIDTH`* and *`SEQUENCE_LENGTH`* constants can be increased for better
results, although increasing the sequence length is only effective to a certain point, and increasing the values will result in the process
being more computationally expensive.

### *<font style="color:rgb(134,19,348)">Create a Function to Extract, Resize & Normalize Frames</font>*

We will create a function *`frames_extraction()`* that will create a list containing the resized and normalized frames of a video whose
path is passed to it as an argument. The function will read the video file frame by frame, although not all frames are added to the list as
we will only need an evenly distributed sequence length of frames.
"""

def frames_extraction(video_path):
    '''
    This function will extract the required frames from a video after resizing and normalizing them.
    Args:
        video_path: The path of the video in the disk, whose frames are to be extracted.
    Returns:
        frames_list: A list containing the resized and normalized frames of the video.
    '''

    # Declare a list to store video frames.
    frames_list = []

    # Read the Video File using the VideoCapture object.
    video_reader = cv2.VideoCapture(video_path)

    # Get the total number of frames in the video.
    video_frames_count = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))

    # Calculate the interval after which frames will be added to the list.
    skip_frames_window = max(int(video_frames_count/SEQUENCE_LENGTH), 1)

    # Iterate through the Video Frames.
    for frame_counter in range(SEQUENCE_LENGTH):

        # Set the current frame position of the video so that the
        # selected frames are evenly distributed over the clip.
        video_reader.set(cv2.CAP_PROP_POS_FRAMES, frame_counter * skip_frames_window)

        # Reading the frame from the video.
        success, frame = video_reader.read()

        # Check if Video frame is not successfully read then break the loop
        if not success:
            break

        # Resize the Frame to fixed height and width.
        resized_frame = cv2.resize(frame, (IMAGE_HEIGHT, IMAGE_WIDTH))

        # Normalize the resized frame by dividing it with 255 so that each pixel value then lies between 0 and 1
        normalized_frame = resized_frame / 255

        # Append the normalized frame into the frames list
        frames_list.append(normalized_frame)

    # Release the VideoCapture object.
    video_reader.release()

    # Return the frames list.
    return frames_list

"""### *<font style="color:rgb(134,19,348)">Create a Function for Dataset Creation</font>*

Now we will create a function *`create_dataset()`* that will iterate through all the classes specified in the
*`CLASSES_LIST`* constant and will call the function *`frame_extraction()`* on every video file of the selected classes and
return the frames (*`features`), class index ( *`labels`*), and video file path (`video_files_paths`*).
"""

def create_dataset():
    '''
    This function will extract the data of the selected classes and create the required dataset.
    Returns:
        features: A list containing the extracted frames of the videos.
        labels: A list containing the indexes of the classes associated with the videos.
        video_files_paths: A list containing the paths of the videos in the disk.
    '''

    # Declared Empty Lists to store the features, labels and video file path values.
    features = []
    labels = []
    video_files_paths = []

    # Iterating through all the classes mentioned in the classes list
    for class_index, class_name in enumerate(CLASSES_LIST):

        # Display the name of the class whose data is being extracted.
        print(f'Extracting Data of Class: {class_name}')

        # Get the list of video files present in the specific class name directory.
        files_list = os.listdir(os.path.join(DATASET_DIR, class_name))

        # Iterate through all the files present in the files list.
        for file_name in files_list:

            # Get the complete video path.
            video_file_path = os.path.join(DATASET_DIR, class_name, file_name)

            # Extract the frames of the video file.
            frames = frames_extraction(video_file_path)

            # Check if the extracted frames are equal to the SEQUENCE_LENGTH specified above.
            # So ignore the videos having frames less than the SEQUENCE_LENGTH.
            if len(frames) == SEQUENCE_LENGTH:

                # Append the data to their respective lists.
                features.append(frames)
                labels.append(class_index)
                video_files_paths.append(video_file_path)

    # Converting the lists to numpy arrays
    features = np.asarray(features)
    labels = np.array(labels)

    # Return the frames, class index, and video file path.
    return features, labels, video_files_paths

"""Now we will utilize the function *`create_dataset()`* created above to extract the data of the selected classes and create
the required dataset."""

# Create the dataset.


features, labels, video_files_paths = create_dataset()

"""Now we will convert `labels` (class indexes) into one-hot encoded vectors."""

# Using Keras's to_categorical method to convert labels into one-hot-encoded vectors.
one_hot_encoded_labels = to_categorical(labels)
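
"""A small illustration (not part of the pipeline) of what *`to_categorical`* does: each integer class index becomes a one-hot row
whose length equals the number of distinct classes."""

# Toy example: four samples drawn from three hypothetical classes with indexes 0, 1 and 2.
print(to_categorical([0, 1, 2, 1]))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]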

"""## *<font style="color:rgb(134,19,348)">Step 3: Split the Data into Train and Test Set</font>*

As of now, we have the required *`features`* (a NumPy array containing all the extracted frames of the videos) and
*`one_hot_encoded_labels`* (also a Numpy array containing all class labels in one hot encoded format). So now, we will split our
data to create training and testing sets. We will also shuffle the dataset before the split to avoid any bias and get splits representing
the overall distribution of the data.
"""

# Split the Data into Train ( 75% ) and Test Set ( 25% ).
features_train, features_test, labels_train, labels_test = train_test_split(features, one_hot_encoded_labels,
                                                                             test_size = 0.25, shuffle = True,
                                                                             random_state = seed_constant)

"""## *<font style="color:rgb(134,19,348)">Step 4: Implement the ConvLSTM Approach</font>*

In this step, we will implement the first approach by using a combination of ConvLSTM cells. A ConvLSTM cell is a variant of an
LSTM network that contains convolution operations in the network. It is an LSTM with convolution embedded in the
architecture, which makes it capable of identifying spatial features of the data while taking the temporal relation into account.

<center>
<img src="https://drive.google.com/uc?export=view&id=1KHN_JFWJoJi1xQj_bRdxy2QgevGOH1qP" width= 500px>
</center>

For video classification, this approach effectively captures the spatial relation in the individual frames and the temporal relation
across the different frames. As a result of this convolution structure, the ConvLSTM is capable of taking in 3-dimensional input
`(width, height, num_of_channels)`, whereas a simple LSTM only takes in 1-dimensional input; hence an LSTM on its own is
unsuitable for modeling spatio-temporal data. A short shape sketch illustrating this difference is given right after this cell.

You can read the paper [Convolutional LSTM Network: A Machine Learning Approach for Precipitation
Nowcasting](https://arxiv.org/abs/1506.04214v1) by Xingjian Shi et al. (NIPS 2015), to learn more about this architecture.

### *<font style="color:rgb(134,19,348)">Step 4.1: Construct the Model</font>*

To construct the model, we will use Keras [*`ConvLSTM2D`*](https://keras.io/api/layers/recurrent_layers/conv_lstm2d) recurrent
layers. The *`ConvLSTM2D`* layer also takes in the number of filters and kernel size required for applying the convolutional
operations. The output of the layers is flattened in the end and is fed to the *`Dense`* layer with softmax activation, which outputs
the probability of each action category.

We will also use *`MaxPooling3D`* layers to reduce the dimensions of the frames and avoid unnecessary computations and
*`Dropout`* layers to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting) the model on the data. The architecture is a
simple one and has a small number of trainable parameters. This is because we are only dealing with a small subset of the dataset
which does not require a large-scale model.
"""

def create_convlstm_model():
    '''
    This function will construct the required convlstm model.
    Returns:
        model: It is the required constructed convlstm model.
    '''

    # We will use a Sequential model for model construction.
    model = Sequential()

    # Define the Model Architecture.
    ########################################################################################################################

    model.add(ConvLSTM2D(filters = 4, kernel_size = (3, 3), activation = 'tanh', data_format = "channels_last",
                         recurrent_dropout = 0.2, return_sequences = True,
                         input_shape = (SEQUENCE_LENGTH, IMAGE_HEIGHT, IMAGE_WIDTH, 3)))

    model.add(MaxPooling3D(pool_size = (1, 2, 2), padding = 'same', data_format = 'channels_last'))
    model.add(TimeDistributed(Dropout(0.2)))

    model.add(ConvLSTM2D(filters = 8, kernel_size = (3, 3), activation = 'tanh', data_format = "channels_last",
                         recurrent_dropout = 0.2, return_sequences = True))

    model.add(MaxPooling3D(pool_size = (1, 2, 2), padding = 'same', data_format = 'channels_last'))
    model.add(TimeDistributed(Dropout(0.2)))

    model.add(ConvLSTM2D(filters = 14, kernel_size = (3, 3), activation = 'tanh', data_format = "channels_last",
                         recurrent_dropout = 0.2, return_sequences = True))

    model.add(MaxPooling3D(pool_size = (1, 2, 2), padding = 'same', data_format = 'channels_last'))
    model.add(TimeDistributed(Dropout(0.2)))

    model.add(ConvLSTM2D(filters = 16, kernel_size = (3, 3), activation = 'tanh', data_format = "channels_last",
                         recurrent_dropout = 0.2, return_sequences = True))

    model.add(MaxPooling3D(pool_size = (1, 2, 2), padding = 'same', data_format = 'channels_last'))
    #model.add(TimeDistributed(Dropout(0.2)))

    model.add(Flatten())

    model.add(Dense(len(CLASSES_LIST), activation = "softmax"))

    ########################################################################################################################

    # Display the model's summary.
    model.summary()

    # Return the constructed convlstm model.
    return model

"""Now we will utilize the function *`create_convlstm_model()`* created above, to construct the required `convlstm` model."""
# Construct the required convlstm model.
convlstm_model = create_convlstm_model()

# Display the success message.
print("Model Created Successfully!")

"""#### *<font style="color:rgb(134,19,348)">Check Model’s Structure:</font>*

Now we will use the *`plot_model()`* function to check the structure of the constructed model. This is helpful while constructing
a complex network and making sure that the network is created correctly.
"""

# Plot the structure of the constructed model.
plot_model(convlstm_model, to_file = 'convlstm_model_structure_plot.png', show_shapes = True, show_layer_names = True)

# Loading the weights of the model.
checkpoint_path = '/content/convlstm_model Date_Time_2023_05_0813_13_24_Loss_0.42100247740745544 Accuracy_0.8524590134620667.h5'
convlstm_model.load_weights(checkpoint_path)

# Compile the model and specify loss function, optimizer and metrics values to the model.
convlstm_model.compile(loss = 'categorical_crossentropy', optimizer = 'Adam', metrics = ["accuracy"])

"""### *<font style="color:rgb(134,19,348)">Step 4.2: Compile & Train the Model</font>*

Next, we will add an early stopping callback to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting) and start the
training after compiling the model.
"""

checkpoint_path = "/content/drive/MyDrive/Colab Notebooks/trained_model_/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights.
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath = checkpoint_path, save_weights_only = True, verbose = 1)

# Create an Instance of Early Stopping Callback


early_stopping_callback = EarlyStopping(monitor = 'val_loss', patience = 10, mode = 'min', restore_best_weights = True)

# Compile the model and specify loss function, optimizer and metrics values to the model
convlstm_model.compile(loss = 'categorical_crossentropy', optimizer = 'Adam', metrics = ["accuracy"])

# Start training the model.
convlstm_model_training_history = convlstm_model.fit(x = features_train, y = labels_train, epochs = 50, batch_size = 4,
                                                     shuffle = True, validation_split = 0.2,
                                                     callbacks = [cp_callback, early_stopping_callback])

os.listdir(checkpoint_dir)

"""#### *<font style="color:rgb(134,19,348)">Evaluate the Trained

Model</font>* After training, we will evaluate the model on the test set.
"""

# Evaluate the trained model.
model_evaluation_history = convlstm_model.evaluate(features_test, labels_test)

"""#### *<font style="color:rgb(134,19,348)">Save the Model</font>*

Now we will save the model to avoid training it from scratch every time we need the model.
"""

# Get the loss and accuracy from model_evaluation_history.


model_evaluation_loss, model_evaluation_accuracy = model_evaluation_history

# Define the string date format.


# Get the current Date and Time in a DateTime Object.
# Convert the DateTime object to string according to the style mentioned in date_time_format string.
date_time_format = '%Y_%m_%d_%H%M_%S'
current_date_time_dt = dt.datetime.now()
current_date_time_string = dt.datetime.strftime(current_date_time_dt, date_time_format)

# Define a useful name for our model to make it easy for us while navigating through multiple saved models.
model_file_name = f'convlstm_model Date_Time{current_date_time_string} Loss{model_evaluation_loss} Accuracy{model_evaluation_accuracy}.h5'

# Save your Model.


convlstm_model.save(model_file_name)

"""### *<font style="color:rgb(134,19,348)">Step 4.3: Plot Model’s Loss & Accuracy Curves</font>*

Now we will create a function *`plot_metric()`* to visualize the training and validation metrics. We already have separate
metrics from our training and validation steps so now we just have to visualize them.
"""

def plot_metric(model_training_history, metric_name_1, metric_name_2, plot_name):
    '''
    This function will plot the metrics passed to it in a graph.
    Args:
        model_training_history: A history object containing a record of training and validation
                                loss values and metrics values at successive epochs.
        metric_name_1: The name of the first metric that needs to be plotted in the graph.
        metric_name_2: The name of the second metric that needs to be plotted in the graph.
        plot_name: The title of the graph.
    '''

    # Get metric values using metric names as identifiers.
    metric_value_1 = model_training_history.history[metric_name_1]
    metric_value_2 = model_training_history.history[metric_name_2]

    # Construct a range object which will be used as the x-axis (horizontal plane) of the graph.
    epochs = range(len(metric_value_1))

    # Plot the Graph.
    plt.plot(epochs, metric_value_1, 'blue', label = metric_name_1)
    plt.plot(epochs, metric_value_2, 'red', label = metric_name_2)

    # Add title to the plot.
    plt.title(str(plot_name))

    # Add legend to the plot.
    plt.legend()

"""Now we will utilize the function *`plot_metric()`* created above, to visualize and understand the

metrics.""" # Visualize the training and validation loss metrices.


plot_metric(convlstm_model_training_history, 'loss', 'val_lossv' Total Loss vs Total Validation
APPENDIX
Loss') # Visualize the training and validation accuracy metrices.
A
PPENDIX
plot_metric(convlstm_model_training_history, 'accuracy', 'val_accuracy', 'Total Accuracy vs Total Validation Accuracy')

"""## *<font style="color:rgb(134,19,348)">Step 5: Implement the LRCN Approach</font>*

In this step, we will implement the LRCN approach by combining Convolution and LSTM layers in a single model. A similar
approach is to use a CNN model and an LSTM model trained separately. The CNN model can be used to extract spatial
features from the frames in the video, and for this purpose a pre-trained model can be used and fine-tuned for the problem.
The LSTM model can then use the features extracted by the CNN to predict the action being performed in the video.

But here, we will implement another approach known as the Long-term Recurrent Convolutional Network (LRCN), which
combines CNN and LSTM layers in a single model. The Convolutional layers are used for spatial feature extraction from the
frames, and the extracted spatial features are fed to LSTM layer(s) at each time step for temporal sequence modeling. This way the
network learns spatiotemporal features directly in end-to-end training, resulting in a robust model.

<center>
<img src='https://drive.google.com/uc?export=download&id=1I-q5yLsIoNh2chfzT7JYvra17FsXvdme'>
</center>

You can read the paper [Long-term Recurrent Convolutional Networks for Visual Recognition and
Description](https://arxiv.org/abs/1411.4389?source=post_page--------------------------) by Jeff Donahue et al. (CVPR 2015), to learn
more about this architecture.

We will also use the [*`TimeDistributed`*](https://keras.io/api/layers/recurrent_layers/time_distributed/) wrapper layer, which allows
applying the same layer to every frame of the video independently. It makes a layer (around which it is wrapped) capable of
taking input of shape `(no_of_frames, width, height, num_of_channels)` if originally the layer's input shape was `(width, height,
num_of_channels)`, which is very beneficial as it allows us to feed the whole video into the model in a single shot. A minimal shape
sketch of this behaviour is given right after this cell, just before the model definition.

<center>
<img src='https://drive.google.com/uc?export=download&id=1CbauSm5XTY7ypHYBHH7rDSnJ5LO9CUWX' width=400>
</center>

### *<font style="color:rgb(134,19,348)">Step 5.1: Construct the Model</font>*

To implement our LRCN architecture, we will use time-distributed *`Conv2D`* layers which will be followed by *`MaxPooling2D`*
and *`Dropout`* layers. The features extracted from the *`Conv2D`* layers will then be flattened using the *`Flatten`* layer and
fed to an *`LSTM`* layer. The *`Dense`* layer with softmax activation will then use the output from the *`LSTM`* layer to
predict the action being performed.
"""

def create_LRCN_model():
    '''
    This function will construct the required LRCN model.
    Returns:
        model: It is the required constructed LRCN model.
    '''

    # We will use a Sequential model for model construction.
    model = Sequential()

    # Define the Model Architecture.
    ########################################################################################################################

    model.add(TimeDistributed(Conv2D(16, (3, 3), padding = 'same', activation = 'relu'),
                              input_shape = (SEQUENCE_LENGTH, IMAGE_HEIGHT, IMAGE_WIDTH, 3)))

    model.add(TimeDistributed(MaxPooling2D((4, 4))))
    model.add(TimeDistributed(Dropout(0.25)))

    model.add(TimeDistributed(Conv2D(32, (3, 3), padding = 'same', activation = 'relu')))

    model.add(TimeDistributed(MaxPooling2D((4, 4))))
    model.add(TimeDistributed(Dropout(0.25)))

    model.add(TimeDistributed(Conv2D(64, (3, 3), padding = 'same', activation = 'relu')))

    model.add(TimeDistributed(MaxPooling2D((2, 2))))
    model.add(TimeDistributed(Dropout(0.25)))

    model.add(TimeDistributed(Conv2D(64, (3, 3), padding = 'same', activation = 'relu')))

    model.add(TimeDistributed(MaxPooling2D((2, 2))))
    #model.add(TimeDistributed(Dropout(0.25)))

    model.add(TimeDistributed(Flatten()))

    model.add(LSTM(32))

    model.add(Dense(len(CLASSES_LIST), activation = 'softmax'))

    ########################################################################################################################

    # Display the model's summary.
    model.summary()

    # Return the constructed LRCN model.
    return model

"""Now we will utilize the function *`create_LRCN_model()`* created above to construct the required `LRCN`

model.""" # Construct the required LRCN model.


LRCN_model = create_LRCN_model()

# Display the success message.


print("Model Created
Successfully!")

"""#### *<font style="color:rgb(134,19,348)">Check Model’s Structure:</font>*

Now we will use the *`plot_model()`* function to check the structure of the constructed `LRCN` model, as we did for
the previous model.
"""

# Plot the structure of the constructed LRCN model.
plot_model(LRCN_model, to_file = 'LRCN_model_structure_plot.png', show_shapes = True, show_layer_names = True)

# Loading the weights of the LRCN model.
checkpoint_path = '/content/LRCN_model Date_Time_2023_05_0813_23_44_Loss_0.3155522644519806 Accuracy_0.8934426307678223.h5'
LRCN_model.load_weights(checkpoint_path)

"""### *<font style="color:rgb(134,19,348)">Step 5.2: Compile & Train the Model</font>*

After checking the structure, we will compile and start training the
model. """

checkpoint_path = "/content/drive/MyDrive/Colab Notebooks/trained_LRCN_model/cp.ckpt"


checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights.
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath = checkpoint_path, save_weights_only = True, verbose = 1)
APPENDIX
# Create an Instance of Early Stopping Callback.
early_stopping_callback = EarlyStopping(monitor = 'val_loss', patience = 15, mode = 'min', restore_best_weights = True)

# Compile the model and specify loss function, optimizer and metrics to the model.
LRCN_model.compile(loss = 'categorical_crossentropy', optimizer = 'Adam', metrics = ["accuracy"])

# Start training the model.
LRCN_model_training_history = LRCN_model.fit(x = features_train, y = labels_train, epochs = 70, batch_size = 4,
                                             shuffle = True, validation_split = 0.2,
                                             callbacks = [cp_callback, early_stopping_callback])

"""#### *<font style="color:rgb(134,19,348)">Evaluating the trained

Model</font>* As done for the previous one, we will evaluate the `LRCN` model on

the test set.


"""

# Evaluate the trained model.


model_evaluation_history = LRCN_model.evaluate(features_test, labels_test)

"""#### *<font style="color:rgb(134,19,348)">Save the Model</font>*

After that, we will save the model for future uses using the same technique we had used for the previous
model. """

# Get the loss and accuracy from model_evaluation_history.


model_evaluation_loss, model_evaluation_accuracy = model_evaluation_history

# Define the string date format.


# Get the current Date and Time in a DateTime Object.
# Convert the DateTime object to string according to the style mentioned in date_time_format string.
date_time_format = '%Y_%m_%d_%H%M_%S'
current_date_time_dt = dt.datetime.now()
current_date_time_string = dt.datetime.strftime(current_date_time_dt, date_time_format)

# Define a useful name for our model to make it easy for us while navigating through multiple saved models.
model_file_name = f'LRCN_model Date_Time{current_date_time_string} Loss{model_evaluation_loss} Accuracy{model_evaluation_accuracy}.h5'

# Save the Model.


LRCN_model.save(model_file_name)

"""### *<font style="color:rgb(134,19,348)">Step 5.3: Plot Model’s Loss & Accuracy Curves</font>*

Now we will utilize the function *`plot_metric()`* we had created above to visualize the training and validation metrics of this

model. """

# Visualize the training and validation loss metrics.


plot_metric(LRCN_model_training_history, 'loss', 'val_loss', 'Total Loss vs Total Validation Loss')

# Visualize the training and validation accuracy metrics.


plot_metric(LRCN_model_training_history, 'accuracy', 'val_accuracy', 'Total Accuracy vs Total Validation Accuracy')

"""## *<font style="color:rgb(134,19,348)">Step 6: Test the Best Performing Model on YouTube videos</font>*

From the results, it seems that the LRCN model performed significantly well for a small number of classes. so in this step, we will
put the `LRCN` model to test on some youtube videos.

### *<font style="color:rgb(134,19,348)">Create a Function to Download YouTube Videos:</font>*

We will create a function *`download_video()`* to download the YouTube videos first using *`pytube`* library. The library
only requires a URL to a video to download it along with its associated metadata like the title of the video.
""" APPENDIX
!pip install pytube

from pytube import YouTube

def download_video(url, output_path):
    try:
        yt = YouTube(url)
        video = yt.streams.get_highest_resolution()
        video.download(output_path)
        print("Video downloaded successfully!")
    except Exception as e:
        print("Error:", str(e))

"""### *<font style="color:rgb(134,19,348)">Download a Test Video:</font>*

Now we will utilize the function *`download_youtube_videos()`* created above to download a youtube video on which the `LRCN`
model will be tested.

Below is me testing whether i can download different youtube videos


"""

# Provide the URL of the YouTube video you want to download.
video_url = "https://www.youtube.com/watch?v=8u0qjmHIOcE"

# Create a YouTube object with the provided URL.
yt = YouTube(video_url)

# Get the title of the video.
video_title = yt.title

# Make the output directory if it does not exist.
test_videos_directory = '/content/test_videos'
os.makedirs(test_videos_directory, exist_ok = True)

# Download the YouTube video.
output_path = "/content/test_videos"
video = download_video(video_url, output_path)

# Get the path of the YouTube video we just downloaded.
input_video_file_path = f'{test_videos_directory}/{video_title}.mp4'

# Print the title.
print("Video Title:", video_title)

"""### *<font style="color:rgb(134,19,348)">Create a Function To Perform Action Recognition on Videos</font>*

Next, we will create a function *`predict_on_video()`* that will simply read a video frame by frame from the path passed in as
an argument, perform action recognition on the video, and save the results.
"""

def predict_on_video(video_file_path, output_file_path, SEQUENCE_LENGTH):
    '''
    This function will perform action recognition on a video using the LRCN model.
    Args:
        video_file_path: The path of the video stored in the disk on which the action recognition is to be performed.
        output_file_path: The path where the output video, with the predicted action overlayed, will be stored.
        SEQUENCE_LENGTH: The fixed number of frames of a video that can be passed to the model as one sequence.
    '''

    # Initialize the VideoCapture object to read from the video file.
    video_reader = cv2.VideoCapture(video_file_path)

    # Get the width and height of the video.
    original_video_width = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))
    original_video_height = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Initialize the VideoWriter object to store the output video in the disk.
    video_writer = cv2.VideoWriter(output_file_path, cv2.VideoWriter_fourcc('M', 'P', '4', 'V'),
                                   video_reader.get(cv2.CAP_PROP_FPS), (original_video_width, original_video_height))

    # Declare a queue to store the video frames.
    frames_queue = deque(maxlen = SEQUENCE_LENGTH)

    # Initialize a variable to store the predicted action being performed in the video.
    predicted_class_name = ''

    # Iterate until the video is accessed successfully.
    while video_reader.isOpened():

        # Read the frame.
        ok, frame = video_reader.read()

        # Check if the frame is not read properly, then break the loop.
        if not ok:
            break

        # Resize the frame to fixed dimensions.
        resized_frame = cv2.resize(frame, (IMAGE_HEIGHT, IMAGE_WIDTH))

        # Normalize the resized frame by dividing it by 255 so that each pixel value lies between 0 and 1.
        normalized_frame = resized_frame / 255

        # Append the pre-processed frame into the frames queue.
        frames_queue.append(normalized_frame)

        # Check if the number of frames in the queue is equal to the fixed sequence length.
        if len(frames_queue) == SEQUENCE_LENGTH:

            # Pass the normalized frames to the model and get the predicted probabilities.
            predicted_labels_probabilities = LRCN_model.predict(np.expand_dims(frames_queue, axis = 0))[0]

            # Get the index of the class with the highest probability.
            predicted_label = np.argmax(predicted_labels_probabilities)

            # Get the class name using the retrieved index.
            predicted_class_name = CLASSES_LIST[predicted_label]

        # Write the predicted class name on top of the frame.
        cv2.putText(frame, predicted_class_name, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        # Write the frame into the disk using the VideoWriter object.
        video_writer.write(frame)

    # Release the VideoCapture and VideoWriter objects.
    video_reader.release()
    video_writer.release()

"""### *<font style="color:rgb(134,19,348)">Perform Action Recognition on the Test Video</font>*

Now we will utilize the function *`predict_on_video()`* created above to perform action recognition on the test video we had
downloaded using the function *`download_youtube_videos() and display the output video with the predicted action overlayed on it. """
# Construct the output video path. APPENDIX
output_video_file_path = f'{test_videos_directory}/{video_title}-Output-SeqLen{SEQUENCE_LENGTH}.mp4'

# Perform Action Recognition on the Test Video.


predict_on_video(input_video_file_path, output_video_file_path, SEQUENCE_LENGTH)

# Display the output video.


VideoFileClip(output_video_file_path, audio=False, target_resolution=(300,None)).ipython_display()

"""### *<font style="color:rgb(134,19,348)">Create a Function To Perform a Single Prediction on Videos</font>*

Now let's create a function that will perform a single prediction for the complete videos. We will extract evenly distributed *N*
*`(SEQUENCE_LENGTH)`* frames from the entire video and pass them to the `LRCN` model. This approach is really useful
when you are working with videos containing only one activity as it saves unnecessary computations and time in that scenario.
"""

def predict_single_action(video_file_path, SEQUENCE_LENGTH):
    '''
    This function will perform single action recognition prediction on a video using the LRCN model.
    Args:
        video_file_path: The path of the video stored in the disk on which the action recognition is to be performed.
        SEQUENCE_LENGTH: The fixed number of frames of a video that can be passed to the model as one sequence.
    '''

    # Initialize the VideoCapture object to read from the video file.
    video_reader = cv2.VideoCapture(video_file_path)

    # Get the width and height of the video.
    original_video_width = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))
    original_video_height = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Declare a list to store the video frames we will extract.
    frames_list = []

    # Initialize a variable to store the predicted action being performed in the video.
    predicted_class_name = ''

    # Get the number of frames in the video.
    video_frames_count = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))

    # Calculate the interval after which frames will be added to the list.
    skip_frames_window = max(int(video_frames_count / SEQUENCE_LENGTH), 1)

    # Iterate the number of times equal to the fixed length of the sequence.
    for frame_counter in range(SEQUENCE_LENGTH):

        # Set the current frame position of the video.
        video_reader.set(cv2.CAP_PROP_POS_FRAMES, frame_counter * skip_frames_window)

        # Read a frame.
        success, frame = video_reader.read()

        # Check if the frame is not read properly, then break the loop.
        if not success:
            break

        # Resize the frame to fixed dimensions.
        resized_frame = cv2.resize(frame, (IMAGE_HEIGHT, IMAGE_WIDTH))

        # Normalize the resized frame by dividing it by 255 so that each pixel value lies between 0 and 1.
        normalized_frame = resized_frame / 255

        # Append the pre-processed frame into the frames list.
        frames_list.append(normalized_frame)

    # Pass the pre-processed frames to the model and get the predicted probabilities.
    predicted_labels_probabilities = LRCN_model.predict(np.expand_dims(frames_list, axis = 0))[0]

    # Get the index of the class with the highest probability.
    predicted_label = np.argmax(predicted_labels_probabilities)

    # Get the class name using the retrieved index.
    predicted_class_name = CLASSES_LIST[predicted_label]

    # Display the predicted action along with the prediction confidence.
    print(f'Action Predicted: {predicted_class_name}\nConfidence: {predicted_labels_probabilities[predicted_label]}')

    # Release the VideoCapture object.
    video_reader.release()

"""### *<font style="color:rgb(134,19,348)">Perform Single Prediction on a Test Video</font>*

Now we will utilize the function *`predict_single_action()`* created above to perform a single prediction on a complete youtube test
video that we will download using the function *`download_youtube_videos()`*, we had created above.
"""

# Download the YouTube video.
video_url = 'https://youtu.be/fc3w827kwyA'
download_video(video_url, test_videos_directory)

# Get the title of the downloaded video.
video_title = YouTube(video_url).title

# Construct the input YouTube video path.
input_video_file_path = f'{test_videos_directory}/{video_title}.mp4'

# Perform Single Prediction on the Test Video.
predict_single_action(input_video_file_path, SEQUENCE_LENGTH)

# Display the input video.
VideoFileClip(input_video_file_path, audio=False, target_resolution=(300,None)).ipython_display()

APPENDIX-III
DATASHEETS
The dataset can be accessed from the link below:

https://www.crcv.ucf.edu/data/UCF50.rar

Publication Details

Acceptance certificate
