
A

Project Report
On

“SPEECH TO EMOTION RECOGNITION”

Submitted in partial fulfillment of
the requirements for the 8th Semester Sessional Examination of

BACHELOR OF TECHNOLOGY
IN

COMPUTER SCIENCE & ENGINEERING

By
D SHIVA SATWIK
20UG010391

Under the esteemed guidance of

Mr. Sitanshu Kar
Dept. of CSE

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


GANDHI INSTITUTE OF ENGINEERING AND TECHNOLOGY
GUNUPUR – 765022
2023 - 24

GIET UNIVERSITY, GUNUPUR
School of Engineering and Technology
Department of Computer Science & Engineering
Approved by Govt. of Odisha

CERTIFICATE

This is to certify that the project work entitled “SPEECH TO EMOTION RECOGNITION” is done by D SHIVA SATWIK, Regd. No. 20UG010391, in partial fulfilment of the requirements for the 8th Semester Sessional Examination of Bachelor of Technology in Computer Science and Engineering during the academic year 2023-24. This work is submitted to the department as a part of the evaluation of the 8th Semester Project.

Mr. Sitanshu Kar
Project Supervisor

Dr. K. Murali Gopal
HoD, CSE

ABSTRACT

Communication is the key to expressing one's thoughts and ideas clearly. Amongst all forms of communication, speech is the most preferred and powerful form of communication among humans. The era of the Internet of Things (IoT) is rapidly advancing, making more intelligent systems available for everyday use. These applications range from simple wearables and widgets to complex self-driving vehicles and automated systems employed in various fields. Intelligent applications are interactive, require minimum user effort to function, and mostly operate on voice-based input.

This creates the necessity for these computer applications to completely comprehend human speech. A speech percept can reveal information about the speaker, including gender, age, language, and emotion. Several existing speech recognition systems used in IoT applications are integrated with an emotion detection system in order to analyze the emotional state of the speaker. The performance of the emotion detection system can greatly influence the overall performance of the IoT application in many ways and can provide many advantages over the functionalities of these applications.

This research presents a speech emotion detection system with improvements over an existing system in terms of data, feature selection, and methodology, aiming to classify speech percepts based on emotion more accurately.

CONTENTS

1. Introduction
2. System Analysis
3. Methodology
4. DFD Diagram
5. Modules
6. Dataset
7. Feature Extraction
8. Algorithms
9. Classification Report
10. System Design
11. Analysis
12. Coding
13. Conclusion
14. References


1. INTRODUCTION
Speech emotion recognition is the task of predicting a human's emotion from their speech, together with a measure of the prediction's accuracy. It enables better human-computer interaction. Although it is difficult to predict a person's emotion, since emotions are subjective and annotating audio is challenging, “Speech Emotion Recognition (SER)” makes this possible.

Animals such as dogs, elephants, and horses use the same cues to understand human emotion. Various cues can be used to predict a person's emotion, including tone, pitch, expression, and behavior. Speech-based analysis typically spans three related tasks:

• Speaker Identification

• Speech Recognition

• Speech Emotion Detection

1.1. PURPOSE
• The primary objective of SER is to improve the man-machine interface.

• It can also be used to monitor the psychophysiological state of a person in lie detectors.

• In recent times, speech emotion recognition has also found applications in medicine and forensics.

1.2. PROJECT SCOPE

This project covers the end-to-end pipeline of a speech emotion recognition system: collecting audio data from the RAVDESS dataset, extracting emotion-relevant features from the speech signal, training machine learning classifiers, and evaluating and comparing their performance to identify the most accurate model.

1.3. EXISTING SYSTEM

The existing speech emotion detection system is implemented as a Machine Learning (ML) model. The steps of implementation are comparable to any other ML project, with additional fine-tuning procedures to make the model function better: data collection, feature engineering, model building, and evaluation. This workflow, summarized in the flowchart (see Figure 1), is described in detail in Section 3 (Methodology).

1.4. PROPOSED SYSTEM

In this study, we present an automatic speech emotion recognition (SER) system that uses machine learning algorithms to classify emotions. The performance of the emotion detection system can greatly influence the overall performance of the application in many ways and can provide many advantages over the functionalities of these applications. This research presents a speech emotion detection system with improvements over an existing system in terms of data, feature selection, and methodology, aiming to classify speech percepts based on emotion more accurately.


2. SYSTEM ANALYSIS
2.1. HARDWARE REQUIREMENTS

Processor Brand : Intel

Processor Type : Core i3

Processor Speed : 2 GHz

Processor Count : 1

RAM Size : 8 GB

Memory Technology : DDR4

Computer Memory Type : DDR4 SDRAM

Hard Drive Size : 160 GB

2.2. SOFTWARE REQUIREMENTS

Operating System : Windows 10 or 11

Application Server : Jupyter Notebook

Frontend : Machine learning using Python

Dataset : RAVDESS dataset


3. METHODOLOGY
The speech emotion detection system is implemented as a Machine Learning (ML) model. The steps of implementation are comparable to any other ML project, with additional fine-tuning procedures to make the model function better. The flowchart represents a pictorial overview of the process (see Figure 1). The first step is data collection, which is of prime importance: the model being developed learns from the data provided to it, and all the decisions and results that the developed model produces are guided by the data. The second step, called feature engineering, is a collection of several machine learning tasks that are executed over the collected data. These procedures address several data representation and data quality issues. The third step is often considered the core of an ML project: an algorithm-based model is developed. This model uses an ML algorithm to learn about the data and train itself to respond to any new data it is exposed to. The final step is to evaluate the functioning of the built model. Very often, developers repeat the steps of developing a model and evaluating it to compare the performance of different algorithms. The comparison results help to choose the ML algorithm most relevant to the problem.
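
To make these four steps concrete, the following is a minimal sketch of the workflow in Python with scikit-learn. The `load_dataset` helper is hypothetical; it stands in for the data collection and feature engineering steps detailed later in this report.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Steps 1 + 2: data collection and feature engineering. `load_dataset` is
# a hypothetical helper returning one feature vector and one emotion
# label per audio file (see the Dataset and Feature Extraction sections).
X, y = load_dataset("ravdess/")

# Hold out a test set so the evaluation uses unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Step 3: model building -- train a classifier on the training split.
model = MLPClassifier(max_iter=500)
model.fit(X_train, y_train)

# Step 4: evaluation -- compare predictions against the held-out labels.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```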


4. DFD DIAGRAM

The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by the system. The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components: the system processes, the data used by the processes, the external entities that interact with the system, and the information flows in the system. A DFD shows how information moves through the system and how it is modified by a series of transformations; it is a graphical technique that depicts information flow and the transformations applied as data moves from input to output. A DFD may be used to represent a system at any level of abstraction and may be partitioned into levels that represent increasing information flow and functional detail.

Figure 3.1: Flow of implementation


5. MODULES
• Speech input Module

• Feature extraction and selection

• Classification

• Recognized emotional output

5.1. MODULE DESCRIPTION


• Speech input Module: The input to the system is speech captured as audio. An equivalent digital representation of the received audio is then produced using a sound file library.

• Feature extraction and selection: There are many emotional states, and emotion relevance is used to select the extracted speech features. The whole procedure, from extracting speech features to selecting those corresponding to emotions, revolves around the speech signal.

• Classification Module: Finding a set of significant emotions for classification is the main concern in a speech emotion recognition system. A typical set of emotions contains various emotional states, which makes classification a complicated task.

• Recognized emotional output: Fear, surprise, anger, joy, disgust, and sadness are the primary emotions, and the naturalness level of the database is the basis for evaluating a speech emotion recognition system.

6. DATASET

6.1. RAVDESS DATASET
(The Ryerson Audio-Visual Database of Emotional Speech and Song)

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7,356 files (total size: 24.8 GB). The database contains recordings of 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16-bit, 48 kHz .wav), Audio-Video (720p H.264, AAC 48 kHz, .mp4), and Video-only (no sound). Note that there are no song files for Actor_18.

• The size of the dataset is large enough for the model to be trained effectively; the more data a model is exposed to, the better it performs.

• All basic emotional categories of data are present. Combinations of these emotions can be used for further research, such as sarcasm and depression detection.

• Data is collected from two different age groups, which will improve the classification.

• The audio files are mono signals, which ensures an error-free conversion with most programming libraries.
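
RAVDESS encodes its labels in the file name itself: each name consists of seven hyphen-separated numeric fields, the third of which is the emotion code. A minimal sketch of turning a file name into an emotion label, following the dataset's documented naming convention:

```python
# RAVDESS emotion codes, per the dataset's file-naming documentation.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(name: str) -> str:
    """The third hyphen-separated field encodes the emotion."""
    return EMOTIONS[name.split("-")[2]]

print(emotion_from_filename("03-01-05-01-02-01-12.wav"))  # angry
```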


7. FEATURE EXTRACTION

7.1. THE PROCESS:

Speech is a varying sound signal. Humans are capable of modifying the sound signal using their vocal tract, tongue, and teeth to pronounce phonemes. Features are a way to quantify data. A better representation of the speech signal, one that extracts the most information from the speech, is obtained by extracting features common among speech signals. Some characteristics of good features include [14]:

• The features should be independent of each other. Most features in a feature vector are correlated with each other; therefore, it is crucial to select a subset of features that are mutually independent.

• The features should be informative in context. Only those features that are more descriptive of the emotional content are to be selected for further analysis.

• The features should be consistent across all data samples. Features that are unique and specific to certain data samples should be avoided.

• The values of the features should be processed. The initial feature selection process can result in a raw feature vector that is unmanageable. The process of feature engineering removes outliers, missing values, and null values.

The features in a speech percept that are relevant to the emotional content can be grouped into two main categories:

• Prosodic features

• Phonetic features.

The prosodic features are energy, pitch, tempo, loudness, formant, and intensity. The phonetic features are mostly related to the pronunciation of words in a given language. Therefore, for the purpose of emotion detection, the analysis is performed on the prosodic features or a combination of them. Pitch and loudness are the features most relevant to the emotional content.


7.2. MEL FREQUENCY CEPSTRUM COEFFICIENTS (MFCC) FEATURES

A subset of features that are used for speech emotion detection is grouped under a category called
the Mel Frequency Cepstrum Coefficients (MFCC) [16]. It can be explained as follows:

• The word Mel represents the scale used in frequency vs. pitch measurement (see Figure 2) [16]. A value f measured on the frequency scale (Hz) can be converted into the Mel scale using the formula m = 2595 · log10(1 + f/700).

• The word Cepstrum represents the Fourier transform of the log spectrum of the speech signal.
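
As an illustration, both the Mel conversion and the MFCC computation are available in the librosa library; a minimal sketch, assuming a mono RAVDESS-style .wav file (the file name is illustrative):

```python
import librosa
import numpy as np

# The Mel formula above, m = 2595 * log10(1 + f/700), is what librosa
# computes with htk=True.
print(librosa.hz_to_mel(1000, htk=True))  # ~1000 Mel for f = 1000 Hz

# Load a mono signal and compute 40 MFCCs per frame, then average over
# time to obtain one fixed-length feature vector per utterance.
y, sr = librosa.load("03-01-05-01-02-01-12.wav", sr=None)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
print(np.mean(mfcc, axis=1).shape)  # (40,)
```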


8. ALGORITHMS
8.1. MLP CLASSIFIER

MLPClassifier stands for Multi-layer Perceptron classifier, a name that itself points to a neural network. Unlike other classification algorithms such as Support Vector Machines or the Naive Bayes classifier, MLPClassifier relies on an underlying neural network to perform the task of classification.

We will use the confusion matrix to determine the accuracy which is measured as the total
number of correct predictions divided by the total number of predictions.

A multi-layer rather than a single-layer network is required because a single-layer perceptron (SLP) can only compute a linear decision boundary, which is not flexible enough for most realistic learning problems. For a problem that is linearly separable (that is, capable of being perfectly separated by a linear decision boundary), the perceptron convergence theorem guarantees convergence. In its simplest form, SLP training is based on the simple idea of adding or subtracting a pattern from the current weights when the target and predicted classes disagree; otherwise, the weights are unchanged. For a non-linearly separable problem, this simple algorithm can go on cycling indefinitely.

The modification known as least mean square (LMS) algorithm uses a mean squared error
cost function to overcome this difficulty, but since there is only a single perceptron, the decision
boundary is still linear. An MLP is a universal approximator [6] that typically uses the same squared
error function as LMS.

However, the main difficulty with the MLP is that the learning algorithm has a complex error surface, which can become stuck in local minima. Unlike the SLP case, no MLP learning algorithm is guaranteed to converge. The popular MLP back-propagation algorithm has two phases: the first is a forward pass, a forward simulation for the current training pattern that enables the error to be calculated; it is followed by a backward pass, which calculates for each weight in the network how a small change will affect the error function.

The derivative calculation is based on the application of the chain rule, and training typically
proceeds by changing the weights proportional to the derivative.
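
A minimal training sketch with scikit-learn, assuming the feature matrix and train/test split built as in the Methodology section; the hyperparameter values shown are illustrative, not the report's exact settings:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

# One hidden layer of 300 units; alpha is the L2 regularization strength.
mlp = MLPClassifier(hidden_layer_sizes=(300,), alpha=0.01,
                    batch_size=256, max_iter=500, random_state=42)
mlp.fit(X_train, y_train)

# Accuracy = correct predictions / total predictions, i.e. the sum of the
# confusion matrix diagonal divided by the sum of all entries.
cm = confusion_matrix(y_test, mlp.predict(X_test))
print("Accuracy:", cm.trace() / cm.sum())
```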


Fig 6.1 MLP Classifier Confusion Matrix.


8.2. XGBOOST CLASSIFIER

• XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework.

• XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.

• We will use the confusion matrix to determine the accuracy which is measured as the total
number of correct predictions divided by the total number of predictions.
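
A minimal sketch with the xgboost Python package, assuming the same splits; recent XGBoost versions require integer-encoded class labels, hence the LabelEncoder:

```python
from xgboost import XGBClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

# Encode emotion strings as integers for XGBoost.
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)

xgb = XGBClassifier(n_estimators=300, learning_rate=0.1, random_state=42)
xgb.fit(X_train, y_train_enc)

# Decode predictions back to emotion names before scoring.
pred = le.inverse_transform(xgb.predict(X_test))
print("Accuracy:", accuracy_score(y_test, pred))
```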

Fig 6.2 XGBOOST Classifier Confusion Matrix.


8.3. LGBM CLASSIFIER


• LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for ranking, classification, and many other machine learning tasks. Another reason Light GBM is so popular is that it focuses on the accuracy of results. LGBM also supports GPU learning, and data scientists are therefore widely using LGBM for data science application development.

• We will use the confusion matrix to determine the accuracy which is measured as the total
number of correct predictions divided by the total number of predictions.
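
A minimal sketch with the lightgbm package, under the same assumptions as the previous classifiers:

```python
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score

# LightGBM grows trees leaf-wise, which is the source of its speed.
lgbm = LGBMClassifier(n_estimators=300, learning_rate=0.1, random_state=42)
lgbm.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, lgbm.predict(X_test)))
```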


Fig 6.3 LGBM Classifier Confusion Matrix.

8.4. RANDOMFOREST CLASSIFIER

Random Forest is a classifier that contains a number of decision trees on various subsets of
the given dataset and takes the average to improve the predictive accuracy of that dataset. Random
Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It
can be used for both Classification and Regression problems in ML. It is based on the concept of
ensemble learning, which is a process of combining multiple classifiers to solve a complex problem
and to improve the performance of the model.

The term “Random Forest Classifier” refers to the classification algorithm made up of several decision trees. The algorithm uses randomness to build each individual tree to promote uncorrelated forests, and then uses the forest's predictive powers to make accurate decisions.

Random forest classifiers fall under the broad umbrella of ensemble-based learning methods. They are simple to implement, fast in operation, and have proven to be extremely successful in a variety of domains. The key principle underlying the random forest approach is the construction of many “simple” decision trees in the training stage and a majority vote (mode) across them in the classification stage. Among other benefits, this voting strategy corrects for the undesirable tendency of decision trees to overfit training data. In the training stage, random forests apply the general technique known as bagging to individual trees in the ensemble. Bagging repeatedly selects a random sample with replacement from the training set and fits trees to these samples. Each tree is grown without any pruning. The number of trees in the ensemble is a free parameter which is readily learned automatically using the so-called out-of-bag error.

We will use the confusion matrix to determine the accuracy which is measured as the total number
of correct predictions divided by the total number of predictions.

Much like naïve Bayes– and k-nearest neighbor–based algorithms, random forests are popular in part due to their simplicity on the one hand, and generally good performance on the other. However, unlike the former two approaches, random forests exhibit a degree of unpredictability as regards the structure of the final trained model. This is an inherent consequence of the stochastic nature of tree building. One of the key reasons why this characteristic of random forests can be a problem is regulatory: clinical adoption often demands a high degree of repeatability, not only in terms of the ultimate performance of an algorithm but also in terms of the mechanics of how a specific decision is made.
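
A minimal sketch with scikit-learn under the same assumptions; setting `oob_score=True` exposes the out-of-bag error mentioned above:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Bag 300 unpruned trees and classify by majority vote across them.
rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                            random_state=42)
rf.fit(X_train, y_train)

print("Out-of-bag score:", rf.oob_score_)
print("Test accuracy:", accuracy_score(y_test, rf.predict(X_test)))
```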


Fig 6.4 RandomForest Classifier Confusion Matrix.


8.5. KNN CLASSIFIER

K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique. It is simple to implement, robust to noisy training data, and can be more effective when the training data is large.

The concept of the k-nearest neighbor classifier can hardly be described more simply than by the old saying “Tell me who your friends are and I will tell you who you are,” which can be found in many languages and cultures. The idea is part of our everyday judgment: imagine you meet a group of people who are all very young, stylish, and sporty. They talk about their friend Ben, who isn't with them. What is your mental image of Ben? Right, you imagine him as being young, stylish, and sporty as well. If you then learn that Ben lives in a neighborhood where people vote conservative and the average income is above 200,000 dollars a year, and that both his neighbors make even more than 300,000 dollars per year, what do you think of Ben? Most probably, you do not consider him an underdog, and you may suspect him to be a conservative as well.

The principle behind nearest neighbor classification consists of finding a predefined number k of training samples closest in distance to a new sample, which has to be classified. The label of the new sample is determined from these neighbors. k-nearest neighbor classifiers have a fixed, user-defined constant for the number of neighbors to be considered. There are also radius-based neighbor learning algorithms, which have a varying number of neighbors based on the local density of points: all the samples inside a fixed radius. The distance can, in general, be any metric measure; the standard Euclidean distance is the most common choice. Neighbors-based methods are known as non-generalizing machine learning methods, since they simply “remember” all of their training data. Classification is computed by a majority vote of the nearest neighbors of the unknown sample.

The k-NN algorithm is among the simplest of all machine learning algorithms, but despite its simplicity, it has been quite successful in a large number of classification and regression problems, for example character recognition or image analysis.

k-NN is a type of instance-based learning, or lazy learning. In machine learning, lazy learning is understood to be a learning method in which generalization of the training data is delayed until a query is made to the system. Eager learning, in contrast, generalizes the training data before receiving queries. In other words, the function is only approximated locally, and all computations are deferred until the actual classification is performed.

We will use the confusion matrix to determine the accuracy which is measured as the total
number of correct predictions divided by the total number of predictions.
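
A minimal sketch with scikit-learn under the same assumptions; k is the user-defined neighbor count discussed above:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Lazy learner: fit() just stores the data; distances are computed at
# prediction time, with a majority vote among the 5 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, knn.predict(X_test)))
```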

Fig 6.5 KNN Classifier Confusion Matrix


9. CLASSIFICATION REPORT

9.1. MLP CLASSIFIER

A classification report is used to measure the quality of predictions from a classification algorithm: how many predictions are true and how many are false. More specifically, true positives, false positives, true negatives, and false negatives are used to compute the metrics of the classification report.
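
scikit-learn produces this report directly from those four counts; a minimal sketch, reusing the trained MLP model sketched in Section 8.1:

```python
from sklearn.metrics import classification_report

# Per-emotion precision, recall, F1-score, and support.
print(classification_report(y_test, mlp.predict(X_test)))
```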

Table 7.1 MLPClassifier Classification Report


9.2. XGBOOST CLASSIFIER

The table below reports these metrics for the XGBoost classifier.

Table 7.2 XGBClassifier Classification Report


9.3. LGBM CLASSIFIER

The table below reports these metrics for the LGBM classifier.

Table 7.3 LGBMClassifier Classification Report


9.4. RANDOMFOREST CLASSIFIER

The table below reports these metrics for the Random Forest classifier.

Table 7.4 Random Forest Classifier Classification Report


9.5. KNN CLASSIFIER

The table below reports these metrics for the KNN classifier.

Table 7.5 KNN Classifier Classification Report


10. SYSTEM DESIGN


10.1. INPUT DESIGN

The input design is the link between the information system and the user. It comprises developing the specifications and procedures for data preparation: the steps necessary to put transaction data into a usable form for processing. This can be achieved by having the computer read data from a written or printed document, or by having people key the data directly into the system. The design of input focuses on controlling the amount of input required, controlling errors, avoiding delay, avoiding extra steps, and keeping the process simple. The input is designed to provide security and ease of use while retaining privacy. Input design considered the following questions: What data should be given as input? How should the data be arranged or coded? What dialog should guide the operating personnel in providing input? What methods should be used for preparing input validations, and what steps should follow when errors occur?

10.2. OUTPUT DESIGN

A quality output is one which meets the requirements of the end user and presents the information clearly. In any system, the results of processing are communicated to the users and to other systems through outputs. In output design, it is determined how the information is to be displayed for immediate need, as well as the hard copy output. It is the most important and direct source of information for the user. Efficient and intelligent output design improves the system's relationship with the user and helps decision-making. The output of an information system should accomplish one or more of the following objectives: convey information about past activities, current status, or projections of the future; signal important events, opportunities, problems, or warnings; trigger an action; or confirm an action.


11. ANALYSIS
11.1. FINAL REPORT

After analyzing these five classifiers for speech emotion recognition, we find that the MLP classifier gives the highest accuracy. The XGBoost and LGBM classifiers give nearly as high accuracy, as shown in the bar chart below.

Figure 8.1: Final Report Analysis


12. CODING

Loading Libraries:
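
A minimal sketch of the imports such a notebook would need, assuming the libraries referenced throughout this report (librosa and soundfile for audio, scikit-learn for modeling):

```python
import glob
import os

import numpy as np
import librosa      # audio loading and feature extraction
import soundfile    # reading .wav files
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report
```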

Loading an Audio File:
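
A minimal sketch, assuming a RAVDESS .wav file on disk (the path is illustrative); soundfile returns the raw samples and the sampling rate:

```python
import soundfile

# Read the mono signal and its sampling rate (48 kHz in RAVDESS).
with soundfile.SoundFile("Actor_01/03-01-01-01-01-01-01.wav") as f:
    signal = f.read(dtype="float32")
    sample_rate = f.samplerate

print(signal.shape, sample_rate)
```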


Feature Preprocessing:
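
The exact preprocessing used in the notebook is not reproduced here; a plausible minimal sketch is standardizing the feature vectors so that no single feature dominates the classifiers (an assumption, not the report's exact code):

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on training features only, then apply to both splits,
# so no information from the test set leaks into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```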


Feature Extraction
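
A minimal sketch of an `extract_feature` helper in the style of the data-flair tutorial cited in the References, combining MFCC, chroma, and mel features; the exact feature set used in the notebook is an assumption:

```python
import librosa
import numpy as np

def extract_feature(file_name):
    """Return one vector per file: 40 MFCCs + 12 chroma + 128 mel bands."""
    y, sr = librosa.load(file_name, sr=None)
    stft = np.abs(librosa.stft(y))
    mfccs = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    return np.hstack([mfccs, chroma, mel])
```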


Evaluation
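
A minimal evaluation sketch, assuming `model` is any of the classifiers trained in Section 8 and the splits from the Methodology sketch:

```python
from sklearn.metrics import accuracy_score, classification_report

pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))
print(classification_report(y_test, pred))
```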


Testing
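
A minimal sketch of classifying one new recording, assuming the `extract_feature` helper above and a trained `model` (the file name is illustrative):

```python
# Build the feature vector and reshape to the (1, n_features) form
# scikit-learn expects for a single sample.
features = extract_feature("some_new_recording.wav").reshape(1, -1)
print("Predicted emotion:", model.predict(features)[0])
```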


13. CONCLUSION
The emerging growth and development in the fields of AI and machine learning have led to a new era of automation. Most of these automated devices work based on voice commands from the user. Many advantages can be built over existing systems if, besides recognizing the words, the machines could comprehend the emotion of the speaker (user). Some applications of a speech emotion detection system are computer-based tutorial applications, automated call center conversations, diagnostic tools used for therapy, and automatic translation systems.

In this report, the steps of building a speech emotion detection system were discussed in detail, and some experiments were carried out to understand the impact of each step. Initially, the limited number of publicly available speech databases made it challenging to implement a well-trained model. Next, several novel approaches to feature extraction had been proposed in earlier works, and selecting the best approach required performing many experiments. Finally, the classifier selection involved learning about the strengths and weaknesses of each classifying algorithm with respect to emotion recognition. At the end of the experimentation, it can be concluded that an integrated feature space produces a better recognition rate than a single feature.


14. REFERENCES

• Code for Interview, YouTube channel.

• Soegaard, M. and Friis Dam, R. (2013). The Encyclopedia of Human-Computer Interaction, 2nd ed.

• Deitel, P. J. and Deitel, H. M. Internet & World Wide Web: How to Program.

• Nwe, T. L., Foo, S. W., and De Silva, L. C., “Speech emotion recognition using hidden Markov models,” Speech Communication, vol. 41, no. 4, pp. 603–623, Nov. 2003.

• www.data-flair.training.com

• www.researchgate.net
