Speech and Text Emotion Recognition Using Machine Learning Batch Number - 08 First Review 2.0
OF ENGINEERING
Nagawara, Bengaluru – 560 045
Department of Information Science & Engineering
Batch Number – 08
FIRST REVIEW 2.0
Objective
Abstract
Introduction
Literature Survey
Existing System
Proposed System
Architecture Diagram
Hardware & Software Specification
Gantt Chart
OBJECTIVE
To detect a person's emotion from both their speech and the corresponding text.
To improve the accuracy of the recognition result by combining the two modalities.
The goal of speech emotion recognition is to predict the emotional content of
speech and to classify it under one of several labels (i.e., happy, sad,
neutral, and angry).
The primary objective of this project is to improve the human-machine interface.
It can also be used to monitor the psychophysiological state of a person, for
example in lie detectors.
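As a rough illustration of this classification task (not the project's actual pipeline), the sketch below summarizes each utterance with MFCC statistics via librosa and trains a classical scikit-learn classifier over the four labels; the wav_files and labels variables are hypothetical placeholders for a labeled dataset.

```python
# Minimal baseline sketch: classify utterance emotion from MFCC statistics.
# Assumes librosa and scikit-learn are installed; wav_files and labels are
# hypothetical placeholders for a labeled dataset.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

EMOTIONS = ["angry", "happy", "sad", "neutral"]

def utterance_features(path):
    """Load a WAV file and summarize it as mean/std of 13 MFCCs."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# wav_files: list of audio paths; labels: matching emotion indices.
X = np.stack([utterance_features(p) for p in wav_files])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)

clf = SVC(kernel="rbf")  # simple classical baseline
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```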
ABSTRACT
Speech emotion recognition is a challenging task, and well-performing classifiers have so far
relied heavily on audio features alone.
In this project, we propose a model that utilizes both text data and audio signals to obtain a
better understanding of speech data.
As emotional dialogue is composed of sound and spoken content, our model encodes the
information from audio and text sequences using neural networks and then combines the
information from these sources to predict the emotion class.
This architecture analyzes speech data from the signal level to the language level, and it thus
utilizes the information within the data more comprehensively than models that focus on audio
features.
INTRODUCTION
Recently, machine learning algorithms have successfully addressed problems in various fields, such
as image classification, machine translation, speech recognition, and text-to-speech generation.
Similarly, substantial improvements in performance have been obtained when machine learning
algorithms have been applied to statistical speech processing.
In developing emotionally aware intelligence, the very first step is building robust emotion
classifiers that display good performance regardless of the application; this outcome is considered
to be one of the fundamental research goals in affective computing.
In particular, the speech emotion recognition task is one of the most important problems in the field
of paralinguistics.
This field has recently broadened its applications, as emotion recognition is a crucial factor in
natural human-computer interaction, including dialog systems.
LITERATURE SURVEY
1. Automatic Dialogue Generation with Expressed Emotions
   Author and Year: Chenyang Huang, Osmar R. Zaïane, Amine Trabelsi, Nouha Dziri (2019)
   Technology used: Neural dialogue generation systems
   Concept: Presents three models that either concatenate the desired emotion with the source input during learning or push the emotion into the decoder.

2. Toward Effective Automatic Recognition Systems of Emotion in Speech
   Author and Year: Atiquzzaman Mondal (2018)
   Technology used: Effective automatic speech emotion recognition
   Concept: Covers the collection and organization of databases and emotional descriptors; the calculation, selection, and normalization of relevant speech features; and the models used to recognize emotions.

3. Evaluating Google Speech-to-Text API's Performance for Romanian e-Learning Resources
   Author and Year: Bogdan (2020)
   Technology used: Google Cloud Speech-to-Text API
   Concept: Evaluates ASR on multimedia e-learning resources available in Romanian using the Google Cloud Speech-to-Text API.
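The third survey entry relies on the Google Cloud Speech-to-Text API, which is also a natural way to obtain the transcript for the text branch of this project. Below is a minimal sketch using the official Python client; it assumes the google-cloud-speech package is installed and application credentials are configured, and "utterance.wav" is a hypothetical file path.

```python
# Minimal sketch: transcribe one short WAV file with Google Cloud
# Speech-to-Text (as surveyed in entry 3 above). Assumes credentials
# are configured; "utterance.wav" is a hypothetical placeholder path.
from google.cloud import speech

client = speech.SpeechClient()

with open("utterance.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",  # the Romanian study would use "ro-RO"
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```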
Existing System
Speech emotion recognition is a challenging task, and existing classifiers have relied heavily on
audio features alone.
The existing system proposes a deep dual recurrent encoder model that utilizes text data and audio
signals simultaneously to obtain a better understanding of speech data.
As emotional dialogue is composed of sound and spoken content, the model encodes the information
from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the
information from these sources to predict the emotion class.
This architecture analyzes speech data from the signal level to the language level, and it thus
utilizes the information within the data more comprehensively than models that focus on audio
features. Extensive experiments were conducted to investigate the efficacy and properties of this
model.
The model outperforms previous state-of-the-art methods in assigning data to one of four emotion
categories (i.e., angry, happy, sad, and neutral) on the IEMOCAP dataset, with accuracies ranging
from 68.8% to 71.8%.
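To make the dual recurrent encoder concrete, the following is an illustrative PyTorch sketch under assumed dimensions, not the paper's exact implementation: one GRU encodes the audio feature sequence, a second GRU encodes the transcript tokens, and the final hidden states are concatenated and classified into the four emotion categories.

```python
# Illustrative sketch of a dual recurrent encoder (audio + text) for
# four-class emotion recognition. Dimensions and vocabulary size are
# assumed for illustration; this is not the paper's exact model.
import torch
import torch.nn as nn

class DualRNNEncoder(nn.Module):
    def __init__(self, audio_dim=40, vocab_size=10000, embed_dim=100,
                 hidden_dim=128, num_classes=4):
        super().__init__()
        self.audio_rnn = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.text_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, audio_feats, token_ids):
        # audio_feats: (batch, audio_steps, audio_dim), e.g. MFCC frames
        # token_ids:   (batch, text_steps) transcript token indices
        _, h_audio = self.audio_rnn(audio_feats)         # (1, batch, hidden)
        _, h_text = self.text_rnn(self.embed(token_ids))
        fused = torch.cat([h_audio[-1], h_text[-1]], dim=-1)
        return self.classifier(fused)                    # logits over classes

model = DualRNNEncoder()
logits = model(torch.randn(2, 50, 40), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 4])
```

Concatenating the two final states is the simplest fusion choice; the reported gains come from letting the classifier see both modalities at once rather than audio features alone.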
Proposed System
[Flow diagram: Literature Survey → Problem Formulation → Research on Software Requirements → Hardware Requirements → First Phase Review and Report Submission]
REFERENCES