Capstone Paper
INTRODUCTION:
Related Works:
We first reviewed research papers on speech recognition and analysed how each one approached the problem. Bano, S., Jithendra, P., Niharika, G. L., & Sikhi, Y. (2020) [1] proposed a speech recognition model that converts the speech given by the user as input into text in the user's desired language. The model is built by adding multilingual features to the existing Google Speech Recognition model, drawing on natural language processing principles. The goal of this research is to build a speech recognition model that allows even an illiterate person to easily communicate with a computer system in their regional language. By implementing this model, the authors showed how the SpeechRecognition package can be used to build a speech translation model; such packages give more flexibility in the code and in the output to be displayed. The model can be used for any speech-to-text translation purpose, as illustrated by the sketch below.
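As a rough illustration of the approach described in [1], the following is a minimal sketch using the open-source SpeechRecognition package with Google's Web Speech API. The audio file name and the target language code are assumptions made for the example, not details taken from the paper.

```python
# Minimal sketch: multilingual speech-to-text with the SpeechRecognition package.
# "speech_sample.wav" and the language code "hi-IN" (Hindi) are assumed for
# illustration; any language tag supported by the API could be used instead.
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load the recorded utterance from an audio file.
with sr.AudioFile("speech_sample.wav") as source:
    audio = recognizer.record(source)

try:
    # Send the audio to Google's free Web Speech API and request Hindi output.
    text = recognizer.recognize_google(audio, language="hi-IN")
    print("Recognized text:", text)
except sr.UnknownValueError:
    print("Speech was not intelligible.")
except sr.RequestError as error:
    print("Could not reach the recognition service:", error)
```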
We then proceeded to the next part, speech-to-text (STT) conversion. Shivangi Nagdewani and Ashika Jain [2] created a model for STT conversion using a Hidden Markov Model (HMM) and a neural network, as this combination gives the highest accuracy for STT. The HMM is a statistical model used in speech recognition because a speech signal can be viewed as a piecewise stationary or short-time stationary signal. According to their research, the most suitable technique for STT conversion is a combination of a Hidden Markov Model with a deep neural network, which can be implemented in Python using Google's Speech Recognition API module. The system could be further improved by inserting punctuation marks while converting speech to text. A sketch of the HMM side of such a pipeline is given below.
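The paper does not publish its code, so the following is only a minimal sketch of how an HMM acoustic model can be fitted to MFCC features in Python. The use of librosa for feature extraction and hmmlearn for the HMM, the file name, and the number of hidden states are all assumptions made for illustration.

```python
# Minimal sketch: fitting a Gaussian HMM to MFCC features of one utterance.
# librosa/hmmlearn, "utterance.wav", and n_components=5 are assumptions for
# illustration; a real STT system would train one HMM per phoneme or word.
import librosa
from hmmlearn import hmm

# Load the waveform and extract 13 MFCCs per frame (frames are the HMM observations).
signal, rate = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=signal, sr=rate, n_mfcc=13).T  # shape: (frames, 13)

# Fit a 5-state HMM with diagonal Gaussian emissions to the frame sequence.
model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
model.fit(mfcc)

# Log-likelihood of the utterance under this model; in recognition, the model
# with the highest score among the candidates would be chosen.
print("Log-likelihood:", model.score(mfcc))
```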
Later, we read about translation from one language to another. Fathimath Shouna Shayyam C A and Pragisha K [3] demonstrated machine translation models built with the Python programming language together with an open-source library, the Natural Language Toolkit (NLTK), to translate text from one natural language to another. This approach led to good results and performed marginally better than other methods; a brief sketch follows.
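The paper's code is not reproduced here; as a hedged illustration of statistical machine translation with NLTK, the sketch below trains IBM Model 1 on a tiny toy English-French parallel corpus. The corpus sentences and the number of EM iterations are invented for the example.

```python
# Minimal sketch: word-level translation probabilities with NLTK's IBM Model 1.
# The toy English-French bitext and 10 EM iterations are assumptions for
# illustration; a real system would train on a large parallel corpus.
from nltk.translate import AlignedSent, IBMModel1

# A tiny parallel corpus: each AlignedSent pairs a target sentence with its source.
bitext = [
    AlignedSent(["the", "house"], ["la", "maison"]),
    AlignedSent(["the", "book"], ["le", "livre"]),
    AlignedSent(["a", "book"], ["un", "livre"]),
]

# Run 10 iterations of expectation-maximization to estimate translation probabilities.
ibm1 = IBMModel1(bitext, 10)

# Probability that the source word "livre" translates to the target word "book".
print(ibm1.translation_table["book"]["livre"])
```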
After translation, we learnt about lip synchronization and how it is done. K R Prajwal, Rudrabha Mukhopadhyay, Vinay P. Namboodiri and C V Jawahar [4] investigated the problem of lip-syncing a talking-face video of an arbitrary identity to match a target speech segment. They identified the key reasons why existing approaches produce inaccurate lip-sync in this setting and resolved them by learning from a powerful lip-sync discriminator. They also proposed new, rigorous evaluation benchmarks and metrics to accurately measure lip synchronization in unconstrained videos. Extensive quantitative evaluations on these challenging benchmarks showed that the lip-sync accuracy of the videos generated by their Wav2Lip model was almost as good as that of real synced videos. The released model can be run as sketched below.
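The Wav2Lip authors provide an inference script in their public repository; a minimal sketch of invoking it from Python is shown below. The file names are placeholders, and the flag names are quoted from memory and should be verified against the repository's README rather than treated as definitive.

```python
# Minimal sketch: calling the Wav2Lip inference script on a face video and an
# audio track. File names are placeholders; the flag names follow the public
# Wav2Lip repository but should be checked against its README.
import subprocess

subprocess.run(
    [
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",  # pretrained weights
        "--face", "input_face.mp4",                          # video of the speaker
        "--audio", "translated_speech.wav",                  # target speech segment
    ],
    check=True,
)
```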
Finally, we studied audio-video synchronization and came across a research paper by Joon Son Chung and Andrew Zisserman [5], in which they determine the audio-video synchronization between mouth motion and speech in a video. They proposed a two-stream ConvNet architecture that enables a joint embedding between the sound and the mouth images to be learnt from unlabelled data. The trained network is used to determine the lip-sync error in a video, and they applied it to two further tasks: active speaker detection and lip reading. A sketch of such a two-stream architecture follows.
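The paper's exact SyncNet architecture is not reproduced here; the following is only a small PyTorch sketch of the two-stream idea, in which one branch embeds a short window of mouth-crop frames, the other embeds the corresponding audio features, and the distance between the two embeddings indicates how well they are synchronized. All layer sizes and input shapes are assumptions made for illustration.

```python
# Minimal sketch of a two-stream audio-visual embedding network (SyncNet-style).
# Layer sizes, input shapes (5 grayscale mouth frames, a 20x13 MFCC window) and
# the 128-d embedding size are assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamSyncNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        # Visual stream: 5 stacked mouth-crop frames treated as input channels.
        self.visual = nn.Sequential(
            nn.Conv2d(5, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Audio stream: an MFCC window treated as a 1-channel image.
        self.audio = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, frames, mfcc):
        # Return L2-normalised embeddings; a small distance suggests the
        # mouth motion and the audio are in sync.
        v = F.normalize(self.visual(frames), dim=1)
        a = F.normalize(self.audio(mfcc), dim=1)
        return v, a

# Toy usage: a batch of 4 clips, each with 5 grayscale 112x112 mouth frames
# and a 20x13 MFCC window.
net = TwoStreamSyncNet()
v_emb, a_emb = net(torch.randn(4, 5, 112, 112), torch.randn(4, 1, 20, 13))
print(F.pairwise_distance(v_emb, a_emb))  # per-clip audio-visual distance
```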
REFERENCES:
1) Bano, S., Jithendra, P., Niharika, G. L., & Sikhi, Y. (2020). Speech to Text
Translation enabling Multilingualism. 2020 IEEE International Conference for
Innovation in Technology (INOCON). doi:10.1109/inocon50539.2020.9298.
4) Prajwal, K. R., Mukhopadhyay, R., Namboodiri, V. P., & Jawahar, C. V. (2020). A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild. IIIT Hyderabad, India; University of Bath, England. arXiv:2008.10010v1 [cs.CV], 23 Aug 2020.
5) Chung, J. S., & Zisserman, A. Out of Time: Automated Lip Sync in the Wild. Visual Geometry Group, Department of Engineering Science, University of Oxford.