A Framework For Deepfake V2
Spring 2022
Table of Contents
Abstract
Literature Review
Working of Voice Cloning Technology
Project Assumptions
Project Constraints
Working Principles of Voice Cloning
Table of Possible Token Type Text
Difference between Text and Speech
Survey of Existing Apps and Websites
Scope Management / WBS
Improvements Management
Schedule / Time Management / Milestones
Gantt Chart
Risk and Issue Management
Budget Information
Project Requirements
Use Case Diagram
Sequence Diagram
User Interface
References
Abstract:
Artificial Intelligence, and especially Machine Learning and Deep Learning techniques, increasingly shape today's technological and social landscape. These advances have contributed heavily to the development of Speech Synthesis, also known as Text-To-Speech (TTS), in which speech is artificially produced from text by computer. This is where Voice Cloning comes into play: a technology that generates synthetic speech resembling a targeted human voice. Advances in AI and Deep Learning continue to improve the quality of synthetic speech, and TTS applications are now commonplace: anyone who has interacted with a phone-based Interactive Voice Response system, Apple's Siri, Amazon Alexa, a car navigation system, or any of numerous other voice interfaces has experienced synthetic speech. Historically there have been two approaches to TTS. The first, Concatenative TTS, uses audio recordings to build a library of words and units of sound (phonemes) that can be strung together to form sentences; it lacks the emotion and inflection found in natural human speech, and cloning any individual voice with this method requires an enormous investment. The second, Parametric TTS, uses statistical models of speech to simplify creating a voice, reducing cost and effort compared to concatenation; even so, creating any single voice has historically been expensive, and the results clearly not human. Voice cloning also has beneficial uses, such as education, audiobooks, assistive technology, and cultural films about Saudi Arabia.
Literature Review:
Several works related to the main topic are summarized below.
• Neural Voice Cloning with a Few Samples: introduces a neural voice cloning system that takes a few audio samples as input and studies two approaches, speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples. By Sercan Ö. Arık, Jitong Chen, Kainan Peng, Wei Ping, and Yanqi Zhou.
• Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward: provides a comprehensive review and detailed analysis of existing tools and machine learning (ML) based approaches for deepfake generation, and of the methodologies used to detect such manipulations, for both audio and visual deepfakes. For each category of deepfake, it discusses manipulation approaches, current public datasets, and key standards for the performance evaluation of deepfake detection techniques, along with their results. By Momina Masood, Mariam Nawaz, Khalid Mahmood Malik, Ali Javed, and Aun Irtaza.
• Data Efficient Voice Cloning for Neural Singing Synthesis: adapts a voice cloning technique to the case of singing synthesis. By leveraging data from many speakers to first create a multi-speaker model, small amounts of target data can then efficiently adapt the model to new unseen voices. By Merlijn Blaauw, Jordi Bonada, and Ryunosuke Daido.
• Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning: presents a multi-speaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that can produce high-quality speech in multiple languages. Moreover, the model can transfer voices across languages, e.g. synthesize fluent Spanish speech using an English speaker's voice, without training on any bilingual or parallel examples; such transfer works even across distantly related languages such as English and Mandarin. By Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, RJ Skerry-Ryan, Ye Jia, Andrew Rosenberg, and Bhuvana Ramabhadran.
• Combining Statistical Parametric Speech Synthesis and Unit-Selection for Automatic Voice Cloning: presents two state-of-the-art systems, an HMM-based system (HTS-2007, developed by CSTR and the Nagoya Institute of Technology) and a commercial unit-selection system (CereVoice, developed by CereProc). Both systems were used to mimic the voice of George W. Bush (43rd President of the United States) using freely available audio from the web.
Working of Voice Cloning Technology:
Voice cloning allows one voice to be made to sound like someone else's. In one notable incident, an unknown hacking organization used AI voice cloning to make fraudulent calls impersonating an executive of a British energy company and succeeded in defrauding it of 220,000 euros.
From most people's perspective, voice cloning may seem like a harmful technology, but from another perspective it can be an effective aid for many patients in hospitals and students in education. For this, the generated voice must provide:
• sound clarity
• correct pronunciation and language recognition
• sound language resources
Project Assumptions:
1. A Python environment must be installed on the system.
2. The pyttsx3 library must be installed on the system.
3. A Google Colab platform account is available.
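As a minimal sketch of how these assumptions come together, the following uses the pyttsx3 library (assumption 2) to speak text offline. The `chunk_sentences` helper and the chosen speech rate are illustrative choices, not part of the project specification.

```python
# Minimal text-to-speech sketch using pyttsx3 (offline TTS engine).
# The chunk_sentences helper and the rate value are illustrative.
import re

def chunk_sentences(text):
    """Split text into sentences so long inputs can be fed to the engine piecewise."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def speak(text, rate=150):
    """Speak text aloud; requires the pyttsx3 package to be installed."""
    import pyttsx3
    engine = pyttsx3.init()
    engine.setProperty('rate', rate)   # speaking rate in words per minute
    for sentence in chunk_sentences(text):
        engine.say(sentence)
    engine.runAndWait()                # block until all queued speech is played

if __name__ == "__main__":
    speak("Hello. This is a voice cloning demo.")
```

Splitting into sentences before queuing keeps the engine responsive on long inputs; a real implementation would also expose voice selection via `engine.getProperty('voices')`.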
Project Constraints:
1. Scope: the project aims to help certain companies, so not all users will be interested in it.
2. A capable CPU is needed to convert text to speech quickly.
3. Multiple voices must be enabled for the user.
4. The speaker output must be clear and loud.
Working Principles of Voice Cloning:
Text-to-speech synthesis converts text into synthetic speech that is as close to real speech as possible according to the pronunciation norms of a particular language; such systems are called text-to-speech (TTS) systems. The input of a TTS system is text, and its output is synthetic speech. There are two possible cases. When only a limited number of phrases must be pronounced (and their pronunciation does not vary), the necessary speech material can simply be recorded in advance. This approach creates certain problems: text that is not known in advance cannot be voiced, and the pronounced text must be kept in computer memory, which increases the memory required and can cause problems in operation when the amount of information is large. The main approach used in this project is therefore the voicing of previously unknown text based on a specific algorithm.
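The two cases above can be contrasted in a small sketch: a pre-recorded phrase table can only play back what it already stores, while an algorithmic synthesizer can voice unseen text. The phrase table, file names, and the toy character-to-unit mapping below are hypothetical stand-ins, not the project's actual method.

```python
# Contrast of the two cases described above (illustrative data only).

# Case 1: pre-recorded phrases - fails for text not known in advance,
# and every stored phrase consumes memory.
recorded_phrases = {
    "welcome": "welcome.wav",
    "goodbye": "goodbye.wav",
}

def play_prerecorded(text):
    """Return the stored recording, or None if the phrase was never recorded."""
    return recorded_phrases.get(text.lower())

# Case 2: algorithmic synthesis - any input text maps to a unit sequence.
def synthesize(text):
    """Toy 'algorithm': map each letter to a unit name (stands in for real TTS)."""
    return ["unit_" + ch for ch in text.lower() if ch.isalpha()]

# A phrase that was never recorded cannot be played back (case 1) ...
assert play_prerecorded("unknown phrase") is None
# ... but can still be voiced by the algorithmic approach (case 2).
assert synthesize("hi") == ["unit_h", "unit_i"]
```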
Every language has its own unique features. In English, for example, there are certain contradictions between letters and sounds: two letters coming together can sound different from the same letters used separately. The letters (t) and (h) on their own do not sound the same as the digraph (th). The position of a letter also affects whether and how it is pronounced; by the phonetic rules of English, the first letter (k) of the word (know) is not pronounced. Russian likewise has its own pronunciation features: first of all, the letter (o) is not always pronounced as the sound [o].
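The irregularities described above (the digraph "th", the silent "k" in "know") are exactly what letter-to-phoneme rules have to encode. The following is a toy sketch, not the project's actual algorithm; its rule set covers only the two examples from the text.

```python
# Toy letter-to-phoneme converter illustrating the English irregularities above.
# The rule set is deliberately tiny: the 'th' digraph and the silent 'k' in 'kn'.
RULES = [
    ("kn", ["N"]),   # the 'k' in 'kn' is silent, as in "know"
    ("th", ["TH"]),  # 'th' is a single phoneme, not T followed by H
]

def letters_to_phonemes(word):
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        for pattern, output in RULES:
            if word.startswith(pattern, i):
                phonemes.extend(output)
                i += len(pattern)
                break
        else:
            # default rule: the letter stands for its own sound
            phonemes.append(word[i].upper())
            i += 1
    return phonemes

print(letters_to_phonemes("know"))  # ['N', 'O', 'W']
print(letters_to_phonemes("this"))  # ['TH', 'I', 'S']
```

Real letter-to-sound systems use hundreds of context-sensitive rules plus an exception dictionary; the longest-match-first scan here is only the skeleton of that idea.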
Two parameters, naturalness of sound and intelligibility of speech, are used to assess the quality of a synthesis system. The naturalness of a speech synthesizer depends on how close its generated sounds are to natural human speech; the intelligibility of a speech synthesizer means how easy the artificial speech is to understand. The ideal speech synthesizer possesses both characteristics, and existing and newly developed speech synthesis systems aim to improve both.
Table of Possible Token Type Text:

Type                 | Text          | Speech
---------------------|---------------|-------------------------------------------
Decimal numbers      | 1.2           | one and two tenths
Ordinal numbers      | 1-st          | first
Roman numerals       | VI, X         | sixth, tenth
Alphanumeric strings | 110           | one a power of a ten
Phone numbers        | +966501068872 | plus, nine, double six, five, zero, one, zero, six, double eight, seven, two
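A few of the token types above can be sketched as simple normalization rules. This is an illustrative fragment, not the project's full text normalizer; it covers only ordinals and phone numbers, and the "double" collapsing of repeated digits follows the phone-number example given here.

```python
# Toy text normalizer for two token types: ordinal numbers ("1-st" -> "first")
# and phone numbers (read digit by digit, with "double" for repeated digits).
ORDINALS = {"1": "first", "2": "second", "3": "third"}
DIGITS = {"+": "plus", "0": "zero", "1": "one", "2": "two", "3": "three",
          "4": "four", "5": "five", "6": "six", "7": "seven",
          "8": "eight", "9": "nine"}

def say_ordinal(token):
    """Normalize a token like '1-st' to its spoken form."""
    num = token.split("-")[0]
    return ORDINALS.get(num, num + "th")

def say_phone(number):
    """Read a phone number digit by digit, collapsing pairs into 'double'."""
    words = []
    i = 0
    while i < len(number):
        ch = number[i]
        if i + 1 < len(number) and number[i + 1] == ch and ch.isdigit():
            words.append("double " + DIGITS[ch])
            i += 2
        else:
            words.append(DIGITS[ch])
            i += 1
    return ", ".join(words)

print(say_ordinal("1-st"))  # first
print(say_phone("+966501068872"))
```

A production normalizer would also need token classification (deciding that "+966..." is a phone number rather than arithmetic), which is the harder part of the problem.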
Difference between Text and Speech:

Text      | Speech
----------|---------------
Paragraph | Phonoparagraph
Sentence  | Utterance
Word      | Phonoword
Syllable  | Diphone
Letter    | Phoneme
Survey of Existing Apps and Websites:
Scope Management / WBS :
Improvements Management:
Application improvement relies on users' feedback, and the next version will be released with enhanced features. We welcome all feedback, whether positive or negative, as it will improve the app.
Gantt Chart:
Budget Information:
The project uses open-source software (Python) and a Google Colab account, so there is no cost to build the project.
Project Requirements:
Use Case Diagram :
Sequence Diagram :
• Sequence Diagram for Login
• Sequence Diagram for Manage Account
• Sequence Diagram for Voice Clone
• Sequence Diagram for Create Voice
• Sequence Diagram for Manage Voice
User Interface :
References:
[1]. Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar, and Julian McAuley. "Expressive Neural Voice Cloning." Proceedings of Machine Learning Research 157, 2021. University of California, San Diego.
[2]. Matthew P. Aylett and Junichi Yamagishi. "Combining Statistical Parametric Speech Synthesis and Unit-Selection for Automatic Voice Cloning." Centre for Speech Technology Research, University of Edinburgh, and CereProc Ltd., U.K.
[3]. Boris M. Lobanov and Helena B. Karnevskaya. "TTS-Synthesizer as a Computer Means for Personal Voice 'Cloning'." Institute of Engineering Cybernetics, National Academy of Sciences of Belarus, and Minsk State Linguistic University.
[4]. Mark Y. Liberman and Kenneth W. Church. "Text Analysis and Word Pronunciation in Text-to-Speech Synthesis." AT&T Bell Laboratories, Murray Hill, N.J.
[5]. K. R. Aida-Zade, C. Ardil, and A. M. Sharifova. "The Main Principles of Text-to-Speech Synthesis System."
[6]. The massam.fandom website: list of words the Microsoft speech engines cannot say correctly.
[7]. Eleonora Cavalcante Albano and Agnaldo Antonio Moreira. "Archisegment-Based Letter-to-Phone Conversion for Concatenative Speech Synthesis in Portuguese." LAFAPE-IEL-UNICAMP, Campinas, SP, Brazil.
[8]. Simon King. "An Introduction to Statistical Parametric Speech Synthesis." Sādhanā, Vol. 36, Part 5, October 2011, pp. 837-852. Indian Academy of Sciences. Centre for Speech Technology Research, University of Edinburgh.
[9]. Merlijn Blaauw, Jordi Bonada, and Ryunosuke Daido. "Data Efficient Voice Cloning for Neural Singing Synthesis." Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain, and Sound Processing Group, Yamaha Corporation, Hamamatsu, Japan.