Design And Development Of Voice Transformation.
LILY LING AI LING
This report is submitted in partial fulfillment of the requirements for the award of
Bachelor of Electronic Engineering (Computer Engineering) With Honours
Faculty of Electronic Engineering and Computer Engineering
Universiti Teknikal Malaysia Melaka
April 2009
DESIGN AND DEVELOPMENT OF VOICE TRANSFORMATION
Academic Session: …..2008/2009……………………………………………………………
I, ………………LILY LING AI LING……………………………………………………….. (BLOCK LETTERS),
agree to allow this Bachelor's Degree Project Report to be kept at the Library under the following
conditions of use:
1. The report is the property of Universiti Teknikal Malaysia Melaka.
2. The Library is permitted to make copies for study purposes only.
3. The Library is permitted to make copies of this report as exchange material between institutions
of higher learning.
4. Please tick ( ):
CONFIDENTIAL*
(Contains information of security classification or of importance to Malaysia as set out in the
OFFICIAL SECRETS ACT 1972)
RESTRICTED*
(Contains restricted information as determined by the organisation/body where the research was
conducted)
UNRESTRICTED
Verified by:
__________________________
(AUTHOR'S SIGNATURE)
Permanent Address: ……………………………......
……………………………......
___________________________________
(SUPERVISOR'S STAMP AND SIGNATURE)
“I hereby declare that this report is the result of my own work except for quotes as cited
in the references”
Signature : …………………………………
Author : Lily Ling Ai Ling
Date : 27 April 2009
“I hereby declare that I have read this report and in my opinion this report is sufficient
in terms of the scope and quality for the award of Bachelor of Electronic Engineering
(Computer Engineering) With Honours.”
Signature : …………………………………
Supervisor’s Name : Mdm Juwita Bt Mohd Sultan
Date : 27 April 2009
Dedicated to my beloved family members, especially my father and mother, and to my
friends.
ACKNOWLEDGEMENT
First of all, I would like to thank my supervisor, Madam Juwita binti Mohd
Sultan, for her valuable guidance in completing the project and thesis. I am especially
grateful to my beloved father, mother and family members for all their support,
patience and understanding regarding my study load and research work.
I would like to acknowledge the contributions of my classmates at Universiti
Teknikal Malaysia Melaka. The successful completion of this project would not have
been possible without their great efforts and priceless support and help.
Lastly, thanks to my dearest friends Leong Eng Chui and Pang Pek Hong for
their help, guidance and ideas. To those with whom I did not have the pleasure of
interacting personally, their contributions are nevertheless extremely admirable and
valuable to me.
ABSTRACT
This project is a DSP implementation of innovative algorithms for voice
transformation in real time. Voice transformation is the process of transforming the
characteristics of speech uttered by a source speaker, such that a listener would believe
the speech was uttered by a target speaker. In this project, two aspects of the
transformation problem are addressed: voice quality and intonation. The main steps of
the project are developing a method for high quality voice transformation and
designing a suitable algorithm in Matlab/Simulink. Voice transformation technology
is used more and more widely in many fields. Yet the source voice patterns after
transformation may still exhibit a substantial degree of variance from the target speaker.
The objective of this project was to develop a digital voice transformation program in
Matlab that can transform the voice of a source speaker into that of a target speaker.
Matlab provided the necessary tools to record, filter and analyze different voice
samples and compare them to the archived sample. Research on related work was carried
out before the program was designed in Matlab, and troubleshooting was done whenever
an error occurred. The outcome of the project is a method for high quality voice
transformation together with a suitable algorithm designed in Matlab/Simulink.
ABSTRAK
This project aims to produce a DSP algorithm that can perform voice transformation.
Voice transformation is the process of changing the characteristics of a speaker's voice
so that a listener will believe the voice was produced by the target speaker. Voice
quality and intonation are the two main aspects of this project. The main steps in
completing the project are a method for producing high quality voice transformation and
developing an algorithm in Matlab/Simulink. This technology has been used in many
fields, but the accuracy of the voice transformation results is not satisfactory. The
objective of this project is to produce a voice transformation system in Matlab that can
record, filter, analyze and compare voices. All related research is carried out before the
system is developed. The outcome of this project is a complete project with a method for
achieving high quality voice transformation and a suitable algorithm designed in
Matlab/Simulink.
CONTENTS
TITLE
REPORT STATUS VERIFICATION FORM
DECLARATION
SUPERVISOR VERIFICATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
CONTENTS
LIST OF FIGURES
LISTS OF TABLE
LIST OF SHORT FORM
CHAPTER 1 INTRODUCTION
1.1 Introduction of Project
1.2 Objective of Project
1.3 Problem Statement
1.4 Scope
1.5 Methodology
1.6 Thesis Outline
CHAPTER 2 LITERATURE REVIEW
2.1 Introduction of Voice Transformation
2.2 Speech Model
2.3 Speaker Characteristics
2.4 Component of Voice Conversion System
2.4.1 Feature Extraction
2.4.2 Model Estimation
2.4.3 Voice Mapping
2.5 Existing Voice Transformation Systems
2.5.1 Voice Quality Conversion
2.5.1.1 Representation of Speech
2.5.1.2 Mapping Method
2.6 Transforming the Spectral Envelope
2.6.1 Computing Transformation Parameters
2.6.2 Unvoiced Section Transformation
2.7 Intonation Transformation
2.8 Sample Rate Conversion
2.9 Pitch and Frequency
2.9.1 Pitch Range
2.10 Pitch Synchronous Overlap Add (PSOLA)
2.11 Virtual Dubbing Process
2.11.1 Advantage of Virtual Dubbing
2.12 Application of Voice Transformation
2.12.1 Text to Speech Adaptation
2.12.2 Speaker Identification System
2.13 Matlab
2.13.1 History of Matlab
2.13.2 Rules on Variable and Function Names
2.13.3 Graphics
2.13.4 Character Set
2.13.5 Commenting in MATLAB Editor
2.14 Graphical User Interface
2.14.1 Elements of GUI
CHAPTER 3 METHODOLOGY
3.1 Introduction
3.2 Project methodology
3.2.1 Collect information
3.2.2 Understand basic of voice Transformation
3.2.3 Design source code
3.2.4 Testing the program
3.3 Monitoring program flow chart
3.4 Software
3.4.1 Matlab
3.5 Voice Analysis and Voice Mapping
CHAPTER 4 RESULTS AND ANALYSIS
4.1 Introduction
4.2 Results
4.3 Analysis of the Results
CHAPTER 5 CONCLUSION AND SUGGESTION
5.1 Introduction
5.2 Conclusion and Recommendation
REFERENCES
APPENDIX
Appendix A: Source for the main program (Voice Transformation System)
Appendix B: Source Code for Sub Program (Load The File)
LIST OF FIGURE
No. TITLE
2.1 Human vocal tract
2.2 TD-PSOLA Transformation of pitch, intonation and duration Parameters
2.3 Virtual Dubbing Block Diagrams
2.4 Example of comment with comment symbol
2.5 Example of using Matlab editor to select group of line
2.6 Example of comment out part of statement
2.7 Comment out text within a multiline statement
3.1 Flow chart of project
3.2 Flow chart of program
4.1 Blank GUI (default)
4.2 GUI Window
4.3 Property Inspector for Voice Transformation System
4.4 Drawing GUI in GUIDE Template
4.5 Output GUI for the Voice Transformation System
4.6 GUI when “Load Voice” was clicked
4.7 Output when either one of the option was clicked
4.8 Signal waveform of the user before and after the transformation (Cartoon Voice for 5 seconds)
4.9 Signal waveform of the user before and after the transformation (Cartoon Voice for 15 seconds)
4.10 Signal waveform of the user before and after the transformation (Man to Woman Voice for 5 seconds)
4.11 Signal waveform of the user before and after the transformation (Man to Woman Voice for 15 seconds)
4.12 Signal waveform of the user before and after the transformation (Woman to Man Voice for 5 seconds)
4.13 Signal waveform of the user before and after the transformation (Woman to Man Voice for 15 seconds)
4.14 The wav file that had prerecorded and save in the file.
4.15 Signal waveform of the user before and after the transformation (Load the file to transform the voice to cartoon voice)
4.16 Signal waveform of the user before and after the transformation (Load the file to transform the voice from woman to man voice)
4.17 Signal waveform of the user before and after the transformation (Load the file to transform the voice from man to woman voice)
LIST OF TABLE
No. TITLE
2.1 Lists of Operator
LIST OF SHORT FORM
DSP - Digital Signal Processing
DFT - Discrete Fourier Transform
DTW - Dynamic Time Warping
EM - Expectation Maximization
FFT - Fast Fourier Transform
FIR - Finite Impulse Response
GMM - Gaussian Mixture Model
HMM - Hidden Markov Model
HNM - Harmonic plus Noise Model
IIR - Infinite Impulse Response
LPC - Linear Predictive Coding
MFCC - Mel Frequency Cepstral Coefficients
RELP - Residual Excited Linear Prediction
PSOLA - Pitch Synchronous Overlap Add
CHAPTER 1
INTRODUCTION
1.1 Introduction of Project
Speech is the most widely used means of human communication. We are born with
the ability to speak, learn it easily during early childhood, and communicate with each
other mostly through speech throughout our lives. With the development of
communication technologies in recent years, speech has become an important interface
for many systems. Compared with other, more complex interfaces, speech is an easier
way to communicate with computers.
This project is a DSP implementation of innovative algorithms for voice
transformation in real time. The entire set of operations represents a particular
implementation of the so-called Virtual Dubbing procedure. Voice transformation is
the process of transforming the characteristics of speech uttered by a source speaker,
such that a listener would believe the speech was uttered by a target speaker. In this
project, two aspects of the transformation problem are addressed: voice quality and
intonation. The main steps of the project are developing a method for high quality
voice transformation and designing a suitable algorithm in Matlab/Simulink.
1.2 Objectives of Project
The objectives of this project are:
a. To design and develop the algorithm for a high quality voice
transformation system.
b. To analyze the resulting signal after transformation.
1.3 Problem Statement
Nowadays, voice transformation technology is used more and more widely in
many fields, for example in the virtual dubbing process and in text to speech programs.
Besides the noise introduced by microphone devices, other factors can affect the
quality of voice samples. Examples include mispronounced verbal phrases, different
media used for enrollment and verification (using a land line telephone for the
enrollment process, but then a cell phone for the verification process), and the
emotional and physical condition of the individual.
1.4 Scope
The system implemented in this project is a user independent system which
can transform any voice to the desired voice. The devices intended to capture an
individual's voice samples are computer microphones. Two important aspects of the
transformation problem are addressed: voice quality and intonation.
Users can choose between two transformation options and record for 5 or 15
seconds. This project mainly discusses the algorithm of the system and therefore
does not include hardware design.
1.5 Methodology
First, after the title of the project was confirmed, research on the topic was
done by gathering important information from journals, reference books and internet
resources. The features of Matlab and the basic concepts of voice transformation were
studied.
After that, the graphical user interface (GUI) and source code were designed in
Matlab. The program was checked, and troubleshooting was done if any errors
occurred within the program.
The project was considered complete and successful once the program ran without
errors.
1.6 Thesis Outline
This thesis reports the ideas generated, the concepts applied, the activities
carried out and the final year project produced. It consists of five chapters:
Chapter 1: Introduction, Chapter 2: Literature Review, Chapter 3: Methodology,
Chapter 4: Results and Analysis, and Chapter 5: Conclusion and Suggestion.
Chapter 1 introduces the project. It contains the objectives, problem statement,
scope of work, methodology and thesis outline.
Chapter 2 presents the literature review. The features of Matlab are studied, and
the applications of voice transformation systems are also covered in this chapter.
Chapter 3 briefly describes the method used in this project to solve the
problem. It also covers the factors and reasons considered when choosing the method,
and discusses the advantages of the method.
Chapter 4 deals with the analysis of the results at the final stage, once the
voice transformation system had been fully designed and implemented in Matlab. The
monitoring source code is written in the Matlab language.
Chapter 5 presents the conclusion and results of the project at the final stage.
Recommendations and future development are discussed in order to improve the
voice transformation system.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
Voice conversion aims at transforming the characteristics of the speech signal
uttered by a speaker (Source Speaker), in such a way that a human listener could
believe that the transformed speech is produced by another specific speaker
(Target Speaker).
Voice transformation is the process of taking the speech of a source speaker and
transforming the characteristics of the signal, such that a human listener would believe
the speech was uttered by a target speaker.
2.2 Speech Model
The human voice consists of sound made by a human being using the vocal
folds for talking, singing, laughing, crying, screaming, etc. Human voice is specifically
that part of human sound production in which the vocal folds (vocal cords) are the
primary sound source. Generally speaking, the voice can be subdivided into three parts:
the lungs, the vocal folds, and the articulators. The lungs must produce adequate airflow
to vibrate the vocal folds (air is the fuel of the voice). The vocal folds (vocal cords) are
the vibrators, neuromuscular units that ‘fine tune’ pitch and tone [1]. The articulators
(the vocal tract, consisting of tongue, palate, cheek, lips, etc.) articulate and filter the
sound.
Figure 2.1: Human vocal tract [2]
Human speech is produced by the vocal tract, which starts at the glottis (vocal
folds) and ends at the lips. The lungs contract to force air through the trachea and
pharynx and out through the nasal and oral cavities. In English there are four different
types of sounds that can be created, including aspiration noise, plosion and voicing.
Voicing is a quasi periodic vibration of the vocal folds. The frequency of the vibration
is called the fundamental frequency, or F0, and is perceived as pitch.
A voice frequency, or voice band, is one of the frequencies within the part of
the audio range that is used for the transmission of speech (in telephony, approximately
300 Hz to 3400 Hz). The voiced speech of a typical adult male has a fundamental
frequency of 85 to 155 Hz, and that of a typical adult female 165 to 255 Hz [3]. Thus,
the fundamental frequency of most speech falls below the bottom of the "voice
frequency" band as defined above [4].
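As an illustration of how the fundamental frequency relates to the ranges quoted above, the following minimal sketch estimates F0 by autocorrelation and reports which quoted range it falls in. It is not the method used in this project. It is written in Python rather than Matlab, it uses a pure sine tone as a crude stand-in for voiced speech, and the function names, the 8000 Hz sample rate and the 60 to 400 Hz search limits are all assumptions made for the example.

```python
import math

# F0 ranges quoted in the text, in Hz.
MALE_F0_RANGE = (85.0, 155.0)
FEMALE_F0_RANGE = (165.0, 255.0)


def synth_tone(f0_hz, sample_rate=8000, n_samples=800):
    """Return a pure sine tone (assumed stand-in for a voiced speech frame)."""
    return [math.sin(2.0 * math.pi * f0_hz * i / sample_rate)
            for i in range(n_samples)]


def estimate_f0(samples, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 in Hz as sample_rate / lag, for the lag with the
    largest (unnormalised) autocorrelation inside the search range."""
    lag_min = int(sample_rate / f_max)  # shortest pitch period searched
    lag_max = min(int(sample_rate / f_min), len(samples) - 1)  # longest period
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def describe_f0(f0_hz):
    """Say which of the two quoted ranges, if any, contains f0_hz."""
    if MALE_F0_RANGE[0] <= f0_hz <= MALE_F0_RANGE[1]:
        return "typical adult male range"
    if FEMALE_F0_RANGE[0] <= f0_hz <= FEMALE_F0_RANGE[1]:
        return "typical adult female range"
    return "outside both quoted ranges"


if __name__ == "__main__":
    for true_f0 in (125.0, 200.0):
        f0 = estimate_f0(synth_tone(true_f0), 8000)
        print(true_f0, round(f0, 1), describe_f0(f0))
```

Real speech is only quasi periodic and contains unvoiced segments, so a practical estimator would work on short frames and add a voicing decision. The sketch only shows the relationship between the pitch period and F0.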
2.3 Speaker Characteristics
Speech may differ between speakers in a very large number of respects. These can be
divided into three main types of speaker identity:
a. Segmental: In linguistics, the term segment may be defined as "any discrete unit
that can be identified, either physically or auditorily, in the stream of speech" [5].
Segments are called “discrete” because they are separate and individual, such as
consonants and vowels, and occur in a distinct temporal order.
b. Suprasegmental: These characteristics describe the prosodic features of the
voice related to the style of speaking. They include how the fundamental
frequency (F0) varies during utterances, how duration varies, and how stress
varies over the course of a sentence. Other units, such as tone, stress,
and sometimes secondary articulations such as nasalization, may coexist with
multiple segments and cannot be discretely ordered with them [6]. These
elements are termed suprasegmental. It is not clear how the concept of segment
applies to sign languages.