Text-to-Video: Text to Facial Animation Video Convertion – Hamdani Winoto, Hadi Suwastio, Iwan Iwut T.
ISSN 1858-1633 2005 ICTS 63
2. BASIC THEORY, DESIGN, AND IMPLEMENTATION
2.1 Basic Theory
2.1.1. MPEG-4 Face and Body Animation
Video compression standards before MPEG-4 dealt only with inter-frame and intra-frame compression. By adding
virtual, synthetic models, MPEG-4 achieves better compression than previous video standards.
The new MPEG-4 technology can be applied in virtually every multimedia field. At present, MPEG-4 face and
body animation is still at a trial stage. FDP (Face Definition Parameters) is a set of values that identifies
the shape of a particular face. These FDP values are translated, scaled, and rotated by FAP (Face Animation
Parameters). A FAP is a displacement vector value that expresses a normalized change of the FDP [9].
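To make "normalized displacement" concrete: in the MPEG-4 specification, FAP values are expressed in Face Animation Parameter Units (FAPU), and the distance-based FAPUs are neutral-face distances (such as the mouth-nose separation, MNS) divided by 1024. The sketch below is a minimal illustration under that assumption; the feature point coordinates, the FAP value, and the helper names `fapu` and `apply_fap` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


def fapu(neutral_distance: float) -> float:
    """Distance-based FAPU: a neutral-face distance divided by 1024."""
    return neutral_distance / 1024.0


def apply_fap(neutral: Point, fap_value: int, unit: float,
              direction: tuple) -> Point:
    """Move a neutral FDP feature point by fap_value * unit along direction."""
    d = fap_value * unit
    return Point(neutral.x + d * direction[0], neutral.y + d * direction[1])


# Hypothetical neutral-face points in pixel coordinates (y grows downward).
nose_bottom = Point(100.0, 120.0)
mouth_middle = Point(100.0, 152.0)
mns = fapu(mouth_middle.y - nose_bottom.y)  # mouth-nose separation unit

lower_lip = Point(100.0, 155.0)
# Hypothetical FAP value of 320 units that pushes the lower lip straight down.
opened = apply_fap(lower_lip, 320, mns, (0.0, 1.0))
print(opened)  # Point(x=100.0, y=165.0)
```

Because the displacement is measured in units of the face's own proportions, the same FAP stream can drive faces of different sizes.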
2.1.2. Facial Animation
The morphing technique was originally introduced to meet the needs of special-effects software, for
example to create expressions that are impossible for human beings, such as opening the mouth so wide
that it reaches the forehead.
Facial animation techniques create the effect of motion by generating smooth transformations between
pictures, so that a video results. In principle, only two pictures are involved, called the source picture
and the target picture. The morphing technique generates a sequence of intermediate pictures that
transform the source picture smoothly into the target picture, so that the sequence looks like a video.
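The geometric half of generating in-between pictures can be sketched by linearly interpolating corresponding points of the source and target faces over a fixed number of frames. This is a minimal sketch assuming linear interpolation and hypothetical mouth-corner coordinates; the function names are illustrative only, and color blending between frames is a separate step.

```python
Point = tuple  # (x, y) pair of floats


def lerp_point(p: Point, q: Point, t: float) -> Point:
    """Linearly interpolate between points p (t = 0) and q (t = 1)."""
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def intermediate_shapes(source, target, n_frames):
    """Return n_frames point sets; the first equals source, the last equals target."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    shapes = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # evenly spaced interpolation parameter
        shapes.append([lerp_point(p, q, t) for p, q in zip(source, target)])
    return shapes


# Hypothetical mouth corners in the source and target pictures.
source = [(40.0, 60.0), (60.0, 60.0)]
target = [(36.0, 64.0), (64.0, 64.0)]
frames = intermediate_shapes(source, target, 5)
print(frames[2])  # [(38.0, 62.0), (62.0, 62.0)]
```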
2.1.3. Morphing and Deformation
Deformation is a technique used to change a 2D or 3D object into another object. It needs only one
object: the object to be deformed.
Morphing is an effect used to change one object into another object. The difference between morphing
and deformation is that morphing needs two objects (a source and a target), while deformation needs only
one [8].
2.1.4. Cross Dissolve
Cross dissolve is the simplest morphing method. It only requires displaying the two pictures on top of
each other in transparent mode. The opacity of the source picture decreases linearly while the opacity of
the target picture increases correspondingly. With this method, the source picture slowly disappears while
the target picture slowly appears.
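A linear cross dissolve is a per-pixel weighted average, I(t) = (1 - t) * S + t * T, where t runs from 0 (pure source) to 1 (pure target). The sketch below assumes equally sized grayscale images stored as nested lists; the function names and pixel values are hypothetical.

```python
def cross_dissolve(source, target, t):
    """Blend two equally sized grayscale images: (1 - t) * source + t * target."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return [[(1.0 - t) * s + t * g for s, g in zip(srow, trow)]
            for srow, trow in zip(source, target)]


def dissolve_sequence(source, target, n_frames):
    """Produce n_frames blended images with t evenly spaced from 0 to 1."""
    return [cross_dissolve(source, target, k / (n_frames - 1))
            for k in range(n_frames)]


src = [[0.0, 100.0], [200.0, 40.0]]
tgt = [[100.0, 100.0], [0.0, 240.0]]
print(cross_dissolve(src, tgt, 0.25))  # [[25.0, 100.0], [150.0, 90.0]]
```

Used on its own, this blends colors without moving any features, which is why the result shows "ghosting" when the two faces are not aligned.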
2.1.5. Feature Morphing
Morphing between two pictures takes two steps: first, a deformation between the source picture and the
target picture, and then a cross dissolve between them.
To establish the topological relationship between the two faces, feature lines are defined, and every
feature line in the source picture has a matching feature line in the target picture. A feature line can
change in three ways: translation, scaling, and rotation. After the deformation, a cross dissolve is applied
to combine the texture or color of the source picture and the target picture.
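The paper does not give the mapping formula for a feature-line pair. A widely used choice is the single-line-pair mapping of Beier and Neely's field morphing, sketched below as an assumed stand-in: a point is described by its position u along the line and its signed distance v from it, and the same (u, v) is then reapplied to the paired line. This handles translation, scaling, and rotation of the line. It maps a destination-picture point back into the source picture (inverse mapping); with several line pairs, the per-line results would be combined by a weighted average, which this sketch omits.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def perp(a):
    """Vector perpendicular to a, with the same length."""
    return (-a[1], a[0])


def map_point(x, p, q, p_src, q_src):
    """Map point x relative to line p->q onto the paired line p_src->q_src."""
    pq = sub(q, p)
    u = dot(sub(x, p), pq) / dot(pq, pq)                       # fraction along the line
    v = dot(sub(x, p), perp(pq)) / math.hypot(pq[0], pq[1])    # signed distance from it
    pq_src = sub(q_src, p_src)
    n_src = perp(pq_src)
    length_src = math.hypot(pq_src[0], pq_src[1])
    return (p_src[0] + u * pq_src[0] + v * n_src[0] / length_src,
            p_src[1] + u * pq_src[1] + v * n_src[1] / length_src)


x, p, q = (5.0, 2.0), (0.0, 0.0), (10.0, 0.0)
print(map_point(x, p, q, (0.0, 0.0), (20.0, 0.0)))   # scaling:  (10.0, 2.0)
print(map_point(x, p, q, (0.0, 0.0), (0.0, 10.0)))   # rotation: (-2.0, 5.0)
print(map_point(x, p, q, (3.0, 4.0), (13.0, 4.0)))   # translation: (8.0, 6.0)
```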
2.1.6. Mesh Morphing
The mesh morphing technique uses closed curves to select features. Using closed curves, usually
triangles, makes feature selection more accurate.
In mesh morphing, each triangle in the source picture is interpolated into the corresponding triangle in
the target picture, treating each triangle as three feature lines to be interpolated. The nodes inside the
triangle are then deformed. Each node is deformed only by the three lines of the triangle in which it lies,
rather than by all of the feature lines in the picture.
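A common way to realize "each node is deformed only by its own triangle" is a piecewise-affine warp: a node keeps its barycentric coordinates with respect to the triangle's three vertices, and those coordinates are reapplied to the corresponding triangle. The sketch below uses this standard technique as an assumed implementation; the paper does not state that it uses barycentric coordinates, and the triangles and function names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (l1, l2, l3) of point p in triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return l1, l2, 1.0 - l1 - l2


def warp_point(p, src_tri, dst_tri):
    """Move p from src_tri to dst_tri, keeping its barycentric coordinates."""
    l1, l2, l3 = barycentric(p, *src_tri)
    (ax, ay), (bx, by), (cx, cy) = dst_tri
    return (l1 * ax + l2 * bx + l3 * cx, l1 * ay + l2 * by + l3 * cy)


def interpolate_triangle(src_tri, dst_tri, t):
    """Intermediate triangle for frame parameter t in [0, 1]."""
    return [((1 - t) * s[0] + t * d[0], (1 - t) * s[1] + t * d[1])
            for s, d in zip(src_tri, dst_tri)]


src_tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
dst_tri = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]
mid_tri = interpolate_triangle(src_tri, dst_tri, 0.5)
print(warp_point((1.0, 1.0), src_tri, dst_tri))  # (2.0, 2.0)
print(warp_point((1.0, 1.0), src_tri, mid_tri))  # (1.5, 1.5)
```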
2.1.7. Text-to-Speech (TTS) Basic Block
Text-to-Speech is a system that converts text into speech. Synthesis methods in Text-to-Speech can be
classified into three categories, each with its own strengths and weaknesses: articulatory synthesis,
formant synthesis, and concatenative synthesis [7].
Articulatory synthesis simulates the human vocal system, such as the movement of the tongue, the airflow
in the throat, and the vocal cords. This method is difficult to implement because of the long research time
it requires [5].
Formant synthesis is based on the source-filter model and follows an acoustic-phonetic description
approach. The voice is not produced from physical equations of the vocal apparatus but by simulating the
main acoustic characteristics of the speech signal. The basic acoustic model is represented as a filter
made up of a set of formants, which reflect the articulation of the voice [5].
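The source-filter idea can be sketched with a periodic impulse train as the source and a cascade of second-order resonators as the formant filter. The coefficient formulas follow the widely used Klatt-style digital resonator (y[n] = A x[n] + B y[n-1] + C y[n-2], with unity gain at 0 Hz); the formant frequencies, bandwidths, pitch, and sample rate below are illustrative assumptions, not values from the paper, and a real formant synthesizer adds voicing control, noise sources, and time-varying parameters.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients (A, B, C) of a second-order formant resonator."""
    T = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * T)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * T) * math.cos(2.0 * math.pi * freq_hz * T)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter a signal through one resonator: y[n] = A x[n] + B y[n-1] + C y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, sample_rate, n_samples):
    """Glottal source approximated by one impulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def synthesize(formants, f0_hz=120.0, sample_rate=10000, n_samples=2000):
    """Pass the source through a cascade of formant resonators."""
    signal = impulse_train(f0_hz, sample_rate, n_samples)
    for freq, bw in formants:
        signal = resonate(signal, freq, bw, sample_rate)
    return signal


# Illustrative (assumed) formant frequencies and bandwidths in Hz.
samples = synthesize([(700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)])
```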
The most commonly used method is concatenative synthesis, which joins speech units such as phones,
diphones, and triphones.
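Diphone concatenation can be sketched in two steps: convert the phone sequence into diphone names (each spanning the transition between two neighboring phones), then join the stored waveform of each diphone, cross-fading a few samples at every joint. The unit inventory, the "_" silence symbol, the waveforms, and the linear cross-fade are all assumptions made for this sketch; practical systems also adjust pitch and duration at the joins.

```python
def to_diphones(phones):
    """Turn a phone sequence (with '_' for silence) into diphone unit names."""
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


def concatenate(units, overlap):
    """Join waveform units, linearly cross-fading `overlap` samples at each joint."""
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit grows across the fade
            j = len(out) - n + i
            out[j] = (1.0 - w) * out[j] + w * unit[i]
        out.extend(unit[n:])
    return out


# Hypothetical diphone inventory with tiny placeholder waveforms.
inventory = {
    "_-h": [0.0, 0.2, 0.4],
    "h-a": [0.4, 0.8, 0.6],
    "a-_": [0.6, 0.2, 0.0],
}
names = to_diphones(["_", "h", "a", "_"])   # ['_-h', 'h-a', 'a-_']
speech = concatenate([inventory[d] for d in names], overlap=1)
```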
2.1.8. Speech Processing Basic Theory