2013 Eleventh International Conference on ICT and Knowledge Engineering
Augmented Reality 3D Eyeglasses Frame Simulator Using Active Shape Model and Real Time Face Tracking
Endang Setyati, Yosi Kristian, Yuliana Melita Pranoto
Department of Information Technology, Sekolah Tinggi Teknik Surabaya (STTS), Surabaya, Indonesia
[email protected], [email protected], [email protected]

David Alexandre
Department of Industrial Management, National Taiwan University of Science and Technology, Taipei City, Taiwan
[email protected]
Abstract—The combination of real-world objects and computer-generated objects, known as Augmented Reality (AR), is considered a medium of the future. In AR, the computer-generated object is the result of three-dimensional graphics rendering, and AR software is designed to provide real-time interactivity with the user in the form of video. In this paper we apply AR technology to build an eyeglasses frame simulator, so that a user can try many eyeglasses frame models and see how they look without physically wearing the glasses. After the software is activated, any user within the webcam's range is processed automatically, and a 3D eyeglasses frame model is positioned on the user's face. To develop this software we must first implement a fast and effective face tracking algorithm, so that the user perceives the AR frame as a real object. The speed of the face detection and face tracking used in this software has a significant effect on the user experience; on the other hand, the higher the video resolution, the slower the face tracking algorithm becomes. In this paper we use a medium video resolution (640 x 480) so that the face tracking algorithm runs at an acceptable frame rate. This simulator is an example of effective use of AR technology.

Keywords—Technology Innovation, Augmented Reality, Face Tracking, Eyeglasses Frame Simulator.
I. INTRODUCTION
An AR system supplements the real world with virtual (computer-generated) objects that appear to coexist in the same space as the real world [1]. In a typical AR implementation, the media used are a video camera and a marker. By determining the position detected from the marker, the AR system can place the computer-generated object so that it looks like a real object in the real scene.

Face tracking (real-time face detection) is one of the fast-growing subjects in computer vision. Numerous algorithms and methods have been applied to build fast and robust face tracking systems. A good face tracking system is able to track human faces in a video stream at acceptable frame rates and to detect only human faces, not face-like objects.

One significant breakthrough is in two-dimensional face processing. This line of research makes it possible to lift two-dimensional (2D) face data into a three-dimensional (3D) space, which in turn has widened the range of face detection applications.

II. FACE TRACKING ON VIDEO
A. Augmented Reality
Conceptually, AR is a real-world scene that is modified by a computer into a new scene. The input used in AR varies from sound, video, and images to GPS data. On a larger scale, AR may need various sensors and a special environment. AR works by taking as input a position or coordinates in the real environment and producing as output an object generated by the AR computation, rendered from a 3D model.

Generally, AR can be divided into two groups: marker-less tracking and marker-based tracking. Marker-based AR uses a marker as the basis for defining the input. In that setting, one of the algorithms used is a marker tracking algorithm that finds the marker type and position, so that the AR object can be generated at an accurate position. The result is a picture of a 3D object whose position is adjusted based on the marker position. Of course, the user is required to have the marker in order to activate the AR object.

In marker-less tracking, AR defines its input by using algorithms such as object tracking, hand tracking, or face tracking. The results of these algorithms are used to define the position of the AR-generated object. Over time, various methods and algorithms have been developed to generate more accurate and convincing AR. In addition, there have been significant enhancements in supporting devices, such as glasses for viewing the AR result directly.
B. Haar-Like Feature
Haar-like features are digital image features used in object recognition. The method works by converting a part of the image (a region of interest) into a single value. It is one of the common methods for face detection, offering high speed and good accuracy. The features owe their name to their intuitive similarity to Haar wavelets, and they are commonly used in real-time face detectors.

A Haar-like feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in each region, and calculates the difference between these sums. This difference is then used to categorize subsections of an image.
Essentially, Haar-like features work by dividing a region of the image into rectangular pixel areas according to the feature's classifier. The next step is to calculate the intensity difference between these pixel areas; the resulting difference is used to categorize each area for object detection. An example of Haar-like features applied to face detection can be seen in Fig. 1.

Figure 1. Haar-Like Feature Implementation.
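The paper gives no implementation, but the rectangle sums that a Haar-like feature needs are normally obtained in constant time from an integral image. The following Python sketch is our own illustration of that idea (the function names are hypothetical, not from the paper):

```python
import numpy as np

def integral_image(gray):
    # Summed-area table, padded with a zero row/column so that
    # ii[y, x] equals the sum of gray[:y, :x].
    ii = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, x, y, w, h):
    # Sum of pixel intensities in the rectangle (x, y, w, h) in O(1).
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(gray, x, y, w, h):
    # Difference between the left and right halves of a window:
    # a simple vertical-edge Haar-like feature.
    ii = integral_image(gray)
    left = rect_sum(ii, x, y, w // 2, h)
    right = rect_sum(ii, x + w // 2, y, w // 2, h)
    return left - right
```

In practice such detectors are rarely hand-rolled; OpenCV, for instance, packages trained Haar cascades behind its CascadeClassifier interface.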
C. Active Shape Model
The Active Shape Model (ASM) [5] is a method in which a model is iteratively deformed to match an instance of that model in the image [6]. The method uses a flexible model acquired from a number of training data samples.

A landmark represents a distinguishable point that exists in most of the images under observation, for example, the location of the right eye pupil [2]. We can locate facial features by locating landmarks.
A set of landmarks forms a shape. Shapes are represented as vectors of points. We align one shape to another with a similarity transform (allowing translation, scaling, and rotation) that minimizes the average Euclidean distance between shape points. The mean shape is the mean of the aligned training shapes (which in our case are manually landmarked faces).
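The similarity alignment described above can be written compactly with an SVD. The sketch below is a standard Procrustes-style fit under our own naming, not code from the paper:

```python
import numpy as np

def align_similarity(src, dst):
    """Find scale s, rotation r (2x2), and translation t minimizing the
    mean squared distance between s * r @ src_i + t and dst_i,
    for shapes given as (N, 2) arrays of landmark points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    a, b = src - mu_s, dst - mu_d
    # Optimal rotation from the SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(a.T @ b)
    r = (u @ vt).T
    if np.linalg.det(r) < 0:        # keep a proper rotation (no reflection)
        vt[-1] *= -1
        r = (u @ vt).T
    s = np.trace(r @ a.T @ b) / np.trace(a.T @ a)   # least-squares scale
    t = mu_d - s * (r @ mu_s)
    return s, r, t
```

Repeatedly aligning all training shapes to their current average and re-averaging converges to the mean shape used below.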
The ASM starts the search for landmarks from the mean shape, aligned to the position and size of the face determined by a global face detector. It then repeats the following two steps until convergence: (i) suggest a tentative shape by adjusting the locations of the shape points through template matching of the image texture around each point; (ii) conform the tentative shape to a global shape model. The individual template matches are unreliable, and the shape model pools the results of the weak template matchers to form a stronger overall classifier.

Given a guessed position in a picture, the ASM is iteratively matched to the image. By choosing a set of shape parameters b for the Point Distribution Model, the shape of the model can be defined in a coordinate frame centered on the object. An instance X in the image frame can then be constructed by defining its position, orientation, and scale:

X = M(s, θ)[x] + X_c    (1)

where X_c = (X_c, Y_c, ..., X_c, Y_c); M(s, θ)[·] is a rotation by angle θ and a scaling by factor s; and (X_c, Y_c) is the center position of the model in the image frame.
Basically, the Active Shape Model works in steps such as these:
(1) Locate a better position for each point in the neighborhood of the current points in the image;
(2) Update the shape and pose parameters (X_c, Y_c, s, θ, b) according to the new positions found in step 1;
(3) Apply constraints on the parameter b to ensure a plausible shape (e.g., |b_i| < 3√λ_i);
(4) Repeat the steps until convergence is achieved (convergence meaning there is no significant difference between one iteration and the previous one).

In practice, each iteration looks in the image around each point for a better position and updates the model parameters to get the best match to these newly found positions. The simplest way to get a better position is to take the location of the edge with the highest intensity (with orientation, if known) along the profile; that edge location becomes the new location of the model point. Even so, the best location should be on strong edges combined with the references from the statistical model points.
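As a rough illustration of steps (1)-(3), the sketch below performs one simplified ASM update: it searches a small window instead of the true profile normal, and it assumes the shape basis P and the clamping bounds b_max (the 3√λ_i limits) are given. It is our own simplification, not the paper's implementation:

```python
import numpy as np

def asm_iteration(points, gradient, mean_shape, P, b_max, search_len=5):
    """One simplified ASM update step (sketch).
    points: (N, 2) current landmark positions.
    gradient: 2D array of edge strength for the image.
    mean_shape: (2N,) mean shape vector; P: (2N, t) PCA shape basis.
    Border handling and pose normalization are omitted for brevity."""
    # Step 1: move each point to the strongest edge nearby (an
    # axis-aligned window stands in for the profile-normal search).
    moved = []
    for x, y in points.astype(int):
        win = gradient[y - search_len:y + search_len + 1,
                       x - search_len:x + search_len + 1]
        dy, dx = np.unravel_index(np.argmax(win), win.shape)
        moved.append((x + dx - search_len, y + dy - search_len))
    moved = np.array(moved, dtype=float).ravel()

    # Step 2: project the new positions onto the shape model to get b.
    b = P.T @ (moved - mean_shape)

    # Step 3: clamp b so the shape stays plausible (|b_i| < 3*sqrt(lambda_i)).
    b = np.clip(b, -b_max, b_max)

    # Reconstruct the constrained shape.
    return (mean_shape + P @ b).reshape(-1, 2)
```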
It should be noted that a model point is not always located on the highest-intensity edge in the local structure. Such points may represent a lower-intensity edge or some other image structure, so the best approach is to analyze what each point is actually meant to find, as illustrated in Figure 2.

Figure 2. ASM Implementation.
To get better results, the Active Shape Model is combined with a Gaussian pyramid. Through subsampling, the image is resized and stored as temporary data. The ASM result is then computed across the different image sizes, and the scale with the best point locations yields the final result.
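A minimal sketch of the pyramid construction, assuming OpenCV is available; the coarse-to-fine search strategy in the comment is the standard one, not a detail taken from the paper:

```python
import cv2

def gaussian_pyramid(image, levels=3):
    # Each pyrDown call blurs with a Gaussian kernel and halves the
    # resolution; coarse levels give a cheap, stable ASM initialization.
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid

# Coarse-to-fine search: fit the model on the smallest image first,
# then scale the found points up (x2) to seed the next finer level.
```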
D. Pose from Orthography and Scaling with Iteration
Pose from Orthography and Scaling with Iteration, or POSIT [7], is an algorithm that can be used to estimate the position of a particular object in 3D space. The algorithm originates from DeMenthon in 1995 [8]. To measure the pose of an object, at least four non-coplanar points on the object are required in the image. A 3D projection field built from the 2D points that have been found is used as the algorithm's input. POSIT estimates the perspective projection based on a scaled orthographic projection of the object. The result of this projection is the rotation matrix and the translation vector of the object.
The purpose of the algorithm is therefore to calculate the rotation matrix and the translation vector of the object. The rotation matrix R of an object consists of the coordinates of the three vectors i, j, and k of the camera coordinate system, expressed in the object coordinate system. The rotation matrix transforms object coordinates into camera coordinates: multiplying the first row of the matrix with a vector M0Mi projects that vector onto the i vector of the camera coordinate system, which gives, for example, the Xi − X0 coordinate of M0Mi. As long as the M0Mi coordinates and the i vector row are expressed in the same coordinate system, the matrix takes the following form:
    [ i_u  i_v  i_w ]
R = [ j_u  j_v  j_w ]    (2)
    [ k_u  k_v  k_w ]

where i_u, i_v, i_w are the coordinates of i in the object coordinate system (M0u, M0v, M0w).
The rotation matrix can thus be constructed from the calculation of i and j in the object coordinate system; the k vector is obtained as the cross product i × j. For the translation vector, T is the vector OM0 between the projection center O and the object reference point M0, with coordinates X0, Y0, and Z0. The translation vector is OM0 = Om0 / s, and the rotation matrix is the matrix whose rows are the i, j, and k vectors.
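As a small numerical illustration (ours, not the paper's), assembling R from the i and j vectors of the scaled orthographic solution looks like this:

```python
import numpy as np

def rotation_from_ij(i_vec, j_vec):
    # Normalize, complete the basis with a cross product, and stack
    # the three camera-axis vectors as the rows of R.
    i = i_vec / np.linalg.norm(i_vec)
    j = j_vec / np.linalg.norm(j_vec)
    k = np.cross(i, j)
    return np.vstack([i, j, k])
```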
The projection geometry involves the perspective projection (mi) and the scaled orthographic projection (Pi) of an object point Mi with respect to the reference point M0. It should be noted that the object reference point should be located at the origin of the coordinate axes. Figure 3 shows the projection of a cube-shaped object under the POSIT algorithm.
Figure 3. The result of the POSIT algorithm.
The steps of the POSIT algorithm are as follows:
(1) Locate the object feature points in the image.
(2) Initialize: ε_i(0) = 0 for (i = 1...N−1), and set n = 1.
(3) Iteratively calculate i, j, and Z0:
- Compute the corrected image vector X′ with N−1 coordinates from:

X′_i = x_i (1 + ε_i(n−1)) − x_0, (i = 1...N−1)    (3)

and the image vector Y′ with N−1 coordinates from:

Y′_i = y_i (1 + ε_i(n−1)) − y_0, (i = 1...N−1)    (4)

- Multiply the 3 × (N−1) matrix B with the N−1 coordinate vectors to obtain the vectors I and J, each with three coordinates:

I = B·X′, J = B·Y′    (5)

- Compute the projection scale s as the average of the norms of I and J:

s_1 = (I·I)^(1/2), s_2 = (J·J)^(1/2), s = (s_1 + s_2) / 2    (6)

- Compute the i and j vectors:

i = I / s_1, j = J / s_2    (7)
(4) Recalculate the values of ε_i(n), (i = 1...N−1):
- Compute the k vector from the cross product of i and j.
- Compute the Z0 coordinate of the translation vector as Z0 = f / s, where f is the camera's focal length.
- Compute:

ε_i(n) = (1 / Z0) M0Mi · k    (8)
(5) If |ε_i(n) − ε_i(n−1)| is greater than the threshold, set n = n + 1 and repeat from step 3.
(6) If it is less than the threshold, output the pose computed in the last iteration.
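Putting steps (1)-(6) together, the following Python sketch implements the POSIT iteration as described by DeMenthon and Davis [8]; the variable names and the convergence threshold are our own choices, not the paper's:

```python
import numpy as np

def posit(object_points, image_points, focal_length, n_iter=20):
    """Sketch of POSIT [8]: estimate rotation R and translation T of an
    object from N >= 4 non-coplanar points.
    object_points: (N, 3) model coordinates, point 0 is the reference M0.
    image_points:  (N, 2) pixel coordinates relative to the image center."""
    A = object_points[1:] - object_points[0]      # M0Mi vectors, (N-1, 3)
    B = np.linalg.pinv(A)                         # 3 x (N-1) pseudo-inverse
    eps = np.zeros(len(A))                        # epsilon_i, step (2)
    x0, y0 = image_points[0]

    for _ in range(n_iter):
        # Step (3): corrected image vectors, Eqs. (3)-(4).
        Xp = image_points[1:, 0] * (1 + eps) - x0
        Yp = image_points[1:, 1] * (1 + eps) - y0
        I, J = B @ Xp, B @ Yp                     # Eq. (5)
        s1, s2 = np.linalg.norm(I), np.linalg.norm(J)
        s = (s1 + s2) / 2                         # Eq. (6)
        i, j = I / s1, J / s2                     # Eq. (7)
        k = np.cross(i, j)                        # step (4)
        Z0 = focal_length / s
        new_eps = (A @ k) / Z0                    # Eq. (8)
        if np.abs(new_eps - eps).max() < 1e-6:    # steps (5)-(6)
            eps = new_eps
            break
        eps = new_eps

    R = np.vstack([i, j, k])                      # only approximately orthonormal
    T = np.array([x0, y0, focal_length]) / s      # OM0 = Om0 / s
    return R, T
```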
E. Head Pose Estimation
The estimation of the user's head pose [9] in this application uses these algorithms: Haar-like features, the Active Shape Model, and POSIT. Haar-like features are used for face detection, the Active Shape Model is used for obtaining the user's facial feature points, and the POSIT algorithm is used for acquiring the user's head pose. Poses are calculated from the face feature points generated by the Active Shape Model.

Figure 4. Points of Face Features Used for Projection in the POSIT Algorithm [9].
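For comparison only (this is not the paper's method), modern OpenCV exposes an iterative PnP solver that plays the same role as POSIT once 2D-3D point correspondences are available; model_points_3d and feature_points_2d below are assumed inputs:

```python
import cv2
import numpy as np

def head_pose(model_points_3d, feature_points_2d, frame_size, focal_length):
    # Pinhole camera matrix with the principal point at the image center.
    w, h = frame_size
    camera_matrix = np.array([[focal_length, 0, w / 2],
                              [0, focal_length, h / 2],
                              [0, 0, 1]], dtype=float)
    dist_coeffs = np.zeros(4)  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(model_points_3d, feature_points_2d,
                                  camera_matrix, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    return R, tvec
```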
III. RESULT AND TESTING
In an augmented reality application, the user watches a combination of a real-world scene and a virtual world. The real-world scene where the user is located is captured using a webcam, while the virtual world is generated using computer graphics. The process of combining these two worlds involves the algorithms described in the previous section.
In this application, the user has to be positioned in front of the webcam and activate the software. After that, the user's face is automatically displayed with a virtual eyeglasses frame. With the help of the webcam, the simulator operates in real time, much like a mirror. Testing produced several findings, described below.
A. Face Detection and ASM Accuracy
The accuracy of the Active Shape Model depends on several factors, such as brightness, image sharpness, and noise. Testing shows that brightness intensity affects detection accuracy: the best brightness for this face detection is an average one, while high or low brightness decreases accuracy. The comparison between brightness intensity and Active Shape Model results is shown in Figure 5.

Figure 5. Face Detection in Several Types of Brightness.
The next test involves objects in the image. Often there are objects in the image that partly block the face. With the Active Shape Model, face detection remains accurate even when parts of the face are blocked by an object, provided that no more than about a quarter of the facial feature points are occluded. The result of this face detection can be seen in Figure 6.

Figure 6. Face Detection on a Partly Blocked Face.
Face tracking with the Active Shape Model shows a significant speed change depending on the input image resolution. Tests with different input resolutions of the same aspect ratio show that a lower resolution increases the speed of the software and the output frame rate. Table I shows the frames per second at different resolutions.

TABLE I. FRAME PER SECOND COMPARISON

Resolution | Frames per second (FPS)
320 x 240 | 15 – 18
480 x 360 | 12 – 15
640 x 480 | 9 – 11
800 x 600 | 6 – 8
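Frame rates like those in Table I can be measured by timing the full per-frame pipeline. A minimal sketch, with the actual tracking step left as a placeholder:

```python
import time
import cv2

# process_frame would run face tracking + pose + overlay; it is a
# placeholder here, not a function from the paper.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

frames, t0 = 0, time.time()
while frames < 100:
    ok, frame = cap.read()
    if not ok:
        break
    # process_frame(frame)
    frames += 1
print("fps:", frames / (time.time() - t0))
cap.release()
```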
The number of iterations the Active Shape Model needs to find the best point locations does not depend on the image size. Several tests show that the input image size has no significant impact on the number of iterations, because of the subsampling used in the face tracking process. For example, the number of iterations for a 640 x 480 image does not differ much from that for a 480 x 360 image. Table II shows the effect of resolution on the number of iterations.

TABLE II. NUMBER OF ITERATIONS COMPARISON

Resolution | Number of iterations
320 x 240 | 10 – 16
480 x 360 | 12 – 15
640 x 480 | 11 – 17
800 x 600 | 14 – 20
B. Head Pose Estimation Accuracy
The user's head pose is estimated with the POSIT algorithm, which outputs a rotation matrix and a translation vector. The result of the POSIT algorithm is acceptable, although not precise, because the facial feature points detected with the Active Shape Model are not constant from frame to frame. Consequently, the estimated head pose is not precise either.
Errors in the head pose estimation cause a shift in the location of the eyeglasses frame in the program output. However, the shift is still acceptable and does not disturb the primary functionality of the software. The shift in the head pose estimation can be seen in Figure 7.

Figure 7. User's Head Pose Estimation.
Overall, the testing shows that the developed simulator software is acceptable in terms of speed and accuracy. The eyeglasses frames provided in the simulator are constructed using OpenGL and placed in the correct position according to the face detection result. The poses shown earlier are rendered using masking methods. The result of the application is shown in Figure 8.

Figure 8. The result of the eyeglasses frame simulator software.
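The paper does not detail how the OpenGL frame is anchored, but one common approach is to convert the estimated pose into a modelview matrix. A sketch under that assumption (the axis flip is a typical OpenCV-to-OpenGL convention, not taken from the paper):

```python
import numpy as np

def modelview_from_pose(R, T):
    """Build a 4x4 column-major modelview matrix from the estimated pose
    so a 3D eyeglasses model can be drawn over the face (e.g. with
    OpenGL's glLoadMatrixf)."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.asarray(T).ravel()
    flip = np.diag([1.0, -1.0, -1.0, 1.0])   # camera coords -> OpenGL coords
    return (flip @ M).T.astype(np.float32)   # transpose: column-major order
```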
IV. CONCLUSION
From the implementation of this AR application, several things can be concluded:
1. An Augmented Reality application should be supported by suitable algorithms, since it commonly requires real-time processing.
2. The Active Shape Model algorithm has a certain error level, caused by the shape model used and the efficiency of the algorithm.
3. The choice of a suitable input image has a significant impact on the output of the Active Shape Model. Filtering and the environment where the image is taken are important factors in achieving the best results.
4. The quality of the POSIT algorithm's output depends strongly on the quality of its input image.
5. The speed of the user's movement has a great impact on the frame positioning accuracy, because of the ASM and POSIT processing delay.
6. For high-resolution video, ASM and POSIT need more processing time, causing some delay.
ACKNOWLEDGMENT
This work was supported and fully funded by the Directorate General of Higher Education, Ministry of Education and Culture of Indonesia.

REFERENCES
[1] R. Azuma, "Recent Advances in Augmented Reality," IEEE Computer Graphics and Applications, 1997.
[2] S. Milborrow and F. Nicolls, "Locating Facial Features with an Extended Active Shape Model," Computer Vision – ECCV, Springer Berlin Heidelberg, 2008.
[3] P. Viola and M. J. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," IEEE CVPR, 2001.
[4] T. F. Cootes and C. J. Taylor, "Statistical Models of Appearance for Computer Vision," University of Manchester, 2004.
[5] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active Shape Models – Their Training and Application," 1995.
[6] T. F. Cootes, "Active Shape Models – Smart Snakes," British Machine Vision Conference, 1992.
[7] M. Treiber, An Introduction to Object Recognition: Selected Algorithms for a Wide Variety of Applications, Springer, 2010. ISBN 978-1-84996-234-6, DOI 10.1007/978-1-84996-235-3.
[8] D. DeMenthon and L. S. Davis, "Model-Based Object Pose in 25 Lines of Code," International Journal of Computer Vision, vol. 15, pp. 123–141, June 1995.
[9] P. Martins and J. Batista, "Monocular Head Pose Estimation," International Conference on Image Analysis and Recognition, 2008.
[10] K. Baker, "Singular Value Decomposition Tutorial," The Ohio State University, 2005.
[11] R. Gonzalez and R. Woods, Digital Image Processing, 2nd ed., Prentice Hall, 2002.
[12] R. M. Haralick, "Analysis and Solutions of the Three Point Perspective Pose Estimation Problem," Proc. IEEE Conf. Computer Vision and Pattern Recognition, Maui, Hawaii, 1991.
[13] P. Viola and M. J. Jones, "Robust Real-Time Face Detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, Kluwer Academic Publishers, 2004.
[14] E. Angel, OpenGL Transformations: Interactive Computer Graphics, 4th ed., Addison-Wesley, 2005.
[15] L. Kruger, "Model Based Object Classification and Localisation in Multiocular Images," Dissertation, November 2007.
[16] F. Abdat, C. Maaoui, and A. Pruski, "Real Facial Feature Points Tracking with Pyramidal Lucas-Kanade Algorithm," IEEE RO-MAN 08, The 17th International Symposium on Robot and Human Interactive Communication, Germany, 2008.
[17] P. A. Viola and M. J. Jones, "Robust Real-Time Face Detection," Proc. ICCV, 2001.