An Image Retrieval Method Based on Manifold Learning with Scale-Invariant Feature Control

TELKOMNIKA, Vol.14, No.3A, September 2016, pp. 252~258
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i3A.4409



Haifeng Guo1, Shoubao Su*2, Jing Liu3, Zhoubao Sun4, Yonghua Xu5

1,2,3,5 School of Computer Engineering, Jinling Institute of Technology, Nanjing, Jiangsu, 211169, P. R. China
1 School of Computer and Information Engineering, Hohai University, Nanjing, Jiangsu, 210098, P. R. China
4 Nanjing Audit University, Nanjing, Jiangsu, 211815, P. R. China

*Corresponding author, e-mail: showbo@jit.edu.cn

Abstract
Traditional dimensionality reduction methods cannot recover the inherent structure of the data, and the scale-invariant feature transform (SIFT) alone achieves low precision when retrieving images. To address these problems, an image retrieval method based on manifold learning with scale-invariant feature control is proposed. It aims to find low-dimensional compact representations of high-dimensional observation data and to explore the intrinsic dimension of the data. The SIFT feature extraction method and an adaptive ISOMAP method are combined, and experiments are conducted on the ORL face image dataset. This paper also analyzes the effects of the neighborhood parameter and the intrinsic dimension on face image recognition.
Keywords: image retrieval; manifold learning; dimensionality reduction; intrinsic dimension
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.

1. Introduction
With the rapid development of Internet technology, the number of images on the Internet is growing rapidly, and most of them carry no category label information. The main problem in image retrieval is the "semantic gap" between low-level image features and high-level semantics, which makes it difficult to produce retrieval results that satisfy users. Currently, image retrieval methods are mainly based on image content.
In image retrieval, a 64 × 64 face image can be expressed as a 4096-dimensional vector, which is clearly too large for efficiently computing similarities between images. To avoid the curse of dimensionality, the dimensionality of the images must be reduced. Traditional dimensionality reduction methods [1-2] include principal component analysis (PCA), independent component analysis (ICA) and multidimensional scaling (MDS). From a geometric perspective, these linear methods assume that the image data have a global linear structure, whereas a collection of face images lies in a high-dimensional space whose variations depend on position, pose, expression and gesture, which introduces a great deal of nonlinear information. The corresponding feature changes can be viewed as low-dimensional nonlinear manifolds embedded in the high-dimensional face space, and traditional dimensionality reduction methods cannot recover this inherent structure. Manifold learning [3-4] treats the data as lying on a manifold and approximates its unknown global geometric structure from its local geometric structure; its purpose is to discover the regularities of the inner structure of nonlinear high-dimensional data. Manifold learning [3] can reveal the inherent low-dimensional structure of face images and can be used effectively for face image recognition. Widely used manifold learning methods include ISOMAP (Isometric Mapping) [5-7], LLE (Locally Linear Embedding) [8-11] and LE (Laplacian Eigenmap). Compared with traditional dimensionality reduction methods, manifold learning methods have two main advantages: first, they have few parameters, only the neighbor parameter k and the intrinsic dimension d; second, the computation adapts well to data with a nonlinear manifold structure. Face image recognition mainly consists of two parts: feature extraction and face recognition. Feature extraction selects the features with the greatest discriminative power and reduces the complexity of the recognition process; face recognition compares the extracted features with stored facial information and uses a classification algorithm to identify the face.

Received March 23, 2016; Revised July 23, 2016; Accepted August 4, 2016
In this paper, we combine SIFT with an improved version of the manifold learning method ISOMAP [12] to address face recognition, and conduct experiments on the standard ORL face dataset. First, the SIFT algorithm is used to extract 128-dimensional local descriptors from the face images. Second, the improved ISOMAP algorithm is used to reduce the dimensionality, and a nearest neighbor classifier is used to classify the data. For the ISOMAP step, we mainly analyze how the neighborhood parameter and the size of the intrinsic dimension affect face image recognition.

2. Proposed Methodology
2.1. The SIFT Algorithm
SIFT is a local feature descriptor proposed by David Lowe of the University of British Columbia in 1999; Lowe later refined and extended the algorithm. SIFT features are highly distinctive and carry a large amount of information. The basic idea is to filter the image with Gaussian kernels at multiple scales and extract keypoints that are stable in scale space. These keypoints are invariant to rotation, scaling and translation and partially invariant to affine transformations, and the resulting feature vector is normalized to unit length. SIFT is suitable for face recognition because SIFT features are similar for images of the same face and differ between different faces. SIFT is a local image feature: it describes the histogram of gradient orientations in the region around each keypoint. In essence, SIFT extracts keypoints from the face image in four main steps: scale-space construction and extremum detection, precise keypoint localization, assignment of each keypoint's dominant orientation, and generation of the SIFT descriptor. The detailed procedure can be found in references [13-14]. Figure 1 shows the descriptors, keypoints and matches for one person in the ORL face dataset.
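The descriptor step and the final normalization can be illustrated with a simplified sketch in Python with NumPy. It assumes a 16 × 16 grayscale patch already centered on a keypoint, builds a 4 × 4 grid of 8-bin orientation histograms (4 × 4 × 8 = 128 values, matching the descriptor length above), and then applies unit-length normalization with the 0.2 clipping threshold from Lowe's 2004 formulation, which this paper does not state. The function name `sift_like_descriptor` is hypothetical, and the sketch omits Gaussian weighting, trilinear interpolation and rotation to the dominant orientation, so it is not a full SIFT implementation.

```python
import numpy as np


def sift_like_descriptor(patch, clip=0.2, eps=1e-12):
    """Simplified SIFT-style descriptor for a 16x16 grayscale patch.

    The patch is split into a 4x4 grid of 4x4-pixel cells, and an 8-bin
    gradient-orientation histogram weighted by gradient magnitude is built
    for each cell (4 * 4 * 8 = 128 values). Gaussian weighting, trilinear
    interpolation and rotation to the dominant orientation are omitted.
    """
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    gy, gx = np.gradient(patch)                   # derivatives along rows, columns
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * 8).astype(int), 7)
    hist = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            hist[r // 4, c // 4, bins[r, c]] += mag[r, c]
    v = hist.ravel()
    v = v / max(np.linalg.norm(v), eps)           # normalize to unit length
    v = np.minimum(v, clip)                       # damp very large gradient magnitudes
    return v / max(np.linalg.norm(v), eps)        # renormalize to unit length


rng = np.random.default_rng(0)
patch = rng.random((16, 16))
desc = sift_like_descriptor(patch)
print(desc.shape)   # (128,)
```

Because the vector is normalized, multiplying the patch intensities by a positive constant leaves the descriptor unchanged, which is one source of SIFT's robustness to contrast changes.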

Figure 1. Keypoints and matches
2.2. The Proposed Method
The main idea of the proposed method is to use distances between local neighbors to estimate the global geodesic distances on the manifold, and to reduce the dimensionality by making distances in the reduced space equivalent to the geodesic distances of the original data. ISOMAP can uncover the low-dimensional embedding structure underlying high-dimensional data mainly because geodesic distances reflect the intrinsic geometry of the data manifold. Its main steps are as follows [15]:
(1) Construct the neighbor graph G from the original data using the ε-neighborhood or k-nearest-neighbor method;
(2) Approximate the geodesic distance between any two nodes by the shortest path between them in G;
(3) Taking the geodesic distance matrix from step (2) as input, apply MDS to compute the low-dimensional coordinates of the data, mapping the data into a low-dimensional visual space.
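The three steps above can be sketched in Python with NumPy as follows. This is a minimal illustration of standard ISOMAP, not the improved variant used in this paper; it assumes a k-nearest-neighbor graph, uses Floyd-Warshall for step (2) (adequate only for a few hundred points), and the function name `isomap` is hypothetical.

```python
import numpy as np


def isomap(X, k, d):
    """Minimal ISOMAP: kNN graph -> shortest-path geodesics -> classical MDS.

    Returns the n x d embedding Y and the geodesic distance matrix G.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step (1): Euclidean distances and a symmetric k-nearest-neighbor graph G
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]      # column 0 is the point itself
    for i in range(n):
        G[i, nbrs[i]] = D[i, nbrs[i]]
        G[nbrs[i], i] = D[i, nbrs[i]]             # keep the graph undirected
    # Step (2): all-pairs shortest paths (Floyd-Warshall) approximate geodesics
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbor graph is disconnected; try a larger k")
    # Step (3): classical MDS on the geodesic distance matrix
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:d]
    Y = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
    return Y, G


# Points on a half circle: a 1-D manifold embedded in 2-D
theta = np.linspace(0.0, np.pi, 40)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Y, G = isomap(X, k=4, d=1)
print(Y.shape)   # (40, 1)
```

For larger datasets, a sparse graph with Dijkstra's algorithm (for example SciPy's `scipy.sparse.csgraph.shortest_path`) would replace the cubic Floyd-Warshall loop.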

TELKOMNIKA Vol. 14, No. 3A, September 2016 : 252 – 258


For a given face image dataset, high-dimensional face feature vectors are extracted with SIFT. The success of ISOMAP then depends on the choice of the neighborhood size k and of an appropriate intrinsic dimension d for the face images, because only an appropriate k guarantees an accurate approximation of the geodesic distances: if k is too large, the neighbor graph may seriously disrupt the connectivity structure of the original manifold; if k is too small, the manifold may be split into many disconnected regions containing "holes". To find an adaptive optimal neighborhood parameter k while keeping the topological structure unchanged, for a given range of neighborhood factors $k \in \{k_{\min}, k_{\min}+1, \cdots, k_i, \cdots, k_{\max}\}$ we compute the mapping loss function $L(k_i)$ for each $k_i$ and take the values of k with the smallest $L$ as the initial candidate set $Z$. Then, for each $k \in Z$, the improved ISOMAP algorithm is used to compute the corresponding error, defined as:
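$$E(k) = \frac{1}{M} \sum_{i} \sum_{j \neq i} \frac{\left( d_X(i,j) - d_Y(i,j) \right)^2}{d_X(i,j)} \tag{1}$$

where $M = \sum_{i} \sum_{j \neq i} d_X(i,j)$ is a normalizing constant, and $d_X(i,j)$ and $d_Y(i,j)$ are the distances between points $i$ and $j$ in the input space and in the output (embedding) space, respectively. The smaller the error, the better the pairwise distances are preserved during dimensionality reduction, and hence the better the topological relations are maintained. Finally, the value of k that minimizes the error function is selected as the most suitable one.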
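Equation (1) and the selection of k from the candidate set Z can be sketched as below. The function names are hypothetical, and `error_for_k` stands for any routine that embeds the data with a given k and returns the error of Eq. (1). Pairs with $d_X(i,j) = 0$ are skipped to avoid division by zero, which is an assumption the paper does not state.

```python
import numpy as np


def distance_preservation_error(DX, DY, eps=1e-12):
    """Error of Eq. (1): (1/M) * sum over i != j of (d_X - d_Y)^2 / d_X."""
    DX = np.asarray(DX, dtype=float)
    DY = np.asarray(DY, dtype=float)
    # Off-diagonal pairs only; pairs with d_X ~ 0 are skipped (assumption)
    mask = ~np.eye(DX.shape[0], dtype=bool) & (DX > eps)
    M = DX[mask].sum()                            # normalizing constant
    return float(((DX[mask] - DY[mask]) ** 2 / DX[mask]).sum() / M)


def select_k(candidates, error_for_k):
    """Return the k in the candidate set Z with the smallest error, plus all errors."""
    errors = {k: error_for_k(k) for k in candidates}
    best = min(errors, key=errors.get)
    return best, errors


DX = np.array([[0.0, 1.0], [1.0, 0.0]])
print(distance_preservation_error(DX, 2 * DX))   # 1.0
```

In a full pipeline, `error_for_k` would run the embedding for each candidate k and compare the input-space distances with the distances between the embedded points.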
When estimating the intrinsic dimension d, if d is too small, points that are distinct in the high-dimensional space overlap when mapped to the low-dimensional space, so the data cannot be identified correctly; if d is too large, the embedding is likely to retain redundant information which, together with noise, degrades subsequent recognition.
This paper adopts the vector reconstruction residual $1 - \rho^2(D_G, D_Y)$, where $\rho$ is the correlation coefficient between the geodesic distance matrix $D_G$ and the Euclidean distance matrix $D_Y$ in the low-dimensional space, together with maximum likelihood estimation, to estimate the size of the intrinsic dimension. By building a likelihood function of the distances between neighbors and maximizing it with respect to the intrinsic dimension, we obtain the maximum likelihood estimate.
maximum likelihood estimate. Random sampling sample of the face space, Constitute the
Poisson distribution


T t, x



(2)

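For a given neighborhood size $k$, the maximum likelihood estimate of the intrinsic dimension at $x_i$ is

$$\hat{d}_k(x_i) = \left[ \frac{1}{k-1} \sum_{j=1}^{k-1} \log \frac{T_k(x_i)}{T_j(x_i)} \right]^{-1} \tag{3}$$

where $T_j(x_i)$ is the distance from $x_i$ to its $j$-th nearest neighbor. Applying this estimator to every sample yields $n$ local estimates of the intrinsic dimension, whose average is taken as the embedding dimension:

$$\hat{d}_k = \frac{1}{n} \sum_{i=1}^{n} \hat{d}_k(x_i) \tag{4}$$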

Maximum likelihood estimation starts from the local structure of the face data and estimates the intrinsic dimension with probabilistic and statistical tools; it gives good estimates of the intrinsic dimension and is fast to compute.
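Equations (3) and (4) translate directly into a few lines of NumPy. This sketch, with the hypothetical function name `mle_intrinsic_dimension`, assumes the data contain no duplicate points (a zero neighbor distance would make the logarithm undefined) and uses brute-force pairwise distances, which is practical only for moderate n.

```python
import numpy as np


def mle_intrinsic_dimension(X, k=10):
    """Maximum likelihood intrinsic dimension, Eqs. (3) and (4).

    Returns the averaged estimate and the n local estimates.
    Assumes no duplicate points (a zero distance breaks the logarithm).
    """
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    T = np.sort(D, axis=1)[:, 1:k + 1]            # T_1(x_i), ..., T_k(x_i); self excluded
    # Eq. (3): inverse of the mean log-ratio log(T_k / T_j), j = 1..k-1
    log_ratios = np.log(T[:, k - 1:k] / T[:, :k - 1])
    local = (k - 1) / log_ratios.sum(axis=1)
    # Eq. (4): average of the n local estimates
    return float(local.mean()), local


rng = np.random.default_rng(0)
# 500 points on a 2-D plane linearly embedded in 5-D space
plane = rng.random((500, 2)) @ rng.standard_normal((2, 5))
d_hat, local = mle_intrinsic_dimension(plane, k=10)
print(d_hat)   # intrinsic dimension estimate for the plane
```

For small k, the average of the local estimates tends to come out slightly above the true dimension, so the result is usually rounded or examined over a range of k.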

An Image Retrieval Method Based on Manifold Learning with SIF Control (Shoubao Su)


Figure 2. Part of the ORL face images
3. Experiment and Results Analysis
We conduct the experiments on the ORL face dataset. The database contains 400 face images of 40 people (10 images per person), each 92 × 112 pixels. The face images are centered and normalized; because the dataset is largely unaffected by lighting changes, the variations among face images come mainly from pose and facial expression. Figure 2 shows face images of one person. Because the choice of training samples strongly influences recognition results, we randomly select five images of each person as training samples and use the remaining five as the test set. A nearest neighbor classifier is used to evaluate the classification performance of the extracted features. The experiments are conducted in MATLAB 7.1 on a machine with a 2.7 GHz CPU and 2 GB of memory.
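The evaluation protocol (a random split of five training and five test images per person, followed by nearest neighbor classification) can be sketched as follows. The function names and the use of Euclidean distance in the reduced feature space are assumptions for illustration; the paper does not specify the distance used by its classifier, and the synthetic features below only stand in for real reduced ORL features.

```python
import numpy as np


def split_per_class(labels, n_train, rng):
    """Randomly pick n_train samples per class for training; the rest are test samples."""
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)


def nearest_neighbor_accuracy(F, labels, train, test):
    """1-NN classification accuracy with squared Euclidean distance."""
    d2 = ((F[test][:, None, :] - F[train][None, :, :]) ** 2).sum(axis=-1)
    pred = labels[train][np.argmin(d2, axis=1)]
    return float((pred == labels[test]).mean())


# Synthetic stand-in for reduced features: 40 people x 10 images, 10-D each
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(40), 10)
F = rng.standard_normal((40, 10))[labels] * 100.0 + rng.standard_normal((400, 10))
train, test = split_per_class(labels, 5, rng)
print(len(train), len(test))   # 200 200
```

Repeating the split with different random seeds and averaging the accuracy reduces the dependence on any single choice of training images.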
3.1. Feature Extraction
On the ORL face dataset, 128-dimensional feature vectors are extracted from the face images using the SIFT algorithm. The isometric mapping algorithm ISOMAP is then used to reduce the dimensionality; the method is tested on 200 face images and compared with the results of applying ISOMAP directly. During dimensionality reduction, the embedding dimension and the neighbor parameter are both set to 10 by default. The test results are shown in Table 1.

Table 1. Comparison between the two algorithms on the ORL face dataset

Algorithm          Neighbor parameter   Embedding dimension   Recognition accuracy
ISOMAP             10                   10                    83.5%
Proposed method    10                   10                    91.3%

As shown in Table 1, with the embedding dimension and the neighbor parameter set to the same values, applying ISOMAP to features extracted by SIFT yields higher recognition accuracy. This is because the affine invariance of SIFT plays an important role in extracting face image features: it can effectively extract and describe the key points of face images.
With the same neighbor parameter and embedding dimension, the number of training samples also influences the recognition rate. For both methods, we use three, four, five, six, seven and eight images per person as the training set and the remaining images as the test set. Ten-fold cross-validation is used to evaluate the results, with the neighbor parameter and the embedding dimension still set to 10. The results are shown in Figure 3.


Figure 3. The relationship between accuracy and the number of training samples
3.2. The Choice of Neighbor Parameter
ISOMAP assumes that the image data are locally linear. When computing the low-dimensional manifold structure, its success therefore depends largely on the number of neighbors. The neighborhood of a point is usually determined by two factors: the distance measure and the size of the neighbor parameter. In the experiment, we set the neighbor parameter to 10. Figure 4 shows how different parameter values affect the recognition rate on the face image data.
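One practical way to see why the neighbor parameter matters is to count the connected components of the k-nearest-neighbor graph: when k is too small, the graph falls apart into several pieces and ISOMAP cannot compute geodesic distances between them. The following sketch (hypothetical function `knn_components`, Euclidean distances, symmetrized graph) counts components with a breadth-first search.

```python
from collections import deque

import numpy as np


def knn_components(X, k):
    """Number of connected components of the symmetrized kNN graph."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # squared distances
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]                 # skip the point itself
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            adj[i].add(int(j))
            adj[int(j)].add(i)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1                      # breadth-first search from a new root
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
    return components


# Two parallel lines of 10 points each, 1000 units apart
line = np.column_stack([np.arange(10.0), np.zeros(10)])
X = np.vstack([line, line + [0.0, 1000.0]])
for k in (3, 10):
    print(k, knn_components(X, k))   # prints "3 2", then "10 1"
```

Sweeping k upward from a small value and keeping the smallest k that yields a single component gives only a lower bound for the neighbor parameter; it does not detect the shortcut edges that an overly large k can introduce.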

Figure 4. The relationship between Accuracy and Neighborhood size
3.3. The Choice of Embedding Dimension
The intrinsic dimension of the sample data is the dimension of the low-dimensional manifold embedded in the high-dimensional space, and it affects the embedding results in the low-dimensional space. Therefore, manifold dimensionality reduction of high-dimensional data requires estimating the intrinsic dimension, that is, the dimension of the low-dimensional space into which the data are embedded. On the ORL face images, this experiment applies two methods, the vector reconstruction residual and maximum likelihood estimation, to estimate the dimension. The results are shown in Figure 5 and Figure 6.
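The residual variance $1 - \rho^2(D_G, D_Y)$ used for dimension estimation can be computed as follows. The sketch assumes that a geodesic distance matrix $D_G$ is already available and that embedding coordinates can be produced for each candidate dimension d; the helper names are hypothetical, and the toy data below only illustrate the computation. Only the upper triangle of each matrix is used, so each pair of points is counted once.

```python
import numpy as np


def pairwise_distances(Y):
    """Euclidean distance matrix of the rows of Y."""
    Y = np.asarray(Y, dtype=float)
    return np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))


def residual_variance(DG, DY):
    """1 - rho^2, rho = correlation between the upper triangles of DG and DY."""
    iu = np.triu_indices_from(DG, k=1)       # each pair of points once
    rho = np.corrcoef(DG[iu], DY[iu])[0, 1]
    return float(1.0 - rho ** 2)


# Toy check: keep 1 or 2 coordinates of 2-D points and compare with the full distances
rng = np.random.default_rng(0)
P = rng.random((30, 2))
DG = pairwise_distances(P)                   # stands in for a geodesic distance matrix
for d in (1, 2):
    print(d, residual_variance(DG, pairwise_distances(P[:, :d])))
```

Evaluating the residual variance for d = 1, 2, and so on, and looking for the point where the curve stops decreasing noticeably, is the usual way a plot such as Figure 6 is read.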


Figure 5. Accuracy on Intrinsic dimension

Figure 6. Residual variance on Intrinsic dimension

The neighborhood parameter is set to 10 in the experiment. Figure 5 shows that, with the number of neighbors fixed, the recognition rate rises gradually as the embedding dimension increases and eventually approaches a limiting value; on the ORL face dataset, an embedding dimension of 10 achieves a high recognition rate. Figure 6 shows the relationship between the residual variance and the intrinsic dimension. The ORL face database is therefore more complex than the Swiss Roll or Yale face datasets, since it contains faces of 40 people with different poses and expressions, which is consistent with our understanding of face images.
4. Conclusion
This paper combines the SIFT feature extraction algorithm with the manifold learning algorithm ISOMAP to conduct image retrieval experiments on the ORL face image dataset. The experiments show that the proposed method, which extracts features with SIFT and reduces their dimensionality with ISOMAP, achieves better recognition performance. During dimensionality reduction, ISOMAP depends on the neighbor parameter k and the intrinsic dimension d, which are difficult to choose; this paper uses the minimum residual error and maximum likelihood estimation methods to estimate and select the neighbor parameter and the size of the intrinsic dimension. Experiments show that appropriate choices of the neighbor parameter and the intrinsic dimension have a large impact on recognition performance.
Acknowledgments
The work is partially supported by the National Natural Science Foundation of China (No.61375121), the Research Funds of Natural Scientific and Teaching Reform and Top-notch Academic Programs for Jiangsu Higher Education Institutions (Nos.14KJD520003, 2015JSJG163, PPZY2015B140), and the Scientific Research Foundation of Jinling Institute of Technology (No.jit-rcyj-201505), and is sponsored by the Funds for Nanjing Creative Team of Swarm Computing & Smart Software led by Prof. S.B. Su (Corresponding author).
References
[1] Christopher KW, Scott HH, Mevin BH. Guest Editor’s Introduction to the Special Issue on Modern Dimension Reduction Methods for Big Data Problems in Ecology. Journal of Agricultural, Biological, and Environmental Statistics. 2016; 18(3): 271-273.
[2] Hayato I, Atsushi I, Tomoya S. Dimension Reduction and Construction of Feature Space for Image Pattern Recognition. Journal of Mathematical Imaging and Vision. 2016; 56(1): 1-31.
[3] Kian H, Seyyed AS, Reza A. Video-based face recognition and image synthesis from rotating head frames using nonlinear manifold learning by neural networks. Neural Computing and Applications. 2016; 27(6): 1761-1769.
[4] Chen J, Liu Y. Locally linear embedding: a survey. Artificial Intelligence Review. 2011; 36(1): 29-48.
[5] Zhang Y, Li BB, Wang ZB, Wang W, Wang L. Fault diagnosis of rotating machine by isometric feature mapping. Journal of Mechanical Science and Technology. 2013; 27(11): 3215-3221.
[6] Yin HJ. Advances in adaptive nonlinear manifolds and dimensionality reduction. Frontiers of Electrical and Electronic Engineering. 2011; 6(1): 72-85.
[7] Paganelli C, Peroni M, Riboldi M, et al. Scale Invariant Feature Transform in Adaptive Radiation Therapy: A Tool for Deformable Image Registration Assessment and Re-planning Indication. Physics in Medicine and Biology. 2013; 58(2): 287-299.
[8] Liu XF, Zheng XD, Xu GC, Wang L, Yang H. Locally linear embedding-based seismic attribute extraction and applications. Applied Geophysics. 2010; 7(4): 365-375.
[9] Chang H, Yeung D. Robust locally linear embedding. Pattern Recognit. 2006; 39(6): 1053-1065.
[10] Abdar M, Kalhori SRN, Sutikno T, Subroto IMI, Arji G. Comparing Performance of Data Mining Algorithms in Prediction Heart Diseases. International Journal of Electrical and Computer Engineering (IJECE). 2015; 5(6): 1569-1576.
[11] Sekhar A, Raghavendrarajan V, Kumar RH, et al. An Interconnected Wind Driven SEIG System Using SVPWM Controlled TL Z-Source Inverter Strategy for Off-Shore WECS. Indonesian Journal of Electrical Engineering and Informatics (IJEEI). 2013; 1(3): 89-98.
[12] Goldberg Y, Ritov Y. LDR-LLE: LLE with low-dimensional neighborhood representation. ISVC. 2008: 43-54.
[13] Zhang S. Enhanced supervised locally linear embedding. Pattern Recognit. 2009; 30(13): 1208-1218.
[14] Farmohammadi L, Menhaj MB. Facial Expression Recognition Based on Facial Motion Patterns. Indonesian Journal of Electrical Engineering and Informatics. 2015; 3(4): 177-184.
[15] Kalpana J, Krishnamoorthi R. Color image retrieval technique with local features based on orthogonal polynomials model and SIFT. Multimedia Tools and Applications. 2016; 75(1): 49-69.
