sourcing data quality. In this quality model, completeness, thematic accuracy, and positional accuracy are the three
quality elements, and the quality of the road data is assessed by calculating and analysing them.
Building on this quality model and taking the characteristics of check-in data into account,
this paper proposes a quality model for location check-in data. In this model,
classification accuracy, degree of matching, and positional accuracy are selected as the quality elements. Specifically,
classification accuracy is the accuracy of the classification attributes, degree of matching indicates the overlap
between check-in data and standard data in content coverage, and positional accuracy is expressed by the offsets between
check-in data and standard data in spatial location.
2.2 Quality Analysis Approach
The matching between location check-in data and standard data is the basis of the quality analysis. Only the records that
are successfully matched in the matching operation take part in the analysis. The unmatched records are not used in the
quality analysis, but they are valuable for POI updating. After data matching, classification accuracy is
calculated by comparing the classification attributes of the two datasets. The degree of matching
is derived from the counts of successfully matched records. Similarly, positional accuracy
is assessed by a statistical analysis of the spatial offsets between check-in records and the corresponding standard
records. The technical workflow of the proposed quality analysis
approach is shown in Figure 1.
Figure 1. Flow chart of the quality analysis
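The three quality elements described above can be illustrated with a minimal sketch. The paper does not give explicit formulas, so the definitions below are assumptions: classification accuracy is taken as the share of matched records whose class agrees with the standard data, degree of matching as the share of check-in records that were matched, and positional accuracy as the mean and root-mean-square offset. The function name, the dictionary keys, and the use of planar coordinates in metres are hypothetical.

```python
from math import hypot, sqrt
from statistics import mean


def quality_metrics(matched, n_checkin):
    """Compute the three quality elements from matched record pairs.

    matched   -- list of dicts, one per successfully matched check-in record,
                 with hypothetical keys 'checkin_class', 'standard_class',
                 'checkin_xy' and 'standard_xy' (planar coordinates, metres).
    n_checkin -- total number of check-in records submitted to matching.
    """
    if not matched or n_checkin < len(matched):
        raise ValueError("need at least one match and n_checkin >= matches")

    # Assumed definition: share of matched records with identical class labels.
    agree = sum(1 for m in matched if m["checkin_class"] == m["standard_class"])

    # Euclidean offset between each check-in location and its standard location.
    offsets = [
        hypot(m["checkin_xy"][0] - m["standard_xy"][0],
              m["checkin_xy"][1] - m["standard_xy"][1])
        for m in matched
    ]

    return {
        "classification_accuracy": agree / len(matched),
        # Assumed definition: matched records over all check-in records.
        "degree_of_matching": len(matched) / n_checkin,
        "mean_offset": mean(offsets),
        "rmse_offset": sqrt(mean(d * d for d in offsets)),
    }
```

Other statistics of the offsets (median, standard deviation, percentiles) could be reported in the same way; the choice made here is only illustrative.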
3. QUALITY CONTROL
After quality analysis, the check-in data are processed in order to perform quality control. The quality control
procedure proposed in this paper consists of a pre-processing operation and a spatial registration operation.
The pre-processing operation controls the quality of the attribute information, while spatial
registration, the focus of the data processing, improves the positional accuracy of the check-in data. The
RANSAC algorithm is adopted in this paper to establish the model, and the resulting model is used in the registration
operation to achieve positional quality control.
3.1 The RANSAC Algorithm
The Random Sample Consensus (RANSAC) algorithm is designed to estimate the parameters of a mathematical model
from observed data that contain abnormal values. It was originally proposed by Fischler and Bolles in 1981 (Fischler,
1981). The basic assumption of RANSAC is that the samples contain inliers, which can be described by a certain model, as
well as outliers, which are distinctly abnormal and cannot be fitted by the model. In short, the data contain noise. In addition,
the algorithm assumes that, given a group of inliers, a set of model parameters that describes these data
exists and can be computed. The RANSAC algorithm is highly robust: it can
estimate accurate parameters from a dataset containing a considerable share of outliers. It is therefore suitable for
establishing an optimal model from a dataset with relatively large deviations.
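How the share of outliers affects the effort RANSAC needs can be illustrated with the standard relation from the RANSAC literature between the inlier ratio and the number of random samples required. This relation is not stated in the paper; the function and parameter names below are hypothetical, and the sketch assumes that samples are drawn independently.

```python
from math import ceil, log


def ransac_iterations(success_prob, inlier_ratio, sample_size):
    """Number of random samples needed so that, with probability
    success_prob, at least one sample contains only inliers."""
    if not (0.0 < success_prob < 1.0 and 0.0 < inlier_ratio < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    # Probability that one random sample consists of inliers only.
    p_clean = inlier_ratio ** sample_size
    # Solve 1 - (1 - p_clean)^n >= success_prob for n.
    return ceil(log(1.0 - success_prob) / log(1.0 - p_clean))
```

For instance, if half of the records were inliers and each sample held four point pairs, `ransac_iterations(0.99, 0.5, 4)` would suggest 72 iterations.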
3.2 Pre-processing
Data pre-processing includes the completion of missing attributes and the merging of duplicate records. Missing
attributes are completed according to a standard format so that the data remain usable.
Duplicate records are merged to reduce redundancy (Du, 2011). The different aliases, nicknames,
and standard names of the same geographic object are merged by comparing the check-in data records against a POI data
dictionary (Wu, 2012).
3.3 Spatial Registration
Spatial registration is required to reduce the offset error and improve positional accuracy. The RANSAC algorithm is
applied in this paper to estimate the affine transformation model between location check-in data and the
corresponding standard data. The basic idea of the algorithm is that the model is estimated from basic data subsets
obtained by repeated random sampling (Shan, 2006). To obtain the optimal model by data fitting, the size of the
randomly selected sample must be fixed, that is, the minimum number of data items needed to determine the model must be
specified. In this paper, the affine transformation is used as the model, and four point pairs are sampled
to solve for its six parameters. The spatial registration process is as follows.
Initialization: Initialize the model by randomly selecting four point pairs from the data set.
Parameter Estimation: Using an error threshold, identify the set of inliers that fit the current model. If the size of this set is larger
than a pre-defined threshold, re-estimate the model parameters from this set.
Optimal Model Solution: Define a suitable iteration count. Over these iterations, use the largest inlier set to
re-estimate the model parameters and obtain the optimal model (Qu, 2010).
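The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper. The affine model is assumed to take the common form x' = a0 + a1*x + a2*y, y' = b0 + b1*x + b2*y. Each point pair yields two equations, so three non-collinear pairs already determine the six parameters, and a sample of four pairs is solved by least squares. All function names, the error tolerance, the consensus threshold, and the iteration count are hypothetical, and coordinates are assumed to be in a local planar frame of moderate magnitude.

```python
import random


def solve3(m, v):
    """Solve the 3x3 system m * p = v by Gauss-Jordan elimination with
    partial pivoting. Returns None when the system is (near) singular."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return None  # degenerate sample, e.g. collinear points
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(pairs):
    """Least-squares affine fit via the normal equations.
    pairs is a list of ((x, y), (u, v)): check-in point, standard point."""
    rows = [(1.0, x, y) for (x, y), _ in pairs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atu = [sum(r[i] * t[0] for r, (_, t) in zip(rows, pairs)) for i in range(3)]
    atv = [sum(r[i] * t[1] for r, (_, t) in zip(rows, pairs)) for i in range(3)]
    a, b = solve3(ata, atu), solve3(ata, atv)
    return None if a is None or b is None else (a, b)


def apply_affine(model, pt):
    """Map a check-in point into the standard data frame."""
    (a0, a1, a2), (b0, b1, b2) = model
    x, y = pt
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y


def residual(model, pair):
    """Distance between the transformed check-in point and its standard point."""
    pu, pv = apply_affine(model, pair[0])
    u, v = pair[1]
    return ((pu - u) ** 2 + (pv - v) ** 2) ** 0.5


def ransac_affine(pairs, sample_size=4, tol=1.0, min_consensus=6,
                  iterations=200, seed=0):
    """Return (model, inliers) for the largest consensus set found."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Initialization: model from a random sample of point pairs.
        model = fit_affine(rng.sample(pairs, sample_size))
        if model is None:
            continue
        # Parameter estimation: inliers of the current model.
        inliers = [p for p in pairs if residual(model, p) <= tol]
        # Optimal model: keep the re-estimated model of the largest inlier set.
        if len(inliers) > min_consensus and len(inliers) > len(best_inliers):
            refit = fit_affine(inliers)
            if refit is not None:
                best_model, best_inliers = refit, inliers
    return best_model, best_inliers
```

Once a model has been obtained, `apply_affine(model, point)` would register each check-in location. For geographic coordinates with large absolute values, centring the points before fitting is advisable, because the normal equations used here are sensitive to poor conditioning.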
4. UNCERTAINTY MODELING