QUALITY CONTROL isprsarchives XL 2 W1 165 2013

sourcing data quality. In this quality model, completeness, thematic accuracy, and positional accuracy are the three quality elements, and the quality of the road data is assessed by calculating and analysing them. Drawing on that model and taking the characteristics of check-in data into account, this paper proposes a quality model for location check-in data. Its quality elements are classification accuracy, degree of matching, and positional accuracy. Specifically, classification accuracy measures how accurate the classification attributes are, degree of matching indicates how far the check-in data and the standard data overlap in content coverage, and positional accuracy is expressed by the spatial offsets between the check-in data and the standard data.

2.2 Quality Analysis Approach

Matching the location check-in data against the standard data is the basis of the quality analysis. Only the records that are matched successfully take part in the analysis. The unmatched records carry no weight in the quality analysis but are valuable for POI updates. After matching, classification accuracy is calculated by comparing the classification attributes of the two datasets, degree of matching is derived from the count of successfully matched records, and positional accuracy is assessed by a statistical analysis of the spatial offsets between each check-in record and its corresponding standard record. The technical workflow of the proposed quality analysis approach is shown in Figure 1.

Figure 1. Flow chart of the quality analysis
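As an illustration only, the three quality elements could be computed from already matched records along the following lines. This is a minimal sketch, not the procedure used in this paper: the record layout (dictionaries with "category", "x", "y" in planar coordinates), the function name quality_metrics, and the choice of the total number of check-in records as the denominator for the degree of matching are all assumptions.

```python
import math


def quality_metrics(checkins, standard, matches):
    """Compute the three quality elements for matched records.

    checkins, standard: dict mapping record id -> {"category", "x", "y"}
                        (hypothetical layout, planar coordinates assumed).
    matches: list of (checkin_id, standard_id) pairs from the matching step.
    """
    if not matches:
        raise ValueError("no matched records to analyse")

    same_class = 0
    offsets = []
    for checkin_id, standard_id in matches:
        c, s = checkins[checkin_id], standard[standard_id]
        # Classification accuracy: do the classification attributes agree?
        if c["category"] == s["category"]:
            same_class += 1
        # Positional accuracy: Euclidean offset between the two locations.
        offsets.append(math.hypot(c["x"] - s["x"], c["y"] - s["y"]))

    n = len(matches)
    return {
        "classification_accuracy": same_class / n,
        # Assumption: matched records relative to all check-in records.
        "degree_of_matching": n / len(checkins),
        "mean_offset": sum(offsets) / n,
        "rmse_offset": math.sqrt(sum(d * d for d in offsets) / n),
    }
```

Reporting both the mean and the root-mean-square offset is one possible choice; any other summary statistic of the offsets would fit the same structure.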

3. QUALITY CONTROL

After the quality analysis, a data processing procedure is carried out to perform quality control on the check-in data. The quality control procedure proposed in this paper consists of a pre-processing operation and a spatial registration operation. Pre-processing controls the quality of the attribute information, while spatial registration, the main emphasis of the data processing, improves the positional accuracy of the check-in data. The RANSAC algorithm is adopted in this paper to establish the model, and the resulting model is used in the registration operation to achieve positional quality control.

3.1 The RANSAC Algorithm

The Random Sample Consensus (RANSAC) algorithm is designed to estimate the parameters of a mathematical model from observed data that contain abnormal values. It was originally put forward by Fischler and Bolles in 1981 (Fischler, 1981). The basic assumption of RANSAC is that the samples contain inliers, which can be described by a certain model, as well as outliers, which are distinctly abnormal and do not fit that model. In short, the dataset contains noise. The algorithm further assumes that, given a group of inliers, a set of model parameters that describes these data exists and can be obtained by calculation. RANSAC is highly robust: it can estimate accurate parameters from a dataset that contains a considerable proportion of outliers. It is therefore suitable for establishing an optimal model from a dataset with relatively large deviations.

3.2 Pre-processing

Data pre-processing includes the completion of missing attributes and the amalgamation of repeated records. Missing attributes are completed according to a standard format so that the data remain usable. Repeated data records are merged to reduce redundancy (Du, 2011).
The amalgamation of different aliases, nicknames, and standard names of the same geographic object can be achieved by comparing the check-in data records with a POI data dictionary (Wu, 2012).

3.3 Spatial Registration

The spatial registration process is required to reduce offset errors and improve positional accuracy. The RANSAC algorithm is applied in this paper to estimate the affine transformation model that relates the location check-in data to the corresponding standard data. The basic idea of the algorithm is to estimate the model parameters from basic data subsets obtained by repeated random sampling (Shan, 2006). To obtain the optimal model by data fitting, the size of each random sample is kept small, which means that the minimum number of data points needed to determine the model must be specified. In this paper, the affine transformation formula is used as the model. A planar affine transformation has six parameters and each point pair contributes two equations, so three non-collinear point pairs are the theoretical minimum; in this paper, four point pairs are sampled for each solution. The spatial registration process is as follows.

Initialization: Initialize the model by randomly selecting four point pairs from the set.

Parameter Estimation: Identify the inlier set that fits the current model within a distance threshold. If the size of this set is larger than a pre-defined threshold, re-estimate the model parameters using this set.

Optimal Model Solution: Define a suitable iteration count. Over these iterations, use the largest inlier set to re-estimate the model parameters and obtain the optimal model (Qu, 2010).
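The three steps above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names (solve3, fit_affine, apply_affine, ransac_affine), the default thresholds, and the least-squares fit through normal equations are all hypothetical choices. The affine model is taken to be u = a*x + b*y + c and v = d*x + e*y + f, and the default sample size of four follows the text.

```python
import math
import random


def solve3(m, v):
    """Solve the 3x3 system m p = v by Gauss-Jordan elimination.

    Returns None when the system is (numerically) singular,
    for example when the sampled points are collinear.
    """
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return None
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(pairs):
    """Least-squares affine fit from [((x, y), (u, v)), ...].

    Returns (a, b, c, d, e, f) or None for a degenerate sample.
    """
    m = [[0.0] * 3 for _ in range(3)]
    rhs_u = [0.0] * 3
    rhs_v = [0.0] * 3
    for (x, y), (u, v) in pairs:
        row = (x, y, 1.0)
        for i in range(3):
            rhs_u[i] += row[i] * u
            rhs_v[i] += row[i] * v
            for j in range(3):
                m[i][j] += row[i] * row[j]
    pu, pv = solve3(m, rhs_u), solve3(m, rhs_v)
    if pu is None or pv is None:
        return None
    return tuple(pu + pv)


def apply_affine(params, point):
    """Map a check-in location into the standard data frame."""
    a, b, c, d, e, f = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


def ransac_affine(pairs, sample_size=4, tol=1.0, min_inliers=6,
                  iterations=200, seed=0):
    """Return (model, inliers) for the largest consensus set found."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Initialization: fit a model to a small random sample.
        model = fit_affine(rng.sample(pairs, sample_size))
        if model is None:
            continue
        # Parameter estimation: collect pairs within the distance threshold.
        inliers = [p for p in pairs
                   if math.dist(apply_affine(model, p[0]), p[1]) <= tol]
        # Optimal model: keep the refit on the largest inlier set so far.
        if len(inliers) >= min_inliers and len(inliers) > len(best_inliers):
            refit = fit_affine(inliers)
            if refit is not None:
                best_model, best_inliers = refit, inliers
    return best_model, best_inliers
```

Once a model is returned, registration amounts to calling apply_affine(model, point) on each check-in location. For coordinates with large magnitudes, such as projected coordinates in metres, centring the points before the fit would likely be needed to keep the normal equations well conditioned.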

4. UNCERTAINTY MODELING