ISPRS Archives, Volume XLI-B1, 2016, p. 639

AUTOMATIC ADJUSTMENT OF WIDE-BASE GOOGLE STREET VIEW PANORAMAS

KEY WORDS: panoramas, wide baseline, bundle adjustment, image matching, cartographic projection, projective transformation

ABSTRACT:

This paper focuses on the issue of sparse matching in cases of extremely wide-base panoramic images such as those acquired by Google Street View in narrow urban streets. In order to effectively use affine point operators for bundle adjustment, panoramas must be suitably rectified to simulate affinity. To this end, a custom piecewise planar projection (triangular prism projection) is applied. On the assumption that the image baselines run parallel to the street façades, the estimated locations of the vanishing lines of the façade plane allow effectively removing projectivity and applying the ASIFT point operator on panorama pairs. Results from comparisons with multi-panorama adjustment, based on manually measured image points, and ground truth indicate that such an approach, if further elaborated, may well provide a realistic answer to the matching problem in the case of demanding panorama configurations.

1. INTRODUCTION

Thanks to their obvious advantages, spherical panoramic images represent today an increasingly common type of imagery. They provide an omnidirectional field of view, thus potentially reducing the number of required images and also providing far more comprehensive views. They may be generated in various ways, yet it is today rather easy to produce panoramas with low-cost equipment and freely available software for automatically stitching together homocentric images onto a sphere, and subsequently mapping them in suitable cartographic projections (Szeliski & Shum, 1997; Szeliski, 2006). Spherical panoramas are thus being exploited in several contexts, including indoor navigation, virtual reality applications and, notably, cultural heritage documentation, where the use of panoramas is now regarded as a ‘natural extension of the standard perspective images’ (Pagani et al., 2011).

Of course, most important is the availability of street-level panoramas, such as those provided by Google. Its popular service Google Street View (GSV) is a vast dataset with regularly updated, geo-tagged panoramic views of most main streets and roads in several parts of the world, typically acquired at intervals of ~12 m by camera clusters mounted on moving vehicles. Application areas of such pictorial information range, for instance, from space intersection (Tsai & Chang, 2013) to image-based modeling (Torii et al., 2009; Ventura & Höllerer, 2013), vision-based assistance systems (Salmen et al., 2012) and localization or trajectory estimation of a moving camera (Taneja et al., 2014; Agarwal et al., 2015).

A central question regarding the metric exploitation of panoramas is their registration (bundle adjustment). Due to its omnidirectional nature, a spherical panorama has the properties of a sphere, i.e. it defines a bundle of 3D rays. In this sense, the issue of “interior orientation” (camera geometry) appears in this case to be irrelevant.
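The notion of a panorama as a bundle of 3D rays can be illustrated with a minimal sketch. It assumes, purely for illustration, that the panorama is stored in the equirectangular (plate carrée) projection; the function names `pixel_to_ray` and `ray_to_pixel` and the axis convention (y towards the image centre, z up) are hypothetical choices and are not taken from this paper.

```python
import math


def pixel_to_ray(col, row, width, height):
    """Map an equirectangular pixel to a unit 3D ray.

    Assumed convention (illustrative only): columns span longitude
    [-pi, pi), rows span latitude from +pi/2 (top) to -pi/2 (bottom).
    """
    lon = (col / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (row / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.cos(lat) * math.cos(lon)
    z = math.sin(lat)
    return (x, y, z)


def ray_to_pixel(ray, width, height):
    """Inverse mapping: project a 3D ray (any length) back to pixel coordinates."""
    x, y, z = ray
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, y)
    lat = math.asin(z / norm)
    col = (lon + math.pi) / (2.0 * math.pi) * width
    row = (math.pi / 2.0 - lat) / math.pi * height
    return (col, row)


if __name__ == "__main__":
    # The image centre maps to the forward-looking ray under this convention.
    print(pixel_to_ray(2048, 1024, 4096, 2048))
```

Under these assumptions, image measurements enter an adjustment only through such rays, so the mapping itself takes over the role normally played by the camera model.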
However, the particular cartographic projection of the panorama on which image measurements will take place must of course be known; this projection in fact represents the interior orientation of a panorama (Tsironis, 2015). Panoramas in a known projection each have, therefore, 6 degrees of freedom. If no ground control is available, the 7 parameters of a 3D similarity transformation need to be fixed. Thus, for instance, Aly & Bouguet (2012) adjust unordered sets of spherical panoramas to estimate their relative pose up to a global scale. Of course, several simplifications are possible if camera movement is assumed to be somehow constrained (e.g. in Fangi, 2015, small angles are assumed).

A crucial related issue is, of course, automatic point extraction, description and matching. Although spherical operators have indeed been suggested (see Hansen et al., 2010; Cruz-Mota et al., 2012), practically all researchers rely on standard planar point operators such as SIFT, SURF and ASIFT. Several alternatives have been reported. Agarwal et al. (2015) thus use conventional frames, which Google provides on request for a specified virtual camera, and match them via SIFT to the image sequence. Mičušík & Košecká (2009) and Zamir & Shah (2011), on the other hand, employ rectilinear cubic projections and SURF or SIFT operators for street panoramas. Majdik et al. (2013) generate artificial affine views of the scene in order to overcome the large viewpoint differences between GSV and low-altitude images. Others (Torii et al., 2009; Ventura & Höllerer, 2013) match directly on the spherical GSV panoramas, but use much denser image sequences than those freely available from Google. Finally, Sato et al. (2011) have suggested the introduction of further constraints into the RANSAC outlier detection process to support automatic establishment of correspondences between wide-base GSV panoramas.

E. Boussias-Alexakis a, V. Tsironis a, E. Petsa b, G. Karras a

a Laboratory of Photogrammetry, Department of Surveying, National Technical University of Athens, GR-15780 Athens, Greece (bousias.alexakisgmail.com, tsironisbime.com, gkarrascentral.ntua.gr)
b Laboratory of Photogrammetry, Department of Civil Engineering and Surveying & Geoinformatics Engineering, Technological Educational Institute of Athens, GR-12210 Athens, Greece (petsateiath.gr)

Commission I, Working Group ICWG IVa

This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XLI-B1-639-2016

In order to match directly on the spherical panoramas with planar operators, the image base needs to be relatively short, as is the case in most of the publications cited above. To our knowledge, only Sato et al. (2011) have worked on directly matching between standard wide-base GSV panoramas. Such a solution assumes that tentative matches have already been established (e.g. by SIFT, SURF, ASIFT). The concept “wide-base”, however, does not refer to the absolute size of the image base itself, but rather to the base-to-distance ratio, which in fact determines the intersection angle of homologous rays. Our contribution focuses on matching standard GSV panoramas of rather narrow streets in densely built urban areas. In this context, a street of ~8 m width recorded from the street center-line at a step of ~12 m produces very unfavourable base-to-distance ratios of about 3:1 with respect to the street façade (a base of ~12 m against a camera-to-façade distance of ~4 m; in this sense one might speak of ‘ultra wide bases’). Such configurations produce large scale variations and strong incompatibilities between the distortions of projected panoramas, plus more occluded areas. In our experience, even the ASIFT operator produced only a few valid matches along the baseline, namely close to the two vanishing points of this direction, when the street ended at streets perpendicular to it.

Mičušík & Košecká (2009) point out that panorama representation via piecewise perspective, i.e.
projection onto a quadrangular prism rather than on a cylinder, permits point matching algorithms to perform better, since their assumption of locally affine distortions is expected to be more realistic for perspective images than for cylindrical panoramas. Corresponding tentatively matched 3D rays may then be validated via robust epipolar geometry estimation to produce the essential matrix E. However, it would be clearly preferable to create virtual views of panoramas as close as possible to affinity (as did Majdik et al., 2013, in order to register frames to panoramas) and subsequently apply the affine operator ASIFT developed by Morel & Yu (2009). Thus, the main purpose of this contribution is to describe, implement and evaluate such an alternative for “ultra wide-base” panoramas. Results are given and assessed in terms of the 3D measurements performed and the accuracies achieved.

2. RETRIEVAL AND ADJUSTMENT OF PANORAMAS

2.1 Retrieval of Google Street View panoramic images