AUTOMATIC ADJUSTMENT OF WIDE-BASE GOOGLE STREET VIEW PANORAMAS
KEY WORDS: panoramas, wide baseline, bundle adjustment, image matching, cartographic projection, projective transformation
ABSTRACT:
This paper focuses on the issue of sparse matching in cases of extremely wide-base panoramic images, such as those acquired by Google Street View in narrow urban streets. In order to effectively use affine point operators for bundle adjustment, panoramas must be suitably rectified to simulate affinity. To this end, a custom piecewise planar projection (triangular prism projection) is applied. On the assumption that the image baselines run parallel to the street façades, the estimated locations of the vanishing lines of the façade plane allow projectivity to be effectively removed and the ASIFT point operator to be applied to panorama pairs. Results from comparisons with multi-panorama adjustment, based on manually measured image points, and with ground truth indicate that such an approach, if further elaborated, may well provide a realistic answer to the matching problem in the case of demanding panorama configurations.
1. INTRODUCTION
Thanks to their obvious advantages, spherical panoramic images are today an increasingly common type of imagery. They provide an omnidirectional field of view, thus potentially reducing the number of required images while also providing far more comprehensive views. They may be generated in various ways, yet it is today rather easy to produce panoramas with low-cost equipment and freely available software which automatically stitches homocentric images together onto a sphere and subsequently maps them in suitable cartographic projections (Szeliski & Shum, 1997; Szeliski, 2006). Spherical panoramas are thus being exploited in several contexts, including indoor navigation, virtual reality applications and, notably, cultural heritage documentation, where the use of panoramas is now regarded as a ‘natural extension of the standard perspective images’ (Pagani et al., 2011).
Of particular importance is the availability of street-level panoramas, such as those provided by Google. Its popular service Google Street View (GSV) is a vast dataset of regularly updated, geo-tagged panoramic views of most main streets and roads in several parts of the world, typically acquired at intervals of ~12 m by camera clusters mounted on moving vehicles. Application areas of such pictorial information range, for instance, from space intersection (Tsai & Chang, 2013) to image-based modeling (Torii et al., 2009; Ventura & Höllerer, 2013), vision-based assistance systems (Salmen et al., 2012) and localization or trajectory estimation of a moving camera (Taneja et al., 2014; Agarwal et al., 2015).
A central question regarding the metric exploitation of panoramas is their registration (bundle adjustment). Due to its omnidirectional nature, a spherical panorama has the properties of a sphere, i.e. it defines a bundle of 3D rays. In this sense, the issue of “interior orientation” (camera geometry) appears in this case to be irrelevant. However, the particular cartographic projection of the panorama on which image measurements will take place must of course be known; this projection in fact represents the interior orientation of a panorama (Tsironis, 2015). Panoramas in a known projection therefore have 6 degrees of freedom each. If no ground control is available, the 7 parameters of a 3D similarity transformation need to be fixed. Thus, for instance, Aly & Bouguet (2012) adjust unordered sets of spherical panoramas to estimate their relative pose up to a global scale. Of course, several simplifications are possible if camera movement is assumed to be somehow constrained (e.g. in Fangi, 2015, small angles are assumed).
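The role of the cartographic projection as interior orientation can be illustrated with a minimal sketch which maps a pixel of a projected panorama to a unit 3D ray. The equirectangular layout, the axis conventions and the function name `equirect_pixel_to_ray` are assumptions made here for illustration only; they are not taken from the text above.

```python
import math

def equirect_pixel_to_ray(col, row, width, height):
    """Map a pixel of an equirectangular panorama to a unit 3D ray.

    Assumed (hypothetical) conventions: longitude runs from -pi to +pi
    across the image width, latitude from +pi/2 (top row) to -pi/2
    (bottom row); y points to the image centre, z points up.
    """
    lon = (col / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (row / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.cos(lat) * math.cos(lon)
    z = math.sin(lat)
    return (x, y, z)

# The image centre maps to the ray along the assumed +y axis.
print(equirect_pixel_to_ray(1000, 500, 2000, 1000))
```

Once the projection is known in this sense, only the 6 exterior orientation parameters of each panorama remain to be estimated.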
A crucial related issue is, of course, automatic point extraction, description and matching. Although spherical operators have indeed been suggested (see Hansen et al., 2010; Cruz-Mota et al., 2012), practically all researchers rely on standard planar point operators such as SIFT, SURF and ASIFT. Several alternatives have been reported. Agarwal et al. (2015) thus use conventional frames, which Google provides on request for a specified virtual camera, and match them via SIFT to the image sequence. Mičušík & Košecká (2009) and Zamir & Shah (2011), on the other hand, employ rectilinear cubic projections and SURF or SIFT operators for street panoramas. Majdik et al. (2013) generate artificial affine views of the scene in order to overcome the large viewpoint differences between GSV and low-altitude images. Others (Torii et al., 2009; Ventura & Höllerer, 2013) match directly on the spherical GSV panoramas, but use much denser images than those made freely available by Google. Finally, Sato et al. (2011) have suggested the introduction of further constraints into the RANSAC outlier detection process to support the automatic establishment of correspondences between wide-base GSV panoramas.
E. Boussias-Alexakis a, V. Tsironis a, E. Petsa b, G. Karras a
a Laboratory of Photogrammetry, Department of Surveying, National Technical University of Athens, GR-15780 Athens, Greece bousias.alexakisgmail.com, tsironisbime.com, gkarrascentral.ntua.gr
b Laboratory of Photogrammetry, Department of Civil Engineering and Surveying Geoinformatics Engineering, Technological Educational Institute of Athens, GR-12210 Athens, Greece petsateiath.gr
Commission I, Working Group ICWG IVa
This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XLI-B1-639-2016
In order to match directly on the spherical panoramas with planar operators, the image base needs to be relatively short, as is the case in most of the publications cited above. To our knowledge, only Sato et al. (2011) have worked on directly matching between standard wide-base GSV panoramas. Such a solution assumes that tentative matches have already been established (e.g. by SIFT, SURF, ASIFT). The concept “wide-base”, however, does not refer to the absolute size of the image base itself, but rather to the base-to-distance ratio, which in fact determines the intersection angle of homologous rays. Our contribution focuses on matching standard GSV panoramas of rather narrow streets in densely built urban areas. In this context, a street of ~8 m width recorded from the street center-line at a step of ~12 m produces very unfavourable base-to-distance ratios of about 3:1 with respect to the street façade (in this sense one might speak of ‘ultra wide bases’). Such configurations produce large scale variations and strong incompatibilities between the distortions of projected panoramas, as well as more occluded areas. It was thus found that even the ASIFT operator could produce only a few valid matches along the baseline, namely close to the two vanishing points of this direction, when the street ended at streets perpendicular to it.
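The quoted base-to-distance ratio can be checked with a short worked example. It uses the approximate values given above (street width ~8 m, recording step ~12 m, camera on the street center-line). The function names are hypothetical, and the choice of a façade point lying opposite the midpoint of the base for the intersection angle is an assumption made here for illustration.

```python
import math

def base_to_distance_ratio(base_m, street_width_m):
    """Ratio of the image base to the camera-to-facade distance.

    The camera is assumed to travel along the street centre-line, so the
    distance to the facade is half the street width.
    """
    distance_m = street_width_m / 2.0
    return base_m / distance_m

def intersection_angle_deg(base_m, street_width_m):
    """Intersection angle of homologous rays at a facade point which lies
    opposite the midpoint of the base (illustrative assumption)."""
    distance_m = street_width_m / 2.0
    return math.degrees(2.0 * math.atan2(base_m / 2.0, distance_m))

print(base_to_distance_ratio(12.0, 8.0))            # 3.0
print(round(intersection_angle_deg(12.0, 8.0), 1))  # 112.6
```

Under these assumptions the rays intersect at well over 90 degrees, which gives an idea of how differently the same façade detail appears in the two panoramas.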
Mičušík & Košecká (2009) point out that panorama representation via piecewise perspective, i.e. projection onto a quadrangular prism rather than onto a cylinder, permits point matching algorithms to perform better, since their assumption of locally affine distortions is expected to be more realistic for perspective images than for cylindrical panoramas. Corresponding tentatively matched 3D rays may then be validated via robust epipolar geometry estimation to produce the essential matrix E. However, it would clearly be preferable to create virtual views of panoramas as close as possible to affinity, as did Majdik et al. (2013) in order to register frames to panoramas, and subsequently apply the affine operator ASIFT developed by Morel & Yu (2009). Thus, the main purpose of this contribution is to describe, implement and evaluate such an alternative for “ultra wide-base” panoramas. Results will be given and assessed in terms of the 3D measurements performed and the accuracies achieved.
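The validation of tentatively matched 3D rays mentioned above rests on the coplanarity (epipolar) condition. The following sketch only evaluates this condition for a relative pose that is assumed to be known; it does not perform the robust estimation of E itself. The conventions (first panorama at the origin, second panorama at centre c with rotation R, so that E = R [c]x), the function names and the tolerance are assumptions introduced here for illustration.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(M, v):
    return tuple(dot(row, v) for row in M)

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def epipolar_residual(ray1, ray2, R, c):
    """Value of ray2^T E ray1 with E = R [c]x; zero for a perfect match.

    The residual is scaled by the length of c, since c is not normalized.
    """
    return dot(ray2, mat_vec(R, cross(c, ray1)))

def is_consistent(ray1, ray2, R, c, tol=1e-6):
    """Accept a tentative match if the residual is below a (hypothetical) tolerance."""
    return abs(epipolar_residual(ray1, ray2, R, c)) < tol

# Synthetic example: second panorama 12 m along x, rotated 10 degrees about z.
a = math.radians(10.0)
R = ((math.cos(a), math.sin(a), 0.0),
     (-math.sin(a), math.cos(a), 0.0),
     (0.0, 0.0, 1.0))
c = (12.0, 0.0, 0.0)
X = (5.0, 4.0, 1.0)                      # object point seen by both panoramas
ray1 = normalize(X)
ray2 = normalize(mat_vec(R, tuple(x - ci for x, ci in zip(X, c))))
print(is_consistent(ray1, ray2, R, c))   # True for the correct match
```

In a robust scheme of the RANSAC type, a test of this kind would be applied to all tentative matches for each hypothesized E, and the hypothesis with the largest support would be retained.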
2. RETRIEVAL AND ADJUSTMENT OF PANORAMAS
2.1 Retrieval of Google Street View panoramic images