Crowdsourcing Indoor Geodata CROWD

www.viamichelin.com , RouteNet plan www.routenet.com , OpenRouteService http:openrouteservice.org and Naver – in South Korea http:maps.naver.com . Brussels in Belgium was used for most of the tests, but Seoul in Korea was chosen for one of the underground experiments. When calculating the shortest path in Brussels both Figure 34 Bing and Google do not use a gallery as a short cut, whereas the others do so; Bing does not show the route, but Google Maps does provide a name, but not a route. The conclusion must be that some networks do include interior pathways accessed by above ground entrances in their shortest route calculations; others did not have indoor networks at the time of the experiment. The underground Myondong shopping centre in Seoul lies beneath a wide and very busy road, with entrances at either side of the road. This made a good test of routing software: would the calculated path suggest going down, through the shopping centre, and then up on the other side of the road? Only Naver was able to provide pedestrian routing in Seoul, and it did successfully find the route; definitely a case of “Why did the chicken survive crossing the road? Because it’s route planner did know the underground path .” Figure 35: Brussels Central Station to Ravensteingallerij using route planner a Bing ,b Google Maps, c Mappy, d Via Michelin, e RouteNet and f OpenRouteService. From Vanclooster et al 2012. A further test was conducted in Brussels to see if underground entrances and exits were part of the route planner’s database. The test route went from the main railway station via an underground connection to the Ravensteingallerij see Figure 35. Only OpenRouteService used all available entrances and underground passageways to progress directly to the gallery. The other networks were less successful. In many cases, for all route finders, the address translating process to convert to geo- location on the map were rather simplified. For instance all the route planners used one entranceaddress point for route planning, no matter what the destination of the query, while railway stations are well known for having several entrances, owing to their considerable size, These tests show that the data for the route networks in these Brussels and Seoul examples were incomplete when the tests took place, not that the route finder algorithms per se were inadequate. All data providers add as much extra spatial data and information into their networks as they can, but update and discovery take time, and this leads to inadequacy in some locations. Until the last few years most commercial data providers did not use any VGI or CGI spatial data to update their networks. Now, more are doing so as it becomes very clear that organisations such as OSM can provide accurate useful information. For example, Google Maps – definitely in the CGI camp for data acquisition – has started to make increased efforts to use this huge human resource to improve its mapping.

4.2 Crowdsourcing Indoor Geodata

The inside of buildings provides a fairly difficult VGI survey environment because of the non-operation of most easily understood GPS handheld devices. Other means are becoming possible using 3-axis accelerometers and the 3-axis magnetometers available in many smart phones and even a piezometer implanted in a Nike running shoe Xuan et al, 2011. But, the kit and skills required are not yet commonplace. Added to the sudden change of scale, and therefore desired locational accuracy, when moving across the interface from exterior survey to interior mapping there is the question, as discussed by Vanclooster et al 2012, of address matching with possible multiple entrances to the same building. Lee 2009 has considered the address interface problem and comments on the orderliness of some exterior addresses and the disorganised nature of many interiors. One of the most prolific authors dealing with 3D city models published in the last few year has been Goetz Goetz and Zipf, 2011 and 2012; Goetz, 2012 who discusses the CityGML standard for storing and exchanging 3D city models, and the progress of modelling, mostly with German examples. As he points out, different regions in the world have different desires and therefore different standards that need to be met in their 3D models. According to Haklay 2010 and others OSM is often able to exceed official or commercial data sources in terms of quality and quantity. Figure 36: Levels of Detail LoD0 – LoD4 defined in CityGML specification CityGML defines a number of Levels of Detail LoD for modelling purposes: LoD0 is the plan view of a 2.5D terrain model, LoD1 is a simple extruded block rendition, LoD2 is textured with roof structures, LoD3 is a detailed architectural model, and LoD4 is a full “walkable” 3D model. As we have seen, most routing systems focus on a 2D network specification and cannot cope easily with a 3D model. Goetz proposes: “the development of a web-based 3D routing system based on a new HTML extension. The visualization of rooms as well as the computed routes is realized with XML3D. Since this emerging technology is based on WebGL and will likely be integrated into the HTML5 standard, the developed system is already compatible with most common browsers such as Google Chrome or Firefox. Another key difference of the approach presented is that all utilized data is actually crowdsourced geodata from OSM ” Goetz, 2012. Volume XL-1W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany 412 This would appear to be a very useful contribution to both mapping and routing services, and the use of OSM to provide much of the ground truth data would prove invaluable in terms of both time and money. Figure 37: Increase in use of the “building” blue and the “indoor” key for OSM additions from 2007 up to 2011. From Goetz and Zipf 2011. There are now about 45M tagged buildings in OSM with approaching 0.5M added every week Goetz, 2012; not all are fully rendered, but many are well above LoD1. This rate of building increase is currently higher than that for roads, which, although standing at about the same total, increases by perhaps 0.2M per week. See Figure 37, which shows the increase in usage of building and interior keys between 2007 and 2011. Describing and modelling buildings requires a common understanding of terms and importance. Ontologies have been developed to allow not only questioning, searching and joining, but also to allow routing and navigation. Yuan and Zizhang 2008 proposed an alternative navigation ontology, but one that did not concern itself about colour or other non-essentials to navigation. Goetz’s ontology is kept simple, but allows realistic visualization of the building as well. It would appear from the foregoing that much of the present work on buildings, both exterior and interior is being performed by OSM crowdsource enthusiasts; many of them very skilled. Who should be the crowdsourced agents for making the building models? Rosser et al 2012 say that perhaps the buildings occupants are those best placed to make the most satisfactory models; to them at least, and who has a better claim? As time passes and the number of building edits shown in Figure 34 increases exponentially, it is likely that much of the activity will turn to editing and the improvement of the quality of OSM submitted buildings, where in some cases exuberance may have dominated accuracy. The open source tools to make and use the models have been generated, the skills are present in some quantity, and the ontologies are defined to allow the buildings to be more than a pretty object; they can become full network models. 4.3 Indoor Google Google in its various mapping guises has for many years been the mainstay for providing base maps for the crowdsource community. The main commercial spatial data providers were for many years Navteq, TeleAtlas and Google. Historically, Netherlands based TeleAtlas and North American Navteq were used in many navigation applications. Since Nokia http:en.wikipedia.orgwikiNavteq bought Navteq in 2008 and TomTom acquired TeleAtlas, also in 2008 http:en.wikipedia.orgwikiTele_Atlas , there has been a clear separation where each navigation specialist was responsible for its own data sets. But in 2011 Garmin bought Navigon AG to capture the iOS and Android navigation apps market where Navigon was dominant. This was an important step for Garmin because all portable navigation devices had been losing ground to smartphone navigation apps, a market that Garmin had not entered before. Interestingly in June 2011 Navigon had introduced a PoI package derived from the crowdsourced OSM project. Figure 38: Interiors of Guildford Cathedral, showing the indoor online possibilities of Google’s Street View. A full walk about is very possible. Google Guildford Cathedral . Google changed from TeleAtlas data in 2009 to individually conducted “Street View” data gathering for their US dataset. Reasons for this move were said to be the lack of accuracy and coverage in the United States from the TeleAtlas data according to http:blumenthals.comblog20091012google-replaces- tele-atlas-data-in-us-with-google-data . By doing this Google confirmed its intention to remain a major contender for providing spatial information. Until 2011 neither Nokia, TomTom, nor Google had showed much interest in acquiring a ubiquitous indoor data capability. Google announced in October 2011 that Street View would now move inside buildings. Throughout 2011 Google was trialling its data collection and photography systems of interiors with the help of stores and offices. Meanwhile Street View was being expanded to many new countries; by May 2012 Israel was on the photographic map and by October Street View coverage had broadened to 11 countries, including the US, UK, Sweden, Italy and Singapore, and special collections have been launched for South Africa, Japan, Spain, France, Brazil, Mexico, Israel and other countries. At the same time Street View had been enhanced to provide indoor tours of a number of places, for instance: Russias Volume XL-1W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany 413 Catherine Palace and Ferapontov monastery, Taiwans Chiang Kai-shek Memorial Hall, Vancouvers Stanley Park, the interior of Kronborg Castle in Denmark, and Guildford Cathedral in Figure 38. This interest on Google’s part in interiors and encouraging both Street View but also crowdsourced participation may herald the start of complete routing systems that do understand 3D buildings, their interiors, the underground passages, and the car parks that go to make up the modern city. It will then be very interesting to apply Vanclooster’s routing tests again and see whether the 3D information gathered by VGI, CGI, or PGI means, allows accurate shortestfastest guidance through the urban maze. 5 WHITHER SPATIAL ONTOLOGIES? Shanahan 1995 is quoted in Bhatt 2008 as defining spatial ontology as: “If we are to develop a formal theory of common sense, we need a precisely defined language for talking about shape, spatial location and change. The theory will include axioms, expressed in that language, that capture domain- independent truths about shape, location and change, and will also incorporate a formal account of any non- deductive forms of common sense inference that arise in reasoning about the spatial properties of objects and how they vary over time .” The idea that common sense is vital and will be used while reasoning with and about spatial objects is to be applauded. The understanding and semantics of geographical concepts vary both between user communities – VGI or PGI based – and need ontological formal languages and structures to represent the concepts used by any information community. Stock 2008 argues that, as human semantics can be extremely informal in nature, “Perhaps NSDIs have been too formal and do not account for this human flexibility?” Berners-Lee 2005 tried to estimate the effort involved in ontology creation. The table in figure 39 shows his results. Figure 39: Berners-Lee 2005 Total cost of ontologies. This table assumes ontologies are evenly spread across orders of magnitude, committee size is reasonably represented by logcommunity, time is community2, and costs are shared evenly over the community. The scale column represents the community size from which the committee size and costs are calculated. The general conclusion is that the cost of large efforts is huge, but it is borne by and benefits an even larger group and so the cost per individual is very, very small indeed. A large effort is required to build an ontology for a particular domain. This is a disincentive for groups who might create ontologies to suit their own circumstances. Also, why should there be a single ontology, and why in English, which might act as a constraint to some non-English semantic ontologies http:en.wikipedia.orgwikiLinguistic_relativity ? Perhaps variety and semantic translation and interoperability would be a better approach? The figures in the table argue strongly in favour of large group participation and might be thought to be a good prospect for professional crowdsourcing, as with standards and software open source projects? GIS has used standard forms with attached labels since it appeared as a discipline Evans, 2008. Researchers consider a good representation of the world assumes: there is only one type of object in a given space; several in one space would be identical, and that everyone would use the same descriptive terms and labels. In practice this is not so; we are all our own experts with different mental formulations of the world around us brought about by different experiences and upbringing in society. Despite this problem with diverse groups OSM has built an open ontology from scratch Dodge, 2011. Many of the people most actively involved in the ontological development of OSM, whilst skilful and self-motivated, do not have a cartographic background. This often leads to lively debates about the ontology for OSM and some new thoughts about how maps should look and work. The difficulties are usually ironed out by social negotiation to determine the best understanding of the objects in question. As time passes so OSM’s ontology is becoming more complex and useful, but partly as the result of considerable mental anguish amongst the contributors and editors. 5.1 Vagueness Much time and effort is involved in defining ontologies of spatial objects; possibly even more in recognising, defining and categorising the objects in a formal manner so that they can enter an ontological description. A major problem of the real world is that although an object is definitely present it can rarely be described by a single term, or multiple occurrences by the same exact definition. Our desire to classify is confounded by vagueness of description. A traditional example of this problem is: when does a pond become a river, become a lake, become a sea Third, 2008? Scale is one factor, form is another, salinity might be a third, and so on. Our descriptors are unclear and suit our thinking, perhaps. This problem of vagueness could be overcome by choosing to ignore insignificant parts, but insignificant is also vague and undefined in most cases. Humans tend to skip over insignificant deviations. A serious problem in developing an ontology is to define the terms to be used within it. Bennett 2008 discussed the standpoint theory of vagueness and showed how apparently impossibly difficult semantic problems – is this a river or an estuary? Is this a heap or merely a pile? – could be expressed using supervaluation semantics Bennett, 2008 to enable vague human language to be logically interpreted by a set of possible precise interpretations called precisifications, providing a very general framework within which vagueness could be analysed within a formal representation and handled by computer algorithm. Implementation of this process is not at all trivial, despite the human ability to make instant judgements. For instance processing geometric shapes to provide qualitative shape Volume XL-1W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany 414 definitions is geographers f for instance picking a relev possibilities w 5.2 Moving f SDIs? Bennett et al whether peopl the expansion questions to b now being dev ontology mat available, or u external resou Janowicz et al Enablement L the Semantic W et al 2010 s Resource Desc OSM data set service for sea Stock et al 2 register contai rather than an contain a voc could use, pro assist interpret dealing with f enable flexibl form an impo expression wi ISO’s FTC w and by adding compliant regi richness of in language or U Stock’s appro benefits: easy SDI where it providers per management WSML and s able to conclu flexibility in s including chan language inter 5.3 Standard International s by crowd sou awareness am problems invo been addresse complexity o groups: OGC www.isotc21 df . Percivall 201 “Fusion is data or in quite complex from previous Cole and Kin vant finite set o was difficult from Ontology 2007 require le recognised an n of the seman be asked. Du e veloped to perfo tching based using lexical and urces and pri l 2010 though Layer for an SD Web was an im suggested using cription Frame t, within the Se arching and que 2010 has prop ining a Feature n ontology to cabulary of term ovide what Sto tation of the re feature types. T le response to ortant link betw ithin an SDI. S within the OGC g stored querie istry. The FTC nformation an UML rather than ach, stressing t navigation to c has been buil rhaps as in the of the gap be static content o ude that non-on solving comple nge through tim rpretations. ds Equals Qua standards exist, urced data col mongst many olving geograp d and mostly m f the standard www.opengeo 1.orgOutreach 0 while writin the act or proc nformation reg . Along with m academic quan g, 1975, Benn of regions from y Feature Type ed: more forma nd approved th ntic structures et al 2012 rep orm automated on shared up d structural info ior matches ht a shared and t DI, integrating mportant missing g the LinkedG ework RDF im emantic Web, erying geodata. osed the idea Type Catalogu support SDIs ms that both c ock calls a ligh egistry contents This formulation operations on ween implemen She suggested t C’s Web Catalo es allow interro s could store v d be expresse n in formal onto the use of the F content, simple lt or is being u case of OSM etween web se ontology. By 20 ntology approa ex geographical me, vagueness, ality? , and are used, llectors, partly crowdsource e phical descript met, and partly ds generated b ospatial.orgstan hISO_TC_211_ g about standar cess of combini garding one o many generatio ntitative revolu nett discovered m an infinite ran e Catalogue Ba al semantics, te e formal results to allow locat port many tool and semi-autom pper ontologie ormation, user i Noy et al, 2 transparent Sem query services g element. Hah GeoData project mplementation o to provide a c of using a sem ue FTC, ISO 19 as the FTC w reators and da htweight ontolo s, and a structur n would, Stock a feature type ntation and sem the incorporati ogue Service C ogation of the variable amount ed either in na ology language. FTC, provides er management updated by mu , and an end t ervice eg OW 011 Stock et al aches provided l research prob and varying n , but not neces y owing to lac enthusiasts tha tion standards due to the nece by the internat ndardsgml and _Standards_Gui rds said: ing or associat or more entit ons of utions d that nge of ased esting s, and ational ls are mated es, if input, 2005. mantic from hmann t as a of the entral mantic 9110 would atasets ogy to ure for k said, e, and mantic ion of CSW OGC ts and natural many of an ultiple to the WL-S, were more blems, natural ssarily ck of at the have essary ational d ISO ide.p ting ties co to for en Wha natio scree relia appr Beck Fig Typi of F upda junct rathe utilit abstr onsidered in an improve one’s r detection, id ntity.” at is a map but a onal or region en, and assume able same prob opriate for the p k 2008. gure 40: Top, u same juncti ical mapping of Figure 40, gain ate, showing a m tion. The map er smaller scal ty lines. It pr raction, not real explicit or imp s capability or dentification, o a fusion process nal mapping ag they: are corre blem, and tha purpose of the tility survey da on on the finali f a real world s ned from groun multitude of uti on the right is e with general rovides a clea lity. It also mee plicit knowledg r provide a new or characteriza s? Most people agencies, either ect whatever th at the level of map. Look at F ata at a junction ised map Beck situation is reve nd survey and ility lines criss- a plan of the a lisation of both ar interpretatio ets standards, m ge framework w capability ation of that see maps from r on paper or hat means, are f abstraction is Figure 40, from n, Bottom, the k, 2008 ealed in the left manual paper crossing a road area, drawn at a h junction and on, but is an may be accurate, m r e s m t r d a d n , Volume XL-1W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany 415 but is not precise, nor is it complete. There are many ways of storing data, here on paper or in a GIS, and several ways of structuring the data using a variety of syntactic and schematic models. What is needed, as proposed by Stock and others listed earlier in this text, is the ability to integrate models based on global schema and perhaps to use Bennett’s standpoint theory to help remove vagueness from the mix. Mary McRae, until 2010 working with OASIS Organization for the Advancement of Structured Information Standards , is well known for a presentation containing the slide “Standards are like parachutes: they work best when theyre open”, possibly derived from LE Modesitt Jr, or was it Frank Zappa, or Elvis? The point is well made, however. Open standards are essential if people are to be willing to use them. They must be, according to OSGeo, freely and publicly available, non- discriminatory, with no license fees, agreed through formal consensus, vendor neutral, and data neutral. Note that open standards does not mean open source. Standards are usually documents; sources tend to be software. See the paper from OSGeo on this subject – Open Source and Open Standards at http:wiki.osgeo.orgwikiOpen_Source_and_Open_Standards . The OGC recommends many geospatial standards to users, including GML Geography Markup Language as its base. See http:en.wikipedia.orgwikiGeography_Markup_Language . GML is the XML grammar for expressing geospatial features. It offers a system for data modelling and as such is used as a basis for scientific applications and for international interoperability. It is at the heart of INSPIRE Infrastructure for Spatial Information in the European Community, the European initiative, and for GEOSS Global Earth Observation System of Systems . See http:inspire.jrc.ec.europa.euindex.cfm and http:www.earthobservations.orggeoss.shtml . New standards from OGC are continuing, with community-specific application schemas introduced to extend GML. GML 3.0 is a generic XML defining points, lines, polygons and coverages. It extends GML to model data related to a city using CityGML for representation, storage and exchange of virtual 3-D models. There was a workshop in January 2013: CityGML in National Mapping . See http:www.geonovum.nlcontentprogramme- workshop-national-mapping . Other standards include: the Web Feature Service WFS, for requesting geographical features; the Web Map Service WMS, for requesting maps using layers, to be drawn by the server and exported to the client as images; and the Styled Layer Descriptor SLD which provides symbolisation and colouring for feature and coverage data. The OGC is an international organisation, but so is ISO, and both generate standards for the geodata community to use. The OGC was started with mainly industrial, commercial and some research membership, whereas ISO, in the form of Technical Committee 211 ISOTC211, was instituted as a completely independent body with members delegated by participating nations, usually but not always from appropriate national standards bodies. Both operated in the same field but, fortunately, for many years OGC and ISO have cooperated closely in standards specification to the benefit of both. Recently ISOTC211 has published the text for standard 19157, Geographic information — Data quality , to specify standards for ensuring that quality of geographic information can be implemented, measured and maintained. The data quality elements consist of: completeness of features; logical consistency , adherence to data structure rules; positional accuracy within a spatial reference system; thematic accuracy of quantitative and qualitative attributes; temporal quality of a time measurement; usability element based on user requirements; and lineage: provenance of the data. Interestingly, all but logical consistency need to be verified by some ground truth action. The standard proposes that different methods of sampling see Figure 41 will be necessary for different data types. Figure 41: Choosing a sampling strategy for quality checking by logical feature or by areal selection ISOTC19157. There is flexibility as to which strategy is chosen based on either a complete population survey, probabilistic or judgemental sampling procedure. The choice depends on the data being tested. 6 ACCURACY, QUALITY, AND TRUST Figure 42: MODIS top and GLC-2000 bottom satellite images covering a 20km x 20km area around the UK town of Milton Keynes, showing automatic classification by different but standard approaches, to indicate similar land cover classes. In both images deep pink is cropland or natural vegetation, red is urban and built up areas. Volume XL-1W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany 416 Quality check taken from Fri satellite image There is consi classification classification, automated alg 2000 image at time luminos combination w The differenc estimates urba the percentage products agree meet ISO1915 this and the g field. Fritz 2 survey was EuroGEOSS, G The Geo-Wik now looking f “Volunteers cover disag actually see the land cov recorded in used in the f global land The volunteer photos, tag the Wiki. Figure project. Prese contributors re checking exerc Figure 43: ground tru This technique studies, and en contributions. up, the averag with a median since 2009, wi earlier tend to do the vast ma It is possible t are, but som Usually this is ground survey This is probab additional aeri with imagery interpretation; king can, in pra itz et al 2011 es of the area iderable differe that has be shown at th gorithm trained t the bottom of sity set at a with expert kn ce is remarka an area at abou e of pixels on es is less than 3 57 standards general problem 2011 reported to be carried GEOBENE, CC ki Project at htt for s . . . to revie greement and e in Google Ea ver maps are co a database, al future for the cover map .” r can downloa em with approp 43 shows part ence in the tab eceive, other th cise. left High Scor uth checking, a regist e is used in a nu ntirely depende Thus far April e contribution i n of around 30. ith later peaks a have complete ajority of the wo to assess how a me true version s a national map yors and consid bly as reliable a ial photography y is not one there are actice, be very illustrates the p a near Milton ence in the resu een used. Th he top, was g d using calibrat f the figure was a relatively h nowledge, to c ably large. Th ut 50 greater n which each o 30. This pres The only succ m is through gr that a crowdso d out from 2 C-TAME and C tp:www.geo-w ew hotspot map determine, bas rth and their lo orrect or incorr long with uploa creation of a n ad an App fo priate text and u t of the High ble is the only han taking part re table for sate and right peak tration date. umber of enviro nt on crowdsou l 2013 346 vol is nearly 400 ce Volunteers hav and troughs. Th d more work, a ork. accurate the cro n must be us p, created over derable editing as any base can y or similar im of determinin no easily m difficult. Figur problem well fo Keynes in the ults of the auto he MODIS i generated usin ion data. The G s created using high threshold lassify urban a he MODIS pr than the GLC-2 of these land sent result woul cessful resoluti round checks i ourced ground 011, supporte CC_AFS. wiki.orgindex.p ps of global la sed on what th ocal knowledge rect. Their inpu aded photos, to new and improv r their phone, upload them to Score table fo y incentive the t in a useful qu ellite classificati s and troughs o onmental resour urced VGI lunteers have si ells field checke ve been signing hose registering and as always a owdsourced pro ed for compar many decades over those dec be, other than magery. The pro ng accuracy bu measured auto re 42, or two e UK. omatic image ng an GLC- night d, in areas. roduct 2000; cover ld not on to in the truth ed by php is and hey e, if ut is o be ved take Geo- or this VGI quality ion of rce igned ed, g up few oducts arison. using cades. using oblem ut of omatic poss dip i wher defin param comi attrib cons 6.1 The node down to ha samp tryin was 921, 1,662 The accu beca truth Moo disco were subs F ver the It is all fo majo edite cons 95.3 3,549 and A may the e hypo main Moo been chan Olym inade Moo datas VGI ibilities of line into a churning re the monster nitional problem meters for accu ing sections th bute accuracy istency, as reco Accuracy – O OSM global d es OSM-Stats, nload but is cu andle, so subse pling. Anderka ng to assess the run on 20th N 816; uploaded 2,862,568; way global stats ar uracy of OSM ause of OSM siz h sets only cove oney et al 2012 over how much e using the UK et. Figure 44: From rsions of edited number of data axis, for interesting to n our datasets hav ority of ways h ed a very few idered finished 3, 232,707 fr 9,831, Germa Austria 89.0 not indicate th editor – and a othesis could b nly in rural ar oney does not co n ferociously ed nging landscap mpic stadium si equate survey in oney 2011 se sets. A lack of contributors co e-point-area com g ocean of atte rs of the deep ms rather than uracy have been here will be d y, completene ommended in th SM compared database contai 2012. The hist rrently close to ets tend to be a et al 2011 quality of Wik ovember 2012 d GPS points ys - 157,983,53 re huge, so eve has been forc ze, and partly b red some nation looked at the h dynamic chan K, Ireland, Au m table 1 in Moo ways on the ho asets falling into r Germany, Aus note that a very ve five or fewer have been gath times at very d. The percent rom 244,192, any 93.1, 10 , 1,085,003 fro hey are accurate any others – m be that low v reas that do n onsider this pos dited by many p e, such as wa ite in Figure 33 n the first place es a number o cartographical, ould be a majo omparison. Wha empts to determ p are often the n those of mea n defined in ISO discussions of ess, currency, he standard. d with OS ins several bill story dump file o 500 GB in siz used rather th had similar p kipedia. An OS to find exact f - 3,182,076, 32, and relation erybody who h ced to take a because their na nal areas, not th annotation pro ange the editing ustria and Ger oney 2012, the orizontal axis co o that category stria, and UK-E y large percent r versions. The hered in the f few sessions tages are as fo UK 95.4, 3 0,445,536 from om 1,219,045. te or complete; merely lost int version number not change mu ssibility. Some people. This cou as the case fo 3, or it might be e of risks to the , surveying, an or issue, differe at follows is a mine accuracy, e semantic and asurement. The O 19157, and in positional and and logical lion points and is available for ze and difficult han a complete problems when SM stats report figures: users - ,380; nodes - ns - 1,667,558. has studied the subset, partly ational mapping he entire world. cess in OSM to g caused. They rmany as their e number of ompared with on the vertical Eire. tage of ways in e overwhelming field, uploaded, and then been ollows: Ireland 3,384,643 from m 11,226,308, Of course this it may be that terest. Another r counts occur uch over time. few areas have uld be a rapidly or the London e that there was crowdsourced nd GIS skills of ent tags may be a , d e n d l d r t e n t - - . e y g . o y r n g , n d m , s t r r . e y n s d f e Volume XL-1W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany 417 used to descri features, and t vandalism by structures wer In the UK Hak much VGI la checked four 2 by OSM volun OS ITN Integ are accurate to Figure 45: compared w similar meth disp OSM has capt quarter of th attributes. Ple minor roads, method outlin 3.8m buffer fo an 8m buffer f to approximat within this bu the roads abo were always m each square k than in a prev Figure 46, in 2 VGI contribut Figure 46: A blocks, calcu Me ibe the same w there is the pos some contribu re not in place to klay et al 2010 abour was req 25sqkm tiles in nteers for comp grated Transpo o 1m. : Accuracy mea with OS-ITN, f hod is proposed placement comp tured about 30 hese were spag enty of editing main roads an ned in Figure 4 or minor roads, for motorways, te the true road uffer 80 to 86 ove 85. The V more than 5 an kilometre of hi vious general su 2008 where nea ting crowdsourc Average positio ulated for node eridian 2 data. S web resources o ssibility of rogu utors. Mooney o find and elim 0 were concern quired to map n the urban Lon parison with the ort Network d asurement meth from Haklay et d in ISO 19157 pared with a ref of the linewor ghetti without remains to be nd motorways 45 to measure a , a 5.6m buffer placed around d width. He fou 6 of the time, VGI contributi nd up to 20 co is urban Londo urvey of VGI in arly 90 was m cers. onal accuracy le points, compar See Haklay et a or to annotate s ue edits or delib states that, in inate rogue edit ned to discover an area well ndon area comp e generally exc data set where n hod of OSM wa al 2010. A ve for measuring l ference set. rk for England, a complete s done. Haklay for testing, an accuracy. He u for main roads d the ITN centre und OSM road , and more than ion was large; ontributors map on tiles, many n England, show mapped by 3 or evels for 1km ro ring OSM with al 2010. spatial berate 2011, tors. r how , and pleted cellent nodes ay ery line , but a set of y used nd the used a s, and e line, ds fell n half there apping more wn in fewer oad OS Meri spec 20m are c node most areas posit than lies a sugg chan Hakl linke mate Figu incom 10m± latter stand F Bhat large with topo and conc varie and foun comp comp Kouk done In th mult OSM split were were cons of IT OSM were OSM urba perce order of ro town idian 2 data w ification indica buffer accurac correct. A comp es, and to asse t accurate with, s. Haklay say tional error sm 15 metres. The at about 7m. T gest that measu nce. lay 2010 als ed to VGI contr erial in poorer a ure 47 shows er mplete ones. P ±6.5m 1SD f r, indicating po dard of OSM m Figure 47: Distri comparison for Engl ttacharya 2012 ely confirmed t OSM but graphic map T accurate betw cluded; automa ed from 70 to building data c nd the general pleteness was pleteness was th koletsos et al 2 e by Haklay, us his later article ti-stage automa M roads. Simila into small 1km e localised. Roa e matched usin traints. Results TN roads could M roads with I e much poorer M matched ITN an and less th entage being d r of matcher an oads in ITN an n [20,734km v was used as th ates it has been y along roads, plex algorithm ss errors. The , as might be e s over 70 o maller than 12 m e mode of the e he shape of the urements better o showed that ribution in that areas, and the le ror levels for c Positional accur for the former a oorer socio-eco mapping. ibution of error r complete and and, from Hakl 2, in a master the previous fi his reference TOP10NL, whi ween scales of atic matching o 102 presum completeness v quality of O high, and in a he most signific 2012 performed ing OSM and I e, a rather mo ated process w r to the cells in m block areas, s ad entities, with g a combinatio were promisin d be matched w TN. As before : 60 of ITN N. Matching err han 4 rural different, and nd matchee, is nd OSM were 18,368km], mo the comparison n generalised but that junctio was used to ch major urban expected, larger of the OSM n metres and ano error distributio e distribution o r than 7m wer t socio-econom t OSM voluntee evel of complet complete areas uracy was foun and nearly 12m onomic areas do rs for the OSM incomplete OS lay et al 2010 r’s thesis in th indings. Bhatta set was th ich is consider f 1:5,000 and was possible; mably more de varied from 89 OSM was quit agreement with cant quality fac ed similar exper ITN for rural an ore developed was used to m n Figure 46 the so that compari h feature tags of on of geometri ng in that for ur with OSM road e, in the rural matched OSM rors were smal . The reason dependent on because the ab not the same ore ITN in coun n to OSM. Its to lie within a onnode centres hoose matching areas were the r errors in rural nodes show a other 80 less on in Figure 47 of errors would re obtained by mic factors are ers provide less teness is lower. compared with nd to be under m±7.7m for the o have a lower to Meridian M areas in , he Netherlands, acharya worked e Netherlands red both useful 1:25,000. He completeness etail in OSM?; to 97. He e good where h Haklay, that ctor. riments to those nd urban areas. feature based atch ITN with e data sets were ison and results f name and type ic and attribute rban areas 93 ds, and 81 of areas matches M, but 88 of ll, between 2 for the match the positional bsolute quantity more OSM in ntry [2,936km v s a s g e l a s 7 d y e s . h r e r , d s l e s ; e e t e . d h e s e e f s f h l y n v Volume XL-1W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany 418 1,953km], an as they simpl shows how thi Figure 48: Go OSM data. O What Koukole has surveyed publicly availa might be exam reference set. data not foun either. Anand et al with feature r They conclude to be sufficien feature type in matching and standards has likely to be pre Figure 49: Pi of the similar p Al-Bakri et al on XML sche view of both that Path in O level. Similarl at different lev quite awkward be gained from OS ITN and O perform better nd some roads i ly did not exis is can happen. ogle backgroun OSM has extra from Kouko etsos calls OSM something that able mapping. N mples of this k Similarly ther nd in OSM, an 2010 used sim ich informal O ed that simple nt and that, agre nformation was enriching the meant that me esent in most V ieces of the OS arities and differ position. From 2012 suggest ema, as in Figu schemas for O OS is Footway ly Rail, though vels. This make d. Nevertheless m trying to ma OSM roads to se r than a purely g in both data set st in the other nd, not quite reg data in this part oletsos et al 20 M commission the OS does n New estates and ind of extra ma re are cases o nd sometimes milar methods OSM, and to ke geometric matc eeing with Kou going to be ver data. Introduct etadata are now VGI data sets. and OSM sche rences, both in Al-Bakri et al t ways of simila ure 49, which s OS MasterMap in OSM and o the same term es calculation o s, there should atch schemas D ee if an ontolog geometric one t ts were unmatc data set. Figu gistered with O ticular area. Ta 12. data is when not yet have on d areas of rebui ap data over th f OS commiss just plain erro to try to matc eep the best of ching was not g ukoletsos 2012 ry important, bo tion of internat w specified and emas showing s terminology an 2012. arity matching b shows a very p and for OSM. occurs at a diff m in both schem of similarity ma be some benef Du et al 2011 gical approach w to road matchin chable ure 48 S and aken OSM n their ilding he OS sioned ors in ch OS both. going 2, the oth in ational quite some nd in based partial Note fferent mas, is atches fits to used would ng. Figu in attr Onto simil matc of nu Geom matc spec level foun – fou Failu spec 90 met an on Du geos onto geos ensu OW form OGC Java data to tr http: the 105 prod with meth need and proc ure 50: OS top ncluding schem ribute type and n ologies by them larity and over ching approach ulls in OSM att metric error, ching were used ific pairs of OS l could then be nd. The output f und 111 match ures were due ific road conne . Du concluded tadata to allow ntological appro et al 2012 c patial ontologi logies, and then patial ontology ure consistency WL 2, W3C [2 mats to facilitate C W3C standar implementatio from disparate ranslate, merge sourceforge.n open source c test road match duced 92 new ge around 90 i hod has insuffi ds further work in solving prob ess. The princip p and OSM bo ma information, name can be dif road is sele mselves were n rlap. Du consi Figure 50 to t tribute fields an network conn d to assign a pr S and OSM road e used to determ from the open s hes in OSM o to very differe ections, but th d that greater su w better ontolog oach, no formal continued work ies by first co n merging these y that could b y. The OWL 2009], develop e information s rd web ontolog on for Du’s exa e sources. The e, amend and netprojectsgeoo code. Rather dis hes drawn from eometries, abou n the previous icient worth to to refine the p blems of incon ple is good and ottom data for from Du et al ifferent. The red lected. not adequate to idered a proba try to circumve nd feature type nections, and robability to a d data. A thresh mination if a m source based so out of 116 road ent topological he success rate uccess required gy matching. A al ontologies we k on automate onverting inpu e structures into be checked an 2 Web Ontol ped as one o sharing and int gy language. It ample to integr authors develo regenerate geo ontomergingfi sappointingly t m OS and OSM ut 85 of the in paper. This is o be continued process of linki nsistency during the goal is pos the same area, 2011. Note d OSM airport o infer concept abilistic feature ent the problem dissimilarities. feature type match between hold probability match had been oftware – QGIS ds in OS ITN. l structures for was still over more complete Although using ere generated. ed merging of ut data sets to o a new flexible nd modified to logy Language f the standard tegration, is an was used in a ate road vector ped algorithms ometries. Visit les to fetch he output from M data sets only nput, compared not to say the d; rather that it ing information g the matching sible. t e m . e n y n S . r r e g f o e o e d n a r s t h m y d e t n g Volume XL-1W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany 419 Are VGI contributors ever going to desire to exploit fully the geospatial standards available; or indeed to have any interest in them? How far can loose knit organisations like OSM impose structure and completeness on their members; and, should they? The evidence from this review would suggest that the majority of the work is done by competent committed people who do try to produce completed maps, with full attribute sets, and careful editing. They may be unpaid, but they appear to be anything but uncaring And, they are fast. There seems to be no question that timely update of features that interest them is a major strength of the VGI mapping process. The numbers of VGI participants who are involved in any particular project can be absolutely staggering, even if the distribution of effort means only relatively few do most; a few thousand or so perhaps? Raymond 2001 discusses what he calls Linus’ Law Linus Torvald, of Linux. He says in the Cathedral and the Bazaar pp14-18: “Given enough eyeballs, all bugs are shallow .” In open source developments, many programmers are involved in the development, scrutiny, and testing the code, so software becomes increasingly better without formal quality assurance procedures. In mapping, perhaps this can be translated into the number of enthusiastic contributors that work on a given theme in a given area Haklay et al, 2010. Most bazaar dwellers including most of us, dear friends are either not aware or at most partially aware of the standards we are trying to implement, but the web software tries its best to prod us into good behaviour, so no problem really? Will they map the bits others can’t or don’t want to reach? Almost certainly. Will they do things we, the supposed professionals have not thought of? Oh yes, but not always for the best, perhaps. Those rural areas are a real problem if one is thinking of backing up a national mapping organisation with VGI help. Volunteers are colourful and bumptious and the results a bit prickly, but if their enthusiasm adds value – by no means necessarily only or at all in monetary terms – and is reasonably accurate, then why not? Volunteers have a good editing record as seen in all the community projects using OSM. Is VGI a major opportunity for national mapping agencies? Yes 6.2 Quality in VGI Figure 51: from Grira et al 2009, volunteer freedom of interaction compared with authoritarian constraint. The relationship between the actors in mapping is illustrated on a theoretical basis in Figure 51, taken from Grira et al 2009. The graph in Figure 51 can be used to position the trends, tools, products, and processes in geospatial activity. Grira invoked two axes: one from authoritarian to volunteered, which might be called a power axis, where many people, volunteers individually have little power, but a few organisations, national mapping have considerable authority but also considerable constraint on their actions. The other axis is one of capability: where full capability is exhibited by volunteers in the form of mashups developed for a disaster mapping site perhaps, using open source software, considerable ingenuity and a large user base; and the other end of the same axis would, amongst other consist of internet surfers viewing contributed maps. The national mapping agency lies to the right in the diagram, including both the highly trained surveyor in the field influencing changes in a map, and in the constrained and limited corner print shop, where the map is printed correctly, or perhaps not. Grira considered there were three gaps threatening the success of volunteered efforts in relation to institutions: − A normative gap – the difference between quality expected by end users and quality standards developed by organisations, for instance ISO 19115 on metadata compliance. Very, very few volunteers will have looked at, let alone read and understood the specification. − A technological gap – GIS or map servers provide geographic information; it may well be an uphill struggle for users or volunteers to upload feedback and comments about data quality. In the age of VGI, institutions that supply mapping will need to reconsider their role in relation to users. − A cognitive gap – volunteers need to be able to use less static types of metadata, and introduce their own terminology. At the same time producers must realise that user perception of the meaning of quality is often confused with that of accuracy see section on trust later in this paper. Figure 52: From Cooper 2011. African based 2D VGI taxonomy, and the advent of the crowdsourced SDI. Cooper et al 2011 took a far more practical approach to the problem of a 2D taxonomy of VGI, with one axis called Determination of Specifications instead of Power, and the other axis Type of Data instead of Capability, and the entries are real web sites rather than theoretical activities, but the principles of construction are the same. At the top right in Figure 52 is an as yet probably non-existent, possibly national, perhaps international, crowdsourced SDI where users contribute data according to tight specifications from the SDI custodian, who would then subject the VGI to full quality assurance processes. This concept has become a distinct possibility as open source software platforms become both more Volume XL-1W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany 420 integrated and users more sophisticated. In the same quadrant OSM can be found: a crowdsourced product but with less draconian specifications. In the lower right quadrant are regional or national point location databases for Points of Interest , such as Precint Web, a South African crime mapping site. The lower left quadrant contains sites which require minimal custodial activity and contain location data – such as the Google Panoramio site that holds photos for Google Earth, without much location editing or content checking, and hence fairly prone to error. The top left quadrant is effectively empty as fundamental data sets Base data are very unlikely to be held by and for an individual user. National mapping agencies provide data of generally higher quality than VGI output, as they have better trained staff, better kit, they understand and are concerned about standards, and survey is their job of work. But, national responsibility for national coverage at a particular range of scales might result in significant delays before they can update data in certain areas. Most surveys keep a record of known changes that have occurred on map sheets and hope to update the maps when these changes have reached a specified threshold level. This requires the agency both to notice change has taken place, presumably by aerial photo survey or ground detection, and is quite costly. If at the very least the public could be persuaded to form part of the notification process, then VGI might be the best available method to document changes when they happen and simultaneously result in revision requests being submitted to the relevant agency. A stage more helpful, but risky perhaps to the fundamental SDI, would be VGI data gathering and mapping initiatives, heading towards the fully crowdsourced SDI region in Figure 52. Poor quality data might result from VGI, but it might also be produced by commercial contractors. In all cases – VGI, internal or contracted – systems are needed to check the data quality, presumably to ISO 19157 and other standards. The provision of metadata to ISO 19115 standard?, peer pressure and review should keep VGI contributors on the straight and narrow. If it does not, then software tools are being developed to assess the quality aspects of imported data and to ensure their consistency with the national depository. These tools are not yet fully functional see previous sections in this paper but are showing promise in the research laboratory. ISO 19158 on Quality Assurance of Data Supply Jakobsson, 2011, now issued as a published standard 2012, is a major attempt to ensure validation of data input to measurable quality standards. The standard allows three different levels of accreditation to be certified, and these would pass with the data to all user generated output. It will be particularly useful for assessing data transfer potential between countries for the INSPIRE programme within Europe. Whether the ISO 19158 specification is framed sufficiently widely to be able to be used to certify VGI data – even data as robust as OSM data seems to be – is an interesting question. Yet this goal must be achieved if VGI data is to enter the marble halls of NSDIs. Perhaps NSDIs should recognise levels of trustworthiness for data, rather than set absolute standards? The question of how to assign trustworthiness indicators to datasets that do not meet all standards requirements and whether this might be a solution where data is needed but better is not yet available, is considered later in this paper. 6.3 Trust in VGI User trust is different in form from agency trust as the user has different expectations and is concerned about different aspects of VGI than the agency. The former wants thematic accuracy most of all with some, possibly mostly, topological regard for locational consistency, whereas the agency tends to reverse this order, or even insist on both. Goodchild 2009 considered “uncertainty regarding the quality of user-generated content is often cited as a major obstruction to its wider use ”, and considered that crowdsourcing ventures should always publish assessments of quality. With the advent of the appropriate trustworthy ISO standards perhaps this has now become possible? Stark 2010 tried an address matching method to asses accuracy, quality and trust. The Swiss canton of Solothurn had over 90,000 geocoded addresses acting as the reference set based on national cadastral survey data. If people using an address matching service are going to trust it then the address location displayed must point reasonably closely to the mailbox location on map or photo. Goodchild 2007 thought failure in achieving good locational matches to be a major vulnerability to acceptance of a VGI approach to generating location data. Figure 53: Part of canton of Solothurn, from Stark 2012. The cadastral map at the top shows correct house locations. At the bottom are the attempts by Google Maps, Bing and Yahoo Maps to locate addresses on an air photo background, with the reference locations displayed. Stark used Google Maps, Bing and Yahoo Maps to locate the addresses, and then compared these locations with the cadastral survey derived information see Figure 53. Over 95 of the addresses were found for all three open WMS geocoders but the locations given were quite suspect in some cases. Figure 53 demonstrates a particularly poor area, where Bing and Yahoo Volume XL-1W1, ISPRS Hannover Workshop 2013, 21 – 24 May 2013, Hannover, Germany 421 Maps linearly road. Google probably beca shows the dist Figure 54 Dis the whole can axis; percent The modal dis perhaps less geolocators, b pattern to tho Google Maps compared with The effect of terms of trust was two years but more exam of this paper’ navigation sys house, and on 1975 on his user developin consequences? Kessler et al Flikr photos. gazetteers use types for nam building block Figure 55: K Pink dots Photo location of “Manhattan in, or of, Manh y interpolated a tried harder an ause it used the tribution of loca stribution of dis nton. Error in m tage in each dist From S stance error for than 5m – an but the distribu ose visible in s error tail h the other two. mislocations s by the user po s ago and the s mples continue ’s authors disc stem did not kn nly one of six cul-de-sac was ng a lack of t ? 2009 investig An example e place names, med geographi k for spatial sea Kessler et al 200 s show the vary Ma n errors have be n” photos becau hattan, Some ph address location nd was mostly cadastral surve ation errors for stance errors in metres is shown tance category Stark 2010. the entire canto nd roughly the tions of error s the small exa is very signi . such as these i opulation. One situation has no to crop up from overed his new now the numbe x houses built s recognised; de trust Small fa gated similar lo is given in geographic foo ic places and rch engines. 09 Flickr geolo ying locations o anhattan . een made in pi use they may ha hotos for exam ns along the c correct for this ey source. Figu the three system address locatio along the horiz along vertical a on data set is sm same for all show a very si mple in Figur ficantly suppr is very damagi e might say tha ow improved; i m other sources w Nov 2012 er or location o between 1830 efinitely a case ailures can hav ocation problem Figure 55. D otprints, and fe are a fundam ocation failure t f photos tagged cking the corre ave been either ple have been losest s area ure 54 ms. on for zontal axis. mall – three imilar re 53. ressed ing in at this it has, s. One car’s of his 0 and e of a ve big ms for Digital eature mental test. d ect set taken taken on th in te solvi using quali cross reaso and u noise new Gaze reinf pape reaso Bror Sens inter

6.4 Do