Fraudulent PedigreeProvenance web 2.0 security and privacy

5.5.1.2 Example 2: Control of Botnets via Mashups

Botnets are networks of millions of small software programs installed by malware on users’ machines which are used for extortion usually financial – for more information see Botnets – the silent threat 37. Botnets are normally controlled by channels such as Internet relay chat IRC; however security researchers have recently identified attacks which use Web 2.0 features such as blog postings and email to RSS interfaces to control Botnets. The use of such channels for command and control of the botnets makes tracing the attacker much more difficult and, because of the open nature of the mashup architecture, it means that control interfaces are very difficult to shut down. Mashups are perfectly suited to massively distributed systems with untraceable control structures and are therefore likely to lead to a variety of related attacks see Use of Web 2.0 technologies to control botnets 38 and With Web 2.0, a new breed of malware evolves 39 for more information.

5.5.2 Vulnerabilities of Collaborative Knowledge Systems

The most unique vulnerability of collaborative knowledge-bases is the editing mechanism. By default, articles on a wiki grant ‘write’ privileges to both anonymous and registered users. The access control is generally very coarse, falling into the following categories: • Editable by anyone; • Editable only by registered users; and • Editable only by editors. There is usually little or no means of establishing trust, integrity or the provenance of statements made. For example, there is often no validation of user-identity, no signature available for content and often no use of reputation-based systems. If reputation-based systems are used then the vulnerabilities in these – eg, collusion, whitewashing, Sybil attacks, etc – are often not addressed. For more information, see Reputation-based systems: a security analysis 40.

5.5.2.1 Example: Medical Data

The unreliability of information can have dangerous consequences – for example only 25 of users surveyed in Online Health Search 2006 41, who search the Internet for health advice regularly, check the source and date of the information they find to assess its quality. Tools such as Wikiscanner 42, which trace source IPs of collaboratively produced articles, show that the deliberate introduction of bias and misinformation is frequent.

5.5.3 Metadata Attacks

The use of metadata formats such as ID3 for mp3, RSS and other RDF metadata is central to the operation of Web 2.0 applications. Metadata has several important security vulnerabilities: • Integrity and trust infrastructure for metadata is virtually non-existent – ontologies, tag clouds and schemas can often be altered or substituted or used to provide an attacker with extra information via unintended inferences 43.There are also serious problems with digital signatures for RDF data such as ontologies and RDF, which means that tools for digital signatures over RDF are non-existent 44. • There is no standardised way to add provenance data to most metadata formats – this lies at the root of many of the knowledge management vulnerabilities described above, eg, those related to syndication. • Metadata misuse is often the source of phishing attacks because it can be used to mislead the user about where a web resource comes from, eg, a wrong link attached to an image. Web 2.0 Security and Privacy 27 • Hidden metadata on Web 2.0 pages and sources can disclose private information – as the case of Wikiscanner 42 shows – as well as metadata hidden in documents; see Microsoft Word’s Hidden Tags Reveal Once-Anonymous Peer Reviewers 46 and Psst Your Metadata is Showing 47 for examples. Metadata is often the source of malformed syntax leading to buffer overflows – there are thousands of examples of such vulnerabilities – one example being CVE-2008-0065 45. More web-media-related examples are described in Exposing Vulnerabilities in Web software 48. • Codecs, a binary format used to provide interoperability for media display formats, although not strictly speaking metadata, are another important related class of add-on to Web 2.0 media which contains considerable vulnerabilities, some of which are also described in 48.

5.5.4 Privacy Violations

Theft or misuse of personal data was the top concern of end-users in a Web 2.0 context according to the ENISA survey 1. This stresses the importance of upholding the fundamental rights to privacy, anonymity and informational self-determination enshrined in, for example, the European convention on Human Rights article 8 49 and European data protection laws such as Directive 9546EC 50.

5.5.4.1 Image Tagging and Search

The posting and tagging of images or reports by private individuals is an increasing problem where users publish images without the consent of the subject. This is exacerbated by the introduction of face-recognition software into mainstream photo-sharing applications such as Google Picasa 9, leading to a huge increase in the number of tags in photos which therefore means that photos are easy to find and link together. This has serious implications for privacy, particularly since they are distributed with EULAs which stipulate that tagging is on an opt-out basis ie, if you don’t like the fact that someone has tagged an image, you must explicitly ask the publisher to remove it. In fact, ENISA’s survey 1 showed that a significant number of users are unhappy about the tagging of images in various scenarios: 60 are unhappy about a colleague tagging a photo with a social network profile and 44 unhappy about a friend doing the same. Such software could also be used for linking anonymous profiles to personal data by linking together similar images. The issue of image posting by private individuals is being taken seriously by governments – Poland, for example, has a draft bill before parliament as of August 2008 for a new penal code which contains an article introducing a penalty ranging from 3 months’ to 3 years’ imprisonment for distributing pornographic images of a person without their permission and a penalty from a fine to 2 years of imprisonment for distributing naked images without permission.

5.3.4.2 Behavioural Marketing

Behavioural marketing is a very common source of revenue in Web 2.0 applications where content and the exchange of personal data are often the main value proposition. Marketing companies generally operate on the premise that anything which does not include explicit personal data fields such as name, address, email or birthdate is not personal data – and therefore not subject to data protection laws. We note that such a premise is extremely questionable with respect to the definition of personal data outlined by the Article 29 Working Party 51 whereby an IP address may be considered personal data. The issue of what really is anonymised data is a much more complex issue. Data on hobbies, friendships and search terms, etc, may be extremely personalised and may serve to identify an individual just as much as their name, address and birth date. Furthermore their knowledge by a third party may result in negative consequences on their personal or economic life andor discriminatory exclusion from services. This leads to a situation where some providers are processing data which is in reality extremely personal data by virtue of being specific to an individual and thereby able to identify them but are technically not subject to data protection law. This is the subject of the debate around behavioural marketing. Web 2.0 Security and Privacy 28