Namespaces Structure of XML Documents

Chapter 6 • XML and Data Representation 223

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <book>
      <bk:cover xmlns:bk="http://any.website.net/book">
        <bk:title>A Book About Namespaces</bk:title>
        <bk:author>Anonymous</bk:author>
        <bk:isbn number="1881378241"/>
      </bk:cover>
      <bk2:chapter xmlns:bk2="http://any.website.net/book"
                   ch_name="Introduction">
        <bk2:paragraph>
          In this chapter we start from the beginning.
          ...
        </bk2:paragraph>
        . . .
      </bk2:chapter>

As can be seen, a namespace needs to be declared only once, in the outermost element where it is used. In our case, there are two top-level elements, <bk:cover> and <bk2:chapter>, and the elements embedded in them simply inherit the namespace declarations. All the elements of the namespace are prefixed with the appropriate prefix. The actual prefix’s name is not important, so in the above example I define “bk” and “bk2” as prefixes for the same namespace in different scopes. Notice also that an element can carry an arbitrary number of namespace attributes, each defining a different prefix and referring to a different namespace.

In the second form, the prefix is omitted, so the elements of this namespace are written without a prefix; the xmlns attribute then binds the default namespace. For the above example (Listing 6-3), the second form can be declared as:

Listing 6-4: Example of using a default namespace.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <book>
      <cover xmlns="http://any.website.net/book">
        <title>A Book About Namespaces</title>
        <author>Anonymous</author>
        <isbn number="1881378241"/>
      </cover>
      . . .

Notice that at most one default namespace is in effect at any given point in the document. In Listing 6-4, we could declare another default namespace elsewhere in the same document, for example on an element outside <cover>; if we declared it on an element nested inside <cover>, it would override the first default namespace within that element's scope.
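The claim that the prefix name does not matter can be checked with a namespace-aware parser. The following is a minimal sketch using the JAXP DOM API that ships with the JDK; the class and method names (NamespaceDemo, parse, countBookElements) are hypothetical, the URI is the one reconstructed from Listings 6-3 and 6-4, and both documents are trimmed, well-formed excerpts rather than the full listings. It counts the elements that belong to the book namespace, regardless of whether they were written with bk:, bk2:, or a default namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    // Namespace URI as reconstructed from Listings 6-3 and 6-4.
    static final String BOOK_NS = "http://any.website.net/book";

    // Trimmed excerpt of Listing 6-3: two prefixes bound to the same URI.
    static final String PREFIXED =
        "<book>"
      + "<bk:cover xmlns:bk='" + BOOK_NS + "'>"
      + "<bk:title>A Book About Namespaces</bk:title>"
      + "</bk:cover>"
      + "<bk2:chapter xmlns:bk2='" + BOOK_NS + "' ch_name='Introduction'/>"
      + "</book>";

    // Trimmed excerpt of Listing 6-4: the same cover via a default namespace.
    static final String DEFAULTED =
        "<book>"
      + "<cover xmlns='" + BOOK_NS + "'>"
      + "<title>A Book About Namespaces</title>"
      + "</cover>"
      + "</book>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // JAXP factories default to false
        DocumentBuilder b = f.newDocumentBuilder();
        return b.parse(new InputSource(new StringReader(xml)));
    }

    // Counts elements whose namespace URI is BOOK_NS, whatever prefix was used.
    // The outer <book> element is in no namespace, so it is not counted.
    static int countBookElements(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(BOOK_NS, "*");
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document d1 = parse(PREFIXED);
        Document d2 = parse(DEFAULTED);
        System.out.println(countBookElements(d1)); // bk:cover, bk:title, bk2:chapter
        System.out.println(countBookElements(d2)); // cover, title
        Element title = (Element) d2.getElementsByTagNameNS(BOOK_NS, "title").item(0);
        // Same URI as the prefixed elements, but no prefix.
        System.out.println(title.getNamespaceURI() + " " + title.getPrefix());
    }
}
```

Because the parser resolves each name to its namespace URI, <bk:cover> and <bk2:chapter> end up in the same namespace, and <title> in the second document reports that URI with no prefix at all.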

6.1.4 XML Parsers

XML parsers expose standard APIs for accessing and manipulating the parsed XML data. The two most popular parser APIs are DOM (Document Object Model) and SAX (Simple API for XML). See Appendix E for a brief review of DOM. SAX and DOM offer complementary paradigms for accessing the data contained in XML documents. DOM allows random access to any part of a parsed XML document; to use the DOM APIs, the parsed objects must be stored in working memory. Conversely, SAX provides no storage and presents the data as a linear stream. With SAX, if you want to refer back to anything seen earlier, you have to implement the underlying mechanism yourself. For example, with DOM an application program can import an XML document, modify it in arbitrary order, and write it back at any time. With SAX, you cannot edit the document arbitrarily, since there is no stored document to edit. You would have to edit it by filtering the stream as it flows and writing the result back immediately.

Ivan Marsic • Rutgers University 224

Event-Oriented Paradigm: SAX

SAX (Simple API for XML) is a simple, event-based API for XML parsers. The benefit of an event-based API is that it does not require the creation and maintenance of an internal representation of the parsed XML document. This makes it possible to parse XML documents that are much larger than the available system memory would allow, which is particularly important for small terminals, such as PDAs and mobile phones. Because it does not require storage behind its API, SAX is complementary to DOM. SAX provides events for the following structural information in XML documents:
• The start and end of the document
• Document type declaration (DTD)
• The start and end of elements
• Attributes of each element
• Character data
• Unparsed entity declarations
• Notation declarations
• Processing instructions

Initiating the parser

    import org.xml.sax.*;
    import org.xml.sax.helpers.ParserFactory;
    . . .
    // SAX 1.0: instantiate a parser by class name and register the handler
    Parser parser = ParserFactory.makeParser("com.sun.xml.parser.Parser");
    parser.setDocumentHandler(new DocumentHandlerImpl());
    parser.parse(input);  // input: an InputSource or a system ID string
    . . .

DocumentHandler Interface

    // Callbacks invoked by the parser (other DocumentHandler methods omitted)
    public void startDocument() throws SAXException {}
    public void endDocument() throws SAXException {}
    public void startElement(String name, AttributeList attrs) throws SAXException {}
    public void endElement(String name) throws SAXException {}
    public void characters(char buf[], int offset, int len) throws SAXException {}

[Panel "Event triggering in SAX parser": the input (<?xml ...?>, <element attr1=“val1”>, the text “This is a test.”, <element attr1=“val2”>, ..., end of the document) triggers the Document Handler callbacks startDocument, startElement, characters, endElement, and endDocument.]
Figure 6-3: SAX parser Java example.

Object-Model Oriented Paradigm: DOM

DOM (Document Object Model)

Practical Issues

Additional features relevant for both event-oriented and object-model oriented parsers include:
• Validation against a DTD
• Validation against an XML Schema
• Namespace awareness, i.e., the ability to determine the namespace URI of an element or attribute

These features affect the performance and memory footprint of a parser, so some parsers do not support all of them. You should check the documentation of the particular parser for the list of supported features.
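The event sequence of Figure 6-3 can be reproduced with a short program. This sketch uses the SAX2 interfaces bundled with the JDK (javax.xml.parsers.SAXParserFactory and org.xml.sax.helpers.DefaultHandler) rather than the older SAX 1.0 ParserFactory/DocumentHandler API shown in the figure; the class names and the input document are hypothetical. The handler records each callback instead of building a tree, which is the essence of the event-oriented paradigm.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {
    // Records each callback as a short string; no document tree is built.
    static class TraceHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override public void startDocument() { events.add("startDocument"); }
        @Override public void endDocument()   { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            // Unprefixed attributes have an empty namespace URI.
            String a = attrs.getValue("", "attr1");
            events.add("startElement " + localName + (a == null ? "" : " attr1=" + a));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement " + localName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may split one text run across several calls;
            // this sketch records each non-blank chunk separately.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("characters \"" + text + "\"");
            }
        }
    }

    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // guarantees localName is reported
        SAXParser parser = factory.newSAXParser();
        TraceHandler handler = new TraceHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input modeled on the Figure 6-3 sample.
        String xml = "<?xml version='1.0'?>"
                   + "<list>"
                   + "<element attr1='val1'>This is a test.</element>"
                   + "<element attr1='val2'/>"
                   + "</list>";
        for (String e : trace(xml)) {
            System.out.println(e);
        }
    }
}
```

Nothing is retained except the event log itself; to look back at an earlier element, the handler would have to keep that state explicitly.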
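For contrast with SAX, the DOM scenario in which an application imports a document, modifies it in arbitrary order, and writes it back might look like this with the JDK's DOM and identity-transform APIs. The class name DomEditDemo and the sample document are hypothetical; the point of the sketch is that the whole tree sits in memory, so the code can change the second <element> before the first.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomEditDemo {
    static String edit(String xml) throws Exception {
        // Parse the whole document into an in-memory tree.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Random access: modify the second <element> first, then the first one.
        Element second = (Element) doc.getElementsByTagName("element").item(1);
        second.setAttribute("attr1", "changed");
        Element first = (Element) doc.getElementsByTagName("element").item(0);
        first.setTextContent("Edited text."); // replaces the existing text child

        // Serialize the modified tree back to text with an identity transform.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><element attr1='val1'>This is a test.</element>"
                   + "<element attr1='val2'/></list>";
        System.out.println(edit(xml));
    }
}
```

The price of this freedom is memory: the entire document must be materialized before the first edit, which is exactly what SAX avoids.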
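Validation against an XML Schema, one of the practical parser features listed above, is exposed in Java through the javax.xml.validation package. The sketch below assumes a tiny hypothetical schema that declares an <isbn> element with a required number attribute (loosely modeled on the isbn element of Listing 6-3, but without its namespace); SchemaCheck and isValid are illustrative names. With no error handler installed, the Validator throws a SAXException on the first validity error, which the sketch turns into a boolean.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaCheck {
    // Hypothetical schema: an empty <isbn> element with a required "number" attribute.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='isbn'>"
      + "<xs:complexType>"
      + "<xs:attribute name='number' type='xs:string' use='required'/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        Validator v = schema.newValidator();
        try {
            v.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Default error handling rethrows validity errors as exceptions.
            System.out.println("invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<isbn number='1881378241'/>"));
        System.out.println(isValid("<isbn/>")); // missing required attribute
    }
}
```

Compiling the Schema object once and reusing it for many documents is the usual pattern, since schema compilation is the expensive step; this sketch recompiles on every call only for brevity.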

6.2 XML Schemas

Although there is no universal definition of schema, scholars generally agree that schemas are abstractions or generalizations of our perceptions of the world around us, which are molded by our experience. Functionally, schemas are knowledge structures that serve as heuristics, which help us evaluate new information. An integral part of a schema is our expectations of people, places, and things. Deviant news violates these expectations, resulting in schema incongruence. Schemas provide a mechanism for describing the logical structure of information, in the sense of what elements can or should be present and how they can be arranged.

In XML, schemas are used to make a class of documents adhere to a particular interface and thus allow XML documents to be created in a uniform way. Stated another way, schemas allow a document to communicate meta-information to the parser about its content, that is, about its grammar. Meta-information includes the allowed sequence and arrangement (nesting) of tags, attribute values and their types and defaults, the names of external files that may be referenced and whether or not they contain XML, the formats of some external (non-XML) data that may be referenced, and the entities that may be encountered. Therefore, a schema defines the production rules for documents. XML documents conforming to a particular schema are said to be valid documents. Notice that having a schema associated with a given XML document is optional. If the schema is a DTD, the document type declaration that refers to it must appear before the first element in the document.

Here is a simple example to motivate the need for schemas. In Section 6.1.1 above I introduced an XML representation of a correspondence letter and used the tags <letter>, <sender>, <name>, <address>, <street>, <city>, etc., to mark up the elements of a letter. What if somebody used the same vocabulary in a somewhat different manner, such as the following?

Listing 6-5: Variation on the XML example document from Listing 6-1.