Chapter 6 • XML and Data Representation
Object-Model Oriented Paradigm: DOM
DOM Document Object Model
Practical Issues
Additional features relevant for both event-oriented and object-model oriented parsers include:
• Validation against a DTD
• Validation against an XML Schema
• Namespace awareness, i.e., the ability to determine the namespace URI of an element or attribute
These features affect the performance and memory footprint of a parser, so some parsers do not support all of them. Check the documentation of the particular parser for its list of supported features.
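Namespace awareness can be illustrated with a minimal Python sketch. It uses the standard-library xml.etree.ElementTree parser, which reports every element and attribute name in the form {namespace-URI}local-name. The lt prefix and the http://any.website.net/letter URI are borrowed from the letter example used in Figure 6-4; the small document and the split_qname helper are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative document (an assumption, not one of the chapter's listings).
DOC = """<lt:letter xmlns:lt="http://any.website.net/letter" lt:template="personal">
  <lt:sender>Mr. Charles Morse</lt:sender>
</lt:letter>"""


def split_qname(qname):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if qname.startswith("{"):
        uri, local = qname[1:].split("}", 1)
        return uri, local
    return None, qname  # the name is in no namespace


root = ET.fromstring(DOC)

# Namespace URI and local name of the root element.
root_ns, root_name = split_qname(root.tag)

# Namespace URI and local name of the root's only attribute.
# The xmlns:lt declaration itself is not reported as an attribute.
attr_ns, attr_name = split_qname(next(iter(root.attrib)))

print(root_ns, root_name)
print(attr_ns, attr_name)
```

A parser that is not namespace-aware would only see the literal names lt:letter and lt:template and could not tell which vocabulary they belong to.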
6.2 XML Schemas
Although there is no universal definition of schema, scholars generally agree that schemas are abstractions or generalizations of our perceptions of the world around us, which are molded by our experience. Functionally, schemas are knowledge structures that serve as heuristics which help us evaluate new information. An integral part of a schema is our expectations of people, places, and things. Deviant news violates these expectations, resulting in schema incongruence. Schemas provide a mechanism for describing the logical structure of information, in the sense of what elements can or should be present and how they can be arranged.
In XML, schemas are used to make a class of documents adhere to a particular interface and thus allow the XML documents to be created in a uniform way. Stated another way, schemas allow a document to communicate meta-information to the parser about its content, or its grammar. Meta-information includes the allowed sequence and arrangement (nesting) of tags, attribute values and their types and defaults, the names of external files that may be referenced and whether or not they contain XML, the formats of some external non-XML data that may be referenced, and the entities that may be encountered. Therefore, a schema defines the document production rules. XML documents conforming to a particular schema are said to be valid documents. Notice that having a schema associated with a given XML document is optional. If there is a schema for a given document, it must appear before the first element in the document.
Here is a simple example to motivate the need for schemas. In Section 6.1.1 above I introduced an XML representation of a correspondence letter and used the tags <letter>, <sender>, <name>, <address>, <street>, <city>, etc., to mark up the elements of a letter. What if somebody used the same vocabulary in a somewhat different manner, such as the following?
Listing 6-5: Variation on the XML example document from Listing 6-1.
Ivan Marsic • Rutgers University
    <?xml version="1.0" encoding="UTF-8"?>
    <letter>
      <sender>Mr. Charles Morse</sender>
      <street>13 Takeoff Lane</street>
      <city>Talkeetna, AK 99676</city>
      <date>29.02.1997</date>
      <recipient>Mrs. Robinson</recipient>
      <street>1 Entertainment Way</street>
      <city>Los Angeles, CA 91011</city>
      <body>
    Dear Mrs. Robinson,

    Here's part of an update ...

    Sincerely,
      </body>
      <signature>Charlie</signature>
    </letter>
We can quickly figure out that this document is a letter, although it appears to follow different rules of production than the example in Listing 6-1 above. If asked whether Listing 6-5 represents a valid letter, you would likely respond: “It probably does.” However, to support automatic validation of a document by a machine, we must precisely specify and enforce the rules and constraints of composition. Machines are not good at handling ambiguity, and removing ambiguity is what schemas are about. The purpose of a schema in markup languages is to:
• Allow machine validation of document structure
• Establish a contract on how an XML document will be structured between multiple parties who are exchanging XML documents
There are many other schemas that are used regularly in our daily activities. Another example schema was encountered in Section 2.2.2: the schema for representing the use cases of a system under discussion, Figure 2-1.
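To make the idea of machine validation concrete, here is a minimal Python sketch. It is not an XML Schema validator (the Python standard library does not include one); it only checks a hand-written table of required child elements. The table REQUIRED_CHILDREN, the function violations, and the nested sample document are assumptions made for illustration, based on the tag names mentioned above for Listing 6-1. The flat sample is an abbreviated version of Listing 6-5.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: element name -> child elements it must contain.
REQUIRED_CHILDREN = {
    "letter": ["sender", "recipient", "body"],
    "sender": ["name", "address"],
    "address": ["street", "city"],
}


def violations(element, path=""):
    """Return one message per required child element that is missing."""
    here = f"{path}/{element.tag}"
    found = {child.tag for child in element}
    problems = [
        f"{here}: missing <{name}>"
        for name in REQUIRED_CHILDREN.get(element.tag, [])
        if name not in found
    ]
    for child in element:  # check the whole tree, not just the root
        problems.extend(violations(child, here))
    return problems


# Abbreviated Listing 6-5: well-formed, but the sender is flat text.
FLAT = """<letter>
  <sender>Mr. Charles Morse</sender>
  <street>13 Takeoff Lane</street>
  <city>Talkeetna, AK 99676</city>
  <recipient>Mrs. Robinson</recipient>
  <body>Dear Mrs. Robinson, ...</body>
</letter>"""

# Nested variant (structure assumed, in the spirit of Listing 6-1).
NESTED = """<letter>
  <sender>
    <name>Mr. Charles Morse</name>
    <address><street>13 Takeoff Lane</street><city>Talkeetna, AK 99676</city></address>
  </sender>
  <recipient>Mrs. Robinson</recipient>
  <body>Dear Mrs. Robinson, ...</body>
</letter>"""

flat_problems = violations(ET.fromstring(FLAT))
nested_problems = violations(ET.fromstring(NESTED))
print(flat_problems)
print(nested_problems)
```

Both documents parse without error, so well-formedness alone cannot tell them apart. Only an explicit statement of the rules lets a program decide that the flat variant breaks the contract. An XML Schema plays the role of the rule table, in a standard and far more expressive form.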
6.2.1 XML Schema Basics
XML Schema provides the vocabulary to state the rules of document production. It is an XML language for which the vocabulary is defined using itself. That is, the elements and datatypes that are used to construct schemas, such as schema, element, sequence, string, etc., come from the http://www.w3.org/2001/XMLSchema namespace (see Figure 6-4).
The XML Schema namespace is also called the “schema of schemas,” for it defines the elements and attributes used for defining new schemas.
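Because a schema is itself an XML document, any namespace-aware XML parser can read it. The following Python sketch illustrates this with the standard-library xml.etree.ElementTree module. The schema text is a trimmed stand-in written for this example (an assumption, not a complete schema from this chapter), and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Trimmed stand-in for a schema document (illustrative only).
SCHEMA = """<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://any.website.net/letter">
  <xsd:element name="letter">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="date" type="xsd:date" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

# Parse from bytes so the encoding declaration is honored by the parser.
root = ET.fromstring(SCHEMA.encode("utf-8"))

# Every tag used to write the schema, in '{uri}local' notation.
used_tags = sorted({el.tag for el in root.iter()})

# Names of the new vocabulary that the schema declares.
declared = [el.get("name") for el in root.iter(f"{{{XSD_NS}}}element")]

print(used_tags)
print(root.get("targetNamespace"), declared)
```

All tags used to write the schema belong to the XML Schema namespace, while the names they declare (letter, date) belong to the new target namespace.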
The first step involves defining a new language (see Figure 6-4). The following is an example schema for correspondence letters, an instance of which is given in Listing 6-1 above.
Listing 6-6: XML Schema for correspondence letters (see an instance in Listing 6-1).
    1   <?xml version="1.0" encoding="UTF-8"?>
    2   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    2a      targetNamespace="http://any.website.net/letter"
    2b      xmlns="http://any.website.net/letter"
    2c      elementFormDefault="qualified">
    3     <xsd:element name="letter">
    4       <xsd:complexType>
    5         <xsd:sequence>
    6           <xsd:element name="sender" type="personAddressType"
    6a              minOccurs="1" maxOccurs="1"/>
    7           <xsd:element name="date" type="xsd:date" minOccurs="0"
[Figure 6-4 (diagram). Recoverable content:
• Namespace http://www.w3.org/2001/XMLSchema: schema, element, complexType, sequence, string, boolean. This is the vocabulary that XML Schema provides to define your new vocabulary.
• Namespace http://any.website.net/letter: letter, sender, recipient, name, salutation, address, street, city.
• An instance document that conforms to the “letter” schema:
    <?xml version="1.0" encoding="UTF-8"?>
    <lt:letter xmlns:lt="http://any.website.net/letter"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://any.website.net/letter
            http://any.website.net/letter/letter.xsd"
        lt:language="English_US" lt:template="personal">
      <lt:sender> ...
    </lt:letter>
]
Figure 6-4: Using XML Schema. Step 1: use the Schema vocabulary to define a new XML language (Listing 6-6). Step 2: use both to produce valid XML documents (Listing 6-7).
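The instance document in Figure 6-4 names its schema through the xsi:schemaLocation attribute, whose value is a whitespace-separated list of pairs: a namespace URI followed by a hint for where the schema document can be found. The Python sketch below extracts these pairs. The function name schema_locations is hypothetical, and the empty lt:sender element stands in for the content elided in the figure.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Instance header as in Figure 6-4; the empty lt:sender is a placeholder.
INSTANCE = """<lt:letter xmlns:lt="http://any.website.net/letter"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://any.website.net/letter
                        http://any.website.net/letter/letter.xsd"
    lt:language="English_US" lt:template="personal">
  <lt:sender/>
</lt:letter>"""


def schema_locations(root):
    """Map each namespace URI to its schema location hint."""
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    # Tokens alternate: namespace URI, location, namespace URI, location, ...
    return dict(zip(tokens[0::2], tokens[1::2]))


root = ET.fromstring(INSTANCE)
locations = schema_locations(root)
print(locations)
```

A validating parser may use such a hint to fetch the schema, although it is free to locate the schema by other means.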