XML Schema Basics XML Schemas
Chapter 6 • XML and Data Representation
Line 2a: Declares the target namespace as http://any.website.net/letter; the elements defined by this schema are to go in the target namespace.
Line 2b: The default namespace is set to http://any.website.net/letter (the same as the target namespace), so the elements of this namespace do not need the namespace qualifier/prefix within this schema document.
Line 2c: This directive instructs the instance documents which conform to this schema that any elements used by the instance document which were declared in this schema must be namespace qualified. The default value of elementFormDefault (if not specified) is unqualified. The corresponding directive about qualifying the attributes is attributeFormDefault, which can take the same values.
[Figure 6-5: Document structure defined by the correspondence letters schema (see Listing 6-6). Nodes shown in the diagram: lt:letter (anonymous type), lt:sender, lt:date, lt:recipient, lt:salutation, lt:body, lt:closing, lt:signature, lt:language, lt:template, lt:personAddressType, lt:name, lt:address (anonymous type), lt:street, lt:city, lt:state, lt:postal-code. Legend, Kleene operators: no indicator = required, one and only one; ? = optional, none or one (minOccurs = 0, maxOccurs = 1); * = optional, repeatable, none, one, or more (minOccurs = 0, maxOccurs = ∞); + = required, repeatable, one or more (minOccurs = 1, maxOccurs = ∞). The legend also defines graphical symbols for choice, sequence, all, element reference, global element (immediately within schema), local element (not immediately within schema), element with sub-elements shown or not shown, attribute of an element, group of elements, attributeGroup, and unique (element values must be unique). NOTE: The symbolic notation is inspired by the one used in [McGovern et al., 2003].]
Ivan Marsic • Rutgers University
Lines 3–17: Define the root element letter as a compound datatype (xsd:complexType) comprising several other elements. Some of these elements, such as salutation and body, contain the simple, predefined datatype xsd:string. Others, such as sender and recipient, contain the compound type personAddressType, which is defined below in this schema document (lines 18–23). This complex type is also a sequence, which means that all the named elements must appear in the sequence listed.
The letter element is defined as an anonymous type since it is defined directly within the element definition, without specifying the attribute “name” of the xsd:complexType start tag (line 4). This is called an inlined element declaration. Conversely, the compound type personAddressType, defined as an independent entity in line 18, is a named type, so it can be reused by other elements (see lines 6 and 8).
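The distinction between anonymous and named types can be checked mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the schema text is a hypothetical, heavily trimmed fragment written to match the description above, not a copy of Listing 6-6, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical, heavily trimmed schema; not a copy of Listing 6-6.
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://any.website.net/letter"
    xmlns="http://any.website.net/letter"
    elementFormDefault="qualified">
  <xsd:element name="letter">
    <xsd:complexType>  <!-- no name attribute: anonymous type -->
      <xsd:sequence>
        <xsd:element name="sender" type="personAddressType"/>
        <xsd:element name="recipient" type="personAddressType"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="personAddressType">  <!-- named type -->
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""

root = ET.fromstring(SCHEMA)
complex_types = list(root.iter(f"{{{XSD}}}complexType"))

# Named types carry a name attribute and can be referenced from elsewhere.
named_types = [ct.get("name") for ct in complex_types if ct.get("name")]
# Anonymous types have no name, so only the enclosing element can use them.
anonymous_count = sum(1 for ct in complex_types if ct.get("name") is None)
# Elements that reuse the named type through their type attribute.
reusers = [el.get("name") for el in root.iter(f"{{{XSD}}}element")
           if el.get("type") == "personAddressType"]

print(named_types, anonymous_count, reusers)
```

In this fragment the named type is referenced by two elements, whereas the anonymous type of letter cannot be referenced by anything other than the element that encloses it.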
Line 6a: The multiplicity attributes minOccurs and maxOccurs constrain the number of occurrences of the element. The default value of these attributes is 1, so line 6a is redundant and it is omitted for the remaining elements (but see lines 7 and 27a). In general, an element is required to appear in an instance document (defined below) when the value of minOccurs is 1 or more.
Line 7: Element date is of the predefined type xsd:date. Notice that the value of minOccurs is set to 0, which indicates that this element is optional.
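The relationship between the multiplicity attributes and the Kleene indicators of Figure 6-5 can be sketched in a few lines of Python. The element declarations below are hypothetical and are gathered from different parts of the schema purely for illustration; the helper names occurrence_bounds and kleene_indicator are assumptions of this sketch, not part of any XML library.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations, collected into one sequence for illustration only.
FRAGMENT = """\
<xsd:sequence xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="sender" type="personAddressType" minOccurs="1" maxOccurs="1"/>
  <xsd:element name="date" type="xsd:date" minOccurs="0"/>
  <xsd:element name="recipient" type="personAddressType"/>
  <xsd:element name="street" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
"""


def occurrence_bounds(element):
    """Return (min, max), applying the default of 1; max is None if unbounded."""
    low = int(element.get("minOccurs", "1"))
    raw_high = element.get("maxOccurs", "1")
    high = None if raw_high == "unbounded" else int(raw_high)
    return low, high


def kleene_indicator(low, high):
    """Map occurrence bounds to the indicator used in the figure legend."""
    table = {(1, 1): "", (0, 1): "?", (0, None): "*", (1, None): "+"}
    if (low, high) in table:
        return table[(low, high)]
    # Other bounds have no Kleene shorthand; show the explicit range instead.
    return f"{low}..{'unbounded' if high is None else high}"


sequence = ET.fromstring(FRAGMENT)
indicators = {el.get("name"): kleene_indicator(*occurrence_bounds(el))
              for el in sequence}
print(indicators)
```

The explicit minOccurs="1" maxOccurs="1" on sender and the omitted attributes on recipient yield the same bounds, which is why the explicit form is redundant.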
Lines 14–15: Define two attributes of the element letter, that is, language and template. The language attribute is of the built-in type xsd:language (Section 6.2.3 below).
Lines 18–23: Define our own personAddressType type as a compound type comprising a person’s name and postal address (as opposed to a business-address-type). Notice that the postal address element is referred to in line 21 (attribute ref) and it is defined elsewhere in the same document. The personAddressType type is used as the type of the sender and recipient elements in lines 6 and 8, respectively.
Lines 24–33: Define the postal address element, referred to in line 21. Of course, this could have been defined directly within the personAddressType datatype, as an anonymous sub-element, in which case it would not be reusable. (Although the element is not reused in this schema, I anticipate that an external schema may wish to reuse it; see Section 6.2.4 below.)
Line 27a: The multiplicity attribute maxOccurs is set to “unbounded,” to indicate that the street address is allowed to extend over several lines.
Notice that Lines 2a and 2b above accomplish two different tasks. One is to declare the namespace URI that the letter schema will be associated with (Line 2a). The other task is to define the prefix for the target namespace that will be used in this document (Line 2b). The reader may wonder whether this could have been done in one line. But, in the spirit of the modularity principle, it is always better to assign different responsibilities (tasks) to different entities (in this case, different lines).
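The two tasks also look different to an XML parser: targetNamespace is an ordinary attribute of the schema element, whereas the xmlns declaration is a namespace binding. A minimal Python sketch follows, assuming a schema header that matches the description of lines 2a–2c (the header text is reconstructed for illustration and may differ from Listing 6-6; namespace_declarations is a hypothetical helper).

```python
import io
import xml.etree.ElementTree as ET

# Assumed schema header, reconstructed from the description of lines 2a-2c.
SCHEMA_HEAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://any.website.net/letter"
    xmlns="http://any.website.net/letter"
    elementFormDefault="qualified">
</xsd:schema>
"""


def namespace_declarations(xml_bytes):
    """Return {prefix: uri} for all xmlns declarations; '' is the default."""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    return {prefix: uri for _, (prefix, uri) in events}


root = ET.fromstring(SCHEMA_HEAD)
declared = namespace_declarations(SCHEMA_HEAD)

# Task of line 2a: an attribute naming the namespace the schema populates.
target = root.get("targetNamespace")
# Task of line 2b: a binding that lets unprefixed names refer to that namespace.
default_ns = declared[""]

print(target == default_ns, root.get("elementFormDefault"))
```

The parser reports the namespace binding through a separate channel (the start-ns events) and does not list it among the element's attributes, which reflects the separation of responsibilities described above.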
The second step is to use the newly defined schema for production of valid instance documents (see Figure 6-4). An instance document is an XML document that conforms to a particular schema. To reference the above schema in letter documents, we do as follows:
Listing 6-7: Referencing a schema in an XML instance document (compare to Listing 6-1)
1   <?xml version="1.0" encoding="UTF-8"?>
2   <!-- Comment: A personal letter marked up in XML. -->
3   <lt:letter xmlns:lt="http://any.website.net/letter"
3a      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3b      xsi:schemaLocation="http://any.website.net/letter
3c          http://any.website.net/letter/letter.xsd"
3d      lt:language="en-US" lt:template="personal">
4     <lt:sender>
      ...  <!-- similar to Listing 6-1 -->
10    </lt:sender>
      ...  <!-- similar to Listing 6-1 -->
25  </lt:letter>
The above listing is said to be valid (unlike Listing 6-1, for which we generally only know that it is well-formed). The two documents (Listings 6-1 and 6-7) are the same, except for referencing the letter schema as follows:
Step 1 (line 3): Tell a schema-aware XML processor that all of the elements used in this instance document come from the http://any.website.net/letter namespace. All the element and attribute names will be prefaced with the lt: prefix. (Notice that we could also use a default namespace declaration and avoid the prefix.)
Step 2 (line 3a): Declare another namespace, the XMLSchema-instance namespace, which contains a number of attributes (such as schemaLocation, to be used next) that are part of a schema specification. These attributes can be applied to elements in instance documents to provide additional information to a schema-aware XML processor. Again, a usual convention is to use the namespace prefix xsi: for XMLSchema-instance.
Step 3 (lines 3b–3c): With the xsi:schemaLocation attribute, tell the schema-aware XML processor to establish the binding between the current XML document and its schema. The attribute contains a pair of values. The first value is the namespace identifier whose schema’s location is identified by the second value. In our case the namespace identifier is http://any.website.net/letter and the location of the schema document is http://any.website.net/letter/letter.xsd. (In this case, it would suffice to only have letter.xsd as the second value, since the schema document’s URL overlaps with the namespace identifier.) Typically, the second value will be a URL, but specialized applications can use other types of values, such as an identifier in a schema repository or a well-known schema name. If the document used more than one namespace, the xsi:schemaLocation attribute would contain multiple pairs of values (all within a single pair of quotation marks).
Notice that the schemaLocation attribute is merely a hint. If the parser already knows about the schema types in that namespace, or has some other means of finding them, it does not have to
go to the location you gave it.
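To make the three steps concrete, the sketch below reads the root element of a letter document with Python's standard xml.etree.ElementTree module and splits the xsi:schemaLocation value into (namespace, location) pairs. The module only parses the document; it does not fetch the schema or validate against it. The helper name schema_location_pairs is an assumption of this sketch, and the instance text is reduced to the root element.

```python
import xml.etree.ElementTree as ET

LT = "http://any.website.net/letter"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Root element of the letter document, reduced to an empty element.
INSTANCE = """\
<lt:letter xmlns:lt="http://any.website.net/letter"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://any.website.net/letter
        http://any.website.net/letter/letter.xsd"
    lt:language="en-US" lt:template="personal"/>
"""


def schema_location_pairs(root):
    """Split xsi:schemaLocation into a {namespace: location hint} mapping."""
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    if len(tokens) % 2 != 0:
        raise ValueError("schemaLocation must hold namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


root = ET.fromstring(INSTANCE)
hints = schema_location_pairs(root)
print(root.tag)   # the parser expands lt:letter to {namespace}letter
print(hints)

# The same element written with a default namespace instead of the lt: prefix.
unprefixed = ET.fromstring('<letter xmlns="http://any.website.net/letter"/>')
print(unprefixed.tag == root.tag)
```

Because the parser expands every prefixed name to a namespace-qualified name, the prefixed and the default-namespace forms of the root element are indistinguishable after parsing. A document that uses several namespaces would simply yield several entries in the mapping.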
XML Schema defines two aspects of an XML document structure:
1. Content model validity, which tests whether the arrangement and embedding of tags is correct. For example, a postal address tag must have the street, city, and postal-code tags nested within it. A country tag is optional.
2. Datatype validity, which is the ability to test whether specific units of information are of
the correct type and fall within the specified legal values. For example, a postal code is a five-digit number. Data types are the classes of data values, such as string, integer, or
date. Values are instances of types.
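The difference between the two aspects can be illustrated with a hand-written check in Python. This is only a sketch: a schema-aware XML processor would derive both checks from the schema itself, whereas here the rules from the two examples above (required street, city, and postal-code children; a five-digit postal code) are hard-coded. The function name check_address, the unqualified element names, and the sample address values are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

REQUIRED_CHILDREN = ("street", "city", "postal-code")  # country is optional
FIVE_DIGITS = re.compile(r"\d{5}")


def check_address(xml_text):
    """Return a list of problems found in one address element."""
    address = ET.fromstring(xml_text)
    present = [child.tag for child in address]
    problems = []
    # Content model validity: are the required tags nested in the address?
    for required in REQUIRED_CHILDREN:
        if required not in present:
            problems.append(f"content model: missing <{required}>")
    # Datatype validity: does the postal code have a legal value?
    code = address.findtext("postal-code")
    if code is not None and not FIVE_DIGITS.fullmatch(code.strip()):
        problems.append(f"datatype: postal-code {code!r} is not five digits")
    return problems


GOOD = ("<address><street>1 Main Street</street><city>Anytown</city>"
        "<postal-code>12345</postal-code></address>")
MISSING_CITY = ("<address><street>1 Main Street</street>"
                "<postal-code>12345</postal-code></address>")
BAD_CODE = ("<address><street>1 Main Street</street><city>Anytown</city>"
            "<postal-code>1234A</postal-code></address>")

for sample in (GOOD, MISSING_CITY, BAD_CODE):
    print(check_address(sample))
```

The second sample fails only the content model check and the third fails only the datatype check, which shows that the two aspects are independent of each other.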
There are two types of data:
1. Simple types are for elements that contain data but no attributes or sub-elements. Examples of simple data values are integer or string, which do not have parts. New simple types are defined by deriving them from existing simple types (built-in and derived).
2. Compound (complex) types are for elements that allow sub-elements and/or attributes. An example is the personAddressType type defined in Listing 6-6. Complex types are defined by listing the elements and/or attributes nested within them.