Ivan Marsic • Rutgers University
    abstract=true|false
    block=(#all | list of (extension | restriction))
    final=(#all | list of (extension | restriction))
    any attributes
  >
    (annotation?, ((simpleType | complexType)?, (unique | key | keyref)∗))
  </element>
Kleene operators ?, +, and ∗ are defined in Figure 6-5.
The group element is used to define a collection of elements to be used to model compound elements. Its parent element can be one of the following: schema, choice, sequence, complexType, restriction (both simpleContent and complexContent), extension (both simpleContent and complexContent).
Syntax of the group element (all attributes are optional):
  <group
    id=ID
    name=NCName
    ref=QName
    maxOccurs=nonNegativeInteger | unbounded
    minOccurs=nonNegativeInteger
    any attributes
  >
    (annotation?, (all | choice | sequence))
  </group>
Description:
  name: Specifies a name for the group. This attribute is used only when the schema element is the parent of this group element. Name and ref attributes cannot both be present.
  ref: Refers to the name of another group. Name and ref attributes cannot both be present.
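The difference between the name and ref attributes may be easier to see in a small schema. The following is a minimal sketch, not taken from the listings in this chapter; the names personNameGroup, contactType, firstName, lastName, and email are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- A named group: "name" is used because the parent is xsd:schema.
       The group name and its element names are hypothetical. -->
  <xsd:group name="personNameGroup">
    <xsd:sequence>
      <xsd:element name="firstName" type="xsd:string"/>
      <xsd:element name="lastName" type="xsd:string"/>
    </xsd:sequence>
  </xsd:group>

  <!-- A reference to the group: "ref" is used instead of "name",
       and the occurrence attributes are placed on the reference. -->
  <xsd:complexType name="contactType">
    <xsd:sequence>
      <xsd:group ref="personNameGroup" minOccurs="1" maxOccurs="1"/>
      <xsd:element name="email" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

Defining the group once at the top level and referring to it elsewhere lets several complex types share the same fragment of a content model.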
The attributeGroup element is used to group a set of attribute declarations so that they can be incorporated as a group into complex type definitions.
Syntax of attributeGroup (all attributes are optional):
  <attributeGroup
    id=ID
    name=NCName
    ref=QName
    any attributes
  >
    (annotation?, ((attribute | attributeGroup)∗, anyAttribute?))
  </attributeGroup>
Description:
  name: Specifies the name of the attribute group. Name and ref attributes cannot both be present.
  ref: Refers to a named attribute group. Name and ref attributes cannot both be present.
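As an illustration of how an attribute group can be incorporated into a complex type definition, consider the following sketch. The names trackingAttributes, noteType, language, date, and body are assumptions made for this example only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- A named set of attribute declarations (hypothetical names). -->
  <xsd:attributeGroup name="trackingAttributes">
    <xsd:attribute name="language" type="xsd:language"/>
    <xsd:attribute name="date" type="xsd:date"/>
  </xsd:attributeGroup>

  <!-- The whole set is incorporated by reference. Attribute
       declarations follow the content model of the complex type. -->
  <xsd:complexType name="noteType">
    <xsd:sequence>
      <xsd:element name="body" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attributeGroup ref="trackingAttributes"/>
  </xsd:complexType>

</xsd:schema>
```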
Chapter 6 • XML and Data Representation
The annotation element specifies schema comments that are used to document the schema. This element can contain two elements: the documentation element, meant for
human consumption, and the appinfo element, for machine consumption.
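A short sketch of an annotated declaration is shown below. The signature element is borrowed from the chapter's letter example, but the documentation text and the hint element inside appinfo are hypothetical; appinfo may carry any well-formed markup that an application agrees to interpret.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="signature" type="xsd:string">
    <xsd:annotation>
      <!-- Meant for human readers of the schema. -->
      <xsd:documentation xml:lang="en">
        The name of the person who signs the letter.
      </xsd:documentation>
      <!-- Meant for software that processes the schema;
           the hint element is a made-up example. -->
      <xsd:appinfo>
        <hint>render-in-italics</hint>
      </xsd:appinfo>
    </xsd:annotation>
  </xsd:element>

</xsd:schema>
```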
Simple Elements
A simple element is an XML element that can contain only text. It cannot contain any other elements or attributes. However, the “only text” restriction is somewhat misleading, since the text can be of many different types. It can be one of the built-in types that are included in the XML Schema definition, such as boolean, string, or date, or it can be a custom type that you define yourself, as will be seen in Section 6.2.3 below. You can also add restrictions (facets) to a data type in order to limit its content, and you can require the data to match a defined pattern.
Examples of simple elements are the salutation and body elements in Listing 6-6 above.
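For illustration, simple elements of different built-in types could be declared as in the sketch below. The salutation element is assumed here to be of type xsd:string; the sentOn and urgent elements are hypothetical and are not part of Listing 6-6.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Each element holds only text, but the text is typed. -->
  <xsd:element name="salutation" type="xsd:string"/>
  <xsd:element name="sentOn" type="xsd:date"/>      <!-- hypothetical -->
  <xsd:element name="urgent" type="xsd:boolean"/>   <!-- hypothetical -->

</xsd:schema>
<!-- Matching instance fragments:
     <salutation>Dear Sir</salutation>
     <sentOn>2004-03-31</sentOn>
     <urgent>true</urgent>
-->
```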
Groups of Elements
XML Schema enables collections of elements to be defined and named, so that the elements can be used to build up the content models of complex types. Unnamed groups of elements can also be defined, and along with elements in named groups, they can be constrained to appear in the same order (sequence) as they are declared. Alternatively, they can be constrained so that only one of the elements may appear in an instance.
A model group is a constraint in the form of a grammar fragment that applies to lists of element information items, such as plain text or other markup elements. There are three varieties of model
group:
• Sequence (element sequence): all the named elements must appear in the order listed;
• Conjunction (element all): all the named elements must appear, although they can occur in any order;
• Disjunction (element choice): one, and only one, of the elements listed must appear.
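The three varieties of model group can be compared side by side in the following sketch. All type and element names are hypothetical and chosen only to make the constraints easy to read.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Sequence: street must appear first, then city. -->
  <xsd:complexType name="addressType">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Conjunction: both first and last must appear, in either order. -->
  <xsd:complexType name="nameType">
    <xsd:all>
      <xsd:element name="first" type="xsd:string"/>
      <xsd:element name="last" type="xsd:string"/>
    </xsd:all>
  </xsd:complexType>

  <!-- Disjunction: exactly one of phone or email must appear. -->
  <xsd:complexType name="contactMethodType">
    <xsd:choice>
      <xsd:element name="phone" type="xsd:string"/>
      <xsd:element name="email" type="xsd:string"/>
    </xsd:choice>
  </xsd:complexType>

</xsd:schema>
```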
6.2.3 Datatypes
In the XML Schema specification, a datatype is defined by:
(a) Value space, which is the set of distinct values that a given datatype can assume. For example, the value space of the int type is the set of integers in the range [−2147483648, 2147483647], i.e., signed 32-bit numbers.
(b) Lexical space, which is the set of allowed lexical representations or literals for the datatype. For example, a float-type number 0.00125 has an alternative representation as 1.25E−3. Valid literals for the float type also include abbreviations for positive and negative infinity (±INF) and Not a Number (NaN).
(c) Facets that characterize properties of the value space, individual values, or lexical items. For example, a datatype is said to have a “numeric” facet if its values are conceptually quantities in some mathematical number system. Numeric datatypes further can have a “bounded” facet, meaning that an upper and/or lower value is specified. For example, postal codes in the U.S. are bounded to the range [10000, 99999].
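The distinction between the value space and the lexical space can be seen in a small instance document. The sketch below assumes a hypothetical readings element that holds a list of float values; the first two reading elements use different literals for the same value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical container for a list of float values. -->
  <xsd:element name="readings">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="reading" type="xsd:float"
                     maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
<!-- A matching instance document:
     <readings>
       <reading>0.00125</reading>   one literal for the value
       <reading>1.25E-3</reading>   another literal, same value
       <reading>INF</reading>       positive infinity
       <reading>NaN</reading>       Not a Number
     </readings>
-->
```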
XML Schema has a set of built-in or primitive datatypes that are not defined in terms of other datatypes. We have already seen some of these, such as xsd:string, which was used in Listing 6-6. More will be exposed below. Unlike these, derived datatypes are those that are defined in terms of other datatypes (either primitive types or derived ones).
Simple Types: simpleType
These types are atomic in that they can only contain character data and cannot have attributes or element content. Both built-in simple types and their derivations can be used in all element and attribute declarations. Simple-type definitions are used when a new data type needs to be defined, where this new type is a modification of some other existing simple type.
Table 6-1 shows a partial list of the Schema-defined types. There are over 40 built-in simple types and the reader should consult the XML Schema specification (see http://www.w3.org/TR/xmlschema-0/, Section 2.3) for the complete list.
Table 6-1: A partial list of primitive datatypes that are built into the XML Schema.

| Name | Examples | Comments |
|---|---|---|
| string | My favorite text example | |
| byte | -128, -1, 0, 1, …, 127 | A signed byte value |
| unsignedByte | 0, …, 255 | Derived from unsignedShort |
| boolean | 0, 1, true, false | May contain either true or false, 0 or 1 |
| short | -5, 328 | Signed 16-bit integer |
| int | -7, 471 | Signed 32-bit integer |
| integer | -2, 435 | Integer of arbitrary size (unlike int, not limited to 32 bits) |
| long | -4, 123456 | Signed 64-bit integer |
| float | 0, -0, -INF, INF, -1E4, 1.401298464324817e-45, 3.402823466385288e+38, NaN | Conforming to the IEEE 754 standard for 32-bit single precision floating point numbers. Note the use of abbreviations for positive and negative infinity (±INF), and Not a Number (NaN) |
| double | 0, -0, -INF, INF, -1E4, 4.9e-324, 1.797e308, NaN | Conforming to the IEEE 754 standard for 64-bit double precision floating point numbers |
| duration | P1Y2M3DT10H30M12.3S | 1 year, 2 months, 3 days, 10 hours, 30 minutes, and 12.3 seconds |
| dateTime | 1997-03-31T13:20:00.000-05:00 | March 31st 1997 at 1:20pm Eastern Standard Time, which is 5 hours behind Coordinated Universal Time |
| date | 1997-03-31 | |
| time | 13:20:00.000, 13:20:00.000-05:00 | |
| gYear | 1997 | The “g” prefix signals time periods in the Gregorian calendar. |
| gDay | ---31 | The 31st day |
| QName | lt:sender | XML Namespace QName (qualified name) |
| language | en-GB, en-US, fr | Valid values for xml:lang as defined in XML 1.0 |
| ID | this-element | An attribute that identifies the element; can be any string that conforms to the rules for assigning the element names. |
| IDREF | this-element | IDREF attribute type; refers to an element which has the ID attribute with the same value |

A straightforward use of built-in types is the direct declaration of elements and attributes that
conform to them. For example, in Listing 6-6 above I declared the signature element and the template attribute of the letter element, both using the xsd:string built-in type:
  <xsd:element name="signature" type="xsd:string"/>
  <xsd:attribute name="template" type="xsd:string"/>
New simple types are defined by deriving them from existing simple types (built-ins and derived). In particular, we can derive a new simple type by restricting an existing simple type; in other words, the legal range of values for the new type is a subset of the existing type’s range of values. We use the simpleType element to define and name the new simple type. We use the restriction element to indicate the existing (base) type, and to identify the facets that constrain the range of values. A complete list of facets is provided below.
Facets and Regular Expressions
We use the “facets” of datatypes to constrain the range of values. Suppose we wish to create a new type of integer called zipCodeType whose range of values is between 10000 and 99999 (inclusive). We base our definition on the built-in simple type integer, whose range of values also includes integers less than 10000 and greater than 99999. To define zipCodeType, we restrict the range of the integer base type by employing two facets called minInclusive and maxInclusive (to be introduced below):
Listing 6-8: Example of new type definition by facets of the base type.
  <xsd:simpleType name="zipCodeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="10000"/>
      <xsd:maxInclusive value="99999"/>
    </xsd:restriction>
  </xsd:simpleType>
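Once defined, the new type can be used in declarations just like a built-in type. The following sketch repeats the zipCodeType definition so that the schema is self-contained, and then declares a hypothetical zipCode element of that type; the element name and the sample values are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- The derived type: integers from 10000 to 99999 inclusive. -->
  <xsd:simpleType name="zipCodeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="10000"/>
      <xsd:maxInclusive value="99999"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- A hypothetical element that uses the derived type. -->
  <xsd:element name="zipCode" type="zipCodeType"/>

</xsd:schema>
<!-- Against this schema:
     <zipCode>10027</zipCode>    is valid
     <zipCode>9999</zipCode>     is invalid (below minInclusive)
     <zipCode>100000</zipCode>   is invalid (above maxInclusive)
-->
```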
Table 6-2 and Table 6-3 list the facets that are applicable for built-in types. The facets identify various characteristics of the types, such as:
• length, minLength, maxLength: the exact, minimum and maximum character length of the value
• pattern: a regular expression pattern for the value (see more below)
• enumeration: a list of all possible values (an example is given in Listing 6-9 below)