
8.2 Overview of XML Schema

XML is a simplified, conforming subset of the Standard Generalized Markup Language (SGML). It is powerful and easy to use, and it places no limits on namespace or structural complexity. Being a meta-language, XML supports the definition of languages for market verticals or specific industries. XML Schema, the W3C language for describing XML documents, supports a large set of data types and integrity constraints. We will discuss each of them.

8.2.1 Simple Types

The W3C XML Schema specification defines a number of built-in simple types, as shown below in Figure 8.4.

Database Fundamentals 190

Figure 8.4 - Types in XML Schema

There are many more simple types that can be used to specify data types for elements in an XML document. Apart from declaring elements with these simple types, one can also derive user-defined types from them.

For example, given below is a derivation of a user-defined type ‘myInteger’ from the simple type xs:integer. It allows only values from -2 (inclusive) up to, but not including, 5; that is, the integers -2 to 4.

<xs:simpleType name="myInteger">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="-2"/>
    <xs:maxExclusive value="5"/>
  </xs:restriction>
</xs:simpleType>

This type of derivation is known as derivation by restriction. Given below is another derivation by restriction, this time using the xs:enumeration facet:

<xs:simpleType name="passGrades">
  <xs:restriction base="xs:string">
    <xs:enumeration value="A"/>
    <xs:enumeration value="B"/>
    <xs:enumeration value="C"/>
  </xs:restriction>
</xs:simpleType>

Chapter 8 – Query languages for XML 191

Any element of type passGrades can hold only one of the three values A, B or C. Any other value for this element raises an XML Schema validation error. One can also restrict the values an element may hold to a pattern, as shown in the example below:

<xs:simpleType name="CapitalNames">
  <xs:restriction base="xs:string">
    <xs:pattern value="([A-Z]( [a-z]*)?)+"/>
  </xs:restriction>
</xs:simpleType>
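To make the three facets above concrete, here is a minimal Python sketch of the checks a schema validator performs for myInteger, passGrades and CapitalNames. The function names are hypothetical and the code only approximates XML Schema semantics; it is not how DB2 or any particular validator works. Two details matter: maxExclusive excludes the bound itself, and XML Schema patterns are implicitly anchored to the whole value, so Python's re.fullmatch (not re.search) is the closer equivalent.

```python
import re

XML_WS = " \t\r\n"                       # whitespace characters defined by XML
XS_INTEGER = re.compile(r"[+-]?[0-9]+")  # lexical form of xs:integer


def is_my_integer(lexical):
    """myInteger: xs:integer with minInclusive -2 and maxExclusive 5."""
    value = lexical.strip(XML_WS)        # xs:integer collapses whitespace
    if not XS_INTEGER.fullmatch(value):
        return False                     # not an xs:integer at all
    return -2 <= int(value) < 5          # 5 itself is excluded


PASS_GRADES = {"A", "B", "C"}


def is_pass_grade(value):
    """passGrades: xs:string restricted to the enumeration A, B, C."""
    return value in PASS_GRADES          # xs:string preserves whitespace


# Same regular expression as the CapitalNames pattern facet.
CAPITAL_NAMES = re.compile(r"([A-Z]( [a-z]*)?)+")


def is_capital_name(value):
    """CapitalNames: the pattern must match the entire value."""
    return CAPITAL_NAMES.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["-2", "4", "5", "-3"]:
        print("myInteger", candidate, is_my_integer(candidate))
    for candidate in ["B", "D"]:
        print("passGrades", candidate, is_pass_grade(candidate))
    for candidate in ["A bc", "AB", "Ab"]:
        print("CapitalNames", candidate, is_capital_name(candidate))
```

Read literally, the CapitalNames pattern accepts values such as "A bc" and "AB" but rejects "Ab", because a run of lowercase letters is allowed only after a space.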

The other two types of derivations are derivation by list and derivation by union. Given below is an example of derivation by list.

<xs:simpleType name="myintegerList">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>

This data type can be used to define attributes or elements that accept a whitespace-separated list of integers, such as "1 234 333 -32321".
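The way a list type is read can be sketched in a few lines of Python: the value is split on whitespace and every item must be a valid xs:integer on its own. This is an illustrative approximation with a hypothetical function name, not a validator implementation.

```python
import re

XML_WS_RUN = re.compile(r"[ \t\r\n]+")   # one or more XML whitespace characters
XS_INTEGER = re.compile(r"[+-]?[0-9]+")  # lexical form of xs:integer


def parse_integer_list(lexical):
    """Parse a myintegerList value (xs:list with itemType xs:integer)."""
    collapsed = lexical.strip(" \t\r\n")
    if not collapsed:
        return []  # zero items is allowed, since no length facet is set
    items = XML_WS_RUN.split(collapsed)
    for item in items:
        if not XS_INTEGER.fullmatch(item):
            raise ValueError(f"not an xs:integer item: {item!r}")
    return [int(item) for item in items]


if __name__ == "__main__":
    print(parse_integer_list("1 234 333 -32321"))
```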

Given below is an example of derivation by union.

<xs:simpleType name="intordate">
  <xs:union memberTypes="xs:integer xs:date"/>
</xs:simpleType>

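This data type can be used to define attributes or elements whose value is either an integer or a date, such as "223" or "2001-10-26". A union value is a single value of one of the member types, not a list. To accept a whitespace-separated list that mixes integers and dates, such as "1 223 2001-10-26", one would derive a list type whose itemType is intordate.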
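The difference between a union and a list of a union can be sketched in Python. A union value is checked against the member types in the order they appear in memberTypes, and the first match wins; a list of intordate applies that check to each whitespace-separated token. The function names are hypothetical, and the xs:date check is simplified to the YYYY-MM-DD form (the optional timezone that xs:date also permits is left out).

```python
import re
from datetime import date

XS_INTEGER = re.compile(r"[+-]?[0-9]+")
# Simplified xs:date: YYYY-MM-DD only, no timezone suffix.
XS_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_int_or_date(lexical):
    """Parse one intordate value, trying member types in memberTypes order."""
    value = lexical.strip(" \t\r\n")
    if XS_INTEGER.fullmatch(value):
        return int(value)                # first member type, xs:integer
    match = XS_DATE.fullmatch(value)
    if match:
        year, month, day = (int(part) for part in match.groups())
        return date(year, month, day)    # raises ValueError for e.g. month 13
    raise ValueError(f"neither xs:integer nor xs:date: {lexical!r}")


def parse_int_or_date_list(lexical):
    """A list type whose itemType is intordate: one union value per token."""
    return [parse_int_or_date(token) for token in lexical.split()]


if __name__ == "__main__":
    print(parse_int_or_date("223"))
    print(parse_int_or_date_list("1 223 2001-10-26"))
```

Note that parse_int_or_date("1 223") raises an error: the union alone does not accept several values.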

8.2.2 Complex Types

Complex types describe the markup structure, and they use simple types to construct the individual elements or attributes that make up the complex type. Simply put, an element of a complex type contains other elements and/or attributes. A complex element can also be empty, or it can contain other elements, text, or both, along with attributes. Given below is an example of a complex type.

<xs:complexType name="employeeType">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

An element of the above complex type can be declared as shown below:

<xs:element name="employee" type="employeeType"/>

Another way to declare an element with the same structure is to define the complex type anonymously, inside the element declaration:

<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

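The difference between the two approaches is that in the first, the complex type is named and defined independently of any element, so it can be reused when declaring other elements. In the second, the anonymous complex type belongs to that single element declaration and cannot be referenced elsewhere.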
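What employeeType requires of an instance element can be sketched with Python's standard xml.etree.ElementTree module: no attributes, exactly the children firstname and lastname in that order (default minOccurs and maxOccurs are 1), no text between the children, and simple text content inside each child. The function name is hypothetical and this is a rough structural check, not a full validator.

```python
import xml.etree.ElementTree as ET

# Child names, in order, required by the xs:sequence in employeeType.
EMPLOYEE_SEQUENCE = ["firstname", "lastname"]
XML_WS = " \t\r\n"


def _is_blank(text):
    return text is None or text.strip(XML_WS) == ""


def matches_employee_type(element):
    """Rough structural check of `element` against employeeType."""
    if element.attrib:
        return False  # employeeType declares no attributes
    children = list(element)
    if [child.tag for child in children] != EMPLOYEE_SEQUENCE:
        return False  # each child exactly once, in sequence order
    if not _is_blank(element.text) or not all(_is_blank(c.tail) for c in children):
        return False  # element-only content: no text between children
    # Both children are xs:string, so they may not contain elements or attributes.
    return all(len(child) == 0 and not child.attrib for child in children)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<employee><firstname>Ann</firstname><lastname>Lee</lastname></employee>"
    )
    print(matches_employee_type(doc))
```

The same check applies whether employee was declared with the named employeeType or with the anonymous complex type, since both describe the same structure.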

8.2.3 Integrity constraints

XML Schema provides ways to identify and reference the pieces of information that a document contains. One can directly emulate the ID, IDREF and IDREFS attribute types of XML DTDs using the XML Schema types xs:ID, xs:IDREF and xs:IDREFS. However, in XML Schema these types can be used with elements as well, not just with attributes as in DTDs. One can also enforce uniqueness among elements using the xs:unique identity constraint, as shown in the example below.

<!-- "books" is assumed to be the element that contains the book elements -->
<xs:element name="books">
  <xs:complexType>
    ….
  </xs:complexType>
  <xs:unique name="book">
    <xs:selector xpath="book"/>
    <xs:field xpath="isbn"/>
  </xs:unique>
</xs:element>

This ensures that no two book elements within the scope of the constraint have the same ISBN. One can also use xs:key and xs:keyref to enforce integrity constraints. A key is a uniqueness constraint with the additional restriction that every field must be present for each selected node. Its definition is similar to that of xs:unique:

<!-- "books" is assumed to be the element that contains the book elements -->
<xs:element name="books">
  <xs:complexType>
    ….
  </xs:complexType>
  <xs:key name="book">
    <xs:selector xpath="book"/>
    <xs:field xpath="isbn"/>
  </xs:key>
</xs:element>

The xs:keyref element can then be used to refer to this key from within its scope. Its refer attribute must name an xs:key or xs:unique constraint declared on the same element as the xs:keyref or on one of that element's descendants.
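The behaviour of xs:unique, xs:key and xs:keyref can be sketched in Python with xml.etree.ElementTree. The sketch assumes simple child-element paths for the selector and field (ElementTree's findall supports those, but not full XPath), and the element names library, loans and loan are hypothetical, chosen only to give the keyref something to point from. It shows the key differences: xs:unique skips nodes that lack the field, xs:key rejects them, and xs:keyref fails when a reference has no matching key value.

```python
import xml.etree.ElementTree as ET


def _field_value(node, field):
    """Text of the single node `field` selects, or None if it is absent."""
    matches = node.findall(field)
    if len(matches) > 1:
        raise ValueError("a field must select at most one node")
    return (matches[0].text or "") if matches else None


def check_unique_or_key(scope, selector, field, is_key):
    """Emulate xs:unique (is_key=False) or xs:key (is_key=True)."""
    seen = set()
    for node in scope.findall(selector):
        value = _field_value(node, field)
        if value is None:
            if is_key:
                return False  # xs:key: the field is required
            continue          # xs:unique: nodes without the field are skipped
        if value in seen:
            return False      # duplicate value
        seen.add(value)
    return True


def check_keyref(scope, key_selector, key_field, ref_selector, ref_field):
    """Emulate an xs:key and an xs:keyref declared on the same element."""
    if not check_unique_or_key(scope, key_selector, key_field, is_key=True):
        return False
    keys = {_field_value(n, key_field) for n in scope.findall(key_selector)}
    for node in scope.findall(ref_selector):
        value = _field_value(node, ref_field)
        if value is not None and value not in keys:
            return False  # reference to a key value that does not exist
    return True


if __name__ == "__main__":
    library = ET.fromstring(
        "<library>"
        "<books><book><isbn>111</isbn></book><book><isbn>222</isbn></book></books>"
        "<loans><loan><isbn>222</isbn></loan></loans>"
        "</library>"
    )
    print(check_unique_or_key(library.find("books"), "book", "isbn", is_key=True))
    print(check_keyref(library, "books/book", "isbn", "loans/loan", "isbn"))
```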

8.2.4 XML Schema evolution

One of the reasons for the growing use of XML is its flexibility as a data model: it can easily accommodate schema changes. Many industry segments now publish an industry-standard specification in the form of an XML schema document. These standards keep evolving, and there is a growing need to stay compliant with them and to adapt to their changes quickly, without application downtime. DB2 provides a lot of flexibility in handling such schema changes. In DB2, one can store XML documents with or without a schema association in the same column. Similarly, one can store XML documents that comply with different XML schemas in the same XML column. DB2 also provides a way to check whether two versions of an XML schema are compatible and, if they are, to update the current schema with the newer version.

If you are validating your stored documents against a schema that is about to change, then there are two ways to proceed in DB2 pureXML:

• If the two schemas are sufficiently alike (compatible), you can register the new schema in the XML Schema Repository (XSR), replace the original schema with it, and continue validating. Both schema names (the SQL name and the schema location URI) remain the same across the two compatible schemas.

• If the two XML schemas are not alike (not compatible), register the new schema with a new SQL name and a new schema location URI.

After a compatible schema has been evolved, XMLVALIDATE can continue to refer to the new XML schema by the existing SQL name, or it can rely on the schema location URI in the XML instance documents, provided that the URI is unchanged across existing and new instance documents. Compatible schema evolution is typically used when the schema changes are minor.

For example, consider a case with some minor schema changes. To replace the existing schema with the modified one through schema evolution in the XSR, follow these steps:


1. Call the XSR_REGISTER stored procedure or run the REGISTER XMLSCHEMA command to register the new XML schema in the XSR. If you plan to replace the existing schema with the new one, as described in the next step, do not validate any documents against the newly registered schema.

2. Call the XSR_UPDATE stored procedure or run the UPDATE XMLSCHEMA command to evolve the existing XML schema in the XSR by replacing it with the new one.

Successful schema evolution replaces the original XML schema. Once evolved, only the updated XML schema is available.

If the dropnewschema option of the XSR_UPDATE stored procedure, or the DROP NEW SCHEMA option of the UPDATE XMLSCHEMA command, is used, the new schema is available only under the existing schema name, not under the name used to register it.