Overview of XML

8.1 Overview of XML

XML stands for eXtensible Markup Language. XML is a hierarchical data model consisting of nodes of several types linked together through an ordered parent/child relationship. An XML data model can also be represented in text or binary format.

8.1.1 XML Elements and Database Objects

An XML element is the most fundamental part of an XML document. Every XML document must have at least one XML element, also known as the root or document element. This element can further have any number of attributes and child elements.

Given below is an example of a simple XML element:

<name>I am an XML element</name>

Database Fundamentals 184

Every XML element has a start tag and an end tag. In the above example, <name> is the start tag and </name> is the end tag. The element also has a text value, which is “I am an XML element”.

Given below is an XML document with multiple XML elements and an attribute:

<employees>
  <employee id="121">
    <firstname>Jay</firstname>
    <lastname>Kumar</lastname>
    <job>Asst. manager</job>
    <doj>2002-12-12</doj>
  </employee>
</employees>

In the above example, <employees> is the document element of the XML document, and it has one child element, <employee>. The <employee> element has several child elements along with an attribute named id with the value 121.

In DB2, the entire XML document is stored as a single column value. For example, consider the table structure below:

department(id integer, deptdoc xml)

The id column stores the integer id of each department, whereas the deptdoc column, which is of type XML, stores one XML document per department. Therefore, the entire XML document is treated as a single object/value. Below is the graphical representation of the way XML data is stored in DB2.

Chapter 8 – Query languages for XML 185

Figure 8.1 - Native XML Storage

In DB2, the XML document is parsed during insertion, and the parsed hierarchical format is stored inside the database. Because of this, DB2 does not need to parse the XML document again at query time, which yields better query performance than re-parsing stored text for every query.
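To make the element terminology of this section concrete, the following is a minimal sketch that uses Python's standard xml.etree.ElementTree module (not DB2) to parse the employees document once and then navigate the resulting tree. The variable names are illustrative assumptions; the point is only that, after a single parse, the document element, attributes, and child elements can be reached without touching the text form again.

```python
import xml.etree.ElementTree as ET

EMPLOYEES_XML = """
<employees>
  <employee id="121">
    <firstname>Jay</firstname>
    <lastname>Kumar</lastname>
    <job>Asst. manager</job>
    <doj>2002-12-12</doj>
  </employee>
</employees>
"""

# Parse once; the tree can then be navigated repeatedly
# without re-reading the text form.
root = ET.fromstring(EMPLOYEES_XML)

# <employee> is a child of the document element <employees>.
employee = root.find("employee")

# Child elements are kept in document order.
child_names = [child.tag for child in employee]

print(root.tag)                        # name of the document element
print(employee.get("id"))              # value of the id attribute
print(child_names)                     # names of the child elements
print(employee.findtext("lastname"))   # text value of one child element
```

This is an in-memory analogy only: it says nothing about how DB2 lays out its parsed format on disk.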

8.1.2 XML Attributes

Attributes are always part of an element and provide additional information about the element to which they belong. Below is an example of an element with two attributes, id and uid, with the values 100-233-03 and 45, respectively.

<product id="100-233-03" uid="45"/>

Note that in a given element, attribute names must be unique; that is, the same element cannot have two attributes with the same name. For example, the following element will result in a parsing error:

<product id="100-233-03" id="10023303"/>

Note that attribute values must always be enclosed in quotes.
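The two attribute rules above (unique names, quoted values) can be observed with any conforming XML parser. The sketch below uses Python's standard xml.etree.ElementTree module rather than DB2; try_parse is a hypothetical helper introduced only for this illustration.

```python
import xml.etree.ElementTree as ET


def try_parse(text):
    """Return the parsed element, or None if the text is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        print("parse error:", err)
        return None


# Two attributes with different names: well-formed.
ok = try_parse('<product id="100-233-03" uid="45"/>')

# The same attribute name used twice: rejected by the parser.
dup = try_parse('<product id="100-233-03" id="10023303"/>')

# An attribute value without quotes: also rejected.
unquoted = try_parse('<product uid=45/>')

print(ok.attrib)
```

Both failures are well-formedness errors, so they are reported while parsing, before any schema or DTD is consulted.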

In an XML Schema definition, attributes are defined after all the elements in the complex element have been defined. Given below is an example of a complex element with one attribute and two child elements.

<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="empid" type="xs:integer"/>
  </xs:complexType>
</xs:element>
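Because an XML Schema is itself an XML document, the ordering described above can be inspected with an ordinary XML parser. The following sketch reads the employee declaration with Python's standard xml.etree.ElementTree module and lists what it declares. The enclosing xs:schema element and its namespace declaration are assumptions added so that the fragment parses on its own, and the code only reads the schema; it does not validate any document against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The xs:schema wrapper is an assumption added for this sketch.
SCHEMA_FRAGMENT = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="empid" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(SCHEMA_FRAGMENT)
ns = {"xs": XS}
complex_type = schema.find("xs:element/xs:complexType", ns)

# Direct children of the complex type, in document order:
# the element sequence comes first, the attribute declaration after it.
order = [child.tag.split("}")[1] for child in complex_type]

child_elements = [
    e.get("name") for e in complex_type.findall("xs:sequence/xs:element", ns)
]
attributes = [a.get("name") for a in complex_type.findall("xs:attribute", ns)]

print(order)
print(child_elements)
print(attributes)
```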

8.1.3 Namespaces

XML namespaces provide a mechanism to qualify element and attribute names in order to avoid naming conflicts in XML documents. For example, if a health insurance company receives insurer information from different companies as XML documents, it is quite possible that two or more companies use the same element name to represent different things in different formats. Qualifying the elements with a namespace resolves the name-conflict issue. In XML, a name can be qualified by a namespace. A qualified name has two parts:

• A namespace Uniform Resource Identifier (URI)
• A local name

For example, http://www.acme.com/names is a namespace URI and customer is a local name. Often, the qualified name uses a namespace prefix instead of the URI. For example, the customer element belonging to the http://www.acme.com/names URI may also be written as acme:customer, where acme is the namespace prefix for http://www.acme.com/names. Namespace prefixes can be declared in the XQuery prolog as shown below.

declare namespace acme = "http://www.acme.com/names";

Also, the namespace prefix can be declared in an element constructor as shown below.

<book xmlns:acme="http://www.acme.com/names">

Note that the namespace declaration has the same scope as the element declaring it; that is, all child elements can refer to this namespace declaration.

The following namespace prefixes are predefined and should not be used as user-defined namespace prefixes: xml, xs, xsi, fn, xdt.

One can also declare a default namespace, that is, a namespace without a prefix, in the following ways:

declare default element namespace "http://www.acme.com/names";
(in the XQuery prolog)

<book xmlns="http://www.acme.com/names">
(in an element constructor)
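The effect of prefixed and default namespace declarations on element names can be seen in the sketch below, which uses Python's standard xml.etree.ElementTree module rather than XQuery. ElementTree writes a qualified name as {URI}localname, which makes the two parts of the name visible. The customer child element and its text are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

ACME = "http://www.acme.com/names"

# Prefix declared on <book>; only <acme:customer> uses it.
prefixed = ET.fromstring(
    f'<book xmlns:acme="{ACME}"><acme:customer>Jay</acme:customer></book>'
)

# Default namespace declared on <book>; it applies to <book> itself
# and to every unprefixed element inside it.
defaulted = ET.fromstring(
    f'<book xmlns="{ACME}"><customer>Jay</customer></book>'
)

# No namespace at all.
unqualified = ET.fromstring("<book><customer>Jay</customer></book>")

print(prefixed.tag, prefixed[0].tag)
print(defaulted.tag, defaulted[0].tag)
print(unqualified.tag, unqualified[0].tag)

# The prefix used in a query is independent of the prefix used in the
# document; only the URI has to match.
found = prefixed.find("n:customer", {"n": ACME})
print(found.text)
```

In the first document the customer element is in the acme namespace while book is not; in the second, both elements are in it, because the default declaration is in scope for the declaring element and all of its children.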

8.1.4 Document Type Definitions

A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.

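Given below is an example of a DTD. The name in the DOCTYPE declaration must match the document element, and the ID attribute is declared on the PRODUCT element, where the sample documents use it.

<!DOCTYPE PRODUCTS [
  <!ELEMENT PRODUCTS (PRODUCT+)>
  <!ELEMENT PRODUCT (NAME,PRICE,DESCRIPTION)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT PRICE (#PCDATA)>
  <!ELEMENT DESCRIPTION (#PCDATA)>
  <!ATTLIST PRODUCT ID CDATA #REQUIRED>
]>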

Both of the following XML documents are valid as per the DTD above.

<PRODUCTS>
  <PRODUCT ID="100-200-43">
    <NAME>Laptop</NAME>
    <PRICE>699.99</PRICE>
    <DESCRIPTION>This is a Laptop with 15 inch wide screen, 4 GB RAM, 120 GB HDD</DESCRIPTION>
  </PRODUCT>
</PRODUCTS>

<PRODUCTS>
  <PRODUCT ID="100-200-56">
    <NAME>Printer</NAME>
    <PRICE>69.99</PRICE>
    <DESCRIPTION>This is a line printer</DESCRIPTION>
  </PRODUCT>
  <PRODUCT ID="100-200-89">
    <NAME>Laptop</NAME>
    <PRICE>699.99</PRICE>
    <DESCRIPTION>This is a Laptop with 13 inch wide screen, 4 GB RAM, 360 GB HDD</DESCRIPTION>
  </PRODUCT>
</PRODUCTS>
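To show what the content model above requires, the following sketch hand-codes the same rules in Python. It is not a DTD validator (Python's standard xml.etree.ElementTree module does not validate against DTDs); conforms is a hypothetical function that simply mirrors the declarations, assuming the ID attribute is required on each PRODUCT element as in the sample documents.

```python
import xml.etree.ElementTree as ET

# Mirrors <!ELEMENT PRODUCT (NAME,PRICE,DESCRIPTION)>: exact order, once each.
REQUIRED_CHILDREN = ["NAME", "PRICE", "DESCRIPTION"]


def conforms(text):
    """Hand-written check mirroring the PRODUCTS DTD; not a DTD validator."""
    root = ET.fromstring(text)
    # <!ELEMENT PRODUCTS (PRODUCT+)>: one or more PRODUCT children.
    if root.tag != "PRODUCTS" or len(root) == 0:
        return False
    for product in root:
        # Assumed: the ID attribute is required on every PRODUCT.
        if product.tag != "PRODUCT" or "ID" not in product.attrib:
            return False
        if [child.tag for child in product] != REQUIRED_CHILDREN:
            return False
        # (#PCDATA): the leaf elements contain text only, no child elements.
        if any(len(child) > 0 for child in product):
            return False
    return True


one_product = """
<PRODUCTS>
  <PRODUCT ID="100-200-43">
    <NAME>Laptop</NAME>
    <PRICE>699.99</PRICE>
    <DESCRIPTION>This is a Laptop</DESCRIPTION>
  </PRODUCT>
</PRODUCTS>
"""

missing_price = """
<PRODUCTS>
  <PRODUCT ID="100-200-43">
    <NAME>Laptop</NAME>
    <DESCRIPTION>This is a Laptop</DESCRIPTION>
  </PRODUCT>
</PRODUCTS>
"""

print(conforms(one_product))
print(conforms(missing_price))
print(conforms("<PRODUCTS/>"))
```

A real validating parser would read these rules from the DTD itself instead of having them written out by hand.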


8.1.5 XML Schema

An XML schema defines the structure, content, and data types for an XML document. It can consist of one or more schema documents. A schema document can define a namespace.

Figure 8.2 - XML Schema: An example

An XML Schema is an XML-based alternative to a DTD. An XML Schema describes the structure of an XML document. The XML Schema language is also referred to as XML Schema Definition (XSD). XML Schemas are more powerful than DTDs because they provide better control over the XML instance document. Using XML Schemas, one can not only make use of basic data types such as integer, date, decimal, and dateTime, but also create user-defined types, complex element types, and so on. One can specify the length, maximum and minimum values, patterns of allowed string values, and enumerations as well. One can also specify the sequence in which elements occur in the XML document. Another advantage of XML Schema over DTDs is its support for XML namespaces. Additionally, XML Schema provides support for type inheritance. XML Schema became a W3C Recommendation in May 2001.

Here is an example of an XML schema consisting of three schema documents and two namespaces.


Figure 8.3 - Multiple namespaces in an XML Schema

In DB2, the use of XML Schemas is optional and is decided on a per-document basis. This means you can use the same XML column to store both kinds of XML documents: those associated with an XML schema and those without one. Therefore, there is no need for a fixed schema per XML column. XML document validation is per document (that is, per row). One can use zero, one, or many schemas per XML column. One can also choose to mix validated and non-validated documents in one column. DB2 also allows the user to detect or prevent the insertion of non-validated documents using the IS VALIDATED predicate, for example in a SELECT statement to detect such documents or in a CHECK constraint to prevent them.