Materializing XML Documents with XSLT

The XML document in Figure 12-1 shows both the document’s structure and its content. Nothing in the document, however, indicates how it is to be materialized. The designers of XML created a clean separation among structure, content, and format. The most popular way to materialize XML documents is to use Extensible Stylesheet Language Transformations (XSLT). XSLT is a powerful and robust transformation language. It can be used to materialize XML documents into HTML, and it can be used for many other purposes as well.

One common application of XSLT is to transform an XML document in one format into a second XML document in another format. A company can, for example, use XSLT to transform an XML order document in its own format into an equivalent XML order document in its customer’s format. We will be unable to discuss many of the features and functions of XSLT here. See www.w3.org for more information.

XSLT is a declarative transformation language. It is declarative because you create a set of rules that govern how the document is to be materialized instead of specifying a procedure for materializing document elements. It is transformational because it transforms the input document into another document.

Chapter 12 Database Processing with XML

Figure 12-3

An External DTD and an Example XML Document

(a) The DBP-e12-CustomerList.dtd DTD

(b) The DBP-e12-CustomerListDocument.xml XML Document with Two Customers

Figure 12-3(a) shows DBP-e12-CustomerList.dtd, which is a DTD for a document that has a list of customers, and Figure 12-3(b) shows DBP-e12-CustomerListDocument.xml, which is an XML document that is type-valid against that DTD. The DOCTYPE statement in Figure 12-3(b) points to a file that contains the DTD shown in Figure 12-3(a). The next statement in the XML document indicates the location of another document, called a stylesheet. Shown as DBP-e12-CustomerListStyleSheet.xsl in Figure 12-4, a stylesheet is used by XSLT to indicate how to transform the elements of the XML document into another format. Here, the stylesheet transforms the document into an HTML document that will be acceptable to a browser.
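To make the two prolog statements concrete, the following Python sketch parses a small stand-in document and reports which DTD file and which stylesheet it names. The element names and the two file names come from the discussion above; the customer data, the exact layout of the document, and the helper name describe_prolog are assumptions made for illustration, not a copy of Figure 12-3(b).

```python
from xml.dom import minidom

# Hypothetical stand-in for the customer list document. The element and file
# names follow the text; the data values and layout are invented.
CUSTOMER_LIST_XML = """<?xml version="1.0"?>
<!DOCTYPE CustomerList SYSTEM "DBP-e12-CustomerList.dtd">
<?xml-stylesheet type="text/xsl" href="DBP-e12-CustomerListStyleSheet.xsl"?>
<CustomerList>
  <Customer>
    <CustomerName>
      <FirstName>Jane</FirstName>
      <LastName>Example</LastName>
    </CustomerName>
  </Customer>
</CustomerList>
"""


def describe_prolog(xml_text):
    """Return (DTD file, stylesheet instruction data) found before the root element."""
    doc = minidom.parseString(xml_text)
    # The DOCTYPE statement is exposed as the document's doctype node.
    dtd_file = doc.doctype.systemId if doc.doctype else None
    stylesheet = None
    # The stylesheet reference is a processing instruction at document level.
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            stylesheet = node.data
    return dtd_file, stylesheet


dtd_file, stylesheet = describe_prolog(CUSTOMER_LIST_XML)
print(dtd_file)
print(stylesheet)
```

Note that the parser used here only records the DTD and stylesheet locations; it neither validates the document nor applies the stylesheet.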

The XSLT processor copies the elements of the stylesheet until it finds a command in the format {item, action}. When it finds such a command, it searches for an instance of the indicated item; when it finds one, it takes the indicated action. For example, when the XSLT processor encounters

<xsl:for-each select = "CustomerList/Customer">

it starts a search in the document for an element named CustomerList. When it finds such an element, it looks further within the CustomerList element for an element named Customer. If a match is found, it takes the actions indicated in the loop that ends with </xsl:for-each> (twelfth from the bottom of the stylesheet). Within the loop, styles are set for each element in the DBP-e12-CustomerListDocument.xml XML document.

Part 5 Database Access Standards

Figure 12-4

The DBP-e12-CustomerListStyleSheet.xsl XSL Stylesheet

The examples we have created are based on the View Ridge Gallery database that we have used in previous chapters. Here, we are creating an XML document that can be viewed using a Web browser to display the list of customers at the View Ridge Gallery. As shown in Figure 12-5(a), we have revised the View Ridge Gallery home page we created in Chapter 11 to include a link to the XML document. Clicking the link displays the Web page shown in Figure 12-5(b), which is the result of applying the stylesheet in Figure 12-4 to the document in Figure 12-3(b).

Figure 12-5

HTML Result from Application of Stylesheet

(a) The View Ridge Gallery Home Page (callout: use this link to display the XML document in a Web browser)

(b) The CustomerList Web Page as Displayed in Web Browser

XSLT processors are context oriented; each statement is evaluated in the context of matches that have already been made. Thus, the following statement:

<xsl:value-of select = "CustomerName/LastName"/>

operates in the context of the CustomerList/Customer match that was made above. There is no need to code

<xsl:value-of select = "CustomerList/Customer/CustomerName/LastName"/>

because the context has already been set to CustomerList/Customer. In fact, if the select were coded in this second way, nothing would be found. Similarly,

<xsl:value-of select = "LastName"/>

results in no match, because LastName occurs only in the context CustomerList/Customer/CustomerName and not in the context CustomerList/Customer.
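The same context rule can be observed outside XSLT. The following Python sketch uses the standard library's ElementTree, whose find method evaluates a path relative to the element it is called on, much as XSLT evaluates a select relative to the current match. The sample data is hypothetical; only the element names come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the element names come from the text.
root = ET.fromstring("""
<CustomerList>
  <Customer>
    <CustomerName>
      <FirstName>Jane</FirstName>
      <LastName>Example</LastName>
    </CustomerName>
  </Customer>
</CustomerList>
""")

# The parsed root is the CustomerList element, so finding "Customer" here
# plays the role of the CustomerList/Customer match made by xsl:for-each.
customer = root.find("Customer")

# A path relative to the Customer context finds the element.
relative = customer.find("CustomerName/LastName")

# Repeating the full path from inside the Customer context finds nothing,
# because Customer has no child named CustomerList.
full_path = customer.find("CustomerList/Customer/CustomerName/LastName")

# A bare LastName also finds nothing: LastName is a child of CustomerName,
# not a direct child of Customer.
too_short = customer.find("LastName")

print(relative.text)
print(full_path)
print(too_short)
```

Only the first lookup returns an element; the other two return None, which mirrors the two failing select expressions discussed above.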

The nature of XSLT processing is: “When you find one of these, do this.” Thus, the stylesheet in Figure 12-4 says, for each Customer that you find under the tag CustomerList, do the following: output an HTML <div> . . . </div> section and then some HTML with the value that you find in the document for CustomerName/LastName. Then, output more HTML and the value that you find for CustomerName/FirstName. Then, for each Address/Street you find, output some HTML along with the value of the Address/Street you just found, and so forth.
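The sequence of actions just described can be imitated in a few lines of Python. This is a procedural sketch, not XSLT and not the stylesheet in Figure 12-4: the function name customers_to_html, the sample data, and the specific HTML tags it writes are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample data; only the element names come from the text.
SAMPLE = """
<CustomerList>
  <Customer>
    <CustomerName><FirstName>Jane</FirstName><LastName>Example</LastName></CustomerName>
    <Address><Street>1 Sample Street</Street></Address>
  </Customer>
  <Customer>
    <CustomerName><FirstName>John</FirstName><LastName>Placeholder</LastName></CustomerName>
    <Address><Street>2 Sample Avenue</Street></Address>
  </Customer>
</CustomerList>
"""


def customers_to_html(xml_text):
    """Imitate the for-each logic: one <div> per Customer, last name first."""
    root = ET.fromstring(xml_text)  # root is the CustomerList element
    parts = ["<html><body>"]
    for customer in root.findall("Customer"):
        parts.append("<div>")
        # Paths are relative to the current Customer, as in the stylesheet.
        last = customer.findtext("CustomerName/LastName", default="")
        first = customer.findtext("CustomerName/FirstName", default="")
        parts.append(f"<p>{escape(last)}, {escape(first)}</p>")
        # Nested loop: one line of output per Address/Street found.
        for street in customer.findall("Address/Street"):
            parts.append(f"<p>{escape(street.text or '')}</p>")
        parts.append("</div>")
    parts.append("</body></html>")
    return "\n".join(parts)


html_page = customers_to_html(SAMPLE)
print(html_page)
```

The important difference is that this code spells out a procedure, whereas an XSLT stylesheet states rules and leaves the traversal to the XSLT processor.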

XSL can output anything. Instead of outputting HTML, it could be writing Russian or Chinese or algebraic equations. XSL is simply a transformation facility for structured documents such as XML documents.

This context orientation explains the need for the following statement (in the center of the stylesheet):

<xsl:value-of select = "node()"/>

The context at the location of this statement has been set to CustomerList/Customer/Address/Street. Hence, the current node is a Street element, and this expression indicates that the value of that node is to be produced.

Observe, too, that a small transformation has been made by the stylesheet. The original document has FirstName followed by LastName, but the output stream has LastName followed by FirstName.

The XML document in Figure 12-3(b) is transformed into HTML using the XSL stylesheet in Figure 12-4. Figure 12-5(a) shows the VRG home page, which now has a link to display the XML document. When this transformed document is input to a browser, the browser will materialize it, as shown in Figure 12-5(b).

Browsers have built-in XSLT processors. You need only supply the document to the browser; it will locate the stylesheet and apply it to the document for you. The results will be like those shown in Figure 12-5(b).

XML Schema

DTDs were the XML community’s first attempt at developing a document structure specification language. DTDs work, but they have some limitations, and, embarrassingly, DTD documents are not XML documents. To correct these problems, the W3C defined another specification language called XML Schema. Today, XML Schema is the preferred method for defining document structure.

XML Schemas are XML documents. This means that you use the same language to define an XML Schema as you would use to define any other XML document. It also means that you can validate an XML Schema document against its schema, just as you would any other XML document.
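Because an XML Schema is an XML document, any ordinary XML parser can read it. The Python sketch below parses a small schema fragment with the same ElementTree calls one would use on a customer document and lists the elements it declares. The fragment is hypothetical and deliberately simplified (Customer is declared as a plain string); it is not the schema for any document in this chapter.

```python
import xml.etree.ElementTree as ET

# Namespace that identifies XML Schema vocabulary.
XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical, simplified schema fragment used only for illustration.
SCHEMA_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="CustomerList">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Customer" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# The schema is parsed exactly like any other XML document.
schema_root = ET.fromstring(SCHEMA_TEXT)

# Collect the name of every xs:element declaration, in document order.
declared = [el.get("name") for el in schema_root.iter(f"{{{XS}}}element")]

print(schema_root.tag)
print(declared)
```

This only reads the schema as data; checking a document against it would require a schema validator, which the Python standard library does not include.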

If you are following this discussion, then you realize that there is a chicken-and-the-egg problem here. If XML Schema documents are themselves XML documents, what document is used to validate them? What is the schema of all of the schemas? There is such a document; the mother of all schemas is located at www.w3.org. All XML Schema documents are validated against this document.

XML Schema is a broad and complex topic. Dozens of sizable books have been written on XML Schema alone. Clearly, we will not be able to discuss even the major topics of XML Schema in this chapter. Instead, we will focus on a few basic terms and concepts and show how those terms and concepts are used with database processing. Given this introduction, you will then be able to learn more on your own.

XML Schema validation requires thinking at two meta levels. To understand why, recall that metadata is data about data. The statement “CUSTOMER contains column CustomerLastName Char(25)” is metadata. Extending this idea, the statement “SQL has a data type Char(n) for defining character data of length n” is data about metadata, or meta-metadata.

XML has the same meta levels. An XML document has a structure that is defined by an XML Schema document. The XML Schema document contains metadata, because it is data about the structure of other XML documents. But an XML Schema document has its own structure that is defined by another XML Schema. That XML Schema document is data about metadata, or meta-metadata.

The XML case is elegant. You can write a program to validate an XML document (but don’t—use one of the hundreds that already exist). Once you have such a program, you can validate any XML document against its XML Schema document. The process is exactly the same, regardless of whether you are validating an XML document, an XML Schema document, or a document at any other level.
