23.4 Querying and Transformation
Given the increasing number of applications that use XML to exchange, mediate, and store data, tools for effective management of XML data are becoming increasingly important. In particular, tools for querying and transformation of XML data are essential to extract information from large bodies of XML data, and to convert data between different representations (schemas) in XML. Just as the output of a relational query is a relation, the output of an XML query can be an XML document. As a result, querying and transformation can be combined into a single tool.
In this section, we describe the XPath and XQuery languages:
• XPath is a language for path expressions and is actually a building block for XQuery.
• XQuery is the standard language for querying XML data. It is modeled after SQL but is significantly different, since it has to deal with nested XML data. XQuery also incorporates XPath expressions.
The XSLT language is another language designed for transforming XML. However, it is used primarily in document-formatting applications, rather than in data-management applications, so we do not discuss it in this book.
The tools section at the end of this chapter provides references to software that can be used to execute queries written in XPath and XQuery.
23.4.1 Tree Model of XML
A tree model of XML data is used in all these languages. An XML document is modeled as a tree, with nodes corresponding to elements and attributes. Element nodes can have child nodes, which can be subelements or attributes of the element. Correspondingly, each node (whether attribute or element), other than the root element, has a parent node, which is an element. The order of elements and attributes in the XML document is modeled by the ordering of children of nodes of the tree. The terms parent, child, ancestor, descendant, and siblings are used in the tree model of XML data.
The text content of an element can be modeled as a text-node child of the element. Elements containing text broken up by intervening subelements can have multiple text-node children. For instance, an element containing “this is a <bold> wonderful </bold> book” would have a subelement child corresponding to the element bold and two text-node children corresponding to “this is a” and “book.” Since such structures are not commonly used in data representation, we shall assume that elements do not contain both text and subelements.
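The tree model can be inspected with any DOM-style parser. The following is a minimal sketch using Python's standard xml.dom.minidom module; the choice of Python and the enclosing element name title are assumptions made for illustration and are not part of the text. It shows the mixed-content example above yielding two text-node children and one subelement child.

```python
from xml.dom import minidom

# The enclosing element name "title" is hypothetical; the content is the
# mixed-content example from the text.
doc = minidom.parseString("<title>this is a <bold>wonderful</bold> book</title>")
title = doc.documentElement

# Children are kept in document order: text node, element node, text node.
child_names = [child.nodeName for child in title.childNodes]
text_pieces = [child.data for child in title.childNodes
               if child.nodeType == child.TEXT_NODE]

# Every node other than the root has a parent node, which is an element.
bold = title.childNodes[1]

print(child_names)
print(text_pieces)
print(bold.parentNode.tagName)
```

In minidom, text nodes are reported under the name #text, so the two text-node children are easy to tell apart from the bold subelement.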
23.4.2 XPath
XPath addresses parts of an XML document by means of path expressions. The language can be viewed as an extension of the simple path expressions in object-oriented and object-relational databases (see Section 22.6). The current version of the XPath standard is XPath 2.0, and our description is based on this version.
A path expression in XPath is a sequence of location steps separated by “/” (instead of the “.” operator that separates location steps in SQL). The result of a path expression is a set of nodes. For instance, on the document in Figure 23.11, the XPath expression:
/university-3/instructor/name
returns these elements:
<name>Srinivasan</name>
<name>Brandt</name>
The expression:
/university-3/instructor/name/text()
returns the same names, but without the enclosing tags. Path expressions are evaluated from left to right. As in a directory hierarchy, the initial “/” indicates the root of the document. Note that this is an abstract root “above” <university-3>, which is the document tag.
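As a rough illustration of the two expressions above, the following sketch uses Python's standard xml.etree.ElementTree module. The sample document is a hypothetical stand-in for Figure 23.11 (only the two instructor names come from the text). ElementTree supports only a subset of XPath: paths are written relative to the document element rather than from the abstract root, and there is no text() step, so the text is read from each element instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document of Figure 23.11; only the two
# instructor names are taken from the text.
SAMPLE = """
<university-3>
  <instructor><name>Srinivasan</name></instructor>
  <instructor><name>Brandt</name></instructor>
</university-3>
"""

root = ET.fromstring(SAMPLE)  # root is the <university-3> element

# Counterpart of /university-3/instructor/name, written relative to root.
name_elements = root.findall("instructor/name")
serialized = [ET.tostring(e, encoding="unicode").strip() for e in name_elements]

# Counterpart of /university-3/instructor/name/text().
name_texts = [e.text for e in name_elements]

print(serialized)
print(name_texts)
```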
As a path expression is evaluated, the result of the path at any point consists of an ordered set of nodes from the document. Initially, the “current” set of elements contains only one node, the abstract root. When the next step in a path expression is an element name, such as instructor, the result of the step consists of the nodes corresponding to elements of the specified name that are children of elements in the current element set. These nodes then become the current element set for the next step of the path expression evaluation. Thus, the expression:
/university-3
returns a single node corresponding to the <university-3> tag, while:
/university-3/instructor
returns the two nodes corresponding to the instructor elements that are children of the university-3 node. The result of a path expression is then the set of nodes after the last step of path expression evaluation. The nodes returned by each step appear in the same order as their appearance in the document.
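The step-by-step evaluation just described can be sketched in a few lines of Python. This is an illustrative toy that handles only paths made of plain element names, not how any real XPath engine is implemented; the function name evaluate, the wrapper element standing in for the abstract root, and the sample document are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the department element is included only to
# show that children with a non-matching name are skipped.
SAMPLE = """
<university-3>
  <department><dept_name>Comp. Sci.</dept_name></department>
  <instructor><name>Srinivasan</name></instructor>
  <instructor><name>Brandt</name></instructor>
</university-3>
"""

def evaluate(path, document_element):
    """Evaluate a path of plain element names, one location step at a time."""
    # Model the abstract root as a wrapper whose only child is the
    # document element.
    abstract_root = ET.Element("abstract-root")
    abstract_root.append(document_element)

    current = [abstract_root]  # initially, only the abstract root
    for step in path.strip("/").split("/"):
        # Keep the children of the current nodes whose name matches the step;
        # iterating in order preserves document order.
        current = [child for node in current for child in node
                   if child.tag == step]
    return current

root = ET.fromstring(SAMPLE)
print([n.tag for n in evaluate("/university-3", root)])
print(len(evaluate("/university-3/instructor", root)))
print([n.text for n in evaluate("/university-3/instructor/name", root)])
```

A step that matches no children simply leaves the current set empty, so every later step also returns an empty result.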
Since multiple children can have the same name, the number of nodes in the node set can increase or decrease with each step. Attribute values may also be accessed, using the “@” symbol. For instance, /university-3/course/@course_id returns a set of all values of course_id attributes of course elements. By default, IDREF links are not followed; we shall see how to deal with IDREFs later.
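Attribute access can be approximated in Python's standard xml.etree.ElementTree module, which does not support an attribute step such as @course_id at the end of a path. In this sketch the course elements are selected by path and the attribute is then read from each one; the sample course data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the course elements of Figure 23.11.
SAMPLE = """
<university-3>
  <course course_id="CS-101"><credits>4</credits></course>
  <course course_id="BIO-301"><credits>4</credits></course>
</university-3>
"""

root = ET.fromstring(SAMPLE)

# Rough counterpart of /university-3/course/@course_id: select the course
# elements, then read the course_id attribute of each.
course_ids = [course.get("course_id") for course in root.findall("course")]

print(course_ids)
```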
XPath supports a number of other features:
• Selection predicates may follow any step in a path, and are contained in square brackets. For example, /university-3/course[credits >= 4] returns course elements with a credits value greater than or equal to 4, while /university-3/course[credits >= 4]/@course_id returns the course identifiers of those courses.
We can test the existence of a subelement by listing it without any comparison operation; for instance, if we removed just “>= 4” from the above, the expression would return course identifiers of all courses that have a credits subelement, regardless of its value.
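The two kinds of predicate can be sketched with Python's standard xml.etree.ElementTree module. Its XPath subset supports the existence test course[credits] but not comparison predicates such as credits >= 4, so the comparison is applied in Python here. The sample data, including the course without a credits subelement, is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the third course deliberately has no credits
# subelement so that the existence test has something to exclude.
SAMPLE = """
<university-3>
  <course course_id="CS-101"><credits>4</credits></course>
  <course course_id="BIO-301"><credits>3</credits></course>
  <course course_id="HIS-351"/>
</university-3>
"""

root = ET.fromstring(SAMPLE)

# Existence test, as in /university-3/course[credits]/@course_id.
with_credits = root.findall("course[credits]")
ids_with_credits = [c.get("course_id") for c in with_credits]

# Rough counterpart of /university-3/course[credits >= 4]/@course_id,
# with the numeric comparison done in Python.
ids_at_least_4 = [c.get("course_id") for c in with_credits
                  if int(c.findtext("credits")) >= 4]

print(ids_with_credits)
print(ids_at_least_4)
```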