
AN XML ALGEBRA FOR ONLINE PROCESSING OF XML DOCUMENTS

By: Handoko and Janusz R. Getta

Outline
v Introduction
v Related Work
§ Existing XML Data Models & Algebras
v XML Structures
v XML Algebra
v Conclusion & Future Works

Introduction - Motivation
v Since XML documents have been accepted as a standard in information systems, online processing of semistructured data has become more important.
v Online processing applies online algorithms, which process data piece by piece.
v The performance of semistructured data processing is affected by several aspects:
§ Semistructured data has a more complex structure than columns and rows.
§ Existing data models require the data to be complete before it can be processed, while online processing needs to process data piece by piece.
§ In some existing data models, semistructured data is treated as relational-type data:
• XML streams are evaluated with a tuple-based approach.
• Unnest and nest operations have to be employed.
• This requires more CPU and memory resources.

An Example of Online Processing - Online Integration
[Figure: online integration. Labels: DATA SOURCES, RECENT DATA, PREVIOUS RESULT, XML Data Model, APPLICATIONS]
Includes:
1. Fragmentation Handling
2. XML Algebra
3. Execution Plan Algorithms
4. Scheduling Algorithms

Related Work - XML Data Model & Algebra
v XML Algebra (XAL) [4]
§ XML is modeled as a rooted, connected, directed graph, cyclic or acyclic.
§ Vertices represent elements, edges represent simple values.
§ Has three groups of operators:
• Extraction operators
– Projection, selection, distinct, join, sort, product
• Meta operators
– Map, Kleene star
• Construction operators
– Create vertex, create edge, copy

Literature Review - XML Data Model & Algebra
v TAX (Tree Algebra for XML) [5]
§ Represents the document as an ordered labeled tree. Every XML element is represented as a node which has:
• tag attribute: a single-valued attribute which indicates the type of element;
• content attribute: represents an atomic value, which can be of any atomic type;
• pedigree attributes: carry information about the element's predecessor, which is useful for manipulation and comparison.
§ TAX proposes the idea of a tree pattern (TP), which represents a very different concept from classical relational algebra.
§ TAX provides Selection, Projection, Product, Grouping, Aggregation, Renaming, Reordering, Copy and Paste, Value Updates, Node Deletion and Node Insertion operations, and some set operators.
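A minimal sketch of the TAX node described above, assuming a Python dataclass. The field names follow the three attributes listed on the slide. The `children` list, the `Atomic` alias, the string encoding used for `pedigree`, and the sample book are hypothetical and not taken from TAX itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Assumption: "atomic types" are approximated by these Python scalars.
Atomic = Union[str, int, float, bool]


@dataclass
class Node:
    tag: str                          # single-valued: the type of the element
    content: Optional[Atomic] = None  # atomic value, if the element has one
    pedigree: Optional[str] = None    # hypothetical encoding of origin info
    children: List["Node"] = field(default_factory=list)  # ordered subtrees


def tags_in_document_order(node: Node) -> List[str]:
    """Return the tags of the tree rooted at `node` in pre-order."""
    tags = [node.tag]
    for child in node.children:
        tags.extend(tags_in_document_order(child))
    return tags


# Hypothetical sample document: one book with a title and two authors.
book = Node("book", pedigree="doc1:0", children=[
    Node("title", content="Basic XML", pedigree="doc1:1"),
    Node("author", content="A", pedigree="doc1:2"),
    Node("author", content="B", pedigree="doc1:3"),
])
print(tags_in_document_order(book))
```

Keeping `children` as a list preserves sibling order, which is what makes the tree "ordered" in the sense used on the slide.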

Literature Review - XML Data Model & Algebra
v XAnswer
§ Algebra on a relational-like data structure.
§ Uses a data structure called Envelope.
§ Header $h_e$ is an unordered set of attributes (A), body $b_e$ is a set of pairs (A, v) where v is a value, and $r_e$ represents the result.
§ XAnswer provides unary operators (function execution, selection, projection, sort, index, nest, unnest, and duplicate) and binary operators such as union, cross product, and left outer join.
§ The union operator in XAnswer does not remove duplicates.
§ XAnswer also provides a left-outer-join operation instead of expressing it using selection, cross product and union operators.

XML Data Model – Extended Tree Grammar
v A structure of an XML document is a pair (ETG, EIG), where the ETG (Extended Tree Grammar) represents the structure, while the EIG (Extended Instance Grammar) provides the rules for creating the instance XML documents.

Example of ETG and its sentence

XML Data Model – Extended Instance Grammar

Applying EIG to ETG sentence
v When we get any terminal symbol library(...), we apply the production rule which matches the terminal symbol: library(x)→<library>x</library>
v x in step 1 consists of the nodes books and authors at the same level, so we need to apply the corresponding production rules books(x)→<books>x</books> and authors(x)→<authors>x</authors>. Our instance XML document will become:
<library><books>x</books><authors>x</authors></library>
v On reaching a terminal symbol which has no inner structure (a terminal which is not followed by an opening curly bracket), we translate it using the related production rule. For example, title→#PCDATA will be translated into: <title>Basic XML</title>
v A terminal symbol with no inner structure that is followed by a square bracket ([), for example book_author[authref], will be translated using the production rules defined for it.
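The translation steps above can be sketched in Python as a recursive rewrite. This is only an illustration under stated assumptions: every production rule is taken to wrap its translated content in an element named after the terminal symbol, a sentence is encoded as a hypothetical `(tag, payload)` tuple, and the square-bracket (attribute) form is not handled. The function name `apply_rules` and the sample sentence are not from the slides.

```python
from xml.sax.saxutils import escape


def apply_rules(term):
    """Translate one term of a sentence into XML text.

    `term` is a (tag, payload) tuple, a hypothetical encoding: payload is
    either a list of child terms (inner structure) or a string (#PCDATA).
    """
    tag, payload = term
    if isinstance(payload, str):
        # Terminal with no inner structure: emit its character data.
        inner = escape(payload)
    else:
        # Terminal with inner structure: translate every child in order.
        inner = "".join(apply_rules(child) for child in payload)
    return f"<{tag}>{inner}</{tag}>"


# Hypothetical sentence: a library with one book and no authors yet.
sentence = ("library", [
    ("books", [("book", [("title", "Basic XML")])]),
    ("authors", []),
])
xml_text = apply_rules(sentence)
print(xml_text)
```

The recursion mirrors the order of the steps: the outermost terminal is rewritten first and its placeholder x is then replaced by the translation of the nodes nested inside it.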

XML Data Model – Indexed ETG
v Then we extend ETG to accommodate recursive structures as:

XML Algebra
v XML algebra in this system consists of:
§ Basic operations
• Restructuring operation (π)
• Filtering operation (σ)
• Cross product operation (×)
§ Set operations (∪, ∩, and -)
§ Derived operations (join, semijoin, and antijoin)

XML Algebra - Sub Grammar
v Let $G_i$ and $G_j$ be ETGs. Let $G_j$ include the following production rules: $Y \to t_y\{r_y(X)\}$, $X \to t_x\{r_x(Z)\}$, $Z \to t_z\{r_z\}$, where $r_y(X)$ is a regular expression that includes a non-terminal symbol $X$ and $r_x(Z)$ is a regular expression that includes a non-terminal symbol $Z$.
v We say that $G_i$ is a sub-grammar of $G_j$ when $G_i$ can be obtained from $G_j$ by the application of the following transformation rules:

XML Algebra – Sub grammar
v Transformation of document structure:
a) Removal of a production rule
b) Removal of a sub-tree
c) Extraction of a sub-tree

XML Algebra - Restructuring
v Restructuring operation can be defined as:

Algorithm for Restructuring

XML Algebra - Filtering
v Filtering operation can be defined as:
Definition 8. Filtering is a unary operator denoted as $\sigma_\phi(D) = \{R_1, R_2, \ldots, R_D : R_i \in X\}$, where $D$ is a set of documents, $\phi$ is a triple $\langle P, min, max \rangle$, $P$ is a valid path, $min$ is the minimum occurrence of $P$ (default 1), and $max$ is the maximum occurrence of $P$ (default -1).
v To differentiate between the preserve and remove semantics of the operator, we introduce the symbols $\sigma^+$ and $\sigma^-$.

Algorithm for Filtering

XML Algebra – Cross Product
v Cross product is a binary operator which creates all possible pairs of the documents from two sets.
Definition 9. Cross product is defined as $R \times_\rho S = \{\rho\{r\ s\} : r \in R \wedge s \in S\}$, where $R$ and $S$ are sets of documents and $\rho$ is a valid element name which will be the parent node of every combination of documents from $R$ and $S$.

Tree Pattern representation
N = {BOOK, YEAR, AUTHOR}
T = {book, year, author}
A = {}
S = {BOOK}
P = {BOOK→book{YEAR AUTHOR+},
YEAR→year,
AUTHOR→author}

Algorithm for Cross Product
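A minimal sketch of the cross product from Definition 9, written with Python's `xml.etree.ElementTree`; it is an illustration and not the authors' algorithm. It assumes that documents are `Element` objects, that a set of documents is held in a list, and that inputs are deep-copied so the original documents stay untouched. The function name `cross_product`, the parent name `pair`, and the sample documents are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from itertools import product


def cross_product(R, S, rho):
    """Pair every document r in R with every document s in S.

    Each pair becomes a new document whose root element is named `rho`
    and whose children are copies of r and s, in that order.
    """
    result = []
    for r, s in product(R, S):
        parent = ET.Element(rho)
        parent.append(copy.deepcopy(r))  # copy, so r can be reused in other pairs
        parent.append(copy.deepcopy(s))
        result.append(parent)
    return result


# Hypothetical input sets of documents.
R = [ET.fromstring("<book><year>2001</year></book>"),
     ET.fromstring("<book><year>2002</year></book>")]
S = [ET.fromstring("<author>A</author>")]

pairs = cross_product(R, S, "pair")
first = ET.tostring(pairs[0], encoding="unicode")
print(len(pairs), first)
```

The result has one document per combination, so its size is the product of the sizes of the two inputs.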

Query addition
v Rather difficult to express queries such as:
§ retrieve all books which have at most 1 author
§ get all books which have a minimum of 2 authors and a maximum of 3 authors
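A minimal sketch of how the triple of path, min and max from the Filtering definition could express these two queries, using Python's `xml.etree.ElementTree`. It assumes that a max of -1 means no upper bound, that the preserve semantics keeps the documents whose occurrence count is within the bounds, and that the remove semantics keeps the others; the slides do not spell these points out. `filter_docs`, `years` and the sample books are hypothetical.

```python
import xml.etree.ElementTree as ET


def filter_docs(docs, path, min_occ=1, max_occ=-1, preserve=True):
    """Filter documents by how often `path` occurs in each of them.

    Assumption: max_occ == -1 means "no upper bound". With preserve=True
    the matching documents are kept; with preserve=False they are removed.
    """
    kept = []
    for doc in docs:
        n = len(doc.findall(path))  # occurrences of the path in this document
        in_bounds = n >= min_occ and (max_occ == -1 or n <= max_occ)
        if in_bounds == preserve:
            kept.append(doc)
    return kept


def years(docs):
    """Helper for display: the year of each book."""
    return [doc.findtext("year") for doc in docs]


# Hypothetical books with 1, 2 and 4 authors.
books = [ET.fromstring(text) for text in (
    "<book><year>2001</year><author>A</author></book>",
    "<book><year>2002</year><author>A</author><author>B</author></book>",
    "<book><year>2003</year><author>A</author><author>B</author>"
    "<author>C</author><author>D</author></book>",
)]

at_most_one = filter_docs(books, "author", 0, 1)
two_to_three = filter_docs(books, "author", 2, 3)
not_two_to_three = filter_docs(books, "author", 2, 3, preserve=False)
print(years(at_most_one), years(two_to_three), years(not_two_to_three))
```

Both queries become a single filter call because the occurrence bounds are parameters of the operator rather than something that has to be encoded in the pattern.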

References
[1] C. Beeri and Y. Tzaban. SAL: An algebra for semistructured data and XML. In Informal Proc. of Workshop on The Web and Databases, ACM SIGMOD, pages 37-42. ACM Press, 1999.
[2] S. Bose, L. Fegaras, D. Levine, and V. Chaluvadi. A query algebra for fragmented XML stream data. In Proceedings of the 9th International Conference on Data Base Programming Languages (DBPL), pages 275-277, Potsdam, Germany, September 6-8, 2003.
[3] G. Buratti. A Model and an Algebra for Semi-Structured and Full-Text Queries. PhD thesis, Informatica, Universita di Bologna, Padova, 2007.
[4] F. Frasincar, G.-J. Houben, and C. Pau. XAL: an algebra for XML query optimization. Aust. Comput. Sci. Commun., 24(2):49-56, January 2002.
[5] H. V. Jagadish, L. V. S. Lakshmanan, D. Srivastava, and K. Thompson. TAX: A tree algebra for XML. In Proc. DBPL Conf., pages 149-164, 2001.
[6] M. Lukichev, B. Novikov, and P. Mehra. An XML-algebra for efficient set-at-a-time execution. ComSIS, 9(1):64-80, January 2012.
[7] M. Murata, D. Lee, M. Mani, and K. Kawaguchi. Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Technol., 5(4):660-704, Nov. 2005.

XML Algebra – Online Processing
v The arguments of the XML algebra are sets of documents, and we assume that every increment/decrement of an argument is an XML document.
v For online integration, consider an integration as a UNION operation. We should be able to compute over the increment/decrement data ($\delta A_i$) and integrate the result with the previous one:
$e(A_1, \ldots, A_i \oplus \delta A_i, \ldots, A_D) = e(A_1, \ldots, A_i, \ldots, A_D) \oplus f(A_1, \ldots, \delta A_i, \ldots, A_D)$
v Example, applying $\cup$ over $\oplus$:
$e(R_1 \oplus \delta_1) \cup R_2 = (R_1 \cup R_2) \oplus \delta_1$
v $f$ is a function that needs to be defined so that all operators over increment/decrement data follow this form.
v We found that $f$ is a function of either $\oplus$ or $-$.
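A minimal sketch of the incremental form above for the UNION case, assuming that documents are represented as hashable strings and that only increments arrive (decrements are not covered). Under these assumptions f reduces to the increment itself, so the previous result is extended instead of being recomputed. The class name `IncrementalUnion` and its methods are hypothetical.

```python
class IncrementalUnion:
    """Maintain e(A_1, ..., A_D) = A_1 ∪ ... ∪ A_D under increments."""

    def __init__(self, *arguments):
        self.arguments = [set(a) for a in arguments]
        # Previous result: computed once from the full arguments.
        self.result = set().union(*self.arguments)

    def add(self, i, delta):
        """Apply an increment delta to argument i and update the result.

        For union, f(A_1, ..., delta, ..., A_D) is delta itself, so only
        the new documents are touched.
        """
        delta = set(delta)
        self.arguments[i] |= delta
        self.result |= delta
        return self.result

    def recompute(self):
        """Reference answer: evaluate the union from scratch."""
        return set().union(*self.arguments)


# Hypothetical documents, represented by their serialized text.
integration = IncrementalUnion({"<a/>"}, {"<b/>"})
updated = integration.add(0, {"<c/>"})
print(sorted(updated))
```

Comparing `add` against `recompute` is a simple way to check that an incremental operator agrees with its full re-evaluation.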

Conclusion
v The proposed XML algebra is consistent with relational algebra.
v It meets the needs of online processing:
§ It works on tree structures, which avoids nest and unnest operations.
§ It is possible to find a function to process increment data.

Research Progress

XML Data Model – Regular Tree Grammar
v A structure of an XML document can be formally defined by an RTG (Regular Tree Grammar).

XML Data Model – Instance Grammar
v We introduce a grammar for creating instance XML documents, called an Instance Grammar (IG). An IG is a context-sensitive grammar which transforms the sentences of an RTG into instances of XML documents.
