Processing XML: XPath, XQuery
Ramakrishnan & Gehrke, Chapter 24 / 27
320302 Databases & WebServices (P. Baumann)
Why are we DB'ers interested?
It's data, stupid. That's us.
Database issues:
• How are we going to model XML?
  • Trees, graphs
• How are we going to query XML?
  • XQuery
• How are we going to store XML?
  • in a relational database? object-oriented? native?
• How are we going to process XML efficiently?
  • many interesting research questions!
320302 Databases & WebApplications (P. Baumann)
XML Revisited
From a data modelling viewpoint, what does XML offer?
Entities (ER!)
Attributes
• Single-valued, atomic
Relationships? Yes, but:
• Single-root trees only
• Unordered, no role names
• General graphs through id/idrefs, syntax only
Roadmap
• XPath
• XQuery
Path Expressions: XPath
Basic concept: path = sequence of location steps
• Axis: tree relationship between the current node and the nodes selected by the location step
  • parent, child, self, descendant-or-self, attribute, …
• Node test: node type and expanded-name of the nodes selected by the location step
• 0..* predicates: further refinement
General location step syntax: axisname::nodetest[predicate]
Pattern Expressions
Pattern expressions identify nodes in a document
Path through the XML document
• .../node1/node2/...
A pattern "selects" the elements that match the path; the result is a (sub)tree
• "all price elements of all cd elements of the catalog element": /catalog/cd/price
Sample document:
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <catalog>
    <cd country="USA">
      <title>Empire Burlesque</title>
      <artist>Bob Dylan</artist>
      <price>10.90</price>
    </cd>
    <cd country="UK">
      <title>Hide your heart</title>
      <artist>Bonnie Tyler</artist>
      <price>9.90</price>
    </cd>
    <cd country="USA">
      <title>Greatest Hits</title>
      <artist>Dolly Parton</artist>
      <price>9.90</price>
    </cd>
  </catalog>
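The path /catalog/cd/price can be tried out with a minimal sketch in Python's standard xml.etree.ElementTree module. Assumptions: ElementTree implements only a limited subset of XPath and evaluates paths relative to an element, so the absolute path is written here relative to the catalog root element; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

# /catalog/cd/price, written relative to the catalog element (ElementTree limitation)
prices = [p.text for p in root.findall("cd/price")]
print(prices)  # ['10.90', '9.90', '9.90']
```

The matches come back in document order, one price element per cd.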
Paths
Sample document: the catalog from "Pattern Expressions"
Absolute vs. relative vs. fitting:
• path starts with slash ( / ): absolute path
• path starts with double slash ( // ): all fitting elements, even if at different levels in the tree
• otherwise: path relative to the current position
Relative addressing via axis:
• node set relative to the current node
• axes include parent, child, self, ancestor, descendant, attribute, …
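How the result of a path depends on the context node can be sketched with Python's xml.etree.ElementTree. This is an illustration under assumptions, not a full XPath engine: ElementTree supports only relative paths on elements, so ".//price" stands in for //price, and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)   # context node: <catalog>
first_cd = root.find("cd")      # context node: the first <cd>

anywhere = root.findall(".//price")      # like //price: any depth below catalog
from_catalog = root.findall("price")     # relative: price children of catalog
from_cd = first_cd.findall("price")      # relative: price children of the first cd

print(len(anywhere), len(from_catalog), len(from_cd))  # 3 0 1
```

The same relative path "price" selects nothing from the catalog element but one node from a cd element, because a relative path is always evaluated against the current node.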
Examples
[Figure]
More Examples
Tree:
  <1>
    <2>
      <3/>
      <4/>
    </2>
    <5/>
  </1>
• self({2}) = {2}
• child({1}) = {2,5}
• parent({3}) = {2}
• descendant({1}) = {2,3,4,5}
• descendant-or-self({1}) = {1,2,3,4,5}
• ancestor({4}) = {1,2}
• ancestor-or-self({4}) = {1,2,4}
• following({3}) = {4,5}
• preceding({4}) = {3}
• following-sibling({4}) = {}
• preceding-sibling({5}) = {2}
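The axis results above can be reproduced with a small hand-written sketch in Python. It does not use an XPath library; the tree is encoded as a plain dictionary, and all function names are hypothetical helpers chosen for this illustration. It assumes, as in the example tree, that node numbers follow document order.

```python
# Tree from the example: node 1 has children 2 and 5; node 2 has children 3 and 4.
# Node numbers follow document order, so sorting by number gives document order.
CHILDREN = {1: [2, 5], 2: [3, 4], 3: [], 4: [], 5: []}
PARENT = {c: p for p, kids in CHILDREN.items() for c in kids}
NODES = sorted(CHILDREN)

def child(n):
    return list(CHILDREN[n])

def parent(n):
    return [PARENT[n]] if n in PARENT else []

def descendant(n):
    found = []
    for c in CHILDREN[n]:
        found.append(c)
        found.extend(descendant(c))
    return found

def ancestor(n):
    found = []
    while n in PARENT:
        n = PARENT[n]
        found.append(n)
    return sorted(found)

def following(n):
    # Nodes after n in document order, excluding n's descendants.
    skip = set(descendant(n))
    return [m for m in NODES if m > n and m not in skip]

def preceding(n):
    # Nodes before n in document order, excluding n's ancestors.
    skip = set(ancestor(n))
    return [m for m in NODES if m < n and m not in skip]

def following_sibling(n):
    sibs = CHILDREN[PARENT[n]] if n in PARENT else []
    return [m for m in sibs if m > n]

def preceding_sibling(n):
    sibs = CHILDREN[PARENT[n]] if n in PARENT else []
    return [m for m in sibs if m < n]

print(following(3))          # [4, 5]
print(preceding(4))          # [3]
print(ancestor(4))           # [1, 2]
print(preceding_sibling(5))  # [2]
```

Note that preceding excludes ancestors and following excludes descendants, which is why preceding({4}) contains only node 3.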
Wildcards
Sample document: the catalog from "Pattern Expressions"
* selects unknown elements
"all child elements of all cd of catalog": /catalog/cd/*
"all price elements that are grandchildren of catalog": /catalog/*/price
"all price elements which have 2 ancestors": /*/*/price
"all elements": //*
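A minimal sketch of the wildcard paths, again using Python's xml.etree.ElementTree under the assumption that paths are written relative to the catalog root element. Since ".//*" in ElementTree does not include the context element itself, root.iter() is used here as a stand-in for //*; that substitution is a choice made for this example.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

cd_children = root.findall("cd/*")        # like /catalog/cd/*
grandchild_prices = root.findall("*/price")  # like /catalog/*/price
all_elements = list(root.iter())          # stand-in for //* (includes catalog itself)

print(len(cd_children))        # 9
print(len(grandchild_prices))  # 3
print(len(all_elements))       # 13
```

The count of 13 is the catalog element, its 3 cd children, and their 9 children.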
Abbreviations
a/b/c
• ./child::a/child::b/child::c
a//@id
• ./child::a/descendant-or-self::node()/attribute::id
//a
• root(.)/descendant-or-self::node()/child::a
a/text()
• ./child::a/child::text()
Branch Selection
Sample document: the catalog from "Pattern Expressions"
Selecting branches from subtree: "[...]"
"first cd child of catalog": /catalog/cd[1]
• /catalog/cd[ position() = 1 ]
"last cd child of catalog": /catalog/cd[ last() ]
• Note: There is no function named first()
"all cd elements of catalog that have a price element": /catalog/cd[ price ]
"all cd elements of catalog that have a price with value of 10.90": /catalog/cd[ price=10.90 ]
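The branch selections can be sketched with Python's xml.etree.ElementTree, which supports positional predicates, last(), child-existence tests, and child-text equality. Two assumptions to keep in mind: paths are written relative to the catalog element, and ElementTree compares the price text as a string ('10.90'), not as a number as XPath would. The titles helper is hypothetical.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

def titles(cds):
    # Hypothetical helper: title text of each selected cd element.
    return [cd.findtext("title") for cd in cds]

first = titles(root.findall("cd[1]"))                   # like /catalog/cd[1]
last = titles(root.findall("cd[last()]"))               # like /catalog/cd[last()]
with_price = titles(root.findall("cd[price]"))          # like /catalog/cd[price]
at_10_90 = titles(root.findall("cd[price='10.90']"))    # string match on the price text

print(first)     # ['Empire Burlesque']
print(last)      # ['Greatest Hits']
print(at_10_90)  # ['Empire Burlesque']
```

Positions are 1-based, as in XPath: cd[1] is the first cd child, not the second.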
Multiple Paths
Sample document: the catalog from "Pattern Expressions"
Selecting several paths: | operator
"all title, artist elements": /catalog/cd/title | /catalog/cd/artist
"all the title and artist elements in the document": //title | //artist
"all title, artist, price elements": //title | //artist | //price
"all title elements of cd of catalog, and all artist elements": /catalog/cd/title | //artist
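Python's xml.etree.ElementTree does not implement the | operator, so the following sketch emulates a union such as //title | //artist by a single pass over the tree in document order. This is a swapped-in technique that only covers unions of simple //name paths; the helper name union_of_tags is hypothetical.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

def union_of_tags(context, *tags):
    # Emulates //tag1 | //tag2 | ...: every element with one of the given
    # names, each reported once, in document order.
    wanted = set(tags)
    return [el for el in context.iter() if el.tag in wanted]

title_or_artist = [el.text for el in union_of_tags(root, "title", "artist")]
print(title_or_artist)
# ['Empire Burlesque', 'Bob Dylan', 'Hide your heart', 'Bonnie Tyler',
#  'Greatest Hits', 'Dolly Parton']
```

As with the XPath union, the result is a single node set in document order rather than all titles followed by all artists.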
Attributes
Sample document: the catalog from "Pattern Expressions"
Selecting attributes: prefix attributes with @
"all attributes named 'country'": //@country
"all cd elements which have an attribute named country": //cd[@country]
"all cd elements with attribute named country with value 'UK'": //cd[@country='UK']
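A minimal sketch of the attribute selections with Python's xml.etree.ElementTree. Assumptions: ElementTree can filter on attributes inside predicates but cannot return attribute nodes, so //@country is emulated by reading the attribute value from every element that carries it; ".//cd" stands in for //cd relative to the catalog element.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

# Emulates //@country: attribute values instead of attribute nodes.
countries = [el.get("country") for el in root.iter() if "country" in el.attrib]

with_country = root.findall(".//cd[@country]")    # like //cd[@country]
uk_cds = root.findall(".//cd[@country='UK']")     # like //cd[@country='UK']

print(countries)                              # ['USA', 'UK', 'USA']
print([cd.findtext("title") for cd in uk_cds])  # ['Hide your heart']
```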
Predicates
Sample document: the catalog from "Pattern Expressions"
Predicates, operators, functions as usual
"all CDs with price below 10.0": /catalog/cd[ price<10.0 ]
"all CDs with country 'UK' and price below 10.0": /catalog/cd[ @country="UK" ][ price<10.0 ]
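Numeric comparisons such as price<10.0 are not part of the XPath subset in Python's xml.etree.ElementTree, so this sketch applies the comparison in Python after selecting the cd elements. It is a workaround under that assumption, not an XPath evaluation; a full XPath 1.0 engine would accept the predicate directly.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

def price_below(cd, limit):
    # Mirrors the predicate [ price < limit ] for a cd with one price child.
    return float(cd.findtext("price")) < limit

# Like /catalog/cd[ price<10.0 ]
cheap = [cd.findtext("title") for cd in root.findall("cd") if price_below(cd, 10.0)]

# Like /catalog/cd[ @country="UK" ][ price<10.0 ]
uk_cheap = [cd.findtext("title")
            for cd in root.findall("cd[@country='UK']")
            if price_below(cd, 10.0)]

print(cheap)     # ['Hide your heart', 'Greatest Hits']
print(uk_cheap)  # ['Hide your heart']
```

Chaining two predicates narrows the node set step by step: first by country, then by price.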
Roadmap
• XPath
• XQuery