
Processing XML: XPath, XQuery (Ramakrishnan & Gehrke, Chapter 24)



  1. Processing XML: XPath, XQuery
     Ramakrishnan & Gehrke, Chapter 24 / 27
     320302 Databases & WebServices (P. Baumann)

  2. Why are we DB'ers interested?
     • It's data, stupid. That's us.
     • Database issues:
         • How are we going to model XML? Trees, graphs.
         • How are we going to query XML? XQuery.
         • How are we going to store XML? In a relational database? Object-oriented? Native?
         • How are we going to process XML efficiently? Many interesting research questions!

  3. XML Revisited
     • From a data modelling viewpoint, what does XML offer?
     • Entities (ER!)
     • Attributes
         • Single-valued, atomic
     • Relationships? Yes, but:
         • Single-root trees only
         • Unordered, no role names
         • General graphs through id/idrefs, syntax only

  4. Roadmap
     • XPath
     • XQuery

  5. Path Expressions: XPath
     • Basic concept: path = sequence of location steps. Each location step consists of:
         • an axis: tree relationship between the nodes selected by the location step and the current node (parent, child, self, descendant-or-self, attribute, …)
         • a node test: node type and expanded-name of the nodes selected by the location step
         • 0..* predicates: further refinement
     • General location step syntax: axisname::nodetest[predicate]
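
To make the three parts of a location step concrete, here is a minimal Python sketch that splits one unabbreviated step into axis, node test, and predicates. The names `Step` and `parse_step` are hypothetical helpers introduced for illustration; the regular expression handles only simple steps without nested brackets or abbreviations such as `@id`, and it is not a full XPath parser.

```python
import re
from typing import NamedTuple


class Step(NamedTuple):
    axis: str
    node_test: str
    predicates: list


# axisname::nodetest[predicate]...  (axis optional, zero or more predicates)
_STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\[\]]*\])*)$"
)


def parse_step(text):
    """Split a single location step into its three components."""
    m = _STEP.match(text.strip())
    if m is None:
        raise ValueError(f"not a simple location step: {text!r}")
    preds = re.findall(r"\[([^\[\]]*)\]", m.group("preds"))
    # XPath uses the child axis when no axis is written explicitly.
    axis = m.group("axis") or "child"
    return Step(axis, m.group("test").strip(), [p.strip() for p in preds])


print(parse_step("child::cd[price][@country='UK']"))
print(parse_step("descendant-or-self::node()"))
print(parse_step("cd[1]"))
```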

  6. Pattern Expressions
     • identify nodes in document
     • path through the XML document
         • .../node1/node2/...
     • pattern "selects" elements that match path, result is a (sub)tree
         • "all price elements of all cd elements of the catalog element": /catalog/cd/price
         • result: <price>10.90</price> <price>9.90</price> <price>9.90</price>
     Example document:
         <?xml version="1.0" encoding="ISO-8859-1"?>
         <catalog>
           <cd country="USA">
             <title>Empire Burlesque</title>
             <artist>Bob Dylan</artist>
             <price>10.90</price>
           </cd>
           <cd country="UK">
             <title>Hide your heart</title>
             <artist>Bonnie Tyler</artist>
             <price>9.90</price>
           </cd>
           <cd country="USA">
             <title>Greatest Hits</title>
             <artist>Dolly Parton</artist>
             <price>9.90</price>
           </cd>
         </catalog>
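
The path /catalog/cd/price can be tried out with a short Python sketch. It assumes the standard-library `xml.etree.ElementTree` module, which supports only a subset of XPath and evaluates paths relative to an element. Because the parsed root element is already `catalog`, the absolute path is written here as `./cd/price`.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

# Equivalent of /catalog/cd/price, relative to the catalog element.
prices = [p.text for p in root.findall("./cd/price")]
print(prices)
```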

  7. Paths
     (Example document: the catalog from slide 6.)
     • Absolute vs. relative vs. fitting:
         • path starts with slash ( / ): absolute path
         • path starts with double slash ( // ): all fitting elements, even if at different levels in tree
         • otherwise: path relative to current position
     • Relative addressing via axis:
         • node set relative to current node
         • axes: parent, child, self, ancestor, descendant, attribute, … (e.g., all children of the parent)
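
The difference between a path from the top, a // path, and a relative path can be sketched in Python. This assumes the standard-library `xml.etree.ElementTree`, where every path is evaluated relative to some element, so `.` stands for the element the query starts from.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# Step by step from the top (XPath: /catalog/cd/title).
from_top = root.findall("./cd/title")

# Any depth below the start element (XPath: //title).
any_depth = root.findall(".//title")

# Relative to a "current position": here, the first cd element.
current = root.find("cd")
relative = current.findall("title")

print(len(from_top), len(any_depth), [t.text for t in relative])
```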

  8. Examples

  9. More Examples
     Example tree:
         <1>
           <2>
             <3/>
             <4/>
           </2>
           <5/>
         </1>
     • self({2}) = {2}
     • child({1}) = {2,5}
     • parent({3}) = {2}
     • descendant({1}) = {2,3,4,5}
     • descendant-or-self({1}) = {1,2,3,4,5}
     • ancestor({4}) = {1,2}
     • ancestor-or-self({4}) = {1,2,4}
     • following({3}) = {4,5}
     • preceding({4}) = {3}
     • following-sibling({4}) = {}
     • preceding-sibling({5}) = {2}
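
The axis results for the five-node tree can be reproduced with a small hand-written Python sketch. The tree is encoded as a hypothetical `PARENT` table, and the node numbers double as document order, which is an assumption that holds for this particular tree only.

```python
# Parent of each node (None for the root); numbers follow document order.
PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 1}
ORDER = [1, 2, 3, 4, 5]


def children(n):
    return [m for m in ORDER if PARENT[m] == n]


def ancestors(n):
    out = []
    while PARENT[n] is not None:
        n = PARENT[n]
        out.append(n)
    return sorted(out)


def descendants(n):
    out = []
    for c in children(n):
        out.append(c)
        out.extend(descendants(c))
    return out


def following(n):
    # Nodes after n in document order, excluding n's descendants.
    return [m for m in ORDER if m > n and m not in descendants(n)]


def preceding(n):
    # Nodes before n in document order, excluding n's ancestors.
    return [m for m in ORDER if m < n and m not in ancestors(n)]


def following_sibling(n):
    p = PARENT[n]
    return [] if p is None else [m for m in children(p) if m > n]


def preceding_sibling(n):
    p = PARENT[n]
    return [] if p is None else [m for m in children(p) if m < n]


print(children(1), descendants(1), ancestors(4))
print(following(3), preceding(4), following_sibling(4), preceding_sibling(5))
```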

  10. Wildcards
     (Example document: the catalog from slide 6.)
     • * selects unknown elements
     • "all child elements of all cd of catalog": /catalog/cd/*
     • "all price elements that are grandchildren of catalog": /catalog/*/price
     • "all price elements which have exactly 2 ancestor elements": /*/*/price
     • "all elements": //*
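
The wildcard patterns can be checked with a Python sketch, again assuming the standard-library `xml.etree.ElementTree`. One difference is worth noting: in ElementTree, `.//*` returns the elements below the start element but not the start element itself, so `root.iter()` is used to count all elements including `catalog`.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

cd_children = root.findall("./cd/*")          # XPath: /catalog/cd/*
grandchild_prices = root.findall("./*/price")  # XPath: /catalog/*/price
below_root = root.findall(".//*")              # every element below catalog
all_elements = list(root.iter())               # XPath: //* (catalog included)

print(len(cd_children), len(grandchild_prices), len(below_root), len(all_elements))
```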

  11. Abbreviations
     • a/b/c abbreviates ./child::a/child::b/child::c
     • a//@id abbreviates ./child::a/descendant-or-self::node()/attribute::id
     • //a abbreviates root(.)/descendant-or-self::node()/child::a
     • a/text() abbreviates ./child::a/child::text()

  12. Branch Selection
     (Example document: the catalog from slide 6.)
     • Selecting branches from subtree: "[...]"
     • "first cd child of catalog": /catalog/cd[1]
         • equivalent to /catalog/cd[ position() = 1 ]
     • "last cd child of catalog": /catalog/cd[ last() ]
         • Note: There is no function named first()
     • "all cd elements of catalog that have a price element": /catalog/cd[ price ]
     • "all cd elements of catalog that have a price with value of 10.90": /catalog/cd[ price=10.90 ]
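
The bracket predicates can be sketched in Python with the standard-library `xml.etree.ElementTree`, which supports positions, `last()`, child-existence tests, and text equality. As an assumption to keep in mind: ElementTree compares the text as a string, so the value is written as `'10.90'`, whereas the XPath expression `price=10.90` compares numbers.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

first_cd = root.find("./cd[1]")                 # XPath: /catalog/cd[1]
last_cd = root.find("./cd[last()]")             # XPath: /catalog/cd[last()]
with_price = root.findall("./cd[price]")        # XPath: /catalog/cd[price]
at_1090 = root.findall("./cd[price='10.90']")   # string match on the price text

print(first_cd.findtext("title"), "|", last_cd.findtext("title"))
print(len(with_price), [cd.findtext("title") for cd in at_1090])
```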

  13. Multiple Paths
     (Example document: the catalog from slide 6.)
     • Selecting several paths: | operator
     • "all title, artist elements": /catalog/cd/title | /catalog/cd/artist
     • "all the title and artist elements in the document": //title | //artist
     • "all title, artist, price elements": //title | //artist | //price
     • "all title elements of cd of catalog, and all artist elements": /catalog/cd/title | //artist
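
The standard-library `xml.etree.ElementTree` has no `|` operator, so the following Python sketch emulates it. The helper `union` is a hypothetical function written for this example: it merges several result lists, drops duplicates, and returns the nodes in document order, which is how an XPath union behaves.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)


def union(root, *node_lists):
    """Emulate XPath '|': no duplicates, result in document order."""
    selected = {id(e) for nodes in node_lists for e in nodes}
    return [e for e in root.iter() if id(e) in selected]


# Emulates /catalog/cd/title | //artist
titles_and_artists = union(root, root.findall("./cd/title"), root.findall(".//artist"))

# Overlapping operands select each node only once.
titles_once = union(root, root.findall(".//title"), root.findall("./cd/title"))

print([e.tag for e in titles_and_artists])
print(len(titles_once))
```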

  14. Attributes
     (Example document: the catalog from slide 6.)
     • Selecting attributes: prefix attributes with @
     • "all attributes named 'country'": //@country
     • "all cd elements which have an attribute named country": //cd[@country]
     • "all cd elements with attribute named country with value 'UK'": //cd[@country='UK']
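
Attribute tests can be sketched in Python with the standard-library `xml.etree.ElementTree`. That module has no attribute nodes, so //@country cannot be evaluated directly. As a workaround, the sketch selects the elements carrying the attribute and reads its value with `get`.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# XPath: //cd[@country]  (cd elements that have a country attribute)
with_country = root.findall(".//cd[@country]")

# Stand-in for //@country: read the attribute value from each element.
countries = [cd.get("country") for cd in with_country]

# XPath: //cd[@country='UK']
uk_cds = root.findall(".//cd[@country='UK']")

print(countries, [cd.findtext("title") for cd in uk_cds])
```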

  15. Predicates
     (Example document: the catalog from slide 6.)
     • Predicates, operators, functions as usual
     • "all CDs with price below 10.0": /catalog/cd[ price<10.0 ]
     • "all CDs with country "UK" and price below 10.0": /catalog/cd[ @country="UK" ][ price<10.0 ]
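
Numeric comparisons such as price<10.0 are not part of the XPath subset in the standard-library `xml.etree.ElementTree`, so this Python sketch applies them in ordinary code after selecting the cd elements. The helper `has_price_below` is a hypothetical name. Like the XPath predicate, it is true if any price child is below the limit.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)


def has_price_below(cd, limit):
    """True if at least one price child of cd is numerically below limit."""
    return any(float(p.text) < limit for p in cd.findall("price"))


# Emulates /catalog/cd[price<10.0]
cheap = [cd for cd in root.findall("./cd") if has_price_below(cd, 10.0)]

# Emulates /catalog/cd[@country="UK"][price<10.0]
cheap_uk = [cd for cd in root.findall("./cd[@country='UK']") if has_price_below(cd, 10.0)]

print([cd.findtext("title") for cd in cheap])
print([cd.findtext("title") for cd in cheap_uk])
```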

  16. Roadmap
     • XPath
     • XQuery
