
Symmetrically Exploiting XML: Shuohao Zhang and Curtis Dyreson (presentation slides)



  1. Symmetrically Exploiting XML
     Shuohao Zhang and Curtis Dyreson, School of E.E. and Computer Science, Washington State University, Pullman, Washington, USA
     The 15th International World Wide Web Conference, May 2006, Edinburgh, Scotland

  2. 1970s Database Controversy
     • Hierarchical model vs. relational model
     • Codd: symmetric exploitation of data
       [Diagram: a relational design with Part, Commit, and Project, versus hierarchies that nest Part and Project in either order; a part/project hierarchy works for some queries, but not all]
     • Path expressions are asymmetric
     • Currently, all XML query languages use path expressions
     Symmetrically Exploiting XML: Zhang, Dyreson

  3. Querying Data with Path Expressions
     [Figure: an author element with a name (E. F. Codd) and two book children, each with title, publisher, and price; titles Automata and DB, publishers Addison Wesley and Academic Press, prices 9.99 and 46.95]
     • Task
       – Find books by E. F. Codd
     • XQuery
       – return doc("author.xml")//author[name='E. F. Codd']/book

  4. Same Data, Different Structure
     [Figure: the author-rooted tree from slide 3 beside a book-rooted version in which each book has title, author (containing a name), price, and publisher children]
     • Same task
       – Find books by E. F. Codd
     • Need a different XQuery
       – return doc("book.xml")//book[author/name='E. F. Codd']
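
To make the asymmetry of slides 3 and 4 concrete, here is a small sketch using Python's standard xml.etree.ElementTree, whose limited XPath subset stands in for XQuery. The sample documents are illustrative: which publisher and price go with which title, and the <books> wrapper around the book-rooted version, are assumptions, and `as_document` and `titles` are hypothetical helpers. Each path finds Codd's books in one structure and nothing in the other.

```python
import xml.etree.ElementTree as ET

# Illustrative data modeled on slides 3 and 4 (value pairings and <books> are assumed).
AUTHOR_XML = """<author><name>E. F. Codd</name>
  <book><title>Automata</title><publisher>Addison Wesley</publisher><price>9.99</price></book>
  <book><title>DB</title><publisher>Academic Press</publisher><price>46.95</price></book>
</author>"""

BOOK_XML = """<books>
  <book><title>Automata</title><author><name>E. F. Codd</name></author>
        <price>9.99</price><publisher>Addison Wesley</publisher></book>
  <book><title>DB</title><author><name>E. F. Codd</name></author>
        <price>46.95</price><publisher>Academic Press</publisher></book>
</books>"""


def as_document(xml_text):
    """Wrap the parsed root in a dummy element so that './/' can also match
    the root itself, loosely imitating XQuery's doc(...)// step."""
    doc = ET.Element("document")
    doc.append(ET.fromstring(xml_text))
    return doc


# ElementTree predicates accept only a single child tag, so the book.xml query
# author/name='...' is expressed as author[name='...'] followed by '..' (parent).
QUERY_FOR_AUTHOR_XML = ".//author[name='E. F. Codd']/book"
QUERY_FOR_BOOK_XML = ".//book/author[name='E. F. Codd']/.."


def titles(doc, query):
    """Titles of the book elements selected by an ElementTree path."""
    return [book.findtext("title") for book in doc.findall(query)]


for label, text in [("author.xml", AUTHOR_XML), ("book.xml", BOOK_XML)]:
    doc = as_document(text)
    print(label, titles(doc, QUERY_FOR_AUTHOR_XML), titles(doc, QUERY_FOR_BOOK_XML))
# author.xml ['Automata', 'DB'] []
# book.xml [] ['Automata', 'DB']
```

Each query silently returns an empty result on the structure it was not written for, which is the breakage the closest axis is meant to avoid.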

  5. Goal
     • Make the same query work on different structures
     • Useful when there is
       – a lack of schema knowledge
       – heterogeneous data
       – irregular data
       – schema evolution
     • Factor out the problem of different label sets; others are working on it

  6. Existing Axes Are Directional
     [Figure: the ancestor, descendant, preceding, and following axes, each pointing in one direction from self]

  7. Proposal: A Non-directional Axis
     [Figure: the ancestor, descendant, preceding, and following axes around self, for comparison with the proposed axis]

  10. The Closest Axis
      • Syntax
        – closest::
        – ->name is an abbreviation for closest::name
      • Semantics
        – a function that takes a context node and returns a sequence of closest nodes
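
The `->` shorthand can be read as a purely syntactic rewrite. The sketch below, a hypothetical `desugar` function, assumes that `->` beginning a path (at the start of the query, or right after `[`, `(`, or `,`) becomes `closest::`, and that `->` following another step becomes `/closest::`, mirroring the `//title/closest::publisher` form used in the experiment. The slides do not spell out this grammar, so treat it as one plausible reading.

```python
def desugar(query: str) -> str:
    """Expand the '->' shorthand into explicit closest:: steps (assumed grammar).

    '->' that begins a path becomes 'closest::'; '->' that follows another
    step becomes '/closest::'. Text inside string literals is left untouched.
    """
    out = []
    quote = None  # the active quote character while inside a string literal
    i = 0
    while i < len(query):
        ch = query[i]
        if quote:
            out.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            out.append(ch)
        elif query.startswith("->", i):
            # Look at the last non-space character emitted so far.
            prev = "".join(out).rstrip()[-1:]
            out.append("closest::" if prev in ("", "[", "(", ",") else "/closest::")
            i += 2
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(desugar("""doc("any.xml")->author[->name='E. F. Codd']->book"""))
# doc("any.xml")/closest::author[closest::name='E. F. Codd']/closest::book
```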

  11. Closest Axis of the First Title
      [Figure: an author with a name and two books, each with title, publisher, and price; the context node is the first title]
      • closest::*
        – Returns a list of five nodes
      • closest::price
        – Returns the first price node

  12. When the First Book Lacks a Price
      [Figure: the same tree, except that the first book has only a title and a publisher]
      • Node selection is restricted by the minimal type distance
        – The minimal distance between a title and a price is 2
      • closest::price
        – Returns an empty list, because the nearest price to the first title is 4 edges away, in the second book

  13. Type Distance Is Crucial
      • closest::name for each book?
        [Figure: an author with a name and two books; one book's publisher also has a name child]
      • Root-to-node path type, so the two kinds of name are different types
        – author/name
        – author/book/publisher/name
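
The semantics on slides 11 through 13 can be sketched with a naive evaluator: a node's type is its root-to-node label path, the minimal type distance of two types is the shortest tree distance between any of their nodes, and closest keeps only nodes lying exactly at that distance from the context. The class name `ClosestIndex` and the sample values are hypothetical, and skipping nodes of the context's own type is an assumption chosen because it reproduces the five-node answer for closest::* on slide 11. This is not the authors' implementation.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


class ClosestIndex:
    """Naive closest-axis evaluator (a sketch, not the authors' code)."""

    def __init__(self, root):
        self.root = root
        self.nodes, self.parent, self.depth, self.type = [], {}, {}, {}
        self._walk(root, None, ())
        # Minimal type distance over all node pairs: the O(n^2) naive pass.
        self.min_dist = {}
        for a, b in combinations(self.nodes, 2):
            d = self.distance(a, b)
            for key in ((self.type[a], self.type[b]), (self.type[b], self.type[a])):
                self.min_dist[key] = min(d, self.min_dist.get(key, d))

    def _walk(self, node, parent, prefix):
        path = prefix + (node.tag,)
        self.nodes.append(node)             # document (pre-)order
        self.parent[node] = parent
        self.depth[node] = len(path) - 1    # the root is at depth 0
        self.type[node] = "/".join(path)    # root-to-node path type
        for child in node:
            self._walk(child, node, path)

    def distance(self, a, b):
        """Number of edges on the path between a and b, through their LCA."""
        steps = 0
        while self.depth[a] > self.depth[b]:
            a, steps = self.parent[a], steps + 1
        while self.depth[b] > self.depth[a]:
            b, steps = self.parent[b], steps + 1
        while a is not b:
            a, b, steps = self.parent[a], self.parent[b], steps + 2
        return steps

    def closest(self, context, label="*"):
        """Nodes named `label` ('*' = any) at the minimal type distance."""
        ctype = self.type[context]
        return [n for n in self.nodes
                if self.type[n] != ctype            # assumption: own type skipped
                and (label == "*" or n.tag == label)
                and self.distance(context, n) == self.min_dist[(ctype, self.type[n])]]


# Slide 11 tree, and the slide 12 variant whose first book lacks a price.
FULL = ("<author><name>E. F. Codd</name>"
        "<book><title>Automata</title><publisher>Addison Wesley</publisher><price>9.99</price></book>"
        "<book><title>DB</title><publisher>Academic Press</publisher><price>46.95</price></book></author>")
NO_PRICE = FULL.replace("<price>9.99</price>", "")

for xml_text in (FULL, NO_PRICE):
    index = ClosestIndex(ET.fromstring(xml_text))
    first_title = index.root.find("book/title")
    print([n.tag for n in index.closest(first_title)],
          [n.text for n in index.closest(first_title, "price")])
```

On the full tree this prints five tags (author, name, book, publisher, price) and the first price, 9.99; without the first price it prints four tags and an empty price list, because the remaining price is 4 edges from the first title while the minimal title/price distance is 2.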

  14. Querying with the Closest Axis
      • Same query for every structure
        – return doc("any.xml")->author[->name='E. F. Codd']->book
      [Diagram: the same query, run through a closest-axis-enabled XQuery evaluation engine, produces Result#1, Result#2, and Result#3 for different structures]
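
The claim that one closest query serves several structures can be checked with the same naive semantics, re-implemented compactly here so the block stands alone. Two simplifications are assumptions: the leading doc(...)->author step is approximated by taking every author element, and the sample documents are illustrative. `Closest` and `codd_books` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Illustrative documents: author-rooted (slide 3) and book-rooted (slide 4).
AUTHOR_XML = """<author><name>E. F. Codd</name>
  <book><title>Automata</title><publisher>Addison Wesley</publisher><price>9.99</price></book>
  <book><title>DB</title><publisher>Academic Press</publisher><price>46.95</price></book>
</author>"""
BOOK_XML = """<books>
  <book><title>Automata</title><author><name>E. F. Codd</name></author><price>9.99</price></book>
  <book><title>DB</title><author><name>E. F. Codd</name></author><price>46.95</price></book>
</books>"""


class Closest:
    """Naive closest axis: nodes with the requested label lying exactly at the
    minimal type distance (types are root-to-node label paths)."""

    def __init__(self, root):
        self.nodes, self.info = [], {}  # info[node] = (parent, depth, type)
        self._walk(root, None, ())
        self.min_dist = {}
        for a, b in combinations(self.nodes, 2):
            d = self.distance(a, b)
            ta, tb = self.info[a][2], self.info[b][2]
            for key in ((ta, tb), (tb, ta)):
                self.min_dist[key] = min(d, self.min_dist.get(key, d))

    def _walk(self, node, parent, path):
        path = path + (node.tag,)
        self.nodes.append(node)
        self.info[node] = (parent, len(path) - 1, "/".join(path))
        for child in node:
            self._walk(child, node, path)

    def distance(self, a, b):
        steps = 0
        while a is not b:  # climb from the deeper node until both meet at the LCA
            if self.info[a][1] >= self.info[b][1]:
                a = self.info[a][0]
            else:
                b = self.info[b][0]
            steps += 1
        return steps

    def __call__(self, context, label):
        ctype = self.info[context][2]
        return [n for n in self.nodes
                if n.tag == label and self.info[n][2] != ctype
                and self.distance(context, n) == self.min_dist[(ctype, self.info[n][2])]]


def codd_books(xml_text):
    """->author[->name='E. F. Codd']->book, starting from every author element."""
    root = ET.fromstring(xml_text)
    closest = Closest(root)
    books = []
    for author in root.iter("author"):
        if any(name.text == "E. F. Codd" for name in closest(author, "name")):
            books += [b for b in closest(author, "book") if b not in books]
    return [b.findtext("title") for b in books]


print(codd_books(AUTHOR_XML))  # ['Automata', 'DB']
print(codd_books(BOOK_XML))    # ['Automata', 'DB']
```

In the author-rooted document the books are the author's children; in the book-rooted one each author's closest book is its parent. The query text is identical in both cases.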

  15. Querying with Directional Axes
      • A different query for each structure
        – Query#1: return doc("author.xml")//author[name='E. F. Codd']/book
        – Query#2: ……
        – Query#3: return doc("book.xml")//book[author/name='E. F. Codd']
      [Diagram: each query goes to the XQuery evaluation engine and yields Result#1, Result#2, or Result#3]

  16. In-memory Implementation
      • Naïve approach
        – Compute closest for every node
        – Time complexity is O(sn²), where s is the number of labels in the signature and n is the number of nodes
      • Converting to a path expression
        – Find the closest price for a title
        – Non-directional expression: closest::price
        – Directional (path) expression: parent::*/child::price
        [Figure: an author with a name and a book; the book has title, publisher, and price children]
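
The conversion to a path expression can be derived from the minimal type distance. If the context and target types have depths dc and dt (root at depth 0) and minimal type distance m, every closest pair meets at an ancestor of depth d = (dc + dt - m)/2, so the path climbs dc - d parents and then descends through the target type's labels below depth d. Nodes reached this way are at most m edges from the context, and by minimality no closer, so they are exactly the closest ones. The function name is hypothetical, and m is assumed to have been computed beforehand.

```python
def closest_to_path(context_type: str, target_type: str, min_dist: int) -> str:
    """Rewrite closest::<label> as a directional path (a sketch).

    context_type and target_type are root-to-node label paths such as
    'author/book/title'; min_dist is their minimal type distance.
    """
    ctx, tgt = context_type.split("/"), target_type.split("/")
    dc, dt = len(ctx) - 1, len(tgt) - 1          # depths, with the root at depth 0
    twice_d = dc + dt - min_dist                 # (dc - d) + (dt - d) = min_dist
    if twice_d % 2 or not 0 <= twice_d // 2 <= min(dc, dt):
        raise ValueError("depths and minimal type distance are inconsistent")
    d = twice_d // 2                             # depth of the shared ancestor (LCA)
    if ctx[:d + 1] != tgt[:d + 1]:
        raise ValueError("types do not share an ancestor type at depth d")
    steps = ["parent::*"] * (dc - d) + [f"child::{label}" for label in tgt[d + 1:]]
    return "/".join(steps) or "self::node()"


print(closest_to_path("author/book/title", "author/book/price", 2))
# parent::*/child::price
print(closest_to_path("author/book/title", "author/book/price", 4))
# parent::*/parent::*/child::book/child::price
```

The second call corresponds to a document in which no book has both a title and a price, so the closest prices are only reachable through the author.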

  17. Experiment
      • Compare directional vs. non-directional queries
        – for $b in doc("bib.xml")//title/closest::publisher return $b
        – for $b in doc("bib.xml")//title/..//publisher return $b
      • Implemented closest in eXist (an XML DBMS)
      [Chart: query time in milliseconds (axis from 0 to 1600) against the number of nodes, for the descendant and closest queries]

  18. Persistent Implementation
      • Take advantage of type indexes
      • LCA-join
        – Every closest pair is related via an LCA (lowest common ancestor)
        – The idea is to merge lists of types
        [Diagram: current lca, current child, and current parent pointers advancing through the type lists in the direction of the merge]
        – O(sn)
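
One way to realize an LCA-join is sketched below, under assumptions the slide does not state: each type index is a list of Dewey labels (child positions from the root, so the root is the empty tuple) in document order, and d is the LCA depth derived from the minimal type distance m of the two types as d = (dc + dt - m)/2. Closest pairs are then exactly the pairs whose labels agree on their first d components, and a single forward merge finds them in time linear in the two lists plus the output. `lca_join` is a hypothetical name, not the authors' algorithm.

```python
from typing import List, Tuple

Dewey = Tuple[int, ...]  # child positions from the root; the root is ()


def lca_join(contexts: List[Dewey], targets: List[Dewey], d: int) -> List[Tuple[Dewey, Dewey]]:
    """Pair every context node with the target nodes that share its ancestor
    at depth d, i.e. the same first d Dewey components.

    Both lists must be in document (lexicographic Dewey) order; truncating to
    d components keeps them sorted, so one forward merge suffices.
    """
    pairs = []
    i = j = 0
    while i < len(contexts) and j < len(targets):
        key_c, key_t = contexts[i][:d], targets[j][:d]
        if key_c < key_t:
            i += 1
        elif key_c > key_t:
            j += 1
        else:
            # Collect the run of nodes under this shared ancestor on each side.
            i_end = i
            while i_end < len(contexts) and contexts[i_end][:d] == key_c:
                i_end += 1
            j_end = j
            while j_end < len(targets) and targets[j_end][:d] == key_c:
                j_end += 1
            pairs.extend((c, t) for c in contexts[i:i_end] for t in targets[j:j_end])
            i, j = i_end, j_end
    return pairs


titles = [(1, 0), (2, 0)]              # author/book/title in the slide 11 tree
prices = [(1, 2), (2, 2)]              # author/book/price
print(lca_join(titles, prices, 1))     # [((1, 0), (1, 2)), ((2, 0), (2, 2))]
print(lca_join(titles, [(2, 2)], 1))   # [((2, 0), (2, 2))]
```

The second call models slide 12: with the first book's price missing, the first title gets no partner.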

  19. Related Work
      • Data integration
        – TSIMMIS: Garcia-Molina et al. (Journal of Intelligent Information Systems, 1997)
        – YAT: Christophides, Cluet, Siméon (SIGMOD Record, June 2000)
        – SilkRoute: Fernandez, Tan, Suciu (WWW 2000)
      • LCA-related techniques
        – Schmidt, Kersten, Windhouwer (ICDE 2001)
        – Cohen, Mamou, Kanza, Sagiv (VLDB 2003)
        – Li, Yu, Jagadish (VLDB 2004)

  20. Related Research Projects
      • XML Restructuring: Zhang, Dyreson (IIWeb 2006)
      • XML Compaction: Zhang, Dyreson, Dang (DASFAA 2006)
      • Common theme: symmetric exploitation!

  21. Conclusion
      • Current XQuery depends on path expressions
      • A path expression is directional (asymmetric)
        – It may break down if the structure changes
      • The closest axis is non-directional (symmetric)
        – Simple in syntax
        – Can be easily integrated into XQuery
        – Can be implemented efficiently, both in memory and persistently

  22. Thank You!
