Efficient Filtering of XML Documents with XPath Expressions

  1. Efficient Filtering of XML Documents with XPath Expressions
     Authors: Chee-Yong Chan, Pascal Felber, Minos Garofalakis, Rajeev Rastogi
     Bell Laboratories, Lucent Technologies
     {cychan,pascal,minos,rastogi}@research.bell-labs.com
     Speaker: Lam-Son LE, LamSon.Le@epfl.ch, EPFL, I&C Doctoral School, WS 2002/2003
     Distributed Information Processing

  2. Outline
     • Introduction
       – publish/subscribe systems, “bags of words” vs. XPath language
     • Background
       – XPE-tree, unordered/ordered matching
     • XPE Decompositions and Matchings
       – substring/minimal/simple decomposition, substring-tree
     • The XTrie Indexing Scheme
       – substring table, trie
       – matching algorithm
     • Evaluation
       – comparison with XFilter

  3. Introduction
     • Selective data dissemination
       – publishers selectively deliver data to subscribers
     • Simple matching scheme: “bags of words”
     • XML data emergence
       – XPath as filter-specification language (XPE: XPath expression)
       – Retrieval problem: given a collection P of XPEs and an input XML document D, find the subset of XPEs in P that match D
     • XTrie is based on XPath expressions
     • XTrie efficiently filters XML documents
       – indexes on a set of substrings rather than on individual elements
       – supports both ordered and unordered matching

  4. Background (1/3)
     • XML documents as trees
       – root element; sub-elements can be nested to any depth
       – level(root) = 1, level(d) = level(d’) + 1 if d’ is the parent of d
     • XPath expressions (XPEs)
       – “/”: parent/child operator
       – “//”: ancestor/descendant operator
       – “*”: wildcard operator
       – “[”, “]”: delimit a predicate
       – example: p = //a//b[*/c]/d
       – two kinds of patterns: path patterns and tree patterns
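The level definition above can be sketched in a few lines of Python. This is only an illustration built on the standard library's `xml.etree.ElementTree`; the function name `node_levels` and the sample document are assumptions made for the example and do not come from the slides.

```python
import xml.etree.ElementTree as ET


def node_levels(root):
    """Return (tag, level) pairs in pre-order, with level(root) = 1."""
    out = []

    def visit(node, level):
        out.append((node.tag, level))
        for child in node:
            # level(d) = level(d') + 1 when d' is the parent of d
            visit(child, level + 1)

    visit(root, 1)
    return out


# Hypothetical sample document, not taken from the slides.
doc = ET.fromstring("<a><b><c/></b><d/></a>")
levels = node_levels(doc)
print(levels)
```

The recursion mirrors the definition directly: the root gets level 1 and every child is one level below its parent.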

  5. Background (2/3)
     • XPE-tree
       – predicate expressions give rise to branches of the tree
       – the XPE-tree can be ordered if the elements in the XPE are supposed to be ordered
       – relative level of a node t_i in the XPE-tree
         • relLevel(t_i) = [k, ∞] (a range) if t_i is prefixed with “//” followed by (k-1) “*”
         • relLevel(t_i) = [k, k] (a precise value) if t_i is prefixed with “/” followed by (k-1) “*”
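The two relLevel cases can be written as a small helper. This is a minimal sketch under one assumption: the hypothetical function `rel_level` receives the text that precedes a node's element name (one operator followed by k-1 wildcards), for example `/*/` for the node c in `//a//b[*/c]/d`. Anything outside those two forms is rejected rather than guessed.

```python
import math
import re


def rel_level(prefix):
    """Map a step prefix such as '//', '/', '/*/' or '//*/*/' to (low, high).

    Hypothetical helper: `prefix` is one operator ('/' or '//') followed by
    (k-1) wildcard steps, as in the two cases listed on the slide.
    """
    if not re.fullmatch(r"//?(\*/)*", prefix):
        raise ValueError(f"unsupported prefix: {prefix!r}")
    k = prefix.count("*") + 1
    # '//' leaves the upper bound open; '/' pins the level difference to k.
    high = math.inf if prefix.startswith("//") else k
    return (k, high)


# Nodes of p = //a//b[*/c]/d with the prefix in front of each element name.
for node, prefix in [("a", "//"), ("b", "//"), ("c", "/*/"), ("d", "/")]:
    print(node, rel_level(prefix))
```

For the example expression this gives [1, ∞] for a and b, [2, 2] for c, and [1, 1] for d.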

  6. Background (3/3)
     • Unordered matching
       – a set of nodes whose names are matched
       – the level differences of the matched nodes agree with the relative levels
     • Ordered matching is stronger: the order of elements in the XPE-tree is taken into account
     • Matching example
       – p = //a//b[*/c]/d
       – {a2, b4, c6, d7} is an ordered matching of D to p
     [Figure: XML tree D with nodes b1, a2, b3, b4, e5, c6, d7, f8, c9, b10; XPE-tree T with nodes //a [1, ∞], //b [1, ∞], /*/c [2, 2], /d [1, 1]]
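The matching conditions can be checked mechanically for the example. The sketch below is an illustration only, and it rests on two assumptions that are not stated on the slide: the parent links of tree D are reconstructed from the figure residue (subscripts read as pre-order numbers), and the relative level of the first XPE node is measured against a virtual root at level 0. It tests a given assignment of XPE-tree nodes to document nodes for unordered matching; it does not search for matchings.

```python
import math

# Assumed reconstruction of tree D: node number -> (name, parent number).
D = {1: ("b", None), 2: ("a", 1), 3: ("b", 2), 4: ("b", 3), 5: ("e", 4),
     6: ("c", 5), 7: ("d", 4), 8: ("f", 3), 9: ("c", 8), 10: ("b", 2)}

# XPE-tree T for p = //a//b[*/c]/d: node -> (name, parent node, relLevel).
T = {"a": ("a", None, (1, math.inf)), "b": ("b", "a", (1, math.inf)),
     "c": ("c", "b", (2, 2)), "d": ("d", "b", (1, 1))}


def ancestors(n):
    """Return the ancestors of document node n, nearest first."""
    chain = []
    p = D[n][1]
    while p is not None:
        chain.append(p)
        p = D[p][1]
    return chain


def level(n):
    return len(ancestors(n)) + 1


def is_matching(assign):
    """Check names, ancestry and level differences for one assignment."""
    for t, (name, tparent, (lo, hi)) in T.items():
        n = assign[t]
        if D[n][0] != name:
            return False
        if tparent is None:
            diff = level(n)  # assumption: measured from a virtual level 0
        else:
            pn = assign[tparent]
            if pn not in ancestors(n):
                return False
            diff = level(n) - level(pn)
        if not lo <= diff <= hi:
            return False
    return True


print(is_matching({"a": 2, "b": 4, "c": 6, "d": 7}))
print(is_matching({"a": 2, "b": 3, "c": 9, "d": 7}))
```

The first assignment is the one named on the slide. The second swaps in b3 and c9 and fails because d7 lies two levels below b3, outside [1, 1].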

  7. XPE Decompositions (1/3)
     • Substring of an XPE
       – a possible concatenation of nodes separated by “/”
       – example: p = /a/b[c/d//e][g//e/f]//*/*/e/f. Possible substrings: abg, bcd, ef, b
     • Substring decomposition: a set of substrings that covers all nodes in the XPE-tree
     • Minimal decomposition: no substring is a prefix of another
       – advantage: substrings are as long as possible, which lowers the probability that each one is found and matched
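To make the notion of a substring concrete, the sketch below enumerates every run of element names joined only by “/” along a predicate-free path. It is an illustration under stated assumptions: the three root-to-leaf paths of the example XPE are written out by hand rather than parsed from the predicates, element names are single letters so that concatenation stays unambiguous, and `all_substrings` is a hypothetical helper name.

```python
import re


def all_substrings(path):
    """Return all runs of names connected by '/' in a predicate-free path."""
    # Each step is (operator, name), e.g. ('//', 'e') or ('/', '*').
    steps = re.findall(r"(//|/)([^/\[\]]+)", path)
    found = set()
    for i, (_, first) in enumerate(steps):
        if first == "*":
            continue  # a wildcard is not an element name
        run = first
        found.add(run)
        for op, name in steps[i + 1:]:
            if op != "/" or name == "*":
                break  # '//' or '*' ends the run
            run += name
            found.add(run)
    return found


# Root-to-leaf paths of p = /a/b[c/d//e][g//e/f]//*/*/e/f, listed by hand.
paths = ["/a/b/c/d//e", "/a/b/g//e/f", "/a/b//*/*/e/f"]
substrings = set().union(*(all_substrings(p) for p in paths))
print(sorted(substrings))
```

The result contains the four substrings named on the slide (abg, bcd, ef, b) and excludes combinations such as de, where the two names are separated by “//”.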

  8. XPE Decompositions (2/3)
     • Simple decomposition: the minimal decomposition plus one substring for each branching node
     • Substring-tree: nodes are the substrings of the simple decomposition; a substring is the parent of another if
       – it is a prefix of the child, or
       – its last element is the parent node of the first element of the child substring
     • Relative level is extended to substrings
       – computed from the relative levels of the elements that lie between the given substring and its parent
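For a path pattern (no predicates) the substring-tree is a chain, which makes the relative level of each substring easy to compute. The sketch below is a simplified illustration, not the construction used for tree patterns: it splits a path at every “//” and “*”, and takes the relative level of a substring as the number of steps from the last element of the previous substring to its own last element, unbounded if any of those steps uses “//”. The helper name is hypothetical and single-letter element names are assumed.

```python
import math
import re


def substrings_with_levels(xpe):
    """Return [(substring, (low, high)), ...] for a predicate-free XPE."""
    steps = re.findall(r"(//|/)([^/\[\]]+)", xpe)
    runs, current = [], []
    for i, (op, name) in enumerate(steps):
        if current and (name == "*" or op == "//"):
            runs.append(current)  # '//' or '*' closes the current substring
            current = []
        if name != "*":
            current.append(i)
    if current:
        runs.append(current)

    out, prev_end = [], -1
    for run in runs:
        end = run[-1]
        k = end - prev_end  # steps since the previous substring's last element
        span = range(prev_end + 1, end + 1)
        unbounded = any(steps[j][0] == "//" for j in span)
        text = "".join(steps[j][1] for j in run)
        out.append((text, (k, math.inf if unbounded else k)))
        prev_end = end
    return out


print(substrings_with_levels("//a/a/b/c/*/a/b"))
print(substrings_with_levels("//c/b//c/d/*/*/d"))
```

The two calls use p1 and p4 from the indexing example later in the deck, and the computed levels agree with the substring-table rows shown there for those expressions.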

  9. XPE Decompositions (3/3)
     • Example for p = /a/b[c/d//e][g//e/f]//*/*/e/f
     [Figure: three panels showing the simple decomposition, the minimal decomposition, and the substring-tree of p, built from the substrings ab, abcd, abg, e, ef]

  10. Matching with Substrings (1/2)
     • A substring matches a node in the XML document if its last element matches that node
     • XML documents are typically parsed in pre-order (SAX parser), so substrings should also be ordered by a pre-order traversal of the substring-tree
     • Partial matching: a matching of all consecutive substrings from the first up to the given substring
     • Complete matching: a partial matching for the final substring
     • Subtree-matching: a partial matching found at all descendants of the given substring
     • Redundant matching: a subtree-matching already found at some earlier node in the XML document
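The pre-order, event-driven view of a document can be seen with Python's standard `xml.sax` module. The handler below is a minimal sketch that only tracks the current level and records start and end events; the class name `LevelHandler` is hypothetical, and the document string is an assumed reconstruction of the example tree D from the earlier figure.

```python
import xml.sax


class LevelHandler(xml.sax.ContentHandler):
    """Record (event, name, level) triples as the parser walks the document."""

    def __init__(self):
        super().__init__()
        self.level = 0
        self.events = []

    def startElement(self, name, attrs):
        self.level += 1
        self.events.append(("start", name, self.level))

    def endElement(self, name):
        self.events.append(("end", name, self.level))
        self.level -= 1


# Assumed reconstruction of tree D (b1, a2, b3, b4, e5, c6, d7, f8, c9, b10).
xml_doc = b"<b><a><b><b><e><c/></e><d/></b><f><c/></f></b><b/></a></b>"
handler = LevelHandler()
xml.sax.parseString(xml_doc, handler)
starts = [(name, lvl) for kind, name, lvl in handler.events if kind == "start"]
print(starts)
```

Start events arrive in pre-order, which is why the substrings of an XPE are kept in the pre-order of the substring-tree: they can then be checked in the order in which the document reveals its nodes.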

  11. Matching with Substrings (2/2)
     • Again, p = //a//b[*/c]/d
       – s1 = a, s2 = b, s3 = c, s4 = bd
       – the matchings at c9 and b10 are redundant
     [Figure: substring-tree with s1 = a [1, ∞], s2 = b [1, ∞], s3 = c [2, 2], s4 = bd [1, 1]; XML tree D with the nodes matched by s1 to s4 marked]

  12. XTrie Indexing Scheme (1/2)
     • The XTrie index is built for a set of XPEs
       – derive the simple decomposition of every XPE
       – associate the substrings with their relative levels
     • It consists of 2 data structures
       – Trie T: a tree whose edges are labeled with element names from the XML document
       – Substring-Table ST: each row represents a substring

  13. XTrie Indexing Scheme (2/2)
     • Example
       – p1 = //a/a/b/c/*/a/b
       – p2 = /a/b[c/e]/*/b/c/d
       – p3 = /a/b[c/*/d]//b/c
       – p4 = //c/b//c/d/*/*/d
     [Figure: trie T over the substrings below, with edges labeled a, b, c, d, e]

     Substring-Table ST:

     Substring  Index  Parent row  Relative level  Rank  Number of children  Next row
     aabc       1      0           [4, ∞]          1     1                   0
     ab         2      1           [3, 3]          1     0                   3
     ab         3      0           [2, 2]          1     2                   6
     abce       4      3           [2, 2]          1     0                   0
     bcd        5      3           [4, 4]          2     0                   0
     ab         6      0           [2, 2]          1     2                   0
     abc        7      6           [1, 1]          1     1                   0
     d          8      7           [2, 2]          1     0                   12
     bc         9      6           [2, ∞]          2     0                   0
     cb         10     0           [2, ∞]          1     1                   0
     cd         11     10          [2, ∞]          1     1                   0
     d          12     11          [3, 3]          1     0                   0
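The pairing of a trie with a substring table can be sketched as follows. This is a minimal illustration that assumes only what the slides state: trie edges are labeled with element names, each substring-table row holds one substring, and rows that share a substring are chained through the “Next row” column. The dictionary layout, the helper names, and the choice to store the first row number at the trie node where a substring ends are assumptions of the sketch. Single-letter element names are assumed as well.

```python
def build_index(substrings):
    """Return (trie root, next-row links); table rows are numbered from 1."""
    root = {"row": 0, "kids": {}}
    next_row = [0] * (len(substrings) + 1)  # index 0 is unused
    last_row = {}
    for row, s in enumerate(substrings, start=1):
        node = root
        for name in s:  # one trie edge per element name
            node = node["kids"].setdefault(name, {"row": 0, "kids": {}})
        if node["row"] == 0:
            node["row"] = row  # first table row for this substring
        if s in last_row:
            next_row[last_row[s]] = row  # chain rows with equal substrings
        last_row[s] = row
    return root, next_row


def first_row(root, s):
    """Follow the trie along s and return its first table row, or 0."""
    node = root
    for name in s:
        node = node["kids"].get(name)
        if node is None:
            return 0
    return node["row"]


# Substring column of the example table, rows 1 to 12.
table = ["aabc", "ab", "ab", "abce", "bcd", "ab",
         "abc", "d", "bc", "cb", "cd", "d"]
root, next_row = build_index(table)
print(next_row[1:])
print(first_row(root, "ab"), first_row(root, "d"))
```

Run on the substring column of the example, the computed links reproduce the “Next row” column of the table: row 2 points to 3, row 3 to 6, and row 8 to 12.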

  14. XTrie Matching Algorithm (1/2)
     • Based on SAX: the matcher is notified whenever an element name is parsed
     • Requires an additional 2-dimensional array B of size <number of rows in ST> × <maximum level of XML document>
     • B[s, l]
       – is initialized to 0 at the beginning
       – is incremented by 1 when a non-redundant matching of s at level l is found
       – is reset to 0 when the end-tag at level l is parsed
     • An XPE p matches the XML document if B[rs, l] = m + 1 for some level l, where
       – rs is the root substring in the substring-tree for p
       – m is the number of child substrings of rs
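The bookkeeping for B can be sketched on its own. The class below is a hypothetical illustration that covers only the three operations and the final test listed above; it does not detect matchings or decide which ones are redundant, so the caller is assumed to report each non-redundant matching. Rows and levels are 1-based, following the substring table.

```python
class MatchCounters:
    """Hypothetical container for the array B[s, l] (1-based rows and levels)."""

    def __init__(self, num_rows, max_level):
        # Every entry starts at 0; index 0 in both dimensions is unused.
        self.B = [[0] * (max_level + 1) for _ in range(num_rows + 1)]

    def record_match(self, s, l):
        """Count one non-redundant matching of row s at level l."""
        self.B[s][l] += 1

    def end_tag(self, l):
        """Reset every counter at level l when its end-tag is parsed."""
        for row in self.B:
            row[l] = 0

    def is_complete(self, rs, l, m):
        """Test B[rs, l] = m + 1 for root substring rs with m children."""
        return self.B[rs][l] == m + 1


# Illustrative run: a table of 4 rows, documents up to level 6, and a root
# substring in row 1 that has m = 1 child substring.
counters = MatchCounters(num_rows=4, max_level=6)
counters.record_match(1, 2)
counters.record_match(1, 2)
print(counters.is_complete(rs=1, l=2, m=1))
counters.end_tag(2)
print(counters.is_complete(rs=1, l=2, m=1))
```

Keeping the counters per level is what allows a plain reset on each end-tag: matchings found inside a closed element cannot contribute to a later sibling at the same level.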

  15. XTrie Matching Algorithm (2/2)
     • Again, p = //a//b[*/c]/d
     [Figure: trie T with edges labeled a, b, c, d; XML tree D with nodes b1, a2, b3, b4, e5, c6, d7, f8, c9, b10]

     Substring-Table ST:

     Substring  Index  Parent row  Relative level  Rank  Number of children  Next row
     a          1      0           [1, ∞]          1     1                   0
     b          2      1           [1, ∞]          1     2                   0
     c          3      2           [2, 2]          1     0                   0
     bd         4      2           [1, 1]          2     0                   0

  16. Evaluation
     • In comparison with XFilter (which uses a hash table on single element names)
     [Figure: two plots of filtering time (ms), one varying P (L=20, p_w=0.1, p_d=0.1, p_b=0) and one varying document length (P=100k, L=20, p_w=0.1, p_d=0.1, p_b=0)]

  17. Thank you! Questions?
