Efficient Filtering of XML Documents with XPath Expression

Efficient Filtering of XML Documents with XPath Expression
Authors: Chee-Yong Chan, Pascal Felber, Minos Garofalakis, Rajeev Rastogi
Bell Laboratories, Lucent Technologies
{cychan,pascal,minos,rastogi}@research.bell-labs.com
Speaker: Lam-Son LE (LamSon.Le@epfl.ch), EPFL, I&C Doctoral School, WS 2002/2003
Distributed Information Processing
Page 1

Outline
• Introduction
  – publish/subscribe systems, “bags of words” vs. the XPath language
• Background
  – XPE-tree, unordered/ordered matching
• XPE Decompositions and Matchings
  – substring/minimal/simple decomposition, substring-tree
• The XTrie Indexing Scheme
  – substring table, trie
  – matching algorithm
• Evaluation
  – comparison with XFilter
Page 2

Introduction
• Selective data dissemination
  – publishers selectively deliver data to subscribers
• Simple matching scheme: “bags of words”
• Emergence of XML data
  – XPath as the filter-specification language; XPE = XPath Expression
  – Retrieval problem: given a collection P of XPEs and an input XML document D, find the subset of XPEs in P that match D.
• XTrie is based on XPath expressions
• XTrie efficiently filters XML documents
  – it indexes a set of substrings rather than individual elements
  – it supports both ordered and unordered matching
Page 3

Background (1/3)
• XML documents as trees
  – a root element; subelements can be nested to any depth
  – level(root) = 1, and level(d) = level(d′) + 1 if d′ is the parent of d
• XPath expressions (XPEs)
  – “/”: parent/child operator
  – “//”: ancestor/descendant operator
  – “*”: wildcard operator
  – “[”, “]”: delimit a predicate
  – example: p = //a//b[*/c]/d
  – two kinds of patterns: path patterns, and tree patterns (XPEs with predicates)
Page 4
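The level definition can be checked with a minimal Python sketch built on the standard-library xml.etree.ElementTree module; `element_levels` is a hypothetical helper of our own, not part of XTrie.

```python
import xml.etree.ElementTree as ET

def element_levels(elem, level=1):
    """Yield (tag, level) pairs in document order: level(root) = 1 and
    every child sits exactly one level below its parent."""
    yield elem.tag, level
    for child in elem:
        yield from element_levels(child, level + 1)

doc = ET.fromstring("<b><a><b/><c><d/></c></a></b>")
print(list(element_levels(doc)))
# [('b', 1), ('a', 2), ('b', 3), ('c', 3), ('d', 4)]
```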

Background (2/3)
• XPE-tree
  – predicate expressions give rise to branches of the tree
  – an XPE-tree can be ordered if the elements in the XPE are supposed to be ordered
  – relative level of a node t_i in the XPE-tree, i.e., the allowed level difference between t_i and its parent node:
    • relLevel(t_i) = [k, ∞] (a range) if t_i is prefixed with “//” followed by (k−1) “*”
    • relLevel(t_i) = [k, k] (a precise value) if t_i is prefixed with “/” followed by (k−1) “*”
Page 5
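A short Python sketch of the relLevel rule for predicate-free XPEs. `rel_levels` is hypothetical; wildcard steps are folded into the next named step, and a prefix that mixes “/” and “//” (which the two rules above do not cover) is resolved by our own assumption that any “//” makes the upper bound ∞.

```python
import math
import re

def rel_levels(path):
    """Return (name, (lo, hi)) for each named step of a predicate-free XPE.
    lo counts the steps since the previous named node (wildcards included);
    hi is infinite if any of those steps uses '//' (assumed for mixed cases)."""
    out, steps, unbounded = [], 0, False
    for axis, name in re.findall(r"(//|/)([^/]+)", path):
        steps += 1
        unbounded = unbounded or axis == "//"
        if name != "*":
            out.append((name, (steps, math.inf if unbounded else steps)))
            steps, unbounded = 0, False
    return out

print(rel_levels("//a//b/d"))  # main path of p = //a//b[*/c]/d
# [('a', (1, inf)), ('b', (1, inf)), ('d', (1, 1))]
print(rel_levels("/*/c"))      # predicate [*/c], taken relative to b
# [('c', (2, 2))]
```

These values agree with the relative levels annotated on the XPE-tree of the matching example: //a [1, ∞], //b [1, ∞], /*/c [2, 2], /d [1, 1].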

Background (3/3)
• Unordered matching
  – each XPE-tree node is matched by a document node with the same name
  – the level differences between matched nodes satisfy the relative levels
• Ordered matching is stronger: the order of the elements in the XPE-tree is also taken into account
• Matching example
  – p = //a//b[*/c]/d
  – {a2, b4, c6, d7} is an ordered matching of D to p
[Figure: XML tree D with nodes b1, a2, b3, b4, e5, c6, d7, f8, c9, b10, and XPE-tree T for p with nodes //a [1, ∞], //b [1, ∞], /*/c [2, 2], /d [1, 1]]
Page 6
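A brute-force Python sketch of unordered matching only (ordered matching would additionally constrain the order of siblings). The tuple encoding of the XPE-tree, the helper names, and the XML string for D (our own reading of the tree in the figure, numbered in pre-order) are all assumptions.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical encoding of the XPE-tree for p = //a//b[*/c]/d:
# (element name, (lo, hi) relative level, children)
P = ("a", (1, math.inf), [
        ("b", (1, math.inf), [
            ("c", (2, 2), []),
            ("d", (1, 1), []),
        ]),
    ])

# Our reading of the example tree D (b1 ... b10 in pre-order).
DOC = "<b><a><b><b><e><c/></e><d/></b><f><c/></f></b><b/></a></b>"

def descendants(elem, dist=0):
    """Yield (descendant, level difference) for every node below elem."""
    for child in elem:
        yield child, dist + 1
        yield from descendants(child, dist + 1)

def binds(xnode, elem):
    """Can xnode be bound to elem with its whole XPE-subtree matched below?"""
    name, _, kids = xnode
    if elem.tag != name:
        return False
    for kid in kids:
        lo, hi = kid[1]
        if not any(lo <= d <= hi and binds(kid, e) for e, d in descendants(elem)):
            return False
    return True

def unordered_match(xroot, doc_root):
    """Names agree and every level difference lies within its relLevel."""
    lo, hi = xroot[1]
    # The document root is at level 1, one level below a virtual root at 0.
    candidates = [(doc_root, 1)] + [(e, d + 1) for e, d in descendants(doc_root)]
    return any(lo <= d <= hi and binds(xroot, e) for e, d in candidates)

print(unordered_match(P, ET.fromstring(DOC)))  # True
```

The search is exponential in the worst case; it only illustrates the definition, not how XTrie evaluates it.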

XPE Decompositions (1/3)
• Substring of an XPE
  – a concatenation of the names of nodes connected by “/”
  – example: p = /a/b[c/d//e][g//e/f]//*/*/e/f. Possible substrings: abg, bcd, ef, b
• Substring decomposition: a set of substrings that covers all nodes in the XPE-tree
• Minimal decomposition: no substring is a prefix of another
  – advantage: substrings are as long as possible, so each one is less likely to occur in a document and be matched
Page 7

XPE Decompositions (2/3)
• Simple decomposition: the minimal decomposition plus, for each branching node, a substring ending at that node
• Substring-tree: its nodes are the substrings of the simple decomposition; one substring is the parent of another if
  – it is a prefix of the child, or
  – the last element of the parent substring is the parent node (in the XPE-tree) of the first element of the child substring
• Relative levels are extended to substrings
  – computed from the relative levels of the elements that lie between the last element of the parent substring and the last element of the given substring
Page 8

XPE Decompositions (3/3)
[Figure panels: simple decomposition, minimal decomposition, and substring-tree (nodes ab, abcd, abg, e, ef, ef)]
• Example for p = /a/b[c/d//e][g//e/f]//*/*/e/f
Page 9
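The decompositions for this example can be reproduced by a small Python sketch under our own reading of the definitions: the XPE-tree is given as its root-to-leaf paths, each node is identified by its path prefix, substrings are maximal “/”-connected runs of non-wildcard names, and the substring added for a branching node is the run prefix ending at that node (a branching wildcard node is simply skipped). All helper names are hypothetical.

```python
import re

# Root-to-leaf paths of p = /a/b[c/d//e][g//e/f]//*/*/e/f, written out by hand.
PATHS = ["/a/b/c/d//e", "/a/b/g//e/f", "/a/b//*/*/e/f"]

def steps(path):
    return re.findall(r"(//|/)([^/]+)", path)

def node_prefixes(path):
    """Identify each XPE-tree node on a path by its path prefix, e.g. '/a/b'."""
    out, prefix = [], ""
    for axis, name in steps(path):
        prefix += axis + name
        out.append(prefix)
    return out

def runs_of_path(path):
    """Maximal runs of named nodes joined by '/'; '//' and '*' break a run."""
    runs, current, after_gap = [], [], True
    for (axis, name), node in zip(steps(path), node_prefixes(path)):
        if name == "*" or axis == "//" or after_gap:
            if current:
                runs.append(tuple(current))
            current = []
        after_gap = name == "*"
        if not after_gap:
            current.append(node)
    if current:
        runs.append(tuple(current))
    return runs

def minimal_decomposition(paths):
    runs = {r for p in paths for r in runs_of_path(p)}
    # Drop every run that is a proper prefix of another run.
    return {r for r in runs
            if not any(len(o) > len(r) and o[:len(r)] == r for o in runs)}

def simple_decomposition(paths):
    result = set(minimal_decomposition(paths))
    children = {}
    for p in paths:
        nodes = node_prefixes(p)
        for parent, child in zip(nodes, nodes[1:]):
            children.setdefault(parent, set()).add(child)
    for node, kids in children.items():
        if len(kids) > 1:  # branching node: add the run prefix ending at it
            for run in list(result):
                if node in run:
                    result.add(run[:run.index(node) + 1])
                    break
    return result

def label(run):
    return "".join(node.rsplit("/", 1)[-1] for node in run)

print(sorted(label(r) for r in minimal_decomposition(PATHS)))
# ['abcd', 'abg', 'e', 'ef', 'ef']
print(sorted(label(r) for r in simple_decomposition(PATHS)))
# ['ab', 'abcd', 'abg', 'e', 'ef', 'ef']
```

Under these assumptions the simple decomposition yields exactly the six substring labels shown in the substring-tree panel.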

Matching with Substrings (1/2)
• A substring matches a node in the XML document if its last element matches that node (and its preceding elements match that node's nearest ancestors)
• XML documents are typically parsed in pre-order (e.g., by a SAX parser), so substrings are also ordered by a pre-order traversal of the substring-tree
• Partial matching: matchings for all consecutive substrings from the first one up to the given substring
• Complete matching: a partial matching for the final substring
• Subtree-matching: partial matchings found for all descendants of the given substring
• Redundant matching: a subtree-matching already found at some earlier node in the XML document
Page 10

Matching with Substrings (2/2)
• Again, p = //a//b[*/c]/d
  – s1 = a, s2 = b, s3 = c, s4 = bd
  – the matchings at c9 and b10 are redundant
[Figure: substring-tree s1 = a [1, ∞] → s2 = b [1, ∞] → {s3 = c [2, 2], s4 = bd [1, 1]}, and XML tree D (nodes b1 … b10) annotated with the substrings matched there]
Page 11
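A naive Python sketch, using the standard-library xml.sax module, that reports where each substring of the example matches while D is parsed in pre-order: a substring matches when the element just opened and its nearest ancestors spell it out. The XML string for D is our own reading of the figure, and comparing every substring against the current path is exactly the work the XTrie index is meant to avoid; `SubstringSpotter` is hypothetical.

```python
import xml.sax

# Substrings s1..s4 of p = //a//b[*/c]/d; each character is one element name.
SUBSTRINGS = {"s1": "a", "s2": "b", "s3": "c", "s4": "bd"}
# Our reading of the example tree D (b1 ... b10 in pre-order).
DOC = "<b><a><b><b><e><c/></e><d/></b><f><c/></f></b><b/></a></b>"

class SubstringSpotter(xml.sax.ContentHandler):
    """Records (substring id, pre-order position, level) for every match."""

    def __init__(self, substrings):
        super().__init__()
        self.substrings = substrings
        self.path = []      # element names from the root to the open element
        self.position = 0   # pre-order number of the element just opened
        self.hits = []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.position += 1
        for sid, sub in self.substrings.items():
            if self.path[-len(sub):] == list(sub):
                self.hits.append((sid, self.position, len(self.path)))

    def endElement(self, name):
        self.path.pop()

spotter = SubstringSpotter(SUBSTRINGS)
xml.sax.parseString(DOC.encode(), spotter)
for hit in spotter.hits:
    print(hit)
```

On this reading of D, c is found at c6 and c9, b at b1, b3, b4 and b10, and bd at d7; deciding which of these matchings are redundant is left to the matching algorithm.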

XTrie Indexing Scheme (1/2)
• The XTrie index is built for a set of XPEs:
  – derive the simple decomposition of every XPE
  – associate each substring with its relative level
• It consists of two data structures:
  – Trie T: a tree whose edges are labeled with element names from the XML documents
  – Substring table ST: each row represents one substring
Page 12

XTrie Indexing Scheme (2/2)
[Figure: trie T for Example 2, with edges labeled by the element names a, b, c, d, e]
Substring table ST for Example 2 (0 in “Parent row” or “Next row” means none):
Row  Substring  Parent row  Relative level  Rank  Number of children  Next row
1    aabc       0           [4, ∞]          1     1                   0
2    ab         1           [3, 3]          1     0                   3
3    ab         0           [2, 2]          1     2                   6
4    abce       3           [2, 2]          1     0                   0
5    bcd        3           [4, 4]          2     0                   0
6    ab         0           [2, 2]          1     2                   0
7    abc        6           [1, 1]          1     1                   0
8    d          7           [2, 2]          1     0                   12
9    bc         6           [2, ∞]          2     0                   0
10   cb         0           [2, ∞]          1     1                   0
11   cd         10          [2, ∞]          1     1                   0
12   d          11          [3, 3]          1     0                   0
Example 2:
  p1 = //a/a/b/c/*/a/b
  p2 = /a/b[c/e]/*/b/c/d
  p3 = /a/b[c/*/d]//b/c
  p4 = //c/b//c/d/*/*/d
Page 13
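A Python sketch of a trie over the substrings of Example 2, in which each trie node lists the ST rows whose substring ends there. The slides do not say how T is traversed during parsing, so this sketch simply re-walks the trie from every suffix of the current root-to-element path (quadratic in document depth); it also treats each character as one element name. Class and function names are hypothetical.

```python
# Substring column of ST for Example 2, keyed by row; each character
# stands for one element name (a simplifying assumption).
ST_SUBSTRINGS = {1: "aabc", 2: "ab", 3: "ab", 4: "abce", 5: "bcd", 6: "ab",
                 7: "abc", 8: "d", 9: "bc", 10: "cb", 11: "cd", 12: "d"}

class TrieNode:
    def __init__(self):
        self.children = {}  # element name -> TrieNode
        self.rows = []      # ST rows whose substring ends at this node

def build_trie(substrings):
    root = TrieNode()
    for row, sub in substrings.items():
        node = root
        for name in sub:
            node = node.children.setdefault(name, TrieNode())
        node.rows.append(row)
    return root

def rows_ending_at(root, path):
    """ST rows whose substring is a suffix of `path`, the element names on the
    root-to-current-element path. Naive: one trie walk per suffix."""
    found = []
    for start in range(len(path)):
        node = root
        for name in path[start:]:
            node = node.children.get(name)
            if node is None:
                break
        else:  # the whole suffix is spelled out in the trie
            found.extend(node.rows)
    return sorted(found)

trie = build_trie(ST_SUBSTRINGS)
print(rows_ending_at(trie, "xaabc"))  # [1, 7, 9]
```

Rows sharing a substring (2, 3 and 6 for ab; 8 and 12 for d) end at the same trie node, which is what the Next row chain in ST links together.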

XTrie Matching Algorithm (1/2)
• Based on SAX, to get notified whenever an element name is parsed
• Requires an additional two-dimensional array B of size <number of rows in ST> × <maximum level of the XML document>
• B[s, l] is
  – initialized to 0 at the beginning
  – incremented by 1 when a non-redundant matching of s at level l is found
  – reset to 0 when the end tag at level l is parsed
• An XPE p matches the XML document if B[rs, l] = m + 1 for some level l, where
  – rs is the root substring in the substring-tree of p
  – m is the number of child substrings of rs
Page 14

XTrie Matching Algorithm (2/2)
Again, p = //a//b[*/c]/d
[Figure: trie T with edges labeled a, b, c, d, and XML tree D (nodes b1 … b10)]
Substring table ST:
Row  Substring  Parent row  Relative level  Rank  Number of children  Next row
1    a          0           [1, ∞]          1     1                   0
2    b          1           [1, ∞]          1     2                   0
3    c          2           [2, 2]          1     0                   0
4    bd         2           [1, 1]          2     0                   0
Page 15
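A Python sketch of the counter scheme for this example, using the standard-library xml.sax module. What the slides state is reproduced directly: B[s, l] counters, their reset at end tags, and the B[rs, l] = m + 1 test. Everything else is our own assumption: a child substring credits its parent match only when the parent's counter equals the child's rank (our reading of ordered matching), the `links` table remembering which parent levels a match extends, and the naive suffix test used instead of a trie walk. It covers ordered matching only, and all names are hypothetical.

```python
import math
import xml.sax
from collections import defaultdict

# ST for p = //a//b[*/c]/d: row -> (substring, parent row (0 = none),
# (lo, hi) relative level, rank, number of children)
ST = {
    1: ("a", 0, (1, math.inf), 1, 1),
    2: ("b", 1, (1, math.inf), 1, 2),
    3: ("c", 2, (2, 2), 1, 0),
    4: ("bd", 2, (1, 1), 2, 0),
}
# Our reading of the example tree D (b1 ... b10 in pre-order).
DOC = "<b><a><b><b><e><c/></e><d/></b><f><c/></f></b><b/></a></b>"

class XTrieSketch(xml.sax.ContentHandler):
    def __init__(self, table):
        super().__init__()
        self.st = table
        self.B = defaultdict(int)       # (row, level) -> 1 + completed children
        self.links = defaultdict(list)  # (row, level) -> parent levels it extends
        self.path = []                  # element names from the root down
        self.matched_roots = set()

    def startElement(self, name, attrs):
        self.path.append(name)
        level = len(self.path)
        for row, (sub, parent, (lo, hi), rank, _) in sorted(self.st.items()):
            if self.path[-len(sub):] != list(sub):
                continue  # substring does not end at this element
            if parent == 0:  # root substring: level measured from level 0
                if lo <= level <= hi:
                    self.B[row, level] = 1
                    self._complete_if_done(row, level)
                continue
            for plevel in range(1, level):
                # Assumed ordered rule: rank - 1 earlier siblings already done.
                if lo <= level - plevel <= hi and self.B[parent, plevel] == rank:
                    self.links[row, level].append(plevel)
            if self.links[row, level]:
                self.B[row, level] = 1
                self._complete_if_done(row, level)

    def _complete_if_done(self, row, level):
        """On a subtree-matching of row at level, credit its parent match(es)."""
        _, parent, _, rank, n_children = self.st[row]
        if self.B[row, level] != n_children + 1:
            return
        if parent == 0:
            self.matched_roots.add(row)  # B[rs, l] = m + 1: the XPE matches
            return
        for plevel in self.links[row, level]:
            if self.B[parent, plevel] == rank:
                self.B[parent, plevel] += 1
                self._complete_if_done(parent, plevel)

    def endElement(self, name):
        level = len(self.path)
        # Reset every counter and link recorded at the level being closed.
        for table in (self.B, self.links):
            for key in [k for k in table if k[1] == level]:
                del table[key]
        self.path.pop()

def matching_roots(table, xml_text):
    handler = XTrieSketch(table)
    xml.sax.parseString(xml_text.encode(), handler)
    return handler.matched_roots

print(matching_roots(ST, DOC))  # {1}
```

On this reading of D the match completes at d7: B[bd, 5] credits B[b, 4], which reaches 3 = 2 + 1 and in turn raises B[a, 2] to 2 = 1 + 1. A document where d precedes the */c branch is rejected, since the rank check enforces sibling order.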

Evaluation
[Figure: filtering time (ms), two panels: varying P (L = 20, p_w = 0.1, p_d = 0.1, p_b = 0) and varying document length (P = 100k, L = 20, p_w = 0.1, p_d = 0.1, p_b = 0)]
• Compared with XFilter (which uses a hash table on single element names)
Page 16

Thank you! Questions? Page 17
