XPath Satisfiability with Parent Axes or Qualifiers Is Tractable under Many of Real-World DTDs
Yasunori Ishihara (Osaka University)
Nobutaka Suzuki (University of Tsukuba)
Kenji Hashimoto (NAIST)
Shogo Shimizu (Gakushuin Women's College)
Toru Fujiwara (Osaka University)
XPath satisfiability
• Input: XPath expression p, DTD D
• Output: Is there an XML document T such that
  – T conforms to D, and
  – p returns a nonempty set for T?
• Research on XPath satisfiability is motivated by query optimization
  – Unsatisfiable (parts of) XPath expressions can be replaced with the empty set
XPath expression
• Atomic expression: "axis::label"
  – ↓ (child axis)
  – ↓* (descendant-or-self axis)
  – ↑ (parent axis)
  – ↑* (ancestor-or-self axis)
  – →+ (following-sibling axis)
  – ←+ (preceding-sibling axis)
• Path constructors:
  – / (path concatenation)
  – ∪ (path union)
  – [ ] (qualifier, possibly with ∧ and ∨)
• No negation operators
• Example:
  – Ordinary notation: /a[b]//c
  – Our notation: (↓::a[↓::b])/↓*::c
[Figure: document tree with root, <a>, <b>, <c>]
Document type definition (DTD)
• A DTD
  – specifies a set of XML documents
  – is naturally modeled by a tree grammar
• Each production rule specifies, for a label, a set of sequences of its children by a regular expression (the content model)
  PC -> Name Manager (Manager | Guest)*
[Figure: example trees rooted at <PC> with children <Name>, <Manager>, <Guest>]
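The production rule on this slide can be tried out directly. The following is a minimal sketch, assuming a content model written with the operators `|`, `*`, `+`, `?` and parentheses; the helper names `content_model_to_regex` and `conforms` are hypothetical and not part of the talk. It translates the content model into a Python regular expression and tests one sequence of child labels against it.

```python
import re

# Tokens of a content model: label names or the operators ( ) | * + ?
TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[()|*+?]")

def content_model_to_regex(model):
    """Translate a content model over labels into a compiled Python regex.

    Every label becomes a group matching the label followed by a comma,
    so that adjacent labels cannot run together.
    """
    parts = []
    for tok in TOKEN.findall(model):
        if tok in "()|*+?":
            parts.append(tok)          # operators are kept as they are
        else:
            parts.append("(?:" + re.escape(tok) + ",)")
    return re.compile("".join(parts))

def conforms(children, model):
    """Return True if the sequence of child labels matches the content model."""
    text = "".join(label + "," for label in children)
    return content_model_to_regex(model).fullmatch(text) is not None

RULE = "Name Manager (Manager | Guest)*"
print(conforms(["Name", "Manager", "Guest", "Manager"], RULE))
print(conforms(["Name", "Guest"], RULE))
```

The sketch checks a single node's children only; checking a whole document would apply the rule of each label to every node with that label.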
Difficulty in XPath satisfiability
• XPath satisfiability under arbitrary DTDs is in P only for a very small subclass of XPath [BFG05,BFG08,GF05]
  – (↓, ↓*, ∪)
• Analyzing non-cooccurrence of sibling labels is difficult
  – non-cooccurrence is specified by disjunctions
• Example:
  – XPath exp.: ↓*::r[↓::a][↓::b][↓::c][↓::d][↓::e]
  – DTD: r -> (ad | be)(b | ace)(ae | cd)
  – φ = (x1 ∨ ¬x2 ∨ x3) ∧ (¬x1 ∨ x2) ∧ (¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3) ∧ (¬x1 ∨ ¬x2 ∨ x3)
    where xi is true when the first alternative of the i-th parenthesized group is chosen, and the five clauses correspond to the labels a, b, c, d, e in this order
[Figure: node <r> with children <a>, <b>, <c>, <d>, <e>; the alternatives of the content model are labeled with the literals of x1, x2, x3]
Related work & our purpose
• Two approaches:
  – Tackling the intractability of XPath satisfiability itself [GL06,GL07,GLS07]
    • XPath expressions and DTDs are translated into formulas in monadic second-order (MSO) logic or in a variant of μ-calculus
    • Satisfiability is verified by fast decision procedures for MSO or μ-calculus formulas
  – Finding subclasses of DTDs such that satisfiability of a larger XPath class becomes tractable
DTD classes restricting disjunctions (1)
• Disjunction-free DTD [BFG05,BFG08,GF05]
  – No content model contains disjunction operators of regular expressions
    • non-cooccurrence of labels cannot be specified
  – Tractable XPath classes [IMSHF09]:
    • (↓, ↓*, →+, ←+, ∪, [ ])
    • (↓, ↓*, ↑, ↑*, →+, ←+, ∪)
  – Disjunction-freeness is too restrictive from the practical point of view
DTD classes restricting disjunctions (2)
• Disjunction-capsuled DTD (DC-DTD) [IMSHF09], DC?+#-DTD [IHSF12]
  – Regular expression operators: ·, |, *, ?, +, #   (a # b = a | b | ab)
  – Every disjunction is in the scope of * or +
    • non-cooccurrence cannot be specified
  PC -> Name? Manager (Manager | Guest)*
  PC -> (Name | IP)? Manager (Manager | Guest)*
  – disjunction-free ⊂ DC ⊂ DC?+#
  – All tractability results of disjunction-free DTDs are inherited by DC?+#-DTDs
    • as long as the XPath class is within our formulation
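The condition "every disjunction is in the scope of * or +" is purely syntactic, so it can be checked by scanning the content model. The sketch below is an illustration under simplifying assumptions: it handles only the operators `|`, `*`, `+`, `?` and parentheses, it ignores `#`, and the function name `is_disjunction_capsuled` is hypothetical.

```python
import re

TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[()|*+?]")

def is_disjunction_capsuled(model):
    """Return True if every '|' lies inside a parenthesized group
    that is directly followed by '*' or '+'."""
    tokens = TOKEN.findall(model)
    stack = []      # positions of the currently open "("
    groups = []     # (open position, close position, repeated by * or +)
    for i, tok in enumerate(tokens):
        if tok == "(":
            stack.append(i)
        elif tok == ")":
            start = stack.pop()
            following = tokens[i + 1] if i + 1 < len(tokens) else ""
            groups.append((start, i, following in ("*", "+")))
    for i, tok in enumerate(tokens):
        if tok == "|":
            # The disjunction needs at least one enclosing repeated group.
            if not any(s < i < e and rep for s, e, rep in groups):
                return False
    return True

print(is_disjunction_capsuled("Name? Manager (Manager | Guest)*"))
print(is_disjunction_capsuled("(Name | IP) Manager (Manager | Guest)*"))
```

The second rule fails the check because the disjunction (Name | IP) is not under any * or +, which is exactly the part that can express non-cooccurrence.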
DTD class restricting non-cooccurrence
• Duplicate-free DTD (DF-DTD) [MWM07]
  – Regular expression operators: ·, |, *, ?, +
  – Each label appears at most once in a content model
    • Non-cooccurrence of sibling labels exists but can be easily analyzed
  PC -> (Name | IP)(Manager | Guest)*
  PC -> (Name | IP) Manager (Manager | Guest)*
  – Tractable XPath classes:
    • (↓, ∧) [MWM07]   (∧: qualifier with only ∧)
    • (↓, ↑, →+, ←+) [SF09]
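Duplicate-freeness only requires counting label occurrences in a content model. A minimal sketch, with hypothetical helper names `duplicated_labels` and `is_duplicate_free`, applied to the two rules on this slide:

```python
import re
from collections import Counter

def duplicated_labels(model):
    """Return the labels that occur more than once in the content model."""
    counts = Counter(re.findall(r"[A-Za-z_][\w.-]*", model))
    return sorted(label for label, n in counts.items() if n > 1)

def is_duplicate_free(model):
    """A content model is duplicate-free if no label occurs twice."""
    return not duplicated_labels(model)

print(duplicated_labels("(Name | IP)(Manager | Guest)*"))           # prints []
print(duplicated_labels("(Name | IP) Manager (Manager | Guest)*"))  # prints ['Manager']
```

The second rule is not duplicate-free because Manager occurs twice.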
Hybridizing DF-DTDs and DC?+#-DTDs
• RW-DTDs [IHSF12]
  PC -> (Name | IP) Manager (Manager | Guest)*   (the rule is annotated with its DC?+# and DF parts)
  – 26 out of 27 real-world DTDs are RW-DTDs
  – 1406 out of 1407 real-world DTD rules are covered
  – RW-DTDs are expected to have the same tractability as DF-DTDs
    • only DF parts can specify non-cooccurrence
Hybridizing the two DTD classes
• Complexity of XPath satisfiability per DTD class. Each row is one XPath class, selected by + marks under the operator columns →+, ←+, ∪, ↓, ↓*, ↑, ↑*, ∧, [ ] (the positions of the + marks are not recoverable here)

  Row | any | RW  | DF | DC?+#
  1   | P   | P   | P  | P
  2   | NPC | P   | P  | P
  3   | NPC | NPC | P  | P
  4   | NPC | NPC | P  | P

• The table carries the annotation "gap"
• ∧: qualifier with only ∧; NPC: NP-complete
Contribution of this work
• MRW-DTDs:
  – 24 out of 27 real-world DTDs are MRW-DTDs
  – 1403 out of 1407 real-world DTD rules are covered
[Figure: inclusion diagram of the classes RW, MRW, DC?+#, and DF, with × marks labeled Music ML, Ecoknowmics, and XHTML1-strict]
Contribution of this work (complexity table)
[Table: complexity of XPath satisfiability (P or NPC) for XPath classes over ↓, ↓*, ↑, ↑*, →+, ←+, ∪, ∧, [ ] under DC?+#-, RW-, MRW-, and DF-DTDs; the cell layout is not recoverable]
• ∧: qualifier with only ∧; NPC: NP-complete
Outline
• Results on RW-DTDs [IHSF12]
• MRW-DTDs and their tractability results
• Conclusion
RW-DTDs [IHSF12]
• Hybridization of DF-DTDs and DC?+#-DTDs
  PC -> (Name | IP) Manager (Manager | Guest)*   (the rule is annotated with its DC?+# and DF parts)
  – 26 out of 27 real-world DTDs are RW-DTDs
  – 1406 out of 1407 real-world DTD rules are covered
[Table: the same complexity table as on the slide "Hybridizing the two DTD classes", with the annotation "gap"; ∧: qualifier with only ∧; NPC: NP-complete]
Satisfiability checking algorithm for (↓, ↓*, →+, ←+) under RW-DTDs
1. DTD transformation
   RW:    PC -> (Name | IP) Manager (Manager | Guest)*
   DC?+#: PC -> Name□ IP□ Manager (Manager | Guest)*   (□: symbol lost in extraction)
2. Approximate satisfiability checking
   – Run the known, efficient algorithm for DC?+#-DTDs
   – The algorithm may answer “satisfiable” mistakenly
3. Consistency checking
   – Check whether p is consistent with the non-cooccurrence of labels specified by the original RW-DTD
   – p is unsatisfiable if p says “Name and IP are siblings”
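The idea behind the consistency check in step 3 can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it assumes the labels that p requires as siblings under one node have already been collected, and that each disjunction outside * and + has single-label alternatives. The names `is_consistent` and `EXCLUSIVE_GROUPS` are hypothetical.

```python
def is_consistent(required_siblings, exclusive_groups):
    """Return False if the required sibling labels include two labels
    from the same group of mutually exclusive alternatives."""
    for group in exclusive_groups:
        if len(required_siblings & group) > 1:
            return False
    return True

# For PC -> (Name | IP) Manager (Manager | Guest)*, written down by hand:
# Name and IP are alternatives of a disjunction outside * and +.
EXCLUSIVE_GROUPS = [{"Name", "IP"}]

print(is_consistent({"Name", "IP"}, EXCLUSIVE_GROUPS))                # prints False
print(is_consistent({"Name", "Manager", "Guest"}, EXCLUSIVE_GROUPS))  # prints True
```

Manager and Guest never conflict here because their disjunction is under *, so both may occur among the children of one PC node.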
Difficulty for (↓, ↑) and (↓, ∧) under RW-DTDs
• A label may occur some bounded, plural number of times
  PC -> (Name | IP) Manager□ Manager□ Guest*   (□: symbol lost in extraction)
  – (↓, ↓*, →+, ←+): goes down only
  – (↓, ↑), (↓, ∧): goes down and up many times
    • p checks Manager's children many times
• At the consistency checking step, we have to decide nondeterministically: “Which Manager should we go to?”
[Figure: node PC with children Manager and Guest]
Outline
• Results on RW-DTDs [IHSF12]
• MRW-DTDs and their tractability results
• Conclusion
MRW-DTDs
• RW-DTDs with the following restriction:
  – if label a occurs outside the scope of any * and +, then a appears only once in the content model
  PC -> (Name | IP) Manager□ Guest*   (□: symbol lost in extraction)
  PC -> (Name | IP) Manager□ Manager□ Guest*
  PC -> (Name | IP) Manager+ (Manager | Guest)*
  PC -> (Name | IP) Manager (Manager | Guest)*
• Each label appears “at most once” or “unboundedly many times” in a content model
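The restriction as worded on this slide can be checked syntactically. The sketch below tests only that restriction, not membership in RW-DTDs, and it assumes content models written with `|`, `*`, `+`, `?` and parentheses; the function name `mrw_violations` is hypothetical. It is applied to the last two rules of the slide, which contain no unrecoverable symbols.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[()|*+?]")

def mrw_violations(model):
    """Return the labels that occur outside the scope of any * and +
    and nevertheless appear more than once in the content model."""
    tokens = TOKEN.findall(model)

    def repeated(i):
        # True if token i is directly followed by * or +
        return i + 1 < len(tokens) and tokens[i + 1] in ("*", "+")

    stack, spans = [], []   # spans: parenthesized groups under * or +
    for i, tok in enumerate(tokens):
        if tok == "(":
            stack.append(i)
        elif tok == ")":
            start = stack.pop()
            if repeated(i):
                spans.append((start, i))

    label_positions = [i for i, t in enumerate(tokens) if t not in "()|*+?"]
    counts = Counter(tokens[i] for i in label_positions)
    outside = {
        tokens[i]
        for i in label_positions
        if not repeated(i) and not any(s < i < e for s, e in spans)
    }
    return sorted(label for label in outside if counts[label] > 1)

print(mrw_violations("(Name | IP) Manager+ (Manager | Guest)*"))  # prints []
print(mrw_violations("(Name | IP) Manager (Manager | Guest)*"))   # prints ['Manager']
```

In the second rule the first Manager is outside every * and +, yet Manager occurs again inside the starred group, so the restriction as stated is violated.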
Satisfiability checking algorithm under MRW-DTDs
1. DTD transformation (MRW -> DC?+#)
2. Approximate satisfiability checking
3. Consistency checking
   • Check if p is consistent with the non-cooccurrence of sibling labels specified by the original MRW-DTD
     – Maintain sibling information of all the nodes that may be revisited during the traversal by p
     – MRW-DTDs: the nodes to be revisited are always revisited (slide annotation: "always avoidable")
   • Each label appears “at most once” or “unboundedly many times” in a content model
Satisfiability check for (↓, ↑, →+, ←+)
• Always-revisited nodes:
  – nodes with labels outside the scope of any * and +
  – ancestor nodes of the current node
    • due to ↑
• XPath expressions are non-branching
  – due to absence of ∧
[Figure: an XPath expression using ↓, ↑, and sibling axes is traversed over a DTD while sibling information is maintained; the labels of the example are not recoverable]