XPath Satisfiability with Parent Axes or Qualifiers Is Tractable under Many of Real-World DTDs
Yasunori Ishihara (Osaka University)
Nobutaka Suzuki (University of Tsukuba)
Kenji Hashimoto (NAIST)
Shogo Shimizu (Gakushuin Women's College)
Toru Fujiwara (Osaka University)
XPath satisfiability
• Input: XPath expression p, DTD D
• Output: Is there an XML document T such that
  - T conforms to D, and
  - p returns a nonempty set for T?
• Research on XPath satisfiability is motivated by query optimization
  - Unsatisfiable (parts of) XPath expressions can be replaced with the empty set
XPath expression
• Atomic expression: "axis::label"
  - ↓ (child axis)
  - ↓* (descendant-or-self axis)
  - ↑ (parent axis)
  - ↑* (ancestor-or-self axis)
  - →+ (following-sibling axis)
  - ←+ (preceding-sibling axis)
• Path constructors:
  - / (path concatenation)
  - ∪ (path union)
  - [ ] (qualifier, possibly with ∧ and ∨)
• No negation operators
• Example:
  - Ordinary notation: /a[b]//c
  - Our notation: (↓::a[↓::b])/↓*::c
  [Figure: document tree with nodes root, <a>, <b>, <c>]
Document type definition (DTD)
• A DTD
  - specifies a set of XML documents
  - is naturally modeled by a tree grammar
• Each production rule specifies, for a label, a set of sequences of its children by a regular expression (the content model)
  PC -> Name Manager (Manager | Guest)*
  [Figure: two example <PC> trees with children labeled <Name>, <Manager>, and <Guest>]
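The conformance condition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the tree encoding `(label, [children])`, the dict-based DTD, and the helper names `compile_model` and `conforms` are assumptions made for the example. Content models use the slide's notation (juxtaposition for sequence), and labels that have no rule are assumed to have no element children.

```python
import re

def compile_model(model):
    """Turn a content model over labels into a regex over 'Label,' tokens.

    Assumes the slide notation: juxtaposition for sequence and the
    operators | * ? + with parentheses (no commas in the model).
    """
    tokenised = re.sub(r"[A-Za-z_]\w*", lambda m: f"(?:{m.group(0)},)", model)
    return re.compile(tokenised.replace(" ", ""))

def conforms(node, dtd):
    """Check that every node's child-label sequence matches its content model."""
    label, children = node
    model = dtd.get(label, "")  # assumption: unlisted labels have no children
    word = "".join(child[0] + "," for child in children)
    if compile_model(model).fullmatch(word) is None:
        return False
    return all(conforms(child, dtd) for child in children)

# Hypothetical DTD with the single rule shown on the slide.
DTD = {"PC": "Name Manager (Manager | Guest)*"}

doc_ok = ("PC", [("Name", []), ("Manager", []), ("Manager", []), ("Guest", [])])
doc_bad = ("PC", [("Name", []), ("Guest", [])])  # the mandatory Manager is missing

print(conforms(doc_ok, DTD), conforms(doc_bad, DTD))
```

Terminating every label with a comma keeps the token boundaries unambiguous, so a label such as `Name` cannot match the tail of a longer label.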
Difficulty in XPath satisfiability
• XPath satisfiability under arbitrary DTDs is in P for a very small subclass of XPath [BFG05,BFG08,GF05]
  - (↓, ↓*, ∪)
• Analyzing non-cooccurrence of sibling labels is difficult
  - non-cooccurrence is specified by disjunctions
  XPath exp.: ↓*::r[↓::a][↓::b][↓::c][↓::d][↓::e]
  DTD: r -> (ad | be)(b | ace)(ae | cd)
  Let x_i be true iff the first alternative of the i-th factor is chosen. The expression is satisfiable iff the following formula is satisfiable, with one clause per label a, b, c, d, e in that order:
  $\varphi = (x_1 \lor \lnot x_2 \lor x_3) \land (\lnot x_1 \lor x_2) \land (\lnot x_2 \lor \lnot x_3) \land (x_1 \lor \lnot x_3) \land (\lnot x_1 \lor \lnot x_2 \lor x_3)$
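To make the example concrete, the following sketch enumerates every way of choosing one alternative per factor of r -> (ad | be)(b | ace)(ae | cd) and keeps the choices whose labels cover a, b, c, d, e. It is a brute-force illustration only: the names `FACTORS`, `REQUIRED`, and `witnesses` are hypothetical, and the number of choices doubles with each binary factor, which is why this approach does not scale.

```python
from itertools import product

# Content model r -> (ad | be)(b | ace)(ae | cd), written as a list of
# factors; each factor lists its alternatives as words over one-letter labels.
FACTORS = [["ad", "be"], ["b", "ace"], ["ae", "cd"]]

# Labels that the qualifiers [↓::a] ... [↓::e] require as children of r.
REQUIRED = set("abcde")

def witnesses(factors, required):
    """Yield each choice of alternatives whose labels cover `required`."""
    for choice in product(*factors):
        if required <= set("".join(choice)):
            yield choice

found = list(witnesses(FACTORS, REQUIRED))
print(found)
```

Each choice corresponds to one truth assignment of the variables, so a nonempty result means that both the formula and the XPath expression are satisfiable.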
Related work & our purpose
• Two approaches:
  - Tackling the intractability of XPath satisfiability itself [GL06,GL07,GLS07]
    • XPath expressions and DTDs are translated into formulas in monadic second-order (MSO) logic or in a variant of μ-calculus
    • Satisfiability is verified by fast decision procedures for MSO or μ-calculus formulas
  - Finding subclasses of DTDs such that satisfiability of a larger XPath class becomes tractable
DTD classes restricting disjunctions (1)
• Disjunction-free DTD [BFG05,BFG08,GF05]
  - No content model contains disjunction operators of regular expressions
    • non-cooccurrence of labels cannot be specified
  - Tractable XPath classes [IMSHF09]:
    • (↓, ↓*, →+, ←+, ∪, [ ])
    • (↓, ↓*, ↑, ↑*, →+, ←+, ∪)
  - Disjunction-freeness is too restrictive from the practical point of view
DTD classes restricting disjunctions (2)
• Disjunction-capsuled DTD (DC-DTD) [IMSHF09], DC?+#-DTD [IHSF12]
  - Regular expression operators: ·, |, *, ?, +, #, where a # b = a | b | ab
  - Every disjunction is in the scope of * or +
    • non-cooccurrence cannot be specified
  PC -> Name? Manager (Manager | Guest)*
  PC -> (Name | IP)? Manager (Manager | Guest)*
  - disjunction-free ⊂ DC ⊂ DC?+#
  - All tractability results of disjunction-free DTDs are inherited by DC?+#-DTDs
    • as long as the XPath class is within our formulation
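The condition "every disjunction is in the scope of * or +" is a simple syntactic check. The sketch below assumes a hypothetical tuple encoding of content models (a label is a string; operators are `("seq", ...)`, `("alt", ...)`, `("star", e)`, `("plus", e)`, `("opt", e)`) and leaves out the # operator. The function name is illustrative and does not come from the cited papers.

```python
def is_disjunction_capsuled(expr, under_repetition=False):
    """Return True iff every "alt" node lies below some "star" or "plus" node."""
    if isinstance(expr, str):  # a label
        return True
    op, *args = expr
    if op == "alt" and not under_repetition:
        return False
    inside = under_repetition or op in ("star", "plus")
    return all(is_disjunction_capsuled(arg, inside) for arg in args)

# PC -> Name? Manager (Manager | Guest)*
capsuled = ("seq", ("opt", "Name"), "Manager",
            ("star", ("alt", "Manager", "Guest")))

# A rule with a disjunction outside any * or +:
# PC -> (Name | IP) Manager (Manager | Guest)*
not_capsuled = ("seq", ("alt", "Name", "IP"), "Manager",
                ("star", ("alt", "Manager", "Guest")))

print(is_disjunction_capsuled(capsuled), is_disjunction_capsuled(not_capsuled))
```

Passing the `under_repetition` flag down the recursion avoids a separate pass to compute operator scopes.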
DTD class restricting non-cooccurrence
• Duplicate-free DTD (DF-DTD) [MWM07]
  - Regular expression operators: ·, |, *, ?, +
  - Each label appears at most once in a content model
    • Non-cooccurrence of sibling labels exists but can be easily analyzed
  PC -> (Name | IP)(Manager | Guest)*
  PC -> (Name | IP) Manager (Manager | Guest)*
  - Tractable XPath classes:
    • (↓, [ ]∧) [MWM07], where [ ]∧ denotes a qualifier with only ∧
    • (↓, ↑, →+, ←+) [SF09]
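Duplicate-freeness only requires counting label occurrences in a content model. The sketch below uses a hypothetical tuple encoding (a label is a string; an operator node is a tuple whose first item is the operator name), and the function names are illustrative.

```python
from collections import Counter

def labels(expr):
    """Yield every label occurrence in a content-model expression."""
    if isinstance(expr, str):
        yield expr
    else:
        for sub in expr[1:]:  # skip the operator name
            yield from labels(sub)

def is_duplicate_free(expr):
    """Return True iff each label appears at most once in the content model."""
    return all(count == 1 for count in Counter(labels(expr)).values())

# PC -> (Name | IP)(Manager | Guest)*
df_rule = ("seq", ("alt", "Name", "IP"), ("star", ("alt", "Manager", "Guest")))

# PC -> (Name | IP) Manager (Manager | Guest)*   (Manager occurs twice)
non_df_rule = ("seq", ("alt", "Name", "IP"), "Manager",
               ("star", ("alt", "Manager", "Guest")))

print(is_duplicate_free(df_rule), is_duplicate_free(non_df_rule))
```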
Hybridizing DF-DTDs and DC?+#-DTDs
• RW-DTDs [IHSF12]
  PC -> (Name | IP) Manager (Manager | Guest)*
  ((Name | IP) is the DF part; Manager (Manager | Guest)* is the DC?+# part)
  - 26 out of 27 real-world DTDs are RW-DTDs
  - 1406 out of 1407 real-world DTD rules are covered
  - RW-DTDs are expected to have the same tractability as DF-DTDs
    • only DF parts can specify non-cooccurrence
Hybridizing the two DTD classes
• RW-DTDs [IHSF12]
  PC -> (Name | IP) Manager (Manager | Guest)*
  ((Name | IP) is the DF part; Manager (Manager | Guest)* is the DC?+# part)
  - 26 out of 27 real-world DTDs are RW-DTDs
  - 1406 out of 1407 real-world DTD rules are covered
  [Table: complexity of satisfiability (P or NPC) for XPath classes built from ↓, ↓*, ↑, ↑*, →+, ←+, ∪, [ ]∧, [ ] under any DTDs, RW-DTDs, DF-DTDs, and DC?+#-DTDs, with a gap marked]
  [ ]∧: qualifier with only ∧; NPC: NP-complete
Contribution of this work
• MRW-DTDs:
  - 24 out of 27 real-world DTDs are MRW-DTDs
  - 1403 out of 1407 real-world DTD rules are covered
  [Figure: diagram relating the classes RW, MRW, DC?+#, and DF, showing the DTDs Music ML, Ecoknowmics, and XHTML1-strict with × marks]
Contribution of this work
• MRW-DTDs:
  - 24 out of 27 real-world DTDs are MRW-DTDs
  - 1403 out of 1407 real-world DTD rules are covered
  [Table: complexity of satisfiability (P or NPC) for XPath classes built from ↓, ↓*, ↑, ↑*, →+, ←+, ∪, [ ]∧, [ ] under DC?+#-DTDs, DF-DTDs, MRW-DTDs, and RW-DTDs]
  [ ]∧: qualifier with only ∧; NPC: NP-complete
Outline
• Results on RW-DTDs [IHSF12]
• MRW-DTDs and their tractability results
• Conclusion
RW-DTDs [IHSF12]
• Hybridization of DF-DTDs and DC?+#-DTDs
  PC -> (Name | IP) Manager (Manager | Guest)*
  ((Name | IP) is the DF part; Manager (Manager | Guest)* is the DC?+# part)
  - 26 out of 27 real-world DTDs are RW-DTDs
  - 1406 out of 1407 real-world DTD rules are covered
  [Table: complexity of satisfiability (P or NPC) for XPath classes built from ↓, ↓*, ↑, ↑*, →+, ←+, ∪, [ ]∧, [ ] under any DTDs, RW-DTDs, DF-DTDs, and DC?+#-DTDs, with a gap marked]
  [ ]∧: qualifier with only ∧; NPC: NP-complete
Satisfiability checking algorithm for (↓, ↓*, →+, ←+) under RW-DTDs
1. DTD transformation
   PC -> (Name | IP) Manager (Manager | Guest)*   (RW)
   PC -> Name? IP? Manager (Manager | Guest)*     (DC?+#)
2. Approximate satisfiability checking
   - Run the known, efficient algorithm for DC?+#-DTDs
   - The algorithm may answer "satisfiable" mistakenly
3. Consistency checking
   - Check whether p is consistent with the non-cooccurrence of labels specified by the original RW-DTD
   - p is unsatisfiable if p says "Name and IP are siblings"
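Steps 1 and 3 can be illustrated on a single rule. The sketch below is a simplified reading of the slide's example, not the algorithm of [IHSF12]. It assumes a hypothetical tuple encoding of content models, relaxes each disjunction outside * and + into a sequence of optional parts, and checks a set of required sibling labels against the original disjunctions. It further assumes that labels under such a disjunction occur nowhere else in the rule. Step 2 (the DC?+# satisfiability check) is not implemented here.

```python
def labels(expr):
    """Yield every label occurrence in a content-model expression."""
    if isinstance(expr, str):
        yield expr
    else:
        for sub in expr[1:]:
            yield from labels(sub)

def relax(expr):
    """Step 1 (sketch): replace (e1 | e2) outside * and + by e1? e2?."""
    if isinstance(expr, str) or expr[0] in ("star", "plus"):
        return expr  # labels and repeated parts are kept unchanged
    op, *args = expr
    args = [relax(arg) for arg in args]
    if op == "alt":
        return ("seq", *(("opt", arg) for arg in args))
    return (op, *args)

def exclusive_groups(expr):
    """Yield the branch label sets of each disjunction outside * and +."""
    if isinstance(expr, str) or expr[0] in ("star", "plus"):
        return
    if expr[0] == "alt":
        yield [set(labels(branch)) for branch in expr[1:]]
    for sub in expr[1:]:
        yield from exclusive_groups(sub)

def consistent(siblings, expr):
    """Step 3 (sketch): siblings must not use two branches of one disjunction."""
    for branches in exclusive_groups(expr):
        if sum(1 for branch in branches if branch & siblings) > 1:
            return False
    return True

# PC -> (Name | IP) Manager (Manager | Guest)*
RULE = ("seq", ("alt", "Name", "IP"), "Manager",
        ("star", ("alt", "Manager", "Guest")))
RELAXED = relax(RULE)  # encodes PC -> Name? IP? Manager (Manager | Guest)*

print(RELAXED)
print(consistent({"Name", "IP"}, RULE))       # Name and IP as siblings
print(consistent({"Name", "Manager"}, RULE))
```

The relaxed rule accepts every child sequence that the original rule accepts, which is why the approximate check can only err toward "satisfiable" and a separate consistency check is needed.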
Difficulty for (↓, ↑) and (↓, [ ]∧) under RW-DTDs
• A label may occur a bounded, plural number of times
  PC -> (Name | IP) Manager? Manager? Guest*
  - (↓, ↓*, →+, ←+): goes down only
  - (↓, ↑), (↓, [ ]∧): goes down and up many times
• p checks Manager's children many times
• At the consistency checking step, we have to decide nondeterministically: "Which Manager should we go to?"
  [Figure: a PC node with Manager and Guest children]
Outline
• Results on RW-DTDs [IHSF12]
• MRW-DTDs and their tractability results
• Conclusion
MRW-DTDs
• RW-DTDs with the following restriction:
  - a label that occurs outside the scope of any * and + appears only once in the content model
  PC -> (Name | IP) Manager? Guest*              (satisfies the restriction)
  PC -> (Name | IP) Manager? Manager? Guest*     (violates it: Manager appears twice)
  PC -> (Name | IP) Manager+ (Manager | Guest)*  (satisfies the restriction)
  PC -> (Name | IP) Manager (Manager | Guest)*   (violates it: Manager appears twice)
• Each label appears "at most once" or "unboundedly many times" in a content model
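The restriction can be checked syntactically on each content model. The sketch below assumes a hypothetical tuple encoding (a label is a string; operators are `("seq", ...)`, `("alt", ...)`, `("star", e)`, `("plus", e)`, `("opt", e)`) and checks only the extra MRW restriction, not RW membership itself. The four rules mirror the slide's examples under the reading that a label occurring outside any * and + must be the label's only occurrence.

```python
from collections import Counter

def occurrences(expr, under_repetition=False):
    """Yield (label, under_repetition) for every label occurrence."""
    if isinstance(expr, str):
        yield expr, under_repetition
        return
    op, *args = expr
    inside = under_repetition or op in ("star", "plus")
    for arg in args:
        yield from occurrences(arg, inside)

def satisfies_mrw_restriction(expr):
    """A label occurring outside every * and + must appear exactly once."""
    found = list(occurrences(expr))
    counts = Counter(label for label, _ in found)
    return all(under or counts[label] == 1 for label, under in found)

NAME_OR_IP = ("alt", "Name", "IP")
MGR_OR_GUEST_STAR = ("star", ("alt", "Manager", "Guest"))

rules = [
    ("seq", NAME_OR_IP, ("opt", "Manager"), ("star", "Guest")),
    ("seq", NAME_OR_IP, ("opt", "Manager"), ("opt", "Manager"), ("star", "Guest")),
    ("seq", NAME_OR_IP, ("plus", "Manager"), MGR_OR_GUEST_STAR),
    ("seq", NAME_OR_IP, "Manager", MGR_OR_GUEST_STAR),
]

results = [satisfies_mrw_restriction(rule) for rule in rules]
print(results)
```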
Satisfiability checking algorithm under MRW-DTDs
1. DTD transformation (MRW -> DC?+#)
2. Approximate satisfiability checking
3. Consistency checking
   • Check if p is consistent with the non-cooccurrence of sibling labels specified by the original MRW-DTD
     - Maintain sibling information of all the nodes that may be revisited during the traversal by p
     - Under MRW-DTDs, each node is either always revisited or can always avoid being revisited
   • Each label appears "at most once" or "unboundedly many times" in a content model
Satisfiability check for (↓, ↑, →+, ←+)
• Always-revisited nodes:
  - nodes with labels outside the scope of any * and +
  - ancestor nodes of the current node
    • due to ↑
• XPath expressions are non-branching
  • due to the absence of [ ]∧
  [Figure: an example XPath expression in (↓, ↑, →+, ←+), a DTD, and the sibling information maintained while the expression is traversed]