
XPath Satisfiability with Parent Axes or Qualifiers Is Tractable - PowerPoint PPT Presentation

  1. XPath Satisfiability with Parent Axes or Qualifiers Is Tractable under Many of Real-World DTDs
     Yasunori Ishihara (Osaka University), Nobutaka Suzuki (University of Tsukuba), Kenji Hashimoto (NAIST), Shogo Shimizu (Gakushuin Women's College), Toru Fujiwara (Osaka University)

  2. XPath satisfiability
     • Input: an XPath expression q and a DTD E
     • Output: Is there an XML document U such that
       – U conforms to E, and
       – q returns a nonempty set for U?
     • Research on XPath satisfiability is motivated by query optimization
       – Unsatisfiable (parts of) XPath expressions can be replaced with the empty set

  3. XPath expression
     • Atomic expression: "axis :: label"
       – ↓ (child axis)
       – ↓∗ (descendant-or-self axis)
       – ↑ (parent axis)
       – ↑∗ (ancestor-or-self axis)
       – →+ (following-sibling axis)
       – ←+ (preceding-sibling axis)
     • Path constructors:
       – / (path concatenation)
       – ∪ (path union)
       – [ ] (qualifier, possibly with ∧ and ∨)
       – No negation operators
     • Example: the ordinary notation /a[b]//c is written (↓::a[↓::b])/↓∗::c in our notation
       (figure: a document tree with nodes root, <a>, <b>, <c>)
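To make the notation concrete, here is a minimal sketch of an evaluator for this fragment in Python. It is an illustration under stated assumptions, not the authors' implementation: the tuple-based expression encoding and the names `Node`, `step`, `seq`, `union`, and `evaluate` are hypothetical, axes are keyed by the slide's symbols (with an ASCII `*` for ∗), and a qualifier [q] holds at a node when q selects a nonempty set from that node, so ∨ inside a qualifier can be written as a union.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare (and are found in lists) by identity
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        """Append kids as children of this node and return this node."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def _desc_or_self(n):
    out = [n]
    for c in n.children:
        out.extend(_desc_or_self(c))
    return out


def _anc_or_self(n):
    out = []
    while n is not None:
        out.append(n)
        n = n.parent
    return out


def _siblings(n, after):
    if n.parent is None:
        return []
    sibs = n.parent.children
    i = sibs.index(n)
    return sibs[i + 1:] if after else sibs[:i]


# The six axes of the slide, keyed by the slide's own symbols.
AXES = {
    "↓": lambda n: list(n.children),
    "↓*": _desc_or_self,
    "↑": lambda n: [n.parent] if n.parent is not None else [],
    "↑*": _anc_or_self,
    "→+": lambda n: _siblings(n, after=True),
    "←+": lambda n: _siblings(n, after=False),
}


def step(axis, label, *quals):
    """axis::label[q1][q2]...; several qualifiers act as a conjunction."""
    return ("step", axis, label, quals)


def seq(p1, p2):
    return ("seq", p1, p2)  # p1/p2


def union(p1, p2):
    return ("union", p1, p2)  # p1 ∪ p2


def evaluate(expr, context):
    """Return the nodes selected from the context nodes, without duplicates."""
    kind = expr[0]
    if kind == "step":
        _, axis, label, quals = expr
        out = []
        for n in context:
            for m in AXES[axis](n):
                # qualifier [q] holds at m iff q selects a nonempty set from m
                if m.label == label and all(evaluate(q, [m]) for q in quals) and m not in out:
                    out.append(m)
        return out
    if kind == "seq":
        return evaluate(expr[2], evaluate(expr[1], context))
    if kind == "union":
        left = evaluate(expr[1], context)
        return left + [m for m in evaluate(expr[2], context) if m not in left]
    raise ValueError(f"unknown expression kind: {kind}")


# /a[b]//c, written (↓::a[↓::b])/↓*::c
query = seq(step("↓", "a", step("↓", "b")), step("↓*", "c"))
root = Node("root").add(Node("a").add(Node("b"), Node("c")))
print([n.label for n in evaluate(query, [root])])  # ['c']
```

If the `<b>` child is removed from `<a>`, the qualifier fails and the same query returns an empty list.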

  4. Document type definition (DTD)
     • A DTD
       – specifies a set of XML documents
       – is naturally modeled by a tree grammar
     • Each production rule specifies, for a label, a set of sequences of its children by a regular expression (the content model)
       – Example: PC -> Name Manager (Manager | Guest)*
       (figure: example <PC> elements whose children <Name>, <Manager>, and <Guest> conform to this rule)
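A minimal sketch of how one production rule constrains the children of an element, using only Python's standard `re` module. It assumes content models written with juxtaposition for concatenation and the operators |, *, +, ? (the # operator is not supported), and it checks a single rule rather than a whole document; `to_regex` and `children_conform` are hypothetical helper names.

```python
import re

# Labels are identifiers; the operators are ( ) | * + ?.
# Anything else (spaces, ⋅, commas) is skipped, i.e. read as concatenation.
TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[()|*+?]")


def to_regex(model):
    """Translate a content model such as 'Name Manager (Manager | Guest)*' into a
    Python regex over child labels, each followed by one space."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")
        elif tok in (")", "|", "*", "+", "?"):
            parts.append(tok)
        else:
            parts.append(f"(?:{re.escape(tok)} )")
    return "".join(parts)


def children_conform(model, child_labels):
    """True iff the sequence of child labels is allowed by the content model."""
    word = "".join(label + " " for label in child_labels)
    return re.fullmatch(to_regex(model), word) is not None


PC = "Name Manager (Manager | Guest)*"
print(children_conform(PC, ["Name", "Manager", "Guest", "Manager"]))  # True
print(children_conform(PC, ["Name", "Guest"]))                        # False
```

Terminating every label with a space keeps labels that are prefixes of one another (say `Man` and `Manager`) from matching by accident.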

  5. Difficulty in XPath satisfiability
     • XPath satisfiability under arbitrary DTDs is in P for a very small subclass of XPath [BFG05, BFG08, GF05]
       – (↓, ↓∗, ∪)
     • Analyzing non-cooccurrence of sibling labels is difficult
       – non-cooccurrence is specified by disjunctions
       – XPath expression: ↓∗::r[↓::a][↓::b][↓::c][↓::d][↓::e]
       – DTD: r -> (ad | be)(b | ace)(ae | cd)
       – Let $x_i$ be true iff the left alternative of the i-th disjunction is chosen (so $x_1$ selects ad and $\overline{x_1}$ selects be). Then an <r> node has all of a, b, c, d, e as children iff
         $\phi = (x_1 \lor \overline{x_2} \lor x_3) \land (\overline{x_1} \lor x_2) \land (\overline{x_2} \lor \overline{x_3}) \land (x_1 \lor \overline{x_3}) \land (\overline{x_1} \lor \overline{x_2} \lor x_3)$
         is true, with one clause for each of a, b, c, d, e
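For this small example the encoding can be checked exhaustively. The sketch below is a brute-force illustration, not a decision procedure: it takes $x_i$ to be true iff the left alternative of the i-th disjunction is chosen, assumes single-character labels so that a string like "ad" stands for the children a, d, enumerates all 2^3 choices, and confirms that r gets all of a to e exactly when the CNF formula ϕ for this example is true. The names `FACTORS`, `children`, and `phi` are hypothetical.

```python
from itertools import product

# r -> (ad | be)(b | ace)(ae | cd), one (left, right) pair per disjunction.
# Single-character labels, so "ad" stands for the child sequence a, d.
FACTORS = [("ad", "be"), ("b", "ace"), ("ae", "cd")]
REQUIRED = set("abcde")  # ↓∗::r[↓::a][↓::b][↓::c][↓::d][↓::e] needs all five as children


def children(choice):
    """Children of r, where choice[i] is True iff the left alternative of factor i is taken."""
    return "".join(left if pick else right for (left, right), pick in zip(FACTORS, choice))


def phi(x1, x2, x3):
    """CNF with one clause per required label a, b, c, d, e."""
    return ((x1 or not x2 or x3) and (not x1 or x2) and (not x2 or not x3)
            and (x1 or not x3) and (not x1 or not x2 or x3))


witnesses = []
for choice in product([True, False], repeat=3):
    has_all = REQUIRED <= set(children(choice))
    assert has_all == phi(*choice)  # the query/DTD pair and ϕ agree on every assignment
    if has_all:
        witnesses.append(children(choice))
print(witnesses)  # ['beacecd']
```

The only satisfying choice takes the right alternative everywhere (be, ace, cd). The enumeration grows as 2^n in the number of disjunctions.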

  6. Related work & our purpose
     • Two approaches:
       – Tackling the intractability of XPath satisfiability itself [GL06, GL07, GLS07]
         • XPath expressions and DTDs are translated into formulas in monadic second-order (MSO) logic or in a variant of ν-calculus
         • Satisfiability is verified by fast decision procedures for MSO or ν-calculus formulas
       – Finding subclasses of DTDs under which satisfiability of a larger XPath class becomes tractable

  7. DTD classes restricting disjunctions (1)
     • Disjunction-free DTD [BFG05, BFG08, GF05]
       – No content model contains the disjunction operator of regular expressions
         • non-cooccurrence of labels cannot be specified
       – Tractable XPath classes [IMSHF09]:
         • (↓, ↓∗, →+, ←+, ∪, [ ])
         • (↓, ↓∗, ↑, ↑∗, →+, ←+, ∪)
       – Disjunction-freeness is too restrictive from a practical point of view

  8. DTD classes restricting disjunctions (2)
     • Disjunction-capsuled DTD (DC-DTD) [IMSHF09], DC?+#-DTD [IHSF12]
       – Regular expression operators: ⋅, |, ∗, ?, +, #, where b # c = b | c | bc
       – Every disjunction is in the scope of ∗ or +
         • non-cooccurrence cannot be specified
         • (figure: content models PC -> Name? Manager (Manager | Guest)* and PC -> (Name | IP)? Manager (Manager | Guest)*)
       – disjunction-free ⊂ DC ⊂ DC?+#
       – All tractability results for disjunction-free DTDs are inherited by DC?+#-DTDs
         • as long as the XPath class is within our formulation
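Both syntactic conditions can be checked by one scan over a content model. In the minimal sketch below, which uses hypothetical helper names, a token counts as being in the scope of ∗ or + when it, or some parenthesized group enclosing it, is immediately followed by `*` or `+`. The DC test then requires every `|` to be in such a scope. Parentheses are assumed to be balanced, and `⋅` and spaces are read as concatenation. By the condition as stated on the slide, a disjunction under `?` only, as in `(Name | IP)?`, is not capsuled.

```python
import re

TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[()|*+?#]")
OPERATORS = {"(", ")", "|", "*", "+", "?", "#"}
REPEAT = {"*", "+"}


def repeated_scope(tokens):
    """scoped[i] is True iff tokens[i] lies in the scope of some * or +."""
    scoped = [False] * len(tokens)
    opens = []  # positions of currently unmatched "("
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "(":
            opens.append(i)
        elif tok == ")":
            start = opens.pop()  # assumes balanced parentheses
            if nxt in REPEAT:
                scoped[start:i + 1] = [True] * (i + 1 - start)
        elif tok not in OPERATORS and nxt in REPEAT:
            scoped[i] = True  # a single label directly followed by * or +
    return scoped


def is_disjunction_free(model):
    return "|" not in TOKEN.findall(model)


def is_dc(model):
    """Disjunction-capsuled: every | lies in the scope of * or +."""
    tokens = TOKEN.findall(model)
    scoped = repeated_scope(tokens)
    return all(scoped[i] for i, tok in enumerate(tokens) if tok == "|")


print(is_dc("Name? Manager (Manager | Guest)*"))         # True
print(is_dc("(Name | IP)? Manager (Manager | Guest)*"))  # False
```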

  9. DTD class restricting non-cooccurrence
     • Duplicate-free DTD (DF-DTD) [MWM07]
       – Regular expression operators: ⋅, |, ∗, ?, +
       – Each label appears at most once in a content model
         • Non-cooccurrence of sibling labels can be specified, but it is easy to analyze
         • (figure: content models PC -> (Name | IP)(Manager | Guest)* and PC -> (Name | IP) Manager (Manager | Guest)*)
       – Tractable XPath classes:
         • (↓, ∧) [MWM07], where ∧ denotes qualifiers with only ∧
         • (↓, ↑, →+, ←+) [SF09]
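The duplicate-free condition is a simple count of label occurrences. A minimal sketch, with hypothetical helper names, that reports which labels break it:

```python
import re
from collections import Counter

LABEL = re.compile(r"[A-Za-z_][\w.-]*")


def duplicated_labels(model):
    """Labels that occur more than once in a content model, in sorted order."""
    counts = Counter(LABEL.findall(model))
    return sorted(label for label, n in counts.items() if n > 1)


def is_duplicate_free(model):
    return not duplicated_labels(model)


print(is_duplicate_free("(Name | IP)(Manager | Guest)*"))          # True
print(duplicated_labels("(Name | IP) Manager (Manager | Guest)*"))  # ['Manager']
```

The second content model, which appears again on the RW-DTD slides, is not duplicate-free because Manager occurs twice.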

  10. Hybridizing DF-DTDs and DC?+#-DTDs
     • RW-DTDs [IHSF12]
       – (figure: PC -> (Name | IP) Manager (Manager | Guest)*, with parts of the rule labeled DC?+# and DF)
       – 26 out of 27 real-world DTDs are RW-DTDs
       – 1406 out of 1407 real-world DTD rules are covered
       – RW-DTDs are expected to have the same tractability as DF-DTDs
         • since only the DF parts can specify non-cooccurrence

  11. Hybridizing the two DTD classes
     • RW-DTDs [IHSF12]
       – (figure: PC -> (Name | IP) Manager (Manager | Guest)*, with parts of the rule labeled DC?+# and DF)
       – 26 out of 27 real-world DTDs are RW-DTDs
       – 1406 out of 1407 real-world DTD rules are covered
     [Table: complexity of XPath satisfiability (P or NPC) for XPath classes built from →+, ←+, ∪, ↓∗, ↑∗, ↓, ↑, ∧, and [ ], under arbitrary DTDs, RW-DTDs, DF-DTDs, and DC?+#-DTDs; one entry is marked "gap"]
     ∧: qualifier with only ∧; NPC: NP-complete

  12. Contribution of this work
     • MRW-DTDs:
       – 24 out of 27 real-world DTDs are MRW-DTDs
       – 1403 out of 1407 real-world DTD rules are covered
     (figure: the DTD classes RW, MRW, DC?+#, and DF, with the real-world DTDs Music ML, Ecoknowmics, and XHTML1-strict marked ×)

  13. Contribution of this work
     • MRW-DTDs:
       – 24 out of 27 real-world DTDs are MRW-DTDs
       – 1403 out of 1407 real-world DTD rules are covered
     [Table: complexity of XPath satisfiability (P or NPC) for XPath classes built from ↓, ↓∗, ↑, ↑∗, →+, ←+, ∪, ∧, and [ ], under DC?+#-DTDs, RW-DTDs, MRW-DTDs, and DF-DTDs]
     ∧: qualifier with only ∧; NPC: NP-complete

  14. Outline
     • Results on RW-DTDs [IHSF12]
     • MRW-DTDs and their tractability results
     • Conclusion

  15. RW-DTDs [IHSF12]
     • Hybridization of DF-DTDs and DC?+#-DTDs
       – (figure: PC -> (Name | IP) Manager (Manager | Guest)*, with parts of the rule labeled DC?+# and DF)
       – 26 out of 27 real-world DTDs are RW-DTDs
       – 1406 out of 1407 real-world DTD rules are covered
     [Table: complexity of XPath satisfiability (P or NPC) for XPath classes built from →+, ←+, ∪, ↓∗, ↑∗, ↓, ↑, ∧, and [ ], under arbitrary DTDs, RW-DTDs, DF-DTDs, and DC?+#-DTDs; one entry is marked "gap"]
     ∧: qualifier with only ∧; NPC: NP-complete

  16. Satisfiability checking algorithm for (↓, ↓∗, →+, ←+) under RW-DTDs
     1. DTD transformation
        – RW: PC -> (Name | IP) Manager (Manager | Guest)*
        – DC?+#: PC -> Name ⋅ IP ⋅ Manager (Manager | Guest)*
     2. Approximate satisfiability checking
        – Run the known efficient algorithm for DC?+#-DTDs
        – The algorithm may mistakenly answer "satisfiable"
     3. Consistency checking
        – Check whether q is consistent with the non-cooccurrence of labels specified by the original RW-DTD
        – q is unsatisfiable if q says "Name and IP are siblings"
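Step 3 can be illustrated on the running example. This is a heavily simplified sketch, not the authors' algorithm. It assumes that the set of labels q requires to be siblings under a PC node has already been collected, which is the substance of the real algorithm and is not shown. It also only handles disjunctions outside ∗ and + that are flat, unnested groups such as (Name | IP) whose labels occur nowhere else in the content model. `non_cooccurring_pairs` and `consistent` are hypothetical names.

```python
import re
from itertools import combinations

TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[()|*+?]")


def non_cooccurring_pairs(model):
    """Pairs of labels that cannot be siblings, assuming every disjunction outside
    * and + is a flat group like (Name | IP) whose labels occur nowhere else."""
    tokens = TOKEN.findall(model)
    pairs = set()
    i = 0
    while i < len(tokens):
        if tokens[i] == "(":
            j = tokens.index(")", i)  # flat groups only: first ")" closes this group
            after = tokens[j + 1] if j + 1 < len(tokens) else None
            if after not in ("*", "+"):  # a disjunction under * or + allows cooccurrence
                labels = [t for t in tokens[i + 1:j] if t != "|"]
                pairs.update(frozenset(p) for p in combinations(labels, 2))
            i = j + 1
        else:
            i += 1
    return pairs


def consistent(required_siblings, model):
    """False if q forces two non-cooccurring labels to be siblings."""
    return not any(pair <= required_siblings for pair in non_cooccurring_pairs(model))


RW = "(Name | IP) Manager (Manager | Guest)*"
print(consistent({"Name", "IP", "Manager"}, RW))  # False: q says Name and IP are siblings
print(consistent({"Name", "Manager"}, RW))        # True
```

A "satisfiable" answer from step 2 would be overturned in the first case, which matches the slide's Name/IP example.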

  17. Difficulty for (↓, ↑) and (↓, ∧) under RW-DTDs
     • A label may occur a bounded number of times greater than one, e.g. PC -> (Name | IP) Manager ⋅ Manager ⋅ Guest*
       – (↓, ↓∗, →+, ←+): goes down only
       – (↓, ↑), (↓, ∧): go down and up many times
     • (figure: q checks the children of Manager many times; a PC node with Manager and Guest children)
     • At the consistency checking step, we have to decide nondeterministically: "Which Manager should we go to?"

  18. Outline
     • Results on RW-DTDs [IHSF12]
     • MRW-DTDs and their tractability results
     • Conclusion

  19. MRW-DTDs
     • RW-DTDs with the following restriction:
       – if label b is outside the scope of any * and +, then label b appears only once in the content model
       – (figure: content models
           PC -> (Name | IP) Manager ⋅ Guest*
           PC -> (Name | IP) Manager ⋅ Manager ⋅ Guest*
           PC -> (Name | IP) Manager+ (Manager | Guest)*
           PC -> (Name | IP) Manager (Manager | Guest)*)
     • Each label appears "at most once" or "unboundedly many times" in a content model
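The MRW restriction is syntactic, so a single scan of a content model can check it. The sketch below applies the condition exactly as stated on the slide and does not check that the model is an RW-DTD in the first place. It uses hypothetical helper names, treats `⋅` as plain concatenation, and assumes balanced parentheses. A token is in the scope of ∗ or + when it, or an enclosing group, is directly followed by `*` or `+`.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[()|*+?#]")
OPERATORS = {"(", ")", "|", "*", "+", "?", "#"}
REPEAT = {"*", "+"}


def repeated_scope(tokens):
    """scoped[i] is True iff tokens[i] lies in the scope of some * or +."""
    scoped = [False] * len(tokens)
    opens = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "(":
            opens.append(i)
        elif tok == ")":
            start = opens.pop()
            if nxt in REPEAT:
                scoped[start:i + 1] = [True] * (i + 1 - start)
        elif tok not in OPERATORS and nxt in REPEAT:
            scoped[i] = True
    return scoped


def satisfies_mrw_restriction(model):
    """Every label occurring outside all * and + must occur only once in the model."""
    tokens = TOKEN.findall(model)
    scoped = repeated_scope(tokens)
    counts = Counter(t for t in tokens if t not in OPERATORS)
    return all(counts[t] == 1 for i, t in enumerate(tokens)
               if t not in OPERATORS and not scoped[i])


print(satisfies_mrw_restriction("(Name | IP) Manager ⋅ Guest*"))            # True
print(satisfies_mrw_restriction("(Name | IP) Manager ⋅ Manager ⋅ Guest*"))  # False
print(satisfies_mrw_restriction("(Name | IP) Manager+ (Manager | Guest)*")) # True
```

The second model fails because Manager occurs outside every ∗ and + and also appears twice. This is the bounded, plural occurrence that caused the difficulty on the previous slides.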

  20. Satisfiability checking algorithm under MRW-DTDs
     1. DTD transformation (MRW -> DC?+#)
     2. Approximate satisfiability checking
     3. Consistency checking
        • Check whether q is consistent with the non-cooccurrence of sibling labels specified by the original MRW-DTD
          – Maintain sibling information of all the nodes that may be revisited during the traversal by q
          – (figure: under MRW-DTDs, each node that may be revisited is either always revisited or its revisit is always avoidable)
     • Each label appears "at most once" or "unboundedly many times" in a content model

  21. Satisfiability check for (↓, ↑, →+, ←+)
     • Always-revisited nodes:
       – nodes with labels outside the scope of any * and +
       – ancestor nodes of the current node
         • due to ↑
     • XPath expressions are non-branching
       • due to the absence of ∧
     • Example (figure: sibling information for s):
       – XPath: ((↓::s/→+::c)/(↓::b/↑::c))/→+::d
       – DTD: s → s∗ b∗ c d s∗, c → b
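To see why ancestors count as always-revisited nodes, the sketch below evaluates the example expression on one document that conforms to the example DTD. The document was built by hand, assuming b and d are leaves. This is only an evaluation on a fixed witness, not the satisfiability algorithm, and `Node`, `run`, and the axis helpers are hypothetical names. The trace shows that ↑::c returns to the c node that →+::c already reached, so any sibling information recorded when c was first visited is needed again for the final →+::d step.

```python
class Node:
    def __init__(self, label, *children):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def child(n):
    return n.children


def parent(n):
    return [n.parent] if n.parent is not None else []


def following_sibling(n):
    if n.parent is None:
        return []
    sibs = n.parent.children
    return sibs[sibs.index(n) + 1:]


def run(path, start):
    """Evaluate a non-branching path of (axis, label) steps, recording every node reached."""
    current, visits = [start], []
    for axis, label in path:
        # dict.fromkeys removes duplicates (by identity) while keeping order
        current = list(dict.fromkeys(m for n in current for m in axis(n) if m.label == label))
        visits.extend(current)
    return current, visits


# s → s* b* c d s*,  c → b  (b and d assumed to be leaves)
inner = Node("s", Node("c", Node("b")), Node("d"))
root = Node("s", inner, Node("c", Node("b")), Node("d"))

# ((↓::s / →+::c) / (↓::b / ↑::c)) / →+::d
query = [(child, "s"), (following_sibling, "c"), (child, "b"),
         (parent, "c"), (following_sibling, "d")]
result, visits = run(query, root)
outer_c = root.children[1]
print([n.label for n in result])          # ['d']
print(sum(v is outer_c for v in visits))  # 2: the c node is revisited after ↑
```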
