Matching Twigs in Probabilistic XML: Benny Kimelfeld & Yehoshua Sagiv (PowerPoint presentation)



SLIDE 1

Matching Twigs in Probabilistic XML

Benny Kimelfeld & Yehoshua Sagiv

VLDB 2007, Vienna, Austria

SLIDE 2

Example: Scanning Aerial Photography

Find regions that include a factory building and a road

… with a high probability

SLIDE 3

Analyzing a Region

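[Figure: a region whose objects are uncertain: road (90%), road (60%), factory bldg. & wall (40%) / house & road (30%), house (50%) / factory bldg. (50%), factory bldg. (40%) / apt. building (50%), where "/" separates alternative interpretations; four matches (road & factory building), with probabilities 45%, 36%, 24% and 36%]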

What is the probability that this region is an answer (i.e., includes a factory building and a road)? Specifying the probability of each match does not answer this question! The probability of each individual match can be significantly smaller than the probability that there is any match.
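To see why per-match probabilities are not enough, here is a small brute-force computation over possible worlds. It is a sketch under a simplifying assumption that the slide does not make: four hypothetical objects (two roads, two factory buildings) are present independently, and the mutually exclusive alternatives in the figure are ignored.

```python
from itertools import product

# Hypothetical objects of one region; each is present independently
# (a simplification: the slide's figure also has mutually exclusive alternatives).
objects = {"road1": 0.9, "road2": 0.6, "factory1": 0.5, "factory2": 0.4}
roads = ["road1", "road2"]
factories = ["factory1", "factory2"]


def prob(event):
    """Sum Pr(world) over all possible worlds in which `event` holds."""
    names = list(objects)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        weight = 1.0
        for name, present in world.items():
            weight *= objects[name] if present else 1.0 - objects[name]
        if event(world):
            total += weight
    return total


# Probability of each individual match (a road and a factory building).
match_probs = {(r, f): prob(lambda w, r=r, f=f: w[r] and w[f])
               for r in roads for f in factories}

# Probability that the region is an answer: at least one match exists.
any_match = prob(lambda w: any(w[r] and w[f] for r in roads for f in factories))

for m, p in match_probs.items():
    print(m, round(p, 3))
print("any match:", round(any_match, 3))
```

With these numbers the best single match has probability 0.45, whereas the region is an answer with probability 0.96 × 0.7 = 0.672. The matches overlap, so their probabilities can be neither added nor multiplied.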

SLIDE 4

A Database Point of View

Probabilistic data: a prob. process for generating random data

Query: [Figure: a twig over *, region, road, factory building]

Querying probabilistic data: each answer has an amount of certainty, namely, the probability of being obtained when querying a random database

SLIDE 5

What Query Should We Pose?

[Figure: a twig over *, region, road, factory building]

A pattern:
  • An answer is a match
  • What is the probability of each specific match?
  • What is the probability of each pair of road & factory building?

[Figure: the same twig, with projection on region]

A pattern w/ projection (project on region):
  • An answer is a projection of one or more matches
  • What is the prob. of each answer after the projection?
  • For each region, what is the prob. that it has some pair of road & factory building?

This is what we need!

SLIDE 6

Another Example

Find the following objects in one region:

A factory building, a road, an antenna, a heliport, a track

SLIDE 7

Finding a Partial Match

Find the following objects in one region: a factory building, a road, an antenna, a heliport, a track

[Figure: a region with factory bldg. w/ antennas (50%) / apt. building w/ water tanks (30%), road (90%) and heliport (80%)]

No track! For many applications, that's good enough: a partial match (36%)

SLIDE 8

What If …

[Figure: the same region with factory bldg. w/ antennas (50%) / apt. building w/ water tanks (30%), road (90%), heliport (80%), and now also a track (20%); the complete match has probability 7.2%]

The probability may be too low to be of any interest! Should we just filter out the whole match? That does not make sense! What about the previous partial match?
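The percentages on this slide and the previous one are consistent with multiplying the probabilities of the objects involved. Under our simplifying assumption that the chosen building interpretation, the road, the heliport and the track are present independently:

```latex
\begin{aligned}
\Pr(\text{partial match})  &= \underbrace{0.5}_{\text{factory bldg. w/ antennas}} \times \underbrace{0.9}_{\text{road}} \times \underbrace{0.8}_{\text{heliport}} = 0.36\\
\Pr(\text{complete match}) &= 0.36 \times \underbrace{0.2}_{\text{track}} = 0.072
\end{aligned}
```

Adding the track lowers the probability from 36% to 7.2%, even though the rest of the match is exactly as likely as before.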

SLIDE 9

Finding Maximal Matches

[Figure: probabilistic data and a pattern]

The goal is to find the maximal matches among the partial matches that have a sufficient probability.

SLIDE 10

Querying Prob. Data: Earlier Work

  • Projection and incomplete semantics were explored for relational models
    – Projection: very simple queries can be highly intractable (data complexity) [Dalvi & Suciu, VLDB 04]
    – Maximally joining relations: tractable under data complexity, generally intractable under query-and-data complexity [Kimelfeld & Sagiv, PODS 07]; yet tractable for important classes of schemas
  • None of these paradigms has been studied in the context of prob. XML (only complete matches w/o projection)
    – But they are more relevant to prob. XML since, as the paper shows, they become tractable

SLIDE 11

The Content of the Paper

Query evaluation over probabilistic XML: efficient algorithms and complexity analysis for various paradigms of querying
  • Evaluating Boolean twig queries
  • Evaluating twig queries with projection
  • Finding maximal matches of twigs

In the paper, we explain in detail why our results do not follow from previous results on XML/relational models. The paper also has some preliminary results on the combination of maximal matches and projection.

SLIDE 12

Talk Overview

  • Introduction
  • Twig Queries over Probabilistic XML
    − XML and Twig Queries
    − Probabilistic XML
    − Querying Probabilistic XML (Complete Semantics)
  • Query Evaluation (Complete Semantics)
  • Finding Maximal Matches
  • Conclusion, Related and Future Work

SLIDE 13

(Ordinary) XML Documents

  • Rooted tree
  • Each node has a tag, a value, or both

SLIDE 14

Twig Queries

[Figure: a twig over *, region, @area (≥10km2), heliport, factory and park.lot]

A twig is a rooted tree with:
  • Node predicates over the tag and value
  • Child edges and descendant edges
  • Output nodes (projection), possibly more than one

SLIDE 15

Matches and Answers

A match of a twig T in a document d is a mapping from the nodes of T to those of d such that:
  • root(T) → root(d)
  • child edge → edge
  • desc. edge → path
  • node predicates are satisfied

[Figure: the twig over *, region, @area (≥10km2), heliport, factory and park.lot, and a document d]

An answer is obtained from a match by listing the images of the output nodes, that is, by applying projection to the match.
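The definition translates directly into a naive matcher for ordinary documents. This is a brute-force sketch, not the paper's algorithm. The classes `Node` and `TwigNode`, the node identifiers, the tag-equality predicates and the example document are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    """Node of an ordinary document: a tag, an optional value and children."""
    nid: str                      # hypothetical unique identifier
    tag: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


@dataclass
class TwigNode:
    name: str                                   # unique name within the twig
    pred: Callable[[Node], bool]                # node predicate over tag and value
    edges: List[Tuple[str, "TwigNode"]] = field(default_factory=list)  # ("child" | "desc", subtwig)
    output: bool = False                        # output (projection) node?


def descendants(d: Node):
    for c in d.children:
        yield c
        yield from descendants(c)


def matches_at(t: TwigNode, d: Node) -> List[Dict[str, str]]:
    """All matches of the subtwig rooted at t that map t to d."""
    if not t.pred(d):
        return []
    result = [{t.name: d.nid}]
    for kind, sub in t.edges:
        # child edge -> edge; descendant edge -> path
        targets = d.children if kind == "child" else list(descendants(d))
        result = [{**m, **sm} for m in result
                  for x in targets for sm in matches_at(sub, x)]
    return result


def output_names(t: TwigNode) -> List[str]:
    names = [t.name] if t.output else []
    for _, sub in t.edges:
        names += output_names(sub)
    return names


def answers(t: TwigNode, doc: Node) -> Set[Tuple[str, ...]]:
    """Project every match (root(T) -> root(d)) onto the output nodes."""
    outs = output_names(t)
    return {tuple(m[n] for n in outs) for m in matches_at(t, doc)}


def has_tag(tag: str) -> Callable[[Node], bool]:
    return lambda n: n.tag == tag


doc = Node("n0", "aerial-photo", children=[
    Node("r1", "region", children=[Node("a", "road"), Node("b", "road"), Node("c", "factory")]),
    Node("r2", "region", children=[Node("e", "road")]),
])
twig = TwigNode("root", lambda n: True, [
    ("desc", TwigNode("region", has_tag("region"), [
        ("child", TwigNode("road", has_tag("road"))),
        ("child", TwigNode("factory", has_tag("factory"))),
    ], output=True)),
])
print(len(matches_at(twig, doc)))   # 2 matches (one per road in r1)
print(answers(twig, doc))           # {('r1',)}: both matches project to the same answer
```

The example shows the effect of projection: two distinct matches yield a single answer.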

SLIDE 16

Boolean Queries

[Figure: a Boolean twig B over *, region, @area (≥10km2), heliport, factory and park.lot, and a document d]

A twig without output nodes is a Boolean twig. The answer is either true or false: B(d) = true means that there is a match of B in d.

SLIDE 17

Talk Overview

SLIDE 18

Probabilistic XML

Probabilistic XML document: a probabilistic process for generating ordinary XML documents

Random instance: an ordinary XML document d, generated with probability Pr(d), where $\sum_{d} \Pr(d) = 1$

SLIDE 19

Implicit Representations

In practice, the probability space may be huge, e.g., when the uncertainty is spread over many small pieces of data. It is unrealistic to represent the probabilistic document by explicitly specifying the entire space.

We usually explore implicit representations, such as the following one that we consider:

SLIDE 20

A ProTDB Document [Nierman & Jagadish 02]

[Figure: a ProTDB document with ordinary nodes (aerial-photo, region, neighborhood, house, size (m, s), building, factory, park.lot, heliport, vehicle, type, track, private) and distributional nodes whose outgoing edges are labeled with probabilities (0.8, 0.4, 0.5, 0.75, 0.3)]

  • Rooted tree
  • 2 types of nodes: ordinary nodes and distributional nodes
  • 2 types of distributions: independent and mutually exclusive

SLIDE 21

A ProTDB Document [Nierman & Jagadish 02]

[Figure: the same ProTDB document, with probabilities on the outgoing edges of its distributional nodes]

A probability for each outgoing edge of a distributional node

SLIDE 22

Instance Generation: Step 1

[Figure: the ProTDB document]

  • Traverse the tree top-down
  • Distributional nodes choose a set of children:
    – Independent: choose children independently
    – Mutually exclusive: choose at most one child
  • Drop unchosen children

SLIDE 23

Instance Generation: Step 2

[Figure: the document after the choices of Step 1, still containing its distributional nodes]

Drop the distributional nodes

SLIDE 24

Instance Generation: Step 2

[Figure: the remaining ordinary nodes: aerial-photo, region, neighborhood, house, size (s), factory, heliport, vehicle, type, track]

  • Drop the distributional nodes
  • Connect each ordinary node to its closest ordinary ancestor

SLIDE 25

The Result: An Ordinary Document

[Figure: the resulting ordinary document over the nodes aerial-photo, region, neighborhood, house, size (s), factory, heliport, vehicle, type, track]
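The two generation steps can be sketched as a sampler. This is an illustration under several assumptions. The classes `PNode` and `Node`, the kind names `"ord"`, `"ind"` and `"mux"`, and the small example document are hypothetical; the probabilities are borrowed from the figure, but its structure is not reproduced. The sketch also merges Step 1 and Step 2 into a single top-down pass. This yields the same documents because unchosen children are dropped before their subtrees are visited.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PNode:
    """Node of a ProTDB-style document."""
    kind: str                    # "ord" (ordinary), "ind" (independent), "mux" (mutually exclusive)
    tag: Optional[str] = None    # only ordinary nodes carry a tag
    children: List["PNode"] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)  # one per outgoing edge of a distributional node


@dataclass
class Node:
    """Node of the generated ordinary document."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def chosen_children(p: PNode, rng: random.Random) -> List[PNode]:
    """Step 1: the children kept by p; unchosen children are dropped."""
    if p.kind == "ord":
        return p.children
    if p.kind == "ind":                       # choose children independently
        return [c for c, q in zip(p.children, p.probs) if rng.random() < q]
    r, acc = rng.random(), 0.0                # "mux": choose at most one child
    for c, q in zip(p.children, p.probs):
        acc += q
        if r < acc:
            return [c]
    return []                                 # no child, with prob. 1 - sum(probs)


def ordinary_children(p: PNode, rng: random.Random) -> List[Node]:
    """Step 2: drop distributional nodes, attaching ordinary nodes to the closest ordinary ancestor."""
    out: List[Node] = []
    for c in chosen_children(p, rng):
        if c.kind == "ord":
            out.append(Node(c.tag, ordinary_children(c, rng)))
        else:
            out.extend(ordinary_children(c, rng))
    return out


def random_instance(root: PNode, rng: random.Random) -> Node:
    assert root.kind == "ord", "this sketch assumes an ordinary root"
    return Node(root.tag, ordinary_children(root, rng))


doc = PNode("ord", "region", [
    PNode("ind", children=[PNode("ord", "factory"), PNode("ord", "heliport")], probs=[0.75, 0.8]),
    PNode("mux", children=[PNode("ord", "house"), PNode("ord", "track")], probs=[0.4, 0.3]),
])
print(random_instance(doc, random.Random(7)))
```

Repeated sampling approximates Pr(d). For instance, factory should appear in about 75% of the sampled regions, and house and track never appear together.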

SLIDE 26

Talk Overview

SLIDE 27

Querying Probabilistic XML

[Figure: a probabilistic XML document and a query]

Users pose an ordinary query, that is, of the type that is applied to non-probabilistic documents, e.g., a twig w/ projection over *, region, road, factory building

… but the document is probabilistic

SLIDE 28

The Probability of an Answer

When querying probabilistic data, each answer has a probability (certainty):

$\Pr(A) = \Pr\big(A \in Q(P)\big)$, i.e., the probability that A is obtained by applying Q to a random document of P
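If the probability space were given explicitly (which, as noted above, is unrealistic in practice), Pr(A) could be computed by summing Pr(d) over the documents d whose query result contains A. In this sketch, documents are simplified to a hypothetical mapping from region identifiers to the sets of objects they contain. The three worlds, their probabilities, and the query are made up.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Doc = Dict[str, Set[str]]   # hypothetical simplified document: region -> objects in it


def query(doc: Doc) -> Set[str]:
    """Hypothetical Q: regions that include both a road and a factory building."""
    return {region for region, objs in doc.items() if {"road", "factory"} <= objs}


def answer_probabilities(worlds: List[Tuple[Doc, float]],
                         q: Callable[[Doc], Set[str]]) -> Dict[str, float]:
    """Pr(A) = sum of Pr(d) over the documents d with A in q(d)."""
    assert abs(sum(p for _, p in worlds) - 1.0) < 1e-9   # the Pr(d) sum to 1
    probs: Dict[str, float] = defaultdict(float)
    for doc, p in worlds:
        for a in q(doc):
            probs[a] += p
    return dict(probs)


worlds = [
    ({"r1": {"road", "factory"}, "r2": {"road"}}, 0.5),
    ({"r1": {"road"}, "r2": {"road", "factory"}}, 0.3),
    ({"r1": {"road", "factory"}, "r2": {"road", "factory"}}, 0.2),
]
result = answer_probabilities(worlds, query)
print({a: round(p, 3) for a, p in result.items()})   # {'r1': 0.7, 'r2': 0.5}
```

The answer probabilities add up to 1.2. Unlike the Pr(d), they are probabilities of different, overlapping events, so they need not sum to 1.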

SLIDE 29

The Prob. of Satisfying a Boolean Query

When querying probabilistic data, each answer has a probability (certainty).

If B is a Boolean pattern, we are interested in $\Pr\big(B(P) = \text{true}\big)$, i.e., the probability that there is a match of B in a random document of P

SLIDE 30

Talk Overview

SLIDE 31

Computational Problems

Boolean Queries:
  • Input: a prob. document P, a Boolean twig query B
  • Goal: compute Pr(B(P)=true)

Non-Boolean Queries:
  • Input: a prob. document P, a non-Boolean twig query Q, a threshold p≥0
  • Goal: find all answers A, s.t. Pr(A∈Q(P)) ≥ p

SLIDE 32

From Regular to Boolean Queries

We apply a standard reduction from regular queries (that generate mappings) to Boolean ones:

  1. Compute the answers as if the document is ordinary (i.e., ignore the distributional nodes)
  2. Compute the probability of each answer

Step 2 is done by evaluating a Boolean query, that is, by computing the probability that there is a match yielding the answer. Next, we consider the evaluation of Boolean queries.
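The two steps can be sketched as follows. The flat model (each region lists objects that are present independently), the region names and probabilities, the threshold, and the closed-form Boolean evaluator are all hypothetical simplifications. The evaluator is valid only because, in this model, the road event and the factory event involve disjoint sets of independent objects. In a ProTDB document, Step 2 needs the Boolean-query algorithm of the following slides.

```python
from typing import Dict, List, Tuple

# Hypothetical flat model: region -> list of (tag, probability that the object is present);
# all objects are present independently.
P: Dict[str, List[Tuple[str, float]]] = {
    "r1": [("road", 0.8), ("factory", 0.5), ("factory", 0.5)],
    "r2": [("road", 0.9), ("house", 0.5)],
    "r3": [("road", 0.2), ("factory", 0.3)],
}


def step1_candidates(p) -> List[str]:
    """Step 1: answers as if the document were ordinary (every uncertain object kept)."""
    return [r for r, objs in p.items() if {"road", "factory"} <= {t for t, _ in objs}]


def prob_some(objs, tag: str) -> float:
    """Pr(at least one object with this tag is present)."""
    none = 1.0
    for t, q in objs:
        if t == tag:
            none *= 1.0 - q
    return 1.0 - none


def step2_boolean(p, region: str) -> float:
    """Step 2: Pr(the Boolean query "this region has a road and a factory" is true)."""
    objs = p[region]
    return prob_some(objs, "road") * prob_some(objs, "factory")


threshold = 0.5
probs = {r: step2_boolean(P, r) for r in step1_candidates(P)}
print({r: round(q, 3) for r, q in probs.items() if q >= threshold})   # {'r1': 0.6}
```

Region r2 is eliminated in Step 1 (no factory at all). Region r3 survives Step 1 but has probability only 0.2 × 0.3 = 0.06, so it falls below the threshold.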

SLIDE 33

An Example

[Figure: a twig query Q and a probabilistic document P (rooted at r) over the labels a, b, c, d and e]

SLIDE 34

Possible Matches

[Figure: Q and P, shown three times, each time with a different possible match of Q in P]

  • Matches are not disjoint events
  • Matches are not independent events

SLIDE 35

Our Approach: Dynamic Programming

Document nodes are traversed bottom-up

[Figure: Q and P, with the probabilities computed at visited nodes (0.0, 0.6, 0.0, 0.4, 0.0, 1.0)]

When visiting a node, evaluate a collection of queries (inc. the original one) over its subtree

SLIDE 36

Our Approach: Dynamic Programming

Document nodes are traversed bottom-up

[Figure: Q and P]

When visiting a node, evaluate a collection of queries (inc. the original one) over its subtree

Special treatment if the visited node is distributional

SLIDE 37

Bottom-Up Evaluation

[Figure: Q and P]

  • Problem: each specific match can involve several different children. How can we compute the probability that there is a match, based on previous results for the descendants?

SLIDE 38

From Twig to Negated Branches

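Let T_1, …, T_k be the branches of the twig T (each branch consists of the root of T together with the subtree of one of its children), and let Pr(T) denote the probability that T has a match. The twig is equivalent to the conjunction of its branches, so

$$T \equiv \bigwedge_{i=1}^{k} T_i \qquad\Longrightarrow\qquad \Pr(T) = 1 - \Pr\Big(\bigvee_{i=1}^{k} \neg T_i\Big)$$

Next: how to compute this value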

SLIDE 39

From a Disjunction to Conjunctions

By the principle of inclusion & exclusion:

$$\Pr\Big(\bigvee_{i=1}^{k} \neg T_i\Big) = \sum_{\emptyset \neq S \subseteq \{1,\dots,k\}} (-1)^{|S|+1}\, \Pr\Big(\bigwedge_{i \in S} \neg T_i\Big)$$

[Figure: the terms of this sum for a twig whose branches lead to b, c, * and *]

Next: how to compute a term $\Pr\big(\bigwedge_{i \in S} \neg T_i\big)$
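Inclusion-exclusion replaces the probability of a disjunction by an alternating sum of probabilities of conjunctions. The following check is a sketch on a hypothetical space of three independent coins (standing for three uncertain document nodes) with three overlapping events. It compares the alternating sum with direct enumeration.

```python
from itertools import combinations, product

probs = [0.5, 0.6, 0.7]                      # hypothetical independent coins
worlds = list(product([False, True], repeat=len(probs)))


def pr(event) -> float:
    """Probability of an event (a predicate on worlds) by enumeration."""
    total = 0.0
    for w in worlds:
        if event(w):
            weight = 1.0
            for bit, q in zip(w, probs):
                weight *= q if bit else 1.0 - q
            total += weight
    return total


# Overlapping events, e.g. "twig branch i is not matched"; they are neither
# disjoint nor independent.
events = [lambda w: not w[0],
          lambda w: not (w[0] and w[1]),
          lambda w: not w[2]]

# Pr(E1 or ... or Ek) = sum over nonempty S of (-1)^(|S|+1) Pr(AND of E_i, i in S)
incl_excl = sum((-1) ** (k + 1) * pr(lambda w, S=S: all(events[i](w) for i in S))
                for k in range(1, len(events) + 1)
                for S in combinations(range(len(events)), k))
direct = pr(lambda w: any(e(w) for e in events))
print(round(incl_excl, 6), round(direct, 6))   # 0.79 0.79
```

The union here equals "not all three coins are heads", so both values are 1 − 0.5 × 0.6 × 0.7 = 0.79.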

SLIDE 40

From a Document to Branches

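A document (with root r) satisfies a conjunction of negated twig branches iff each of the doc. branches satisfies the conjunction.

Good news: document branches are independent! Hence, for the document branches D_1, …, D_m of r:

$$\Pr\Big(\bigwedge_{i \in S} \neg T_i\Big) = \prod_{j=1}^{m} \Pr\Big(D_j \models \bigwedge_{i \in S} \neg T_i\Big)$$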

SLIDE 41

Using Previous Computations on Children

Cut the roots from both the twig branches and the doc. branches. Let T'_i be the subtwig of T_i below the root, and let d_j be the child of r in D_j. Then each factor of the product depends only on the subtree of d_j:

$$\Pr\Big(D_j \models \bigwedge_{i \in S} \neg T_i\Big) = \Pr\Big(d_j \models \bigwedge_{i \in S} \neg T'_i\Big)$$

That is, the factor is the probability that none of the T'_i (i ∈ S) has a match mapping its root to d_j. This value was already computed when d_j was visited.

[Figure: the factors for the twig branches b, c and * multiplied (×) over the doc. branches]
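To make the recursion concrete, here is a compact sketch for a deliberately restricted setting. It is not the full algorithm of the paper. We assume the twig has only child edges and tag-equality predicates. We also assume every non-root document node hangs below its own independent distributional node, so it is present with probability p whenever its parent is present. The classes `Doc` and `Twig` and the example are hypothetical.

For a document node v and a set U of subtwigs, g(v, U) is the probability that none of the subtwigs in U has a match rooted at v. It is computed from the children's values using the twig-to-branches rewriting, inclusion-exclusion, and the product over independent document branches. The result is checked against brute-force enumeration.

```python
from dataclasses import dataclass, field
from itertools import combinations, product
from typing import FrozenSet, List


@dataclass(eq=False)          # eq=False keeps identity hashing, so nodes can be dict/set keys
class Doc:
    tag: str
    p: float = 1.0            # Pr(present | parent present); the root uses 1.0
    children: List["Doc"] = field(default_factory=list)


@dataclass(eq=False)
class Twig:
    tag: str
    children: List["Twig"] = field(default_factory=list)    # child edges only


def subsets(items):
    items = list(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def prob_match(twig: Twig, root: Doc) -> float:
    """Pr(the twig has a match that maps root(twig) to root(doc)), bottom-up."""
    memo = {}

    def g(v: Doc, U: FrozenSet[Twig]) -> float:
        # Pr(no subtwig in U has a match rooted at v), given that v's parent is present
        if (v, U) not in memo:
            memo[(v, U)] = (1.0 - v.p) + v.p * h(v, U)
        return memo[(v, U)]

    def h(v: Doc, U: FrozenSet[Twig]) -> float:
        # Same, given that v is present
        live = [t for t in U if t.tag == v.tag]     # the other subtwigs cannot match at v
        some = 0.0                                  # Pr(some t in live matches at v)
        for R in subsets(live):
            if not R:
                continue
            branches = {c for t in R for c in t.children}
            # Pr(all branches matched) = sum over W of (-1)^|W| Pr(no branch in W matched);
            # the children of v are independent, hence the product.
            all_matched = 0.0
            for W in subsets(branches):
                prod = 1.0
                for x in v.children:
                    prod *= g(x, frozenset(W))
                all_matched += (-1) ** len(W) * prod
            some += (-1) ** (len(R) + 1) * all_matched
        return 1.0 - some

    return root.p * (1.0 - h(root, frozenset([twig])))


def brute_force(twig: Twig, root: Doc) -> float:
    """Enumerate all coin flips of the non-root nodes (for checking only)."""
    nodes, stack = [], list(root.children)
    while stack:
        n = stack.pop()
        nodes.append(n)
        stack.extend(n.children)

    def match(t: Twig, v: Doc, present) -> bool:
        return v.tag == t.tag and all(
            any(present[x] and match(c, x, present) for x in v.children)
            for c in t.children)

    total = 0.0
    for bits in product([False, True], repeat=len(nodes)):
        present = dict(zip(nodes, bits))
        weight = 1.0
        for n, b in present.items():
            weight *= n.p if b else 1.0 - n.p
        if match(twig, root, present):
            total += weight
    return total


doc = Doc("region", 1.0, [
    Doc("a", 0.5, [Doc("b", 0.6), Doc("c", 0.7)]),
    Doc("a", 0.4, [Doc("b", 0.9)]),
    Doc("c", 0.8),
])
twig = Twig("region", [Twig("a", [Twig("b")]), Twig("c")])
print(round(prob_match(twig, doc), 6), round(brute_force(twig, doc), 6))   # 0.4416 0.4416
```

Memoizing g on (node, set of subtwigs) means each document node is processed once per set of subtwigs. This mirrors the running time on the Efficiency slide: exponential in the query, but linear in the document.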

SLIDE 42

Descendant Edges

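  • In the computation we described, we assumed that the root has only child edges; it would not work otherwise!
  • What about descendant edges? The corresponding twig branches are replaced. If the subtwig S hangs below the root r by a descendant edge, then

$$\neg\,(r /\!/ S) \;\equiv\; \neg\,(r / S) \;\wedge\; \neg\,(r / {*} /\!/ S)$$

where / is a child edge, // is a descendant edge, and * is a node that matches anything.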

SLIDE 43

Missing Details

  • Creating the list of twigs that are evaluated over the subtree rooted at each visited node
  • Different evaluation methods, depending on the type of the visited node:
    – Ordinary node (sketched in the previous slides)
    – Distributional node: independent distribution; mutually-exclusive distribution
  • Dealing with node predicates of the twig

All the details of the algorithm are in the paper.
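For the two kinds of distributional nodes, one natural way to extend the recursion follows directly from the instance-generation process. This is our own derivation, offered as a hedged sketch; the paper's exact treatment may differ. Let G(u, W) be the probability that none of the subtwigs in the set W has a match rooted at an ordinary node that u contributes to its closest ordinary ancestor (for an ordinary node u, that node is u itself). Let q_1, …, q_m be the probabilities of u's outgoing edges to x_1, …, x_m:

```latex
\begin{aligned}
G(u, W) &= \prod_{i=1}^{m} \bigl( (1 - q_i) + q_i \, G(x_i, W) \bigr)
  && \text{independent } u\\
G(u, W) &= \Bigl(1 - \sum_{i=1}^{m} q_i\Bigr) + \sum_{i=1}^{m} q_i \, G(x_i, W)
  && \text{mutually exclusive } u\\
\Pr\Bigl(\textstyle\bigwedge_{c \in W} \text{no child of } v \text{ matches } c\Bigr)
  &= \prod_{x \in \mathrm{children}(v)} G(x, W)
  && \text{ordinary } v
\end{aligned}
```

The first line uses the independent choice of children. The second uses the fact that at most one child is chosen, with no child chosen with probability 1 − Σ q_i.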

SLIDE 44

Efficiency

The algorithm computes Pr(B(P)=true) in time O(c^{|B|}·|P|), for some constant c

Is there an efficient algorithm under query-and-data complexity (polynomial in the query also)?

No! Computing Pr(B(P)=true) is #P-complete under query & data complexity! Even if:
  • No desc. edges
  • Only independent distributions
  • . . .

SLIDE 45

Talk Overview

SLIDE 46

Standard Terminology

  • T0: a subtree of twig T that includes the root
  • A match m0 of T0 is a partial match of T
  • m2 subsumes m1 if m2 includes the mappings of m1, that is, m1 = m2 over domain(m1)

[Figure: a twig T with a subtree T0, and a document (rooted at r) with two partial matches m1 and m2, where m2 subsumes m1]

SLIDE 47

Maximal Answer: Definition

Ordinary data: m is a maximal answer if
  • ∄ m0 such that m0 ≠ m and m0 subsumes m

Probabilistic data: m is a maximal answer if
  • Pr(m) ≥ threshold, and
  • ∀ m0, if m0 ≠ m and m0 subsumes m, then Pr(m0) < threshold

In other words, m is maximal among the partial answers with a sufficient probability.
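The definition can be checked mechanically. This sketch relies on several assumptions. It reuses the objects of the aerial-photo example (factory bldg. w/ antennas 50%, road 90%, heliport 80%, track 20%) and treats them as present independently. It represents partial matches as Python dicts from twig nodes to hypothetical document node names. It also simply enumerates all partial matches, which is exponential and is not the paper's incremental-polynomial algorithm.

```python
from itertools import combinations
from typing import Dict, List

# Hypothetical twig: a region node with four object children. Object probabilities are
# taken from the aerial-photo example and assumed independent; the region is certain.
obj_prob = {"factory": 0.5, "road": 0.9, "heliport": 0.8, "track": 0.2}
Match = Dict[str, str]          # twig node -> document node


def pr(m: Match) -> float:
    """Pr(m): probability that all document nodes used by m are present."""
    p = 1.0
    for twig_node in m:
        if twig_node != "region":
            p *= obj_prob[twig_node]
    return p


def subsumes(m2: Match, m1: Match) -> bool:
    """m2 subsumes m1 iff m1 = m2 over domain(m1)."""
    return all(k in m2 and m2[k] == v for k, v in m1.items())


def maximal(matches: List[Match], threshold: float) -> List[Match]:
    """Matches with Pr >= threshold that no other sufficiently probable match subsumes."""
    return [m for m in matches
            if pr(m) >= threshold
            and all(pr(m0) < threshold for m0 in matches
                    if m0 != m and subsumes(m0, m))]


# Partial matches: images of the subtrees of the twig that include the root.
objects = list(obj_prob)
partial = [{"region": "r", **{o: o + "1" for o in subset}}
           for k in range(len(objects) + 1) for subset in combinations(objects, k)]

print(maximal(partial, 0.3))
# [{'region': 'r', 'factory': 'factory1', 'road': 'road1', 'heliport': 'heliport1'}]
```

With threshold 0.3, the only maximal match is the 36% partial match without the track. Raising the threshold to 0.4 breaks it into three smaller maximal matches: {factory, road}, {factory, heliport} and {road, heliport}.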

SLIDE 48

The Computational Problem

  • Input: a probabilistic document P, a twig pattern T, a threshold p≥0
  • Goal: find all maximal matches of T in P w.r.t. p

SLIDE 49

Complexity of Finding Maximal Matches

  • It is trivial to show that maximal matches can be found efficiently under data complexity
  • Unlike the case of complete matches (NP-complete), maximal matches can be computed efficiently under query-and-data complexity

Evaluation Algorithm
  • The algorithm runs in incremental polynomial time
  • All the details are in the paper …

SLIDE 50

Talk Overview

SLIDE 51

Paper Summary

  • Query evaluation over probabilistic XML is investigated
    – Known data model
    – Twig patterns (node predicates, child & desc. edges)
    – Complete & maximal semantics, projection
  • Evaluation algorithm for Boolean queries
    – Also used for evaluating queries with projection
    – Efficient under data complexity
  • An algorithm for finding the maximal matches
    – Efficient under query-and-data complexity
  • Analysis of the complexity of querying prob. XML

SLIDE 52

Complexity Results

                          Complete semantics                            Maximal semantics
                          Boolean       w/o projection  w/ projection   w/o projection  w/ projection
Data complexity           Poly.         Poly.           Poly.           Poly.           Poly.
Query & data complexity   #P-complete   NP-complete     #P-complete     Inc. Poly.      Open

SLIDE 53

Other Models of Probabilistic XML

Query evaluation (complete semantics w/ projection):
  • ProTDB [Nierman and Jagadish, 2002] (our model): tractable
  • PXML [Hung, Getoor & Subrahmanian, 2003]: tree docs.: tractable; DAG docs.: #P-hard
  • Simple prob. trees [Abiteboul & Senellart, 2006]: tractable
  • Fuzzy trees [Abiteboul & Senellart, 2006]: #P-complete

The complexity results in the different prob. XML models are a part of our ongoing research.

SLIDE 54

Ongoing and Future Work

  • Implementing a system for representing and querying probabilistic XML
  • Optimization of the proposed algorithms
    – We already obtained significant improvements, both experimentally and analytically
  • Extending the expressiveness of the model of probabilistic XML
    – New types of distributional nodes
    – Ongoing work: a combination of ProTDB [Nierman and Jagadish, 2002] and PXML [Hung, Getoor & Subrahmanian, 2003]
  • Combining incompleteness and projection

SLIDE 55

Thank you!

Questions?