
SLIDE 1

HANDLING UNCERTAINTY IN INFORMATION EXTRACTION

Maurice van Keulen and Mena Badieh Habib URSW 23 Oct 2011

SLIDE 2

23 Oct 2011 Uncertainty Reasoning for the Semantic Web, Bonn, Germany 2

INFORMATION EXTRACTION

Unstructured text -> Information extraction -> Web of Data

Inherently imperfect process

Word “Paris”: First name? City? …
City => over 60 cities named “Paris”
Toponyms: 46% have >2 refs (source: GeoNames)

Goal: Technology to support the development of domain-specific information extractors

“We humans happily deal with doubt and misinterpretation every day; why shouldn’t computers?”

SLIDE 3

Annotations are uncertain

Maintain alternatives + probabilities throughout process (incl. result)

Unconventional starting point

Not “no annotations”, but “no knowledge, hence anything is possible”

Developer interactively defines information extractor until “good enough”

Iterations: Add knowledge, apply to sample texts, evaluate result

Scalability for storage, querying, manipulation of annotations

From my own field (databases): Probabilistic databases?


SHERLOCK HOLMES-STYLE INFORMATION EXTRACTION

“When you have eliminated the impossible, whatever remains, however improbable, must be the truth”

Information extraction is about gathering enough evidence to decide upon a certain combination of annotations among many possible ones

Evidence comes from ML + developer (generic) + end user (instances)

SLIDE 4


SHERLOCK HOLMES-STYLE INFORMATION EXTRACTION

EXAMPLE: NAMED ENTITY RECOGNITION (NER)

Paris Hilton stayed in the Paris Hilton

[Figure: candidate phrases a1 … a28 over the sentence]

Entity types: Person, City, Toponym
Relations between types (interactively defined): dnc, dnc, isa

SLIDE 5

|A| = O(klt): linear?!?

k: length of string
l: maximum length of phrases considered
t: number of entity types

Here: 28 * 3 = 84 possible annotations

URSW call for papers: about 1300 words, say 20 types, say max length 6 (I saw one with 5)
= roughly 1300 * 20 * 6 = roughly 156,000 possible annotations
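The counts above can be reproduced with a short sketch. It assumes whitespace tokenization and treats every contiguous phrase of at most `max_len` words as a candidate; the function names `candidate_phrases` and `count_annotations` are illustrative and do not come from the talk.

```python
def candidate_phrases(tokens, max_len):
    """Return (start, end) index pairs for every contiguous phrase of 1..max_len tokens."""
    k = len(tokens)
    return [(i, i + n) for n in range(1, max_len + 1) for i in range(k - n + 1)]


def count_annotations(k, max_len, num_types):
    """Exact number of (phrase, type) pairs; never more than k * max_len * num_types."""
    phrases = sum(k - n + 1 for n in range(1, min(max_len, k) + 1))
    return phrases * num_types


tokens = "Paris Hilton stayed in the Paris Hilton".split()
phrases = candidate_phrases(tokens, max_len=len(tokens))
example_count = len(phrases) * 3            # 3 entity types: Person, City, Toponym
cfp_exact = count_annotations(1300, 6, 20)  # exact count for the call-for-papers estimate
cfp_bound = 1300 * 6 * 20                   # the k * l * t upper bound from the slide

print(len(phrases), example_count)  # 28 84
print(cfp_exact, cfp_bound)         # 155700 156000
```

The exact count is slightly below k * l * t because phrases cannot run past the end of the text, which is why the slide's figure is best read as an upper bound.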


SHERLOCK HOLMES-STYLE INFORMATION EXTRACTION

EXAMPLE: NAMED ENTITY RECOGNITION (NER)

Paris Hilton stayed in the Paris Hilton


Although conceptual/theoretical, this does not seem to be a severe challenge for a probabilistic database. The problem is not in the number of alternative annotations!

SLIDE 6


ADDING KNOWLEDGE = CONDITIONING

Paris Hilton stayed in the Paris Hilton

Person --- dnc --- City

[a] x_1^2: “Paris” is a City
[b] x_8^1: “Paris Hilton” is a Person
a and b become mutually exclusive

a and b independent, P(a)=0.6, P(b)=0.8:

  a∧b    a∧¬b   b∧¬a   ∅
  0.48   0.12   0.32   0.08

a and b mutually exclusive (a∧b is not possible):

  a∧¬b   b∧¬a   ∅
  0.23   0.62   0.15
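The conditioning step can be checked with a small sketch: enumerate the possible worlds of the two independent annotations, drop the world where both hold, and renormalize. The helper names `possible_worlds` and `condition` are assumptions made for illustration, and explicit enumeration is only practical for a handful of variables.

```python
from itertools import product


def possible_worlds(marginals):
    """Joint distribution over truth assignments of independent boolean variables."""
    names = list(marginals)
    worlds = {}
    for values in product([True, False], repeat=len(names)):
        p = 1.0
        for name, value in zip(names, values):
            p *= marginals[name] if value else 1.0 - marginals[name]
        worlds[values] = p
    return names, worlds


def condition(worlds, allowed):
    """Drop worlds that violate `allowed` and renormalize the remaining mass."""
    kept = {w: p for w, p in worlds.items() if allowed(w)}
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}


# a: "Paris" is a City, b: "Paris Hilton" is a Person (priors from the slide)
names, prior = possible_worlds({"a": 0.6, "b": 0.8})
# Knowledge added: a and b cannot both hold
posterior = condition(prior, lambda w: not (w[0] and w[1]))

for world, p in posterior.items():
    print(dict(zip(names, world)), round(p, 2))
```

The eliminated world a∧b carried 0.48 of the mass, so the remaining three worlds are each divided by 0.52, which gives the 0.23, 0.62 and 0.15 shown above.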

SLIDE 7


ADDING KNOWLEDGE CREATES DEPENDENCIES

NUMBER OF DEPENDENCIES IS ORDERS OF MAGNITUDE SMALLER THAN NUMBER OF POSSIBLE COMBINATIONS

[Figure: “Paris Hilton stayed” with Person and City annotations linked by dnc and neq dependencies; counts 8 + 8 + 15]

SLIDE 8

I’m looking for a scalable approach to reason about and redistribute probability mass, considering all these dependencies, to find the remaining possible interpretations and their probabilities

Feasibility of the approach hinges on efficient representation and conditioning of probabilistic dependencies

Solution directions (in my own field):
Koch et al., VLDB 2008 (Conditioning in MayBMS)
Getoor et al., VLDB 2008 (Shared correlations)
This is not about only learning a joint probability distribution.

Here I’d like to estimate a joint probability distribution based on initial independent observations and then batch-by-batch add constraints/dependencies and recalculate
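As a naive baseline for this workflow, the sketch below keeps an explicit set of possible worlds, starts from independent priors, and conditions on one batch of constraints at a time. It is not the MayBMS or shared-correlations technique cited above, and it is exponential in the number of variables, so it only illustrates the intended behavior. The class `WorldSet`, the variable `c` with its prior of 0.5, and the second batch of constraints are hypothetical.

```python
from itertools import product


class WorldSet:
    """Explicit set of possible worlds; exponential in the number of variables."""

    def __init__(self, priors):
        self.names = list(priors)
        self.worlds = {}
        for values in product([True, False], repeat=len(self.names)):
            p = 1.0
            for name, value in zip(self.names, values):
                p *= priors[name] if value else 1.0 - priors[name]
            self.worlds[values] = p

    def add_constraints(self, constraints):
        """Condition on a batch of constraints, each a predicate over a world dict."""
        kept = {}
        for values, p in self.worlds.items():
            world = dict(zip(self.names, values))
            if all(check(world) for check in constraints):
                kept[values] = p
        total = sum(kept.values())
        if total == 0.0:
            raise ValueError("constraints eliminate every possible world")
        self.worlds = {values: p / total for values, p in kept.items()}

    def marginal(self, name):
        """Probability that the named variable is true in the current world set."""
        i = self.names.index(name)
        return sum(p for values, p in self.worlds.items() if values[i])


# Priors for a and b come from the slides; c and its prior are made up for illustration.
ws = WorldSet({"a": 0.6, "b": 0.8, "c": 0.5})

# Batch 1 (from the slides): a and b are mutually exclusive.
ws.add_constraints([lambda w: not (w["a"] and w["b"])])
after_batch_1 = {name: round(ws.marginal(name), 2) for name in ws.names}

# Batch 2 (hypothetical): at least one of a, b must hold.
ws.add_constraints([lambda w: w["a"] or w["b"]])
after_batch_2 = {name: round(ws.marginal(name), 2) for name in ws.names}

print(after_batch_1)
print(after_batch_2)
```

Each batch only removes worlds and renormalizes, so the marginal of `c`, which no constraint mentions, stays at its prior while the marginals of `a` and `b` shift.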

Techniques out there that fit this problem?


PROBLEM AND SOLUTION DIRECTIONS

Questions / Suggestions?