Feature Extraction with Description Logics
Functional Subsumption
Rodrigo de Salvo Braz, Dan Roth
University of Illinois at Urbana-Champaign
A conflict
- Most machine learning algorithms use feature vectors as inputs.
- Most data is best represented as structured data.
- Feature extraction is the conversion from structured data to feature vectors (and may be most of the work).
Structured data – I
Mohammed Atta met with an Iraqi intelligence agent in Prague in April 2001.
[Figure: concept graph for the sentence above. Nodes carry attributes (node labels) such as meeting, person, name(“Mohammed Atta”), gender(male), country, organization, city, date, name(Iraq), name(“Czech Republic”), name(Prague), month(April) and year(2001). Edges carry roles (edge labels) such as participant, location, time, nationality, affiliation, begin and end. The words of the sentence are nodes as well, e.g. word(an) tag(DT), word(Iraqi) tag(JJ), word(intelligence) tag(NN), linked by before and after roles.]
Structured data – II
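Feature Extraction
[Figure: a structured example is passed through feature extraction (FE), guided by human-written feature types, to produce a feature vector such as 0 1 1 0. The example is a graph around a node A, with attributes such as male, female, tall, age(40), name(john), name(margot), name(peter), name(mary), name(jill) and name(jenny), and roles spouse, friend, child and student. The feature types are small trees over the spouse, child and friend roles with attributes such as male, female and name(jenny).]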
Feature Extraction
- Typically done in an ad hoc fashion, which:
  - prevents general analysis;
  - prevents a unified analysis of feature extraction and learning (e.g. kernels).
- Using a language is tricky:
  - type of inference;
  - may be intractable if not careful.
A language for declaring which features to generate
Specifications by directed trees.
[Figure: a feature type next to the example segment it selects. Both are trees rooted at meeting, with roles such as participant, time, location, nationality and affiliation, and attributes such as person, country, organization, city, date, name(Iraq), name(Prague), month(April) and year(2001).]
Generating feature vectors
[Figure: the example graph around node A (attributes male, female, tall, age(40), name(john), name(margot), name(peter), name(mary), name(jill), name(jenny); roles spouse, friend, child, student) and four feature types, each a tree over spouse and child roles with attributes male, female and tall. The four feature values are still unknown: ? ? ? ?]
Generating feature vectors
[Figure, shown in three animation steps: the first feature type, a tree with a spouse edge and a child edge, has its nodes numbered 1, 2, 3, and the same numbers mark the example nodes they are mapped onto. Feature value: 1.]
Generating feature vectors
[Figure: the second feature type asks for a child that is both male and tall. Nothing like this child in the example! Feature value: 0.]
Generating feature vectors
[Figure: the third feature type, with one male child and one female child, has its nodes numbered 1, 2, 3 and is mapped onto example nodes carrying the same numbers. Feature value: 1.]
Generating feature vectors
[Figure: the fourth feature type, with one tall child and one female child, has its nodes numbered 1, 2, 3. In the example, a single child node is labeled "2, 3", so both child nodes of the feature type are mapped onto it. Feature value: 1.]
Generating feature vectors
[Figure: the example and the four feature types, now with the resulting feature vector: 1 0 1 1.]
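The matching shown on these slides can be sketched in Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the dictionary encoding of the graph, the names `graph`, `matches` and `feature_vector`, and the particular nodes and attributes are hypothetical, chosen so that the four feature types yield the 1 0 1 1 vector from the slide.

```python
# Hypothetical example graph: node -> (attributes, outgoing (role, target) edges).
graph = {
    "A": ({"male", "name(john)"},
          [("spouse", "margot"), ("child", "peter"),
           ("child", "mary"), ("child", "jenny")]),
    "margot": ({"female", "name(margot)"}, []),
    "peter": ({"male", "name(peter)"}, []),
    "mary": ({"female", "name(mary)"}, []),
    "jenny": ({"female", "tall", "name(jenny)"}, []),
}

# A feature type is a tree: (required attributes, [(role, sub-tree), ...]).
LEAF = (set(), [])
FEATURE_TYPES = [
    (set(), [("spouse", LEAF), ("child", LEAF)]),
    (set(), [("child", ({"male", "tall"}, []))]),
    (set(), [("child", ({"male"}, [])), ("child", ({"female"}, []))]),
    (set(), [("child", ({"tall"}, [])), ("child", ({"female"}, []))]),
]


def matches(ftype, graph, node):
    """True if the feature-type tree can be mapped onto the graph at `node`."""
    attrs, branches = ftype
    node_attrs, edges = graph[node]
    if not attrs <= node_attrs:
        return False
    # Each branch needs some edge with the same role whose target matches.
    # Two branches may map onto the same example node.
    return all(
        any(r == role and matches(sub, graph, target) for r, target in edges)
        for role, sub in branches
    )


def feature_vector(ftypes, graph, root):
    """One 0/1 entry per feature type, evaluated at the node `root`."""
    return [int(matches(ft, graph, root)) for ft in ftypes]


print(feature_vector(FEATURE_TYPES, graph, "A"))  # [1, 0, 1, 1]
```

The fourth feature type is satisfied by a single child that is both tall and female, which corresponds to the slide where one example node carries the label "2, 3".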
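Feature Description Logics
Feature types drawn as trees are written as descriptions:
- a spouse edge plus a child edge to a node that is male and tall: (AND (SOME spouse ANY) (SOME child (AND male tall)))
- a spouse edge followed by a friend edge to a female node: (SOME spouse (SOME friend female))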
Subsumption
- A description C subsumes (⊇) a description D if every individual in D must be in C, no matter the interpretation.
- Subsumption is tractable.
Example, where C ⊇ D:
C = (AND (SOME spouse ANY) (SOME child male))
D = (AND (SOME spouse (SOME student ANY)) (SOME child (AND tall male)) (SOME child female))
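A structural check for the C and D on this slide could look as follows. This is a hedged sketch rather than the authors' algorithm: it assumes descriptions built only from attributes, ANY, AND and SOME, encodes them as nested tuples, and the names `conjuncts` and `subsumes` are hypothetical.

```python
ANY = "ANY"


def conjuncts(desc):
    """Flatten a description into the list of its top-level conjuncts."""
    if desc == ANY:
        return []
    if isinstance(desc, tuple) and desc[0] == "AND":
        parts = []
        for part in desc[1:]:
            parts.extend(conjuncts(part))
        return parts
    return [desc]  # an attribute or a ("SOME", role, filler) tuple


def subsumes(c, d):
    """True if every requirement of c is met by some conjunct of d."""
    d_parts = conjuncts(d)
    for need in conjuncts(c):
        if isinstance(need, tuple):  # ("SOME", role, filler)
            _, role, filler = need
            ok = any(
                isinstance(have, tuple)
                and have[1] == role
                and subsumes(filler, have[2])
                for have in d_parts
            )
        else:  # plain attribute: compared by equality
            ok = need in d_parts
        if not ok:
            return False
    return True


C = ("AND", ("SOME", "spouse", ANY), ("SOME", "child", "male"))
D = ("AND",
     ("SOME", "spouse", ("SOME", "student", ANY)),
     ("SOME", "child", ("AND", "tall", "male")),
     ("SOME", "child", "female"))

print(subsumes(C, D))  # True
print(subsumes(D, C))  # False
```

Each requirement of C is looked up among the conjuncts of D, recursing into the fillers of SOME, so the work is bounded by the product of the two description sizes.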
Feature extraction as subsumption
A feature type is active at a node of the example when it subsumes (⊇) the description of that node.
Feature type: (SOME child female)
[Figure: the example is a graph whose nodes carry name(carol), name(kelly) together with female, and name(john), connected by one friend edge and two child edges. In successive animation steps the feature type is compared with the description of each node:]
- name(john)
- (AND name(kelly) female)
- (AND name(carol) (SOME child (AND name(kelly) female))): the feature type subsumes this description, so this is an active feature!
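The "description of node" used in this comparison can be obtained by unfolding the example graph from that node. The sketch below is an assumption-laden illustration: the dictionary encoding, the function name `describe`, and the depth limit (which keeps the unfolding finite if the graph has cycles) are all hypothetical, and the friend edge of the figure is left out for brevity.

```python
ANY = "ANY"

# Hypothetical encoding: node -> (attributes, outgoing (role, target) edges).
graph = {
    "carol": (["name(carol)"], [("child", "kelly"), ("child", "john")]),
    "kelly": (["name(kelly)", "female"], []),
    "john": (["name(john)"], []),
}


def describe(graph, node, depth):
    """Unfold the graph from `node` into a description, following edges
    for at most `depth` steps."""
    attrs, edges = graph[node]
    parts = list(attrs)
    if depth > 0:
        for role, target in edges:
            parts.append(("SOME", role, describe(graph, target, depth - 1)))
    if not parts:
        return ANY
    if len(parts) == 1:
        return parts[0]
    return ("AND", *parts)


print(describe(graph, "john", 1))
# name(john)
print(describe(graph, "kelly", 1))
# ('AND', 'name(kelly)', 'female')
print(describe(graph, "carol", 1))
# ('AND', 'name(carol)', ('SOME', 'child', ('AND', 'name(kelly)', 'female')),
#  ('SOME', 'child', 'name(john)'))
```

Only the description of the carol node contains a child that is female, which is why (SOME child female) is active there and nowhere else.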
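A problem in practice
[Figure: the feature type buy (subject: dentist; object: car) is compared (⊇) with the example purchase (subject: dentist, name(patricia); object: car, model(accord)).]
Subsumption would be natural in this case but does not occur, because buy and purchase are different attributes.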
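A problem in practice
[Figure: two kill descriptions with subject and object roles are compared (⊇). Their fillers use the attributes name(JFK), name(castro) and name(kennedy), so the same person appears under two different names.]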
A problem in practice
[Figure: two descriptions with job roles (governor, actor) are compared (⊇). One carries name(schwarzenegger), the other the misspelled name(schwarzneger).]
Make comparison more flexible
- At the core of the subsumption algorithm is the comparison of attributes:
    ... if (attr1 == attr2) ...
- We simply make that a function call:
    ... if (f(attr1, attr2) == 1) ...
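The change can be illustrated by passing the comparison function into a structural subsumption check. This is a sketch under assumptions, not the authors' code: the tuple encoding, the names `subsumes`, `exact` and `flexible`, the synonym table and the 0.9 similarity threshold are all hypothetical. The typo test uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

ANY = "ANY"


def conjuncts(desc):
    """Flatten a description into the list of its top-level conjuncts."""
    if desc == ANY:
        return []
    if isinstance(desc, tuple) and desc[0] == "AND":
        return [p for part in desc[1:] for p in conjuncts(part)]
    return [desc]


def subsumes(c, d, f):
    """Structural subsumption where attributes are compared through f."""
    d_parts = conjuncts(d)
    for need in conjuncts(c):
        if isinstance(need, tuple):  # ("SOME", role, filler)
            ok = any(
                isinstance(have, tuple)
                and have[1] == need[1]
                and subsumes(need[2], have[2], f)
                for have in d_parts
            )
        else:  # attribute: the only place where f is consulted
            ok = any(
                not isinstance(have, tuple) and f(need, have) == 1
                for have in d_parts
            )
        if not ok:
            return False
    return True


def exact(attr1, attr2):
    """The original comparison: attr1 == attr2."""
    return int(attr1 == attr2)


SYNONYMS = {("buy", "purchase")}  # hypothetical background knowledge


def flexible(attr1, attr2):
    """Accept equal attributes, listed synonyms and near-identical spellings."""
    if attr1 == attr2 or (attr1, attr2) in SYNONYMS:
        return 1
    return int(SequenceMatcher(None, attr1, attr2).ratio() >= 0.9)


feature = ("AND", "buy", ("SOME", "subject", "dentist"), ("SOME", "object", "car"))
example = ("AND", "purchase",
           ("SOME", "subject", ("AND", "dentist", "name(patricia)")),
           ("SOME", "object", ("AND", "car", "model(accord)")))

print(subsumes(feature, example, exact))     # False
print(subsumes(feature, example, flexible))  # True
print(flexible("name(schwarzenegger)", "name(schwarzneger)"))  # 1
```

Only the attribute comparison changes; roles are still compared by equality and the recursion is the same as in the exact case.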
Is this just a hack? What about the nice DL semantics?
In fact, it is equivalent to a “shallow OR” (tractable).
Replace any attr by (OR a_1 a_2 ... a_n) where f(attr, a_i) = 1. For example,
(AND kill (SOME object JFK))
becomes
(AND (OR kill murder assassinate) (SOME object (OR JFK kennedy “John F. Kennedy” ...)))
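The rewriting into a shallow OR can be sketched as follows, assuming a finite vocabulary of attributes is available to enumerate over. The names `expand`, `GROUPS`, `VOCABULARY` and the particular `f` are hypothetical and serve only to reproduce the example on this slide.

```python
# Hypothetical equivalence classes standing in for the knowledge inside f.
GROUPS = [
    {"kill", "murder", "assassinate"},
    {"JFK", "kennedy", "John F. Kennedy"},
]
VOCABULARY = ["kill", "murder", "assassinate",
              "JFK", "kennedy", "John F. Kennedy", "castro"]


def f(attr1, attr2):
    """1 if the two attributes are equal or belong to the same group."""
    return int(attr1 == attr2 or any(attr1 in g and attr2 in g for g in GROUPS))


def expand(desc, vocabulary, f):
    """Replace every attribute by (OR a_1 ... a_n) with f(attr, a_i) == 1."""
    if desc == "ANY":
        return desc
    if isinstance(desc, tuple):
        if desc[0] == "AND":
            return ("AND", *(expand(part, vocabulary, f) for part in desc[1:]))
        _, role, filler = desc  # ("SOME", role, filler)
        return ("SOME", role, expand(filler, vocabulary, f))
    return ("OR", *(a for a in vocabulary if f(desc, a) == 1))


feature = ("AND", "kill", ("SOME", "object", "JFK"))
print(expand(feature, VOCABULARY, f))
# ('AND', ('OR', 'kill', 'murder', 'assassinate'),
#  ('SOME', 'object', ('OR', 'JFK', 'kennedy', 'John F. Kennedy')))
```

The OR appears only directly around attributes and never around SOME or AND, which is what makes it shallow.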
Why not just use shallow OR then?
- The function is an implicit representation.
- We may incorporate procedural knowledge:
  - typos;
  - similar-sounding words;
  - context-sensitive knowledge.
Take-home message
- Feature Description Logics provides an expressive way to deal with structured examples.
- Syntax choices render it tractable.
- Allows for integrated FE-learning approaches like kernels (Cumby & Roth 2003).
- Can be made even more expressive with little extra cost by functional subsumption.
The End