
Convex relaxations for weakly supervised information extraction - PowerPoint PPT Presentation



  1. Convex relaxations for weakly supervised information extraction Édouard Grave Columbia University edouard.grave@gmail.com

  2. Information Extraction Extract structured information from unstructured documents.

  3. Information Extraction Extract structured information from unstructured documents.

  4. Example: named entity recognition Detect and classify mentions of named entities in text. The seven-month re-examination of why U.S. forces were caught off-guard by the Japanese attack was done at the request of Sen. Strom Thurmond, R-S.C., chairman of the Senate Armed Services Committee, and members of the Kimmel family.

  5. Example: named entity recognition Detect and classify mentions of named entities in text. The seven-month re-examination of why [U.S.] LOC forces were caught off-guard by the Japanese attack was done at the request of Sen. [Strom Thurmond] PER , R-[S.C.] LOC , chairman of the [Senate Armed Services Committee] ORG , and members of the [Kimmel] PER family. Traditionally, detect mentions of • people ( PER ), • locations ( LOC ), • organizations ( ORG ).

  6. Example: named entity recognition Detect and classify mentions of named entities in text. The seven-month re-examination of why [U.S.] LOC forces were caught off-guard by the Japanese attack was done at the request of Sen. [Strom Thurmond] PER , R-[S.C.] LOC , chairman of the [Senate Armed Services Committee] ORG , and members of the [Kimmel] PER family. Named entities can also be: • genes, cells, proteins, etc. • books, movies, games, etc. • laptops, phones, cameras, etc.

  7. Example: entity linking Link an entity mention (e.g. Michael Jordan ) to a knowledge base

  8. Example: relation extraction Extract binary relations between named entities from text During World War II, Turing worked for the Government Code and Cypher School (GC&CS) at Bletchley Park.

  9. Example: relation extraction Extract binary relations between named entities from text During World War II, Turing worked for the Government Code and Cypher School (GC&CS) at Bletchley Park.

  10. Example: relation extraction Extract binary relations between named entities from text During World War II, Turing worked for the Government Code and Cypher School (GC&CS) at Bletchley Park. Employee(Alan Turing, GC&CS) Contains(Bletchley Park, GC&CS)

  11. Challenges of information extraction Most state-of-the-art methods: supervised machine learning.

  12. Challenges of information extraction Most state-of-the-art methods: supervised machine learning. • Needs (a lot of) labeled data: • expensive to obtain (need expertise), • thousands of different kinds of entities / relations, • resources for English. But French? Spanish? Russian?

  13. Challenges of information extraction Most state-of-the-art methods: supervised machine learning. • Needs (a lot of) labeled data: • expensive to obtain (need expertise), • thousands of different kinds of entities / relations, • resources for English. But French? Spanish? Russian? • Not robust to domain shift: Our distribution agreement with [Henry Schein] PER renews annually unless terminated by either party.

  14. I. Relation extraction

  15. Distant supervision for relation extraction Craven and Kumlien (1999); Mintz et al. (2009) Knowledge base (r, e1, e2): (BornIn, Lichtenstein, New York City); (DiedIn, Lichtenstein, New York City). Sentences: “Roy Lichtenstein was born in New York City, into an upper-middle-class family.” “In 1961, Leo Castelli started displaying Lichtenstein’s work at his gallery in New York.” “Roy Lichtenstein died of pneumonia in 1997 in New York City.”

  16. Distant supervision for relation extraction Craven and Kumlien (1999); Mintz et al. (2009) Knowledge base (r, e1, e2): (BornIn, Lichtenstein, New York City); (DiedIn, Lichtenstein, New York City). Sentences: “Roy Lichtenstein was born in New York City, into an upper-middle-class family.” “In 1961, Leo Castelli started displaying Lichtenstein’s work at his gallery in New York.” “Roy Lichtenstein died of pneumonia in 1997 in New York City.”

  17. Distant supervision for relation extraction Craven and Kumlien (1999); Mintz et al. (2009) Knowledge base (r, e1, e2): (BornIn, Lichtenstein, New York City); (DiedIn, Lichtenstein, New York City). Sentences with latent labels: “Roy Lichtenstein was born in New York City, into an upper-middle-class family.” → BornIn; “In 1961, Leo Castelli started displaying Lichtenstein’s work at his gallery in New York.” → None; “Roy Lichtenstein died of pneumonia in 1997 in New York City.” → DiedIn.

  18. Multiple instance, multiple label learning Bunescu and Mooney (2007); Riedel et al. (2010); Hoffmann et al. (2011); Surdeanu et al. (2012) Entity pair (Lichtenstein, New York City) with candidate labels BornIn and DiedIn; pair mentions: “Roy Lichtenstein was born in New York City.” and “Lichtenstein left New York to study in Ohio.”

  19. Multiple instance, multiple label learning Bunescu and Mooney (2007); Riedel et al. (2010); Hoffmann et al. (2011); Surdeanu et al. (2012) Notation: N pair mentions represented by vectors x_n; I entity pairs p_i; K relations. E_in = 1 if pair mention n corresponds to entity pair i; R_ik = 1 if entity pair i verifies relation k.
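A tiny concrete instance of the matrices E and R may help; the sketch below encodes the running Lichtenstein example (one entity pair, three pair mentions, K = 2 relations), with all variable names being illustrative rather than from the talk:

```python
import numpy as np

# One entity pair (Lichtenstein, New York City), three pair mentions, K = 2 relations.
relations = ["BornIn", "DiedIn"]

# E[i, n] = 1 if pair mention n corresponds to entity pair i (here I = 1, N = 3).
E = np.array([[1, 1, 1]])

# R[i, k] = 1 if entity pair i verifies relation k: the knowledge base asserts
# both BornIn and DiedIn for this pair.
R = np.array([[1, 1]])

mentions_per_pair = E.sum(axis=1)  # how many pair mentions each entity pair has
```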

  20. Overview Two-step procedure: 1 infer labels for each pair mention; 2 train a supervised instance-level relation extractor. Goal: infer a binary matrix Y such that: • Y_nk = 1 if pair mention n expresses relation k; • Y_nk = 0 otherwise. Approach based on discriminative clustering.

  21. (a) Discriminative clustering

  22. Discriminative clustering Xu et al. (2004); Bach and Harchaoui (2007)

  23. Discriminative clustering Xu et al. (2004); Bach and Harchaoui (2007)

  24. Discriminative clustering Xu et al. (2004); Bach and Harchaoui (2007)

  25. Discriminative clustering Xu et al. (2004); Bach and Harchaoui (2007) Given a loss function ℓ and a regularizer Ω: min_Y min_f Σ_{n=1}^{N} ℓ(y_n, f(x_n)) + Ω(f), s.t. Y ∈ 𝒴.

  26. (b) Weak supervision by constraining Y

  27. Weak supervision by constraining Y Each pair mention expresses exactly one relation:

  28. Weak supervision by constraining Y Each pair mention expresses exactly one relation: ∀ n ∈ {1, ..., N}, Σ_{k=1}^{K+1} Y_nk = 1.

  29. Weak supervision by constraining Y If entity pair i verifies relation k, then at least one pair mention n corresponding to the pair i expresses that relation:

  30. Weak supervision by constraining Y If entity pair i verifies relation k, then at least one pair mention n corresponding to the pair i expresses that relation: ∀ (i, k) such that R_ik = 1, Σ_{n : E_in = 1} Y_nk ≥ 1. E_in = 1 if pair mention n corresponds to entity pair i

  31. Weak supervision by constraining Y If entity pair i verifies relation k, then at least one pair mention n corresponding to the pair i expresses that relation: ∀ (i, k) such that R_ik = 1, Σ_{n=1}^{N} E_in Y_nk ≥ 1. E_in = 1 if pair mention n corresponds to entity pair i

  32. Weak supervision by constraining Y If entity pair i does not verify relation k, then no pair mention n corresponding to pair i expresses that relation:

  33. Weak supervision by constraining Y If entity pair i does not verify relation k, then no pair mention n corresponding to pair i expresses that relation: ∀ (i, k) such that R_ik = 0, Σ_{n : E_in = 1} Y_nk = 0. E_in = 1 if pair mention n corresponds to entity pair i

  34. Weak supervision by constraining Y If entity pair i does not verify relation k, then no pair mention n corresponding to pair i expresses that relation: ∀ (i, k) such that R_ik = 0, Σ_{n=1}^{N} E_in Y_nk = 0. E_in = 1 if pair mention n corresponds to entity pair i

  35. Weak supervision by constraining Y For a given entity pair i, at most a fraction c of the pair mentions are classified as none:

  36. Weak supervision by constraining Y For a given entity pair i, at most a fraction c of the pair mentions are classified as none: ∀ i ∈ {1, ..., I}, Σ_{n=1}^{N} E_in Y_n(K+1) ≤ c Σ_{n=1}^{N} E_in.

  37. Weak supervision by constraining Y These constraints are equivalent to: Y1 = 1 , ( EY ) ◦ S ≥ R .
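The slides state the compact form Y1 = 1, (EY) ◦ S ≥ R without spelling out the matrix S here, so the sketch below checks the three explicit constraint families directly on a toy labeling (the latent labels from the Lichtenstein example). The value of c and the mention layout are illustrative:

```python
import numpy as np

K, c = 2, 0.5                 # K relations plus a none class (column K); none cap c
E = np.array([[1, 1, 1]])     # E[i, n]: one entity pair, three pair mentions
R = np.array([[1, 1]])        # the pair verifies both BornIn and DiedIn

# Candidate labeling: mention 0 -> BornIn, mention 1 -> none, mention 2 -> DiedIn.
Y = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, 1, 0]])

EY = E @ Y                                        # (EY)[i, k] = sum_n E[i, n] Y[n, k]
ok_rows = bool(np.all(Y.sum(axis=1) == 1))        # each mention expresses one label
ok_pos = bool(np.all(EY[:, :K][R == 1] >= 1))     # verified relations expressed somewhere
ok_neg = bool(np.all(EY[:, :K][R == 0] == 0))     # unverified relations never expressed
ok_none = bool(np.all(EY[:, K] <= c * E.sum(axis=1)))  # none-class cap per entity pair
```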

  38. (c) Problem formulation

  39. Problem formulation Using linear classifiers W ∈ R^{D×(K+1)} and the squared loss: min_{Y, W} (1/2) ‖Y − XW‖_F² + (λ/2) ‖W‖_F², s.t. Y ∈ {0, 1}^{N×(K+1)}, Y1 = 1, (EY) ◦ S ≥ R.

  40. Problem formulation Using linear classifiers W ∈ R^{D×(K+1)} and the squared loss: min_{Y, W} (1/2) ‖Y − XW‖_F² + (λ/2) ‖W‖_F², s.t. Y ∈ {0, 1}^{N×(K+1)}, Y1 = 1, (EY) ◦ S ≥ R. Closed-form solution for W: W = (X⊤X + λ I_D)⁻¹ X⊤Y.
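As a quick sanity check of the closed form, the sketch below (random data, illustrative sizes) verifies that W = (X⊤X + λ I_D)⁻¹ X⊤Y zeroes the gradient X⊤(XW − Y) + λW of the ridge objective:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 20, 5, 3
X = rng.normal(size=(N, D))
Y = rng.normal(size=(N, K + 1))
lam = 0.1

# Closed-form ridge solution: W = (X^T X + lambda I_D)^{-1} X^T Y.
W = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ Y)

# The gradient of (1/2)||Y - XW||_F^2 + (lambda/2)||W||_F^2 vanishes at W.
grad = X.T @ (X @ W - Y) + lam * W
```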

  41. Problem formulation Replacing W by its optimal value: min_Y (1/2) tr(Y⊤ (XX⊤ + λ I_N)⁻¹ Y), s.t. Y ∈ {0, 1}^{N×(K+1)}, Y1 = 1, (EY) ◦ S ≥ R.

  42. Problem formulation Replacing W by its optimal value: min_Y (1/2) tr(Y⊤ (XX⊤ + λ I_N)⁻¹ Y), s.t. Y ∈ {0, 1}^{N×(K+1)}, Y1 = 1, (EY) ◦ S ≥ R. This is a quadratic integer program, which is hard to solve in general.
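The elimination of W can be checked numerically: plugging the optimal W back into the ridge objective gives the trace expression up to the constant factor λ, which does not change the minimizer over Y and can therefore be dropped. A sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 20, 5, 3
X = rng.normal(size=(N, D))
Y = rng.normal(size=(N, K + 1))
lam = 0.1

# Ridge objective evaluated at the closed-form optimum over W.
W = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ Y)
ridge = 0.5 * np.linalg.norm(Y - X @ W) ** 2 + 0.5 * lam * np.linalg.norm(W) ** 2

# Eliminated objective from the slide: (1/2) tr(Y^T (X X^T + lambda I_N)^{-1} Y).
trace = 0.5 * np.trace(Y.T @ np.linalg.solve(X @ X.T + lam * np.eye(N), Y))
```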

  43. Convex relaxation Relaxing the constraint Y ∈ {0, 1}^{N×(K+1)} into Y ∈ [0, 1]^{N×(K+1)}: min_Y (1/2) tr(Y⊤ (XX⊤ + λ I_N)⁻¹ Y), s.t. Y ∈ [0, 1]^{N×(K+1)}, Y1 = 1, (EY) ◦ S ≥ R. This is a convex quadratic program.
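A minimal sketch of solving a simplified version of this relaxation with projected gradient descent: it keeps only the row constraints Y1 = 1, Y ≥ 0 (a point on the probability simplex automatically lies in [0, 1]) and ignores the coupling constraint (EY) ◦ S ≥ R, which would need a general QP solver. Sizes and the step-size rule are illustrative:

```python
import numpy as np

def project_row_simplex(Y):
    """Euclidean projection of each row onto {y >= 0, sum(y) = 1}."""
    out = np.empty_like(Y)
    for r, v in enumerate(Y):
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, v.size + 1) > css - 1)[0][-1]
        out[r] = np.maximum(v - (css[rho] - 1) / (rho + 1.0), 0.0)
    return out

rng = np.random.default_rng(0)
N, D, K = 30, 5, 2
X = rng.normal(size=(N, D))
lam = 1.0
A_inv = np.linalg.inv(X @ X.T + lam * np.eye(N))   # gradient of the objective is A_inv @ Y

def objective(Y):
    return 0.5 * np.trace(Y.T @ A_inv @ Y)

Y = project_row_simplex(rng.uniform(size=(N, K + 1)))
start = objective(Y)
step = 1.0 / np.linalg.norm(A_inv, 2)              # 1 / Lipschitz constant of the gradient
for _ in range(200):
    Y = project_row_simplex(Y - step * (A_inv @ Y))
end = objective(Y)
```

With the step size set to the inverse Lipschitz constant of the gradient, each projected-gradient iteration is guaranteed not to increase this convex objective.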
