

SLIDE 1

Convex relaxations for weakly supervised information extraction

Édouard Grave
Columbia University
edouard.grave@gmail.com

SLIDE 2

Information Extraction

Extract structured information from unstructured documents.

SLIDE 5

Example: named entity recognition

Detect and classify mentions of named entities in text.

The seven-month re-examination of why [U.S.]LOC forces were caught off-guard by the Japanese attack was done at the request of Sen. [Strom Thurmond]PER, R-[S.C.]LOC, chairman of the [Senate Armed Services Committee]ORG, and members of the [Kimmel]PER family.

Traditionally, detect mentions of:

  • people (PER),
  • locations (LOC),
  • organizations (ORG).
SLIDE 6

Example: named entity recognition

Detect and classify mentions of named entities in text.

The seven-month re-examination of why [U.S.]LOC forces were caught off-guard by the Japanese attack was done at the request of Sen. [Strom Thurmond]PER, R-[S.C.]LOC, chairman of the [Senate Armed Services Committee]ORG, and members of the [Kimmel]PER family.

Named entities can also be:

  • genes, cells, proteins, etc.
  • books, movies, games, etc.
  • laptops, phones, cameras, etc.
SLIDE 7

Example: entity linking

Link an entity mention (e.g. Michael Jordan) to a knowledge base.

SLIDE 10

Example: relation extraction

Extract binary relations between named entities from text.

During World War II, Turing worked for the Government Code and Cypher School (GC&CS) at Bletchley Park.

Employee(Alan Turing, GC&CS)
Contains(Bletchley Park, GC&CS)

SLIDE 13

Challenges of information extraction

Most state-of-the-art methods: supervised machine learning.

  • Needs (a lot of) labeled data:
  • expensive to obtain (needs expertise),
  • thousands of different kinds of entities / relations,
  • resources exist for English. But French? Spanish? Russian?
  • Not robust to domain shift:

Our distribution agreement with [Henry Schein]PER renews annually unless terminated by either party.

SLIDE 14

I. Relation extraction
SLIDE 15

Distant supervision for relation extraction

Craven and Kumlien (1999); Mintz et al. (2009)

Knowledge base:

  r      | e1           | e2
  BornIn | Lichtenstein | New York City
  DiedIn | Lichtenstein | New York City

Sentences:

  • Roy Lichtenstein was born in New York City, into an upper-middle-class family.
  • In 1961, Leo Castelli started displaying Lichtenstein's work at his gallery in New York.
  • Roy Lichtenstein died of pneumonia in 1997 in New York City.

SLIDE 17

Distant supervision for relation extraction

Craven and Kumlien (1999); Mintz et al. (2009)

Knowledge base:

  r      | e1           | e2
  BornIn | Lichtenstein | New York City
  DiedIn | Lichtenstein | New York City

Sentences (with latent labels):

  • Roy Lichtenstein was born in New York City, into an upper-middle-class family. → BornIn
  • In 1961, Leo Castelli started displaying Lichtenstein's work at his gallery in New York. → None
  • Roy Lichtenstein died of pneumonia in 1997 in New York City. → DiedIn

SLIDE 19

Multiple instance, multiple label learning

Bunescu and Mooney (2007); Riedel et al. (2010); Hoffmann et al. (2011); Surdeanu et al. (2012)

Entity pair (Lichtenstein, New York City), with labels BornIn and DiedIn:

  • Roy Lichtenstein was born in New York City.
  • Lichtenstein left New York to study in Ohio.

Notation:

  • N pair mentions, represented by vectors xn;
  • I entity pairs pi;
  • K relations;
  • Ein = 1 if pair mention n corresponds to entity pair i;
  • Rik = 1 if entity pair i verifies relation k.

SLIDE 20

Overview

Two-step procedure:

  1. Infer labels for each pair mention;
  2. Train a supervised instance-level relation extractor.

Goal: infer a binary matrix Y such that:

  • Ynk = 1 if pair mention n expresses relation k;
  • Ynk = 0 otherwise.

Approach based on discriminative clustering.

SLIDE 21

(a) Discriminative clustering

SLIDE 25

Discriminative clustering

Xu et al. (2004); Bach and Harchaoui (2007)

Given a loss function ℓ and a regularizer Ω:

  min_Y min_f ∑_{n=1}^{N} ℓ(yn, f(xn)) + Ω(f),   s.t. Y ∈ 𝒴.

SLIDE 26

(b) Weak supervision by constraining Y

SLIDE 28

Weak supervision by constraining Y

Each pair mention expresses exactly one relation:

  ∀n ∈ {1, ..., N},   ∑_{k=1}^{K+1} Ynk = 1.

SLIDE 31

Weak supervision by constraining Y

If entity pair i verifies relation k, then at least one pair mention n corresponding to pair i expresses that relation:

  ∀(i, k) such that Rik = 1,   ∑_{n=1}^{N} Ein Ynk ≥ 1.

(Ein = 1 if pair mention n corresponds to entity pair i.)

SLIDE 34

Weak supervision by constraining Y

If entity pair i does not verify relation k, then no pair mention n corresponding to pair i expresses that relation:

  ∀(i, k) such that Rik = 0,   ∑_{n=1}^{N} Ein Ynk = 0.

(Ein = 1 if pair mention n corresponds to entity pair i.)

SLIDE 36

Weak supervision by constraining Y

For a given entity pair i, at most c percent of its pair mentions are classified as None:

  ∀i ∈ {1, ..., I},   ∑_{n=1}^{N} Ein Yn(K+1) ≤ c ∑_{n=1}^{N} Ein.

SLIDE 37

Weak supervision by constraining Y

These constraints are equivalent to: Y1 = 1, (EY) ◦ S ≥ R.
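These matrix constraints can be sanity-checked on a toy instance. The sketch below follows the deck's notation for E, R, and Y; the deck does not spell out S, so the sign matrix used here (Sik = +1 where Rik = 1, −1 where Rik = 0, and 0 on the None column) is my assumption, and the c-percent constraint is left out:

```python
import numpy as np

# Toy instance (all values hypothetical): 3 pair mentions, 2 entity pairs,
# K = 2 relations plus the "None" column.
E = np.array([[1, 1, 0],         # entity pair 0 has pair mentions 0 and 1
              [0, 0, 1]])        # entity pair 1 has pair mention 2
R = np.array([[1, 0, 0],         # pair 0 verifies relation 0 only
              [0, 1, 0]])        # pair 1 verifies relation 1 only
S = np.where(R == 1, 1.0, -1.0)  # assumed sign matrix: +1 where Rik = 1, -1 elsewhere
S[:, -1] = 0.0                   # leave the None column unconstrained in this sketch

Y = np.array([[1, 0, 0],         # mention 0 labeled with relation 0
              [0, 0, 1],         # mention 1 labeled None
              [0, 1, 0]])        # mention 2 labeled with relation 1

assert np.all(Y.sum(axis=1) == 1)  # Y1 = 1: exactly one label per mention
assert np.all((E @ Y) * S >= R)    # (EY) . S >= R: distant-supervision constraints
```

With Sik = −1 and Rik = 0, the inequality forces (EY)ik ≤ 0, hence (EY)ik = 0 since E and Y are nonnegative, recovering the "no mention expresses k" constraint from the previous slides.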

SLIDE 38

(c) Problem formulation

SLIDE 40

Problem formulation

Using linear classifiers W ∈ R^{D×(K+1)} and the squared loss:

  min_{Y,W}  ½ ‖Y − XW‖_F² + (λ/2) ‖W‖_F²,
  s.t. Y ∈ {0, 1}^{N×(K+1)},  Y1 = 1,  (EY) ◦ S ≥ R.

Closed-form solution for W:

  W = (X⊤X + λI_D)⁻¹ X⊤Y.

SLIDE 42

Problem formulation

Replacing W by its optimal value:

  min_Y  ½ tr(Y⊤(XX⊤ + λI_N)⁻¹Y),
  s.t. Y ∈ {0, 1}^{N×(K+1)},  Y1 = 1,  (EY) ◦ S ≥ R.

This is a quadratic integer program, hard to solve in general.
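The elimination of W can be checked numerically. One detail: substituting the closed-form W gives the optimal value (λ/2)·tr(Y⊤(XX⊤ + λI_N)⁻¹Y); the trace form on the slide drops the constant factor λ, which does not affect the minimizer. A small numpy check on random data (shapes assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 20, 5, 3
X = rng.standard_normal((N, D))
Y = rng.standard_normal((N, K + 1))
lam = 0.1

# Closed-form minimizer of (1/2)||Y - XW||_F^2 + (lam/2)||W||_F^2:
W = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ Y)
direct = 0.5 * np.linalg.norm(Y - X @ W) ** 2 + 0.5 * lam * np.linalg.norm(W) ** 2

# Reduced objective after eliminating W (the trace form, times lam):
reduced = 0.5 * lam * np.trace(Y.T @ np.linalg.solve(X @ X.T + lam * np.eye(N), Y))

assert np.isclose(direct, reduced)
```

The identity behind this is I_N − X(X⊤X + λI_D)⁻¹X⊤ = λ(XX⊤ + λI_N)⁻¹, a standard matrix-inversion-lemma computation.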

SLIDE 44

Convex relaxation

Relaxing the constraint Y ∈ {0, 1}^{N×(K+1)} into Y ∈ [0, 1]^{N×(K+1)}:

  min_Y  ½ tr(Y⊤(XX⊤ + λI_N)⁻¹Y),
  s.t. Y ∈ [0, 1]^{N×(K+1)},  Y1 = 1,  (EY) ◦ S ≥ R.

This is a convex quadratic program. It only depends on the kernel XX⊤.

SLIDE 45

Rounding

Given a solution Y of the relaxed problem, orthogonal projection onto

  {M ∈ {0, 1}^{N×(K+1)} | M1 = 1}.

This consists in taking the argmax along the rows of Y.
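Projection and rounding can be sketched in a few lines of numpy. This is a simplification for illustration, not the dual FISTA solver of the next slide: it keeps only the relaxed simplex constraints (Y ∈ [0, 1], Y1 = 1), drops (EY) ◦ S ≥ R, runs plain projected gradient on the trace objective, then rounds by row-wise argmax:

```python
import numpy as np

def project_rows_to_simplex(Y):
    """Euclidean projection of each row of Y onto the probability simplex."""
    N, K = Y.shape
    U = np.sort(Y, axis=1)[:, ::-1]
    css = np.cumsum(U, axis=1) - 1.0
    ind = np.arange(1, K + 1)
    rho = (U - css / ind > 0).sum(axis=1)        # active components per row
    theta = css[np.arange(N), rho - 1] / rho
    return np.maximum(Y - theta[:, None], 0.0)

def solve_and_round(Q, K1, steps=500):
    """Projected gradient on min_Y 0.5 tr(Y^T Q Y) with rows of Y on the
    simplex, followed by the rounding step: argmax along the rows."""
    N = Q.shape[0]
    Y = np.full((N, K1), 1.0 / K1)
    step = 1.0 / np.linalg.norm(Q, 2)            # 1 / Lipschitz constant of the gradient
    for _ in range(steps):
        Y = project_rows_to_simplex(Y - step * (Q @ Y))
    labels = Y.argmax(axis=1)                    # rounding: projection onto {0,1} rows, M1 = 1
    return Y, labels
```

For the relaxed problem of the previous slide one would take Q = (XX⊤ + λI_N)⁻¹ and K1 = K + 1.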

SLIDE 46

Optimization

We optimize the dual because:

  • there is no matrix inverse (easy to compute the gradient),
  • the constraints are simpler (easy to project onto).

We use an accelerated projected gradient algorithm (FISTA). Overall complexity: O(NFK), where:

  • N is the number of pair mentions (sentences);
  • F is the average number of features;
  • K is the number of classes.
SLIDE 47

(d) Experiments

SLIDE 48

Experiments: dataset

Dataset introduced by Riedel et al. (2010):

  • Articles from the New York Times corpus.
  • Entities extracted using the Stanford named entity recognizer.
  • Entity mentions aligned to Freebase using a string match.

There are:

  • 52 relations,
  • 4,200 entity pairs,
  • 120,000 pair mentions.
SLIDE 49

Experiments: features

We use the features proposed by Mintz et al. (2009):

  • Lexical features, such as:
  • the sequence of words between the entities;
  • a window of k words before/after the first/second entity;
  • the corresponding part-of-speech tags.
  • Syntactic features, such as:
  • the path in the dependency tree between the two entities;
  • the neighbors of the two entities that are not on the path.
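As a rough illustration of the lexical templates, here is what extracting them might look like; the actual templates of Mintz et al. (2009) are richer (e.g. they conjoin entity types and part-of-speech tags), and the feature names below are made up:

```python
def lexical_features(tokens, e1_span, e2_span, k=2):
    """tokens: list of words; e1_span / e2_span: (start, end) token indices."""
    s1, t1 = e1_span
    s2, t2 = e2_span
    between = tokens[t1:s2]                       # words between the two entities
    feats = ["BETWEEN=" + "_".join(between)]
    feats += ["BEFORE_E1=" + w for w in tokens[max(0, s1 - k):s1]]  # window before first entity
    feats += ["AFTER_E2=" + w for w in tokens[t2:t2 + k]]           # window after second entity
    return feats

toks = "During World War II , Turing worked for the Government Code and Cypher School".split()
print(lexical_features(toks, (5, 6), (9, 14)))
```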
SLIDE 52

Experiments: results

[Figure: Precision/recall curves for Mintz et al. (2009), Hoffmann et al. (2011), Surdeanu et al. (2012), and this work, on the Riedel et al. (2010) dataset, for the task of aggregate extraction.]

SLIDE 53

Experiments: results

[Figure: Precision/recall curves per relation (/location/location/contains, /people/person/place_lived, /people/person/nationality, /people/person/place_of_birth, /business/person/company) for our method, for the task of aggregate extraction, on the Riedel et al. (2010) dataset.]

SLIDE 54

Experiments: results

[Figure: Precision/recall curves for Hoffmann et al. (2011) and this work, for the task of sentential extraction, on the manually labeled dataset of Hoffmann et al. (2011).]

SLIDE 55

II. Named entity classification
SLIDE 58

Motivation

Extract named entities from technical text (e.g. financial reports).

Limitations of state-of-the-art NER:

  • lack of labeled data for technical domains,
  • suffers from domain shift.

Examples of errors (from financial reports of healthcare companies):

  • Henry Schein classified as Person,
  • Aspen classified as Location.

Both are healthcare companies. In some domains, named entities are (almost) unambiguous.

SLIDE 61

Bootstrapping for named entity extraction

Riloff and Jones (1999); Collins and Singer (1999)

Seed list: Merck, Endosense.

Sentences:

  • In August 2014, Merck acquired Idenix for approximately $3.85 billion in cash.
  • St. Jude Medical acquired Endosense for $171 million in net cash consideration.

Patterns: "[COMPANY] acquired", "acquired [COMPANY]".

SLIDE 64

Overview

Input: a seed list of entities and unlabeled text.

  1. Extract potential named entity mentions (sequences of contiguous tokens tagged as NNP or NNPS);
  2. Try to match each mention to a seed, using exact string matching;
  3. Train a multiclass classifier using those examples.

Step 3 uses positive and unlabeled examples only; there are no negative examples. This is an instance of PU learning (learning from positive and unlabeled examples).
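Step 2 is simple enough to sketch directly; the seed lists and mentions below are illustrative:

```python
# Label mentions by exact string match against the seed lists;
# the seeds, class names, and mentions here are made up for illustration.
seeds = {"Merck": "COMPANY", "Endosense": "COMPANY", "Lipitor": "DRUG"}

mentions = ["Merck", "Idenix", "St. Jude Medical", "Lipitor"]

P, U, labels = [], [], {}
for n, mention in enumerate(mentions):
    if mention in seeds:      # exact string match to a seed
        P.append(n)
        labels[n] = seeds[mention]
    else:                     # no match: unlabeled, *not* negative (PU setting)
        U.append(n)

assert P == [0, 3] and U == [1, 2]
```

Unmatched mentions like Idenix end up in U rather than being treated as negatives, which is exactly what makes step 3 a PU-learning problem.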

SLIDE 66

Notation

  • xn: vectors describing the named entity mentions;
  • P: set of indices of positive examples;
  • U: set of indices of unlabeled examples;
  • cn: labels of the positive examples (K + 1 corresponds to Other).

Infer a binary matrix Y ∈ {0, 1}^{N×(K+1)} such that:

  • Ynk = 1 if named entity mention n is of type k,
  • Ynk = 0 otherwise,

and a corresponding classifier f such that f(xn) = yn.

SLIDE 67

Weak supervision by constraining Y

Each named entity mention belongs to exactly one class:

  ∀n ∈ {1, ..., N},   ∑_{k=1}^{K+1} Ynk = 1.

SLIDE 68

Weak supervision by constraining Y

For positive examples, Y agrees with the distant supervision labels: ∀n ∈ P, Yn,cn = 1.

SLIDE 69

Weak supervision by constraining Y

Impose that the percentage of unlabeled examples classified as Other is at least p:

  ∑_{n ∈ U} Yn(K+1) ≥ pN.

SLIDE 70

Problem formulation

Using linear classifiers W ∈ R^{D×(K+1)} and the squared loss:

  min_{Y,W}  ½ ‖Y − XW‖_F² + (λ/2) ‖W‖_F²,
  s.t. Y ∈ {0, 1}^{N×(K+1)},  Y ∈ 𝒴.

This is a quadratic integer program, hard to solve in general.

SLIDE 71

Convex relaxation

Relaxing the constraint Y ∈ {0, 1}^{N×(K+1)} into Y ∈ [0, 1]^{N×(K+1)}:

  min_{Y,W}  ½ ‖Y − XW‖_F² + (λ/2) ‖W‖_F²,
  s.t. Y ∈ [0, 1]^{N×(K+1)},  Y ∈ 𝒴.

This problem is jointly convex in Y and W.
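Because the relaxed problem is jointly convex, simple alternating minimization converges to a global optimum. A numpy sketch under simplifying assumptions: positive rows are clamped to their seed label, unlabeled rows are constrained only to the simplex (the relaxed {0, 1} rows with row sums 1), and the "at least p classified as Other" constraint is omitted. Both steps are exact: W has the ridge closed form, and the Y-step reduces to projecting the rows of XW onto the simplex:

```python
import numpy as np

def project_row_to_simplex(y):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(u) + 1) > 0)[0][-1] + 1
    return np.maximum(y - css[rho - 1] / rho, 0.0)

def alternate(X, P, c, K1, lam=0.1, iters=50):
    """Alternating minimization for the jointly convex relaxed problem
    (simplified: the p-percent Other constraint is dropped)."""
    N, D = X.shape
    Y = np.full((N, K1), 1.0 / K1)
    for n, cls in zip(P, c):         # clamp positives to their distant label
        Y[n] = 0.0
        Y[n, cls] = 1.0
    for _ in range(iters):
        # W-step: ridge closed form (X^T X + lam I)^{-1} X^T Y.
        W = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ Y)
        # Y-step: argmin of 0.5 ||Y - XW||^2 over the constraints is the
        # row-wise projection of XW onto the simplex (positives stay fixed).
        Z = X @ W
        for n in range(N):
            if n not in P:
                Y[n] = project_row_to_simplex(Z[n])
    return Y, W
```

Each step solves its subproblem exactly, so the objective decreases monotonically; joint convexity is what guarantees this reaches a global minimum rather than a local one.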

SLIDE 72

Experiments: dataset and features

Data: financial reports of healthcare companies.

Seed lists:

  • 578 publicly traded healthcare companies,
  • 200 most searched drugs on the website www.rxlist.com.

Features:

  • lowercased tokens of the mention,
  • window of k words to the left/right of the mention,
  • k ancestors (with syntactic roles) in the dependency tree,
  • vectorial representation of the mention.
SLIDE 73

Experiments: results

               |     Companies      |       Drugs
               |  P     R     F1    |  P     R     F1
  Stanford NER | N/A   52.6   N/A   | N/A   N/A   N/A
  String match | 98.9  44.2   61.1  | 100   32.3  48.8
  SVM (asym)   | 87.0  92.8   89.8  | 86.5  79.2  82.7
  This work    | 82.9  95.8   88.9  | 87.4  94.0  90.6

SLIDE 74

Experiments: results

[Figure: Influence of the parameter p on precision, recall, and F1, for Companies and Drugs.]

SLIDE 76

Conclusion

Distant supervision for information extraction:

  • based on discriminative clustering, using distant supervision as constraints;
  • leads to a convex formulation;
  • competitive with the state of the art on relation extraction.

Work in progress:

  • faster optimization methods for our approach;
  • kernelization of our methods;
  • generalization to ambiguous named entities.
SLIDE 78

References

  • A convex relaxation for weakly supervised relation extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
  • Weakly supervised named entity classification. In Proceedings of the Workshop on Automated Knowledge Base Construction (AKBC), 2014.

Code: available on my webpage, but it is "research code". I might release more general code in the future.

SLIDE 79

Thank you for your attention!

SLIDE 80

References I

Bach, F. and Harchaoui, Z. (2007). Diffrac: a discriminative and flexible framework for clustering. In NIPS.

Bunescu, R. and Mooney, R. (2007). Learning to extract relations from the web using minimal supervision. In Annual Meeting of the Association for Computational Linguistics.

Collins, M. and Singer, Y. (1999). Unsupervised models for named entity classification. In EMNLP.

Craven, M. and Kumlien, J. (1999). Constructing biological knowledge bases by extracting information from text sources. In ISMB.

Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., and Weld, D. (2011). Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1.

SLIDE 81

References II

Mintz, M., Bills, S., Snow, R., and Jurafsky, D. (2009). Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Volume 2.

Riedel, S., Yao, L., and McCallum, A. (2010). Modeling relations and their mentions without labeled text. In Machine Learning and Knowledge Discovery in Databases. Springer.

Riloff, E. and Jones, R. (1999). Learning dictionaries for information extraction by multi-level bootstrapping. In AAAI.

Surdeanu, M., Tibshirani, J., Nallapati, R., and Manning, C. (2012). Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.

Xu, L., Neufeld, J., Larson, B., and Schuurmans, D. (2004). Maximum margin clustering. In NIPS.