DETERMINANTAL POINT PROCESSES FOR NATURAL LANGUAGE PROCESSING Jennifer Gillenwater Joint work with Alex Kulesza and Ben Taskar
OUTLINE
- Motivation & background on DPPs
- Large-scale settings
- Structured summarization
- Other potential NLP applications
MOTIVATION & BACKGROUND
SUMMARIZATION
- Quality: relevance to the topic
- Diversity: coverage of the core ideas
SUBSET SELECTION
AREA AS SET-GOODNESS
Embed each item $i$ as a vector $B_i$ in feature space, so that
  quality = $\|B_i\|_2$,   similarity = $B_i^\top B_j$.
The goodness of the pair $\{i, j\}$ is the area it spans:
  $\text{area} = \sqrt{\|B_i\|_2^2 \, \|B_j\|_2^2 - (B_i^\top B_j)^2}$
VOLUME AS SET-GOODNESS
  $\text{area} = \sqrt{\|B_i\|_2^2 \, \|B_j\|_2^2 - (B_i^\top B_j)^2}$,   $\text{length} = \|B_i\|_2$
For larger sets, apply volume = base × height recursively:
  $\text{vol}(B) = \|B_1\|_2 \cdot \text{vol}\big(\text{proj}_{\perp B_1}(B_{2:N})\big)$
AREA AS A DET
  $\text{area}^2 = \|B_i\|_2^2 \, \|B_j\|_2^2 - (B_i^\top B_j)^2 = \det \begin{pmatrix} \|B_i\|_2^2 & B_i^\top B_j \\ B_i^\top B_j & \|B_j\|_2^2 \end{pmatrix} = \det\!\big([B_i \; B_j]^\top [B_i \; B_j]\big)$
VOLUME AS A DET
  $\text{vol}(B_{\{i,j\}})^2 = \det\!\big([B_i \; B_j]^\top [B_i \; B_j]\big)$
Stacking all $N$ vectors as the columns of $B$:
  $\text{vol}(B)^2 = \det(B^\top B) = \det(L)$
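A minimal numerical sketch of this identity, using random toy features (the dimensions and seed are arbitrary choices, not from the talk): the squared volume of the parallelepiped spanned by the columns of $B$ equals $\det(B^\top B) = \det(L)$, and it matches the base-times-height recursion.

```python
import numpy as np

# Random toy features: columns B_1, B_2, B_3 in D = 4 dimensions.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 3))

L = B.T @ B                              # the DPP kernel L = B^T B
vol_sq = np.linalg.det(L)                # squared volume of the parallelepiped

# Base-times-height recursion:
# vol(B) = ||B_1|| * vol(projection of B_{2:N} onto B_1's orthogonal complement).
b1 = B[:, 0]
P_perp = np.eye(4) - np.outer(b1, b1) / (b1 @ b1)   # projector onto complement of b1
rest = P_perp @ B[:, 1:]
vol_rec = np.linalg.norm(b1) * np.sqrt(np.linalg.det(rest.T @ rest))
assert np.isclose(np.sqrt(vol_sq), vol_rec)
```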
COMPLEX STATISTICS
A distribution over subsets must cope with $N$ items $\Rightarrow 2^N$ sets.
EFFICIENT COMPUTATION
Despite the $2^N$-set support, each determinant can be computed in $O(N^3)$ time.
POINT PROCESSES
A point process is a probability distribution over subsets of a ground set $\mathcal{Y} = \{1, \ldots, N\}$; e.g., one particular subset might receive probability 0.2.
DETERMINANTAL
A determinantal point process scores each subset by the determinant of the corresponding principal submatrix of an $N \times N$ kernel $L$; with $N = 5$:
  $P(\{2,3,5\}) = \dfrac{\det \begin{pmatrix} L_{22} & L_{23} & L_{25} \\ L_{32} & L_{33} & L_{35} \\ L_{52} & L_{53} & L_{55} \end{pmatrix}}{\det(L + I)}$
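This can be sketched in a few lines with a toy random kernel (the kernel values and seed here are hypothetical, only the formula is from the slide). The brute-force loop also confirms the normalizer identity $\sum_Y \det(L_Y) = \det(L + I)$.

```python
import numpy as np
from itertools import combinations

# Toy positive definite kernel for N = 5 items.
rng = np.random.default_rng(1)
B = rng.standard_normal((3, 5))
L = B.T @ B + 0.1 * np.eye(5)

Y = [1, 2, 4]                           # the set {2, 3, 5}, 0-indexed
L_Y = L[np.ix_(Y, Y)]                   # principal submatrix L_Y
prob = np.linalg.det(L_Y) / np.linalg.det(L + np.eye(5))

# Brute-force check: det(L_S) summed over all 2^5 subsets equals det(L + I)
# (the determinant of the empty submatrix is 1 by convention).
total = sum(np.linalg.det(L[np.ix_(S, S)]) if S else 1.0
            for r in range(6) for S in map(list, combinations(range(5), r)))
assert np.isclose(total, np.linalg.det(L + np.eye(5)))
```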
EFFICIENT INFERENCE
  Normalizing:   $P_L(\mathbf{Y} = Y)$
  Marginalizing: $P(Y \subseteq \mathbf{Y})$
  Conditioning:  $P_L(\mathbf{Y} = B \mid A \subseteq \mathbf{Y})$,  $P_L(\mathbf{Y} = B \mid A \cap \mathbf{Y} = \emptyset)$
  Sampling:      $Y \sim P_L$
All of these run in $O(N^3)$ time.
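As one concrete instance of $O(N^3)$ marginalization, standard DPP theory gives the marginal kernel $K = L(L+I)^{-1}$ with $P(A \subseteq \mathbf{Y}) = \det(K_A)$; the sketch below (toy random kernel, not from the talk) verifies this against brute-force enumeration for a singleton $A$.

```python
import numpy as np
from itertools import combinations

# Toy positive definite kernel for N = 4 items.
rng = np.random.default_rng(2)
B = rng.standard_normal((3, 4))
L = B.T @ B + 0.1 * np.eye(4)

K = L @ np.linalg.inv(L + np.eye(4))    # marginal kernel: one O(N^3) solve

# Brute-force check for A = {0}: sum P(S) over every set S containing item 0.
Z = np.linalg.det(L + np.eye(4))
marg = sum(np.linalg.det(L[np.ix_(S, S)])
           for r in range(1, 5)
           for S in map(list, combinations(range(4), r))
           if 0 in S) / Z
assert np.isclose(marg, K[0, 0])        # P({0} in Y) = det(K_{0}) = K_00
```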
LARGE-SCALE SETTINGS
DUAL KERNEL KULESZA AND TASKAR (NIPS 2010)
Write the kernel as $L = B^\top B$, where the columns of the $D \times N$ matrix $B$ are the item vectors $B_1, \ldots, B_N$. Then $L$ is $N \times N$, while the dual kernel
  $C = B B^\top$
is only $D \times D$.
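The key fact making the dual useful is that $L = B^\top B$ and $C = B B^\top$ share their nonzero eigenvalues. A small sketch with random features (sizes are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(3)
D, N = 5, 200
B = rng.standard_normal((D, N))

L = B.T @ B        # N x N -- large
C = B @ B.T        # D x D -- small

# eigvalsh returns eigenvalues in ascending order; L has at most D nonzero ones.
top_L = np.linalg.eigvalsh(L)[-D:]   # top D eigenvalues of L
eig_C = np.linalg.eigvalsh(C)        # all D eigenvalues of C
assert np.allclose(top_L, eig_C)
```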
DUAL INFERENCE
With eigendecompositions $L = V \Lambda V^\top$ and $C = \hat{V} \Lambda \hat{V}^\top$, the primal eigenvectors are recovered as $V = B^\top \hat{V} \Lambda^{-1/2}$.
  Normalizing:   $\sum_Y \det(L_Y)$ in $O(D^3)$
  Marginalizing & conditioning: $O(D^3 + D^2 k^2)$
  Sampling $Y \sim P_L$: $O(N D^2 k)$
(where $k$ is the size of the selected set)
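The eigenvector-recovery identity $V = B^\top \hat{V} \Lambda^{-1/2}$ can be checked directly on random toy features (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
D, N = 4, 50
B = rng.standard_normal((D, N))
C = B @ B.T                              # dual kernel, D x D

lam, V_hat = np.linalg.eigh(C)           # C = V_hat diag(lam) V_hat^T
V = B.T @ V_hat / np.sqrt(lam)           # N x D: L's nonzero-eigenvalue eigenvectors

L = B.T @ B
assert np.allclose((V * lam) @ V.T, L)   # L = V Lambda V^T
assert np.allclose(V.T @ V, np.eye(D))   # recovered columns are orthonormal
```

Only the $D \times D$ matrix $C$ is ever eigendecomposed; the $N \times N$ matrix $L$ is built here purely for verification.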
EXPONENTIAL N
We want to select a diverse set of parses:
  $N = O(\{\text{sentence length}\}^{\{\text{sentence length}\}})$
or a diverse set of paths:
  $N = O(\{\text{node degree}\}^{\{\text{path length}\}})$
STRUCTURE FACTORIZATION KULESZA AND TASKAR (NIPS 2010)
Each structure decomposes into parts, $i = \{i_\alpha\}_{\alpha \in F}$, each part spanning at most $c$ items (here $c = 2$). Writing $B_i = q(i)\,\phi(i)$ (quality times similarity features), factor both over parts:
  $q(i) = \prod_{\alpha \in F} q(i_\alpha)$,   $\phi(i) = \sum_{\alpha \in F} \phi(i_\alpha)$
Sampling $Y \sim P_L$ then drops from $O(N D^2 k)$ to $O(D^2 k^3 + D k^2 M^c R)$, with $M$ the number of values a part can take and $R$ the number of parts:
  $M^c R = 4^2 \times 12 = 192 \ll N = 4^{12} = 16{,}777{,}216$
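The factorization itself is easy to picture with hypothetical toy parts and numbers (everything below is invented for illustration; unit-normalizing $\phi$ is a modeling choice, not required by the slide):

```python
import numpy as np

def B_of(structure, part_quality, part_features):
    """B_i = q(i) * phi(i), with q a product and phi a sum over parts."""
    q = np.prod([part_quality[a] for a in structure])            # q(i) = prod q(i_alpha)
    phi = np.sum([part_features[a] for a in structure], axis=0)  # phi(i) = sum phi(i_alpha)
    phi = phi / np.linalg.norm(phi)                              # unit similarity features
    return q * phi

# Hypothetical parts "a", "b", "c" with toy qualities and 2-d features.
part_quality = {"a": 0.9, "b": 0.5, "c": 0.7}
part_features = {"a": np.array([1.0, 0.0]),
                 "b": np.array([0.0, 1.0]),
                 "c": np.array([1.0, 1.0])}

B_i = B_of(["a", "c"], part_quality, part_features)
# ||B_i|| equals q(i) = 0.9 * 0.7, since phi is unit length.
```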
LARGE FEATURE SETS?
  ($N$ = # of items, $D$ = # of features)
  $D$ small, $N$ large:        dual
  $D$ small, $N$ exponential:  dual + structure
  $D$ large:                   ?
RANDOM PROJECTIONS GILLENWATER, KULESZA, AND TASKAR (EMNLP 2012)
Project the $D$-dimensional features down to $d \ll D$ dimensions with a random matrix $\Phi$, replacing $D$ by $d$ in the runtimes above (e.g. in the $D k^2 M^c R$ sampling term).
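A minimal sketch using a Gaussian $\Phi$, a standard Johnson-Lindenstrauss construction (the particular dimensions, seed, and target $d$ here are arbitrary choices, not from the paper): inner products, and hence the kernel $L = B^\top B$, are approximately preserved after projection.

```python
import numpy as np

rng = np.random.default_rng(5)
D, d, N = 1000, 50, 20
B = rng.standard_normal((D, N))                  # original D-dimensional features

Phi = rng.standard_normal((d, D)) / np.sqrt(d)   # random d x D projection matrix
B_proj = Phi @ B                                 # projected d x N features

L = B.T @ B
L_proj = B_proj.T @ B_proj                       # approximates L, at d-dim cost
rel_err = np.abs(np.diag(L_proj) - np.diag(L)) / np.diag(L)
```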