Problem: Low Rank Methods Operate at Fixed Granularity
If rank is too small…
[Figure: low-rank approximation (≈), with the example pair (break, spring)]
Probability gets diluted, since “break” has many synonyms
Sections: Introduction · Background · Rank · Power · Ensembles · Experiments

Problem: Low Rank Methods Operate at Fixed Granularity
If rank is too large…
[Figure: low-rank approximation (≈), with the example pair (domicile, dilapidated)]
Probabilities of rare words are a problem, since the representation is too fine-grained

Our Approach
• Construct ensembles of low rank matrices/tensors to model language at multiple granularities
• Includes existing n-gram techniques as special cases
  • Absolute discounting
  • Jelinek-Mercer (deleted interpolation)
  • Kneser-Ney
• Preserves advantages of standard n-gram approaches
  • Effective for short context lengths
  • Fast evaluation at test time

Outline
• Introduction
• Background on Kneser-Ney smoothing
• Our Approach
  • Rank
  • Power
  • Constructing the Ensemble
• Experiments

Kneser-Ney - Intuition
• Lower order distribution should be altered
• Consider two words, York and door
  • York follows very few words, e.g. “New York”
  • Door can follow many words, e.g. “the door”, “red door”, “my door”
$P(w_i = \text{door} \mid \text{backed off on } w_{i-1}) > P(w_i = \text{York} \mid \text{backed off on } w_{i-1})$

Kneser-Ney Unigram Distribution
$N_{-}(w_i) = |\{ w : c(w_i, w) > 0 \}|$   (diversity of $w_i$'s history)
$P_{kn\text{-}uni}(w_i) = \frac{N_{-}(w_i)}{\sum_{w} N_{-}(w)}$

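The definition above can be sketched in a few lines of Python. The toy counts and the `kn_unigram` helper are illustrative assumptions, not part of the talk; keys follow the slide's argument order, (current word, previous word).

```python
from collections import defaultdict

def kn_unigram(counts):
    """counts maps (w, w_prev) -> c(w, w_prev). Returns {w: P_kn-uni(w)}."""
    histories = defaultdict(set)          # distinct histories seen before each word
    for (w, w_prev), c in counts.items():
        if c > 0:
            histories[w].add(w_prev)
    total = sum(len(h) for h in histories.values())
    return {w: len(h) / total for w, h in histories.items()}

# Toy counts (invented): "york" is frequent but follows only one word.
counts = {
    ("york", "new"): 8,
    ("door", "the"): 3,
    ("door", "red"): 2,
    ("door", "my"): 1,
}
p_kn_uni = kn_unigram(counts)
```

Although "york" has the larger raw count here, "door" receives the larger Kneser-Ney unigram probability because it follows three distinct words.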
Discounting
$P_d(w_i \mid w_{i-1}) = \frac{\max(c(w_i, w_{i-1}) - d,\, 0)}{\sum_{w} c(w, w_{i-1})}$
$P_{kney}(w_i \mid w_{i-1}) = P_d(w_i \mid w_{i-1}) + \gamma(w_{i-1})\, P_{kn\text{-}uni}(w_i)$
where $\gamma(w_{i-1})$ is the leftover probability

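A minimal sketch of the two formulas above. The slide does not spell out $\gamma$; the code assumes integer counts and $0 < d \le 1$, in which case the mass removed by discounting is $d$ times the number of distinct words following $w_{i-1}$, divided by the history total. The function name, the toy counts, and the value $d = 0.75$ are illustrative assumptions.

```python
from collections import defaultdict

def kneser_ney_bigram(counts, d=0.75):
    """counts maps (w, w_prev) -> c(w, w_prev). Returns a function p(w, w_prev)."""
    hist_total = defaultdict(float)   # sum_w c(w, w_prev)
    followers = defaultdict(set)      # distinct words seen after w_prev
    histories = defaultdict(set)      # distinct words seen before w
    for (w, w_prev), c in counts.items():
        if c > 0:
            hist_total[w_prev] += c
            followers[w_prev].add(w)
            histories[w].add(w_prev)
    n_types = sum(len(h) for h in histories.values())

    def p_kn_uni(w):
        return len(histories.get(w, ())) / n_types

    def p(w, w_prev):
        total = hist_total[w_prev]                      # assumes w_prev was observed
        p_d = max(counts.get((w, w_prev), 0) - d, 0) / total
        gamma = d * len(followers[w_prev]) / total      # leftover probability
        return p_d + gamma * p_kn_uni(w)

    return p

# Toy counts (invented for illustration).
counts = {
    ("york", "new"): 8,
    ("door", "the"): 3, ("door", "red"): 2, ("door", "my"): 1,
    ("car", "the"): 2, ("car", "red"): 1,
}
vocab = {"york", "door", "car"}
p = kneser_ney_bigram(counts)
```

With these counts, `p("york", "the")` is nonzero even though "the york" was never observed, and the probabilities for each observed history sum to one.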
Lower Order Marginal Aligns!
$P(w_i) = \sum_{w_{i-1}} P_{kney}(w_i \mid w_{i-1})\, P(w_{i-1})$

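The marginal constraint above can be checked numerically. This sketch assumes a square count matrix with rows indexed by the current word and columns by the history, integer counts, and $0 < d \le 1$; the matrix values, `d = 0.5`, and the helper name are assumptions made for illustration.

```python
def kn_conditional(B, d):
    """B[i][j] = c(w_i, w_j): rows are current words, columns are histories.
    Returns P[i][j] = P_kney(w_i | w_j)."""
    n = len(B)
    col_tot = [sum(B[i][j] for i in range(n)) for j in range(n)]
    n_plus = [sum(1 for i in range(n) if B[i][j] > 0) for j in range(n)]   # followers of history j
    n_minus = [sum(1 for j in range(n) if B[i][j] > 0) for i in range(n)]  # histories of word i
    types = sum(n_minus)
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):
        gamma = d * n_plus[j] / col_tot[j]               # leftover probability
        for i in range(n):
            P[i][j] = max(B[i][j] - d, 0) / col_tot[j] + gamma * n_minus[i] / types
    return P

B = [[1, 2, 1], [0, 5, 0], [2, 0, 0]]                    # toy counts
P = kn_conditional(B, d=0.5)
N = sum(map(sum, B))
p_hist = [sum(B[i][j] for i in range(3)) / N for j in range(3)]           # P(w_{i-1})
lhs = [sum(B[i]) / N for i in range(3)]                                   # P(w_i)
rhs = [sum(P[i][j] * p_hist[j] for j in range(3)) for i in range(3)]      # marginalized P_kney
```

Under the stated assumptions `lhs` and `rhs` agree up to floating-point error, because the mass removed by discounting, $d \cdot N_{-}(w_i)$, is exactly what the Kneser-Ney unigram term returns.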
Generalizing KN to PLRE
Kneser-Ney:
• Ensemble composed of unsmoothed n-grams
• Alter lower order distributions by using count of unique histories
• Use absolute discounting to interpolate different n-grams and preserve lower order marginal constraint
Power Low Rank Ensembles:
• ?
• ?
• ?

In General, Bigram is Full Rank

Independence = Rank 1
• If $w_i$ and $w_{i-1}$ are independent:
  $P(w_i, w_{i-1}) = P(w_i)\, P(w_{i-1})$
  $P(\text{house}, \text{old}) = P(\text{old})\, P(\text{house})$
• But what if $w_i$ and $w_{i-1}$ are not independent? What does the best rank 1 approximation give?

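To make the link between independence and rank concrete, the sketch below builds a joint table as an outer product of two marginals and checks that every 2x2 minor vanishes, which characterizes rank 1 for a nonzero matrix. The probabilities and the `is_rank_one` helper are invented for illustration.

```python
p_cur = {"house": 0.7, "door": 0.3}    # P(w_i), toy values
p_prev = {"old": 0.4, "new": 0.6}      # P(w_{i-1}), toy values

# Under independence the joint table is the outer product of the marginals.
joint = [[p_cur[w] * p_prev[v] for v in p_prev] for w in p_cur]

def is_rank_one(M, tol=1e-12):
    """A nonzero matrix has rank 1 iff every 2x2 minor is zero."""
    rows, cols = len(M), len(M[0])
    return all(
        abs(M[i][j] * M[k][l] - M[i][l] * M[k][j]) <= tol
        for i in range(rows) for k in range(rows)
        for j in range(cols) for l in range(cols)
    )

independent = is_rank_one(joint)                                   # True
dependent = is_rank_one([[1, 2, 1], [0, 5, 0], [2, 0, 0]])         # False
```

A count matrix with real dependencies between adjacent words fails the test, which is why a general bigram matrix is full rank.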
Rank
• Let $\mathbf{B}$ be the matrix such that $\mathbf{B}(w_i, w_{i-1}) = c(w_i, w_{i-1})$
• Let $\mathbf{M}_1 = \arg\min_{\mathbf{M} :\, \mathbf{M} \ge 0,\ \mathrm{rank}(\mathbf{M}) = 1} \| \mathbf{B} - \mathbf{M} \|_{KL}$, where $\|\cdot\|_{KL}$ is the generalized KL divergence [Lee and Seung 2001]
• Then $\mathbf{M}_1(w_i, w_{i-1}) \propto P(w_i)\, P(w_{i-1})$

Rank
• MLE unigram is the normalized rank 1 approximation of the MLE bigram under KL:
  $P(w_i) = \frac{\mathbf{M}_1(w_i, w_{i-1})}{\sum_{w_i} \mathbf{M}_1(w_i, w_{i-1})}$
• Vary rank to obtain quantities between bigram and unigram
[Figure: spectrum from full rank (bigram) through low rank to rank 1 (unigram)]

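The claim above can be illustrated with a small sketch. For rank 1, the Lee and Seung multiplicative updates for generalized KL simplify to alternately rescaling by row and column sums, so the fit is the outer product of row sums and column sums divided by the grand total. The `rank1_kl` helper and the toy counts are assumptions for illustration.

```python
def rank1_kl(B, iters=5):
    """Rank-1 nonnegative fit u v^T to B under generalized KL.
    With one component, the multiplicative update for u_i reduces to
    (row sum of B) / sum(v), and symmetrically for v_j."""
    rows, cols = len(B), len(B[0])
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iters):
        sv = sum(v)
        u = [sum(B[i][j] for j in range(cols)) / sv for i in range(rows)]
        su = sum(u)
        v = [sum(B[i][j] for i in range(rows)) / su for j in range(cols)]
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]

B = [[1, 2, 1], [0, 5, 0], [2, 0, 0]]        # toy counts, rows = w_i, columns = w_{i-1}
M1 = rank1_kl(B)
N = sum(map(sum, B))

# Normalizing any column of M1 recovers the MLE unigram c(w_i) / N.
col = 0
p_unigram = [M1[i][col] / sum(M1[k][col] for k in range(3)) for i in range(3)]
mle_unigram = [sum(row) / N for row in B]
```

The normalized column does not depend on which history is chosen, which is exactly what it means for the rank 1 model to ignore context.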
Generalizing KN to PLRE
Kneser-Ney:
• Ensemble composed of unsmoothed n-grams
• Alter lower order distributions by using count of unique histories
• Use absolute discounting to interpolate different n-grams and preserve lower order marginal constraint
Power Low Rank Ensembles:
• Ensemble composed of unsmoothed n-grams plus other low rank matrices/tensors
• ?
• ?

Consider Elementwise Power
$\mathbf{B} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 5 & 0 \\ 2 & 0 & 0 \end{pmatrix}$, row sums $(4, 5, 2)$
$\mathbf{B}^{0.5} = \begin{pmatrix} 1 & 1.4 & 1 \\ 0 & 2.2 & 0 \\ 1.4 & 0 & 0 \end{pmatrix}$, row sums $(3.4, 2.2, 1.4)$
$\mathbf{B}^{0} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$, row sums $(3, 1, 1)$
Row sums at lower power put more emphasis on diversity

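The row sums on this slide can be reproduced with a short sketch. It assumes the convention that zero entries stay zero at every power (so $0^0$ is treated as 0), which is what makes the power 0 row sums count distinct histories; the helper name is hypothetical.

```python
def elementwise_power(B, rho):
    """Raise each nonzero entry to rho; zeros stay zero (0**0 treated as 0)."""
    return [[x ** rho if x > 0 else 0.0 for x in row] for row in B]

B = [[1, 2, 1], [0, 5, 0], [2, 0, 0]]
for rho in (1.0, 0.5, 0.0):
    sums = [round(sum(row), 1) for row in elementwise_power(B, rho)]
    print(rho, sums)
# prints:
# 1.0 [4.0, 5.0, 2.0]
# 0.5 [3.4, 2.2, 1.4]
# 0.0 [3.0, 1.0, 1.0]
```

At power 1 the second row dominates because of its single large count; at power 0 the first row dominates because it has the most distinct nonzero entries.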
Consider Elementwise Power
$\mathbf{M}_1^{0} = \arg\min_{\mathbf{M} :\, \mathbf{M} \ge 0,\ \mathrm{rank}(\mathbf{M}) = 1} \| \mathbf{B}^{0} - \mathbf{M} \|_{KL}$
$P_{kn\text{-}uni}(w_i) = \frac{\mathbf{M}_1^{0}(w_i, w_{i-1})}{\sum_{w} \mathbf{M}_1^{0}(w, w_{i-1})}$
[Figure: axes "power" and "low rank"; corners labelled "power = 1, full rank", "power = 0, full rank", and "power = 0, rank = 1"]

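Combining the two ideas, the sketch below takes the elementwise power 0 of a toy count matrix, forms its rank 1 generalized-KL fit using the closed form (row sums times column sums over the grand total), and normalizes a column. The matrix values are illustrative, and the closed form is used here in place of an iterative solver.

```python
B = [[1, 2, 1], [0, 5, 0], [2, 0, 0]]                       # toy counts, rows = w_i
B0 = [[1.0 if x > 0 else 0.0 for x in row] for row in B]    # elementwise power 0

row = [sum(r) for r in B0]            # N_-(w_i): number of distinct histories of w_i
col = [sum(c) for c in zip(*B0)]      # number of distinct followers of each history
total = sum(row)

# Rank-1 fit of B0 under generalized KL: outer product of the marginals.
M10 = [[row[i] * col[j] / total for j in range(3)] for i in range(3)]

# Normalizing any column yields the Kneser-Ney unigram N_-(w_i) / sum_w N_-(w).
p_kn_uni = [M10[i][0] / sum(M10[k][0] for k in range(3)) for i in range(3)]
```

For this matrix the result is (0.6, 0.2, 0.2), matching the diversity counts (3, 1, 1) divided by their sum.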
Varying Rank and Power
• Construct matrices of varying rank and power
[Figure: corners labelled "power = 1, full rank" and "power = 0, rank = 1"]

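One way to build such matrices, sketched under assumptions: raise the counts to a power, then fit a nonnegative factorization of a chosen inner dimension with Lee and Seung's multiplicative updates for generalized KL. The slides do not specify the solver, so `power_low_rank`, its parameters, and the random initialization are hypothetical choices rather than the authors' procedure.

```python
import random

def power_low_rank(B, rho, rank, iters=200, seed=0, eps=1e-12):
    """Fit W H (inner dimension `rank`) to the elementwise power B**rho
    using multiplicative updates for generalized KL divergence."""
    rng = random.Random(seed)
    V = [[x ** rho if x > 0 else 0.0 for x in row] for row in B]
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.5, 1.5) for _ in range(rank)] for _ in range(n)]
    H = [[rng.uniform(0.5, 1.5) for _ in range(m)] for _ in range(rank)]

    def product():
        return [[sum(W[i][k] * H[k][j] for k in range(rank)) for j in range(m)]
                for i in range(n)]

    for _ in range(iters):
        WH = product()
        # Update W with H fixed; this step matches the row sums of V.
        W = [[W[i][k] * sum(V[i][j] / (WH[i][j] + eps) * H[k][j] for j in range(m))
              / (sum(H[k]) + eps) for k in range(rank)] for i in range(n)]
        WH = product()
        # Update H with W fixed; this step matches the column sums of V.
        H = [[H[k][j] * sum(W[i][k] * V[i][j] / (WH[i][j] + eps) for i in range(n))
              / (sum(W[i][k] for i in range(n)) + eps) for j in range(m)]
             for k in range(rank)]
    return product()

B = [[1, 2, 1], [0, 5, 0], [2, 0, 0]]          # toy counts
M_rank1_pow0 = power_low_rank(B, rho=0.0, rank=1)
M_rank2_pow05 = power_low_rank(B, rho=0.5, rank=2)
```

Setting `rank=1` and `rho=0.0` recovers the Kneser-Ney style rank 1 matrix, while intermediate settings give matrices between the bigram and unigram extremes. Pure Python is used only to keep the sketch dependency-free; a realistic vocabulary would need sparse matrix routines.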