  1. Method-of-moments. Daniel Hsu.

  2. Example: modeling the topics of a document corpus. Goal: model the topics of documents in a corpus. (Diagram: sample of documents → learning algorithm → model parameters θ.)

  3. Topic model (e.g., Hofmann, ’99; Blei-Ng-Jordan, ’03). k topics (distributions over vocabulary words), e.g., sports, science, politics, business. Each document ↔ mixture of topics. Words in document ∼ iid mixture distribution.

  4. Topic model (e.g., Hofmann, ’99; Blei-Ng-Jordan, ’03). k topics (distributions over vocabulary words), e.g., sports, science, politics, business. Each document ↔ mixture of topics. Words in document ∼ iid mixture distribution. E.g., words ∼ iid 0.6·sports + 0.3·science + 0.1·politics + 0·business. Each topic is a distribution over the whole vocabulary (aardvark, …, athlete, …, zygote); e.g., Pr_θ[“play” | sports] = 0.0002, Pr_θ[“game” | sports] = 0.0003, …, Pr_θ[“season” | sports] = 0.0001, …

  5. Learning topic models. Topic model: k topics (distributions over d words) μ_1, …, μ_k. Each document ↔ mixture of topics. Words in document ∼ iid mixture distribution.

  6. Learning topic models. Simple topic model (each document about a single topic): k topics (distributions over d words) μ_1, …, μ_k. Topic t chosen with probability w_t; words in document ∼ iid μ_t.

  7. Learning topic models. Simple topic model (each document about a single topic): k topics (distributions over d words) μ_1, …, μ_k. Topic t chosen with probability w_t; words in document ∼ iid μ_t. Input: sample of documents, generated by the simple topic model with unknown parameters θ* := {(μ_t*, w_t*)}.

  8. Learning topic models. Simple topic model (each document about a single topic): k topics (distributions over d words) μ_1, …, μ_k. Topic t chosen with probability w_t; words in document ∼ iid μ_t. Input: sample of documents, generated by the simple topic model with unknown parameters θ* := {(μ_t*, w_t*)}. Task: find parameters θ := {(μ_t, w_t)} so that θ ≈ θ*.
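
To make the generative process of the simple topic model above concrete, here is a minimal simulation sketch. The vocabulary, topic distributions μ_t*, and weights w_t* below are invented for illustration only; they are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters theta* = {(mu_t*, w_t*)}, for illustration only.
vocab = ["play", "game", "season", "atom", "quark", "vote", "market"]   # d = 7 words
k = 2
w_star = np.array([0.6, 0.4])                       # topic probabilities w_t*
mu_star = np.array([
    [0.40, 0.30, 0.20, 0.00, 0.00, 0.05, 0.05],     # a "sports"-like topic
    [0.00, 0.05, 0.00, 0.45, 0.40, 0.05, 0.05],     # a "science"-like topic
])

def sample_document(length=50):
    """Simple topic model: choose one topic t with probability w_t*,
    then draw every word in the document iid from mu_t*."""
    t = rng.choice(k, p=w_star)
    word_ids = rng.choice(len(vocab), size=length, p=mu_star[t])
    return [vocab[i] for i in word_ids]

corpus = [sample_document() for _ in range(1000)]   # the sample S of documents
print(corpus[0][:10])
```

The learning task on this slide is then: given only the corpus, recover parameters close to (μ_t*, w_t*).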

  9. Some approaches to estimation.

  10. Some approaches to estimation. Maximum-likelihood (e.g., Fisher, 1912): θ_MLE := argmax_θ Pr_θ[data].

  11. Some approaches to estimation. Maximum-likelihood (e.g., Fisher, 1912): θ_MLE := argmax_θ Pr_θ[data]. Current practice (> 40 years): local search for local maxima, which can be quite far from θ_MLE.

  12. Some approaches to estimation. Maximum-likelihood (e.g., Fisher, 1912): θ_MLE := argmax_θ Pr_θ[data]. Current practice (> 40 years): local search for local maxima, which can be quite far from θ_MLE. Method-of-moments (Pearson, 1894): find parameters θ that (approximately) satisfy a system of equations based on the data.

  13. Some approaches to estimation. Maximum-likelihood (e.g., Fisher, 1912): θ_MLE := argmax_θ Pr_θ[data]. Current practice (> 40 years): local search for local maxima, which can be quite far from θ_MLE. Method-of-moments (Pearson, 1894): find parameters θ that (approximately) satisfy a system of equations based on the data. Many ways to instantiate and implement.

  15. Moments: normal distribution. Normal distribution: x ∼ N(μ, v). First- and second-order moments: E_(μ,v)[x] = μ, E_(μ,v)[x²] = μ² + v.

  16. Moments: normal distribution. Normal distribution: x ∼ N(μ, v). First- and second-order moments: E_(μ,v)[x] = μ, E_(μ,v)[x²] = μ² + v. Method-of-moments estimators of μ* and v*: find μ̂ and v̂ such that Ê_S[x] ≈ μ̂ and Ê_S[x²] ≈ μ̂² + v̂.

  17. Moments: normal distribution. Normal distribution: x ∼ N(μ, v). First- and second-order moments: E_(μ,v)[x] = μ, E_(μ,v)[x²] = μ² + v. Method-of-moments estimators of μ* and v*: find μ̂ and v̂ such that Ê_S[x] ≈ μ̂ and Ê_S[x²] ≈ μ̂² + v̂. A reasonable solution: μ̂ := Ê_S[x], v̂ := Ê_S[x²] − μ̂², since Ê_S[x] → E_(μ*,v*)[x] and Ê_S[x²] → E_(μ*,v*)[x²] by the law of large numbers.
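
A minimal numerical sketch of the estimator on this slide; the true parameter values chosen below are arbitrary and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# True parameters mu*, v* (arbitrary illustrative values).
mu_true, v_true = 2.0, 3.0
x = rng.normal(loc=mu_true, scale=np.sqrt(v_true), size=100_000)   # sample S

# Method-of-moments estimates: match the empirical 1st and 2nd moments.
mu_hat = x.mean()                        # mu_hat := E_S[x]
v_hat = (x ** 2).mean() - mu_hat ** 2    # v_hat  := E_S[x^2] - mu_hat^2

print(mu_hat, v_hat)   # approach (2.0, 3.0) as the sample grows, by the LLN
```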

  18. Moments: simple topic model. For any n-tuple (i_1, i_2, …, i_n) ∈ Vocabulary^n, the (population) moments under a parameter θ are Pr_θ[document contains words i_1, i_2, …, i_n], e.g., Pr_θ[“machine” & “learning” co-occur].

  19. Moments: simple topic model. For any n-tuple (i_1, i_2, …, i_n) ∈ Vocabulary^n, the (population) moments under a parameter θ are Pr_θ[document contains words i_1, i_2, …, i_n], e.g., Pr_θ[“machine” & “learning” co-occur]. Empirical moments from a sample S of documents: P̂r_S[document contains words i_1, i_2, …, i_n], i.e., the empirical frequency of co-occurrences in the sample S.
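
A minimal sketch of how these empirical second-order moments could be computed from a corpus; the toy corpus and the word pair queried below are placeholders, not data from the talk.

```python
from collections import Counter
from itertools import combinations

def empirical_pair_moments(corpus):
    """Empirical 2nd-order moments: for each unordered pair of distinct words,
    the fraction of documents in which both words occur."""
    pair_counts = Counter()
    for doc in corpus:
        words = sorted(set(doc))                   # co-occurrence within a document
        pair_counts.update(combinations(words, 2))
    n_docs = len(corpus)
    return {pair: count / n_docs for pair, count in pair_counts.items()}

# Toy corpus for illustration (each document is a list of words).
corpus = [
    ["machine", "learning", "model"],
    ["machine", "learning", "data"],
    ["market", "vote"],
]
moments = empirical_pair_moments(corpus)
print(moments[("learning", "machine")])   # Pr_S["machine" & "learning" co-occur] = 2/3
```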

  20. Method-of-moments. Strategy: given a data sample S, find θ that satisfies the system of equations moments_θ = m̂oments_S. (Recall: we expect m̂oments_S ≈ moments_θ* by the law of large numbers.) Q1. Which moments should we use? Q2. How do we (approximately) solve these moment equations?

  21. Q1. Which moments should we use?

  22. Q1. Which moments should we use? (Table columns: moment order / reliable estimates? / unique solution?) First row: 1st- and 2nd-order moments (e.g., probabilities of word pairs). (Figure: prior work arranged along an axis of moment order: [Arora-Ge-Moitra, ’12], [Kleinberg-Sandler, ’04], [Vempala-Wang, ’02], [McSherry, ’01].)

  23. Q1. Which moments should we use? 1st & 2nd order: reliable estimates ✓. Fairly easy to get reliable estimates: P̂r_S[“machine”, “learning”] ≈ Pr_θ*[“machine”, “learning”].

  24. Q1. Which moments should we use? 1st & 2nd order: reliable estimates ✓, unique solution ✗. Can have multiple solutions to the moment equations: moments_θ1 = m̂oments_S = moments_θ2 with θ1 ≠ θ2.

  25. Q1. Which moments should we use? 1st & 2nd order: reliable ✓, unique ✗. New row: Ω(k)th-order moments (probabilities of word k-tuples). (Additional prior work on the moment-order axis: [Gravin et al., ’12], [Moitra-Valiant, ’10], [Lindsay, ’89], [Prony, 1795].)

  26. Q1. Which moments should we use? 1st & 2nd order: reliable ✓, unique ✗. Ω(k)th order: unique solution ✓. Uniquely pins down the solution.

  27. Q1. Which moments should we use? 1st & 2nd order: reliable ✓, unique ✗. Ω(k)th order: reliable ✗, unique ✓. Uniquely pins down the solution, but empirical estimates are very unreliable.

  28. Q1. Which moments should we use? 1st & 2nd order: reliable ✓, unique ✗. Ω(k)th order: reliable ✗, unique ✓. Can we get the best of both worlds?

  29. Q1. Which moments should we use? 1st & 2nd order: reliable ✓, unique ✗. Ω(k)th order: reliable ✗, unique ✓. Can we get the best of both worlds? Yes! In high dimensions, low-order multivariate moments suffice (1st-, 2nd-, and 3rd-order moments). (On the figure’s moment-order axis, this work sits at 3rd order.)

  30. Low-order multivariate moments suffice. Key observation: in high dimensions (d ≫ k), low-order moments have simple (“low-rank”) algebraic structure.
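
A sketch of the algebraic structure this slide alludes to, under the simple topic model defined earlier (these formulas are standard for that model but are not written out on the slide itself): encoding the first three words of a document as one-hot vectors x_1, x_2, x_3 in R^d, the low-order cross moments factor through the topic parameters.

```latex
% Low-order cross moments of the simple topic model
% (words conditionally iid given the single topic t, drawn with probability w_t):
\[
  M_2 := \mathbb{E}[x_1 \otimes x_2] = \sum_{t=1}^{k} w_t \, \mu_t \otimes \mu_t,
  \qquad
  M_3 := \mathbb{E}[x_1 \otimes x_2 \otimes x_3] = \sum_{t=1}^{k} w_t \, \mu_t \otimes \mu_t \otimes \mu_t .
\]
```

In particular, the d × d matrix M_2 has rank at most k ≪ d, and the parameters {(μ_t, w_t)} appear as the components of this low-rank decomposition, which is why 2nd- and 3rd-order moments can suffice to recover them.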
