  1. Method-of-moments. Daniel Hsu.

  2. Example: modeling the topics of a document corpus. Goal: model the topics of documents in a corpus. (Diagram: sample of documents → learning algorithm → model parameters θ.)

  3. Topic model (e.g., Hofmann, ’99; Blei-Ng-Jordan, ’03). k topics (distributions over vocabulary words), e.g., sports, science, politics, business. Each document ↔ mixture of topics. Words in document ∼ iid mixture distribution.

  4. Topic model (e.g., Hofmann, ’99; Blei-Ng-Jordan, ’03). k topics (distributions over vocabulary words), e.g., sports, science, politics, business. Each document ↔ mixture of topics. Words in document ∼ iid mixture distribution. E.g., words ∼ iid 0.6·sports + 0.3·science + 0.1·politics + 0·business. Each topic is a distribution over the whole vocabulary (aardvark, …, athlete, …, zygote); e.g., Pr_θ[“play” | sports] = 0.0002, Pr_θ[“game” | sports] = 0.0003, …, Pr_θ[“season” | sports] = 0.0001, …

  5. Learning topic models. Topic model: k topics (distributions over d words) μ_1, …, μ_k. Each document ↔ mixture of topics. Words in document ∼ iid mixture distribution.

  6. Learning topic models. Simple topic model (each document about a single topic): k topics (distributions over d words) μ_1, …, μ_k. Topic t chosen with probability w_t; words in document ∼ iid μ_t.

  7. Learning topic models. Simple topic model (each document about a single topic): k topics (distributions over d words) μ_1, …, μ_k. Topic t chosen with probability w_t; words in document ∼ iid μ_t. Input: sample of documents, generated by the simple topic model with unknown parameters θ* := {(μ_t*, w_t*)}.

  8. Learning topic models. Simple topic model (each document about a single topic): k topics (distributions over d words) μ_1, …, μ_k. Topic t chosen with probability w_t; words in document ∼ iid μ_t. Input: sample of documents, generated by the simple topic model with unknown parameters θ* := {(μ_t*, w_t*)}. Task: find parameters θ := {(μ_t, w_t)} so that θ ≈ θ*.
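
To make the generative process of the simple topic model above concrete, here is a minimal simulation sketch. The vocabulary, topic distributions μ_t*, and weights w_t* below are invented for illustration only; they are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters theta* = {(mu_t*, w_t*)}, for illustration only.
vocab = ["play", "game", "season", "atom", "quark", "vote", "market"]   # d = 7 words
k = 2
w_star = np.array([0.6, 0.4])                       # topic probabilities w_t*
mu_star = np.array([
    [0.40, 0.30, 0.20, 0.00, 0.00, 0.05, 0.05],     # a "sports"-like topic
    [0.00, 0.05, 0.00, 0.45, 0.40, 0.05, 0.05],     # a "science"-like topic
])

def sample_document(length=50):
    """Simple topic model: choose one topic t with probability w_t*,
    then draw every word in the document iid from mu_t*."""
    t = rng.choice(k, p=w_star)
    word_ids = rng.choice(len(vocab), size=length, p=mu_star[t])
    return [vocab[i] for i in word_ids]

corpus = [sample_document() for _ in range(1000)]   # the sample S of documents
print(corpus[0][:10])
```

The learning task on this slide is then: given only the corpus, recover parameters close to (μ_t*, w_t*).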

  9. Some approaches to estimation.

  10. Some approaches to estimation. Maximum-likelihood (e.g., Fisher, 1912): θ_MLE := argmax_θ Pr_θ[data].

  11. Some approaches to estimation. Maximum-likelihood (e.g., Fisher, 1912): θ_MLE := argmax_θ Pr_θ[data]. Current practice (> 40 years): local search for local maxima, which can be quite far from θ_MLE.

  12. Some approaches to estimation. Maximum-likelihood (e.g., Fisher, 1912): θ_MLE := argmax_θ Pr_θ[data]. Current practice (> 40 years): local search for local maxima, which can be quite far from θ_MLE. Method-of-moments (Pearson, 1894): find parameters θ that (approximately) satisfy a system of equations based on the data.

  13. Some approaches to estimation. Maximum-likelihood (e.g., Fisher, 1912): θ_MLE := argmax_θ Pr_θ[data]. Current practice (> 40 years): local search for local maxima, which can be quite far from θ_MLE. Method-of-moments (Pearson, 1894): find parameters θ that (approximately) satisfy a system of equations based on the data. Many ways to instantiate and implement.

  15. Moments: normal distribution. Normal distribution: x ∼ N(μ, v). First- and second-order moments: E_(μ,v)[x] = μ, E_(μ,v)[x²] = μ² + v.

  16. Moments: normal distribution. Normal distribution: x ∼ N(μ, v). First- and second-order moments: E_(μ,v)[x] = μ, E_(μ,v)[x²] = μ² + v. Method-of-moments estimators of μ* and v*: find μ̂ and v̂ such that Ê_S[x] ≈ μ̂ and Ê_S[x²] ≈ μ̂² + v̂.

  17. Moments: normal distribution. Normal distribution: x ∼ N(μ, v). First- and second-order moments: E_(μ,v)[x] = μ, E_(μ,v)[x²] = μ² + v. Method-of-moments estimators of μ* and v*: find μ̂ and v̂ such that Ê_S[x] ≈ μ̂ and Ê_S[x²] ≈ μ̂² + v̂. A reasonable solution: μ̂ := Ê_S[x], v̂ := Ê_S[x²] − μ̂², since Ê_S[x] → E_(μ*,v*)[x] and Ê_S[x²] → E_(μ*,v*)[x²] by the law of large numbers.
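
A minimal numerical sketch of the estimator on this slide; the true parameter values chosen below are arbitrary and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# True parameters mu*, v* (arbitrary illustrative values).
mu_true, v_true = 2.0, 3.0
x = rng.normal(loc=mu_true, scale=np.sqrt(v_true), size=100_000)   # sample S

# Method-of-moments estimates: match the empirical 1st and 2nd moments.
mu_hat = x.mean()                        # mu_hat := E_S[x]
v_hat = (x ** 2).mean() - mu_hat ** 2    # v_hat  := E_S[x^2] - mu_hat^2

print(mu_hat, v_hat)   # approach (2.0, 3.0) as the sample grows, by the LLN
```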

  18. Moments: simple topic model. For any n-tuple (i_1, i_2, …, i_n) ∈ Vocabulary^n, the (population) moments under a parameter θ are Pr_θ[document contains words i_1, i_2, …, i_n], e.g., Pr_θ[“machine” & “learning” co-occur].

  19. Moments: simple topic model. For any n-tuple (i_1, i_2, …, i_n) ∈ Vocabulary^n, the (population) moments under a parameter θ are Pr_θ[document contains words i_1, i_2, …, i_n], e.g., Pr_θ[“machine” & “learning” co-occur]. Empirical moments from a sample S of documents: P̂r_S[document contains words i_1, i_2, …, i_n], i.e., the empirical frequency of co-occurrences in the sample S.
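
A minimal sketch of how these empirical second-order moments could be computed from a corpus; the toy corpus and the word pair queried below are placeholders, not data from the talk.

```python
from collections import Counter
from itertools import combinations

def empirical_pair_moments(corpus):
    """Empirical 2nd-order moments: for each unordered pair of distinct words,
    the fraction of documents in which both words occur."""
    pair_counts = Counter()
    for doc in corpus:
        words = sorted(set(doc))                   # co-occurrence within a document
        pair_counts.update(combinations(words, 2))
    n_docs = len(corpus)
    return {pair: count / n_docs for pair, count in pair_counts.items()}

# Toy corpus for illustration (each document is a list of words).
corpus = [
    ["machine", "learning", "model"],
    ["machine", "learning", "data"],
    ["market", "vote"],
]
moments = empirical_pair_moments(corpus)
print(moments[("learning", "machine")])   # Pr_S["machine" & "learning" co-occur] = 2/3
```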

  20. Method-of-moments. Strategy: given a data sample S, find θ that satisfies the system of equations moments_θ = m̂oments_S. (Recall: we expect m̂oments_S ≈ moments_θ* by the law of large numbers.) Q1. Which moments should we use? Q2. How do we (approximately) solve these moment equations?

  21. Q1. Which moments should we use?

  22. Q1. Which moments should we use? (Table columns: moment order / reliable estimates? / unique solution?) First row: 1st- and 2nd-order moments (e.g., probabilities of word pairs). (Figure: prior work arranged along an axis of moment order: [Arora-Ge-Moitra, ’12], [Kleinberg-Sandler, ’04], [Vempala-Wang, ’02], [McSherry, ’01].)

  23. Q1. Which moments should we use? 1st & 2nd order: reliable estimates ✓. Fairly easy to get reliable estimates: P̂r_S[“machine”, “learning”] ≈ Pr_θ*[“machine”, “learning”].

  24. Q1. Which moments should we use? 1st & 2nd order: reliable estimates ✓, unique solution ✗. Can have multiple solutions to the moment equations: moments_θ1 = m̂oments_S = moments_θ2 with θ1 ≠ θ2.

  25. Q1. Which moments should we use? 1st & 2nd order: reliable ✓, unique ✗. New row: Ω(k)th-order moments (probabilities of word k-tuples). (Additional prior work on the moment-order axis: [Gravin et al., ’12], [Moitra-Valiant, ’10], [Lindsay, ’89], [Prony, 1795].)

  26. Q1. Which moments should we use? 1st & 2nd order: reliable ✓, unique ✗. Ω(k)th order: unique solution ✓. Uniquely pins down the solution.

  27. Q1. Which moments should we use? 1st & 2nd order: reliable ✓, unique ✗. Ω(k)th order: reliable ✗, unique ✓. Uniquely pins down the solution, but empirical estimates are very unreliable.

  28. Q1. Which moments should we use? 1st & 2nd order: reliable ✓, unique ✗. Ω(k)th order: reliable ✗, unique ✓. Can we get the best of both worlds?

  29. Q1. Which moments should we use? 1st & 2nd order: reliable ✓, unique ✗. Ω(k)th order: reliable ✗, unique ✓. Can we get the best of both worlds? Yes! In high dimensions, low-order multivariate moments suffice (1st-, 2nd-, and 3rd-order moments). (On the figure’s moment-order axis, this work sits at 3rd order.)

  30. Low-order multivariate moments suffice. Key observation: in high dimensions (d ≫ k), low-order moments have simple (“low-rank”) algebraic structure.
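
A sketch of the algebraic structure this slide alludes to, under the simple topic model defined earlier (these formulas are standard for that model but are not written out on the slide itself): encoding the first three words of a document as one-hot vectors x_1, x_2, x_3 in R^d, the low-order cross moments factor through the topic parameters.

```latex
% Low-order cross moments of the simple topic model
% (words conditionally iid given the single topic t, drawn with probability w_t):
\[
  M_2 := \mathbb{E}[x_1 \otimes x_2] = \sum_{t=1}^{k} w_t \, \mu_t \otimes \mu_t,
  \qquad
  M_3 := \mathbb{E}[x_1 \otimes x_2 \otimes x_3] = \sum_{t=1}^{k} w_t \, \mu_t \otimes \mu_t \otimes \mu_t .
\]
```

In particular, the d × d matrix M_2 has rank at most k ≪ d, and the parameters {(μ_t, w_t)} appear as the components of this low-rank decomposition, which is why 2nd- and 3rd-order moments can suffice to recover them.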
