

1. Robust Spectral Inference for Joint Stochastic Matrix Factorization
Kun Dong, Cornell University
October 20, 2016

2. Introduction: Topic Modeling
• Idea: represent documents as combinations of topics.
• Advantages:
  • Low-dimensional representation of documents
  • Uncovers hidden structure in large collections
• Applications:
  • Summarizing documents by their topics
  • Clustering documents by similarity of topics

3. Joint Stochastic Matrix Factorization: Co-occurrence Matrix
• The relationships between words can be more revealing than the words themselves.
• C ≈ B A B^T
• C ∈ R^{n×n} - word-word matrix: C_{ij} = p(X_1 = i, X_2 = j)
• A ∈ R^{K×K} - topic-topic matrix: A_{kℓ} = p(Z_1 = k, Z_2 = ℓ)
• B ∈ R^{n×K} - word-topic matrix: B_{ik} = p(X = i | Z = k)
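A minimal sketch of how the three matrices fit together, using toy sizes and random values (illustrative assumptions, not taken from the talk). Because the columns of B and the entries of A each sum to 1, the product B A B^T is again a joint distribution over word pairs:

```python
import numpy as np

n, K = 6, 2                         # toy vocabulary size and number of topics
rng = np.random.default_rng(0)

# B: word-topic matrix, each column is a distribution p(word | topic)
B = rng.random((n, K))
B /= B.sum(axis=0, keepdims=True)

# A: topic-topic matrix, a symmetric joint distribution p(topic_1, topic_2)
A = rng.random((K, K))
A = (A + A.T) / 2
A /= A.sum()

# C: word-word co-occurrence matrix, the joint distribution p(word_1, word_2)
C = B @ A @ B.T
assert np.isclose(C.sum(), 1.0)     # joint-stochastic: all entries sum to 1
```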

4. What We Observe

5. Anchor Word
• Separability: the word-topic matrix B is p-separable if for each topic k there is some word i such that B_{i,k} ≥ p and B_{i,ℓ} = 0 for ℓ ≠ k.
• Every topic k has an anchor word i exclusive to it.
• Documents containing anchor word i must contain topic k.
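A toy illustration (made-up numbers) of a p-separable word-topic matrix: words 0 and 1 occur under exactly one topic each, so they are anchor words.

```python
import numpy as np

# Rows are words, columns are topics; each column is p(word | topic).
B = np.array([
    [0.4, 0.0],   # word 0: anchor for topic 0
    [0.0, 0.5],   # word 1: anchor for topic 1
    [0.3, 0.2],
    [0.3, 0.3],
])
# B is p-separable with p = 0.4: every topic has a word with probability >= 0.4
# under that topic and probability 0 under every other topic.
```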

6. Anchor Word Algorithm
• Under this assumption, Arora et al. (2013) showed that the Anchor Word Algorithm computes this decomposition in polynomial time.
• It uses QR with row pivoting after a random projection of C: choose the points that are farthest away from each other.
• However, it fails to produce a doubly non-negative topic-topic matrix.
• It also tends to choose rare words as anchors and to generate less meaningful topics.

7. Probabilistic Structure
• For the m-th document with n_m words, we view it as n_m(n_m − 1) word pairs.
• Generate a distribution A over pairs of topics with parameter α.
• Sample two topics (z_1, z_2) ∼ A.
• Sample the actual word pair (x_1, x_2) ∼ (B_{z_1}, B_{z_2}).
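A hedged sketch of this pair-generating process; the Dirichlet draws for A and B are illustrative assumptions, not the parameterization used in the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 6, 2

B = rng.dirichlet(np.ones(n), size=K).T            # n x K, columns are p(word | topic)
A = rng.dirichlet(np.ones(K * K)).reshape(K, K)    # joint distribution over topic pairs

def sample_pair():
    # Sample a topic pair (z1, z2) ~ A, then a word pair (x1, x2) ~ (B[:, z1], B[:, z2]).
    idx = rng.choice(K * K, p=A.ravel())
    z1, z2 = divmod(idx, K)
    return rng.choice(n, p=B[:, z1]), rng.choice(n, p=B[:, z2])

pairs = [sample_pair() for _ in range(10)]          # a few sampled word pairs
```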

8. Statistical Structure
• Let f(α) be a distribution over topic distributions.
• Documents are M i.i.d. samples {W_1, ..., W_M} ∼ f(α).
• Let the posterior topic-topic matrix be A*_M = (1/M) Σ_{m=1}^M W_m W_m^T and the expectation A* = E[W_m W_m^T]; then A*_M → A* as M → ∞.
• Let the per-document posterior word-word matrix be C*_m = B W_m W_m^T B^T, and C*_M = (1/M) Σ_{m=1}^M C*_m = B A*_M B^T.
• Let C be the noisy observation from all samples: E[C] = C*_M = B A*_M B^T → B A* B^T = C*.
• A*_M, A* ∈ DNN_K and C*_M, C* ∈ DNN_N (doubly non-negative).
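A small numerical check of these definitions, assuming (purely for illustration) that f(α) is a symmetric Dirichlet over topics:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, M = 6, 2, 10_000
alpha = 0.5

B = rng.dirichlet(np.ones(n), size=K).T             # word-topic matrix, n x K

# W_m ~ f(alpha): here a symmetric Dirichlet over topics (an illustrative assumption)
W = rng.dirichlet(alpha * np.ones(K), size=M)       # M x K, rows sum to 1

A_star_M = (W[:, :, None] * W[:, None, :]).mean(axis=0)   # (1/M) sum_m W_m W_m^T
C_star_M = B @ A_star_M @ B.T                        # = B A*_M B^T

# Both are doubly non-negative and joint-stochastic (entries sum to 1).
assert np.isclose(A_star_M.sum(), 1.0) and np.isclose(C_star_M.sum(), 1.0)
```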

9. Generating Co-occurrence C
• Let H_m be the vector of word counts for the m-th document and W_m its latent topic distribution.
• Let p_m = B W_m, and assume H_m ∼ Multinomial(n_m, p_m).
• E[H_m] = n_m p_m = n_m B W_m and Cov(H_m) = n_m (diag(p_m) − p_m p_m^T).
• Define the per-document co-occurrence C_m = (H_m H_m^T − diag(H_m)) / (n_m (n_m − 1)).
• E[C_m | W_m] = C*_m, so E[C | W] = C*_M.
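A minimal sketch of this estimator for a single document; subtracting diag(H_m) removes the self-pairs so that E[C_m | W_m] = C*_m. Sizes and the random draws are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 6, 2
B = rng.dirichlet(np.ones(n), size=K).T      # word-topic matrix (toy)
W_m = rng.dirichlet(np.ones(K))              # topic distribution of document m
n_m = 50                                     # document length

p_m = B @ W_m                                # word distribution of document m
H_m = rng.multinomial(n_m, p_m)              # observed word counts

# Unbiased estimate of C*_m = B W_m W_m^T B^T from a single document
C_m = (np.outer(H_m, H_m) - np.diag(H_m)) / (n_m * (n_m - 1))
assert np.isclose(C_m.sum(), 1.0)            # each C_m is joint-stochastic
```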

10. Rectifying Co-occurrence C
• In reality C can still mismatch C* because of model-assumption violations and limited data.
• We can rectify C to be low-rank, doubly non-negative, and joint-stochastic by alternating projection (Dykstra's algorithm), cycling through three projections:
• PSD_{NK}(C) = U Λ_K^+ U^T (keep the top K non-negative eigenvalues)
• NOR_N(C) = C + ((1 − Σ_{i,j} C_{ij}) / N^2) 11^T (shift entries so they sum to 1)
• NN_N(C) = max{C, 0} (element-wise non-negativity)
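A simplified sketch of the three projections with a plain cyclic alternating-projection loop; the actual rectification uses Dykstra-style correction terms, which are omitted here for brevity:

```python
import numpy as np

def proj_psd_rank_k(C, K):
    # PSD_{NK}: keep only the top-K non-negative eigenvalues, U diag(lambda_K^+) U^T
    lam, U = np.linalg.eigh((C + C.T) / 2)
    lam_k = np.zeros_like(lam)
    top = np.argsort(lam)[-K:]
    lam_k[top] = np.maximum(lam[top], 0.0)
    return (U * lam_k) @ U.T

def proj_normalized(C):
    # NOR_N: shift all entries uniformly so they sum to 1 (joint-stochastic)
    N = C.shape[0]
    return C + (1.0 - C.sum()) / N**2

def proj_nonnegative(C):
    # NN_N: element-wise non-negativity
    return np.maximum(C, 0.0)

def rectify(C, K, iters=50):
    # Simplified cyclic alternating projection onto the three sets
    for _ in range(iters):
        C = proj_psd_rank_k(C, K)
        C = proj_normalized(C)
        C = proj_nonnegative(C)
    return C
```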

11. Finding Anchor Words
• Use a column-pivoted QR algorithm on the rectified co-occurrence matrix to greedily select anchor words whose rows are farthest away from each other.
• Exploit sparsity and avoid using a random projection.
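A minimal sketch of anchor selection via pivoted QR (scipy.linalg.qr with pivoting=True): each pivot picks the column farthest from the span of those already chosen. Applying it to the transpose of the row-normalized matrix is one plausible reading of the slide; the paper's implementation additionally exploits sparsity rather than forming dense factors:

```python
import numpy as np
from scipy.linalg import qr

def find_anchors(C, K):
    # Row-normalize C so each row is p(word2 | word1 = i)
    C_bar = C / C.sum(axis=1, keepdims=True)
    # Pivoted QR on the transpose greedily selects rows of C_bar,
    # at each step taking the row farthest from the span of those chosen so far.
    _, _, piv = qr(C_bar.T, pivoting=True)
    return piv[:K]          # indices of the K candidate anchor words
```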

12. Recovering Word-Topic Matrix B
• Row-normalize C to get C̄, so that C̄_{ij} = p(w_2 = j | w_1 = i).
• Under the separability assumption, for an anchor word s_k: C̄_{s_k, j} = Σ_{k'} p(z_1 = k' | w_1 = s_k) p(w_2 = j | z_1 = k') = p(w_2 = j | z_1 = k).
• Hence every row of C̄ lies in the convex hull of the anchor rows C̄_{s_k}: C̄_{ij} = Σ_k p(z_1 = k | w_1 = i) p(w_2 = j | z_1 = k) = Σ_k Q_{ik} C̄_{s_k, j}.
• Find Q_{ik} by non-negative least squares (NNLS) and infer B_{ik} with Bayes' rule.
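A hedged sketch of this recovery step using scipy.optimize.nnls; the helper name recover_B and the exact normalization details are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import nnls

def recover_B(C, anchors):
    # C: rectified joint-stochastic co-occurrence matrix; anchors: anchor word indices
    p_w = C.sum(axis=1)                          # p(w1 = i)
    C_bar = C / np.maximum(p_w, 1e-12)[:, None]  # C_bar[i, j] = p(w2 = j | w1 = i)
    S = C_bar[anchors]                           # anchor rows, K x n

    n, K = C.shape[0], len(anchors)
    Q = np.zeros((n, K))
    for i in range(n):
        q, _ = nnls(S.T, C_bar[i])               # C_bar[i] ≈ sum_k Q[i, k] * S[k]
        Q[i] = q / max(q.sum(), 1e-12)           # Q[i, k] = p(z1 = k | w1 = i)

    # Bayes' rule: B[i, k] = p(w = i | z = k) ∝ Q[i, k] * p(w = i)
    B = Q * p_w[:, None]
    B /= B.sum(axis=0, keepdims=True)
    return B
```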

13. Example of Recovered Topics

14. Recovering Topic-Topic Matrix A

15. Conclusion
• The algorithm handles a noisy co-occurrence matrix by rectification.
• It produces quality anchor words and topics, even when the sample size is small.
• It preserves the structure of the decomposition under our assumptions.

16. Citations
Sanjeev Arora, Rong Ge, Yonatan Halpern, David M. Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. A practical algorithm for topic modeling with provable guarantees. In Proceedings of the 30th International Conference on Machine Learning, 2013.
Moontae Lee, David Bindel, and David Mimno. Robust spectral inference for joint stochastic matrix factorization. In Advances in Neural Information Processing Systems, pages 2710–2718, 2015.

17. Thank you!
