
Discovery of Latent Factors in High-Dimensional Data Using Tensor Methods



  1. Discovery of Latent Factors in High-Dimensional Data Using Tensor Methods. Furong Huang, University of California, Irvine. Machine Learning Conference 2016, New York City.

  2. Machine Learning: Modern Challenges. Big data, challenging tasks. Success of supervised learning: image classification, speech recognition, text processing. Enabled by growth in computation power and enormous amounts of labeled data.

  3. Machine Learning: Modern Challenges. Real AI requires unsupervised learning: filter bank learning, feature extraction, embeddings, topics.

  4. Machine Learning: Modern Challenges. Real AI requires unsupervised learning: filter bank learning, feature extraction, embeddings, topics. Unsupervised learning summarizes key features in data (machines vs. humans) and is the foundation for successful supervised learning.

  5. Unsupervised Learning with Big Data: Information Extraction. High-dimensional observations vs. low-dimensional representations: topics, cell types, communities.

  6. Unsupervised Learning with Big Data: Information Extraction. High-dimensional observations vs. low-dimensional representations: topics, cell types, communities. Finding the needle in the haystack is challenging.

  7. Unsupervised Learning with Big Data: Information Extraction. Solution for unsupervised learning: a unified tensor decomposition framework.

  8. Automated Categorization of Documents. Documents mix topics; example topics: education, crime, sports.

  9. Community Extraction from a Connectivity Graph. Nodes have mixed memberships.

  10. Tensor Methods Compared with Variational Inference. PubMed on Spark, 8 million documents. [Figure: running time (s) and perplexity, tensor vs. variational methods.]

  11. Tensor Methods Compared with Variational Inference. PubMed on Spark, 8 million documents. [Figure: running time (s) and perplexity, tensor vs. variational methods.] Community detection: Facebook (n ~ 20k), Yelp (n ~ 40k), DBLP (n ~ 1 million). [Figure: running times (s) and error per group on FB, YP, DBLPsub, DBLP.]

  12. Tensor Methods Compared with Variational Inference. Orders of magnitude faster and more accurate. [Figures: running time and perplexity on PubMed; running times and error per group on FB, YP, DBLPsub, DBLP.] References: "Online Tensor Methods for Learning Latent Variable Models", F. Huang, U. N. Niranjan, M. Hakeem, A. Anandkumar, JMLR 2014; "Tensor Methods on Apache Spark", F. Huang, A. Anandkumar, Oct. 2015.

  13. Cataloging Neuronal Cell Types in the Brain.

  14. Cataloging Neuronal Cell Types in the Brain. Our method vs. average expression level [Grange '14]. [Figure: spatial point process (ours) vs. average expression level (previous).] Recovered known cell types: astrocytes, interneurons, oligodendrocytes. Reference: "Discovering Neuronal Cell Types and Their Gene Expression Profiles Using a Spatial Point Process Mixture Model", F. Huang, A. Anandkumar, C. Borgs, J. Chayes, E. Fraenkel, M. Hawrylycz, E. Lein, A. Ingrosso, S. Turaga, NIPS 2015 BigNeuro workshop.

  15. Word Sequence Embedding Extraction. [Figure: word embeddings (tree, soccer, football) vs. word sequence embeddings for sentences such as "The weather is good.", "Her life spanned years of incredible change for women.", "Mary lived through an era of liberating reform for women."]

  16. Word Sequence Embedding Extraction. Paraphrase detection on the MSR paraphrase data (5800 pairs of sentences):
     Method | Outside information | F-score
     Vector Similarity (baseline) | word similarity | 75.3%
     Convolutional Tensor (proposed) | none | 80.7%
     Skip-thought (NIPS '15) | trained on a large corpus | 81.9%
     Reference: "Convolutional Dictionary Learning through Tensor Factorization", F. Huang, A. Anandkumar, JMLR Workshop and Conference Proceedings, vol. 44, Dec. 2015.

  17. Human Disease Hierarchy Discovery. CMS data: 1.6 million patients, 168 million diagnostic events, 11k diseases. Reference: "Scalable Latent Tree Model and its Application to Health Analytics", F. Huang, U. N. Niranjan, I. Perros, R. Chen, J. Sun, A. Anandkumar, NIPS 2015 MLHC workshop.

  18. Unsupervised Learning via Probabilistic Models. [Figure: admixture model with choice variable h over topics k1-k5, topic-word matrix A, and words data, DNA, RNA, life, gene.] Inference: from unlabeled data to the probabilistic admixture model, via a learning algorithm.

  19. Unsupervised Learning via Probabilistic Models. [Figure: same admixture model, with MCMC as the learning algorithm.] MCMC: random sampling, slow; exponential mixing time.

  20. Unsupervised Learning via Probabilistic Models. [Figure: same admixture model, with likelihood methods as the learning algorithm.] MCMC: random sampling, slow; exponential mixing time. Likelihood methods: non-convex, not scalable; exponentially many critical points.

  21. Unsupervised Learning via Probabilistic Models. MCMC: random sampling, slow; exponential mixing time. Likelihood methods: non-convex, not scalable; exponentially many critical points. Solution: a unified tensor decomposition framework.

  22. Unsupervised Learning via Probabilistic Models. [Figure: the moment tensor T of the admixture model written as a sum of rank-one components.] Tensor decomposition → correct model.

  23. Unsupervised Learning via Probabilistic Models. Tensor decomposition → correct model. Contributions: a guaranteed online algorithm with global convergence; highly scalable, highly parallel, with random projection; a tensor library on CPU/GPU/Spark; interdisciplinary applications; an extension to models with group invariance.
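To make "tensor decomposition → correct model" concrete, the sketch below shows the standard method-of-moments recipe this line of work builds on: estimate low-order moments, whiten the third-order moment with the second, decompose the whitened tensor, and map the pieces back to model parameters. It is a minimal illustration under the single-topic moment assumptions stated in the docstring, not the talk's library code; all names (`learn_from_moments`, `M2`, `M3`) are made up for the example.

```python
import numpy as np

def learn_from_moments(M2, M3, k, iters=100, seed=0):
    """Schematic method-of-moments pipeline (a sketch, not the talk's library).

    Assumes the single-topic mixture form M2 = sum_j w_j a_j a_j^T and
    M3 = sum_j w_j a_j ⊗ a_j ⊗ a_j, with linearly independent columns a_j.
    """
    # 1. Whitening: build W with W^T M2 W = I_k from the top-k eigenpairs of M2.
    vals, vecs = np.linalg.eigh(M2)
    vals, vecs = vals[-k:], vecs[:, -k:]
    W = vecs / np.sqrt(vals)

    # 2. The whitened tensor T = M3(W, W, W) is (approximately) orthogonally
    #    decomposable, so its components can be found one at a time.
    T = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)

    # 3. Power iteration with deflation; a practical implementation would use
    #    multiple random restarts per component.
    rng = np.random.default_rng(seed)
    weights, components = [], []
    for _ in range(k):
        v = rng.normal(size=k)
        v /= np.linalg.norm(v)
        for _ in range(iters):
            v = np.einsum('ijk,j,k->i', T, v, v)   # tensor-vector-vector product
            v /= np.linalg.norm(v)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)
        T -= lam * np.einsum('i,j,k->ijk', v, v, v)  # deflate the found component

        # 4. Un-whitening maps eigenpairs back to model parameters:
        #    w_j = 1 / lam^2 and a_j = lam * (W^T)^+ v.
        weights.append(1.0 / lam**2)
        components.append(lam * np.linalg.pinv(W.T) @ v)
    return np.array(weights), np.column_stack(components)
```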

  24. What is a tensor? Matrix, second-order moments: $M_2$ captures pairwise relationships, $[M_2]_{i_1,i_2} = [x \otimes x]_{i_1,i_2} = x_{i_1} x_{i_2}$. Tensor, third-order moments: $M_3$ captures triple-wise relationships, $[M_3]_{i_1,i_2,i_3} = [x \otimes x \otimes x]_{i_1,i_2,i_3} = x_{i_1} x_{i_2} x_{i_3}$.
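These moment definitions translate directly into code. A minimal NumPy sketch of the empirical versions follows; the sample `X` and all variable names are illustrative, not from the talk.

```python
import numpy as np

# X holds n observations of dimension d, one per row (illustrative random data).
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.random((n, d))

# Second-order moment M2 = E[x ⊗ x]: a d x d matrix of pairwise products,
# estimated by averaging the outer product x x^T over the sample.
M2 = np.einsum('ni,nj->ij', X, X) / n          # [M2]_{i1,i2} = mean of x_{i1} x_{i2}

# Third-order moment M3 = E[x ⊗ x ⊗ x]: a d x d x d tensor of triple products.
M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / n   # [M3]_{i1,i2,i3} = mean of x_{i1} x_{i2} x_{i3}

print(M2.shape, M3.shape)   # (5, 5) (5, 5, 5)
```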

  25. Why are tensors powerful? Matrix orthogonal decomposition is not unique without an eigenvalue gap: $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = e_1 e_1^\top + e_2 e_2^\top = u_1 u_1^\top + u_2 u_2^\top$, where $u_1 = [\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2}]^\top$ and $u_2 = [\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}]^\top$.

  26. Why are tensors powerful? Matrix orthogonal decomposition is not unique without an eigenvalue gap (example above). Tensor orthogonal decomposition is unique, and no eigenvalue gap is needed. [Figure: a tensor equals a sum of rank-one terms; no alternative decomposition exists.]

  27-28. Why are tensors powerful? Matrix orthogonal decomposition is not unique without an eigenvalue gap (example above). Tensor orthogonal decomposition is unique, and no eigenvalue gap is needed: a slice of the tensor has an eigenvalue gap. [Figure: a tensor equals a sum of rank-one terms; no alternative decomposition exists.]

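Both claims on this slide can be checked numerically. The sketch below assumes the two orthonormal components from the slide's matrix example; `power_iteration` is an illustrative textbook helper, not the algorithm as implemented in the authors' library.

```python
import numpy as np

# Matrix case: the identity has no eigenvalue gap, so its orthogonal
# decomposition is not unique: e1 e1^T + e2 e2^T = u1 u1^T + u2 u2^T.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
u1 = np.array([1.0, -1.0]) / np.sqrt(2)
u2 = np.array([1.0, 1.0]) / np.sqrt(2)
A = np.outer(e1, e1) + np.outer(e2, e2)
B = np.outer(u1, u1) + np.outer(u2, u2)
print(np.allclose(A, B))   # True: two different decompositions, same matrix

# Tensor case: T = e1⊗e1⊗e1 + e2⊗e2⊗e2 has a unique orthogonal decomposition.
T = (np.einsum('i,j,k->ijk', e1, e1, e1)
     + np.einsum('i,j,k->ijk', e2, e2, e2))

def power_iteration(T, iters=50, seed=1):
    """Recover one robust eigenvector of an orthogonally decomposable tensor."""
    v = np.random.default_rng(seed).normal(size=T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, v, v)   # tensor-vector-vector product
        v /= np.linalg.norm(v)
    return v

print(np.round(power_iteration(T), 3))   # lands on e1 or e2, never a mixed direction
```

Different seeds land on different components, but always on a true component; for the identity matrix, by contrast, every unit vector is an eigenvector, which is exactly the non-uniqueness the slide points at.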

  29-30. Outline: 1. Introduction; 2. LDA and Community Models (from data aggregates to model parameters; a guaranteed online algorithm); 3. Conclusion.


  31. Probabilistic Topic Models: LDA. [Figure: bag-of-words documents over the words campus, police, witness; topics; topic proportions.]

  32. Probabilistic Topic Models: LDA. [Figure: bag-of-words documents over the words campus, police, witness; topics education, crime, sports; topic proportions.]

  33. Probabilistic Topic Models: LDA. [Figure: as above.] Goal: recover the topic-word matrix $P[\text{word} = i \mid \text{topic} = j]$.
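For a concrete picture of the admixture on this slide, here is a tiny generative sketch of an LDA-style topic model. The Dirichlet prior `alpha`, the `topic_word` matrix, and the five-word vocabulary are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["campus", "police", "witness", "school", "game"]

# Topic-word matrix: column j is P[word = i | topic = j] (made-up numbers;
# topics loosely mimic education, crime, sports from the earlier slide).
topic_word = np.array([
    [0.50, 0.10, 0.10],   # campus
    [0.20, 0.50, 0.10],   # police
    [0.10, 0.30, 0.10],   # witness
    [0.15, 0.05, 0.10],   # school
    [0.05, 0.05, 0.60],   # game
])

alpha = np.array([0.3, 0.3, 0.3])   # Dirichlet prior over topic proportions

def sample_document(n_words=10):
    theta = rng.dirichlet(alpha)                    # topic proportions for this document
    topics = rng.choice(3, size=n_words, p=theta)   # one latent topic per word
    return [vocab[rng.choice(5, p=topic_word[:, z])] for z in topics]

print(sample_document())
```

Learning reverses this process: from documents alone, recover `topic_word`, which is exactly the goal stated on the slide.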

  34. Mixture Form of Moments. Goal: a linearly independent topic-word table. [Figure: topic-word columns over the words campus, police, witness.]
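The slide text is truncated here. For context, in the simplest single-topic (exchangeable) variant of the model, the moments take the mixture form below; this is standard background from the tensor method-of-moments literature, not text recovered from the slide, and LDA proper adds Dirichlet correction terms.

$$M_2 = \sum_{j=1}^{k} w_j \, a_j \otimes a_j, \qquad M_3 = \sum_{j=1}^{k} w_j \, a_j \otimes a_j \otimes a_j,$$

where $a_j$ is the $j$-th column of the topic-word matrix and $w_j$ is the probability of topic $j$. Linear independence of the $a_j$, the goal stated on the slide, is what makes this decomposition identifiable.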
