
Discovery of Latent Factors in High-Dimensional Data Using Tensor Methods



  1. Discovery of Latent Factors in High-Dimensional Data Using Tensor Methods. Furong Huang, University of California, Irvine. Machine Learning Conference 2016, New York City.

  2. Machine Learning: Modern Challenges. Big data, challenging tasks. Success of supervised learning: image classification, speech recognition, text processing. Enabled by growth in computation power and enormous amounts of labeled data.

  3. Machine Learning: Modern Challenges. Real AI requires unsupervised learning: filter bank learning, feature extraction, embeddings, topics.

  4. Machine Learning: Modern Challenges. Real AI requires unsupervised learning: filter bank learning, feature extraction, embeddings, topics. Unsupervised learning summarizes key features in data (machines vs. humans) and is the foundation for successful supervised learning.

  5. Unsupervised Learning with Big Data: Information Extraction. High-dimensional observations vs. low-dimensional representations: topics, cell types, communities.

  6. Unsupervised Learning with Big Data: Information Extraction. High-dimensional observations vs. low-dimensional representations: topics, cell types, communities. Finding the needle in the haystack is challenging.

  7. Unsupervised Learning with Big Data: Information Extraction. Solution for unsupervised learning: a unified tensor decomposition framework.

  8. Automated Categorization of Documents. Documents mix topics; example topics: education, crime, sports.

  9. Community Extraction from a Connectivity Graph. Nodes have mixed memberships.

  10. Tensor Methods Compared with Variational Inference. PubMed on Spark, 8 million documents. [Figure: running time (s) and perplexity, tensor vs. variational methods.]

  11. Tensor Methods Compared with Variational Inference. PubMed on Spark, 8 million documents. [Figure: running time (s) and perplexity, tensor vs. variational methods.] Community detection: Facebook (n ~ 20k), Yelp (n ~ 40k), DBLP (n ~ 1 million). [Figure: running times (s) and error per group on FB, YP, DBLPsub, DBLP.]

  12. Tensor Methods Compared with Variational Inference. Orders of magnitude faster and more accurate. [Figures: running time and perplexity on PubMed; running times and error per group on FB, YP, DBLPsub, DBLP.] References: "Online Tensor Methods for Learning Latent Variable Models", F. Huang, U. N. Niranjan, M. Hakeem, A. Anandkumar, JMLR 2014; "Tensor Methods on Apache Spark", F. Huang, A. Anandkumar, Oct. 2015.

  13. Cataloging Neuronal Cell Types in the Brain.

  14. Cataloging Neuronal Cell Types in the Brain. Our method vs. average expression level [Grange '14]. [Figure: spatial point process (ours) vs. average expression level (previous).] Recovered known cell types: astrocytes, interneurons, oligodendrocytes. Reference: "Discovering Neuronal Cell Types and Their Gene Expression Profiles Using a Spatial Point Process Mixture Model", F. Huang, A. Anandkumar, C. Borgs, J. Chayes, E. Fraenkel, M. Hawrylycz, E. Lein, A. Ingrosso, S. Turaga, NIPS 2015 BigNeuro workshop.

  15. Word Sequence Embedding Extraction. [Figure: word embeddings (tree, soccer, football) vs. word sequence embeddings for sentences such as "The weather is good.", "Her life spanned years of incredible change for women.", "Mary lived through an era of liberating reform for women."]

  16. Word Sequence Embedding Extraction. Paraphrase detection on the MSR paraphrase data (5800 pairs of sentences):
     Method | Outside information | F-score
     Vector Similarity (baseline) | word similarity | 75.3%
     Convolutional Tensor (proposed) | none | 80.7%
     Skip-thought (NIPS '15) | trained on a large corpus | 81.9%
     Reference: "Convolutional Dictionary Learning through Tensor Factorization", F. Huang, A. Anandkumar, JMLR Workshop and Conference Proceedings, vol. 44, Dec. 2015.

  17. Human Disease Hierarchy Discovery. CMS data: 1.6 million patients, 168 million diagnostic events, 11k diseases. Reference: "Scalable Latent Tree Model and its Application to Health Analytics", F. Huang, U. N. Niranjan, I. Perros, R. Chen, J. Sun, A. Anandkumar, NIPS 2015 MLHC workshop.

  18. Unsupervised Learning via Probabilistic Models. [Figure: admixture model with choice variable h over topics k1-k5, topic-word matrix A, and words data, DNA, RNA, life, gene.] Inference: from unlabeled data to the probabilistic admixture model, via a learning algorithm.

  19. Unsupervised Learning via Probabilistic Models. [Figure: same admixture model, with MCMC as the learning algorithm.] MCMC: random sampling, slow; exponential mixing time.

  20. Unsupervised Learning via Probabilistic Models. [Figure: same admixture model, with likelihood methods as the learning algorithm.] MCMC: random sampling, slow; exponential mixing time. Likelihood methods: non-convex, not scalable; exponentially many critical points.

  21. Unsupervised Learning via Probabilistic Models. MCMC: random sampling, slow; exponential mixing time. Likelihood methods: non-convex, not scalable; exponentially many critical points. Solution: a unified tensor decomposition framework.

  22. Unsupervised Learning via Probabilistic Models. [Figure: the moment tensor T of the admixture model written as a sum of rank-one components.] Tensor decomposition → correct model.

  23. Unsupervised Learning via Probabilistic Models. Tensor decomposition → correct model. Contributions: a guaranteed online algorithm with global convergence; highly scalable, highly parallel, with random projection; a tensor library on CPU/GPU/Spark; interdisciplinary applications; an extension to models with group invariance.
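To make "tensor decomposition → correct model" concrete, the sketch below shows the standard method-of-moments recipe this line of work builds on: estimate low-order moments, whiten the third-order moment with the second, decompose the whitened tensor, and map the pieces back to model parameters. It is a minimal illustration under the single-topic moment assumptions stated in the docstring, not the talk's library code; all names (`learn_from_moments`, `M2`, `M3`) are made up for the example.

```python
import numpy as np

def learn_from_moments(M2, M3, k, iters=100, seed=0):
    """Schematic method-of-moments pipeline (a sketch, not the talk's library).

    Assumes the single-topic mixture form M2 = sum_j w_j a_j a_j^T and
    M3 = sum_j w_j a_j ⊗ a_j ⊗ a_j, with linearly independent columns a_j.
    """
    # 1. Whitening: build W with W^T M2 W = I_k from the top-k eigenpairs of M2.
    vals, vecs = np.linalg.eigh(M2)
    vals, vecs = vals[-k:], vecs[:, -k:]
    W = vecs / np.sqrt(vals)

    # 2. The whitened tensor T = M3(W, W, W) is (approximately) orthogonally
    #    decomposable, so its components can be found one at a time.
    T = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)

    # 3. Power iteration with deflation; a practical implementation would use
    #    multiple random restarts per component.
    rng = np.random.default_rng(seed)
    weights, components = [], []
    for _ in range(k):
        v = rng.normal(size=k)
        v /= np.linalg.norm(v)
        for _ in range(iters):
            v = np.einsum('ijk,j,k->i', T, v, v)   # tensor-vector-vector product
            v /= np.linalg.norm(v)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)
        T -= lam * np.einsum('i,j,k->ijk', v, v, v)  # deflate the found component

        # 4. Un-whitening maps eigenpairs back to model parameters:
        #    w_j = 1 / lam^2 and a_j = lam * (W^T)^+ v.
        weights.append(1.0 / lam**2)
        components.append(lam * np.linalg.pinv(W.T) @ v)
    return np.array(weights), np.column_stack(components)
```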

  24. What is a tensor? Matrix, second-order moments: $M_2$ captures pairwise relationships, $[M_2]_{i_1,i_2} = [x \otimes x]_{i_1,i_2} = x_{i_1} x_{i_2}$. Tensor, third-order moments: $M_3$ captures triple-wise relationships, $[M_3]_{i_1,i_2,i_3} = [x \otimes x \otimes x]_{i_1,i_2,i_3} = x_{i_1} x_{i_2} x_{i_3}$.
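These moment definitions translate directly into code. A minimal NumPy sketch of the empirical versions follows; the sample `X` and all variable names are illustrative, not from the talk.

```python
import numpy as np

# X holds n observations of dimension d, one per row (illustrative random data).
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.random((n, d))

# Second-order moment M2 = E[x ⊗ x]: a d x d matrix of pairwise products,
# estimated by averaging the outer product x x^T over the sample.
M2 = np.einsum('ni,nj->ij', X, X) / n          # [M2]_{i1,i2} = mean of x_{i1} x_{i2}

# Third-order moment M3 = E[x ⊗ x ⊗ x]: a d x d x d tensor of triple products.
M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / n   # [M3]_{i1,i2,i3} = mean of x_{i1} x_{i2} x_{i3}

print(M2.shape, M3.shape)   # (5, 5) (5, 5, 5)
```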

  25. Why are tensors powerful? Matrix orthogonal decomposition is not unique without an eigenvalue gap: $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = e_1 e_1^\top + e_2 e_2^\top = u_1 u_1^\top + u_2 u_2^\top$, where $u_1 = [\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2}]^\top$ and $u_2 = [\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}]^\top$.

  26. Why are tensors powerful? Matrix orthogonal decomposition is not unique without an eigenvalue gap (example above). Tensor orthogonal decomposition is unique, and no eigenvalue gap is needed. [Figure: a tensor equals a sum of rank-one terms; no alternative decomposition exists.]

  27-28. Why are tensors powerful? Matrix orthogonal decomposition is not unique without an eigenvalue gap (example above). Tensor orthogonal decomposition is unique, and no eigenvalue gap is needed: a slice of the tensor has an eigenvalue gap. [Figure: a tensor equals a sum of rank-one terms; no alternative decomposition exists.]

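Both claims on this slide can be checked numerically. The sketch below assumes the two orthonormal components from the slide's matrix example; `power_iteration` is an illustrative textbook helper, not the algorithm as implemented in the authors' library.

```python
import numpy as np

# Matrix case: the identity has no eigenvalue gap, so its orthogonal
# decomposition is not unique: e1 e1^T + e2 e2^T = u1 u1^T + u2 u2^T.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
u1 = np.array([1.0, -1.0]) / np.sqrt(2)
u2 = np.array([1.0, 1.0]) / np.sqrt(2)
A = np.outer(e1, e1) + np.outer(e2, e2)
B = np.outer(u1, u1) + np.outer(u2, u2)
print(np.allclose(A, B))   # True: two different decompositions, same matrix

# Tensor case: T = e1⊗e1⊗e1 + e2⊗e2⊗e2 has a unique orthogonal decomposition.
T = (np.einsum('i,j,k->ijk', e1, e1, e1)
     + np.einsum('i,j,k->ijk', e2, e2, e2))

def power_iteration(T, iters=50, seed=1):
    """Recover one robust eigenvector of an orthogonally decomposable tensor."""
    v = np.random.default_rng(seed).normal(size=T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, v, v)   # tensor-vector-vector product
        v /= np.linalg.norm(v)
    return v

print(np.round(power_iteration(T), 3))   # lands on e1 or e2, never a mixed direction
```

Different seeds land on different components, but always on a true component; for the identity matrix, by contrast, every unit vector is an eigenvector, which is exactly the non-uniqueness the slide points at.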

  29-30. Outline: 1. Introduction; 2. LDA and Community Models (from data aggregates to model parameters; a guaranteed online algorithm); 3. Conclusion.


  31. Probabilistic Topic Models: LDA. [Figure: bag-of-words documents over the words campus, police, witness; topics; topic proportions.]

  32. Probabilistic Topic Models: LDA. [Figure: bag-of-words documents over the words campus, police, witness; topics education, crime, sports; topic proportions.]

  33. Probabilistic Topic Models: LDA. [Figure: as above.] Goal: recover the topic-word matrix $P[\text{word} = i \mid \text{topic} = j]$.
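For a concrete picture of the admixture on this slide, here is a tiny generative sketch of an LDA-style topic model. The Dirichlet prior `alpha`, the `topic_word` matrix, and the five-word vocabulary are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["campus", "police", "witness", "school", "game"]

# Topic-word matrix: column j is P[word = i | topic = j] (made-up numbers;
# topics loosely mimic education, crime, sports from the earlier slide).
topic_word = np.array([
    [0.50, 0.10, 0.10],   # campus
    [0.20, 0.50, 0.10],   # police
    [0.10, 0.30, 0.10],   # witness
    [0.15, 0.05, 0.10],   # school
    [0.05, 0.05, 0.60],   # game
])

alpha = np.array([0.3, 0.3, 0.3])   # Dirichlet prior over topic proportions

def sample_document(n_words=10):
    theta = rng.dirichlet(alpha)                    # topic proportions for this document
    topics = rng.choice(3, size=n_words, p=theta)   # one latent topic per word
    return [vocab[rng.choice(5, p=topic_word[:, z])] for z in topics]

print(sample_document())
```

Learning reverses this process: from documents alone, recover `topic_word`, which is exactly the goal stated on the slide.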

  34. Mixture Form of Moments. Goal: a linearly independent topic-word table. [Figure: topic-word columns over the words campus, police, witness.]
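The slide text is truncated here. For context, in the simplest single-topic (exchangeable) variant of the model, the moments take the mixture form below; this is standard background from the tensor method-of-moments literature, not text recovered from the slide, and LDA proper adds Dirichlet correction terms.

$$M_2 = \sum_{j=1}^{k} w_j \, a_j \otimes a_j, \qquad M_3 = \sum_{j=1}^{k} w_j \, a_j \otimes a_j \otimes a_j,$$

where $a_j$ is the $j$-th column of the topic-word matrix and $w_j$ is the probability of topic $j$. Linear independence of the $a_j$, the goal stated on the slide, is what makes this decomposition identifiable.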
