Large Scale Matrix Analysis and Inference
  1. Large Scale Matrix Analysis and Inference. Wouter M. Koolen, Manfred Warmuth, Reza Bosagh Zadeh, Gunnar Carlsson, Michael Mahoney. NIPS, Dec 9, 2013.

  2. Introductory musing: what is a matrix $(a_{i,j})$? (1) A vector of $n^2$ parameters; (2) a covariance; (3) a generalized probability distribution; (4) ...

  3. 1. A vector of $n^2$ parameters. When you regularize with the squared Frobenius norm: $\min_W \|W\|_F^2 + \sum_n \mathrm{loss}(\mathrm{tr}(W X_n))$.

  4. 1. A vector of $n^2$ parameters. Regularizing with the squared Frobenius norm, $\min_W \|W\|_F^2 + \sum_n \mathrm{loss}(\mathrm{tr}(W X_n))$, is equivalent to $\min_{\mathrm{vec}(W)} \|\mathrm{vec}(W)\|_2^2 + \sum_n \mathrm{loss}(\mathrm{vec}(W) \cdot \mathrm{vec}(X_n))$. No structure: $n^2$ independent variables.
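
A minimal numpy sketch (my own check, not from the slides) of the two identities behind this equivalence: the squared Frobenius norm of $W$ equals the squared 2-norm of $\mathrm{vec}(W)$, and $\mathrm{tr}(WX)$ is a dot product of vectorized matrices, so the regularizer treats $W$ as an unstructured vector of $n^2$ parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))

# ||W||_F^2 equals ||vec(W)||_2^2
assert np.isclose(np.linalg.norm(W, "fro") ** 2,
                  np.linalg.norm(W.ravel()) ** 2)

# tr(W X) equals vec(W) . vec(X^T); for symmetric X_n the transpose is irrelevant
assert np.isclose(np.trace(W @ X), W.ravel() @ X.T.ravel())
```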

  5. 2. A covariance. View the symmetric positive definite matrix $C$ as the covariance matrix of some random feature vector $c \in \mathbb{R}^n$, i.e. $C = E\big[(c - E(c))(c - E(c))^\top\big]$: $n$ features plus their pairwise interactions.
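
A small illustration (the distribution and sample size are my own choices) of this view: the empirical covariance of random feature vectors is symmetric and positive semidefinite, with diagonal entries for the features and off-diagonal entries for their pairwise interactions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 10_000                       # n features, m samples
samples = rng.multivariate_normal(mean=np.zeros(n),
                                  cov=np.diag([3.0, 1.0, 0.5]), size=m)

centered = samples - samples.mean(axis=0)
C = centered.T @ centered / m          # empirical covariance, n x n

assert np.allclose(C, C.T)                          # symmetric
assert np.all(np.linalg.eigvalsh(C) >= -1e-12)      # positive semidefinite
```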

  6. Symmetric matrices as ellipses. Ellipse $= \{Cu : \|u\|_2 = 1\}$. Dotted lines connect a point $u$ on the unit ball with the point $Cu$ on the ellipse.

  7. Symmetric matrices as ellipses. Eigenvectors form the axes; eigenvalues are the lengths.
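
A sketch of the ellipse picture for an example matrix of my own choosing: the image of the unit circle under a symmetric positive definite $C$ is an ellipse whose longest axis points along the top eigenvector and has length equal to the largest eigenvalue.

```python
import numpy as np

C = np.array([[2.0, 0.5],
              [0.5, 1.0]])             # symmetric positive definite example

theta = np.linspace(0, 2 * np.pi, 200)
unit_circle = np.stack([np.cos(theta), np.sin(theta)])   # 2 x 200 points u
ellipse = C @ unit_circle                                 # points Cu

eigvals, eigvecs = np.linalg.eigh(C)
# The farthest ellipse point from the origin lies along the top eigenvector,
# at distance (approximately, up to sampling) the largest eigenvalue.
radii = np.linalg.norm(ellipse, axis=0)
assert np.isclose(radii.max(), eigvals.max(), atol=1e-3)
```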

  8. Dyads $uu^\top$, where $u$ is a unit vector. One eigenvalue is one, all others are zero: a rank-one projection matrix.

  9. Directional variance along direction $u$: $\mathrm{V}(c^\top u) = u^\top C u = \mathrm{tr}(C\, uu^\top) \ge 0$. The outer figure eight is the direction $u$ scaled by the variance $u^\top C u$. PCA: find the direction of largest variance.
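
A minimal numerical check (my own sketch) of the identity on this slide, plus the PCA statement that the direction of largest variance is the top eigenvector of $C$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
C = A @ A.T                                  # symmetric PSD "covariance"
u = rng.standard_normal(n)
u /= np.linalg.norm(u)                       # unit direction

quad_form = u @ C @ u                        # u^T C u
trace_form = np.trace(C @ np.outer(u, u))    # tr(C u u^T)
assert np.isclose(quad_form, trace_form) and quad_form >= 0

# Direction of largest variance = eigenvector of the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)
top_direction = eigvecs[:, -1]
assert np.isclose(top_direction @ C @ top_direction, eigvals[-1])
```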

  10. 3-dimensional variance plots. $\mathrm{tr}(C\, uu^\top)$ is a generalized probability when $\mathrm{tr}(C) = 1$.

  11. 3. Generalized probability distributions. Probability vector $\omega = (.2, .1, .6, .1)^\top = \sum_i \omega_i\, e_i$: mixture coefficients times pure events. Density matrix $W = \sum_i \omega_i\, w_i w_i^\top$: mixture coefficients times pure density matrices.

  12. 3. Generalized probability distributions. Probability vector $\omega = (.2, .1, .6, .1)^\top = \sum_i \omega_i\, e_i$: mixture coefficients times pure events. Density matrix $W = \sum_i \omega_i\, w_i w_i^\top$: mixture coefficients times pure density matrices. Matrices as generalized distributions.

  13. 3. Generalized probability distributions. Probability vector $\omega = (.2, .1, .6, .1)^\top = \sum_i \omega_i\, e_i$: mixture coefficients times pure events. Density matrix $W = \sum_i \omega_i\, w_i w_i^\top$: mixture coefficients times pure density matrices. Matrices as generalized distributions. Many mixtures lead to the same density matrix; there always exists a decomposition into $n$ eigendyads. Density matrix: a symmetric positive matrix of trace one.
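
An illustrative sketch (mixture weights taken from the slide, the dyad vectors are random placeholders) of a density matrix built as a mixture of dyads: it is symmetric, positive semidefinite, has trace one, and its eigendecomposition yields another, not necessarily identical, decomposition into $n$ eigendyads.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
omega = np.array([0.2, 0.1, 0.6, 0.1])          # mixture coefficients (sum to 1)
ws = rng.standard_normal((n, n))
ws /= np.linalg.norm(ws, axis=0)                # columns w_i are unit vectors

# W = sum_i omega_i w_i w_i^T
W = sum(om * np.outer(ws[:, i], ws[:, i]) for i, om in enumerate(omega))

assert np.allclose(W, W.T)                      # symmetric
assert np.isclose(np.trace(W), 1.0)             # trace one
eigvals, eigvecs = np.linalg.eigh(W)
assert np.all(eigvals >= -1e-12)                # positive semidefinite
# The eigendecomposition gives another valid decomposition into n eigendyads.
W_from_eigs = (eigvecs * eigvals) @ eigvecs.T
assert np.allclose(W, W_from_eigs)
```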

  14. It’s like a probability! The total variance along an orthogonal set of directions is 1: $u_1^\top W u_1 + u_2^\top W u_2 = 1$, $a + b + c = 1$.

  15. Uniform density? The uniform density matrix is $\frac{1}{n} I$: every dyad has generalized probability $\frac{1}{n}$, since $\mathrm{tr}(\frac{1}{n} I\, uu^\top) = \frac{1}{n}\mathrm{tr}(uu^\top) = \frac{1}{n}$. The generalized probabilities of $n$ orthogonal dyads sum to 1.

  16. Conventional Bayes Rule: $P(M_i \mid y) = \dfrac{P(M_i)\, P(y \mid M_i)}{P(y)}$. 4 updates with the same data likelihood. The update maintains uncertainty information about the maximum likelihood: a soft max.

  17.-19. Conventional Bayes Rule (repeated builds of the previous slide, with identical text).
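
A hedged numpy sketch (the prior, likelihood values, and number of updates are my own) of the point on these slides: repeating the conventional Bayes update with the same likelihood concentrates the posterior on the maximum-likelihood model, i.e. it behaves like a soft max over models.

```python
import numpy as np

prior = np.full(4, 0.25)                         # uniform prior over 4 models
likelihood = np.array([0.2, 0.5, 0.9, 0.7])      # P(y | M_i), same data reused

posterior = prior.copy()
for _ in range(4):                               # 4 updates with the same data
    posterior = posterior * likelihood
    posterior /= posterior.sum()                 # divide by P(y)

print(posterior)          # mass piles up on model 3, the max-likelihood model
# After t updates the posterior is proportional to prior * likelihood**t,
# a soft max over log-likelihoods with "inverse temperature" t.
```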

  20. Bayes Rule for density matrices: $D(M \mid y) = \dfrac{\exp\big(\log D(M) + \log D(y \mid M)\big)}{\mathrm{tr}\big(\exp(\log D(M) + \log D(y \mid M))\big)}$, using the matrix exponential and logarithm. 1 update with the data likelihood matrix $D(y \mid M)$. The update maintains uncertainty information about the maximum eigenvalue: a soft max eigenvalue calculation.

  21.-25. Bayes Rule for density matrices (continued): 2, 3, 4, 10, and 20 updates with the same data likelihood matrix $D(y \mid M)$. Each update maintains uncertainty information about the maximum eigenvalue: a soft max eigenvalue calculation.
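
A sketch of the density-matrix Bayes rule from these slides using scipy's matrix exponential and logarithm. The likelihood matrix below is an invented positive definite example; repeated updates with the same $D(y \mid M)$ concentrate the density matrix on the dyad of its top eigenvector, the soft max eigenvalue behavior described above.

```python
import numpy as np
from scipy.linalg import expm, logm

def density_bayes(prior, likelihood):
    """D(M|y) = exp(log D(M) + log D(y|M)) / tr(of that matrix)."""
    unnormalized = expm(logm(prior) + logm(likelihood))
    return unnormalized / np.trace(unnormalized)

n = 3
prior = np.eye(n) / n                              # uniform density matrix
A = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 0.5]])                    # made-up PD likelihood matrix

posterior = prior
for _ in range(10):                                # e.g. 10 updates, as on the slides
    posterior = density_bayes(posterior, A)

top_eigvec = np.linalg.eigh(A)[1][:, -1]
# The posterior approaches the dyad of A's top eigenvector.
assert np.allclose(posterior, np.outer(top_eigvec, top_eigvec), atol=1e-2)
```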

  26. Bayes’ rules, vector versus matrix. Vector: $P(M_i \mid y) = \dfrac{P(M_i) \cdot P(y \mid M_i)}{\sum_j P(M_j) \cdot P(y \mid M_j)}$. Matrix: $D(M \mid y) = \dfrac{D(M) \odot D(y \mid M)}{\mathrm{tr}\big(D(M) \odot D(y \mid M)\big)}$, where $A \odot B := \exp(\log A + \log B)$.

  27. Bayes’ rules, vector versus matrix. Vector: $P(M_i \mid y) = \dfrac{P(M_i) \cdot P(y \mid M_i)}{\sum_j P(M_j) \cdot P(y \mid M_j)}$. Matrix: $D(M \mid y) = \dfrac{D(M) \odot D(y \mid M)}{\mathrm{tr}\big(D(M) \odot D(y \mid M)\big)}$, where $A \odot B := \exp(\log A + \log B)$. Regularizer: entropy (vector) versus quantum entropy (matrix).

  28. Vector case as a special case of the matrix case. Vectors as diagonal matrices; all matrices share the same eigensystem; the fancy $\odot$ becomes $\cdot$. Often the vector case is the hardest problem: its bounds “lift” to the matrix case.

  29. Vector case as a special case of the matrix case. Vectors as diagonal matrices; all matrices share the same eigensystem; the fancy $\odot$ becomes $\cdot$. Often the vector case is the hardest problem: its bounds “lift” to the matrix case. This phenomenon has been dubbed the “free matrix lunch”. Size of matrix = size of vector = $n$.
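
A quick sketch (vectors chosen by me) of the reduction described here: for diagonal matrices, or more generally matrices sharing one eigensystem, the fancy product $A \odot B = \exp(\log A + \log B)$ collapses to the ordinary componentwise product of the diagonals.

```python
import numpy as np
from scipy.linalg import expm, logm

a = np.array([0.5, 0.3, 0.2])              # a probability vector
b = np.array([0.9, 0.4, 0.1])              # a likelihood vector
A, B = np.diag(a), np.diag(b)              # embed as diagonal matrices

fancy = expm(logm(A) + logm(B))            # A (fancy product) B
plain = np.diag(a * b)                     # ordinary componentwise product

assert np.allclose(fancy, plain)
```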

  30. PCA setup. Data vectors, $C = \sum_n x_n x_n^\top$. $\max_{\text{unit } u}\, u^\top C u = \max_{\text{dyad } uu^\top}\, \mathrm{tr}(C\, uu^\top)$: not convex in $u$, but linear in $uu^\top$. Corresponding vector problem: $\max_{e_i}\, c^\top e_i$, linear in $e_i$. The vector problem is the matrix problem when everything happens in the same eigensystem. Uncertainty over units: probability vector. Uncertainty over dyads: density matrix. Uncertainty over $k$-sets of units: capped probability vector. Uncertainty over rank-$k$ projection matrices: capped density matrix.
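
A short numpy sketch (data and dimensions are mine) of the PCA objective on this slide: with $C = \sum_n x_n x_n^\top$, maximizing $u^\top C u$ over unit vectors equals maximizing $\mathrm{tr}(C\, uu^\top)$ over dyads, and the optimum is the largest eigenvalue of $C$.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 5))            # 100 data vectors x_n in R^5
C = X.T @ X                                  # C = sum_n x_n x_n^T

eigvals, eigvecs = np.linalg.eigh(C)
u = eigvecs[:, -1]                           # best unit direction
best_dyad = np.outer(u, u)                   # best rank-one projection

assert np.isclose(u @ C @ u, np.trace(C @ best_dyad))
assert np.isclose(u @ C @ u, eigvals[-1])    # optimum = largest eigenvalue
```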

  31. For PCA: solve the vector problem first and do all the bounds, then lift to the matrix case by essentially replacing $\cdot$ with $\odot$. The regret bounds stay the same: the free matrix lunch.

  32. Questions. When can you “lift” the vector case to the matrix case? When is there a free matrix lunch? Lifting matrices to tensors? Efficient algorithms for large matrices? Approximations of $\odot$? Avoiding the eigenvalue decomposition by sampling? ...
