Algebraic models for multilinear dependence Jason Morton Stanford University February 21, 2009 NSF Tensor Workshop Joint work with Lek-Heng Lim of U.C. Berkeley J. Morton (Stanford) Algebraic models for multilinear dependence 2/21/09 NSF Tensor 1 / 37
Univariate cumulants
Mean, variance, skewness and kurtosis describe the shape of a univariate distribution.
Covariance matrices
The covariance matrix partly describes the dependence structure of a multivariate distribution. It underlies:
Principal Component Analysis
Factor models
Risk: the bilinear form computes the variance $h^\top \Sigma h$ of holdings $h$
But if the variables are not multivariate Gaussian, the covariance matrix is not the whole story.
This is one point of view on the financial crisis: too much reliance on a quadratic, Gaussian perspective on risk, exploited by trading skewness and kurtosis risk for an apparent reduction in variance.
Sharpe Ratio $(\mu - \mu_f)/\sigma$ vs Skewness
[Figure: skewness (vertical axis) plotted against Sharpe ratio (horizontal axis) for Hedge Fund Research Indices daily returns.]
Non-multivariate Gaussian returns are common
[Figure: HFRI Distressed/Restructuring Index vs. Merger Arbitrage; horizontal axis Distressed, vertical axis Merger Arb.]
Even if the marginals are normal, the dependence might not be
[Figure: 1000 simulated Clayton(3)-dependent N(0,1) values; horizontal axis X1 ~ N(0,1), vertical axis X2 ~ N(0,1).]
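A simulation like the one in the figure can be sketched as follows. This is a minimal sketch, not the code behind the slide: it assumes the standard conditional-inversion sampler for the Clayton copula, and the function name `clayton_normal_pairs`, the seed, and the 5% tail cut-off are illustrative choices.

```python
import random
from statistics import NormalDist

def clayton_normal_pairs(n, theta=3.0, seed=0):
    """Draw n pairs with N(0,1) marginals coupled by a Clayton(theta) copula."""
    rng = random.Random(seed)
    inv_cdf = NormalDist().inv_cdf
    eps = 1e-12  # keep uniforms strictly inside (0, 1) so inv_cdf is defined

    def clamp(u):
        return min(max(u, eps), 1 - eps)

    pairs = []
    for _ in range(n):
        u1 = clamp(rng.random())
        w = clamp(rng.random())
        # Solve dC(u1, u2)/du1 = w for u2, with C the Clayton copula.
        u2 = (u1 ** (-theta) * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
        pairs.append((inv_cdf(u1), inv_cdf(clamp(u2))))
    return pairs

pairs = clayton_normal_pairs(1000)

# Count joint extremes in the lower and upper 5% tails of both marginals.
cut = NormalDist().inv_cdf(0.05)
both_low = sum(1 for a, b in pairs if a < cut and b < cut)
both_high = sum(1 for a, b in pairs if a > -cut and b > -cut)
```

Each marginal is standard normal by construction, yet joint crashes (`both_low`) should be noticeably more frequent than joint booms (`both_high`), an asymmetry that a covariance matrix cannot express.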
Covariance matrix analogs: multivariate cumulants
The cumulant tensors are the multivariate analog of skewness and kurtosis.
They describe higher order dependence among random variables.
The covariance matrix lets us optimize with respect to variance; the cumulant tensors let us optimize with respect to skewness, kurtosis, ...
1 Definitions: tensors and cumulants
2 Properties of cumulant tensors
3 Low multilinear rank model (subspace variety)
4 Quasi-Newton algorithm on Grassmannian
5 Multi-moment portfolio optimization
6 Dimension reduction
1 Introduction
2 Definitions
3 Properties
4 Principal Cumulant Component Analysis
5 Algorithm
6 Applications
Symmetric multilinear matrix multiplication
If $Q$ is a $p \times r$ matrix and $C$ an $r \times r \times r$ tensor, make a $p \times p \times p$ tensor $K = (Q, Q, Q) \cdot C$, or $K = Q \cdot C$ for short, with entries
$$\kappa_{\ell m n} = \sum_{i,j,k=(1,1,1)}^{(r,r,r)} q_{\ell i}\, q_{m j}\, q_{n k}\, c_{ijk}.$$
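The entrywise formula translates directly into nested sums. The sketch below uses plain Python lists; the function name `multilinear_multiply` and the small example values for `Q` and `C` are illustrative assumptions, not taken from the talk.

```python
def multilinear_multiply(Q, C):
    """Return K = (Q, Q, Q) . C for a p x r matrix Q and an r x r x r tensor C."""
    p, r = len(Q), len(Q[0])
    K = [[[0.0] * p for _ in range(p)] for _ in range(p)]
    for l in range(p):
        for m in range(p):
            for n in range(p):
                # kappa_{lmn} = sum_{i,j,k} q_{li} q_{mj} q_{nk} c_{ijk}
                K[l][m][n] = sum(
                    Q[l][i] * Q[m][j] * Q[n][k] * C[i][j][k]
                    for i in range(r) for j in range(r) for k in range(r)
                )
    return K

# Hypothetical example: p = 3, r = 2, with a symmetric core tensor C
# whose entry c_{ijk} depends only on i + j + k.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
C = [[[1.0, 2.0], [2.0, 3.0]], [[2.0, 3.0], [3.0, 4.0]]]
K = multilinear_multiply(Q, C)
```

Because `C` is symmetric and the same `Q` acts on every mode, the resulting `K` is again symmetric under any permutation of its three indices.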
Moments and Cumulants are symmetric tensors
Vector-valued random variable $x = (X_1, \dots, X_p)$. Three natural $d$-way tensors are:
The $d$th non-central moment $s_{i_1,\dots,i_d}$ of $x$:
$$S_d(x) = \Big[ E(x_{i_1} x_{i_2} \cdots x_{i_d}) \Big]_{i_1,\dots,i_d=1}^{p}.$$
The $d$th central moment $M_d = S_d(x - E[x])$, and
The $d$th cumulant $\kappa_{i_1 \dots i_d}$ of $x$:
$$K_d(x) = \Big[ \sum_{A_1 \sqcup \cdots \sqcup A_q = \{i_1,\dots,i_d\}} (-1)^{q-1} (q-1)!\; s_{A_1} \cdots s_{A_q} \Big]_{i_1,\dots,i_d=1}^{p}.$$
Conversely, with the sum running over the partitions $B$ of $\{i_1,\dots,i_d\}$,
$$s_{i_1,\dots,i_d} = \sum_{B} \prod_{b \in B} \kappa_b, \qquad \text{and} \qquad \kappa_{ijk\ell} = m_{ijk\ell} - (m_{ij} m_{k\ell} + m_{ik} m_{j\ell} + m_{i\ell} m_{jk}).$$
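The partition formula can be checked numerically on a small sample. The following is a sketch under stated assumptions: moments are estimated with plain $1/N$ weights, and the helper names (`set_partitions`, `moment`, `cumulant`) and the five data rows are hypothetical. It compares the partition-sum cumulant with the fourth-order expression in central moments.

```python
from math import factorial, prod

def set_partitions(items):
    """Yield every partition of a list into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # first in its own block
        for i in range(len(part)):                  # or added to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def moment(rows, idx):
    """Sample non-central moment E(x_{i1} ... x_{id}) with 1/N weights."""
    return sum(prod(row[i] for i in idx) for row in rows) / len(rows)

def cumulant(rows, idx):
    """Sample cumulant kappa_{i1...id} via the sum over partitions."""
    total = 0.0
    for part in set_partitions(list(range(len(idx)))):
        q = len(part)
        weight = (-1) ** (q - 1) * factorial(q - 1)
        total += weight * prod(moment(rows, [idx[pos] for pos in block]) for block in part)
    return total

# Hypothetical data: 5 observations of a 2-dimensional variable.
rows = [[1.0, 2.0], [2.0, 0.5], [4.0, 3.0], [0.0, 1.0], [3.0, 5.0]]
mean = [moment(rows, [i]) for i in range(2)]
centered = [[v - m for v, m in zip(row, mean)] for row in rows]

i, j, k, l = 0, 0, 1, 1
kappa = cumulant(rows, [i, j, k, l])
central = moment(centered, [i, j, k, l]) - (
    moment(centered, [i, j]) * moment(centered, [k, l])
    + moment(centered, [i, k]) * moment(centered, [j, l])
    + moment(centered, [i, l]) * moment(centered, [j, k])
)
```

Up to floating-point rounding, `kappa` and `central` agree, since for centered data every partition containing a singleton block contributes zero.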
Measuring useful properties
For univariate $x$, the cumulants $K_d(x)$ for $d = 1, 2, 3, 4$ are expectation $\kappa_i = E[x]$, variance $\kappa_{ii} = \sigma^2$, skewness $\kappa_{iii}/\kappa_{ii}^{3/2}$, and kurtosis $\kappa_{iiii}/\kappa_{ii}^{2}$.
The tensor versions $\kappa_{ijk}$ are the multivariate generalizations; they provide a natural measure of non-Gaussianity.
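As a worked univariate instance, the sketch below computes the four quantities from sample central moments with $1/N$ weights, using $\kappa_{iiii} = m_4 - 3 m_2^2$. The function name and the data are illustrative; note that the kurtosis defined this way is what is often called excess kurtosis, which is zero for a Gaussian.

```python
def standardized_cumulants(xs):
    """Return (mean, variance, skewness, kurtosis) built from sample cumulants."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    k2, k3, k4 = m2, m3, m4 - 3 * m2 ** 2   # second to fourth cumulants
    return mean, k2, k3 / k2 ** 1.5, k4 / k2 ** 2

# Hypothetical symmetric sample: skewness vanishes, kurtosis is negative.
mean, variance, skewness, kurtosis = standardized_cumulants([1.0, 2.0, 3.0, 4.0, 5.0])
```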
Alternative Definitions of Cumulants
In terms of the log characteristic function,
$$\kappa_{\alpha_1 \cdots \alpha_d}(x) = (-i)^d \left. \frac{\partial^d}{\partial t_{\alpha_1} \cdots \partial t_{\alpha_d}} \log E\big(\exp(i \langle t, x \rangle)\big) \right|_{t = 0}.$$
In terms of the Edgeworth series,
$$\log E\big(\exp(i \langle t, x \rangle)\big) = \sum_{\alpha = 0}^{\infty} i^{|\alpha|} \kappa_\alpha(x) \frac{t^\alpha}{\alpha!},$$
where $\alpha = (\alpha_1, \dots, \alpha_d)$ is a multi-index, $t^\alpha = t_1^{\alpha_1} \cdots t_d^{\alpha_d}$, and $\alpha! = \alpha_1! \cdots \alpha_d!$.
See [Fisher 1929, McCullagh 1984, 1987] for definitions and properties.
Properties of cumulants: Multilinearity
Multilinearity: if $x$ is an $\mathbb{R}^r$-valued random variable and $A \in \mathbb{R}^{p \times r}$, then
$$K_d(Ax) = A \cdot K_d(x),$$
where $\cdot$ is the multilinear action.
This makes factor models work: $y = Ax$ implies $K_d(y) = A \cdot K_d(x)$.
Covariance factor model: $K_2(y) = A K_2(x) A^\top$.
Independent Component Analysis finds an $A$ to approximately diagonalize $K_d(x)$.
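Multilinearity also holds exactly for sample cumulants, which makes it easy to check numerically. The sketch below does so for $d = 3$, where the cumulant equals the third central moment; the helper names `third_cumulant` and `act`, the data `x_rows`, and the matrix `A` are illustrative assumptions.

```python
def third_cumulant(rows):
    """Sample third cumulant tensor (equal to the third central moment), 1/N weights."""
    n, p = len(rows), len(rows[0])
    mean = [sum(row[i] for row in rows) / n for i in range(p)]
    z = [[row[i] - mean[i] for i in range(p)] for row in rows]
    return [[[sum(obs[i] * obs[j] * obs[k] for obs in z) / n
              for k in range(p)] for j in range(p)] for i in range(p)]

def act(A, T):
    """Multilinear action (A, A, A) . T of a p x r matrix on an r x r x r tensor."""
    p, r = len(A), len(A[0])
    return [[[sum(A[l][i] * A[m][j] * A[n][k] * T[i][j][k]
                  for i in range(r) for j in range(r) for k in range(r))
              for n in range(p)] for m in range(p)] for l in range(p)]

# Hypothetical factors x in R^2 and loading matrix A in R^{3 x 2}.
x_rows = [[1.0, 2.0], [2.0, 0.5], [4.0, 3.0], [0.0, 1.0], [3.0, 5.0]]
A = [[1.0, 0.0], [0.5, -1.0], [2.0, 3.0]]
y_rows = [[sum(a * v for a, v in zip(row_a, x)) for row_a in A] for x in x_rows]

lhs = third_cumulant(y_rows)            # K_3(Ax)
rhs = act(A, third_cumulant(x_rows))    # A . K_3(x)
max_gap = max(abs(lhs[i][j][k] - rhs[i][j][k])
              for i in range(3) for j in range(3) for k in range(3))
```

Here `max_gap` is zero up to rounding: the $3 \times 3 \times 3$ cumulant of `y` is fully determined by the $2 \times 2 \times 2$ cumulant of the factors.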
Properties of cumulants: Independence
Independence: If $x_1, \dots, x_p$ are random variables mutually independent of $y_1, \dots, y_p$, we have
$$K_d(x_1 + y_1, \dots, x_p + y_p) = K_d(x_1, \dots, x_p) + K_d(y_1, \dots, y_p).$$
$K_{i_1,\dots,i_d}(x) = 0$ whenever there is a partition of $\{i_1, \dots, i_d\}$ into two nonempty sets $I$ and $J$ such that $x_I$ and $x_J$ are independent.
This is why we want to diagonalize in independent component analysis.
Exploitable in other sparse cumulant techniques (breaks rotational symmetry).
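Both statements can be illustrated with a toy distribution that is independent by construction. In this sketch (the values `xs`, `ys` and the helper `kappa3` are hypothetical), every pair on the product grid gets equal weight, so $X$ and $Y$ are exactly independent under the empirical distribution.

```python
from itertools import product

def kappa3(values):
    """Third cumulant (third central moment) of an equally weighted sample."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 3 for v in values) / n

xs = [0.0, 1.0, 5.0]
ys = [-2.0, 0.0, 1.0, 7.0]
pairs = list(product(xs, ys))          # equal weight on every (x, y) pair

# Additivity: kappa_3(X + Y) = kappa_3(X) + kappa_3(Y).
sums = [x + y for x, y in pairs]
additivity_gap = abs(kappa3(sums) - (kappa3(xs) + kappa3(ys)))

# Vanishing: the mixed cumulant kappa_{XXY} is zero for independent X and Y.
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
mixed = sum((x - mx) ** 2 * (y - my) for x, y in pairs) / len(pairs)
```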
Properties of cumulants: Vanishing and Extending
Gaussian: If $x$ is multivariate normal, then $K_d(x) = 0$ for all $d \ge 3$.
◮ Why one might not have heard of them: for Gaussians, the covariance matrix does tell the whole story.
Marcinkiewicz Theorem: There are no distributions with a bound $D$ so that
$$K_d(x) \begin{cases} \neq 0 & 3 \le d \le D, \\ = 0 & d > D. \end{cases}$$
◮ Parametrization is trickier when $K_2$ doesn't tell the whole story.
Making cumulants useful, tractable and estimable
Cumulant tensors are a useful generalization, but too big.
They have $\binom{\#\text{vars} + d - 1}{d}$ quantities, too many to estimate with a reasonable amount of data, optimize, and store.
Needed: small, implicit factor models analogous to Principal Component Analysis (PCA).
PCA: eigenvalue decomposition of a positive semidefinite real symmetric matrix.
We need a tensor analog. But it isn't as easy as it looks ...
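To get a feel for the growth, the count of distinct entries can be evaluated directly. The choice of 100 variables below is an illustrative assumption, not a figure from the talk.

```python
from math import comb

def distinct_entries(p, d):
    """Number of distinct entries of a symmetric d-way tensor in p variables."""
    return comb(p + d - 1, d)

# Hypothetical universe of 100 variables: covariance, third and fourth cumulant.
counts = {d: distinct_entries(100, d) for d in (2, 3, 4)}
```

For 100 variables the covariance matrix has 5050 distinct entries, while the fourth cumulant has 4,421,275, which is far more than a typical return history can pin down.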
Tensor decomposition
Three possible generalizations are the same in the matrix case but not in the tensor case. For a $p \times p \times p$ tensor $K$:
Name | minimum $r$ such that | Property
Tensor rank | $K = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i$ | not closed
Border rank | $K = \lim_{\epsilon \to 0} S_\epsilon$, tensor rank$(S_\epsilon) = r$ | closed but hard to represent; defining equations unknown
Multilinear rank | $K = A \cdot C$, $C \in \mathbb{R}^{r \times r \times r}$, $A \in \mathbb{R}^{p \times r}$ | closed and understood
Geometric perspective
Secants of Veronese in $S^d(\mathbb{R}^p)$ and rank subsets: difficult to study.
Symmetric subspace variety in $S^d(\mathbb{R}^p)$: closed, easy to study.
We take the long skinny matrix to be orthonormal.
◮ Stiefel manifold $O(p, r)$ is the set of $p \times r$ real matrices $Q$ with orthonormal columns.
◮ Grassmannian $\mathrm{Gr}(p, r)$ is the set of equivalence classes $[Q]$ of $O(p, r)$ under right multiplication by $O(r)$.
Parametrization of $S^d(\mathbb{R}^p)$ via $\mathrm{Gr}(p, r) \times S^d(\mathbb{R}^r) \to S^d(\mathbb{R}^p)$.
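The two definitions can be made concrete with small matrices. In this sketch (helper names, the starting vectors, and the rotation angle are all illustrative assumptions), Gram-Schmidt produces a point `Q` of the Stiefel manifold, and right multiplication by a rotation gives a different matrix `Q2` in the same Grassmannian class: both span the same subspace, so they share the projector $Q Q^\top$.

```python
import math

def matmul(A, B):
    """Plain-list matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthonormalize(columns):
    """Gram-Schmidt on a list of column vectors; returns a p x r matrix Q."""
    basis = []
    for v in columns:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return transpose(basis)

Q = orthonormalize([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])           # a point of O(3, 2)
t = 0.7
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]   # an element of O(2)
Q2 = matmul(Q, R)                                                # same class [Q]

gram = matmul(transpose(Q2), Q2)     # 2 x 2 identity: columns stay orthonormal
P1 = matmul(Q, transpose(Q))         # projector onto the column space of Q
P2 = matmul(Q2, transpose(Q2))       # identical projector for Q2
```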
Multilinear rank factor model
Let $y = (Y_1, \dots, Y_n)$ be a random vector. Write the $d$th order cumulant $K_d(y)$ as a best $r$-multilinear rank approximation in terms of the cumulant $K_d(x)$ of a smaller set of $r$ factors $x$:
$$K_d(y) \approx Q \cdot K_d(x),$$
where $Q$ is orthonormal, and $Q^\top$ projects to the factors.
The column space of $Q$ defines the $r$-dimensional subspace which best explains the $d$th order dependence.
In place of eigenvalues, we have the core tensor $K_d(x)$, the cumulant of the factors, analogous to the covariance matrix of the factors in the $r \times r$ case.
Have model, need loss and algorithm.
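For a fixed orthonormal $Q$, the two directions of the model are just multilinear actions. The sketch below (the helper `act` and the example values for `Q` and `C` are illustrative assumptions) builds a tensor that lies exactly in the model, recovers its core with $Q^\top$, and reconstructs it. Choosing the best $Q$ for a given cumulant is the optimization problem and is not attempted here.

```python
def act(A, T):
    """Multilinear action (A, A, A) . T of a p x r matrix on an r x r x r tensor."""
    p, r = len(A), len(A[0])
    return [[[sum(A[l][i] * A[m][j] * A[n][k] * T[i][j][k]
                  for i in range(r) for j in range(r) for k in range(r))
              for n in range(p)] for m in range(p)] for l in range(p)]

# Hypothetical 3 x 2 matrix with orthonormal columns and a symmetric 2 x 2 x 2 core.
Q = [[0.6, 0.0], [0.8, 0.0], [0.0, 1.0]]
Qt = [list(col) for col in zip(*Q)]
C = [[[1.0, 2.0], [2.0, 3.0]], [[2.0, 3.0], [3.0, 4.0]]]

K = act(Q, C)         # 3 x 3 x 3 tensor of multilinear rank at most 2
core = act(Qt, K)     # Q^T . K: projection to the factors, recovers C
K_hat = act(Q, core)  # Q . core: reconstruction, equals K for an exact model

core_gap = max(abs(core[i][j][k] - C[i][j][k])
               for i in range(2) for j in range(2) for k in range(2))
fit_gap = max(abs(K_hat[i][j][k] - K[i][j][k])
              for i in range(3) for j in range(3) for k in range(3))
```

For a cumulant estimated from data, `fit_gap` would generally be nonzero, and its size is one natural candidate for the loss that the final line of the slide asks for.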