Low Rank Approximation
Lecture 6

Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
daniel.kressner@epfl.ch
The Kronecker product
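Vectorization

The vectorization of an $m \times n$ matrix $A$ is denoted by $\mathrm{vec}(A)$, where
\[ \mathrm{vec} : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \cdot n} \]
stacks the columns of a matrix into a long column vector.

Example:
\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix} \quad \Rightarrow \quad \mathrm{vec}(A) = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \\ a_{12} \\ a_{22} \\ a_{32} \end{pmatrix} \]

Remarks:
◮ In MATLAB: A(:)
◮ This way of vectorizing corresponds to how matrices are laid out in memory in MATLAB (column by column). In some other programming languages (e.g., C arrays), matrices are laid out row by row.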
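A small sketch of the vectorization above. The numeric entries are chosen only for illustration, so that entry $a_{ij}$ is stored as the two-digit number "ij":

```matlab
% Column-major vectorization of a 3-by-2 matrix.
A = [11 12; 21 22; 31 32];   % entry a_ij is stored as the number "ij"
v = A(:);                    % stacks the columns of A into one column vector
disp(v');                    % 11 21 31 12 22 32

% reshape uses the same column-major order, so it inverts vec.
B = reshape(v, 3, 2);
disp(isequal(A, B));         % 1
```

Because `reshape` and `A(:)` share the same column-major ordering, going back and forth between a matrix and its vectorization does not move any data.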
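Kronecker product

For an $m \times n$ matrix $A$ and a $k \times \ell$ matrix $B$, the Kronecker product is defined as
\[ B \otimes A := \begin{pmatrix} b_{11} A & \cdots & b_{1\ell} A \\ \vdots & & \vdots \\ b_{k1} A & \cdots & b_{k\ell} A \end{pmatrix} \in \mathbb{R}^{km \times \ell n}. \]

Most important properties (for our purposes):
1. $\mathrm{vec}(AX) = (I \otimes A)\,\mathrm{vec}(X)$.
2. $\mathrm{vec}(XA^T) = (A \otimes I)\,\mathrm{vec}(X)$.
3. $(B \otimes A)(D \otimes C) = (BD \otimes AC)$ for $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{k \times \ell}$, $C \in \mathbb{R}^{n \times q}$, $D \in \mathbb{R}^{\ell \times p}$.
4. $I_m \otimes I_n = I_{mn}$.
5. $(A_1 + A_2) \otimes B = A_1 \otimes B + A_2 \otimes B$, $\quad A \otimes (B_1 + B_2) = A \otimes B_1 + A \otimes B_2$.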
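Properties 1 to 3 can be checked numerically. The sketch below uses MATLAB's built-in `kron`, where `kron(B, A)` matches the block layout of $B \otimes A$ above. The sizes and the random test matrices are arbitrary choices for illustration:

```matlab
% Numerical check of Kronecker product properties 1-3 on random data.
m = 3; n = 4; k = 2; l = 5; q = 2; p = 3;
A = rand(m, n); B = rand(k, l); C = rand(n, q); D = rand(l, p);

% Property 1: vec(A*X) = (I kron A) vec(X), with X of size n-by-q.
X = rand(n, q);
AX = A * X;
err1 = norm(AX(:) - kron(eye(q), A) * X(:));

% Property 2: vec(X2*A') = (A kron I) vec(X2), with X2 of size q-by-n.
X2 = rand(q, n);
XAt = X2 * A';
err2 = norm(XAt(:) - kron(A, eye(q)) * X2(:));

% Property 3: (B kron A)(D kron C) = (B*D) kron (A*C).
err3 = norm(kron(B, A) * kron(D, C) - kron(B * D, A * C), 'fro');

fprintf('errors: %.1e, %.1e, %.1e\n', err1, err2, err3);
```

All three errors should be at the level of rounding error. Note that the identity matrix in properties 1 and 2 has the size of the dimension of $X$ that is not multiplied.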
First steps with tensors
Vectors, matrices, and tensors

[Figure: a vector, a matrix, and a tensor]

◮ scalar = tensor of order 0
◮ (column) vector = tensor of order 1
◮ matrix = tensor of order 2
◮ tensor of order 3 = $n_1 n_2 n_3$ numbers arranged in an $n_1 \times n_2 \times n_3$ array
Tensors of arbitrary order

A $d$-th order tensor $\mathcal{X}$ of size $n_1 \times n_2 \times \cdots \times n_d$ is a $d$-dimensional array with entries
\[ \mathcal{X}_{i_1, i_2, \ldots, i_d}, \qquad i_\mu \in \{1, \ldots, n_\mu\} \ \text{for} \ \mu = 1, \ldots, d. \]
In the following, the entries of $\mathcal{X}$ are usually real (for simplicity) ⇒ $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$.

Multi-index notation:
\[ \mathcal{I} = \{1, \ldots, n_1\} \times \{1, \ldots, n_2\} \times \cdots \times \{1, \ldots, n_d\}. \]
Then $i \in \mathcal{I}$ is a tuple of $d$ indices: $i = (i_1, i_2, \ldots, i_d)$.
This allows us to write the entries of $\mathcal{X}$ as $\mathcal{X}_i$ for $i \in \mathcal{I}$.
Two important points

1. A matrix $A \in \mathbb{R}^{m \times n}$ has a natural interpretation as a linear operator in terms of matrix-vector multiplications:
\[ A : \mathbb{R}^n \to \mathbb{R}^m, \qquad A : x \mapsto A \cdot x. \]
There is no such (unique and natural) interpretation for tensors! ⇒ Fundamental difficulty in defining a meaningful general notion of eigenvalues and singular values of tensors.

2. The number of entries in a tensor grows exponentially with $d$ ⇒ curse of dimensionality.
Example: A tensor of order 30 with $n_1 = n_2 = \cdots = n_d = 10$ has $10^{30}$ entries. At 8 bytes per entry this amounts to $8 \times 10^{12}$ exabytes of storage! [1]

For $d \gg 1$: Cannot afford to store the tensor explicitly (in terms of its entries).

[1] Global data storage a few years ago was calculated at 295 exabytes, see http://www.bbc.co.uk/news/technology-12419672 .
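The storage figure in the example can be reproduced with a few lines of arithmetic. The sketch assumes 8 bytes per entry (double precision) and 1 exabyte = $10^{18}$ bytes; neither assumption is stated on the slide, but together they reproduce its number:

```matlab
% Storage estimate for an order-d tensor with n entries per mode.
% Assumptions: 8 bytes per entry (double precision), 1 exabyte = 1e18 bytes.
d = 30;
n = 10;
num_entries  = n^d;                 % n^d entries in total
num_bytes    = 8 * num_entries;     % bytes needed to store all entries
num_exabytes = num_bytes / 1e18;    % convert bytes to exabytes
fprintf('entries: %.1e, storage: %.1e exabytes\n', num_entries, num_exabytes);
```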
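Basic calculus

◮ Addition of two equal-sized tensors $\mathcal{X}, \mathcal{Y}$:
\[ \mathcal{Z} = \mathcal{X} + \mathcal{Y} \quad \Leftrightarrow \quad \mathcal{Z}_i = \mathcal{X}_i + \mathcal{Y}_i \quad \forall i \in \mathcal{I}. \]
◮ Scalar multiplication with $\alpha \in \mathbb{R}$:
\[ \mathcal{Z} = \alpha \mathcal{X} \quad \Leftrightarrow \quad \mathcal{Z}_i = \alpha \mathcal{X}_i \quad \forall i \in \mathcal{I}. \]
⇒ vector space structure.
◮ Inner product of two equal-sized tensors $\mathcal{X}, \mathcal{Y}$:
\[ \langle \mathcal{X}, \mathcal{Y} \rangle := \sum_{i \in \mathcal{I}} \mathcal{X}_i \mathcal{Y}_i. \]
⇒ Induced norm
\[ \|\mathcal{X}\| := \Big( \sum_{i \in \mathcal{I}} \mathcal{X}_i^2 \Big)^{1/2}. \]

For a 2nd order tensor (= matrix) this corresponds to the usual Euclidean geometry and the Frobenius norm.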
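The inner product and norm above act on the entries only, so they coincide with the Euclidean inner product and norm of the vectorized tensors. A minimal sketch with random example tensors (sizes chosen arbitrarily):

```matlab
% Inner product and induced norm of tensors, computed entrywise.
X = rand(3, 4, 5);
Y = rand(3, 4, 5);

ip  = sum(X(:) .* Y(:));        % <X, Y> = sum over all multi-indices
nrm = sqrt(sum(X(:).^2));       % ||X|| = sqrt(<X, X>)

% Same quantities via the vectorizations.
ip_vec  = X(:)' * Y(:);
nrm_vec = norm(X(:));

% For a matrix, the induced norm is the Frobenius norm.
M = rand(4, 3);
nrm_M = sqrt(sum(M(:).^2));
fprintf('%.2e %.2e %.2e\n', abs(ip - ip_vec), abs(nrm - nrm_vec), ...
        abs(nrm_M - norm(M, 'fro')));
```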
Vectorization

A tensor $\mathcal{X}$ of size $n_1 \times n_2 \times \cdots \times n_d$ has $n_1 \cdot n_2 \cdots n_d$ entries ⇒ many ways to stack the entries in a (loooong) column vector.

One possible choice: The vectorization of $\mathcal{X}$ is denoted by $\mathrm{vec}(\mathcal{X})$, where
\[ \mathrm{vec} : \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d} \to \mathbb{R}^{n_1 \cdot n_2 \cdots n_d} \]
stacks the entries of a tensor in reverse lexicographical order into a long column vector.

Example: $d = 3$, $n_1 = 3$, $n_2 = 2$, $n_3 = 3$.
\[ \mathrm{vec}(\mathcal{X}) = \begin{pmatrix} x_{111} \\ x_{211} \\ x_{311} \\ x_{121} \\ \vdots \\ x_{123} \\ x_{223} \\ x_{323} \end{pmatrix} \]
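The ordering in this example is the one MATLAB uses for `X(:)`: the first index runs fastest. The sketch below builds the $3 \times 2 \times 3$ example with entry $x_{i_1 i_2 i_3}$ stored as the three-digit number "i1 i2 i3" (an encoding chosen only for illustration) and checks the position formula $i_1 + (i_2 - 1) n_1 + (i_3 - 1) n_1 n_2$:

```matlab
% Vectorization of a 3-by-2-by-3 tensor in reverse lexicographical order.
n1 = 3; n2 = 2; n3 = 3;
X = zeros(n1, n2, n3);
for i3 = 1:n3
    for i2 = 1:n2
        for i1 = 1:n1
            X(i1, i2, i3) = 100*i1 + 10*i2 + i3;   % encodes the index (i1,i2,i3)
        end
    end
end

v = X(:);                 % first index varies fastest
disp(v(1:4)');            % 111 211 311 121
disp(v(end-2:end)');      % 123 223 323

% Position of entry (i1,i2,i3) inside vec(X).
i1 = 2; i2 = 2; i3 = 3;
idx = i1 + (i2 - 1)*n1 + (i3 - 1)*n1*n2;
disp(v(idx));             % 223
```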
Matricization

◮ A matrix has two modes (column mode and row mode).
◮ A $d$th-order tensor $\mathcal{X}$ has $d$ modes ($\mu = 1$, $\mu = 2$, ..., $\mu = d$).

Let us fix all but one mode, e.g., $\mu = 1$: Then
\[ \mathcal{X}(:, i_2, i_3, \ldots, i_d) \]
(abuse of MATLAB notation) is a vector of length $n_1$ for each choice of $i_2, \ldots, i_d$. These vectors are called fibers.

⇒ View the tensor $\mathcal{X}$ as a bunch of column vectors.

[Figure: a tensor split into its mode-1 fibers]
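Matricization

Stack the vectors into an $n_1 \times (n_2 \cdots n_d)$ matrix:
\[ \mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d} \quad \longrightarrow \quad X^{(1)} \in \mathbb{R}^{n_1 \times (n_2 n_3 \cdots n_d)} \]

For $\mu = 1, \ldots, d$, the $\mu$-mode matricization of $\mathcal{X}$ is a matrix
\[ X^{(\mu)} \in \mathbb{R}^{n_\mu \times (n_1 \cdots n_{\mu-1} n_{\mu+1} \cdots n_d)} \]
with entries
\[ \big( X^{(\mu)} \big)_{i_\mu, (i_1, \ldots, i_{\mu-1}, i_{\mu+1}, \ldots, i_d)} = \mathcal{X}_i \quad \forall i \in \mathcal{I}. \]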
Matricization

In MATLAB:

    a = rand(2,3,4,5);

◮ 1-mode matricization:
    reshape(a,2,3*4*5)
◮ 2-mode matricization:
    b = permute(a,[2 1 3 4]); reshape(b,3,2*4*5)
◮ 3-mode matricization:
    b = permute(a,[3 1 2 4]); reshape(b,4,2*3*5)
◮ 4-mode matricization:
    b = permute(a,[4 1 2 3]); reshape(b,5,2*3*4)

For a matrix $A \in \mathbb{R}^{n_1 \times n_2}$: $A^{(1)} = A$, $A^{(2)} = A^T$.
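The four cases above follow one pattern: move mode $\mu$ to the front, then reshape. A possible generalization is sketched below; the helper name `unfold` is hypothetical (it is not a built-in MATLAB function), and the sketch assumes the tensor has no trailing singleton modes, since `ndims` ignores those:

```matlab
% mu-mode matricization for any mode mu (hypothetical helper "unfold").
unfold = @(X, mu) reshape(permute(X, [mu, 1:mu-1, mu+1:ndims(X)]), ...
                          size(X, mu), []);

a  = rand(2, 3, 4, 5);
X3 = unfold(a, 3);                            % 4-by-(2*3*5) matrix

% Agrees with the explicit 3-mode command from the slide.
b = permute(a, [3 1 2 4]);
same = isequal(X3, reshape(b, 4, 2*3*5));

% Each column of X3 is a mode-3 fiber, e.g. the one with i1=2, i2=1, i4=1.
fiber = reshape(a(2, 1, :, 1), [], 1);
col   = 2 + (1 - 1)*2 + (1 - 1)*2*3;          % column index of that fiber
same_fiber = isequal(X3(:, col), fiber);

% Matrix case: A^(1) = A and A^(2) = A'.
A = rand(3, 4);
same_matrix = isequal(unfold(A, 1), A) && isequal(unfold(A, 2), A');
disp([same, same_fiber, same_matrix]);        % 1 1 1
```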
µ-mode matrix products

Consider the 1-mode matricization $X^{(1)} \in \mathbb{R}^{n_1 \times (n_2 \cdots n_d)}$:
It seems to make sense to multiply by an $m \times n_1$ matrix $A$ from the left:
\[ Y^{(1)} := A X^{(1)} \in \mathbb{R}^{m \times (n_2 \cdots n_d)}. \]
One can rearrange $Y^{(1)}$ back into an $m \times n_2 \times \cdots \times n_d$ tensor $\mathcal{Y}$.
This is called 1-mode matrix multiplication:
\[ \mathcal{Y} = A \circ_1 \mathcal{X} \quad \Leftrightarrow \quad Y^{(1)} = A X^{(1)}. \]
More formally (and more ugly):
\[ \mathcal{Y}_{i_1, i_2, \ldots, i_d} = \sum_{k=1}^{n_1} a_{i_1, k} \, \mathcal{X}_{k, i_2, \ldots, i_d}. \]
µ-mode matrix products

General definition of a $\mu$-mode matrix product with $A \in \mathbb{R}^{m \times n_\mu}$:
\[ \mathcal{Y} = A \circ_\mu \mathcal{X} \quad \Leftrightarrow \quad Y^{(\mu)} = A X^{(\mu)}. \]
More formally (and more ugly):
\[ \mathcal{Y}_{i_1, i_2, \ldots, i_d} = \sum_{k=1}^{n_\mu} a_{i_\mu, k} \, \mathcal{X}_{i_1, \ldots, i_{\mu-1}, k, i_{\mu+1}, \ldots, i_d}. \]

For matrices:
◮ 1-mode multiplication = multiplication from the left:
\[ Y = A \circ_1 X = A X. \]
◮ 2-mode multiplication = transposed multiplication from the right:
\[ Y = A \circ_2 X = X A^T. \]
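The definition translates directly into three steps: matricize along mode $\mu$, multiply, fold back. The following is a minimal sketch for a small random example (sizes chosen arbitrarily, $d = 3$, $\mu = 2$). It is not an optimized implementation, and it compares the result against the entrywise sum from the formal definition:

```matlab
% mu-mode matrix product Y = A o_mu X via matricization (sketch).
X  = rand(2, 3, 4);           % example tensor with n = (2, 3, 4)
mu = 2;                       % mode to multiply along
A  = rand(5, size(X, mu));    % m-by-n_mu matrix, here m = 5

sz   = size(X);
d    = numel(sz);
perm = [mu, 1:mu-1, mu+1:d];                     % bring mode mu to the front
Xmu  = reshape(permute(X, perm), sz(mu), []);    % mu-mode matricization
Ymu  = A * Xmu;                                  % Y^(mu) = A * X^(mu)
szY  = sz;
szY(mu) = size(A, 1);                            % mode mu now has size m
Y = ipermute(reshape(Ymu, szY(perm)), perm);     % fold back into a tensor

% Reference: entrywise sum over k from the definition (mu = 2, d = 3).
Yref = zeros(szY);
for i1 = 1:szY(1)
    for i2 = 1:szY(2)
        for i3 = 1:szY(3)
            Yref(i1, i2, i3) = A(i2, :) * reshape(X(i1, :, i3), [], 1);
        end
    end
end
err = norm(Y(:) - Yref(:));
fprintf('size(Y) = [%d %d %d], error = %.1e\n', size(Y), err);
```

Only the size of mode $\mu$ changes (from $n_\mu$ to $m$); all other modes keep their sizes.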
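µ-mode matrix products and vectorization

By definition,
\[ \mathrm{vec}(\mathcal{X}) = \mathrm{vec}\big( X^{(1)} \big). \]
Consequently, also
\[ \mathrm{vec}(A \circ_1 \mathcal{X}) = \mathrm{vec}\big( A X^{(1)} \big). \]
⇒ Vectorized version of the 1-mode matrix product:
\[ \mathrm{vec}(A \circ_1 \mathcal{X}) = (I_{n_2 \cdots n_d} \otimes A)\, \mathrm{vec}(\mathcal{X}) = (I_{n_d} \otimes \cdots \otimes I_{n_2} \otimes A)\, \mathrm{vec}(\mathcal{X}). \]

Relation between the $\mu$-mode matrix product and the matrix-vector product:
\[ \mathrm{vec}(A \circ_\mu \mathcal{X}) = (I_{n_d} \otimes \cdots \otimes I_{n_{\mu+1}} \otimes A \otimes I_{n_{\mu-1}} \otimes \cdots \otimes I_{n_1})\, \mathrm{vec}(\mathcal{X}). \]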
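The last identity can be checked numerically for a small case. The sketch below uses $d = 3$, $\mu = 2$ and arbitrary example sizes, and computes the $\mu$-mode product via permute/reshape as one possible implementation. The large Kronecker matrix `K` is formed only for the comparison; for realistic sizes one would avoid building it:

```matlab
% Check vec(A o_mu X) = (I_{n3} kron A kron I_{n1}) vec(X) for d = 3, mu = 2.
n  = [2 3 4];                 % tensor size n1, n2, n3
m  = 5;
mu = 2;
X  = rand(n);                 % random 2-by-3-by-4 tensor
A  = rand(m, n(mu));

% mu-mode product via matricization.
perm = [mu, 1:mu-1, mu+1:3];
Ymu  = A * reshape(permute(X, perm), n(mu), []);
nY   = n;
nY(mu) = m;
Y = ipermute(reshape(Ymu, nY(perm)), perm);

% Same operation as one matrix-vector product with a Kronecker product.
K   = kron(eye(n(3)), kron(A, eye(n(1))));
err = norm(Y(:) - K * X(:));
fprintf('size(K) = %d-by-%d, error = %.1e\n', size(K, 1), size(K, 2), err);
```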
Summary

◮ A tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is a $d$-dimensional array.
◮ There are various ways of reshaping the entries of a tensor $\mathcal{X}$ into a vector or matrix.
◮ $\mu$-mode matrix multiplication can be expressed with Kronecker products.

Further reading:
◮ T. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Rev. 51 (2009), no. 3, 455–500.

Software:
◮ MATLAB (and all programming languages) offer basic functionality to work with $d$-dimensional arrays.
◮ MATLAB Tensor Toolbox: http://www.tensortoolbox.org/
Applications of tensors
Two classes of tensor problems

Class 1: function-related tensors

Consider a function $u(\xi_1, \ldots, \xi_d) \in \mathbb{R}$ in $d$ variables $\xi_1, \ldots, \xi_d$.
A tensor $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ represents a discretization of $u$:
◮ $\mathcal{U}$ contains function values of $u$ evaluated on a grid; or
◮ $\mathcal{U}$ contains coefficients of a truncated expansion in tensorized basis functions:
\[ u(\xi_1, \ldots, \xi_d) \approx \sum_{i \in \mathcal{I}} \mathcal{U}_i \, \phi_{i_1}(\xi_1) \, \phi_{i_2}(\xi_2) \cdots \phi_{i_d}(\xi_d). \]

Typical setting:
◮ $\mathcal{U}$ is only given implicitly, e.g., as the solution of a discretized PDE;
◮ seek approximations to $\mathcal{U}$ with very low storage and tolerable accuracy;
◮ $d$ may become very large.
Discretization of a function in $d$ variables $\xi_1, \ldots, \xi_d \in [0, 1]$

⇒ The number of function values grows exponentially with $d$: a grid with $n$ points per variable has $n^d$ points.
Separability helps

Ideal situation: The function $f$ is separable:
\[ f(\xi_1, \xi_2, \ldots, \xi_d) = f_1(\xi_1) \, f_2(\xi_2) \cdots f_d(\xi_d). \]
The discretized $f$ is then the Kronecker product of the discretized $f_j$:
$O(n^d)$ memory ⇒ $O(dn)$ memory.

Of course: Exact separability is rarely satisfied in practice.
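A small sketch of the memory saving for $d = 3$. The factors $f_1 = \sin$, $f_2 = \exp$, $f_3(\xi) = 1 + \xi^2$ and the grid size are hypothetical choices made only for illustration. The Kronecker factors appear in reverse order because the vectorization lets the first index run fastest:

```matlab
% Separable function on a tensor grid: full tensor vs. Kronecker factors.
n  = 4;
xi = linspace(0, 1, n)';            % n grid points in [0, 1]

% Discretized one-dimensional factors (d*n numbers in total).
f1 = sin(xi);
f2 = exp(xi);
f3 = 1 + xi.^2;

% Full discretization on the grid (n^d numbers).
[G1, G2, G3] = ndgrid(xi, xi, xi);
F = sin(G1) .* exp(G2) .* (1 + G3.^2);

% vec(F) equals the Kronecker product of the factors in reverse order.
v   = kron(f3, kron(f2, f1));
err = norm(F(:) - v);
fprintf('full: %d numbers, factors: %d numbers, error = %.1e\n', ...
        numel(F), 3*n, err);
```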
Two classes of tensor problems

Class 2: data-related tensors

A tensor $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ contains multi-dimensional data.

Example 1: $\mathcal{U}_{2011, 3, 2}$ denotes the number of papers published in 2011 by author 3 in mathematical journal 2.
Example 2: A video of 1000 frames with resolution $640 \times 480$ can be viewed as a $640 \times 480 \times 1000$ tensor.
Example 3: Hyperspectral images.
Example 4: Deep learning: the coefficients in each layer of a deep NN are stored as tensors (TensorFlow); interpretation of RNNs as hierarchical tensor decomposition.

Typical setting (except for Example 4):
◮ entries of $\mathcal{U}$ are often given explicitly (at least partially);
◮ extraction of dominant features from $\mathcal{U}$;
◮ usually moderate values for $d$.
High-dimensional elliptic PDEs: 3D model problem

◮ Consider
\[ -\Delta u = f \ \text{in} \ \Omega, \qquad u|_{\partial \Omega} = 0, \]
on the unit cube $\Omega = [0, 1]^3$.
◮ Discretize on a tensor grid. Uniform grid for simplicity:
\[ \xi_\mu^{(j)} = jh, \qquad h = \frac{1}{n + 1} \]
for $\mu = 1, 2, 3$.
◮ Approximate solution tensor $\mathcal{U} \in \mathbb{R}^{n \times n \times n}$:
\[ \mathcal{U}_{i_1, i_2, i_3} \approx u\big( \xi_1^{(i_1)}, \xi_2^{(i_2)}, \xi_3^{(i_3)} \big). \]
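The grid and the tensor of function values can be set up as follows. The sketch assumes that $j$ runs over the interior points $1, \ldots, n$ (consistent with $h = 1/(n+1)$ and the zero boundary condition). The sampled function $u(\xi) = \sin(\pi \xi_1) \sin(\pi \xi_2) \sin(\pi \xi_3)$ is a hypothetical stand-in that vanishes on the boundary, not the solution for any particular $f$ from the lecture:

```matlab
% Function values on the uniform interior tensor grid of the unit cube.
n  = 7;
h  = 1 / (n + 1);
xi = (1:n)' * h;                    % interior grid points xi^(j) = j*h

% Hypothetical example function with u = 0 on the boundary of [0,1]^3.
u = @(x1, x2, x3) sin(pi*x1) .* sin(pi*x2) .* sin(pi*x3);

[G1, G2, G3] = ndgrid(xi, xi, xi);  % G1(i1,i2,i3) = xi(i1), and so on
U = u(G1, G2, G3);                  % U(i1,i2,i3) = u(xi(i1), xi(i2), xi(i3))

fprintf('size(U) = [%d %d %d], %d entries\n', size(U), numel(U));
```

With $n$ points per direction the tensor has $n^3$ entries, which is still manageable for $d = 3$ but illustrates the $n^d$ growth for larger $d$.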