Data Mining and Matrices 05 – Semi-Discrete Decomposition Rainer Gemulla, Pauli Miettinen May 16, 2013
Outline
1 Hunting the Bump
2 Semi-Discrete Decomposition
3 The Algorithm
4 Applications
    SDD alone
    SVD + SDD
5 Wrap-Up
An example data set
[Figure: the data, shown as a heat map; both axes are labeled from 100 to 700.]
[Figure: the data after permuting rows and columns, with a color scale from 0 to 3.5.]
[Figure: the data in a 3D view.]
Can we find the bumps in the picture automatically (from unpermuted data)?
What is a bump?
A submatrix of a matrix $A \in \mathbb{R}^{m \times n}$ contains some rows of $A$ and some columns of those rows
◮ Let $I \subseteq \{1, 2, \dots, m\}$ have the row indices and $J \subseteq \{1, 2, \dots, n\}$ have the column indices of the submatrix
◮ If $x \in \{0, 1\}^m$ has $x_i = 1$ iff $i \in I$ and $y \in \{0, 1\}^n$ has $y_j = 1$ iff $j \in J$, then $xy^T \in \{0, 1\}^{m \times n}$ has $(xy^T)_{ij} = 1$ iff $a_{ij}$ is in the submatrix
◮ $A \circ xy^T$ has the values of the submatrix and zeros elsewhere
  ⋆ $(A \circ B)_{ij} = a_{ij} b_{ij}$ is the Hadamard matrix product
The submatrix is uniform if all (or most) of its values are (approximately) the same
◮ Exactly uniform submatrices with value $\delta$ can be written as $\delta xy^T$: a bump
Example:
$A = \begin{pmatrix} 3 & 1 & 3 \\ 2 & 3 & 1 \\ 3 & 2 & 3 \end{pmatrix}$, $I = \{1, 3\}$, $J = \{1, 3\}$, $x = (1, 0, 1)^T$, $y = (1, 0, 1)^T$
$A \circ xy^T = \begin{pmatrix} 3 & 0 & 3 \\ 0 & 0 & 0 \\ 3 & 0 & 3 \end{pmatrix}$
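The slide's example can be reproduced in a few lines. The sketch below uses plain Python lists; the helper names `outer`, `hadamard`, and `indicator` are illustrative and not part of the slides.

```python
def outer(x, y):
    """Outer product x y^T of two vectors given as lists."""
    return [[xi * yj for yj in y] for xi in x]


def hadamard(A, B):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def indicator(indices, length):
    """0/1 characteristic vector with ones at the given 1-based indices."""
    return [1 if pos + 1 in indices else 0 for pos in range(length)]


A = [[3, 1, 3],
     [2, 3, 1],
     [3, 2, 3]]
I = {1, 3}  # row indices of the submatrix (1-based, as on the slide)
J = {1, 3}  # column indices of the submatrix

x = indicator(I, len(A))
y = indicator(J, len(A[0]))
mask = outer(x, y)              # 1 exactly where a_ij belongs to the submatrix
submatrix = hadamard(A, mask)   # submatrix values, zeros elsewhere

# The submatrix is exactly uniform if all selected values coincide.
values = {submatrix[i][j] for i in range(len(A)) for j in range(len(A[0])) if mask[i][j]}
is_uniform = len(values) == 1
delta = next(iter(values))
bump = [[delta * entry for entry in row] for row in mask]  # delta * x y^T
```

Here the selected values are all 3, so the submatrix is exactly uniform and `bump` coincides with `submatrix`.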
The next bump and negative values
Assume we know how to find the largest bump of a matrix
To find another bump, we can subtract the found bump from the matrix and find the largest bump of the residual matrix
◮ But after subtraction we might have negative values in the matrix
We can generalize the uniform submatrices to require uniformity only in magnitude
◮ Allow characteristic vectors $x$ and $y$ to take values from $\{-1, 0, 1\}$
◮ If $x = (-1, 0, -1)^T$ and $y = (1, 0, -1)^T$, then
$\delta xy^T = \begin{pmatrix} -\delta & 0 & \delta \\ 0 & 0 & 0 \\ -\delta & 0 & \delta \end{pmatrix}$
This allows us to define bumps in matrices with negative values
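A small worked example may help to see why signed vectors are needed. The matrix and both bumps below are made up for illustration; they are not the slides' data, and the first bump is simply assumed to have been found already.

```python
def outer(x, y):
    """Outer product x y^T of two vectors given as lists."""
    return [[xi * yj for yj in y] for xi in x]


def subtract_bump(R, delta, x, y):
    """Return R - delta * x y^T."""
    layer = outer(x, y)
    return [[r - delta * l for r, l in zip(row_r, row_l)]
            for row_r, row_l in zip(R, layer)]


# Hypothetical 3 x 3 matrix with a block of large values in the top-left corner.
A = [[5, 3, 0],
     [3, 5, 0],
     [0, 0, 0]]

# Assume the first bump has value 4 (the block average) on rows {1, 2} and columns {1, 2}.
residual = subtract_bump(A, 4, [1, 1, 0], [1, 1, 0])
# residual == [[1, -1, 0], [-1, 1, 0], [0, 0, 0]]: negative values appear.

# The residual is uniform in magnitude, so one signed bump describes it exactly.
remainder = subtract_bump(residual, 1, [1, -1, 0], [1, -1, 0])
```

After the second subtraction nothing is left, so this particular matrix is the sum of two bumps, the second of which needs entries of both signs.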
The definition
Semi-Discrete Decomposition
Given a matrix $A \in \mathbb{R}^{m \times n}$, the semi-discrete decomposition (SDD) of $A$ of dimension $k$ is
$A \approx X_k D_k Y_k^T$, where
◮ $X_k \in \{-1, 0, 1\}^{m \times k}$
◮ $Y_k \in \{-1, 0, 1\}^{n \times k}$
◮ $D_k \in \mathbb{R}_+^{k \times k}$ is a diagonal matrix
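Because $D_k$ is diagonal, the product $X_k D_k Y_k^T$ is the sum of the $k$ rank-1 layers $d_i x_i y_i^T$. The sketch below evaluates it that way; the function name `sdd_reconstruct`, the list-of-rows storage, and the example factors are assumptions made for illustration.

```python
def sdd_reconstruct(X, d, Y):
    """Evaluate X D Y^T as the sum of the layers d[c] * x_c y_c^T.

    X is m x k and Y is n x k (lists of rows with entries in {-1, 0, 1});
    d holds the k diagonal entries of D.
    """
    m, n, k = len(X), len(Y), len(d)
    A = [[0.0] * n for _ in range(m)]
    for c in range(k):
        for i in range(m):
            if X[i][c] == 0:
                continue  # this layer does not touch row i
            for j in range(n):
                A[i][j] += d[c] * X[i][c] * Y[j][c]
    return A


# Hypothetical 2-dimensional SDD of a 3 x 3 matrix.
X = [[1, 1],
     [1, -1],
     [0, 0]]
Y = [[1, 1],
     [1, -1],
     [0, 0]]
d = [4.0, 1.0]

approx = sdd_reconstruct(X, d, Y)
```

Skipping rows with a zero entry in $x_c$ reflects that each layer only touches its own submatrix.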
Example
[Figure: the data next to the first component $\sigma_1 u_1 v_1^T$ using SVD.]
[Figure: the data next to the second component $\sigma_2 u_2 v_2^T$ using SVD.]
The SVD cannot find the bumps
[Figures: the data next to the first through fifth bumps $d_i x_i y_i^T$, $i = 1, \dots, 5$, using SDD.]
[Figure: the data next to the 5-dimensional SDD approximation $X_5 D_5 Y_5^T$.]
Properties of SDD
The columns of $X_k$ and $Y_k$ do not need to be linearly independent
◮ The same column can even be repeated multiple times
The dimension $k$ might need to be large for an accurate approximation (compared to SVD)
◮ $k = \min\{n, m\}$ is not necessarily enough for an exact SDD
  ⋆ $k = nm$ is always enough
◮ The first factors don't necessarily explain much about the matrix
SDD factors are local
◮ They only affect a certain submatrix, typically not every element
◮ SVD factors typically change every value
Storing a $k$-dimensional SDD takes less space than storing a rank-$k$ truncated SVD
◮ $X_k$ and $Y_k$ are ternary and often sparse
For every rank-1 layer of an SDD, all non-zero values in the layer have the same magnitude ($d_{ii}$ for layer $i$)
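One way to see that $k = nm$ always suffices is to give every non-zero entry its own bump: $x$ carries the sign of the entry in row $i$, $y$ is the $j$th unit vector, and $d = |a_{ij}|$. The sketch below builds this trivial decomposition; `trivial_sdd` and `rebuild` are illustrative names, and the construction shows only that an exact SDD exists, not that it is a useful one.

```python
def trivial_sdd(A):
    """Exact SDD with one bump per non-zero entry (at most n*m bumps)."""
    m, n = len(A), len(A[0])
    xs, ds, ys = [], [], []
    for i in range(m):
        for j in range(n):
            if A[i][j] == 0:
                continue
            x = [0] * m
            y = [0] * n
            x[i] = 1 if A[i][j] > 0 else -1  # the sign lives in x
            y[j] = 1
            xs.append(x)
            ys.append(y)
            ds.append(abs(A[i][j]))          # d is non-negative
    return xs, ds, ys


def rebuild(xs, ds, ys, m, n):
    """Sum the layers d * x y^T."""
    A = [[0.0] * n for _ in range(m)]
    for x, d, y in zip(xs, ds, ys):
        for i in range(m):
            for j in range(n):
                A[i][j] += d * x[i] * y[j]
    return A


A = [[3, -1],
     [0, 2.5]]
xs, ds, ys = trivial_sdd(A)
exact = rebuild(xs, ds, ys, 2, 2)
```

For this 2 × 2 matrix three bumps are used, one per non-zero entry.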
Interpretation
The factor interpretation is not very useful as the factors are not independent
◮ A later factor can change just a subset of values already changed by an earlier factor
The SDD can be interpreted as a form of bi-clustering
◮ Every layer (bump) defines a group of rows and columns with homogeneous values in the residual matrix
The component interpretation is natural to SDD
◮ The SDD is a sum of local bumps
◮ SDD doesn't model global phenomena (e.g. noise) well
The outline of the algorithm
Input: Matrix $A \in \mathbb{R}^{m \times n}$, non-negative integer $k$
Output: $k$-dimensional SDD of $A$, i.e. matrices $X_k \in \{-1, 0, 1\}^{m \times k}$, $Y_k \in \{-1, 0, 1\}^{n \times k}$, and diagonal $D_k \in \mathbb{R}_+^{k \times k}$
$R_1 \leftarrow A$
for $i = 1, \dots, k$
    Select $y_i \in \{-1, 0, 1\}^n$
    while not converged
        Compute $x_i \in \{-1, 0, 1\}^m$ given $y_i$ and $R_i$
        Compute $y_i$ given $x_i$ and $R_i$
    end while
    Set $d_i$ to the average of $R_i \circ x_i y_i^T$ over the non-zero locations of $x_i y_i^T$
    Set $x_i$ as the $i$th column of $X_i$, $y_i$ as the $i$th column of $Y_i$, and $d_i$ as the $i$th value of $D_i$
    $R_{i+1} \leftarrow R_i - d_i x_i y_i^T$
end for
return $X_k$, $Y_k$, and $D_k$
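The outline above can be turned into a short program. The following is a minimal sketch in plain Python, not a reference implementation. Several choices are assumptions that the outline leaves open: the initial $y_i$ is the unit vector of the column holding the largest squared residual value, "converged" means that the objective $(x^T R y)^2 / (\|x\|_2^2 \|y\|_2^2)$ stopped improving, and the factors are returned as lists of column vectors. The names `best_ternary`, `sdd`, `max_inner`, and `tol` are illustrative.

```python
def best_ternary(s):
    """Return the ternary x maximizing (x^T s)^2 / ||x||_2^2, and that maximum."""
    order = sorted(range(len(s)), key=lambda i: abs(s[i]), reverse=True)
    best_val, best_count, running = -1.0, 0, 0.0
    for count, i in enumerate(order, start=1):
        running += abs(s[i])               # x^T s if the `count` largest |s_i| are selected
        val = running * running / count
        if val > best_val:
            best_val, best_count = val, count
    x = [0] * len(s)
    for i in order[:best_count]:
        x[i] = 1 if s[i] >= 0 else -1
    return x, best_val


def sdd(A, k, max_inner=100, tol=1e-12):
    """Greedy k-dimensional SDD; returns lists of the columns x_i, values d_i, columns y_i."""
    m, n = len(A), len(A[0])
    R = [list(row) for row in A]           # residual, R_1 = A
    X, D, Y = [], [], []
    for _layer in range(k):
        # Assumed initialisation: column with the largest squared residual value.
        j0 = max(range(n), key=lambda j: max(R[i][j] ** 2 for i in range(m)))
        y = [0] * n
        y[j0] = 1
        previous = -1.0
        for _step in range(max_inner):
            s = [sum(R[i][j] * y[j] for j in range(n)) for i in range(m)]   # R y
            x, _unused = best_ternary(s)
            t = [sum(R[i][j] * x[i] for i in range(m)) for j in range(n)]   # R^T x
            y, val = best_ternary(t)
            score = val / sum(abs(v) for v in x)   # (x^T R y)^2 / (||x||^2 ||y||^2)
            if score <= previous + tol:            # assumed convergence test
                break
            previous = score
        nnz = sum(abs(v) for v in x) * sum(abs(v) for v in y)
        d = sum(x[i] * R[i][j] * y[j] for i in range(m) for j in range(n)) / nnz
        for i in range(m):
            for j in range(n):
                R[i][j] -= d * x[i] * y[j]         # R_{i+1} = R_i - d_i x_i y_i^T
        X.append(x)
        D.append(d)
        Y.append(y)
    return X, D, Y


# Hypothetical input: a matrix that is exactly the sum of two bumps.
A = [[5, 3, 0],
     [3, 5, 0],
     [0, 0, 0]]
X, D, Y = sdd(A, 2)
```

On this input the sketch first finds the 2 × 2 block with value 4 and then a signed bump with value 1, after which the residual is zero. Since each alternating step solves its subproblem optimally, the objective never decreases, which is why stopping at the first non-improving step is a reasonable convergence test here.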
Finding the bump
Problem: Given $R \in \mathbb{R}^{m \times n}$ and $y \in \{-1, 0, 1\}^n$, find $x \in \{-1, 0, 1\}^m$ such that $\| R - d\, xy^T \|_F^2$ is minimized
◮ We set $d \leftarrow x^T R y / (\|x\|_2^2 \|y\|_2^2)$ (the average of $R \circ xy^T$ over the non-zero locations of $xy^T$)
◮ We want to minimize the residual norm
Set $s \leftarrow Ry$
Task: Find $x$ that maximizes $F(x, y) = (x^T s)^2 / \|x\|_2^2$
◮ Maximizing $F$ equals minimizing the residual norm after $d$ is set as above
◮ Can be solved optimally by trying $2^m$ different binary vectors and setting the sign appropriately
Solution: Order the values $s_i$ so that $|s_{i_1}| \geq |s_{i_2}| \geq \dots \geq |s_{i_m}|$ and set $x_{i_j} \leftarrow \operatorname{sign}(s_{i_j})$ for the first $J$ values $s_i$ and 0 elsewhere
◮ $J$ is the number of nonzeros in $x$
  ⋆ Because we don't know $J$, we have to try every possibility and select the best
◮ The values $s_i$ contain the row sums of $R$ from those columns that are selected by $y$, with the sign set accordingly
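The sorted-prefix solution is easy to check against exhaustive search on a small case. The sketch below implements both; `solve_x`, `objective`, and `brute_force` are illustrative names, and the vector `s` is a made-up example.

```python
from itertools import product


def objective(x, s):
    """F = (x^T s)^2 / ||x||_2^2 for a ternary x (0 for the all-zero vector)."""
    nnz = sum(1 for v in x if v != 0)
    if nnz == 0:
        return 0.0
    return sum(xi * si for xi, si in zip(x, s)) ** 2 / nnz


def solve_x(s):
    """Try J = 1, ..., m non-zeros, always taking the J largest |s_i| with matching signs."""
    order = sorted(range(len(s)), key=lambda i: abs(s[i]), reverse=True)
    x = [0] * len(s)
    best_x, best_val = None, -1.0
    for i in order:                      # each pass adds one more non-zero to x
        x[i] = 1 if s[i] >= 0 else -1
        val = objective(x, s)
        if val > best_val:
            best_x, best_val = list(x), val
    return best_x, best_val


def brute_force(s):
    """Maximum of F over all 3^m ternary vectors (only feasible for tiny m)."""
    return max(objective(x, s) for x in product((-1, 0, 1), repeat=len(s)))


s = [5, -3, 1]                           # hypothetical s = R y
x, value = solve_x(s)
```

For this `s` the candidates score 25 (J = 1), 32 (J = 2), and 27 (J = 3), so the two largest entries are selected. Only $m$ candidates are evaluated instead of exponentially many.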
Selecting the initial vector $y$
There are many ways to select the initial vector:
MAX: set $y_j = 1$ for the column $j$ that has the largest squared value of $R$ and the rest to zero
◮ Intuition: the very largest squared value is probably in the best bump
CYC: set $y_j = 1$ for $j = (k \bmod n) + 1$
◮ Cycle through the columns
THR: select a unit vector $y$ that satisfies $\| Ry \|_F^2 \geq \| R \|_F^2 / n$
◮ The sum of squares of the selected column must be at least the average over all columns
◮ The selection can be random or columns can be tried one-by-one
  ⋆ CYC and THR can be mixed
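The three strategies can be sketched as follows. The function names and the `counter` and `start` parameters are illustrative. The CYC counter is assumed to be the index of the component currently being computed, and THR is shown in its try-one-by-one variant. Indices in the code are 0-based, so the slide's $(k \bmod n) + 1$ becomes `counter % n`.

```python
def unit_vector(j, n):
    """Unit vector of length n with a one at 0-based position j."""
    y = [0] * n
    y[j] = 1
    return y


def init_max(R):
    """MAX: the column that contains the largest squared value of R."""
    n = len(R[0])
    j = max(range(n), key=lambda col: max(row[col] ** 2 for row in R))
    return unit_vector(j, n)


def init_cyc(counter, n):
    """CYC: cycle through the columns (counter is an assumed running index)."""
    return unit_vector(counter % n, n)


def init_thr(R, start=0):
    """THR: first column from `start` whose sum of squares is at least the average."""
    n = len(R[0])
    col_sq = [sum(row[j] ** 2 for row in R) for j in range(n)]
    threshold = sum(col_sq) / n          # ||R||_F^2 / n
    for offset in range(n):
        j = (start + offset) % n
        if col_sq[j] >= threshold:
            return unit_vector(j, n)
    # Only reachable through rounding; fall back to the heaviest column.
    return unit_vector(max(range(n), key=lambda col: col_sq[col]), n)


R = [[1, -1, 0],
     [-1, 4, 0],
     [0, 0, 2]]                          # hypothetical residual matrix
y_max = init_max(R)
y_cyc = init_cyc(2, 3)
y_thr = init_thr(R)
```

A column meeting the THR threshold always exists in exact arithmetic, because the largest column sum of squares cannot be below the average.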
Example result
[Figure: the data next to its 5-dimensional SDD.]
[Figure: the data next to the matrix $X_5 D_5 Y_5^T - A$.]
Normalization
Normalization can have a profound effect on SDD
Zero-centering the columns will change the type of bumps found
◮ The bumps in the original data have the largest-magnitude values
◮ The bumps in the zero-centered data have the most extreme values
Normalizing the variance makes the values of the matrix more uniform and thus changes the bumps
Squaring the values will promote smaller bumps of exceptionally high values
Square-rooting the values will promote larger bumps of smaller magnitude
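The four transformations are simple to state in code. The sketch below uses plain Python; the function names are illustrative, variance normalization is assumed to mean dividing each zero-centered column by its (population) standard deviation, and the square root is assumed to be applied to non-negative data only.

```python
import math


def column_means(A):
    return [sum(row[j] for row in A) / len(A) for j in range(len(A[0]))]


def center_columns(A):
    """Subtract each column's mean from its entries."""
    means = column_means(A)
    return [[a - mu for a, mu in zip(row, means)] for row in A]


def standardize_columns(A):
    """Zero-center, then divide each column by its standard deviation (assumed convention)."""
    C = center_columns(A)
    stds = [math.sqrt(sum(row[j] ** 2 for row in C) / len(C)) for j in range(len(C[0]))]
    return [[c / sd if sd > 0 else 0.0 for c, sd in zip(row, stds)] for row in C]


def square_values(A):
    """Element-wise square: exaggerates exceptionally high values."""
    return [[a * a for a in row] for row in A]


def sqrt_values(A):
    """Element-wise square root: compresses high values (non-negative input assumed)."""
    return [[math.sqrt(a) for a in row] for row in A]


A = [[1, 4],
     [3, 16]]                            # hypothetical non-negative data
centered = center_columns(A)
standardized = standardize_columns(A)
squared = square_values(A)
rooted = sqrt_values(A)
```

In this example the two columns differ in scale by a factor of six after centering, and standardizing makes them identical, which illustrates how variance normalization evens out the values.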
Normalization example: zero-centered data
[Figures: the zero-centered data next to the first, second, and third bumps. Note that in the third-bump figure red means 0.]
[Figure: the zero-centered data next to the 5-dimensional SDD.]
Normalization example: square-root of data
[Figures: the data after taking the element-wise square root, next to the first, second, and third bumps.]
[Figure: the data after taking the element-wise square root, next to the 5-dimensional SDD.]
Normalization example: squared data
[Figures: the squared data next to the first, second, and third bumps.]
[Figure: the squared data next to the 5-dimensional SDD.]