
  1. BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS Pauli Miettinen TML 2013 27 September 2013

2. BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS • Boolean decompositions are like “normal” decompositions, except that • Input is a binary matrix or tensor • Factors are binary • Arithmetic is Boolean (so reconstructions are binary) • Error measure is (usually) Hamming distance (L1)

3. BOOLEAN ARITHMETIC • Idempotent, anti-negative semiring ({0,1}, ∨, ∧) • Like normal arithmetic, but addition is defined as 1 + 1 = 1 • A Boolean matrix is a binary (0/1) matrix endowed with Boolean arithmetic • The Boolean matrix product is defined as $(A \circ B)_{ij} = \bigvee_{l=1}^{R} a_{il}\, b_{lj}$
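As an editorial sketch (not from the slides): for 0/1 NumPy arrays, the Boolean product can be computed with an ordinary integer product followed by thresholding, since a sum of non-negative terms is positive exactly when at least one term is 1.

```python
import numpy as np

def boolean_product(A, B):
    """Boolean matrix product: (A ∘ B)_ij = ∨_l (a_il ∧ b_lj).
    For 0/1 matrices, the integer product is positive iff the OR is 1."""
    return (A.astype(int) @ B.astype(int) > 0).astype(int)
```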

4. WHY BOOLEAN ARITHMETIC? • Boolean decompositions find a different type of structure than decompositions under normal arithmetic • Not better, not worse, just different • Normal decomposition: a value is the sum of the values from the rank-1 components • Boolean decomposition: a value is 1 if any rank-1 component has a 1 in that location (contrasted in the sketch below)
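A tiny hypothetical two-component example (my addition) makes the difference concrete: under normal arithmetic overlapping rank-1 components add up, while under Boolean arithmetic they saturate at 1.

```python
import numpy as np

a1, b1 = np.array([1, 1, 0]), np.array([1, 1, 0])
a2, b2 = np.array([0, 1, 1]), np.array([0, 1, 1])

normal  = np.outer(a1, b1) + np.outer(a2, b2)   # overlapping cell sums to 2
boolean = np.outer(a1, b1) | np.outer(a2, b2)   # overlapping cell stays 1
print(normal[1, 1], boolean[1, 1])              # prints: 2 1
```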

5. WHY BOOLEAN, CONT’D • Boolean arithmetic can be interpreted as set operations • [figure: sets A, B, and C with their overlaps, and the corresponding binary matrix view]

6. EXAMPLE • Rows: Real analysis, Discr. math., Programming; columns: E-mail, Contacts, Internet

$\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix} \circ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$
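A quick sanity check of this factorization (a sketch using NumPy; the Boolean product is the integer product thresholded at zero):

```python
import numpy as np

A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
B = np.array([[1, 0],
              [1, 1],
              [0, 1]])
C = np.array([[1, 1, 0],
              [0, 1, 1]])

# Boolean product B ∘ C reproduces A exactly
assert np.array_equal((B @ C) > 0, A.astype(bool))
```

Incidentally, this A has real rank 3 (its determinant is -1) but Boolean rank 2, so the Boolean rank can be strictly smaller than the real rank.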

7. RESULTS ON BOOLEAN MATRIX FACTORIZATION • Computing the Boolean rank is NP-hard • As hard to approximate as the minimum chromatic number • Minimum-error decomposition is NP-hard • And hard to approximate in both the additive and the multiplicative sense • Given A and B, finding C such that B ○ C is close to A is hard even to approximate • Alternating updates are hard!

8. SOME MORE RESULTS • The Boolean rank can be a logarithm of the real rank • Sparse matrices have sparse (exact) factorizations • The rank of the decomposition can be chosen automatically using the MDL principle • A planted rank-1 matrix can be recovered under XOR noise (under certain assumptions)

9. SOME ALGORITHMS • Alternating least-squares: proposed in the psychometric literature in the early 1980s • Asso [M. et al. 2006 & 2008]: builds candidate factors based on a correlation matrix and greedily selects them • Panda [Lucchese et al. 2010]: expands monochromatic core patterns (tiles) based on an MDL-esque rule • Various tiling algorithms: do not allow expressing a 0 in the data as a 1 in the factorization (no false positives) • Binary factorizations: normal algebra but binary factors
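To give the flavor of the Asso-style approach, here is a rough, hypothetical sketch: build an association matrix between columns, threshold it to obtain candidate factor rows, and greedily pick the candidate (together with the column of B it induces) that most reduces the Hamming error. The threshold tau, the cover rule, and all names are my assumptions; the published algorithm differs in its details.

```python
import numpy as np

def asso_like(A, k, tau=0.7):
    """Very rough sketch of the Asso idea: candidate factor rows from
    pairwise column associations, then greedy error-driven selection."""
    n, m = A.shape
    # association matrix: assoc[i, j] = P(column j is 1 | column i is 1)
    col_sums = A.sum(axis=0).clip(min=1)
    assoc = (A.T @ A) / col_sums[:, None]
    candidates = (assoc >= tau).astype(int)      # each row: a candidate row of C
    B = np.zeros((n, 0), dtype=int)
    C = np.zeros((0, m), dtype=int)
    for _ in range(k):
        best, best_err = None, None
        for cand in candidates:
            # column of B: rows of A that are mostly covered by this candidate
            col = (A @ cand >= cand.sum() * tau).astype(int)[:, None]
            recon = ((B @ C + col * cand) > 0).astype(int)
            err = np.abs(A - recon).sum()
            if best_err is None or err < best_err:
                best, best_err = (col, cand), err
        B = np.hstack([B, best[0]])
        C = np.vstack([C, best[1]])
    return B, C
```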

10. SOME APPLICATIONS • Explorative data analysis • Psychometrics • Role mining • Pattern mining • Bipartite community detection • Binary matrix completion (but requires {0, 1, ?} data) • Co-clustering-y applications

11. RANK-1 (BOOLEAN) TENSORS • A rank-1 matrix is the outer product of two vectors: $X = a \otimes b$ • A rank-1 3-way tensor is the outer product of three vectors: $\mathcal{X} = a \otimes b \otimes c$, i.e., $x_{ijk} = a_i b_j c_k$

12. THE BOOLEAN CP TENSOR DECOMPOSITION • $\mathcal{X} \approx (a_1 \otimes b_1 \otimes c_1) \vee (a_2 \otimes b_2 \otimes c_2) \vee \cdots \vee (a_R \otimes b_R \otimes c_R)$ • Element-wise: $x_{ijk} \approx \bigvee_{r=1}^{R} a_{ir} b_{jr} c_{kr}$
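A minimal sketch (assuming NumPy; names and shapes are mine) of reconstructing a binary tensor from Boolean CP factors: einsum computes the sum over r, which is then thresholded back to {0, 1}.

```python
import numpy as np

def boolean_cp_reconstruct(A, B, C):
    """x_ijk = OR_r (a_ir AND b_jr AND c_kr) for binary factor matrices."""
    return (np.einsum('ir,jr,kr->ijk', A, B, C) > 0).astype(int)

# toy usage with R = 2 random binary factors (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
A, B, C = (rng.integers(0, 2, (n, 2)) for n in (4, 3, 5))
X = boolean_cp_reconstruct(A, B, C)   # 4-by-3-by-5 binary tensor
```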

13. THE BOOLEAN CP TENSOR DECOMPOSITION • The same decomposition written with binary factor matrices A, B, and C: $x_{ijk} \approx \bigvee_{r=1}^{R} a_{ir} b_{jr} c_{kr}$

14. FREQUENT TRI-ITEMSET MINING • Rank-1 N-way binary tensors define an N-way itemset • In particular, rank-1 binary matrices define an itemset • In itemset mining the induced sub-tensor must be full of 1s • Here, the itemsets can have holes • Boolean CP decomposition = lossy N-way tiling

15. BOOLEAN TENSOR RANK • The Boolean rank of a binary tensor is the minimum number of binary rank-1 tensors needed to represent the tensor exactly using Boolean arithmetic: $\mathcal{X} = \bigvee_{r=1}^{R} a_r \otimes b_r \otimes c_r$

16. SOME RESULTS ON RANKS • Normal tensor rank is NP-hard to compute; so is Boolean tensor rank • The normal tensor rank of an n-by-m-by-k tensor can be more than min{n, m, k}, but no more than min{nm, nk, mk} • The same bounds hold for the Boolean tensor rank

17. SPARSITY • A binary N-way tensor $\mathcal{X}$ of Boolean tensor rank R has a Boolean rank-R CP decomposition with factor matrices $A_1, A_2, \ldots, A_N$ such that $\sum_i |A_i| \le N|\mathcal{X}|$ • A binary matrix $X$ of Boolean rank R and $|X|$ 1s has a Boolean rank-R decomposition $A \circ B$ such that $|A| + |B| \le 2|X|$ • Both results are existential only and extend to approximate decompositions


18. SIMPLE ALGORITHM • We can use the typical alternating algorithm with Boolean algebra, updating one factor at a time via the matricizations $X_{(1)} = A \circ (C \odot B)^T$, $X_{(2)} = B \circ (C \odot A)^T$, $X_{(3)} = C \circ (B \odot A)^T$ • Finding the optimal projection is NP-hard even to approximate • Good initial values are needed due to multiple local minima • They are obtained by applying Boolean matrix factorization to the matricizations
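A sketch of the unfolding identity these updates rely on, assuming the usual Kolda-Bader unfolding convention and writing ⊙ for the column-wise Kronecker (Khatri-Rao) product; for 0/1 factors the Boolean product is again an integer product thresholded at zero.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product of C (K x R) and B (J x R) -> (K*J x R)."""
    K, R = C.shape
    J, _ = B.shape
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

def bool_prod(A, M):
    return (A @ M > 0).astype(int)

# check X_(1) = A ∘ (C ⊙ B)^T on a random Boolean CP tensor
rng = np.random.default_rng(1)
A, B, C = (rng.integers(0, 2, (n, 3)) for n in (4, 5, 6))
X = (np.einsum('ir,jr,kr->ijk', A, B, C) > 0).astype(int)
X1 = X.transpose(0, 2, 1).reshape(4, -1)   # mode-1 unfolding (k slowest, then j)
assert np.array_equal(X1, bool_prod(A, khatri_rao(C, B).T))
```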

19. THE BOOLEAN TUCKER TENSOR DECOMPOSITION • A binary core tensor $\mathcal{G}$ combined with binary factor matrices A, B, and C: $x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr}$
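A minimal reconstruction sketch for the Boolean Tucker model (names and sizes are mine):

```python
import numpy as np

def boolean_tucker_reconstruct(G, A, B, C):
    """x_ijk = OR over (p,q,r) of (g_pqr AND a_ip AND b_jq AND c_kr)."""
    return (np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C) > 0).astype(int)

# toy usage: 2-by-2-by-2 binary core, 3-by-2 binary factors (sizes arbitrary)
rng = np.random.default_rng(2)
G = rng.integers(0, 2, (2, 2, 2))
A, B, C = (rng.integers(0, 2, (3, 2)) for _ in range(3))
X = boolean_tucker_reconstruct(G, A, B, C)   # 3-by-3-by-3 binary tensor
```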

20. THE SIMPLE ALGORITHM WITH TUCKER • The core tensor has global effects • Updates are hard • Factors are not orthogonal • Assume the core tensor is small • Then we can afford more time per element • In the Boolean case many changes make no difference

21. WALK’N’MERGE: A MORE SCALABLE ALGORITHM • Idea: for an exact decomposition, we could find all N-way tiles • Then we “only” need to find the ones we need among them • Problem: for approximate decompositions, there might not be any big tiles • We need to find tiles with holes, i.e. dense rank-1 subtensors

22. TENSORS AS GRAPHS • Create a graph from the tensor • Each 1 in the tensor: one vertex in the graph • Edge between two vertices if they differ in at most one coordinate • Idea: If two vertices are in the same all-1s rank-1 N-way subtensor, they are at most N steps from each other • Small-diameter subgraphs ⇔ dense rank-1 subtensors
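A naive sketch of this construction (quadratic in the number of 1s, so only for illustration; Walk'n'Merge itself is more scalable):

```python
import numpy as np
from itertools import combinations

def tensor_to_graph(X):
    """One vertex per 1 in the binary tensor X; an edge joins two vertices
    whose coordinates differ in exactly one position (for distinct
    vertices, 'at most one' means exactly one)."""
    vertices = [tuple(v) for v in np.argwhere(X)]
    edges = [(u, v) for u, v in combinations(vertices, 2)
             if sum(a != b for a, b in zip(u, v)) == 1]
    return vertices, edges
```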

23. EXAMPLE • [figure: a 3-by-4-by-2 binary tensor shown as two frontal slices, with one vertex per 1] • Vertices: (1,1,1), (1,1,2), (1,2,1), (1,2,2), (1,4,1), (1,4,2), (2,1,1), (2,1,2), (2,2,1), (2,2,2), (2,3,2), (3,1,2), (3,3,1)

  24. RANDOM WALKS • We can identify the small-diameter subgraphs by random walks • If many (short) random walks re-visit the same nodes often, they’re on a small-diameter subgraph • Problem: The random walks might return many overlapping dense areas and miss the smallest rank-1 decompositions
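A rough sketch of the walk step (my simplification: the actual algorithm clusters the walks and handles overlaps more carefully, and all parameters here are arbitrary):

```python
import random
from collections import Counter

def walk_visit_counts(adj, n_walks=1000, walk_len=5, seed=0):
    """Run many short random walks on adjacency dict adj (node -> neighbor
    list); frequently revisited nodes tend to lie in small-diameter,
    i.e. dense, subgraphs."""
    rng = random.Random(seed)
    counts = Counter()
    nodes = list(adj)
    for _ in range(n_walks):
        v = rng.choice(nodes)
        for _ in range(walk_len):
            counts[v] += 1
            if not adj[v]:
                break
            v = rng.choice(adj[v])
    return counts
```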

25. MERGE • We can exhaustively look for all small (e.g. 2-by-2-by-2) all-1s sub-tensors outside the already-found dense subtensors • We can now merge any partially overlapping rank-1 subtensors if the resulting subtensor is dense enough • Result: a Boolean CP decomposition of some rank • The false positive rate is controlled by the density threshold, the false negative rate by the exhaustive search

26. MDL STRIKES AGAIN • We have a decomposition of some rank, but what would be a good rank? • Normally: pre-defined by the user (but how does she know?) • MDL principle: the best model for your data is the one that describes it with the fewest bits • We can use MDL to choose the rank

27. HOW DO YOU COUNT THE BITS? • MDL asks for an exact representation of the data • In the case of Boolean CP, we represent the tensor X with • the factor matrices • the error tensor E • The bit strings representing these are encoded to compute the description length
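As a toy illustration only, a naive bit count that encodes each binary object by the positions of its 1s; the actual encoding in this line of work differs, and every formula below is my stand-in.

```python
import numpy as np

def bits_for(M):
    """Naive code length: each 1 costs log2(#cells) bits (its position),
    plus a few bits for the dimensions. A stand-in, not the real encoding."""
    return int(M.sum()) * np.log2(M.size) + sum(np.log2(d + 1) for d in M.shape)

def description_length(X, A, B, C):
    """DL of a 3-way Boolean CP model: bits for the factors plus bits for
    the error tensor E, where X = reconstruction XOR E."""
    recon = (np.einsum('ir,jr,kr->ijk', A, B, C) > 0).astype(int)
    E = np.bitwise_xor(X, recon)
    return bits_for(A) + bits_for(B) + bits_for(C) + bits_for(E)
```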

28. WHY MDL AND THE TUCKER DECOMPOSITION • Balance between accuracy and complexity • High rank: more bits in the factor matrices, fewer in the error tensor • Small rank: fewer bits in the factor matrices, more in the error tensor • If one mode uses the same factor multiple times, CP contains it multiple times • The Tucker decomposition needs to have that factor only once

29. FROM CP TO TUCKER WITH MDL • CP is Tucker with a hyper-diagonal core tensor • If we can remove a repeated column from a factor matrix and adjust the core accordingly, our encoding is more efficient • Algorithm: try merging similar factors and see if that reduces the encoding length

  30. APPLICATION: FACT DISCOVERY • Input: noun phrase–verbal phrase–noun phrase triples • Non-disambiguated • E.g. from OpenIE • Goal: Find the facts (entity–relation–entity triples) underlying the observed data and mappings from surface forms to entities and relations

31. CONNECTION TO BOOLEAN TENSORS • We should see an np1–vp–np2 triple if • there exists at least one fact e1–r–e2 such that • np1 is the surface form of e1 • vp is the surface form of r • np2 is the surface form of e2

32. CONNECTION TO BOOLEAN TENSORS • What we want is a Boolean Tucker3 decomposition • The core tensor contains the facts • The factors contain the mappings from entities and relations to surface forms • $x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr}$

33. PROS & CONS • Pros: naturally sparse core tensor (the core will be huge ⇒ it must be sparse) • Natural interpretation • Cons: no levels of certainty (a triple either is a fact or is not) • Can only handle binary data

34. EXAMPLE RESULT • 39,500-by-8,000-by-21,000 tensor with 804,000 non-zeros • Subject: claude de lorimier, de lorimier, louis, jean-baptiste • Relation: was born, [[det]] born in • Object: borough of lachine, villa st. pierre, lachine quebec
