BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS
Pauli Miettinen
TML 2013, 27 September 2013
BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS
• Boolean decompositions are like “normal” decompositions, except that
  • Input is a binary matrix or tensor
  • Factors are binary
  • Arithmetic is Boolean (so reconstructions are binary)
  • Error measure is (usually) the Hamming distance ($L_1$)
BOOLEAN ARITHMETIC
• Idempotent, anti-negative semi-ring $(\{0,1\}, \vee, \wedge)$
• Like normal arithmetic, but addition is defined so that 1 + 1 = 1
• A Boolean matrix is a binary (0/1) matrix endowed with Boolean arithmetic
• The Boolean matrix product is defined as $(B \circ C)_{ij} = \bigvee_{l=1}^{R} b_{il} c_{lj}$
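As a minimal illustration (the function name and the NumPy realization below are my own, not from the slides), the Boolean product can be computed by thresholding the ordinary integer product at ≥ 1, since Boolean addition is idempotent:

```python
import numpy as np

def boolean_matmul(B, C):
    """Boolean matrix product: (B o C)_ij = OR_l (b_il AND c_lj).

    Because 1 + 1 = 1 under Boolean addition, this equals the ordinary
    integer matrix product thresholded at >= 1.
    """
    B = np.asarray(B, dtype=int)
    C = np.asarray(C, dtype=int)
    return ((B @ C) > 0).astype(int)
```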
WHY BOOLEAN ARITHMETIC?
• Boolean decompositions find a different type of structure than decompositions under normal arithmetic
  • Not better, not worse, just different
• Normal decomposition: a value is the sum of the values from the rank-1 components
• Boolean decomposition: a value is 1 if any rank-1 component has a 1 in that location
WHY BOOLEAN, CONT’D
Boolean arithmetic can be interpreted as set operations.
[Figure: Venn diagram of sets A, B, and C illustrating the set-operation view]
EXAMPLE
A 3-by-3 binary matrix (rows: Real analysis, Discr. math., Programming; columns: E-mail, Contacts, Internet) and its exact Boolean rank-2 factorization:

( 1 1 0 )   ( 1 0 )
( 1 1 1 ) = ( 1 1 ) ∘ ( 1 1 0 )
( 0 1 1 )   ( 0 1 )   ( 0 1 1 )
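Continuing the earlier sketch, the factorization above can be checked with the hypothetical boolean_matmul helper:

```python
B = [[1, 0],
     [1, 1],
     [0, 1]]
C = [[1, 1, 0],
     [0, 1, 1]]
# boolean_matmul(B, C) reproduces the 3-by-3 matrix above exactly
print(boolean_matmul(B, C))
```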
RESULTS ON BOOLEAN MATRIX FACTORIZATION
• Computing the Boolean rank is NP-hard
  • As hard to approximate as the minimum chromatic number
• Finding the minimum-error decomposition is NP-hard
  • And hard to approximate in both the additive and the multiplicative sense
• Given A and B, finding C such that B ∘ C is close to A is hard even to approximate
  • Alternating updates are hard!
SOME MORE RESULTS
• The Boolean rank can be a logarithm of the real rank
• Sparse matrices have sparse (exact) factorizations
• The rank of the decomposition can be selected automatically using the MDL principle
• A planted rank-1 matrix can be recovered under XOR noise (under certain assumptions)
SOME ALGORITHMS
• Alternating least-squares
  • Proposed in the psychometric literature in the early 1980s
• Asso [M. et al. 2006 & 2008]
  • Builds candidate factors based on a correlation matrix, and greedily selects among them for the factorization
• Binary factorizations
  • Normal algebra, but binary factors
• Panda [Lucchese et al. 2010]
  • Expands monochromatic core patterns (tiles) based on an MDL-esque rule
• Various tiling algorithms
  • Do not allow expressing a 0 in the data as a 1 (no false positives)
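As a rough illustration of Asso’s candidate-generation step (the function name, the threshold parameter tau, and the NumPy details are my assumptions; the actual algorithm then greedily selects among these candidates with a gain function that rewards covered 1s and penalizes false positives):

```python
import numpy as np

def asso_candidates(X, tau=0.6):
    """Asso-style candidates: assoc[i, j] = confidence that column j is 1
    when column i is 1; each thresholded row is a candidate basis vector."""
    X = np.asarray(X, dtype=float)
    counts = X.T @ X                         # column co-occurrence counts
    col_sums = np.maximum(X.sum(axis=0), 1)  # avoid division by zero
    assoc = counts / col_sums[:, None]
    return (assoc >= tau).astype(int)
```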
SOME APPLICATIONS
• Exploratory data analysis
• Psychometrics
• Role mining
• Pattern mining
• Bipartite community detection
• Binary matrix completion
  • But requires {0, 1, ?} data
• Co-clustering-y applications
RANK-1 (BOOLEAN) TENSORS
A rank-1 matrix is the outer product of two vectors, $X = a \times b$ (i.e. $x_{ij} = a_i b_j$); a rank-1 3-way tensor is the outer product of three vectors, $\mathcal{X} = a \times b \times c$ (i.e. $x_{ijk} = a_i b_j c_k$).
[Figure: a rank-1 matrix and a rank-1 3-way tensor drawn as outer products]
THE BOOLEAN CP TENSOR DECOMPOSITION
$\mathcal{X} \approx (a_1 \times b_1 \times c_1) \vee (a_2 \times b_2 \times c_2) \vee \cdots \vee (a_R \times b_R \times c_R)$
Element-wise: $x_{ijk} \approx \bigvee_{r=1}^{R} a_{ir} b_{jr} c_{kr}$
THE BOOLEAN CP TENSOR DECOMPOSITION
The same decomposition written with binary factor matrices A, B, and C, whose columns are the vectors $a_r$, $b_r$, $c_r$:
$x_{ijk} \approx \bigvee_{r=1}^{R} a_{ir} b_{jr} c_{kr}$
[Figure: the tensor X approximated by the three factor matrices A, B, and C]
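A small sketch of the reconstruction and its Hamming error (the names and the NumPy realization are mine, not from the slides):

```python
import numpy as np

def boolean_cp_reconstruct(A, B, C):
    """x_ijk = OR_r (a_ir AND b_jr AND c_kr)."""
    A, B, C = (np.asarray(M, dtype=int) for M in (A, B, C))
    # einsum counts contributing rank-1 components; > 0 realizes the Boolean OR
    return np.einsum('ir,jr,kr->ijk', A, B, C) > 0

def hamming_error(X, A, B, C):
    """Number of disagreeing entries, i.e. the L1/Hamming error."""
    return int(np.sum(np.asarray(X, dtype=bool) != boolean_cp_reconstruct(A, B, C)))
```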
FREQUENT TRI-ITEMSET MINING
• Rank-1 N-way binary tensors define N-way itemsets
  • In particular, rank-1 binary matrices define itemsets
• In itemset mining the induced sub-tensor must be full of 1s
  • Here, the itemsets can have holes
• Boolean CP decomposition = lossy N-way tiling
BOOLEAN TENSOR RANK
The Boolean rank of a binary tensor is the minimum number of binary rank-1 tensors needed to represent the tensor exactly using Boolean arithmetic:
$\mathcal{X} = (a_1 \times b_1 \times c_1) \vee (a_2 \times b_2 \times c_2) \vee \cdots \vee (a_R \times b_R \times c_R)$
SOME RESULTS ON RANKS
• Normal tensor rank is NP-hard to compute
• Boolean tensor rank is NP-hard to compute
• Normal tensor rank of an n-by-m-by-k tensor can be more than min{n, m, k}, but no more than min{nm, nk, mk}
• Boolean tensor rank of an n-by-m-by-k tensor can be more than min{n, m, k}, but no more than min{nm, nk, mk}
SPARSITY
• A binary N-way tensor $\mathcal{X}$ of Boolean tensor rank R has a Boolean rank-R CP decomposition with factor matrices $A_1, A_2, \ldots, A_N$ such that $\sum_i |A_i| \le N|\mathcal{X}|$
• A binary matrix X of Boolean rank R with |X| 1s has a Boolean rank-R decomposition $A \circ B$ such that $|A| + |B| \le 2|X|$
• Both results are existential only, and extend to approximate decompositions
SIMPLE ALGORITHM
• We can use the typical alternating algorithm with Boolean algebra, updating one factor at a time through the matricizations (see the sketch after this list):
  $X_{(1)} = A \circ (C \odot B)^T$
  $X_{(2)} = B \circ (C \odot A)^T$
  $X_{(3)} = C \circ (B \odot A)^T$
• Finding the optimal projection is NP-hard even to approximate
• Good initial values are needed due to multiple local minima
  • Obtained by applying Boolean matrix factorization to the matricizations
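Below is a naive coordinate-descent sketch of one alternating update (the function names and the per-bit greedy rule are my simplifications; as the slide notes, the optimal update is NP-hard even to approximate):

```python
import numpy as np

def boolean_khatri_rao(C, B):
    """Column-wise Kronecker (Khatri-Rao) product of binary matrices."""
    C, B = np.asarray(C, dtype=bool), np.asarray(B, dtype=bool)
    return (C[:, None, :] & B[None, :, :]).reshape(-1, B.shape[1])

def update_factor(X1, A, KR):
    """One greedy pass over A for X1 ~ A o KR^T (Boolean product).

    Rows of A are independent, so each bit a_ir is set to whichever
    value yields the smaller Hamming error on row i of X1.
    """
    X1 = np.asarray(X1, dtype=bool)
    A = np.asarray(A, dtype=bool).copy()
    for i in range(A.shape[0]):
        for r in range(A.shape[1]):
            best = None
            for bit in (False, True):
                A[i, r] = bit
                recon = KR[:, A[i]].any(axis=1)   # Boolean product of row i
                err = np.sum(X1[i] != recon)
                if best is None or err < best[0]:
                    best = (err, bit)
            A[i, r] = best[1]
    return A

# One alternating step for the first mode:
# A = update_factor(X1, A, boolean_khatri_rao(C, B))
```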
THE BOOLEAN TUCKER TENSOR DECOMPOSITION
$x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr} a_{ip} b_{jq} c_{kr}$
[Figure: the tensor X approximated by a core tensor G and factor matrices A, B, and C]
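A sketch of the element-wise Boolean Tucker reconstruction (the einsum-based realization is my assumption):

```python
import numpy as np

def boolean_tucker_reconstruct(G, A, B, C):
    """x_ijk = OR_{p,q,r} (g_pqr AND a_ip AND b_jq AND c_kr)."""
    G, A, B, C = (np.asarray(M, dtype=int) for M in (G, A, B, C))
    return np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C) > 0
```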
THE SIMPLE ALGORITHM WITH TUCKER
• The core tensor has global effects
  • Updates are hard
  • Factors are not orthogonal
• Assume the core tensor is small
  • Then we can afford more time per element
  • In the Boolean case many changes make no difference anyway
WALK’N’MERGE: A MORE SCALABLE ALGORITHM
• Idea: For an exact decomposition, we could find all N-way tiles
  • Then we “only” need to find the ones we need among them
• Problem: For approximate decompositions, there might not be any big tiles
  • We need to find tiles with holes, i.e. dense rank-1 subtensors
TENSORS AS GRAPHS
• Create a graph from the tensor
  • Each 1 in the tensor is one vertex in the graph
  • There is an edge between two vertices if they differ in at most one coordinate
• Idea: If two vertices are in the same all-1s rank-1 N-way subtensor, they are at most N steps from each other
• Small-diameter subgraphs ⇔ dense rank-1 subtensors
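A naive sketch of the graph construction (quadratic in the number of 1s, so for illustration only; the names are my own):

```python
import itertools
import numpy as np

def tensor_graph(X):
    """One vertex per 1 in the binary tensor; an edge joins two nonzeros
    that differ in at most one coordinate."""
    vertices = [tuple(v) for v in np.argwhere(np.asarray(X, dtype=bool))]
    edges = [(u, v) for u, v in itertools.combinations(vertices, 2)
             if sum(a != b for a, b in zip(u, v)) <= 1]
    return vertices, edges
```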
EXAMPLE
[Figure: a 3-by-4-by-2 binary tensor with 13 ones and the resulting graph, with one vertex per nonzero (i,j,k) and edges between nonzeros that differ in at most one coordinate]
RANDOM WALKS
• We can identify the small-diameter subgraphs with random walks
• If many (short) random walks re-visit the same nodes often, those nodes are on a small-diameter subgraph
• Problem: The random walks might return many overlapping dense areas and miss the smallest rank-1 subtensors
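A toy sketch of one such walk (the adjacency-dict representation and the revisit counting are my assumptions):

```python
import random

def random_walk(adj, start, length):
    """Walk `length` steps over an adjacency dict {vertex: [neighbors]},
    counting visits; vertices revisited by many short walks are likely
    to lie on a small-diameter (i.e. dense) subgraph."""
    visits, v = {start: 1}, start
    for _ in range(length):
        neighbors = adj.get(v, [])
        if not neighbors:
            break
        v = random.choice(neighbors)
        visits[v] = visits.get(v, 0) + 1
    return visits
```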
MERGE
• We can exhaustively look for all small (e.g. 2-by-2-by-2) all-1s subtensors outside the already-found dense subtensors
• We can then merge any two partially overlapping rank-1 subtensors if the resulting subtensor is dense enough
• Result: A Boolean CP decomposition of some rank
• The false-positive rate is controlled by the density threshold, the false-negative rate by the exhaustive search
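A sketch of the merge test for two candidate blocks, each given as one index set per mode (the density threshold and the names are illustrative assumptions):

```python
import numpy as np

def try_merge(X, block1, block2, min_density=0.8):
    """Merge two rank-1 blocks, each a tuple of per-mode index sets,
    if the merged block is dense enough in X; otherwise return None."""
    merged = tuple(sorted(set(a) | set(b)) for a, b in zip(block1, block2))
    sub = np.asarray(X, dtype=bool)[np.ix_(*merged)]
    return merged if sub.mean() >= min_density else None
```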
MDL STRIKES AGAIN
• We have a decomposition of some rank, but what would be a good rank?
• Normally: pre-defined by the user (but how does she know?)
• MDL principle: The best model for your data is the one that describes the data with the least number of bits
• We can use MDL to choose the rank
HOW DO YOU COUNT THE BITS?
• MDL asks for an exact representation of the data
• In the case of Boolean CP, we represent the tensor X with
  • The factor matrices
  • The error tensor E
• The bit-strings representing these are encoded to compute the description length
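One naive way to count the bits, purely as an illustration: encode each binary object by its number of 1s plus the index of its 1-pattern among all patterns with that many 1s. The actual encodings used in the literature differ; this is only a stand-in under that assumption:

```python
import numpy as np
from math import log2, comb

def bits(M):
    """Bits for a binary array: its number of 1s, plus the index of its
    1-pattern among all patterns with that many 1s (a naive enumerative code)."""
    M = np.asarray(M, dtype=bool)
    return log2(M.size + 1) + log2(comb(M.size, int(M.sum())))

def description_length(X, A, B, C):
    """Two-part MDL score: bits for the model (factors) + bits for the error."""
    A_, B_, C_ = (np.asarray(M, dtype=int) for M in (A, B, C))
    recon = np.einsum('ir,jr,kr->ijk', A_, B_, C_) > 0   # Boolean CP reconstruction
    E = np.asarray(X, dtype=bool) ^ recon                 # error tensor
    return bits(A) + bits(B) + bits(C) + bits(E)
```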
WHY MDL AND THE TUCKER DECOMPOSITION
• MDL balances accuracy against complexity
  • High rank: more bits in the factor matrices, fewer in the error tensor
  • Small rank: fewer bits in the factor matrices, more in the error tensor
• If one mode uses the same factor multiple times, CP contains that factor multiple times
  • The Tucker decomposition needs to have that factor only once
FROM CP TO TUCKER WITH MDL
• CP is Tucker with a hyper-diagonal core tensor
• If we can remove a repeated column from a factor matrix and adjust the core accordingly, our encoding becomes more efficient
• Algorithm: Try merging similar factors and see if that reduces the encoding length (a sketch follows)
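A sketch of the merge for exactly repeated columns in the mode-1 factor (the real algorithm merges similar, not only identical, columns and keeps a merge only if it shortens the encoding; the names here are mine):

```python
import numpy as np

def merge_repeated_columns(A, G):
    """Drop repeated columns of the mode-1 factor A and OR together the
    corresponding slices of the core G, shrinking the Tucker model."""
    A, G = np.asarray(A, dtype=bool), np.asarray(G, dtype=bool)
    kept_cols, slice_index, new_slices = [], {}, []
    for p in range(A.shape[1]):
        key = A[:, p].tobytes()
        if key in slice_index:
            new_slices[slice_index[key]] |= G[p]   # fold core slice p into the kept one
        else:
            slice_index[key] = len(kept_cols)
            kept_cols.append(p)
            new_slices.append(G[p].copy())
    return A[:, kept_cols], np.stack(new_slices)
```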
APPLICATION: FACT DISCOVERY
• Input: noun phrase–verbal phrase–noun phrase triples
  • Non-disambiguated
  • E.g. from OpenIE
• Goal: Find the facts (entity–relation–entity triples) underlying the observed data, and the mappings from surface forms to entities and relations
CONNECTION TO BOOLEAN TENSORS
• We should see an np1–vp–np2 triple if
  • there exists at least one fact e1–r–e2 such that
    • np1 is a surface form of e1,
    • vp is a surface form of r, and
    • np2 is a surface form of e2
CONNECTION TO BOOLEAN TENSORS
• What we want is a Boolean Tucker3 decomposition
  • The core tensor contains the facts
  • The factors contain the mappings from entities and relations to surface forms
$x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr} a_{ip} b_{jq} c_{kr}$
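A sketch of how facts could be read off such a decomposition: every 1 in the core is a fact, and the corresponding factor columns list its surface forms (the names and the representation are my assumptions):

```python
import numpy as np

def facts_from_core(G, A, B, C):
    """Yield one fact per 1 in the core: (p, q, r) identifies the subject
    entity, relation, and object entity; the factor columns give the
    indices of their surface forms."""
    A, B, C = (np.asarray(M, dtype=bool) for M in (A, B, C))
    for p, q, r in np.argwhere(np.asarray(G, dtype=bool)):
        yield ((p, q, r),
               np.flatnonzero(A[:, p]),   # surface forms of subject entity p
               np.flatnonzero(B[:, q]),   # surface forms of relation q
               np.flatnonzero(C[:, r]))   # surface forms of object entity r
```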
PROS & CONS
• Pros:
  • Naturally sparse core tensor
    • The core will be huge ⇒ it must be sparse
  • Natural interpretation
• Cons:
  • No levels of certainty
    • A fact either is or is not
  • Can only handle binary data
EXAMPLE RESULT
Subject: claude de lorimier, de lorimier, louis, jean-baptiste
Relation: was born, [[det]] born in
Object: borough of lachine, villa st. pierre, lachine quebec
(From a 39,500-by-8,000-by-21,000 tensor with 804,000 non-zeros)