Neural Nonnegative Matrix Factorization for Hierarchical Multilayer Topic Modeling
Jamie Haddock, Computational and Applied Mathematics, UCLA
CAMSAP 2019, December 16, 2019
Joint with Mengdi Gao, Denali Molitor, Deanna Needell, Eli Sadovnik, Tyler Will, Runyu Zhang
Nonnegative Matrix Factorization (NMF)

[Figure: X (N × M) ≈ A (N × k) · S (k × M); in topic modeling, N indexes words, M indexes documents, and k indexes topics.]

$$\min_{A \in \mathbb{R}^{N \times k}_{\geq 0},\; S \in \mathbb{R}^{k \times M}_{\geq 0}} \| X - AS \|_F^2$$

Problem Setup:
⊲ $X \in \mathbb{R}^{N \times M}_{\geq 0}$: data matrix
⊲ $A \in \mathbb{R}^{N \times k}_{\geq 0}$: features matrix
⊲ $S \in \mathbb{R}^{k \times M}_{\geq 0}$: coefficients matrix
⊲ $k$: user-chosen parameter

Problem Challenges:
⊲ nonconvex in A and S, NP-hard [Vavasis '08]
⊲ interpretability of factors depends upon k
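The slides contain no code, but for orientation, here is a minimal sketch of the setup above using scikit-learn's off-the-shelf NMF. The random matrix X, the sizes N, M, k, and the variable names A and S are illustrative placeholders; any nonnegative word-document matrix works the same way.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical data: X is an N-by-M nonnegative word-document matrix
# (e.g., term frequencies); N words, M documents, k topics.
rng = np.random.default_rng(0)
N, M, k = 500, 200, 10
X = rng.random((N, M))

# Factorize X ~= A S with A >= 0 (N x k, word-topic "features")
# and S >= 0 (k x M, topic-document "coefficients").
model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
A = model.fit_transform(X)   # shape (N, k)
S = model.components_        # shape (k, M)

print("relative error:", np.linalg.norm(X - A @ S) / np.linalg.norm(X))
```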
NMF

Applications:
⊲ low-rank approximation
⊲ clustering
⊲ topic modeling
⊲ feature extraction

Methods:
⊲ multiplicative updates
⊲ alternating nonnegative least squares
⊲ many others
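As a concrete illustration of the first method listed (not taken from the talk), here is a minimal NumPy sketch of the standard Lee-Seung multiplicative updates for the Frobenius objective $\min \|X - AS\|_F^2$:

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Classic Lee-Seung multiplicative updates for min ||X - AS||_F^2
    with A, S >= 0. A sketch for illustration, not a tuned implementation."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    A = rng.random((N, k))
    S = rng.random((k, M))
    for _ in range(n_iter):
        # Each update is elementwise and preserves nonnegativity.
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    return A, S
```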
(Semi)supervised NMF (SSNMF)

Goal: Incorporate known label information into the problem.

$$\min_{A \in \mathbb{R}^{N \times k}_{\geq 0},\; S \in \mathbb{R}^{k \times M}_{\geq 0},\; B \in \mathbb{R}^{P \times k}_{\geq 0}} \| W \odot (X - AS) \|_F^2 + \lambda \| L \odot (Y - BS) \|_F^2$$

Problem Setup:
⊲ $Y \in \{0,1\}^{P \times M}$: label matrix
⊲ $P$: number of classes
⊲ $W \in \{0,1\}^{N \times M}$: data indicator
⊲ $L \in \{0,1\}^{P \times M}$: label indicator
⊲ $\lambda$: user-defined hyperparameter

Problem Advantages:
⊲ use of label information
⊲ can extend the multiplicative updates method to SSNMF
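To make the last point concrete, here is a sketch of how masked multiplicative updates for this SSNMF objective might look, derived analogously to the standard NMF rules. This is an illustrative assumption, not necessarily the exact scheme used in the talk.

```python
import numpy as np

def ssnmf_multiplicative(X, Y, W, L, k, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Sketch of masked multiplicative updates for the SSNMF objective
    ||W*(X - AS)||_F^2 + lam*||L*(Y - BS)||_F^2, with elementwise masks W, L.
    Illustrative only; derived by analogy with the standard NMF rules."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    P = Y.shape[0]
    A = rng.random((N, k))
    B = rng.random((P, k))
    S = rng.random((k, M))
    for _ in range(n_iter):
        A *= ((W * X) @ S.T) / ((W * (A @ S)) @ S.T + eps)
        B *= ((L * Y) @ S.T) / ((L * (B @ S)) @ S.T + eps)
        S *= (A.T @ (W * X) + lam * B.T @ (L * Y)) / (
             A.T @ (W * (A @ S)) + lam * B.T @ (L * (B @ S)) + eps)
    return A, B, S
```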
Hierarchical NMF

Goal: Discover hierarchical topic structure within X.

[Figure: X (N × M) ≈ A^(0) S^(0) and X ≈ A^(0) A^(1) S^(1), with A^(0) of size N × k^(0), A^(1) of size k^(0) × k^(1), and S^(1) of size k^(1) × M.]

Problem Setup:
$$X \approx A^{(0)} S^{(0)}, \qquad X \approx A^{(0)} A^{(1)} S^{(1)}, \qquad \ldots, \qquad X \approx A^{(0)} A^{(1)} \cdots A^{(L)} S^{(L)}$$
⊲ $k^{(0)}, k^{(1)}, \ldots, k^{(L)}$: user-defined parameters
⊲ the $k^{(\ell)}$ supertopics collect the $k^{(\ell-1)}$ subtopics

Problem Challenges:
⊲ the ranks $\{k^{(i)}\}$ must be chosen
⊲ error propagates through layers
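For context, here is a minimal sketch of the straightforward layer-wise construction behind these approximations: factor X, then re-factor each coefficient matrix. The function name and the per-layer use of scikit-learn are illustrative choices, and this naive sequential scheme is exactly where the error-propagation issue noted above arises.

```python
import numpy as np
from sklearn.decomposition import NMF

def hierarchical_nmf(X, ranks, max_iter=500, seed=0):
    """Sketch of layer-wise hierarchical NMF: factor X ~= A0 S0, then
    S0 ~= A1 S1, and so on, so that X ~= A0 A1 ... AL SL.
    `ranks` = [k0, k1, ..., kL], with subtopics collected into fewer
    supertopics at each layer. Illustrative only."""
    A_list, S = [], X
    for k in ranks:
        model = NMF(n_components=k, init="nndsvda", max_iter=max_iter,
                    random_state=seed)
        A = model.fit_transform(S)   # previous coefficients ~= A @ S_new
        S = model.components_
        A_list.append(A)
    return A_list, S                 # X ~= A_list[0] @ ... @ A_list[-1] @ S
```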
[Figure slide: Hierarchical NMF]
Deep NMF

Goal: Exploit similarities between neural networks and hierarchical NMF.