
Scaling the Hierarchical Topic Modeling Mountain: Neural NMF and Iterative Projection Methods. Jamie Haddock. Harvey Mudd College, January 28, 2020. Computational and Applied Mathematics, UCLA. Research Overview: Data, Math, Data Science.


  1. Our method: Neural NMF. Goal: develop true forward and back propagation algorithms for hNMF. ⊲ Regard the A matrices as independent variables; determine the S matrices from the A matrices. ⊲ Define $q(X, A) := \operatorname*{argmin}_{S \geq 0} \| X - AS \|_F^2$ (least squares). ⊲ Pin the values of S to those of A by recursively setting $S^{(\ell)} := q(S^{(\ell-1)}, A^{(\ell)})$.

  2. Our method: Neural NMF. [Diagram: $X \xrightarrow{\,q(\cdot,\, A^{(0)})\,} S^{(0)} \xrightarrow{\,q(\cdot,\, A^{(1)})\,} S^{(1)}$; goal and definitions as on the previous slide.]

  3. Our method: Neural NMF. Goal: develop true forward and back propagation algorithms for hNMF. [Diagram: $X \xrightarrow{\,q(\cdot,\, A^{(0)})\,} S^{(0)} \xrightarrow{\,q(\cdot,\, A^{(1)})\,} S^{(1)}$.]

  4. Our method: Neural NMF. Training: [Diagram: $X \xrightarrow{\,q(\cdot,\, A^{(0)})\,} S^{(0)} \xrightarrow{\,q(\cdot,\, A^{(1)})\,} S^{(1)}$.]

  5. Our method: Neural NMF. Goal: develop true forward and back propagation algorithms for hNMF. Training: ⊲ forward propagation: $S^{(0)} = q(X, A^{(0)})$, $S^{(1)} = q(S^{(0)}, A^{(1)})$, ..., $S^{(L)} = q(S^{(L-1)}, A^{(L)})$; ⊲ back propagation: update $\{A^{(i)}\}$ with $\nabla E(\{A^{(i)}\})$.
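To make the forward pass concrete, here is a minimal Python sketch (an illustration, not the authors' implementation): each $q(\cdot, A^{(\ell)})$ is a nonnegative least-squares solve, carried out column by column with scipy.optimize.nnls. The helper names, matrix shapes, and random data are assumptions chosen only to mirror the two-layer setup.

```python
import numpy as np
from scipy.optimize import nnls

def q(X, A):
    """q(X, A) = argmin_{S >= 0} ||X - A S||_F^2, solved one column at a time."""
    S = np.zeros((A.shape[1], X.shape[1]))
    for j in range(X.shape[1]):
        S[:, j], _ = nnls(A, X[:, j])
    return S

def forward(X, A_list):
    """Forward propagation: S(0) = q(X, A(0)), S(l) = q(S(l-1), A(l))."""
    S_list, S = [], X
    for A in A_list:
        S = q(S, A)
        S_list.append(S)
    return S_list

# Illustrative two-layer example (shapes are assumptions):
rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((100, 50)))          # data matrix (features x samples)
A_list = [np.abs(rng.standard_normal((100, 9))),    # A(0): features x k(0)
          np.abs(rng.standard_normal((9, 4)))]      # A(1): k(0) x k(1)
S0, S1 = forward(X, A_list)
print(S0.shape, S1.shape)                           # (9, 50), (4, 50)
```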

  6. Least-squares Subroutine. ⊲ Least squares is a fundamental subroutine in forward propagation.

  7. Least-squares Subroutine. ⊲ Least squares is a fundamental subroutine in forward propagation.

  8. Least-squares Subroutine. ⊲ Least squares is a fundamental subroutine in forward propagation. ⊲ Iterative projection methods can solve these problems.

  9. Iterative Projection Methods

  10. General Setup

  11. General Setup. We are interested in solving highly overdetermined systems of equations, $Ax = b$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $m \gg n$. Rows are denoted $a_i^T$.

  12. General Setup. We are interested in solving highly overdetermined systems of equations, $Ax = b$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $m \gg n$. Rows are denoted $a_i^T$.

  13. Iterative Projection Methods. If $\{x \in \mathbb{R}^n : Ax = b\}$ is nonempty, these methods construct an approximation to a solution: 1. Randomized Kaczmarz Method. Applications: 1. Tomography (Algebraic Reconstruction Technique).

  14. Iterative Projection Methods. If $\{x \in \mathbb{R}^n : Ax = b\}$ is nonempty, these methods construct an approximation to a solution: 1. Randomized Kaczmarz Method; 2. Motzkin's Method. Applications: 1. Tomography (Algebraic Reconstruction Technique); 2. Linear programming.

  15. Iterative Projection Methods. If $\{x \in \mathbb{R}^n : Ax = b\}$ is nonempty, these methods construct an approximation to a solution: 1. Randomized Kaczmarz Method; 2. Motzkin's Method; 3. Sampling Kaczmarz-Motzkin Methods (SKM). Applications: 1. Tomography (Algebraic Reconstruction Technique); 2. Linear programming; 3. Average consensus (greedy gossip with eavesdropping).

  16. Kaczmarz Method. Given $x_0 \in \mathbb{R}^n$: 1. Choose $i_k \in [m]$ with probability $\frac{\|a_{i_k}\|_2^2}{\|A\|_F^2}$. 2. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|_2^2}\, a_{i_k}$. 3. Repeat. [Kaczmarz 1937], [Strohmer, Vershynin 2009] [Figure: initial iterate $x_0$.]

  17. Kaczmarz Method (continued). [Figure: iterates $x_0, x_1$.]

  18. Kaczmarz Method (continued). [Figure: iterates $x_0, x_1, x_2$.]

  19. Kaczmarz Method (continued). [Figure: iterates $x_0, x_1, x_2, x_3$.]
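A minimal Python sketch of the randomized Kaczmarz iteration just described; the function name, system size, and iteration count are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, iters=5000, seed=0):
    """Randomized Kaczmarz: project the iterate onto the hyperplane of row i,
    where i is drawn with probability ||a_i||_2^2 / ||A||_F^2."""
    rng = np.random.default_rng(seed)
    m, _ = A.shape
    probs = np.linalg.norm(A, axis=1) ** 2 / np.linalg.norm(A, "fro") ** 2
    x = x0.astype(float)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a   # projection onto {y : a^T y = b_i}
    return x

# Usage on a small consistent system (sizes are illustrative):
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 20))
x_true = rng.standard_normal(20)
b = A @ x_true
print(np.linalg.norm(randomized_kaczmarz(A, b, np.zeros(20)) - x_true))
```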

  20. Motzkin's Method. Given $x_0 \in \mathbb{R}^n$: 1. Choose $i_k \in [m]$ as $i_k := \operatorname*{argmax}_{i \in [m]} |a_i^T x_{k-1} - b_i|$. 2. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|_2^2}\, a_{i_k}$. 3. Repeat. [Motzkin, Schoenberg 1954] [Figure: initial iterate $x_0$.]

  21. Motzkin's Method (continued). [Figure: iterates $x_0, x_1$.]

  22. Motzkin's Method (continued). [Figure: iterates $x_0, x_1, x_2$.]
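A corresponding sketch of Motzkin's method: the only change from the Kaczmarz sketch above is the greedy row selection. The function name and test sizes are again assumptions.

```python
import numpy as np

def motzkin(A, b, x0, iters=2000):
    """Motzkin's method: at each step, project onto the hyperplane of the row
    with the largest absolute residual (the most violated equation)."""
    x = x0.astype(float)
    for _ in range(iters):
        r = A @ x - b                    # full residual
        i = np.argmax(np.abs(r))         # greedy row selection
        a = A[i]
        x -= r[i] / (a @ a) * a          # projection step
    return x

# Usage on a small consistent system (sizes are illustrative):
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
x_true = rng.standard_normal(20)
b = A @ x_true
print(np.linalg.norm(motzkin(A, b, np.zeros(20)) - x_true))
```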

  23. Our Hybrid Method (SKM). Given $x_0 \in \mathbb{R}^n$: 1. Choose $\tau_k \subset [m]$ to be a sample of $\beta$ constraints chosen uniformly at random among the rows of $A$. 2. From the $\beta$ rows, choose $i_k := \operatorname*{argmax}_{i \in \tau_k} |a_i^T x_{k-1} - b_i|$. 3. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|_2^2}\, a_{i_k}$. 4. Repeat. [De Loera, H., Needell '17] [Figure: initial iterate $x_0$.]

  24. Our Hybrid Method (SKM) (continued). [Figure: iterates $x_0, x_1$.]

  25. Our Hybrid Method (SKM) (continued). [Figure: iterates $x_0, x_1, x_2$.]
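A sketch of one SKM iteration, combining the uniform sampling step with the greedy projection within the sample; the function name and default parameters are assumptions.

```python
import numpy as np

def skm(A, b, x0, beta, iters=2000, seed=0):
    """Sampling Kaczmarz-Motzkin: sample beta rows uniformly at random, then take
    a Motzkin (greedy) projection within the sample."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    x = x0.astype(float)
    for _ in range(iters):
        tau = rng.choice(m, size=beta, replace=False)   # random sample tau_k
        r = A[tau] @ x - b[tau]                          # residual on the sample
        j = np.argmax(np.abs(r))                         # most violated row in tau_k
        a = A[tau[j]]
        x -= r[j] / (a @ a) * a                          # project onto that hyperplane
    return x
```

With $\beta = 1$ this step reduces to Kaczmarz with uniform row sampling, and with $\beta = m$ it reduces to Motzkin's method, which is the sense in which SKM interpolates between the two.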

  26. Experimental Convergence. ⊲ $\beta$: sample size. ⊲ $A$ is a $50000 \times 100$ Gaussian matrix, consistent system. ⊲ 'Faster' convergence for larger sample size.

  27. Experimental Convergence (continued).

  28. Experimental Convergence (continued).
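A self-contained sketch in the spirit of this experiment, with the system shrunk from $50000 \times 100$ to $5000 \times 100$ so it runs quickly; the sample sizes and iteration budget are assumptions. Larger $\beta$ typically gives a smaller error after a fixed number of iterations, at a higher cost per iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 100                             # shrunk from the talk's 50000 x 100 system
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true                               # consistent system

for beta in (1, 10, 100, 1000):              # sample sizes to compare
    x = np.zeros(n)
    for _ in range(2000):                    # fixed iteration budget
        tau = rng.choice(m, size=beta, replace=False)
        r = A[tau] @ x - b[tau]              # residual on the sample
        j = np.argmax(np.abs(r))             # most violated row in the sample
        a = A[tau[j]]
        x -= r[j] / (a @ a) * a              # SKM projection step
    print(f"beta = {beta:5d}   ||x - x*|| = {np.linalg.norm(x - x_true):.3e}")
```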

  29. Convergence Rates. Below are the convergence rates for the methods on a system $Ax = b$ that is consistent with unique solution $x$ and whose rows have been normalized to have unit norm. ⊲ RK (Strohmer, Vershynin '09): $\mathbb{E}\,\|x_k - x\|_2^2 \le \left(1 - \frac{\sigma_{\min}^2(A)}{m}\right)^k \|x_0 - x\|_2^2$.

  30. Convergence Rates (continued). ⊲ MM (Agmon '54): $\|x_k - x\|_2^2 \le \left(1 - \frac{\sigma_{\min}^2(A)}{m}\right)^k \|x_0 - x\|_2^2$.

  31. Convergence Rates (continued). ⊲ SKM (De Loera, H., Needell '17): $\mathbb{E}\,\|x_k - x\|_2^2 \le \left(1 - \frac{\sigma_{\min}^2(A)}{m}\right)^k \|x_0 - x\|_2^2$.

  32. Convergence Rates (continued). Why are these all the same?

  33. A Pathological Example. [Figure: initial iterate $x_0$.]

  34. Structure of the Residual. Several works have used sparsity of the residual to improve the convergence rate of greedy methods. [De Loera, H., Needell '17], [Bai, Wu '18], [Du, Gao '19]

  35. Structure of the Residual. Several works have used sparsity of the residual to improve the convergence rate of greedy methods [De Loera, H., Needell '17], [Bai, Wu '18], [Du, Gao '19]. However, not much sparsity can be expected in most cases. Instead, we'd like to use the dynamic range of the residual to guarantee faster convergence: $\gamma_k := \dfrac{\sum_{\tau \in \binom{[m]}{\beta}} \|A_\tau x_k - b_\tau\|_2^2}{\sum_{\tau \in \binom{[m]}{\beta}} \|A_\tau x_k - b_\tau\|_\infty^2}$.

  36. Accelerated Convergence Rate. Theorem (H., Ma 2019). Let $A$ be normalized so $\|a_i\|_2 = 1$ for all rows $i = 1, \ldots, m$. If the system $Ax = b$ is consistent with the unique solution $x^*$, then the SKM method converges at least linearly in expectation, and the rate depends on the dynamic range of the random sample of rows of $A$, $\tau_j$. Precisely, in the $(j+1)$st iteration of SKM, we have $\mathbb{E}_{\tau_j} \|x_{j+1} - x^*\|_2^2 \le \left(1 - \frac{\beta\, \sigma_{\min}^2(A)}{\gamma_j\, m}\right) \|x_j - x^*\|_2^2$, where $\gamma_j := \dfrac{\sum_{\tau \in \binom{[m]}{\beta}} \|A_\tau x_j - b_\tau\|_2^2}{\sum_{\tau \in \binom{[m]}{\beta}} \|A_\tau x_j - b_\tau\|_\infty^2}$.
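A sketch illustrating the quantities in the theorem: it estimates $\gamma_j$ by Monte Carlo over random size-$\beta$ row samples (the exact definition sums over all $\binom{m}{\beta}$ samples) and evaluates the per-iteration contraction bound $1 - \beta\sigma_{\min}^2(A)/(\gamma_j m)$. The dimensions, $\beta$, iterate, and number of Monte Carlo samples are assumptions.

```python
import numpy as np

def dynamic_range(A, b, x, beta, n_samples=200, seed=0):
    """Monte Carlo estimate of gamma_j: (sum of squared 2-norms) / (sum of squared
    inf-norms) of the residual over random size-beta row samples tau.
    (The exact definition sums over all (m choose beta) samples.)"""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    num = den = 0.0
    for _ in range(n_samples):
        tau = rng.choice(m, size=beta, replace=False)
        r = A[tau] @ x - b[tau]
        num += np.sum(r ** 2)
        den += np.max(np.abs(r)) ** 2
    return num / den

rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 50))
A /= np.linalg.norm(A, axis=1, keepdims=True)     # unit-norm rows, as in the theorem
x_true = rng.standard_normal(50)
b = A @ x_true                                    # consistent system
x = np.zeros(50)                                  # current iterate x_j

beta = 100
gamma = dynamic_range(A, b, x, beta)
sigma_min_sq = np.linalg.svd(A, compute_uv=False)[-1] ** 2
bound = 1 - beta * sigma_min_sq / (gamma * A.shape[0])
print(f"gamma ~ {gamma:.2f} (must lie in [1, beta] = [1, {beta}])")
print(f"per-iteration contraction bound ~ {bound:.4f}")
```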

  37. Accelerated Convergence Rate. ⊲ $A$ is a $50000 \times 100$ Gaussian matrix, consistent system. ⊲ The bound uses the dynamic range of the sample of $\beta$ rows.

  38. What can we say about $\gamma_j$? Recall $\gamma_j := \dfrac{\sum_{\tau \in \binom{[m]}{\beta}} \|A_\tau x_j - b_\tau\|_2^2}{\sum_{\tau \in \binom{[m]}{\beta}} \|A_\tau x_j - b_\tau\|_\infty^2}$. We have $1 \le \gamma_j \le \beta$.

  39. What can we say about $\gamma_j$? (continued)

  40. What can we say about $\gamma_j$? (continued)

  41. What can we say about $\gamma_j$? Recall $1 \le \gamma_j \le \beta$. Per-iteration contraction: $\mathbb{E}_{\tau_k} \|x_k - x^*\|_2^2 \le \alpha\, \|x_{k-1} - x^*\|_2^2$. Previous bounds [H., Needell 2019]: RK: $\alpha = 1 - \frac{\sigma_{\min}^2(A)}{m}$; SKM: $\alpha = 1 - \frac{\sigma_{\min}^2(A)}{m}$; MM: $1 - \frac{\sigma_{\min}^2(A)}{4} \le \alpha \le 1 - \frac{\sigma_{\min}^2(A)}{m}$.

  42. What can we say about $\gamma_j$? Previous [H., Needell 2019] vs. current [H., Ma 2019] bounds on the contraction factor $\alpha$: RK: $\alpha = 1 - \frac{\sigma_{\min}^2(A)}{m}$ (unchanged); SKM: previously $\alpha = 1 - \frac{\sigma_{\min}^2(A)}{m}$, now $1 - \frac{\beta\,\sigma_{\min}^2(A)}{m} \le \alpha \le 1 - \frac{\sigma_{\min}^2(A)}{m}$; MM: previously $1 - \frac{\sigma_{\min}^2(A)}{4} \le \alpha \le 1 - \frac{\sigma_{\min}^2(A)}{m}$, now $1 - \sigma_{\min}^2(A) \le \alpha \le 1 - \frac{\sigma_{\min}^2(A)}{m}$.

  43. What can we say about $\gamma_j$? Recall $1 \le \gamma_j \le \beta$. ⊲ Nontrivial bounds on $\gamma_k$ are available for Gaussian and average-consensus systems.

  44. Now can we determine the optimal $\beta$?

  45. Now can we determine the optimal $\beta$? Roughly, if we know the value of $\gamma_j$, we can (just) do it.

  46. Now can we determine the optimal $\beta$? (continued)

  47. Back to Hierarchical NMF

  48. Back to Hierarchical NMF (continued).

  49. Back to Hierarchical NMF (continued).

  50. Back to Hierarchical NMF. Compare: ⊲ hNMF (sequential NMF).

  51. Back to Hierarchical NMF. Compare: ⊲ hNMF (sequential NMF); ⊲ Deep NMF [Flenner, Hunter '18].

  52. Back to Hierarchical NMF. Compare: ⊲ hNMF (sequential NMF); ⊲ Deep NMF [Flenner, Hunter '18]; ⊲ Neural NMF.

  53. Applications

  54. Experimental results: synthetic data

  55. Experimental results: synthetic data. ⊲ Unsupervised reconstruction with two-layer structure ($k^{(0)} = 9$, $k^{(1)} = 4$).

  56. Experimental results: synthetic data (continued).
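For context, here is a sketch of the hNMF (sequential NMF) baseline on synthetic data with the two-layer structure $k^{(0)} = 9$, $k^{(1)} = 4$ from the slides; this is not the Neural NMF code, and the data-generating model, dimensions, and noise level are assumptions. It uses sklearn.decomposition.NMF for each layer, factoring $X$ first and then factoring the resulting $S^{(0)}$.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic hierarchy (dimensions, block structure, and noise level are assumptions):
# 9 fine topics that group into 4 coarse topics, mirroring k(0) = 9, k(1) = 4.
rng = np.random.default_rng(0)
d, N = 120, 400                                   # vocabulary size, number of documents
A0 = np.abs(rng.standard_normal((d, 9)))          # A(0): words x fine topics
A1 = np.zeros((9, 4))                             # A(1): fine topics x coarse topics
for j, block in enumerate(np.array_split(np.arange(9), 4)):
    A1[block, j] = 1.0                            # each coarse topic owns a block of fine topics
S1 = np.abs(rng.standard_normal((4, N)))          # coarse representation of the documents
X = A0 @ A1 @ S1 + 0.01 * np.abs(rng.standard_normal((d, N)))

# Sequential (hierarchical) NMF: factor X, then factor the resulting S(0).
nmf0 = NMF(n_components=9, init="nndsvda", max_iter=500)
A0_hat = nmf0.fit_transform(X)                    # ~ A(0)
S0_hat = nmf0.components_                         # ~ S(0)
nmf1 = NMF(n_components=4, init="nndsvda", max_iter=500)
A1_hat = nmf1.fit_transform(S0_hat)               # ~ A(1)
S1_hat = nmf1.components_                         # ~ S(1)

print("layer-0 relative error:", np.linalg.norm(X - A0_hat @ S0_hat) / np.linalg.norm(X))
print("two-layer relative error:", np.linalg.norm(X - A0_hat @ A1_hat @ S1_hat) / np.linalg.norm(X))
```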
