Variational Network Inference: Strong and Stable with Concrete Support
Amir Dezfouli, Edwin V. Bonilla and Richard Nock
Slide 1: Network Structure Discovery: A Flexible Approach

- Data: N nodes, T observations, 𝒟 = {y_{i,t}}.
- Goal: learn the network structure W (existence, directionality and strengths of connections).
- [Figure: example network of five nodes with directed, weighted edges W_12, W_15, W_32, W_43, W_54.]
- Model:
    y_i(t) = f_i(t) + ε_it,   ε_it ∼ Normal(0, σ²_y)
    f_i(t) = z_i(t) + Σ_{j=1, j≠i}^N A_ij W_ij [f_j(t) + ξ_jt],   ξ_jt ∼ Normal(0, σ²_f)
    z_i(t) ∼ GP(0, κ(t, t′; θ))   (network-independent trend)
- Network parameters: A_ij ∈ {0, 1} (existence), W_ij ∈ ℝ (strength), with factorised prior
    p(A, W) = Π_ij p(A_ij) p(W_ij),   p(A_ij) = Bern(ρ),   p(W_ij) = Normal(0, σ²_w)
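To make the generative process concrete, here is a minimal NumPy sketch (not the authors' code) that draws one sample from the model on a regular time grid. The RBF kernel, its lengthscale and all hyperparameter values are illustrative assumptions, and the sketch uses the closed form for the cyclically defined f derived on the next slide.

```python
import numpy as np

def sample_network_model(N=5, T=50, rho=0.3, sigma_w=1.0,
                         sigma_f=0.1, sigma_y=0.1, seed=0):
    """Draw one sample from the generative model (illustrative hyperparameters)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, T)

    # Network parameters: A_ij ~ Bern(rho), W_ij ~ Normal(0, sigma_w^2); no self-loops.
    A = rng.binomial(1, rho, size=(N, N)).astype(float)
    np.fill_diagonal(A, 0.0)
    W = rng.normal(0.0, sigma_w, size=(N, N))

    # Network-independent trend z_i ~ GP(0, kappa); kappa assumed RBF here.
    K_t = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.1) ** 2)
    L = np.linalg.cholesky(K_t + 1e-8 * np.eye(T))
    z = L @ rng.normal(size=(T, N))          # T x N, each column a GP draw

    # Closed form for the cyclically defined f ("Trick 1" on the next slide):
    # f(t) = (I - A ⊙ W)^{-1} (z(t) + (A ⊙ W) ξ_t)
    B = A * W
    xi = rng.normal(0.0, sigma_f, size=(T, N))
    f = np.linalg.solve(np.eye(N) - B, (z + xi @ B.T).T).T

    # Observations: y_i(t) = f_i(t) + eps_it.
    return f + rng.normal(0.0, sigma_y, size=(T, N))
```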
Slide 2: Inference

- Goal: estimate the posterior p(A, W | 𝒟).
- Complications:
  - f is defined cyclically.
  - GPs are notoriously unscalable: naively O(N³T³).
  - The marginal likelihood is complicated (f depends on A, W).
- Trick 1: derive the "inverse" model
    f(t) = (I − A ⊙ W)⁻¹ (z(t) + A ⊙ W ξ_t)
- Trick 2: marginalise f analytically:
    p(y | A, W) = Normal(0, Σ_y),   Σ_y = K_f ⊗ K_t + K_σ ⊗ I
- Trick 3: relate to multi-task learning (MTL) with product covariance (Bonilla et al., 2008; Rakitsch et al., 2013):
  - Nodes are "tasks"; Σ_y is a sum of two Kronecker products, with covariances determined by A and W.
  - More efficient computation: O(N³ + T³); see the sketch below.
- How to deal with the complex dependency on A and W? Modern variational inference (next slide).
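The Kronecker structure is what buys the O(N³ + T³) cost. Below is a minimal sketch of that computation under the simplifying assumption that the node noise covariance is isotropic, K_σ = σ² I_N, so the two Kronecker terms share an eigenbasis; the general sum-of-Kronecker-products case of Rakitsch et al. (2013) needs an extra whitening step.

```python
import numpy as np

def gaussian_logpdf_kron(Y, K_f, K_t, sigma2):
    """log N(vec(Y); 0, K_f ⊗ K_t + sigma2 * I) in O(N^3 + T^3).

    Y is N x T. Simplifying assumption: the noise covariance K_sigma is
    sigma2 * I_N, so both Kronecker terms share eigenvectors.
    """
    N, T = Y.shape
    # Eigendecompose the two small factors instead of the NT x NT covariance.
    lam_f, U_f = np.linalg.eigh(K_f)          # O(N^3)
    lam_t, U_t = np.linalg.eigh(K_t)          # O(T^3)

    # Eigenvalues of K_f ⊗ K_t + sigma2 * I are lam_f[i] * lam_t[j] + sigma2.
    lam = np.outer(lam_f, lam_t) + sigma2     # N x T

    # Rotate the data into the shared eigenbasis.
    Yt = U_f.T @ Y @ U_t

    quad = np.sum(Yt ** 2 / lam)              # y^T Sigma^{-1} y
    logdet = np.sum(np.log(lam))              # log|Sigma|
    return -0.5 * (quad + logdet + N * T * np.log(2 * np.pi))
```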
Slide 3: Modern Variational Inference

- Maximise the evidence lower bound (ELBO): ℒ_elbo = ℒ_kl + ℒ_ell, where
    ℒ_kl = −KL(q(A, W) || p(A, W))
    ℒ_ell = 𝔼_{q(A,W)}[log p(y | A, W)]
- [Figure: ELBO rising towards log p(y) as q_old(A, W) is updated to q_new(A, W), approaching the posterior.]
- Expectations are estimated with Monte Carlo via the re-parameterization trick, which cannot be applied directly to discrete random variables.
- Trick 4: Concrete distribution: q(A_ij) = Concrete(α_ij, λ_c), with variational parameters α_ij.
  - Aka the Gumbel-Softmax trick; we can both sample A_ij and evaluate log q(A_ij), as sketched below.
  - It also helps us get stability for free (next slide).
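A minimal sketch of the binary Concrete (Gumbel-Softmax) reparameterisation and its log-density, following the standard construction of Maddison et al. (2017); function and variable names are illustrative, not the authors'.

```python
import numpy as np

def sample_bin_concrete(alpha, lam, rng):
    """Reparameterised sample from BinConcrete(alpha, lam) (Gumbel-Softmax trick)."""
    u = rng.uniform(1e-8, 1 - 1e-8, size=np.shape(alpha))
    logits = (np.log(alpha) + np.log(u) - np.log(1 - u)) / lam
    return 1.0 / (1.0 + np.exp(-logits))      # relaxed A_ij in (0, 1)

def log_bin_concrete(x, alpha, lam):
    """log-density of BinConcrete(alpha, lam) at x in (0, 1)."""
    return (np.log(lam) + np.log(alpha)
            - (lam + 1) * (np.log(x) + np.log(1 - x))
            - 2 * np.log(alpha * x ** (-lam) + (1 - x) ** (-lam)))
```

Because the sample is a deterministic, differentiable function of the uniform noise u and of α_ij, Monte Carlo estimates of ℒ_ell admit gradients with respect to the variational parameters.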
Slide 4: Theory: Numerical Stability

- Previous work usually imposes non-singularity of I − A ⊙ W, sometimes with additional constraints (boundedness of coordinates or eigenvalues).
- Theorem 1 ("we get stability for free"): for any λ_c ≥ 0 and α_ij ≥ 0 (i ≠ j), I − A ⊙ W is non-singular with probability 1.
- Theorem 2: for any λ_c ≥ 0, α_ij ≥ 0 (i ≠ j) and σ²_y ≥ 0, |ℒ_ell| < ∞ (the expected log-likelihood is finite).
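Theorem 1 can be probed empirically. A minimal sketch (illustrative parameters; a sanity check, not a proof) that tracks the smallest singular value of I − A ⊙ W over relaxed draws:

```python
import numpy as np

rng = np.random.default_rng(0)
N, lam_c = 20, 0.5
alpha = np.ones((N, N))                  # illustrative variational parameters

smallest = np.inf
for _ in range(1000):
    # Relaxed A_ij ~ Concrete(alpha_ij, lam_c), W_ij ~ Normal(0, 1).
    u = rng.uniform(1e-8, 1 - 1e-8, size=(N, N))
    A = 1.0 / (1.0 + np.exp(-(np.log(alpha) + np.log(u) - np.log(1 - u)) / lam_c))
    np.fill_diagonal(A, 0.0)
    W = rng.normal(size=(N, N))
    s_min = np.linalg.svd(np.eye(N) - A * W, compute_uv=False)[-1]
    smallest = min(smallest, s_min)

# Stays strictly positive in every draw, as Theorem 1 predicts.
print(f"smallest singular value of I - A ⊙ W over draws: {smallest:.3e}")
```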
Slide 5: Theory: Model Stability

- Bounds the signal's log-likelihood as a function of external parameters.
- Theorem 3 (statistical "robustness"): if W_ij ∼ Normal(μ_ij, σ²_ij) and A_ij ∼ Bern(ρ_ij), then, under a condition on the network signal, it holds with high probability that
    −log p(y | W, A) ∈ [g(λ∘, y), g(λ∙, y)]   ∀y,
  where
    g(z, y) = Θ(log z + ‖y‖²₂ / z),
    λ∘ = λ↓(K_t)/2 + σ²_y,   λ∙ = 2λ↑(K_t) + σ²_f + σ²_y.
- The condition appears in various forms in previous work.
- Important practical consequences.
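The shape of g matches the standard eigenvalue bounds on a zero-mean Gaussian negative log-likelihood. A sketch of that reasoning (with generic constants, not the paper's exact statement or condition):

```latex
% y ~ Normal(0, Sigma_y), with vec(y) of length NT:
-\log p(\mathbf{y} \mid \mathbf{W}, \mathbf{A})
    = \tfrac{1}{2}\log\det\!\left(2\pi\,\Sigma_y\right)
    + \tfrac{1}{2}\,\mathbf{y}^{\top}\Sigma_y^{-1}\mathbf{y} .
% If every eigenvalue of Sigma_y lies in [lambda_circ, lambda_bullet]:
\tfrac{NT}{2}\log\!\left(2\pi\lambda_{\circ}\right)
    + \tfrac{1}{2\lambda_{\bullet}}\,\lVert\mathbf{y}\rVert_2^2
\;\le\; -\log p(\mathbf{y} \mid \mathbf{W}, \mathbf{A}) \;\le\;
\tfrac{NT}{2}\log\!\left(2\pi\lambda_{\bullet}\right)
    + \tfrac{1}{2\lambda_{\circ}}\,\lVert\mathbf{y}\rVert_2^2 ,
% i.e. both endpoints have the g(z, y) = Theta(log z + ||y||_2^2 / z) shape.
```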
Slide 6: Experiments and Conclusions

- Experiments: Sydney property prices, brain fMRI, yeast genome.
- [Figure: map of Sydney suburbs from the property-price experiment: Pittwater, Manly, Mosman, Hunters Hill, Woollahra.]
- Conclusions: a Bayesian approach for network structure discovery, with efficient inference, stability "for free", robustness and easy estimation.