Graphical Models Exponential Families Variational Methods Mean Field Approximation

Properties of the Cumulant $\psi$ (Contd.)

The set of mean parameters is
$$\mathcal{M} = \left\{ \mu \in \mathbb{R}^d \;\middle|\; \exists\, p(\cdot) \text{ s.t. } \int t(x)\, p(x)\, \nu(dx) = \mu \right\}$$
Consider the mapping $\Lambda : \Theta \to \mathcal{M}$ defined by
$$\Lambda(\theta) = E_\theta[t(x)] = \int_x t(x)\, p(x;\theta)\, \nu(dx)$$
If $t$ is minimal, $\Lambda$ is one-to-one.
Further, $\Lambda$ is onto the (relative) interior of $\mathcal{M}$.
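As a small worked illustration of the mapping $\Lambda$ (an added example, assuming a single Bernoulli variable $x \in \{0,1\}$ with $t(x) = x$ and counting measure $\nu$):
$$p(x;\theta) = \exp(\theta x - \psi(\theta)), \qquad \psi(\theta) = \log(1 + e^{\theta}), \qquad \Theta = \mathbb{R}$$
$$\Lambda(\theta) = E_\theta[x] = \frac{e^{\theta}}{1 + e^{\theta}}, \qquad \mathcal{M} = [0, 1]$$
Here $\Lambda$ is strictly increasing, hence one-to-one, and its range is $(0,1)$, the interior of $\mathcal{M}$. The boundary points $\mu = 0$ and $\mu = 1$ are reached only in the limit $\theta \to \mp\infty$.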
Fenchel-Legendre Conjugacy

The conjugate dual function is
$$\psi^*(\mu) = \sup_{\theta \in \Theta} \{ \langle \mu, \theta \rangle - \psi(\theta) \}$$
The (Boltzmann-Shannon) entropy of $p(x;\theta)$ w.r.t. $\nu$ is
$$H(p(x;\theta)) = -\int_x p(x;\theta) \log p(x;\theta)\, \nu(dx) = -E_\theta[\log p(x;\theta)]$$
If $\mu \in \operatorname{ri} \mathcal{M}$, then $\psi^*(\mu) = -H(p(x;\theta(\mu)))$.
In terms of the dual, $\psi$ has the variational representation
$$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - \psi^*(\mu) \}$$
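A short sketch of why the dual equals the negative entropy, assuming the density has the standard exponential-family form $p(x;\theta) = \exp(\langle \theta, t(x) \rangle - \psi(\theta))$ with respect to $\nu$ and that the supremum is attained at a stationary point. Since $\nabla \psi(\theta) = E_\theta[t(x)]$, stationarity gives
$$\nabla_\theta \{ \langle \mu, \theta \rangle - \psi(\theta) \} = \mu - E_\theta[t(x)] = 0 \quad \Longrightarrow \quad \theta = \theta(\mu)$$
Substituting back,
$$\psi^*(\mu) = \langle \mu, \theta(\mu) \rangle - \psi(\theta(\mu)) = E_{\theta(\mu)}\big[ \langle \theta(\mu), t(x) \rangle - \psi(\theta(\mu)) \big] = E_{\theta(\mu)}[\log p(x;\theta(\mu))] = -H(p(x;\theta(\mu)))$$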
Main Issues

Key problems:
- Computation of the cumulant function $\psi(\theta)$
- Computation of the mean parameter $\mu = E_\theta[t(x)]$

The key equation for both problems is
$$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - \psi^*(\mu) \}$$
For all $\theta \in \Theta$, the supremum is attained by the $\mu \in \operatorname{ri} \mathcal{M}$ satisfying
$$\mu = E_\theta[t(x)] = \int_x t(x)\, p(x;\theta)\, \nu(dx)$$
Two primary challenges:
- The set $\mathcal{M}$ is difficult to characterize
- The function $\psi^*$ lacks an explicit definition
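To make the two key problems concrete, here is a minimal brute-force sketch for a small Ising model with $x \in \{0,1\}^n$. The function name `ising_exact` and the data layout (a list of node parameters and a dict of edge couplings) are assumptions made for this illustration, not part of the slides. The enumeration costs $O(2^n)$, which is exactly why exact computation is infeasible for large graphs.

```python
import itertools
import math


def ising_exact(theta_s, theta_st):
    """Brute-force log-partition function and mean parameters of a small Ising model.

    theta_s:  list of node parameters, one per node (hypothetical layout).
    theta_st: dict mapping an edge (s, t) to its coupling parameter.
    Variables take values in {0, 1}; the cost is O(2^n).
    """
    n = len(theta_s)
    configs = list(itertools.product([0, 1], repeat=n))
    # <theta, t(x)> for every configuration x.
    scores = [
        sum(theta_s[s] * x[s] for s in range(n))
        + sum(w * x[s] * x[t] for (s, t), w in theta_st.items())
        for x in configs
    ]
    # psi(theta) = log sum_x exp(<theta, t(x)>), computed with the max trick.
    m = max(scores)
    psi = m + math.log(sum(math.exp(sc - m) for sc in scores))
    probs = [math.exp(sc - psi) for sc in scores]
    # Mean parameters mu = E_theta[t(x)]: node and edge moments.
    mu_s = [sum(p * x[s] for p, x in zip(probs, configs)) for s in range(n)]
    mu_st = {
        (s, t): sum(p * x[s] * x[t] for p, x in zip(probs, configs))
        for (s, t) in theta_st
    }
    return psi, mu_s, mu_st


if __name__ == "__main__":
    psi, mu_s, mu_st = ising_exact([0.5, -0.3, 0.2], {(0, 1): 1.0, (1, 2): -0.8})
    print(psi, mu_s, mu_st)
```

Each added node doubles the number of configurations, so this sketch is only useful as a reference on very small graphs.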
Mean Parameters

$\mathcal{M}$ has the following properties:
- $\mathcal{M}$ is full-dimensional if $t$ is minimal
- $\mathcal{M}$ is bounded iff $\Theta = \mathbb{R}^d$ and $\psi$ is Lipschitz

Example: Multinomial random vector $x \in \mathcal{X}^n$
- The set $\mathcal{M}$ is a polytope
$$\mathcal{M} = \{ \mu \in \mathbb{R}^d \mid \langle a_j, \mu \rangle \le b_j, \ \forall j \in \mathcal{J} \}$$
- The index set $\mathcal{J}$ is finite, but can be large
- The number of facets of the polytope can grow very fast with $n$
- A complete graph with $n = 7$ has more than $2 \times 10^8$ facets
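As a small illustration of the polytope description (an added example, assuming two binary variables $x_1, x_2 \in \{0,1\}$ with $t(x) = (x_1, x_2, x_1 x_2)$), $\mathcal{M}$ is the convex hull of the four points $t(x)$ and is cut out by four inequalities, one per configuration probability:
$$\mu_{12} \ge 0, \qquad \mu_1 - \mu_{12} \ge 0, \qquad \mu_2 - \mu_{12} \ge 0, \qquad 1 + \mu_{12} - \mu_1 - \mu_2 \ge 0$$
The four expressions are $P(x_1{=}1, x_2{=}1)$, $P(x_1{=}1, x_2{=}0)$, $P(x_1{=}0, x_2{=}1)$ and $P(x_1{=}0, x_2{=}0)$, respectively. With only two nodes the description is tiny; the difficulty arises because the number of such constraints grows rapidly with $n$.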
Dual Function

$\psi^*$ is the negative entropy.
Typically, it does not have an explicit closed form.
In general, it can be specified as a composition of two functions:
- Compute the inverse image $\theta(\mu) = \Lambda^{-1}(\mu)$
- Compute the negative entropy of $p(x;\theta(\mu))$
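The two-step composition can be sketched for the one case where both steps are available in closed form, a single Bernoulli variable. This is an assumption-laden toy: the function names are hypothetical, and `psi_star_by_sup` is only a crude grid search added to cross-check the composition against the definition of the conjugate.

```python
import math


def psi(theta):
    """Cumulant function of a Bernoulli variable: log(1 + e^theta)."""
    return math.log1p(math.exp(theta))


def inverse_moment_map(mu):
    """Step 1: theta(mu) = Lambda^{-1}(mu), which is the logit for a Bernoulli."""
    return math.log(mu / (1.0 - mu))


def neg_entropy(theta):
    """Step 2: negative entropy of p(x; theta) for x in {0, 1}."""
    p1 = 1.0 / (1.0 + math.exp(-theta))
    return p1 * math.log(p1) + (1.0 - p1) * math.log(1.0 - p1)


def psi_star(mu):
    """Dual function as the composition of the two steps (0 < mu < 1)."""
    return neg_entropy(inverse_moment_map(mu))


def psi_star_by_sup(mu, lo=-20.0, hi=20.0, steps=40001):
    """Crude grid approximation of sup_theta {mu * theta - psi(theta)}."""
    grid = (lo + (hi - lo) * k / (steps - 1) for k in range(steps))
    return max(mu * th - psi(th) for th in grid)


if __name__ == "__main__":
    for mu in (0.1, 0.3, 0.5, 0.9):
        print(mu, psi_star(mu), psi_star_by_sup(mu))
```

For general graphical models neither step has a closed form, which is the difficulty the slide points to.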
Tractable Families

Based on the key equation
$$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \mu, \theta \rangle - \psi^*(\mu) \}$$
Mean field focuses on tractable distributions.
Let $H \subseteq G$ be a subgraph on which exact calculations are feasible.
Let $\mathcal{I}(H)$ be the index set of cliques in $H$.
The natural parameters for distributions corresponding to $H$ are
$$\mathcal{E}(H) = \{ \theta \in \Theta \mid \theta_\alpha = 0, \ \forall \alpha \in \mathcal{I} \setminus \mathcal{I}(H) \}$$
Tractable Families (Contd.)

The simplest tractable subgraph is $H = (V, \emptyset)$.
Its natural parameters belong to the subspace
$$\mathcal{E}(H) = \{ \theta \in \Theta \mid \theta_{st} = 0, \ \forall (s,t) \in E \}$$
The corresponding distribution is
$$p(x;\theta) = \prod_{s \in V} p(x_s;\theta_s)$$
A structured approximation uses a spanning tree $T = (V, E(T))$.
Its natural parameters belong to the subspace
$$\mathcal{E}(T) = \{ \theta \in \Theta \mid \theta_{st} = 0, \ \forall (s,t) \in E \setminus E(T) \}$$
For a subgraph $H$, the set of realizable mean parameters is
$$\mathcal{M}_{tract}(G;H) = \{ \mu \in \mathbb{R}^d \mid \mu = E_\theta[t(x)] \text{ for some } \theta \in \mathcal{E}(H) \}$$
The inclusion $\mathcal{M}_{tract}(G;H) \subseteq \mathcal{M}(G)$ always holds.
Lower Bounds

For any $\mu \in \operatorname{ri} \mathcal{M}$,
$$\psi(\theta) \ge \langle \theta, \mu \rangle - \psi^*(\mu)$$
Alternative proof using Jensen's inequality:
$$\psi(\theta) = \log \int_x p(x;\theta(\mu))\, \frac{\exp(\langle \theta, t(x) \rangle)}{p(x;\theta(\mu))}\, \nu(dx)$$
$$\ge \int_x p(x;\theta(\mu)) \big[ \langle \theta, t(x) \rangle - \log p(x;\theta(\mu)) \big]\, \nu(dx)$$
$$= \langle \theta, \mu \rangle - \psi^*(\mu)$$
In general, $\psi^*$ does not have a closed form.
Since $\psi^*_H$ has an explicit form for a tractable subgraph $H$, solve the approximation
$$\sup_{\mu \in \mathcal{M}_{tract}} \{ \langle \mu, \theta \rangle - \psi^*_H(\mu) \}$$
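The lower bound can be checked numerically on a tiny Ising model. The sketch below is an added illustration, assuming $x \in \{0,1\}^n$ and a fully factorized distribution with marginals `mu`; the function names and the parameter layout are hypothetical. It compares the brute-force value of $\psi(\theta)$ with $\langle \theta, \mu \rangle - \psi^*_{H}(\mu)$ for $H$ with no edges, where the pairwise moments are $\mu_s \mu_t$.

```python
import itertools
import math


def log_partition(theta_s, theta_st):
    """Exact psi(theta) for a small Ising model by enumeration, O(2^n)."""
    n = len(theta_s)
    scores = [
        sum(theta_s[s] * x[s] for s in range(n))
        + sum(w * x[s] * x[t] for (s, t), w in theta_st.items())
        for x in itertools.product([0, 1], repeat=n)
    ]
    m = max(scores)
    return m + math.log(sum(math.exp(sc - m) for sc in scores))


def mean_field_bound(mu, theta_s, theta_st):
    """<theta, mu> minus the negative entropy of a product distribution.

    mu holds node marginals strictly inside (0, 1); edge moments are mu_s * mu_t.
    """
    linear = sum(th * m for th, m in zip(theta_s, mu))
    pairwise = sum(w * mu[s] * mu[t] for (s, t), w in theta_st.items())
    neg_ent = sum(m * math.log(m) + (1 - m) * math.log(1 - m) for m in mu)
    return linear + pairwise - neg_ent


if __name__ == "__main__":
    theta_s = [0.5, -0.3, 0.2]
    theta_st = {(0, 1): 1.0, (1, 2): -0.8, (0, 2): 0.4}
    print("psi(theta):", log_partition(theta_s, theta_st))
    print("bound at mu = 0.5:", mean_field_bound([0.5, 0.5, 0.5], theta_s, theta_st))
```

Every choice of `mu` gives a valid lower bound; mean field then searches for the tightest one within the tractable family.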
Naive Mean Field

Naive mean field chooses a fully factorized distribution to approximate the original distribution.
We will study the Ising model as an example.
Approximate $G$ by the fully disconnected graph $H_0$ with no edges.
Then the mean parameter set is
$$\mathcal{M}_{tract} = \{ (\mu_s, \mu_{st}) \mid 0 \le \mu_s \le 1, \ \mu_{st} = \mu_s \mu_t \}$$
The negative entropy of the product distribution is
$$\psi^*_{H_0}(\mu) = \sum_{s \in V} \big[ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) \big]$$
Naive Mean Field (Contd.)

The naive mean field problem takes the form
$$\max_{\mu \in \mathcal{M}_{tract}} \{ \langle \mu, \theta \rangle - \psi^*_{H_0}(\mu) \}$$
Using $\mu_{st} = \mu_s \mu_t$, we get the reduced problem
$$\max_{\{\mu_s\} \in [0,1]^n} \left\{ \sum_{s \in V} \theta_s \mu_s + \sum_{(s,t) \in E} \theta_{st} \mu_s \mu_t - \sum_{s \in V} \big[ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) \big] \right\}$$
The objective is concave in $\mu_s$ with the other coordinates held fixed.
Taking the partial derivative with respect to $\mu_s$ and setting it to zero yields the update
$$\mu_s \leftarrow \frac{1}{1 + \exp\!\big( -\big( \theta_s + \sum_{t \in N(s)} \theta_{st} \mu_t \big) \big)}$$
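The coordinate update above can be sketched as a simple coordinate-ascent loop. This is a minimal illustration under stated assumptions: the function name `naive_mean_field`, the initialization at 0.5, the sweep limit and the stopping tolerance are all choices made here, not taken from the slides, and the parameters are assumed moderate enough that `math.exp` does not overflow.

```python
import math


def naive_mean_field(theta_s, theta_st, num_sweeps=200, tol=1e-10):
    """Coordinate ascent for naive mean field on an Ising model with x in {0, 1}^n.

    theta_s:  list of node parameters (hypothetical layout).
    theta_st: dict mapping an undirected edge (s, t) to its coupling.
    Returns the list of approximate marginals mu_s.
    """
    n = len(theta_s)
    # Symmetric neighbor lists: nbrs[s] holds (t, theta_st) pairs for t in N(s).
    nbrs = [[] for _ in range(n)]
    for (s, t), w in theta_st.items():
        nbrs[s].append((t, w))
        nbrs[t].append((s, w))

    mu = [0.5] * n  # assumed initialization; other starting points are possible
    for _ in range(num_sweeps):
        max_change = 0.0
        for s in range(n):
            # theta_s + sum over neighbors of theta_st * mu_t
            field = theta_s[s] + sum(w * mu[t] for t, w in nbrs[s])
            new_mu = 1.0 / (1.0 + math.exp(-field))
            max_change = max(max_change, abs(new_mu - mu[s]))
            mu[s] = new_mu
        if max_change < tol:
            break
    return mu


if __name__ == "__main__":
    print(naive_mean_field([0.5, -0.3, 0.2], {(0, 1): 0.5, (1, 2): -0.4, (0, 2): 0.2}))
```

Each coordinate step maximizes the objective in one $\mu_s$ exactly, so the objective never decreases. Because the reduced problem is not jointly concave, different initializations may reach different local optima.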
Structured Mean Field

Structured mean field considers tractable distributions with additional structure.
For a subgraph $H$, let $\mathcal{I}(H)$ be the index set associated with $H$.
With $\mu(H) = \{ \mu_\alpha \mid \alpha \in \mathcal{I}(H) \}$, we have:
- The subvector $\mu(H)$ can be an arbitrary member of $\mathcal{M}(H)$
- The dual $\psi^*_H$ depends only on $\mu(H)$, not on $\mu_\beta$, $\beta \in \mathcal{I}(G) \setminus \mathcal{I}(H)$
- But such $\mu_\beta$ do appear in the $\langle \mu, \theta \rangle$ term
- Each $\mu_\beta = g_\beta(\mu(H))$, i.e., it depends on $\mu(H)$ non-linearly

The approximate optimization problem can be written as
$$\sup_{\mu(H) \in \mathcal{M}(H)} \left\{ \sum_{\alpha \in \mathcal{I}(H)} \theta_\alpha \mu_\alpha + \sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha\, g_\alpha(\mu(H)) - \psi^*_H(\mu(H)) \right\}$$
For the Ising model with $H_0 = (V, \emptyset)$, $g_{st}(\mu(H_0)) = \mu_s \mu_t$.
Structured Mean Field (Contd.)

Let $F(\mu(H))$ denote the cost function.
Taking the derivative w.r.t. $\mu_\beta$, $\beta \in \mathcal{I}(H)$, yields
$$\frac{\partial F(\mu(H))}{\partial \mu_\beta} = \theta_\beta + \sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha \frac{\partial g_\alpha(\mu(H))}{\partial \mu_\beta} - \frac{\partial \psi^*_H(\mu(H))}{\partial \mu_\beta}$$
Here $\gamma_\beta(H) = \dfrac{\partial \psi^*_H(\mu(H))}{\partial \mu_\beta}$ is the inverse moment mapping.
Setting the gradient to zero yields the update
$$\gamma_\beta(H) \leftarrow \theta_\beta + \sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha \frac{\partial g_\alpha(\mu(H))}{\partial \mu_\beta}$$
For the Ising model, $\dfrac{\partial g_{st}}{\partial \mu_s} = \mu_t$, and so on.
We recover exactly the naive mean field updates.
In general, $H$ can be more involved.
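To spell out the Ising case (a short added derivation, assuming $H_0 = (V, \emptyset)$ and the product-form negative entropy $\psi^*_{H_0}(\mu) = \sum_{s \in V} [\mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s)]$), the inverse moment mapping is the logit:
$$\gamma_s(H_0) = \frac{\partial \psi^*_{H_0}}{\partial \mu_s} = \log \frac{\mu_s}{1 - \mu_s}$$
With $g_{st} = \mu_s \mu_t$ and $\partial g_{st} / \partial \mu_s = \mu_t$, the general update becomes
$$\log \frac{\mu_s}{1 - \mu_s} \leftarrow \theta_s + \sum_{t \in N(s)} \theta_{st} \mu_t \quad \Longleftrightarrow \quad \mu_s \leftarrow \frac{1}{1 + \exp\!\big( -\big( \theta_s + \sum_{t \in N(s)} \theta_{st} \mu_t \big) \big)}$$
which is the naive mean field update.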
Non-convexity of Mean Field

The original problem is concave.