CSci 8980: Advanced Topics in Graphical Models: Variational Inference

Instructor: Arindam Banerjee. October 17, 2007. Sections: Graphical Models, Exponential Families, Variational Methods, Mean Field Approximation.


  1. Properties of the Cumulant $\psi$ (Contd.)
     - The set of mean parameters is
       $$\mathcal{M} = \left\{ \mu \in \mathbb{R}^d \;\middle|\; \exists\, p(\cdot) \text{ s.t. } \int t(x)\, p(x)\, \nu(dx) = \mu \right\}$$
     - Consider the mapping $\Lambda : \Theta \mapsto \mathcal{M}$ defined as
       $$\Lambda(\theta) = E_\theta[t(x)] = \int_x t(x)\, p(x;\theta)\, \nu(dx)$$
     - If $t$ is minimal, $\Lambda$ is one-to-one.
     - Further, $\Lambda$ is onto the (relative) interior of $\mathcal{M}$.

  5. Fenchel-Legendre Conjugacy
     - The conjugate dual function is
       $$\psi^*(\mu) = \sup_{\theta \in \Theta} \{ \langle \mu, \theta \rangle - \psi(\theta) \}$$
     - The (Boltzmann-Shannon) entropy of $p(x;\theta)$ w.r.t. $\nu$ is
       $$H(p(x;\theta)) = -\int_x p(x;\theta) \log p(x;\theta)\, \nu(dx) = -E_\theta[\log p(x;\theta)]$$
     - If $\mu \in \operatorname{ri} \mathcal{M}$, then $\psi^*(\mu) = -H(p(x;\theta(\mu)))$.
     - In terms of the dual, $\psi$ has the variational representation
       $$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - \psi^*(\mu) \}$$
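
As a quick sanity check of these relations, consider a Bernoulli variable. This example is not from the slides; it assumes $x \in \{0, 1\}$, $t(x) = x$, and $\nu$ the counting measure. Under those assumptions each quantity has a closed form, and the dual can be seen to equal the negative entropy:

```latex
\begin{align*}
p(x;\theta) &= \exp(\theta x - \psi(\theta)), &
\psi(\theta) &= \log(1 + e^{\theta}), \\
\mu = \Lambda(\theta) &= \frac{e^{\theta}}{1 + e^{\theta}}, &
\theta(\mu) &= \log\frac{\mu}{1 - \mu}, \\
\psi^*(\mu) &= \mu\,\theta(\mu) - \psi(\theta(\mu))
  = \mu \log \mu + (1 - \mu)\log(1 - \mu)
  = -H(p(x;\theta(\mu))).
\end{align*}
```

Here the supremum defining $\psi^*(\mu)$ is attained at $\theta(\mu)$, which is why substituting it gives the value directly.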

  13. Main Issues
     - Key problems:
       - Computation of the cumulant function $\psi(\theta)$
       - Computation of the mean parameter $\mu = E_\theta[t(x)]$
     - The key equation for both problems is
       $$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - \psi^*(\mu) \}$$
     - For all $\theta \in \Theta$, the supremum is attained by $\mu \in \operatorname{ri} \mathcal{M}$ with
       $$\mu = E_\theta[t(x)] = \int_x t(x)\, p(x;\theta)\, \nu(dx)$$
     - Two primary challenges:
       - The set $\mathcal{M}$ is difficult to characterize
       - The function $\psi^*$ lacks an explicit definition
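
To make the cost of the two key problems concrete, the following is a minimal brute-force sketch for a small Ising model. It assumes binary states $x \in \{0,1\}^n$ and sufficient statistics made of node terms $x_s$ followed by edge terms $x_s x_t$. The function names and the example parameters are hypothetical, chosen only for illustration. The sum runs over all $2^n$ states, which is what becomes infeasible for larger graphs.

```python
import itertools
import math

def sufficient_stats(x, edges):
    """t(x) for an Ising model: node terms x_s, then edge terms x_s * x_t."""
    return list(x) + [x[s] * x[t] for s, t in edges]

def exact_cumulant_and_mean(theta_node, theta_edge, edges):
    """Brute-force psi(theta) and mu = E_theta[t(x)] by summing over all 2^n states."""
    n = len(theta_node)
    theta = list(theta_node) + list(theta_edge)
    states = list(itertools.product([0, 1], repeat=n))
    # <theta, t(x)> for every state
    scores = [sum(th * ti for th, ti in zip(theta, sufficient_stats(x, edges)))
              for x in states]
    # log-sum-exp with the maximum subtracted for numerical stability
    m = max(scores)
    psi = m + math.log(sum(math.exp(sc - m) for sc in scores))
    # mean parameters as the expectation of t(x) under p(x; theta)
    mu = [0.0] * len(theta)
    for x, sc in zip(states, scores):
        p = math.exp(sc - psi)
        for i, ti in enumerate(sufficient_stats(x, edges)):
            mu[i] += p * ti
    return psi, mu

# Hypothetical 3-node complete graph with arbitrary parameters.
edges = [(0, 1), (1, 2), (0, 2)]
psi, mu = exact_cumulant_and_mean([0.5, -0.3, 0.2], [1.0, -0.5, 0.8], edges)
print(psi, mu)
```

The variational methods in the remaining slides aim to replace this exponential enumeration with an optimization problem over mean parameters.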

  21. Mean Parameters
     - $\mathcal{M}$ has the following properties:
       - $\mathcal{M}$ is full-dimensional if $t$ is minimal
       - $\mathcal{M}$ is bounded iff $\Theta = \mathbb{R}^d$ and $\psi$ is Lipschitz
     - Example: multinomial random vector $x \in \mathcal{X}^n$
       - The set $\mathcal{M}$ is a polytope
         $$\mathcal{M} = \{ \mu \in \mathbb{R}^d \mid \langle a_j, \mu \rangle \le b_j, \ \forall j \in \mathcal{J} \}$$
       - The index set $\mathcal{J}$ is finite, but can be large
       - The number of facets of the polytope can grow very fast with $n$
       - A complete graph with $n = 7$ has more than $2 \times 10^8$ facets
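
For intuition, the smallest non-trivial case can be written out by hand. The example below is an illustration rather than part of the slides: it assumes two binary variables $x \in \{0,1\}^2$ with $t(x) = (x_1, x_2, x_1 x_2)$. The polytope is then the convex hull of the four points $t(x)$, and its half-space description needs only four inequalities, one per joint probability that must be non-negative:

```latex
\begin{align*}
\mathcal{M} &= \operatorname{conv}\{(0,0,0),\ (1,0,0),\ (0,1,0),\ (1,1,1)\} \\
            &= \{ (\mu_1, \mu_2, \mu_{12}) \mid
                 -\mu_{12} \le 0,\ \
                 \mu_{12} - \mu_1 \le 0,\ \
                 \mu_{12} - \mu_2 \le 0,\ \
                 \mu_1 + \mu_2 - \mu_{12} \le 1 \}.
\end{align*}
```

Each inequality has the form $\langle a_j, \mu \rangle \le b_j$. The four constraints correspond to $p(1,1) \ge 0$, $p(0,1) \ge 0$, $p(1,0) \ge 0$, and $p(0,0) \ge 0$ respectively.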

  22. Mean Parameters (Contd.)

  27. Dual Function
     - $\psi^*$ is the negative entropy
     - Typically, it does not have an explicit closed form
     - In general, it can be specified as a composition of two functions:
       - Compute an inverse image $\theta(\mu)$ using $\Lambda^{-1}(\mu)$
       - Compute the negative entropy of $p(x;\theta(\mu))$
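
The two-step composition can be sketched in code for a case where both steps are easy. The example assumes a single Bernoulli variable ($x \in \{0,1\}$, $t(x) = x$), which is not taken from the slides. The function names are hypothetical. For a general graphical model, the first step (inverting $\Lambda$) is the part with no closed form.

```python
import math

def inverse_mean_map(mu):
    """Step 1: theta(mu) = Lambda^{-1}(mu); for a Bernoulli this is the logit."""
    return math.log(mu / (1.0 - mu))

def negative_entropy(theta):
    """Step 2: -H(p(x; theta)) = E_theta[log p(x; theta)] for x in {0, 1}."""
    psi = math.log1p(math.exp(theta))  # cumulant psi(theta) = log(1 + e^theta)
    probs = [math.exp(theta * x - psi) for x in (0, 1)]
    return sum(p * math.log(p) for p in probs)

def dual(mu):
    """psi*(mu) as the composition of the two steps above, for mu in (0, 1)."""
    return negative_entropy(inverse_mean_map(mu))

print(dual(0.3))
```

In this special case the result should agree with $\mu \log \mu + (1 - \mu) \log(1 - \mu)$, so the composition can be checked against the closed form.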

  32. Tractable Families
     - Based on the key equation
       $$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \mu, \theta \rangle - \psi^*(\mu) \}$$
     - Mean field focuses on tractable distributions
     - Let $H \subseteq G$ be a subgraph on which exact calculations are feasible
     - Let $\mathcal{I}(H)$ be the indices of cliques in $H$
     - Natural parameters for distributions corresponding to $H$:
       $$\mathcal{E}(H) = \{ \theta \in \Theta \mid \theta_\alpha = 0, \ \forall \alpha \in \mathcal{I} \setminus \mathcal{I}(H) \}$$

  39. Tractable Families (Contd.)
     - A simple tractable subgraph is $H = (V, \emptyset)$
       - Natural parameters belong to the subspace
         $$\mathcal{E}(H) = \{ \theta \in \Theta \mid \theta_{st} = 0, \ \forall (s,t) \in E \}$$
       - Corresponding distribution: $p(x;\theta) = \prod_{s \in V} p(x_s;\theta_s)$
     - Structured approximation using a spanning tree $T = (V, E(T))$
       - Natural parameters belong to the subspace
         $$\mathcal{E}(T) = \{ \theta \in \Theta \mid \theta_{st} = 0, \ \forall (s,t) \notin E(T) \}$$
     - For a subgraph $H$, the set of realizable mean parameters is
       $$\mathcal{M}_{\text{tract}}(G;H) = \{ \mu \in \mathbb{R}^d \mid \mu = E_\theta[t(x)], \ \theta \in \mathcal{E}(H) \}$$
     - The inclusion $\mathcal{M}_{\text{tract}}(G;H) \subseteq \mathcal{M}(G)$ always holds

  43. Lower Bounds
     - For any $\mu \in \operatorname{ri} \mathcal{M}$, $\psi(\theta) \ge \langle \theta, \mu \rangle - \psi^*(\mu)$
     - Alternative proof using Jensen's inequality, where $p(x;\theta(\mu))$ is the member of the family with mean parameter $\mu$:
       $$\begin{aligned}
       \psi(\theta) &= \log \int_x p(x;\theta(\mu))\, \frac{\exp(\langle \theta, t(x) \rangle)}{p(x;\theta(\mu))}\, \nu(dx) \\
                    &\ge \int_x p(x;\theta(\mu)) \left[ \langle \theta, t(x) \rangle - \log p(x;\theta(\mu)) \right] \nu(dx) \\
                    &= \langle \theta, \mu \rangle - \psi^*(\mu)
       \end{aligned}$$
     - In general, $\psi^*$ does not have a closed form
     - Since $\psi^*_H$ has an explicit form, solve the approximation
       $$\sup_{\mu \in \mathcal{M}_{\text{tract}}} \{ \langle \mu, \theta \rangle - \psi^*_H(\mu) \}$$
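
The lower bound can be checked numerically on a model small enough to enumerate. The sketch below assumes a 3-node Ising model with $x \in \{0,1\}^3$ and uses fully factorized distributions with node means $\mu_s$, so that $\mu_{st} = \mu_s \mu_t$ and $\psi^*_H$ is a sum of binary negative entropies. The graph, parameters, and function names are hypothetical. For every sampled $\mu$, the gap $\psi(\theta) - (\langle \theta, \mu \rangle - \psi^*_H(\mu))$ should be non-negative.

```python
import itertools
import math
import random

def exact_log_partition(theta_node, theta_edge, edges):
    """psi(theta) by enumerating all states x in {0,1}^n (feasible only for small n)."""
    n = len(theta_node)
    scores = []
    for x in itertools.product([0, 1], repeat=n):
        s = sum(theta_node[i] * x[i] for i in range(n))
        s += sum(w * x[a] * x[b] for w, (a, b) in zip(theta_edge, edges))
        scores.append(s)
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def mean_field_bound(mu, theta_node, theta_edge, edges):
    """<theta, mu> - psi*_H(mu) for a product distribution with node means in (0, 1)."""
    linear = sum(th * m for th, m in zip(theta_node, mu))
    pairwise = sum(w * mu[a] * mu[b] for w, (a, b) in zip(theta_edge, edges))
    neg_entropy = sum(m * math.log(m) + (1 - m) * math.log(1 - m) for m in mu)
    return linear + pairwise - neg_entropy

edges = [(0, 1), (1, 2), (0, 2)]
theta_node = [0.5, -0.3, 0.2]
theta_edge = [1.0, -0.5, 0.8]
psi = exact_log_partition(theta_node, theta_edge, edges)

# Sample random product distributions and record the gap to the exact value.
rng = random.Random(0)
gaps = []
for _ in range(1000):
    mu = [rng.uniform(0.01, 0.99) for _ in theta_node]
    gaps.append(psi - mean_field_bound(mu, theta_node, theta_edge, edges))
print(min(gaps))
```

The gap equals the KL divergence from the product distribution to $p(x;\theta)$, so maximizing the bound over $\mu$ is the same as finding the closest factorized distribution in that sense.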

  48. Naive Mean Field
     - Chooses a fully factorized distribution to approximate the original distribution
     - We will study the Ising model as an example
     - Approximate $G$ by the fully disconnected graph $H_0$ with no edges
     - Then, the mean parameter set is
       $$\mathcal{M}_{\text{tract}} = \{ (\mu_s, \mu_{st}) \mid 0 \le \mu_s \le 1, \ \mu_{st} = \mu_s \mu_t \}$$
     - The negative entropy of the product distribution is
       $$\psi^*_{H_0}(\mu) = \sum_{s \in V} \left[ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) \right]$$

  52. Naive Mean Field (Contd.)
     - The naive mean field problem takes the form
       $$\max_{\mu \in \mathcal{M}_{\text{tract}}} \{ \langle \mu, \theta \rangle - \psi^*_{H_0}(\mu) \}$$
     - Using $\mu_{st} = \mu_s \mu_t$, we get the reduced problem
       $$\max_{\{\mu_s\} \in [0,1]^n} \left\{ \sum_{s \in V} \theta_s \mu_s + \sum_{(s,t) \in E} \theta_{st} \mu_s \mu_t - \sum_{s \in V} \left[ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) \right] \right\}$$
     - It is concave in $\mu_s$ with the other coordinates held fixed
     - Taking the gradient and setting it to zero yields
       $$\mu_s \leftarrow \frac{1}{1 + \exp\left( -\left( \theta_s + \sum_{t \in N(s)} \theta_{st} \mu_t \right) \right)}$$
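
The coordinate update above can be turned into a short iterative procedure. What follows is a minimal sketch, not a reference implementation: it assumes an Ising model with $x \in \{0,1\}^n$, edges given as index pairs, in-place sweeps over the nodes, and a uniform starting point $\mu_s = 0.5$. The function name, the stopping rule, and the example parameters are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def naive_mean_field(theta_node, theta_edge, edges, sweeps=200, tol=1e-10):
    """Coordinate ascent on the reduced naive mean field objective."""
    n = len(theta_node)
    # neighbors[s] holds (t, theta_st) for every edge touching node s
    neighbors = [[] for _ in range(n)]
    for w, (s, t) in zip(theta_edge, edges):
        neighbors[s].append((t, w))
        neighbors[t].append((s, w))
    mu = [0.5] * n  # assumed initialization; other starting points are possible
    for _ in range(sweeps):
        delta = 0.0
        for s in range(n):
            # mu_s <- sigmoid(theta_s + sum_{t in N(s)} theta_st * mu_t)
            field = theta_node[s] + sum(w * mu[t] for t, w in neighbors[s])
            new = sigmoid(field)
            delta = max(delta, abs(new - mu[s]))
            mu[s] = new
        if delta < tol:  # stop once a full sweep changes nothing noticeably
            break
    return mu

edges = [(0, 1), (1, 2), (0, 2)]
mu = naive_mean_field([0.5, -0.3, 0.2], [1.0, -0.5, 0.8], edges)
print(mu)
```

Each single-coordinate update maximizes the objective exactly in that coordinate, so the objective never decreases across sweeps. The overall problem is not concave in all coordinates jointly, so the fixed point reached may depend on the initialization.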

  61. Structured Mean Field
     - Considers tractable distributions with additional structure
     - For a subgraph $H$, let $\mathcal{I}(H)$ be the index set associated with $H$
     - With $\mu(H) = \{ \mu_\alpha \mid \alpha \in \mathcal{I}(H) \}$, we have:
       - The subvector $\mu(H)$ can be an arbitrary member of $\mathcal{M}(H)$
       - The dual $\psi^*_H$ depends only on $\mu(H)$, not on $\mu_\beta$, $\beta \in \mathcal{I}(G) \setminus \mathcal{I}(H)$
       - But such $\mu_\beta$ do appear in the $\langle \mu, \theta \rangle$ term
       - Each $\mu_\beta = g_\beta(\mu(H))$, i.e., it depends on $\mu(H)$ non-linearly
     - The approximate optimization problem can be written as
       $$\sup_{\mu(H) \in \mathcal{M}(H)} \left\{ \sum_{\alpha \in \mathcal{I}(H)} \theta_\alpha \mu_\alpha + \sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha\, g_\alpha(\mu(H)) - \psi^*_H(\mu(H)) \right\}$$
     - For the Ising model, with $H_0 = (V, \emptyset)$, $g_{st}(\mu(H_0)) = \mu_s \mu_t$

  68. Structured Mean Field (Contd.)
     - Let $F(\mu(H))$ denote the cost function
     - Taking the derivative w.r.t. $\mu_\beta$, $\beta \in \mathcal{I}(H)$, yields
       $$\frac{\partial F(\mu(H))}{\partial \mu_\beta} = \theta_\beta + \sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha \frac{\partial g_\alpha(\mu(H))}{\partial \mu_\beta} - \frac{\partial \psi^*_H(\mu(H))}{\partial \mu_\beta}$$
     - $\gamma_\beta(H) = \dfrac{\partial \psi^*_H(\mu(H))}{\partial \mu_\beta}$ is the inverse moment mapping
     - Setting the gradient to zero yields the update
       $$\gamma_\beta(H) \leftarrow \theta_\beta + \sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha \frac{\partial g_\alpha(\mu(H))}{\partial \mu_\beta}$$
     - For the Ising model, $\dfrac{\partial g_{st}}{\partial \mu_s} = \mu_t$, and so on
     - We recover exactly the naive mean field updates
     - In general, $H$ can be more involved
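
The reduction to the naive mean field update can be spelled out using only quantities defined in the earlier slides: the product-distribution negative entropy $\psi^*_{H_0}$ and $g_{st} = \mu_s \mu_t$. The steps below are a worked check of that claim for $H_0 = (V, \emptyset)$:

```latex
\begin{align*}
\gamma_s(H_0) &= \frac{\partial \psi^*_{H_0}(\mu)}{\partial \mu_s}
  = \frac{\partial}{\partial \mu_s}\left[ \mu_s \log \mu_s + (1 - \mu_s)\log(1 - \mu_s) \right]
  = \log\frac{\mu_s}{1 - \mu_s}, \\
\sum_{\alpha \in \mathcal{I}^c(H_0)} \theta_\alpha \frac{\partial g_\alpha}{\partial \mu_s}
  &= \sum_{t \in N(s)} \theta_{st}\, \mu_t, \\
\log\frac{\mu_s}{1 - \mu_s} \leftarrow \theta_s + \sum_{t \in N(s)} \theta_{st}\, \mu_t
  \quad &\Longleftrightarrow \quad
  \mu_s \leftarrow \frac{1}{1 + \exp\left( -\left( \theta_s + \sum_{t \in N(s)} \theta_{st}\, \mu_t \right) \right)}.
\end{align*}
```

The inverse moment mapping is the logit here because $H_0$ has no edges, so each node is an independent Bernoulli variable.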

  69. Non-convexity of Mean Field
     - The original problem is concave
