
Probabilistic Graphical Models (CMSC 678, UMBC)

Probabilistic Graphical Models: a graph $G$ that represents a probability distribution over random variables $X_1, \dots, X_N$.


1. Multinomial NaΓ―ve Bayes: A Generative Story

Generative Story:
$\pi$ = distribution over $K$ labels $y$
for label $k = 1$ to $K$: $\theta_k$ = distribution over J feature values
for item $i = 1$ to $N$:
    $y_i \sim \mathrm{Cat}(\pi)$
    for each feature $j$: $x_{ij} \sim \mathrm{Cat}(\theta_{y_i})$
(graphical model: label $y_i$ generating feature nodes $x_{i1}, \dots, x_{i5}$)

Maximize the log-likelihood:
$\mathcal{L}(\theta, \pi) = \sum_i \sum_j \log \theta_{y_i, x_{ij}} + \sum_i \log \pi_{y_i}$
s.t. $\sum_j \theta_{kj} = 1 \;\forall k$, $\theta_{kj} \ge 0$; $\sum_k \pi_k = 1$, $\pi_k \ge 0$

2. Multinomial NaΓ―ve Bayes: A Generative Story

Generative Story (as above):
$y_i \sim \mathrm{Cat}(\pi)$; for each feature $j$: $x_{ij} \sim \mathrm{Cat}(\theta_{y_i})$

Maximize the log-likelihood via Lagrange multipliers (the $\ge 0$ constraints are not shown):
$\mathcal{L}(\theta, \pi) = \sum_i \sum_j \log \theta_{y_i, x_{ij}} + \sum_i \log \pi_{y_i} - \mu \Big( \sum_k \pi_k - 1 \Big) - \sum_k \lambda_k \Big( \sum_j \theta_{kj} - 1 \Big)$
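
Setting this Lagrangian's gradient to zero recovers the count-based estimates on the next slide. A sketch of that step for the $\theta$ parameters (the $\pi$ case is analogous), writing $n_{kj}$ for the count of feature value $j$ in items labeled $k$:

```latex
% Only items i with y_i = k and feature value j contribute to theta_kj:
\frac{\partial \mathcal{L}}{\partial \theta_{kj}}
  = \frac{n_{kj}}{\theta_{kj}} - \lambda_k = 0
  \;\Longrightarrow\;
  \theta_{kj} = \frac{n_{kj}}{\lambda_k}
% Enforcing the sum-to-one constraint fixes the multiplier:
\sum_j \theta_{kj} = 1
  \;\Longrightarrow\;
  \lambda_k = \sum_{j'} n_{kj'},
\quad\text{so}\quad
  \theta_{kj} = \frac{n_{kj}}{\sum_{j'} n_{kj'}}.
```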

3. Multinomial NaΓ―ve Bayes: Learning

Calculate class priors:
For each class $k$: items$_k$ = all items with class = $k$
$p_k = \frac{|\text{items}_k|}{\#\,\text{items}}$

Calculate feature generation terms:
For each class $k$: obs$_k$ = a single object containing all items labeled as $k$
For each feature $j$: $n_{kj}$ = # of occurrences of $j$ in obs$_k$
$p_{j|k} = \frac{n_{kj}}{\sum_{j'} n_{kj'}}$
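
A minimal sketch of this counting procedure in Python; the function name, input format, and unsmoothed MLE are assumptions (the slides do not specify an interface or smoothing):

```python
from collections import Counter, defaultdict

def train_multinomial_nb(items, labels):
    """MLE for multinomial Naive Bayes by counting.

    items:  list of feature-value sequences (e.g., tokenized documents)
    labels: list of class labels, one per item
    """
    n_items = len(items)
    class_counts = Counter(labels)             # |items_k|
    feature_counts = defaultdict(Counter)      # n_kj
    for item, k in zip(items, labels):
        feature_counts[k].update(item)         # obs_k: pool all items labeled k

    priors = {k: c / n_items for k, c in class_counts.items()}       # p_k
    likelihoods = {
        k: {j: n / sum(counts.values()) for j, n in counts.items()}  # p_{j|k}
        for k, counts in feature_counts.items()
    }
    return priors, likelihoods

# Toy usage:
priors, likelihoods = train_multinomial_nb(
    [["fun", "couple"], ["fast", "furious"], ["fun", "fast"]],
    ["comedy", "action", "comedy"],
)
```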

4. Brill and Banko (2001): with enough data, the choice of classifier may not matter. Adapted from Jurafsky & Martin (draft)

5. Summary: NaΓ―ve Bayes is Not So NaΓ―ve, but Not Without Issue

Pro:
- Very fast, with low storage requirements
- Robust to irrelevant features
- Very good in domains with many equally important features
- Optimal if the independence assumptions hold
- A dependable baseline for text classification (but often not the best)

Con:
- Why not model the posterior in one go (e.g., use conditional maxent)?
- Are the features really uncorrelated?
- Are plain counts always appropriate?
- Are there β€œbetter” (automated, more principled) ways of handling missing/noisy data?

Adapted from Jurafsky & Martin (draft)

6. Outline

Directed Graphical Models
  NaΓ―ve Bayes
Undirected Graphical Models
  Factor Graphs
  Ising Model
Message Passing: Graphical Model Inference

7. Undirected Graphical Models

An undirected graph G=(V,E) that represents a probability distribution over random variables $X_1, \dots, X_N$
Joint probability factorizes based on cliques in the graph

8. Undirected Graphical Models

An undirected graph G=(V,E) that represents a probability distribution over random variables $X_1, \dots, X_N$
Joint probability factorizes based on cliques in the graph
Common name: Markov Random Fields

9. Undirected Graphical Models

An undirected graph G=(V,E) that represents a probability distribution over random variables $X_1, \dots, X_N$
Joint probability factorizes based on cliques in the graph
Common name: Markov Random Fields
Undirected graphs can have an alternative formulation as Factor Graphs

10. Markov Random Fields: Undirected Graphs

$p(x_1, x_2, x_3, \dots, x_N)$

11. Markov Random Fields: Undirected Graphs

clique: a subset of nodes that are pairwise connected
maximal clique: a clique that cannot add a node and remain a clique
$p(x_1, x_2, x_3, \dots, x_N)$

12. Markov Random Fields: Undirected Graphs

clique: a subset of nodes that are pairwise connected
maximal clique: a clique that cannot add a node and remain a clique
$p(x_1, x_2, x_3, \dots, x_N) = \frac{1}{Z} \prod_C \psi_C(x_C)$
$Z$: global normalization; the product runs over the maximal cliques $C$; $\psi_C$: potential function over the variables $x_C$ in clique $C$ (not necessarily a probability!)

14. Markov Random Fields: Undirected Graphs

clique: a subset of nodes that are pairwise connected
maximal clique: a clique that cannot add a node and remain a clique
$p(x_1, x_2, x_3, \dots, x_N) = \frac{1}{Z} \prod_C \psi_C(x_C)$
Q: What restrictions should we place on the potentials $\psi_C$?

15. Markov Random Fields: Undirected Graphs

clique: a subset of nodes that are pairwise connected
maximal clique: a clique that cannot add a node and remain a clique
$p(x_1, x_2, x_3, \dots, x_N) = \frac{1}{Z} \prod_C \psi_C(x_C)$
Q: What restrictions should we place on the potentials $\psi_C$?
A: $\psi_C \ge 0$ (or $\psi_C > 0$)

16. Terminology: Potential Functions

$p(x_1, x_2, x_3, \dots, x_N) = \frac{1}{Z} \prod_C \psi_C(x_C)$
$\psi_C(x_C) = \exp(-E(x_C))$ (Boltzmann distribution)
$E$: energy function (for clique C); get the total energy of a configuration by summing the individual energy functions

17. Ambiguity in Undirected Model Notation

For the same fully connected graph over X, Y, Z:
$p(x, y, z) \propto \psi(x, y, z)$
$p(x, y, z) \propto \psi_1(x, y)\, \psi_2(y, z)\, \psi_3(x, z)$

18. Outline

Directed Graphical Models
  NaΓ―ve Bayes
Undirected Graphical Models
  Factor Graphs
  Ising Model
Message Passing: Graphical Model Inference

19. MRFs as Factor Graphs

Undirected graphs: G=(V,E) that represents $p(X_1, \dots, X_N)$
Factor graph of p: a bipartite graph of evidence nodes X, factor nodes F, and edges
Evidence nodes X are the random variables
Factor nodes F take values associated with the potential functions
Edges show which variables are used in which factors

20. MRFs as Factor Graphs

Undirected graphs: G=(V,E) that represents $p(X_1, \dots, X_N)$
Factor graph of p: a bipartite graph of evidence nodes X, factor nodes F, and edges
(figure: undirected graph over X, Y, Z)

21. MRFs as Factor Graphs

Undirected graphs: G=(V,E) that represents $p(X_1, \dots, X_N)$
Factor graph of p: a bipartite graph of evidence nodes X, factor nodes F, and edges
Evidence nodes X are the random variables
(figure: undirected graph over X, Y, Z and its factor graph)

22. MRFs as Factor Graphs

Undirected graphs: G=(V,E) that represents $p(X_1, \dots, X_N)$
Factor graph of p: a bipartite graph of evidence nodes X, factor nodes F, and edges
Evidence nodes X are the random variables
Factor nodes F take values associated with the potential functions
(figure: undirected graph over X, Y, Z and its factor graph)

23. MRFs as Factor Graphs

Undirected graphs: G=(V,E) that represents $p(X_1, \dots, X_N)$
Factor graph of p: a bipartite graph of evidence nodes X, factor nodes F, and edges
Evidence nodes X are the random variables
Factor nodes F take values associated with the potential functions
Edges show which variables are used in which factors
(figure: undirected graph over X, Y, Z and its factor graph)

24. Different Factor Graph Notation for the Same Graph

(figure: three different factor graphs over X, Y, Z that encode the same undirected graph)

25. Directed vs. Undirected Models: Moralization

(figure: directed graph with $x_1$, $x_2$, $x_3$ pointing to $x_4$)

26. Directed vs. Undirected Models: Moralization

(figure: the directed graph and its moralized undirected version)
$p(x_1, \dots, x_4) = p(x_1)\, p(x_2)\, p(x_3)\, p(x_4 \mid x_1, x_2, x_3)$

27. Directed vs. Undirected Models: Moralization

(figure: the directed graph and its moralized undirected version)
$p(x_1, \dots, x_4) = p(x_1)\, p(x_2)\, p(x_3)\, p(x_4 \mid x_1, x_2, x_3)$
Parents of nodes in a directed graph must be connected in the undirected graph (see the sketch below)
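
A minimal sketch of the moralization step, assuming the directed graph is given as a hypothetical parent map (the slides only show the picture):

```python
from itertools import combinations

def moralize(parents):
    """Moralize a directed graph: marry all parents of each node, drop directions.

    parents: dict mapping node -> list of its parent nodes
    Returns the set of undirected edges as frozensets.
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:                        # keep each original edge, undirected
            edges.add(frozenset((p, child)))
        for p1, p2 in combinations(pars, 2):  # connect ("marry") co-parents
            edges.add(frozenset((p1, p2)))
    return edges

# The slide's example: x4 has parents x1, x2, x3.
print(moralize({"x4": ["x1", "x2", "x3"]}))
# -> edges x1-x4, x2-x4, x3-x4, plus x1-x2, x1-x3, x2-x3
```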

28. Example: Linear Chain

Directed (e.g., hidden Markov model [HMM]; generative): states $z_1, z_2, z_3, z_4$ emitting words $w_1, w_2, w_3, w_4$

29. Example: Linear Chain

Directed (e.g., hidden Markov model [HMM]; generative): states $z_1, \dots, z_4$ emitting words $w_1, \dots, w_4$
Directed (e.g., maximum entropy Markov model [MEMM]; conditional): words $w_1, \dots, w_4$ feeding into states $z_1, \dots, z_4$

30. Example: Linear Chain

Directed (e.g., hidden Markov model [HMM]; generative)
Directed (e.g., maximum entropy Markov model [MEMM]; conditional)
Undirected (e.g., conditional random field [CRF])
(figure: the three chain structures over $z_1, \dots, z_4$ and $w_1, \dots, w_4$)

31. Example: Linear Chain

Directed (e.g., hidden Markov model [HMM]; generative)
Directed (e.g., maximum entropy Markov model [MEMM]; conditional)
Undirected as factor graph (e.g., conditional random field [CRF])
(figure: the chain structures over $z_1, \dots, z_4$ and $w_1, \dots, w_4$)

32. Example: Linear Chain Conditional Random Field

Widely used in applications like part-of-speech tagging:
Noun-Mod Noun Noun Verb
President Obama told Congress …

33. Example: Linear Chain Conditional Random Field

Widely used in applications like part-of-speech tagging:
Noun-Mod Noun Noun Verb
President Obama told Congress …
and named entity recognition:
Person Person Org. Other
President Obama told Congress …

34. Linear Chain CRFs for Part of Speech Tagging

A linear chain CRF is a conditional probabilistic model of the sequence of tags $z_1, z_2, \dots, z_N$, conditioned on the entire input sequence $x_{1:N}$

35. Linear Chain CRFs for Part of Speech Tagging

$p(\clubsuit \mid \diamondsuit)$
A linear chain CRF is a conditional probabilistic model of the sequence of tags $z_1, z_2, \dots, z_N$, conditioned on the entire input sequence $x_{1:N}$

36. Linear Chain CRFs for Part of Speech Tagging

$p(z_1, z_2, \dots, z_N \mid \diamondsuit)$
A linear chain CRF is a conditional probabilistic model of the sequence of tags $z_1, z_2, \dots, z_N$, conditioned on the entire input sequence $x_{1:N}$

37. Linear Chain CRFs for Part of Speech Tagging

$p(z_1, z_2, \dots, z_N \mid x_{1:N})$
A linear chain CRF is a conditional probabilistic model of the sequence of tags $z_1, z_2, \dots, z_N$, conditioned on the entire input sequence $x_{1:N}$

38. Linear Chain CRFs for Part of Speech Tagging

$p(z_1, z_2, \dots, z_N \mid x_{1:N})$
(factor graph: solo-tag factors $f_1, \dots, f_4$ attached to $z_1, \dots, z_4$, and inter-tag factors $g_1, \dots, g_4$ connecting adjacent tags)

39. Linear Chain CRFs for Part of Speech Tagging

(factor graph: solo-tag factors $f_i$ and inter-tag factors $g_i$ over tags $z_1, \dots, z_4$)
$p(z_1, z_2, \dots, z_N \mid x_{1:N}) \propto \prod_{i=1}^{N} \exp\big( \langle \theta_f, f_i(z_i) \rangle + \langle \theta_g, g_i(z_i, z_{i+1}) \rangle \big)$

40. Linear Chain CRFs for Part of Speech Tagging

$g_j$: inter-tag features (can depend on any/all input words $x_{1:N}$)

41. Linear Chain CRFs for Part of Speech Tagging

$g_j$: inter-tag features (can depend on any/all input words $x_{1:N}$)
$f_i$: solo tag features (can depend on any/all input words $x_{1:N}$)

42. Linear Chain CRFs for Part of Speech Tagging

$g_j$: inter-tag features (can depend on any/all input words $x_{1:N}$)
$f_i$: solo tag features (can depend on any/all input words $x_{1:N}$)
Feature design, just like in maxent models!

43. Linear Chain CRFs for Part of Speech Tagging

$g_j$: inter-tag features; $f_i$: solo tag features (both can depend on any/all input words $x_{1:N}$)
Example:
$g_{j,\,N \to V}(z_j, z_{j+1}) = 1$ if $z_j$ == N and $z_{j+1}$ == V, else 0
$g_{j,\,\text{told},\,N \to V}(z_j, z_{j+1}) = 1$ if $z_j$ == N and $z_{j+1}$ == V and $x_j$ == told, else 0
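
A minimal sketch of such indicator features and the resulting unnormalized score in Python; the feature names, the weight dictionaries `theta_f` and `theta_g`, and the dict-based representation are assumptions, not the slides' notation:

```python
import math

def g_features(z, z_next, x, i):
    """Inter-tag indicator features g_j(z_j, z_{j+1}); may inspect any input word."""
    return {
        "N->V": 1.0 if z == "N" and z_next == "V" else 0.0,
        "told,N->V": 1.0 if z == "N" and z_next == "V" and x[i] == "told" else 0.0,
    }

def f_features(z, x, i):
    """Solo-tag indicator features f_i(z_i)."""
    return {f"tag={z}": 1.0, f"word={x[i]},tag={z}": 1.0}

def unnormalized_score(tags, words, theta_f, theta_g):
    """exp of the summed weighted features; dividing by Z(x) would give p(z|x)."""
    total = 0.0
    for i, z in enumerate(tags):
        total += sum(theta_f.get(name, 0.0) * v
                     for name, v in f_features(z, words, i).items())
        if i + 1 < len(tags):
            total += sum(theta_g.get(name, 0.0) * v
                         for name, v in g_features(z, tags[i + 1], words, i).items())
    return math.exp(total)
```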

44. Outline

Directed Graphical Models
  NaΓ―ve Bayes
Undirected Graphical Models
  Factor Graphs
  Ising Model
Message Passing: Graphical Model Inference

45. Example: Ising Model

Image denoising (Bishop, 2006; Fig 8.30)
y: observed (noisy) pixel/state; x: original pixel/state
(figure: original image and a version with 10% noise; graph linking each $x_i$ to $y_i$ and to neighboring pixels)

46. Example: Ising Model

Image denoising (Bishop, 2006; Fig 8.30)
y: observed (noisy) pixel/state; x: original pixel/state
(figure: original, 10% noise, and two recovered solutions)
Q: What are the cliques?

48. Example: Ising Model

Image denoising (Bishop, 2006; Fig 8.30)
y: observed (noisy) pixel/state; x: original pixel/state
$E(x, y) = h \sum_i x_i - \beta \sum_{\{i,j\}} x_i x_j - \eta \sum_i x_i y_i$
$h$ term: allow for a bias; $\beta$ term: neighboring pixels should be similar; $\eta$ term: $x_i$ and $y_i$ should be correlated

50. Example: Ising Model

$E(x, y) = h \sum_i x_i - \beta \sum_{\{i,j\}} x_i x_j - \eta \sum_i x_i y_i$
Q: Why subtract the Ξ² and Ξ· terms?

51. Example: Ising Model

$E(x, y) = h \sum_i x_i - \beta \sum_{\{i,j\}} x_i x_j - \eta \sum_i x_i y_i$
Q: Why subtract the Ξ² and Ξ· terms?
A: Better states β†’ lower energy (higher potential): $\psi_C(x_C) = \exp(-E(x_C))$
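
A minimal denoising sketch under this energy, using iterated conditional modes (ICM), the coordinate-wise energy minimizer Bishop applies to this example; the parameter values and the 4-neighborhood are illustrative assumptions:

```python
import numpy as np

def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, sweeps=5):
    """Greedily minimize E(x, y) = h*sum(x) - beta*sum(x_i x_j) - eta*sum(x_i y_i).

    y: 2D array of observed pixels in {-1, +1}
    Flips each x_i to whichever sign lowers the energy, holding neighbors fixed.
    """
    x = y.copy()
    rows, cols = x.shape
    for _ in range(sweeps):
        for i in range(rows):
            for j in range(cols):
                neighbors = sum(
                    x[a, b]
                    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < rows and 0 <= b < cols
                )
                # Local energy of x_ij = s is s*(h - beta*neighbors - eta*y_ij),
                # minimized by the sign of (beta*neighbors + eta*y_ij - h).
                x[i, j] = 1 if beta * neighbors + eta * y[i, j] - h > 0 else -1
    return x
```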

52. Markov Random Fields with Factor Graph Notation

y: observed (noisy) pixel/state; x: original pixel/state
Factor nodes are added according to maximal cliques: a unary factor for each variable, and a binary factor between neighboring pixels
Factor graphs are bipartite

53. Outline

Directed Graphical Models
  NaΓ―ve Bayes
Undirected Graphical Models
  Factor Graphs
  Ising Model
Message Passing: Graphical Model Inference

54. Two Problems for Undirected Models

$p(x_1, x_2, x_3, \dots, x_N) = \frac{1}{Z} \prod_C \psi_C(x_C)$
Finding the normalizer; computing the marginals

55. Two Problems for Undirected Models

$p(x_1, x_2, x_3, \dots, x_N) = \frac{1}{Z} \prod_C \psi_C(x_C)$
Finding the normalizer: $Z = \sum_x \prod_C \psi_C(x_C)$
Computing the marginals

56. Two Problems for Undirected Models

Finding the normalizer: $Z = \sum_x \prod_C \psi_C(x_C)$
Computing the marginals: sum over all variable combinations, with the $x_n$ coordinate fixed:
$Z_n(v) = \sum_{x:\, x_n = v} \prod_C \psi_C(x_C)$
Example: 3 variables, fix the 2nd dimension: $Z_2(v) = \sum_{x_1} \sum_{x_3} \prod_C \psi_C(x = (x_1, v, x_3))$

57. Two Problems for Undirected Models

Finding the normalizer: $Z = \sum_x \prod_C \psi_C(x_C)$
Computing the marginals: $Z_n(v) = \sum_{x:\, x_n = v} \prod_C \psi_C(x_C)$
Q: Why are these difficult?

58. Two Problems for Undirected Models

Finding the normalizer: $Z = \sum_x \prod_C \psi_C(x_C)$
Computing the marginals: $Z_n(v) = \sum_{x:\, x_n = v} \prod_C \psi_C(x_C)$
Q: Why are these difficult?
A: The sums range over exponentially many variable combinations (a brute-force sketch follows)
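
A brute-force sketch that makes the difficulty concrete: enumerating every configuration computes $Z$ and every $Z_n(v)$ exactly, but costs $|\text{states}|^N$; the potential representation here is a hypothetical choice:

```python
from itertools import product

def brute_force(potentials, n_vars, states=(0, 1)):
    """Compute Z and all single-variable marginals by full enumeration.

    potentials: list of (variable_indices, function) pairs; each function
                maps a tuple of values for those variables to psi_C >= 0.
    Cost is |states|**n_vars: exponential in the number of variables.
    """
    Z = 0.0
    marginals = [{s: 0.0 for s in states} for _ in range(n_vars)]
    for x in product(states, repeat=n_vars):
        p_tilde = 1.0
        for idxs, psi in potentials:
            p_tilde *= psi(tuple(x[i] for i in idxs))
        Z += p_tilde
        for n in range(n_vars):
            marginals[n][x[n]] += p_tilde   # accumulates Z_n(v), x_n fixed at v
    return Z, [{v: m / Z for v, m in ms.items()} for ms in marginals]

# Toy chain x1 - x2 - x3 with pairwise potentials favoring agreement:
agree = lambda xs: 2.0 if xs[0] == xs[1] else 1.0
Z, marg = brute_force([((0, 1), agree), ((1, 2), agree)], n_vars=3)
```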

59.–62. Message Passing: Count the Soldiers

If you are the front soldier in the line, say the number β€˜one’ to the soldier behind you.
If you are the rearmost soldier in the line, say the number β€˜one’ to the soldier in front of you.
If a soldier ahead of or behind you says a number to you, add one to it, and say the new number to the soldier on the other side.

ITILA, Ch 16

63. Sum-Product Algorithm

Main idea: message passing
An exact inference algorithm for tree-like graphs
Belief propagation (forward-backward for HMMs) is a special case

64. Sum-Product

Definition of marginal:
$p(x_i = v) = \sum_{x:\, x_i = v} p(x_1, x_2, \dots, x_i, \dots, x_N)$

65. Sum-Product

Definition of marginal: $p(x_i = v) = \sum_{x:\, x_i = v} p(x_1, x_2, \dots, x_i, \dots, x_N)$
Main idea: use the bipartite nature of the graph to efficiently compute the marginals
The factor nodes can act as filters

66. Sum-Product

Definition of marginal: $p(x_i = v) = \sum_{x:\, x_i = v} p(x_1, x_2, \dots, x_i, \dots, x_N)$
Main idea: use the bipartite nature of the graph to efficiently compute the marginals
(figure: messages $r_{m \to n}$ flowing from factor nodes to a variable node)

67. Sum-Product

Alternative marginal computation: $p(x_i = v) = \prod_{f} r_{f \to x_i}(v)$ (product over the factors $f$ touching $x_i$)
Main idea: use the bipartite nature of the graph to efficiently compute the marginals
(figure: messages $r_{f \to x_i}$ flowing from factor nodes to variable node $x_i$)

68. Sum-Product

From variables to factors:
$q_{n \to m}(x_n) = \prod_{m' \in M(n) \setminus m} r_{m' \to n}(x_n)$
$M(n)$: the set of factors in which variable $n$ participates
(default value of 1 if the product is empty)

69. Sum-Product

From variables to factors:
$q_{n \to m}(x_n) = \prod_{m' \in M(n) \setminus m} r_{m' \to n}(x_n)$
$M(n)$: the set of factors in which variable $n$ participates (default value of 1 if the product is empty)

From factors to variables:
$r_{m \to n}(x_n) = \sum_{\mathbf{x}_m \setminus n} f_m(\mathbf{x}_m) \prod_{n' \in N(m) \setminus n} q_{n' \to m}(x_{n'})$
Sum over configurations of the variables for the $m$-th factor, with variable $n$ fixed
$N(m)$: the set of variables that the $m$-th factor depends on
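
A minimal sketch of these two updates on a chain factor graph with unary factors $f_i$ and pairwise factors $g_i$: on a chain the message schedule reduces to one forward and one backward sweep (this is forward-backward for HMMs). The array-based representation and toy potentials are assumptions:

```python
import numpy as np

def chain_sum_product(unary, pairwise):
    """Sum-product on a chain factor graph; exact because a chain is a tree.

    unary:    list of N arrays, unary[i][v] = f_i(x_i = v)
    pairwise: list of N-1 matrices, pairwise[i][u, v] = g_i(x_i = u, x_{i+1} = v)
    Returns the normalized marginals p(x_i).
    """
    N = len(unary)
    fwd = [None] * N            # fwd[i]: r message arriving at x_i from the left
    bwd = [None] * N            # bwd[i]: r message arriving at x_i from the right
    fwd[0] = np.ones_like(unary[0])
    bwd[-1] = np.ones_like(unary[-1])
    for i in range(1, N):
        # q_{x_{i-1} -> g_{i-1}} = fwd[i-1] * unary[i-1]  (product of other messages)
        # r_{g_{i-1} -> x_i}     = sum over x_{i-1} of g_{i-1} * q
        fwd[i] = (fwd[i - 1] * unary[i - 1]) @ pairwise[i - 1]
    for i in range(N - 2, -1, -1):
        bwd[i] = pairwise[i] @ (bwd[i + 1] * unary[i + 1])
    # Marginal at x_i: product of all incoming factor-to-variable messages.
    marginals = [fwd[i] * unary[i] * bwd[i] for i in range(N)]
    return [m / m.sum() for m in marginals]

# Toy usage: 3 binary variables, agreement-favoring pairwise factors.
unary = [np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
pair = np.array([[2.0, 1.0], [1.0, 2.0]])
print(chain_sum_product(unary, [pair, pair]))
```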
