
Introduction to Graphical Models. Peter V. Gehler, Max Planck Institute for Intelligent Systems, Tübingen, Germany. ENS/INRIA Summer School, Paris, July 2013.


  1. Structured objects: predicting M variables jointly
     Y = {1, …, K} × {1, …, K} × ⋯ × {1, …, K}
     For each x:
     ◮ K^M values, K^M − 1 d.o.f. → K^M functions of x
     Example: object detection with a variable-size bounding box
     Y ⊂ {1, …, W} × {1, …, H} × {1, …, W} × {1, …, H},   y = (left, top, right, bottom)
     For each x: ¼ · W(W − 1) · H(H − 1) values (millions to billions…)

  2–3. Example: image denoising
     Y = {640 × 480 RGB images} – too much!
     For each x:
     ◮ 16777216^307200 values in p(y | x),
     ◮ ≥ 10^2,000,000 functions.
     We cannot consider all possible distributions, we must impose structure.

  4–6. Probabilistic Graphical Models
     A (probabilistic) graphical model defines
     ◮ a family of probability distributions over a set of random variables, by means of a graph.
     Popular classes of graphical models:
     ◮ Undirected graphical models (Markov random fields),
     ◮ Directed graphical models (Bayesian networks),
     ◮ Factor graphs,
     ◮ Others: chain graphs, influence diagrams, etc.
     The graph encodes conditional independence assumptions between the variables:
     ◮ with N(i) the neighbors of node i in the graph,
       p(y_i | y_{V∖{i}}) = p(y_i | y_{N(i)}),   where y_{V∖{i}} = (y_1, …, y_{i−1}, y_{i+1}, …, y_n).

  7. Example: Pictorial Structures for Articulated Pose Estimation
     (Tree-structured factor graph over the image X and the body-part variables Y_top, Y_head, Y_torso, Y_rarm, Y_larm, Y_rhnd, Y_lhnd, Y_rleg, Y_lleg, Y_rfoot, Y_lfoot, with factors F^(1)_top, F^(2)_top,head, ….)
     ◮ In principle, all parts depend on each other.
     ◮ Knowing where the head is puts constraints on where the feet can be.
     ◮ But conditional independences as specified by the graph:
     ◮ if we know where the left leg is, the left foot's position does not depend on the torso position anymore, etc.
       p(y_lfoot | y_top, …, y_torso, …, y_rfoot, x) = p(y_lfoot | y_lleg, x)

  8–11. Factor Graphs
     ◮ Decomposable output y = (y_1, …, y_|V|)
     ◮ Graph G = (V, 𝓕, E), E ⊆ V × 𝓕, with
       ◮ variable nodes V (circles),
       ◮ factor nodes 𝓕 (boxes),
       ◮ edges E between variable and factor nodes.
     ◮ Each factor F ∈ 𝓕 connects a subset of nodes; write F = {v_1, …, v_|F|} and y_F = (y_{v_1}, …, y_{v_|F|}).
     (Example factor graph over the variables Y_i, Y_j, Y_k, Y_l.)
     ◮ Factorization into potentials ψ at factors:
       p(y) = (1/Z) ∏_{F ∈ 𝓕} ψ_F(y_F)
            = (1/Z) ψ_1(y_l) ψ_2(y_j, y_l) ψ_3(y_i, y_j) ψ_4(y_i, y_k, y_l)   (for the example graph)
     ◮ Z is a normalization constant, called the partition function:
       Z = ∑_{y ∈ 𝒴} ∏_{F ∈ 𝓕} ψ_F(y_F).
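To make the factorization concrete, here is a minimal brute-force sketch in Python of the four-variable example graph above. The potential tables psi1–psi4, the state count K, and the random values are made-up placeholders; enumerating all K^4 configurations is only feasible for toy models and is exactly what the inference machinery later in the lecture avoids.

```python
import itertools
import numpy as np

# Hypothetical factor graph matching the four-variable example of the slides:
# variables y_i, y_j, y_k, y_l with K states each, and factors
# psi_1(y_l), psi_2(y_j, y_l), psi_3(y_i, y_j), psi_4(y_i, y_k, y_l).
K = 3
rng = np.random.default_rng(0)
psi1 = rng.random(K) + 0.1                 # psi_1(y_l)
psi2 = rng.random((K, K)) + 0.1            # psi_2(y_j, y_l)
psi3 = rng.random((K, K)) + 0.1            # psi_3(y_i, y_j)
psi4 = rng.random((K, K, K)) + 0.1         # psi_4(y_i, y_k, y_l)

def unnormalized(yi, yj, yk, yl):
    """Product of all factor potentials at one joint configuration."""
    return psi1[yl] * psi2[yj, yl] * psi3[yi, yj] * psi4[yi, yk, yl]

# Partition function by brute-force enumeration over all K**4 configurations.
Z = sum(unnormalized(*y) for y in itertools.product(range(K), repeat=4))

def p(yi, yj, yk, yl):
    """p(y) = (1/Z) * prod_F psi_F(y_F)."""
    return unnormalized(yi, yj, yk, yl) / Z

# Sanity check: the probabilities sum to one.
assert abs(sum(p(*y) for y in itertools.product(range(K), repeat=4)) - 1.0) < 1e-9
```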

  12. Conditional Distributions
     How to model p(y | x)?
     (The factor graph now also contains observed nodes X_i, X_j.)
     ◮ The potentials become functions of (part of) x as well: ψ_F(y_F; x_F) instead of just ψ_F(y_F), and
       p(y | x) = (1/Z(x)) ∏_{F ∈ 𝓕} ψ_F(y_F; x_F).
     ◮ The partition function depends on x:
       Z(x) = ∑_{y ∈ 𝒴} ∏_{F ∈ 𝓕} ψ_F(y_F; x_F).
     ◮ Note: x is treated just as an argument, not as a random variable.
     → Conditional random fields (CRFs)

  13–14. Conventions: Potentials and Energy Functions
     Assume ψ_F(y_F) > 0. Then
     ◮ instead of potentials, we can also work with energies:
       ψ_F(y_F; x_F) = exp(−E_F(y_F; x_F)),   or equivalently   E_F(y_F; x_F) = −log ψ_F(y_F; x_F).
     ◮ p(y | x) can be written as
       p(y | x) = (1/Z(x)) ∏_{F ∈ 𝓕} ψ_F(y_F; x_F)
                = (1/Z(x)) ∏_{F ∈ 𝓕} exp(−E_F(y_F; x_F)) = (1/Z(x)) exp(−E(y; x))
       with E(y; x) = ∑_{F ∈ 𝓕} E_F(y_F; x_F).

  15–16. Conventions: Energy Minimization
     argmax_{y ∈ 𝒴} p(y | x) = argmax_{y ∈ 𝒴} (1/Z(x)) exp(−E(y; x))
                             = argmax_{y ∈ 𝒴} exp(−E(y; x))
                             = argmax_{y ∈ 𝒴} −E(y; x)
                             = argmin_{y ∈ 𝒴} E(y; x).
     MAP prediction can be performed by energy minimization.
     In practice, one typically models the energy function directly
     → the probability distribution is uniquely determined by it.
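As a baseline, MAP prediction by exhaustive energy minimization can be written in a few lines; the function below is a generic sketch (the names and the toy energy are mine, not from the slides) and is only usable for tiny label spaces.

```python
import itertools

def map_prediction(energy, state_spaces):
    """Exhaustive MAP prediction: argmin over all joint labelings y.

    energy       : callable mapping a tuple y to E(y; x) for the fixed input x
    state_spaces : list of iterables, one per output variable
    Only feasible when prod_i |Y_i| is small; efficient alternatives follow later.
    """
    return min(itertools.product(*state_spaces), key=energy)

# Example: three binary variables with a toy energy.
y_star = map_prediction(lambda y: sum(y) + 2.0 * (y[0] != y[1]), [(0, 1)] * 3)
```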

  17. Example: An Energy Function for Image Segmentation
     Foreground/background image segmentation:
     ◮ X = [0, 255]^{WH}, Y = {0, 1}^{WH}; foreground: y_i = 1, background: y_i = 0.
     ◮ Graph: 4-connected grid.
     ◮ Each output pixel depends on
       ◮ its local grayvalue (input),
       ◮ the neighboring outputs.
     Energy function components ("Ising" model):
     ◮ E_i(y_i = 1, x_i) = 1 − x_i/255,   E_i(y_i = 0, x_i) = x_i/255
       (x_i bright → y_i rather foreground, x_i dark → y_i rather background)
     ◮ E_ij(0, 0) = E_ij(1, 1) = 0,   E_ij(0, 1) = E_ij(1, 0) = ω for ω > 0
       (prefer that neighbors have the same label → smooth labeling)

  18. E(y; x) = ∑_i [ (1 − x_i/255) ⟦y_i = 1⟧ + (x_i/255) ⟦y_i = 0⟧ ] + w ∑_{i∼j} ⟦y_i ≠ y_j⟧
     (Figure: input image, segmentation from thresholding, segmentation from minimal energy.)
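For concreteness, a small sketch of how this segmentation energy could be evaluated in Python; the array layout, the threshold 127, and the weight w = 2.0 are illustrative assumptions, not values from the slides.

```python
import numpy as np

def segmentation_energy(y, x, w=2.0):
    """Evaluate the segmentation ('Ising') energy of the slides:
    E(y; x) = sum_i [(1 - x_i/255)[y_i = 1] + (x_i/255)[y_i = 0]] + w * sum_{i~j} [y_i != y_j]
    for a binary labelling y and a grayscale image x (both H x W arrays)."""
    x = x.astype(float)
    unary = np.where(y == 1, 1.0 - x / 255.0, x / 255.0).sum()
    # 4-connectivity: count disagreements with the right and bottom neighbour once each
    pairwise = (y[:, 1:] != y[:, :-1]).sum() + (y[1:, :] != y[:-1, :]).sum()
    return unary + w * pairwise

def threshold_segmentation(x):
    """Baseline 'segmentation from thresholding': ignores the pairwise term."""
    return (x > 127).astype(int)
```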

  19. What to do with Structured Prediction Models?
     Case 1) p(y | x) is known.
     MAP prediction: predict f : X → Y by solving
       y* = argmax_{y ∈ 𝒴} p(y | x) = argmin_{y ∈ 𝒴} E(y; x).
     Probabilistic inference: compute marginal probabilities p(y_F | x) for any factor F, in particular p(y_i | x) for all i ∈ V.

  20. What to do with Structured Prediction Models?
     Case 2) p(y | x) is unknown, but we have training data.
     Parameter learning: assume a fixed graph structure and learn the potentials/energies (ψ_F),
     among other tasks (learning the graph structure, the variables, etc.)
     ⇒ Topic of Wednesday's lecture.

  21. Probabilistic Inference – Example: Pictorial Structures
     (Figure: input image x, argmax_y p(y | x), and p(y_i | x).)
     ◮ MAP makes a single (structured) prediction (a point estimate): the best overall pose.
     ◮ Marginal probabilities p(y_i | x) give us potential positions and the uncertainty of the individual body parts.

  22. Example: Man-made structure detection
     (Figure: input image x, argmax_y p(y | x), and p(y_i | x).)
     ◮ Task: does a pixel depict a man-made structure or not? y_i ∈ {0, 1}
     ◮ Middle: MAP inference
     ◮ Right: variable marginals
     ◮ Attention: max-marginals ≠ MAP

  23. Probabilistic Inference
     Compute p(y_F | x) and Z(x).

  24–26. Assume y = (y_i, y_j, y_k, y_l), 𝒴 = 𝒴_i × 𝒴_j × 𝒴_k × 𝒴_l, and an energy function E(y; x) compatible with the chain factor graph Y_i – F – Y_j – G – Y_k – H – Y_l.
     Task 1: for any y ∈ 𝒴, compute p(y | x), using
       p(y | x) = (1/Z(x)) exp(−E(y; x)).
     Problem: we don't know Z(x), and computing it using
       Z(x) = ∑_{y ∈ 𝒴} exp(−E(y; x))
     looks expensive (the sum has |𝒴_i| · |𝒴_j| · |𝒴_k| · |𝒴_l| terms).
     A lot of research has been done on how to compute Z(x) efficiently.

  27–37. Probabilistic Inference – Belief Propagation / Message Passing
     (Chain factor graph Y_i – F – Y_j – G – Y_k – H – Y_l as above.)
     For notational simplicity, we drop the dependence on (fixed) x:
       Z = ∑_{y ∈ 𝒴} exp(−E(y))
         = ∑_{y_i ∈ 𝒴_i} ∑_{y_j ∈ 𝒴_j} ∑_{y_k ∈ 𝒴_k} ∑_{y_l ∈ 𝒴_l} exp(−E(y_i, y_j, y_k, y_l))
         = ∑_{y_i} ∑_{y_j} ∑_{y_k} ∑_{y_l} exp(−(E_F(y_i, y_j) + E_G(y_j, y_k) + E_H(y_k, y_l)))
         = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) ∑_{y_k} exp(−E_G(y_j, y_k)) ∑_{y_l} exp(−E_H(y_k, y_l)).
     Define the message r_{H→Y_k} ∈ ℝ^{𝒴_k} by r_{H→Y_k}(y_k) := ∑_{y_l} exp(−E_H(y_k, y_l)). Then
       Z = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) ∑_{y_k} exp(−E_G(y_j, y_k)) r_{H→Y_k}(y_k).
     Defining r_{G→Y_j} ∈ ℝ^{𝒴_j} and r_{F→Y_i} ∈ ℝ^{𝒴_i} analogously,
       r_{G→Y_j}(y_j) := ∑_{y_k} exp(−E_G(y_j, y_k)) r_{H→Y_k}(y_k),
       r_{F→Y_i}(y_i) := ∑_{y_j} exp(−E_F(y_i, y_j)) r_{G→Y_j}(y_j),
     we obtain
       Z = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) r_{G→Y_j}(y_j) = ∑_{y_i} r_{F→Y_i}(y_i).
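The derivation above translates directly into code. The sketch below uses random, hypothetical energy tables for the chain and checks the message-passing result against brute-force enumeration; the variable names follow the slide notation.

```python
import itertools
import numpy as np

# Hypothetical energies for the chain Y_i - F - Y_j - G - Y_k - H - Y_l of the slides.
K = 4
rng = np.random.default_rng(1)
E_F = rng.random((K, K))    # E_F(y_i, y_j)
E_G = rng.random((K, K))    # E_G(y_j, y_k)
E_H = rng.random((K, K))    # E_H(y_k, y_l)

# Messages, computed from the leaf Y_l towards Y_i:
# r_{H -> Y_k}(y_k) = sum_{y_l} exp(-E_H(y_k, y_l))
r_H_to_k = np.exp(-E_H).sum(axis=1)
# r_{G -> Y_j}(y_j) = sum_{y_k} exp(-E_G(y_j, y_k)) * r_{H -> Y_k}(y_k)
r_G_to_j = np.exp(-E_G) @ r_H_to_k
# r_{F -> Y_i}(y_i) = sum_{y_j} exp(-E_F(y_i, y_j)) * r_{G -> Y_j}(y_j)
r_F_to_i = np.exp(-E_F) @ r_G_to_j

Z_messages = r_F_to_i.sum()

# Brute-force check: sum over all K**4 joint configurations.
Z_brute = sum(
    np.exp(-(E_F[yi, yj] + E_G[yj, yk] + E_H[yk, yl]))
    for yi, yj, yk, yl in itertools.product(range(K), repeat=4)
)
assert np.isclose(Z_messages, Z_brute)
```

The message-passing computation costs O(K^2) per factor instead of K^4 for the full sum.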

  38–42. Example: Inference on Trees
     (Factor graph: the chain Y_i – F – Y_j – G – Y_k as before, with two additional leaves attached to Y_k: factor H to Y_l and factor I to Y_m.)
       Z = ∑_{y ∈ 𝒴} exp(−E(y))
         = ∑_{y_i ∈ 𝒴_i} ∑_{y_j ∈ 𝒴_j} ∑_{y_k ∈ 𝒴_k} ∑_{y_l ∈ 𝒴_l} ∑_{y_m ∈ 𝒴_m} exp(−(E_F(y_i, y_j) + ⋯ + E_I(y_k, y_m)))
         = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) ∑_{y_k} exp(−E_G(y_j, y_k)) · [∑_{y_l} exp(−E_H(y_k, y_l))] · [∑_{y_m} exp(−E_I(y_k, y_m))]
         = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) ∑_{y_k} exp(−E_G(y_j, y_k)) · (r_{H→Y_k}(y_k) · r_{I→Y_k}(y_k))
         = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) ∑_{y_k} exp(−E_G(y_j, y_k)) q_{Y_k→G}(y_k),
     where the variable-to-factor message q_{Y_k→G}(y_k) := r_{H→Y_k}(y_k) · r_{I→Y_k}(y_k) collects the incoming factor-to-variable messages.

  43–44. Factor Graph Sum-Product Algorithm
     ◮ "Message": a pair of vectors at each factor graph edge (i, F) ∈ E:
       1. r_{F→Y_i} ∈ ℝ^{𝒴_i}: factor-to-variable message,
       2. q_{Y_i→F} ∈ ℝ^{𝒴_i}: variable-to-factor message.
     ◮ The algorithm iteratively updates the messages.
     ◮ After convergence: Z and p(y_F) can be obtained from the messages.
     → Belief Propagation
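As an illustration of "p(y_F) can be obtained from the messages": in the chain example, a variable marginal such as p(y_j) is, up to normalization, the product of the two factor-to-variable messages arriving at Y_j. The energy tables below are the same hypothetical ones as in the previous sketch.

```python
import numpy as np

# Chain Y_i - F - Y_j - G - Y_k - H - Y_l with hypothetical energy tables as before.
K = 4
rng = np.random.default_rng(1)
E_F, E_G, E_H = rng.random((K, K)), rng.random((K, K)), rng.random((K, K))

# Messages arriving at Y_j from both sides of the chain:
r_H_to_k = np.exp(-E_H).sum(axis=1)          # from the Y_l side
r_G_to_j = np.exp(-E_G) @ r_H_to_k           # r_{G -> Y_j}
r_F_to_j = np.exp(-E_F).sum(axis=0)          # r_{F -> Y_j} = sum_{y_i} exp(-E_F(y_i, y_j))

# Marginal p(y_j): normalized product of incoming factor-to-variable messages.
p_yj = r_F_to_j * r_G_to_j
p_yj /= p_yj.sum()
```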

  45. Example: Pictorial Structures
     (Same tree-structured body-part factor graph as before.)
     ◮ Tree-structured model for articulated pose (Felzenszwalb and Huttenlocher, 2000), (Fischler and Elschlager, 1973)
     ◮ Body-part variables; states: discretized tuples (x, y, s, θ) with
       (x, y) position, s scale, and θ rotation.

  46. Example: Pictorial Structures
     (Figure: input image x and marginals p(y_i | x).)
     ◮ Exact marginals, although the state space is huge and the partition function is therefore a huge sum:
       Z(x) = ∑_{all bodies y} exp(−E(y; x)).

  47. Belief Propagation in Loopy Graphs
     Can we do message passing also in graphs with loops?
     (Figure: a 3 × 3 grid of variables Y_i, …, Y_q connected by pairwise factors A–L; the graph contains cycles.)
     Problem: there is no well-defined leaf-to-root order.
     Suggested solution: Loopy Belief Propagation (LBP)
     ◮ initialize all messages as constant 1,
     ◮ pass messages until convergence.

  48. Belief Propagation in Loopy Graphs
     Loopy Belief Propagation is very popular, but has some problems:
     ◮ it might not converge (e.g. oscillate),
     ◮ even if it does, the computed probabilities are only approximate.
     Many improved message-passing schemes exist (see the tutorial book).

  49. Probabilistic Inference – Variational Inference / Mean Field
     Task: compute marginals p(y_F | x) for a general p(y | x).
     Idea: approximate p(y | x) by a simpler q(y) and use the marginals of q instead:
       q* = argmin_{q ∈ Q} D_KL(q(y) ‖ p(y | x)).
     E.g. naive mean field: Q is the set of all fully factorized distributions q(y) = ∏_{i ∈ V} q_i(y_i).
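A generic coordinate-ascent sketch of the naive mean-field update for a pairwise energy. This is a standard formulation rather than code from the lecture, and the data structures (unary/pair tables, edge list) are assumptions.

```python
import numpy as np

def naive_mean_field(unary, pair, edges, n_iters=50):
    """Coordinate-ascent naive mean field for a pairwise energy
    E(y) = sum_i unary[i][y_i] + sum_{(i,j) in edges} pair[(i,j)][y_i, y_j].

    unary : (n, K) array of unary energies
    pair  : dict mapping an edge (i, j), i < j, to a (K, K) array E_ij(y_i, y_j)
    edges : list of (i, j) pairs
    Returns approximate marginals q_i(y_i) as an (n, K) array.
    """
    n, K = unary.shape
    q = np.full((n, K), 1.0 / K)                 # start from the uniform q_i
    nbrs = {i: [] for i in range(n)}
    for (i, j) in edges:
        nbrs[i].append((j, pair[(i, j)]))        # E_ij as seen from i
        nbrs[j].append((i, pair[(i, j)].T))      # E_ij as seen from j
    for _ in range(n_iters):
        for i in range(n):
            # expected energy of each state of y_i under the current q of the neighbours
            e = unary[i].copy()
            for j, E_ij in nbrs[i]:
                e += E_ij @ q[j]
            q[i] = np.exp(-(e - e.min()))        # subtract the minimum for stability
            q[i] /= q[i].sum()
    return q
```

Each update sets q_i(y_i) ∝ exp(−E_i(y_i) − ∑_{j ∈ N(i)} E_{q_j}[E_ij(y_i, y_j)]), which is the exact minimizer of the KL divergence with respect to q_i, so the objective never increases.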

  50. Probabilistic Inference – Sampling / Markov Chain Monte Carlo
     Task: compute marginals p(y_F | x) for a general p(y | x).
     Idea: rephrase this as computing the expected value of a quantity,
       E_{y ∼ p(y | x, w)}[h(x, y)],
     for some (well-behaved) function h : X × Y → ℝ.
     For probabilistic inference, this step is easy. Set h_{F,z}(x, y) := ⟦y_F = z⟧; then
       E_{y ∼ p(y | x, w)}[h_{F,z}(x, y)] = ∑_{y ∈ 𝒴} p(y | x) ⟦y_F = z⟧ = ∑_{y_F ∈ 𝒴_F} p(y_F | x) ⟦y_F = z⟧ = p(y_F = z | x).

  51–52. Probabilistic Inference – Sampling / Markov Chain Monte Carlo
     Expectations can be computed/approximated by sampling:
     ◮ For fixed x, let y^(1), y^(2), … be i.i.d. samples from p(y | x); then
       E_{y ∼ p(y | x)}[h(x, y)] ≈ (1/S) ∑_{s=1}^{S} h(x, y^(s)).
     ◮ The law of large numbers guarantees convergence for S → ∞.
     ◮ For S independent samples, the approximation error is O(1/√S), independent of the dimension of Y.
     Problem:
     ◮ Producing i.i.d. samples y^(s) from p(y | x) is hard.
     Solution:
     ◮ We can get away with a sequence of dependent samples
       → Markov chain Monte Carlo (MCMC) sampling.

  53. One example of how to do MCMC sampling: the Gibbs sampler
     ◮ Initialize y^(0) = (y_1, …, y_d) arbitrarily.
     ◮ For s = 1, …, S:
       1. Select a variable y_i,
       2. Re-sample y_i ∼ p(y_i | y^(s−1)_{V∖{i}}, x),
       3. Output the sample y^(s) = (y^(s−1)_1, …, y^(s−1)_{i−1}, y_i, y^(s−1)_{i+1}, …, y^(s−1)_d).
     The conditional is cheap to compute:
       p(y_i | y^(s−1)_{V∖{i}}, x) = p(y_i, y^(s−1)_{V∖{i}} | x) / ∑_{y'_i ∈ 𝒴_i} p(y'_i, y^(s−1)_{V∖{i}} | x)
                                   = exp(−E(y_i, y^(s−1)_{V∖{i}}; x)) / ∑_{y'_i ∈ 𝒴_i} exp(−E(y'_i, y^(s−1)_{V∖{i}}; x)).
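A minimal Gibbs sampler for the binary segmentation energy from earlier, sweeping over the pixels in a fixed order. The sweep count, the weight w, the thresholded initialization, and the marginal estimate by simple sample averaging are illustrative choices, not prescriptions from the slides.

```python
import numpy as np

def gibbs_ising_segmentation(x, w=2.0, n_sweeps=50, rng=None):
    """Gibbs sampler for the binary segmentation ('Ising') energy of the slides:
    E(y; x) = sum_i [(1 - x_i/255)[y_i = 1] + (x_i/255)[y_i = 0]] + w * sum_{i~j} [y_i != y_j].

    x : 2-D array of grayvalues in [0, 255]. Returns estimated marginals p(y_i = 1 | x).
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = x.shape
    y = (x > 127).astype(np.int64)            # initialize from thresholding
    counts = np.zeros((H, W))                 # accumulate samples of y_i = 1
    unary1 = 1.0 - x / 255.0                  # E_i(y_i = 1, x_i)
    unary0 = x / 255.0                        # E_i(y_i = 0, x_i)
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                nbrs = [(i + di, j + dj) for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= i + di < H and 0 <= j + dj < W]
                # energy of each label of this pixel, keeping all other pixels fixed
                e1 = unary1[i, j] + w * sum(y[a, b] != 1 for a, b in nbrs)
                e0 = unary0[i, j] + w * sum(y[a, b] != 0 for a, b in nbrs)
                p1 = np.exp(-e1) / (np.exp(-e0) + np.exp(-e1))
                y[i, j] = rng.random() < p1   # re-sample y_i from its conditional
                counts[i, j] += y[i, j]
    return counts / n_sweeps
```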

  54. MAP Prediction
     Compute y* = argmax_y p(y | x).

  55. MAP Prediction – Belief Propagation / Message Passing
     (Figure: a tree-structured factor graph with a leaf-to-root message schedule, messages numbered 1–10, next to a loopy grid graph for comparison.)
     One can also derive message-passing algorithms for MAP prediction.
     ◮ In trees: guaranteed to converge to the optimal solution.
     ◮ In loopy graphs: convergence is not guaranteed, approximate solution.
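Replacing the sums in the chain recursion by maximizations gives the max-product analogue of the messages; the sketch below (with the same kind of hypothetical energy tables as before) recovers the value max_y exp(−E(y)), i.e. exp of the minimal energy. Recovering the minimizing labeling itself additionally requires keeping back-pointers, which is omitted here.

```python
import itertools
import numpy as np

# Max-product on the chain Y_i - F - Y_j - G - Y_k - H - Y_l with hypothetical energies.
K = 4
rng = np.random.default_rng(1)
E_F, E_G, E_H = rng.random((K, K)), rng.random((K, K)), rng.random((K, K))

# Same recursion as sum-product, with max in place of sum.
m_H_to_k = np.exp(-E_H).max(axis=1)
m_G_to_j = (np.exp(-E_G) * m_H_to_k).max(axis=1)
m_F_to_i = (np.exp(-E_F) * m_G_to_j).max(axis=1)
best_value = m_F_to_i.max()                     # = max_y exp(-E(y))

# Check against exhaustive minimization of the energy.
E_min = min(E_F[yi, yj] + E_G[yj, yk] + E_H[yk, yl]
            for yi, yj, yk, yl in itertools.product(range(K), repeat=4))
assert np.isclose(best_value, np.exp(-E_min))
```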

  56–57. MAP Prediction – Graph Cuts
     For loopy graphs, we can find the global optimum only in special cases:
     ◮ Binary output variables: 𝒴_i = {0, 1} for i = 1, …, d,
     ◮ Energy function with only unary and pairwise terms:
       E(y; x, w) = ∑_i E_i(y_i; x) + ∑_{i∼j} E_{i,j}(y_i, y_j; x)
     ◮ Restriction 1 (non-negative unary potentials): E_F(y_i; x, w^t_F) ≥ 0
       (always achievable by reparametrization)
     ◮ Restriction 2 (regular/submodular/attractive pairwise potentials):
       E_F(y_i, y_j; x, w^t_F) = 0, if y_i = y_j,
       E_F(y_i, y_j; x, w^t_F) = E_F(y_j, y_i; x, w^t_F) ≥ 0, otherwise.
       (not always achievable, depends on the task)

  58. MAP Prediction – Graph Cuts (construction)
     ◮ Construct an auxiliary graph:
       ◮ one node per variable i ∈ V,
       ◮ two extra nodes: source s, sink t.
     ◮ Edges and weights:
       edge {i, j}: weight E_F(y_i = 0, y_j = 1; x, w^t_F),
       edge {i, s}: weight E_F(y_i = 1; x, w^t_F),
       edge {i, t}: weight E_F(y_i = 0; x, w^t_F).
     ◮ Find the minimal s–t cut.
     ◮ The solution defines the optimal binary labeling of the original energy minimization problem.
     → GraphCuts algorithms. (Approximate) multi-class extensions exist, see the tutorial book.

  59. GraphCuts Example
     Image segmentation energy:
       E(y; x) = ∑_i [ (1 − x_i/255) ⟦y_i = 1⟧ + (x_i/255) ⟦y_i = 0⟧ ] + w ∑_{i∼j} ⟦y_i ≠ y_j⟧
     All conditions to apply GraphCuts are fulfilled:
     ◮ E_i(y_i, x) ≥ 0,
     ◮ E_ij(y_i, y_j) = 0 for y_i = y_j,
     ◮ E_ij(y_i, y_j) = w > 0 for y_i ≠ y_j.
     (Figure: input image, thresholding, GraphCuts.)
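A toy version of the s–t construction from the previous slide, using networkx's generic minimum_cut for illustration; real implementations use specialized max-flow code (e.g. the Boykov–Kolmogorov algorithm). The labeling convention follows the edge-weight table above: nodes that end up on the source side of the cut take label 0.

```python
import networkx as nx
import numpy as np

def graphcut_segmentation(x, w=2.0):
    """s-t min-cut construction for the binary segmentation energy (sketch).

    Edge weights follow the slide's table: {i,s} carries E_i(y_i = 1),
    {i,t} carries E_i(y_i = 0), and 4-neighbour edges carry w. Nodes on the
    source side of the minimum cut receive label 0, nodes on the sink side 1.
    """
    H, W = x.shape
    G = nx.DiGraph()
    s, t = "s", "t"
    for i in range(H):
        for j in range(W):
            G.add_edge(s, (i, j), capacity=1.0 - x[i, j] / 255.0)  # E_i(y_i = 1)
            G.add_edge((i, j), t, capacity=x[i, j] / 255.0)        # E_i(y_i = 0)
            for di, dj in ((0, 1), (1, 0)):                        # grid neighbours
                a, b = i + di, j + dj
                if a < H and b < W:
                    G.add_edge((i, j), (a, b), capacity=w)
                    G.add_edge((a, b), (i, j), capacity=w)
    cut_value, (source_side, sink_side) = nx.minimum_cut(G, s, t)
    y = np.zeros((H, W), dtype=int)
    for node in sink_side - {t}:
        y[node] = 1
    return y
```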

  60. MAP Prediction – Linear Programming Relaxation
     A more general alternative, 𝒴_i = {1, …, K}:
       E(y; x) = ∑_i E_i(y_i; x) + ∑_{ij} E_ij(y_i, y_j; x)
     Linearize the energy using indicator functions:
       E_i(y_i; x) = ∑_{k=1}^{K} E_i(k; x) ⟦y_i = k⟧ = ∑_{k=1}^{K} a_{i;k} μ_{i;k},   with a_{i;k} := E_i(k; x),
     for new variables μ_{i;k} ∈ {0, 1} with ∑_k μ_{i;k} = 1;
       E_ij(y_i, y_j; x) = ∑_{k=1}^{K} ∑_{l=1}^{K} E_ij(k, l; x) ⟦y_i = k ∧ y_j = l⟧ = ∑_{k,l} a_{ij;kl} μ_{ij;kl},   with a_{ij;kl} := E_ij(k, l; x),
     for new variables μ_{ij;kl} ∈ {0, 1} with ∑_l μ_{ij;kl} = μ_{i;k} and ∑_k μ_{ij;kl} = μ_{j;l}.

  61. MAP Prediction – Linear Programming Relaxation
     Energy minimization becomes
       y* ← μ* := argmin_μ ∑_i ∑_k a_{i;k} μ_{i;k} + ∑_{ij} ∑_{k,l} a_{ij;kl} μ_{ij;kl} = argmin_μ ⟨a, μ⟩
     subject to
       μ_{i;k} ∈ {0, 1},   μ_{ij;kl} ∈ {0, 1},
       ∑_k μ_{i;k} = 1,   ∑_l μ_{ij;kl} = μ_{i;k},   ∑_k μ_{ij;kl} = μ_{j;l}.
     Integer variables, linear objective function, linear constraints:
     → Integer linear program (ILP). Unfortunately, ILPs are, in general, NP-hard.

  62. MAP Prediction – Linear Programming Relaxation
     Relaxation: keep the same objective,
       y* ← μ* := argmin_μ ∑_i ∑_k a_{i;k} μ_{i;k} + ∑_{ij} ∑_{k,l} a_{ij;kl} μ_{ij;kl},
     but replace the integrality constraints by interval constraints:
       μ_{i;k} ∈ [0, 1] (instead of {0, 1}),   μ_{ij;kl} ∈ [0, 1] (instead of {0, 1}),
       ∑_k μ_{i;k} = 1,   ∑_l μ_{ij;kl} = μ_{i;k},   ∑_k μ_{ij;kl} = μ_{j;l}.
     Real-valued variables, linear objective function, linear constraints:
     → Linear program (LP) relaxation.
     LPs can be solved very efficiently, and μ* yields an approximate solution for y*.
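A minimal sketch of this LP relaxation for a single pair of variables with K states, solved with scipy.optimize.linprog. The energy coefficients are random placeholders and the rounding at the end is the simplest possible decoding, not the method advocated in the lecture.

```python
import numpy as np
from scipy.optimize import linprog

# Local-polytope LP relaxation for one pair (y_i, y_j), each with K states.
K = 3
rng = np.random.default_rng(2)
a_i, a_j = rng.random(K), rng.random(K)     # unary energies a_{i;k}, a_{j;l}
a_ij = rng.random((K, K))                   # pairwise energies a_{ij;kl}

# Variable ordering: [mu_i(0..K-1), mu_j(0..K-1), mu_ij(0,0), mu_ij(0,1), ...]
c = np.concatenate([a_i, a_j, a_ij.ravel()])

A_eq, b_eq = [], []
def row():
    return np.zeros(2 * K + K * K)

# normalization: sum_k mu_i(k) = 1 and sum_l mu_j(l) = 1
r = row(); r[0:K] = 1;       A_eq.append(r); b_eq.append(1.0)
r = row(); r[K:2 * K] = 1;   A_eq.append(r); b_eq.append(1.0)
# marginalization: sum_l mu_ij(k, l) = mu_i(k)  for every k
for k in range(K):
    r = row(); r[2 * K + k * K : 2 * K + (k + 1) * K] = 1; r[k] = -1
    A_eq.append(r); b_eq.append(0.0)
# marginalization: sum_k mu_ij(k, l) = mu_j(l)  for every l
for l in range(K):
    r = row(); r[2 * K + l : 2 * K + K * K : K] = 1; r[K + l] = -1
    A_eq.append(r); b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, 1))
mu_i = res.x[:K]
y_i_hat = int(np.argmax(mu_i))   # round the relaxed solution back to a label
```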

  63. MAP Prediction – Custom solutions, e.g. branch-and-bound
     Note: we just try to solve an optimization problem,
       y* = argmin_{y ∈ 𝒴} E(y; x).
     We can use any optimization technique that fits the problem.
     For low-dimensional 𝒴, such as bounding boxes: branch-and-bound.

     (Slides 64–70 repeat this text with a sequence of figures illustrating the branch-and-bound search.)
