
Introduction to Graphical Models. Peter V. Gehler, Max Planck Institute for Intelligent Systems, Tübingen, Germany. ENS/INRIA Summer School, Paris, July 2013.


  1. Structured objects: predicting M variables jointly
     Y = {1, …, K} × {1, …, K} × ⋯ × {1, …, K}
     For each x:
     ◮ K^M values, K^M − 1 d.o.f. → K^M functions of x
     Example: object detection with a variable-size bounding box
     Y ⊂ {1, …, W} × {1, …, H} × {1, …, W} × {1, …, H},   y = (left, top, right, bottom)
     For each x: ¼ · W(W − 1) · H(H − 1) values (millions to billions…)

  2–3. Example: image denoising
     Y = {640 × 480 RGB images} – too much!
     For each x:
     ◮ 16777216^307200 values in p(y | x),
     ◮ ≥ 10^2,000,000 functions.
     We cannot consider all possible distributions, we must impose structure.

  4–6. Probabilistic Graphical Models
     A (probabilistic) graphical model defines
     ◮ a family of probability distributions over a set of random variables, by means of a graph.
     Popular classes of graphical models:
     ◮ Undirected graphical models (Markov random fields),
     ◮ Directed graphical models (Bayesian networks),
     ◮ Factor graphs,
     ◮ Others: chain graphs, influence diagrams, etc.
     The graph encodes conditional independence assumptions between the variables:
     ◮ with N(i) the neighbors of node i in the graph,
       p(y_i | y_{V∖{i}}) = p(y_i | y_{N(i)}),   where y_{V∖{i}} = (y_1, …, y_{i−1}, y_{i+1}, …, y_n).

  7. Example: Pictorial Structures for Articulated Pose Estimation
     (Tree-structured factor graph over the image X and the body-part variables Y_top, Y_head, Y_torso, Y_rarm, Y_larm, Y_rhnd, Y_lhnd, Y_rleg, Y_lleg, Y_rfoot, Y_lfoot, with factors F^(1)_top, F^(2)_top,head, ….)
     ◮ In principle, all parts depend on each other.
     ◮ Knowing where the head is puts constraints on where the feet can be.
     ◮ But conditional independences as specified by the graph:
     ◮ if we know where the left leg is, the left foot's position does not depend on the torso position anymore, etc.
       p(y_lfoot | y_top, …, y_torso, …, y_rfoot, x) = p(y_lfoot | y_lleg, x)

  8–11. Factor Graphs
     ◮ Decomposable output y = (y_1, …, y_|V|)
     ◮ Graph G = (V, 𝓕, E), E ⊆ V × 𝓕, with
       ◮ variable nodes V (circles),
       ◮ factor nodes 𝓕 (boxes),
       ◮ edges E between variable and factor nodes.
     ◮ Each factor F ∈ 𝓕 connects a subset of nodes; write F = {v_1, …, v_|F|} and y_F = (y_{v_1}, …, y_{v_|F|}).
     (Example factor graph over the variables Y_i, Y_j, Y_k, Y_l.)
     ◮ Factorization into potentials ψ at factors:
       p(y) = (1/Z) ∏_{F ∈ 𝓕} ψ_F(y_F)
            = (1/Z) ψ_1(y_l) ψ_2(y_j, y_l) ψ_3(y_i, y_j) ψ_4(y_i, y_k, y_l)   (for the example graph)
     ◮ Z is a normalization constant, called the partition function:
       Z = ∑_{y ∈ 𝒴} ∏_{F ∈ 𝓕} ψ_F(y_F).
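To make the factorization concrete, here is a minimal brute-force sketch in Python of the four-variable example graph above. The potential tables psi1–psi4, the state count K, and the random values are made-up placeholders; enumerating all K^4 configurations is only feasible for toy models and is exactly what the inference machinery later in the lecture avoids.

```python
import itertools
import numpy as np

# Hypothetical factor graph matching the four-variable example of the slides:
# variables y_i, y_j, y_k, y_l with K states each, and factors
# psi_1(y_l), psi_2(y_j, y_l), psi_3(y_i, y_j), psi_4(y_i, y_k, y_l).
K = 3
rng = np.random.default_rng(0)
psi1 = rng.random(K) + 0.1                 # psi_1(y_l)
psi2 = rng.random((K, K)) + 0.1            # psi_2(y_j, y_l)
psi3 = rng.random((K, K)) + 0.1            # psi_3(y_i, y_j)
psi4 = rng.random((K, K, K)) + 0.1         # psi_4(y_i, y_k, y_l)

def unnormalized(yi, yj, yk, yl):
    """Product of all factor potentials at one joint configuration."""
    return psi1[yl] * psi2[yj, yl] * psi3[yi, yj] * psi4[yi, yk, yl]

# Partition function by brute-force enumeration over all K**4 configurations.
Z = sum(unnormalized(*y) for y in itertools.product(range(K), repeat=4))

def p(yi, yj, yk, yl):
    """p(y) = (1/Z) * prod_F psi_F(y_F)."""
    return unnormalized(yi, yj, yk, yl) / Z

# Sanity check: the probabilities sum to one.
assert abs(sum(p(*y) for y in itertools.product(range(K), repeat=4)) - 1.0) < 1e-9
```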

  12. Conditional Distributions
     How to model p(y | x)?
     (The factor graph now also contains observed nodes X_i, X_j.)
     ◮ The potentials become functions of (part of) x as well: ψ_F(y_F; x_F) instead of just ψ_F(y_F), and
       p(y | x) = (1/Z(x)) ∏_{F ∈ 𝓕} ψ_F(y_F; x_F).
     ◮ The partition function depends on x:
       Z(x) = ∑_{y ∈ 𝒴} ∏_{F ∈ 𝓕} ψ_F(y_F; x_F).
     ◮ Note: x is treated just as an argument, not as a random variable.
     → Conditional random fields (CRFs)

  13–14. Conventions: Potentials and Energy Functions
     Assume ψ_F(y_F) > 0. Then
     ◮ instead of potentials, we can also work with energies:
       ψ_F(y_F; x_F) = exp(−E_F(y_F; x_F)),   or equivalently   E_F(y_F; x_F) = −log ψ_F(y_F; x_F).
     ◮ p(y | x) can be written as
       p(y | x) = (1/Z(x)) ∏_{F ∈ 𝓕} ψ_F(y_F; x_F)
                = (1/Z(x)) ∏_{F ∈ 𝓕} exp(−E_F(y_F; x_F)) = (1/Z(x)) exp(−E(y; x))
       with E(y; x) = ∑_{F ∈ 𝓕} E_F(y_F; x_F).

  15–16. Conventions: Energy Minimization
     argmax_{y ∈ 𝒴} p(y | x) = argmax_{y ∈ 𝒴} (1/Z(x)) exp(−E(y; x))
                             = argmax_{y ∈ 𝒴} exp(−E(y; x))
                             = argmax_{y ∈ 𝒴} −E(y; x)
                             = argmin_{y ∈ 𝒴} E(y; x).
     MAP prediction can be performed by energy minimization.
     In practice, one typically models the energy function directly
     → the probability distribution is uniquely determined by it.
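As a baseline, MAP prediction by exhaustive energy minimization can be written in a few lines; the function below is a generic sketch (the names and the toy energy are mine, not from the slides) and is only usable for tiny label spaces.

```python
import itertools

def map_prediction(energy, state_spaces):
    """Exhaustive MAP prediction: argmin over all joint labelings y.

    energy       : callable mapping a tuple y to E(y; x) for the fixed input x
    state_spaces : list of iterables, one per output variable
    Only feasible when prod_i |Y_i| is small; efficient alternatives follow later.
    """
    return min(itertools.product(*state_spaces), key=energy)

# Example: three binary variables with a toy energy.
y_star = map_prediction(lambda y: sum(y) + 2.0 * (y[0] != y[1]), [(0, 1)] * 3)
```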

  17. Example: An Energy Function for Image Segmentation
     Foreground/background image segmentation:
     ◮ X = [0, 255]^{WH}, Y = {0, 1}^{WH}; foreground: y_i = 1, background: y_i = 0.
     ◮ Graph: 4-connected grid.
     ◮ Each output pixel depends on
       ◮ its local grayvalue (input),
       ◮ the neighboring outputs.
     Energy function components ("Ising" model):
     ◮ E_i(y_i = 1, x_i) = 1 − x_i/255,   E_i(y_i = 0, x_i) = x_i/255
       (x_i bright → y_i rather foreground, x_i dark → y_i rather background)
     ◮ E_ij(0, 0) = E_ij(1, 1) = 0,   E_ij(0, 1) = E_ij(1, 0) = ω for ω > 0
       (prefer that neighbors have the same label → smooth labeling)

  18. E(y; x) = ∑_i [ (1 − x_i/255) ⟦y_i = 1⟧ + (x_i/255) ⟦y_i = 0⟧ ] + w ∑_{i∼j} ⟦y_i ≠ y_j⟧
     (Figure: input image, segmentation from thresholding, segmentation from minimal energy.)
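For concreteness, a small sketch of how this segmentation energy could be evaluated in Python; the array layout, the threshold 127, and the weight w = 2.0 are illustrative assumptions, not values from the slides.

```python
import numpy as np

def segmentation_energy(y, x, w=2.0):
    """Evaluate the segmentation ('Ising') energy of the slides:
    E(y; x) = sum_i [(1 - x_i/255)[y_i = 1] + (x_i/255)[y_i = 0]] + w * sum_{i~j} [y_i != y_j]
    for a binary labelling y and a grayscale image x (both H x W arrays)."""
    x = x.astype(float)
    unary = np.where(y == 1, 1.0 - x / 255.0, x / 255.0).sum()
    # 4-connectivity: count disagreements with the right and bottom neighbour once each
    pairwise = (y[:, 1:] != y[:, :-1]).sum() + (y[1:, :] != y[:-1, :]).sum()
    return unary + w * pairwise

def threshold_segmentation(x):
    """Baseline 'segmentation from thresholding': ignores the pairwise term."""
    return (x > 127).astype(int)
```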

  19. What to do with Structured Prediction Models?
     Case 1) p(y | x) is known.
     MAP prediction: predict f : X → Y by solving
       y* = argmax_{y ∈ 𝒴} p(y | x) = argmin_{y ∈ 𝒴} E(y; x).
     Probabilistic inference: compute marginal probabilities p(y_F | x) for any factor F, in particular p(y_i | x) for all i ∈ V.

  20. What to do with Structured Prediction Models?
     Case 2) p(y | x) is unknown, but we have training data.
     Parameter learning: assume a fixed graph structure and learn the potentials/energies (ψ_F),
     among other tasks (learning the graph structure, the variables, etc.)
     ⇒ Topic of Wednesday's lecture.

  21. Probabilistic Inference – Example: Pictorial Structures
     (Figure: input image x, argmax_y p(y | x), and p(y_i | x).)
     ◮ MAP makes a single (structured) prediction (a point estimate): the best overall pose.
     ◮ Marginal probabilities p(y_i | x) give us potential positions and the uncertainty of the individual body parts.

  22. Example: Man-made structure detection
     (Figure: input image x, argmax_y p(y | x), and p(y_i | x).)
     ◮ Task: does a pixel depict a man-made structure or not? y_i ∈ {0, 1}
     ◮ Middle: MAP inference
     ◮ Right: variable marginals
     ◮ Attention: max-marginals ≠ MAP

  23. Probabilistic Inference
     Compute p(y_F | x) and Z(x).

  24–26. Assume y = (y_i, y_j, y_k, y_l), 𝒴 = 𝒴_i × 𝒴_j × 𝒴_k × 𝒴_l, and an energy function E(y; x) compatible with the chain factor graph Y_i – F – Y_j – G – Y_k – H – Y_l.
     Task 1: for any y ∈ 𝒴, compute p(y | x), using
       p(y | x) = (1/Z(x)) exp(−E(y; x)).
     Problem: we don't know Z(x), and computing it using
       Z(x) = ∑_{y ∈ 𝒴} exp(−E(y; x))
     looks expensive (the sum has |𝒴_i| · |𝒴_j| · |𝒴_k| · |𝒴_l| terms).
     A lot of research has been done on how to compute Z(x) efficiently.

  27–37. Probabilistic Inference – Belief Propagation / Message Passing
     (Chain factor graph Y_i – F – Y_j – G – Y_k – H – Y_l as above.)
     For notational simplicity, we drop the dependence on (fixed) x:
       Z = ∑_{y ∈ 𝒴} exp(−E(y))
         = ∑_{y_i ∈ 𝒴_i} ∑_{y_j ∈ 𝒴_j} ∑_{y_k ∈ 𝒴_k} ∑_{y_l ∈ 𝒴_l} exp(−E(y_i, y_j, y_k, y_l))
         = ∑_{y_i} ∑_{y_j} ∑_{y_k} ∑_{y_l} exp(−(E_F(y_i, y_j) + E_G(y_j, y_k) + E_H(y_k, y_l)))
         = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) ∑_{y_k} exp(−E_G(y_j, y_k)) ∑_{y_l} exp(−E_H(y_k, y_l)).
     Define the message r_{H→Y_k} ∈ ℝ^{𝒴_k} by r_{H→Y_k}(y_k) := ∑_{y_l} exp(−E_H(y_k, y_l)). Then
       Z = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) ∑_{y_k} exp(−E_G(y_j, y_k)) r_{H→Y_k}(y_k).
     Defining r_{G→Y_j} ∈ ℝ^{𝒴_j} and r_{F→Y_i} ∈ ℝ^{𝒴_i} analogously,
       r_{G→Y_j}(y_j) := ∑_{y_k} exp(−E_G(y_j, y_k)) r_{H→Y_k}(y_k),
       r_{F→Y_i}(y_i) := ∑_{y_j} exp(−E_F(y_i, y_j)) r_{G→Y_j}(y_j),
     we obtain
       Z = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) r_{G→Y_j}(y_j) = ∑_{y_i} r_{F→Y_i}(y_i).
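The derivation above translates directly into code. The sketch below uses random, hypothetical energy tables for the chain and checks the message-passing result against brute-force enumeration; the variable names follow the slide notation.

```python
import itertools
import numpy as np

# Hypothetical energies for the chain Y_i - F - Y_j - G - Y_k - H - Y_l of the slides.
K = 4
rng = np.random.default_rng(1)
E_F = rng.random((K, K))    # E_F(y_i, y_j)
E_G = rng.random((K, K))    # E_G(y_j, y_k)
E_H = rng.random((K, K))    # E_H(y_k, y_l)

# Messages, computed from the leaf Y_l towards Y_i:
# r_{H -> Y_k}(y_k) = sum_{y_l} exp(-E_H(y_k, y_l))
r_H_to_k = np.exp(-E_H).sum(axis=1)
# r_{G -> Y_j}(y_j) = sum_{y_k} exp(-E_G(y_j, y_k)) * r_{H -> Y_k}(y_k)
r_G_to_j = np.exp(-E_G) @ r_H_to_k
# r_{F -> Y_i}(y_i) = sum_{y_j} exp(-E_F(y_i, y_j)) * r_{G -> Y_j}(y_j)
r_F_to_i = np.exp(-E_F) @ r_G_to_j

Z_messages = r_F_to_i.sum()

# Brute-force check: sum over all K**4 joint configurations.
Z_brute = sum(
    np.exp(-(E_F[yi, yj] + E_G[yj, yk] + E_H[yk, yl]))
    for yi, yj, yk, yl in itertools.product(range(K), repeat=4)
)
assert np.isclose(Z_messages, Z_brute)
```

The message-passing computation costs O(K^2) per factor instead of K^4 for the full sum.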

  38–42. Example: Inference on Trees
     (Factor graph: the chain Y_i – F – Y_j – G – Y_k as before, with two additional leaves attached to Y_k: factor H to Y_l and factor I to Y_m.)
       Z = ∑_{y ∈ 𝒴} exp(−E(y))
         = ∑_{y_i ∈ 𝒴_i} ∑_{y_j ∈ 𝒴_j} ∑_{y_k ∈ 𝒴_k} ∑_{y_l ∈ 𝒴_l} ∑_{y_m ∈ 𝒴_m} exp(−(E_F(y_i, y_j) + ⋯ + E_I(y_k, y_m)))
         = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) ∑_{y_k} exp(−E_G(y_j, y_k)) · [∑_{y_l} exp(−E_H(y_k, y_l))] · [∑_{y_m} exp(−E_I(y_k, y_m))]
         = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) ∑_{y_k} exp(−E_G(y_j, y_k)) · (r_{H→Y_k}(y_k) · r_{I→Y_k}(y_k))
         = ∑_{y_i} ∑_{y_j} exp(−E_F(y_i, y_j)) ∑_{y_k} exp(−E_G(y_j, y_k)) q_{Y_k→G}(y_k),
     where the variable-to-factor message q_{Y_k→G}(y_k) := r_{H→Y_k}(y_k) · r_{I→Y_k}(y_k) collects the incoming factor-to-variable messages.

  43–44. Factor Graph Sum-Product Algorithm
     ◮ "Message": a pair of vectors at each factor graph edge (i, F) ∈ E:
       1. r_{F→Y_i} ∈ ℝ^{𝒴_i}: factor-to-variable message,
       2. q_{Y_i→F} ∈ ℝ^{𝒴_i}: variable-to-factor message.
     ◮ The algorithm iteratively updates the messages.
     ◮ After convergence: Z and p(y_F) can be obtained from the messages.
     → Belief Propagation
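As an illustration of "p(y_F) can be obtained from the messages": in the chain example, a variable marginal such as p(y_j) is, up to normalization, the product of the two factor-to-variable messages arriving at Y_j. The energy tables below are the same hypothetical ones as in the previous sketch.

```python
import numpy as np

# Chain Y_i - F - Y_j - G - Y_k - H - Y_l with hypothetical energy tables as before.
K = 4
rng = np.random.default_rng(1)
E_F, E_G, E_H = rng.random((K, K)), rng.random((K, K)), rng.random((K, K))

# Messages arriving at Y_j from both sides of the chain:
r_H_to_k = np.exp(-E_H).sum(axis=1)          # from the Y_l side
r_G_to_j = np.exp(-E_G) @ r_H_to_k           # r_{G -> Y_j}
r_F_to_j = np.exp(-E_F).sum(axis=0)          # r_{F -> Y_j} = sum_{y_i} exp(-E_F(y_i, y_j))

# Marginal p(y_j): normalized product of incoming factor-to-variable messages.
p_yj = r_F_to_j * r_G_to_j
p_yj /= p_yj.sum()
```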

  45. Example: Pictorial Structures
     (Same tree-structured body-part factor graph as before.)
     ◮ Tree-structured model for articulated pose (Felzenszwalb and Huttenlocher, 2000), (Fischler and Elschlager, 1973)
     ◮ Body-part variables; states: discretized tuples (x, y, s, θ) with
       (x, y) position, s scale, and θ rotation.

  46. Example: Pictorial Structures
     (Figure: input image x and marginals p(y_i | x).)
     ◮ Exact marginals, although the state space is huge and the partition function is therefore a huge sum:
       Z(x) = ∑_{all bodies y} exp(−E(y; x)).

  47. Belief Propagation in Loopy Graphs
     Can we do message passing also in graphs with loops?
     (Figure: a 3 × 3 grid of variables Y_i, …, Y_q connected by pairwise factors A–L; the graph contains cycles.)
     Problem: there is no well-defined leaf-to-root order.
     Suggested solution: Loopy Belief Propagation (LBP)
     ◮ initialize all messages as constant 1,
     ◮ pass messages until convergence.

  48. Belief Propagation in Loopy Graphs
     Loopy Belief Propagation is very popular, but has some problems:
     ◮ it might not converge (e.g. oscillate),
     ◮ even if it does, the computed probabilities are only approximate.
     Many improved message-passing schemes exist (see the tutorial book).

  49. Probabilistic Inference – Variational Inference / Mean Field
     Task: compute marginals p(y_F | x) for a general p(y | x).
     Idea: approximate p(y | x) by a simpler q(y) and use the marginals of q instead:
       q* = argmin_{q ∈ Q} D_KL(q(y) ‖ p(y | x)).
     E.g. naive mean field: Q is the set of all fully factorized distributions q(y) = ∏_{i ∈ V} q_i(y_i).
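A generic coordinate-ascent sketch of the naive mean-field update for a pairwise energy. This is a standard formulation rather than code from the lecture, and the data structures (unary/pair tables, edge list) are assumptions.

```python
import numpy as np

def naive_mean_field(unary, pair, edges, n_iters=50):
    """Coordinate-ascent naive mean field for a pairwise energy
    E(y) = sum_i unary[i][y_i] + sum_{(i,j) in edges} pair[(i,j)][y_i, y_j].

    unary : (n, K) array of unary energies
    pair  : dict mapping an edge (i, j), i < j, to a (K, K) array E_ij(y_i, y_j)
    edges : list of (i, j) pairs
    Returns approximate marginals q_i(y_i) as an (n, K) array.
    """
    n, K = unary.shape
    q = np.full((n, K), 1.0 / K)                 # start from the uniform q_i
    nbrs = {i: [] for i in range(n)}
    for (i, j) in edges:
        nbrs[i].append((j, pair[(i, j)]))        # E_ij as seen from i
        nbrs[j].append((i, pair[(i, j)].T))      # E_ij as seen from j
    for _ in range(n_iters):
        for i in range(n):
            # expected energy of each state of y_i under the current q of the neighbours
            e = unary[i].copy()
            for j, E_ij in nbrs[i]:
                e += E_ij @ q[j]
            q[i] = np.exp(-(e - e.min()))        # subtract the minimum for stability
            q[i] /= q[i].sum()
    return q
```

Each update sets q_i(y_i) ∝ exp(−E_i(y_i) − ∑_{j ∈ N(i)} E_{q_j}[E_ij(y_i, y_j)]), which is the exact minimizer of the KL divergence with respect to q_i, so the objective never increases.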

  50. Probabilistic Inference – Sampling / Markov Chain Monte Carlo
     Task: compute marginals p(y_F | x) for a general p(y | x).
     Idea: rephrase this as computing the expected value of a quantity,
       E_{y ∼ p(y | x, w)}[h(x, y)],
     for some (well-behaved) function h : X × Y → ℝ.
     For probabilistic inference, this step is easy. Set h_{F,z}(x, y) := ⟦y_F = z⟧; then
       E_{y ∼ p(y | x, w)}[h_{F,z}(x, y)] = ∑_{y ∈ 𝒴} p(y | x) ⟦y_F = z⟧ = ∑_{y_F ∈ 𝒴_F} p(y_F | x) ⟦y_F = z⟧ = p(y_F = z | x).

  51–52. Probabilistic Inference – Sampling / Markov Chain Monte Carlo
     Expectations can be computed/approximated by sampling:
     ◮ For fixed x, let y^(1), y^(2), … be i.i.d. samples from p(y | x); then
       E_{y ∼ p(y | x)}[h(x, y)] ≈ (1/S) ∑_{s=1}^{S} h(x, y^(s)).
     ◮ The law of large numbers guarantees convergence for S → ∞.
     ◮ For S independent samples, the approximation error is O(1/√S), independent of the dimension of Y.
     Problem:
     ◮ Producing i.i.d. samples y^(s) from p(y | x) is hard.
     Solution:
     ◮ We can get away with a sequence of dependent samples
       → Markov chain Monte Carlo (MCMC) sampling.

  53. One example of how to do MCMC sampling: the Gibbs sampler
     ◮ Initialize y^(0) = (y_1, …, y_d) arbitrarily.
     ◮ For s = 1, …, S:
       1. Select a variable y_i,
       2. Re-sample y_i ∼ p(y_i | y^(s−1)_{V∖{i}}, x),
       3. Output the sample y^(s) = (y^(s−1)_1, …, y^(s−1)_{i−1}, y_i, y^(s−1)_{i+1}, …, y^(s−1)_d).
     The conditional is cheap to compute:
       p(y_i | y^(s−1)_{V∖{i}}, x) = p(y_i, y^(s−1)_{V∖{i}} | x) / ∑_{y'_i ∈ 𝒴_i} p(y'_i, y^(s−1)_{V∖{i}} | x)
                                   = exp(−E(y_i, y^(s−1)_{V∖{i}}; x)) / ∑_{y'_i ∈ 𝒴_i} exp(−E(y'_i, y^(s−1)_{V∖{i}}; x)).
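A minimal Gibbs sampler for the binary segmentation energy from earlier, sweeping over the pixels in a fixed order. The sweep count, the weight w, the thresholded initialization, and the marginal estimate by simple sample averaging are illustrative choices, not prescriptions from the slides.

```python
import numpy as np

def gibbs_ising_segmentation(x, w=2.0, n_sweeps=50, rng=None):
    """Gibbs sampler for the binary segmentation ('Ising') energy of the slides:
    E(y; x) = sum_i [(1 - x_i/255)[y_i = 1] + (x_i/255)[y_i = 0]] + w * sum_{i~j} [y_i != y_j].

    x : 2-D array of grayvalues in [0, 255]. Returns estimated marginals p(y_i = 1 | x).
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = x.shape
    y = (x > 127).astype(np.int64)            # initialize from thresholding
    counts = np.zeros((H, W))                 # accumulate samples of y_i = 1
    unary1 = 1.0 - x / 255.0                  # E_i(y_i = 1, x_i)
    unary0 = x / 255.0                        # E_i(y_i = 0, x_i)
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                nbrs = [(i + di, j + dj) for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= i + di < H and 0 <= j + dj < W]
                # energy of each label of this pixel, keeping all other pixels fixed
                e1 = unary1[i, j] + w * sum(y[a, b] != 1 for a, b in nbrs)
                e0 = unary0[i, j] + w * sum(y[a, b] != 0 for a, b in nbrs)
                p1 = np.exp(-e1) / (np.exp(-e0) + np.exp(-e1))
                y[i, j] = rng.random() < p1   # re-sample y_i from its conditional
                counts[i, j] += y[i, j]
    return counts / n_sweeps
```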

  54. MAP Prediction
     Compute y* = argmax_y p(y | x).

  55. MAP Prediction – Belief Propagation / Message Passing
     (Figure: a tree-structured factor graph with a leaf-to-root message schedule, messages numbered 1–10, next to a loopy grid graph for comparison.)
     One can also derive message-passing algorithms for MAP prediction.
     ◮ In trees: guaranteed to converge to the optimal solution.
     ◮ In loopy graphs: convergence is not guaranteed, approximate solution.
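Replacing the sums in the chain recursion by maximizations gives the max-product analogue of the messages; the sketch below (with the same kind of hypothetical energy tables as before) recovers the value max_y exp(−E(y)), i.e. exp of the minimal energy. Recovering the minimizing labeling itself additionally requires keeping back-pointers, which is omitted here.

```python
import itertools
import numpy as np

# Max-product on the chain Y_i - F - Y_j - G - Y_k - H - Y_l with hypothetical energies.
K = 4
rng = np.random.default_rng(1)
E_F, E_G, E_H = rng.random((K, K)), rng.random((K, K)), rng.random((K, K))

# Same recursion as sum-product, with max in place of sum.
m_H_to_k = np.exp(-E_H).max(axis=1)
m_G_to_j = (np.exp(-E_G) * m_H_to_k).max(axis=1)
m_F_to_i = (np.exp(-E_F) * m_G_to_j).max(axis=1)
best_value = m_F_to_i.max()                     # = max_y exp(-E(y))

# Check against exhaustive minimization of the energy.
E_min = min(E_F[yi, yj] + E_G[yj, yk] + E_H[yk, yl]
            for yi, yj, yk, yl in itertools.product(range(K), repeat=4))
assert np.isclose(best_value, np.exp(-E_min))
```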

  56–57. MAP Prediction – Graph Cuts
     For loopy graphs, we can find the global optimum only in special cases:
     ◮ Binary output variables: 𝒴_i = {0, 1} for i = 1, …, d,
     ◮ Energy function with only unary and pairwise terms:
       E(y; x, w) = ∑_i E_i(y_i; x) + ∑_{i∼j} E_{i,j}(y_i, y_j; x)
     ◮ Restriction 1 (non-negative unary potentials): E_F(y_i; x, w^t_F) ≥ 0
       (always achievable by reparametrization)
     ◮ Restriction 2 (regular/submodular/attractive pairwise potentials):
       E_F(y_i, y_j; x, w^t_F) = 0, if y_i = y_j,
       E_F(y_i, y_j; x, w^t_F) = E_F(y_j, y_i; x, w^t_F) ≥ 0, otherwise.
       (not always achievable, depends on the task)

  58. MAP Prediction – Graph Cuts (construction)
     ◮ Construct an auxiliary graph:
       ◮ one node per variable i ∈ V,
       ◮ two extra nodes: source s, sink t.
     ◮ Edges and weights:
       edge {i, j}: weight E_F(y_i = 0, y_j = 1; x, w^t_F),
       edge {i, s}: weight E_F(y_i = 1; x, w^t_F),
       edge {i, t}: weight E_F(y_i = 0; x, w^t_F).
     ◮ Find the minimal s–t cut.
     ◮ The solution defines the optimal binary labeling of the original energy minimization problem.
     → GraphCuts algorithms. (Approximate) multi-class extensions exist, see the tutorial book.

  59. GraphCuts Example
     Image segmentation energy:
       E(y; x) = ∑_i [ (1 − x_i/255) ⟦y_i = 1⟧ + (x_i/255) ⟦y_i = 0⟧ ] + w ∑_{i∼j} ⟦y_i ≠ y_j⟧
     All conditions to apply GraphCuts are fulfilled:
     ◮ E_i(y_i, x) ≥ 0,
     ◮ E_ij(y_i, y_j) = 0 for y_i = y_j,
     ◮ E_ij(y_i, y_j) = w > 0 for y_i ≠ y_j.
     (Figure: input image, thresholding, GraphCuts.)
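A toy version of the s–t construction from the previous slide, using networkx's generic minimum_cut for illustration; real implementations use specialized max-flow code (e.g. the Boykov–Kolmogorov algorithm). The labeling convention follows the edge-weight table above: nodes that end up on the source side of the cut take label 0.

```python
import networkx as nx
import numpy as np

def graphcut_segmentation(x, w=2.0):
    """s-t min-cut construction for the binary segmentation energy (sketch).

    Edge weights follow the slide's table: {i,s} carries E_i(y_i = 1),
    {i,t} carries E_i(y_i = 0), and 4-neighbour edges carry w. Nodes on the
    source side of the minimum cut receive label 0, nodes on the sink side 1.
    """
    H, W = x.shape
    G = nx.DiGraph()
    s, t = "s", "t"
    for i in range(H):
        for j in range(W):
            G.add_edge(s, (i, j), capacity=1.0 - x[i, j] / 255.0)  # E_i(y_i = 1)
            G.add_edge((i, j), t, capacity=x[i, j] / 255.0)        # E_i(y_i = 0)
            for di, dj in ((0, 1), (1, 0)):                        # grid neighbours
                a, b = i + di, j + dj
                if a < H and b < W:
                    G.add_edge((i, j), (a, b), capacity=w)
                    G.add_edge((a, b), (i, j), capacity=w)
    cut_value, (source_side, sink_side) = nx.minimum_cut(G, s, t)
    y = np.zeros((H, W), dtype=int)
    for node in sink_side - {t}:
        y[node] = 1
    return y
```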

  60. MAP Prediction – Linear Programming Relaxation
     A more general alternative, 𝒴_i = {1, …, K}:
       E(y; x) = ∑_i E_i(y_i; x) + ∑_{ij} E_ij(y_i, y_j; x)
     Linearize the energy using indicator functions:
       E_i(y_i; x) = ∑_{k=1}^{K} E_i(k; x) ⟦y_i = k⟧ = ∑_{k=1}^{K} a_{i;k} μ_{i;k},   with a_{i;k} := E_i(k; x),
     for new variables μ_{i;k} ∈ {0, 1} with ∑_k μ_{i;k} = 1;
       E_ij(y_i, y_j; x) = ∑_{k=1}^{K} ∑_{l=1}^{K} E_ij(k, l; x) ⟦y_i = k ∧ y_j = l⟧ = ∑_{k,l} a_{ij;kl} μ_{ij;kl},   with a_{ij;kl} := E_ij(k, l; x),
     for new variables μ_{ij;kl} ∈ {0, 1} with ∑_l μ_{ij;kl} = μ_{i;k} and ∑_k μ_{ij;kl} = μ_{j;l}.

  61. MAP Prediction – Linear Programming Relaxation
     Energy minimization becomes
       y* ← μ* := argmin_μ ∑_i ∑_k a_{i;k} μ_{i;k} + ∑_{ij} ∑_{k,l} a_{ij;kl} μ_{ij;kl} = argmin_μ ⟨a, μ⟩
     subject to
       μ_{i;k} ∈ {0, 1},   μ_{ij;kl} ∈ {0, 1},
       ∑_k μ_{i;k} = 1,   ∑_l μ_{ij;kl} = μ_{i;k},   ∑_k μ_{ij;kl} = μ_{j;l}.
     Integer variables, linear objective function, linear constraints:
     → Integer linear program (ILP). Unfortunately, ILPs are, in general, NP-hard.

  62. MAP Prediction – Linear Programming Relaxation
     Relaxation: keep the same objective,
       y* ← μ* := argmin_μ ∑_i ∑_k a_{i;k} μ_{i;k} + ∑_{ij} ∑_{k,l} a_{ij;kl} μ_{ij;kl},
     but replace the integrality constraints by interval constraints:
       μ_{i;k} ∈ [0, 1] (instead of {0, 1}),   μ_{ij;kl} ∈ [0, 1] (instead of {0, 1}),
       ∑_k μ_{i;k} = 1,   ∑_l μ_{ij;kl} = μ_{i;k},   ∑_k μ_{ij;kl} = μ_{j;l}.
     Real-valued variables, linear objective function, linear constraints:
     → Linear program (LP) relaxation.
     LPs can be solved very efficiently, and μ* yields an approximate solution for y*.
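A minimal sketch of this LP relaxation for a single pair of variables with K states, solved with scipy.optimize.linprog. The energy coefficients are random placeholders and the rounding at the end is the simplest possible decoding, not the method advocated in the lecture.

```python
import numpy as np
from scipy.optimize import linprog

# Local-polytope LP relaxation for one pair (y_i, y_j), each with K states.
K = 3
rng = np.random.default_rng(2)
a_i, a_j = rng.random(K), rng.random(K)     # unary energies a_{i;k}, a_{j;l}
a_ij = rng.random((K, K))                   # pairwise energies a_{ij;kl}

# Variable ordering: [mu_i(0..K-1), mu_j(0..K-1), mu_ij(0,0), mu_ij(0,1), ...]
c = np.concatenate([a_i, a_j, a_ij.ravel()])

A_eq, b_eq = [], []
def row():
    return np.zeros(2 * K + K * K)

# normalization: sum_k mu_i(k) = 1 and sum_l mu_j(l) = 1
r = row(); r[0:K] = 1;       A_eq.append(r); b_eq.append(1.0)
r = row(); r[K:2 * K] = 1;   A_eq.append(r); b_eq.append(1.0)
# marginalization: sum_l mu_ij(k, l) = mu_i(k)  for every k
for k in range(K):
    r = row(); r[2 * K + k * K : 2 * K + (k + 1) * K] = 1; r[k] = -1
    A_eq.append(r); b_eq.append(0.0)
# marginalization: sum_k mu_ij(k, l) = mu_j(l)  for every l
for l in range(K):
    r = row(); r[2 * K + l : 2 * K + K * K : K] = 1; r[K + l] = -1
    A_eq.append(r); b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, 1))
mu_i = res.x[:K]
y_i_hat = int(np.argmax(mu_i))   # round the relaxed solution back to a label
```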

  63. MAP Prediction – Custom solutions, e.g. branch-and-bound
     Note: we just try to solve an optimization problem,
       y* = argmin_{y ∈ 𝒴} E(y; x).
     We can use any optimization technique that fits the problem.
     For low-dimensional 𝒴, such as bounding boxes: branch-and-bound.

     (Slides 64–70 repeat this text with a sequence of figures illustrating the branch-and-bound search.)
