Graphical Models
MAP inference
Siamak Ravanbakhsh, Winter 2018
Learning objectives
- MAP inference and its complexity
- exact & approximate MAP inference
- max-product and max-sum message passing
- relationship to LP relaxation
- graph-cuts for MAP inference
Definition & complexity
- MAP: $\arg\max_x p(x)$
  - given a Bayes-net, deciding whether $p(x) > c$ for some $x$ is NP-complete!
  - [figure: side-chain prediction as MAP inference (Yanover & Weiss)]
- Marginal MAP: $\arg\max_x \sum_y p(x, y)$
  - given a Bayes-net for $p(x, y)$, deciding whether $p(x) > c$ for some $x$ is complete for $\mathrm{NP}^{\mathrm{PP}}$
    - PP: a non-deterministic Turing machine that accepts if the majority of paths accept
    - $\mathrm{NP}^{\mathrm{PP}}$: a non-deterministic Turing machine that accepts if a single path accepts, with access to a PP oracle
  - NP-hard even for trees: we cannot use the distributive law
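A small brute-force sketch of why marginal MAP is not simply MAP with $y$ dropped. The table `P` is made-up, unnormalized data over $x \in \{0, 1\}$ and $y \in \{0, 1, 2\}$, assumed purely for illustration; exhaustive enumeration like this is only viable at toy sizes.

```python
from itertools import product

# Hypothetical unnormalized joint p~(x, y); the numbers are invented for illustration.
P = {
    (0, 0): 35, (0, 1): 0, (0, 2): 0,
    (1, 0): 5, (1, 1): 30, (1, 2): 30,
}
X_VALS, Y_VALS = (0, 1), (0, 1, 2)

# MAP over the joint: arg max_{x, y} p(x, y)
joint_map = max(product(X_VALS, Y_VALS), key=lambda xy: P[xy])


def marginal(x):
    """Unnormalized marginal p~(x) = sum_y p~(x, y)."""
    return sum(P[(x, y)] for y in Y_VALS)


# Marginal MAP: arg max_x sum_y p(x, y)
marginal_map_x = max(X_VALS, key=marginal)

print("joint MAP (x, y):", joint_map)     # joint MAP (x, y): (0, 0)
print("marginal MAP x:", marginal_map_x)  # marginal MAP x: 1
```

The single most likely pair has $x = 0$, yet $x = 1$ carries more total mass once $y$ is summed out.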
Problem & terminology
- MAP inference:
  $\arg\max_x p(x) = \arg\max_x \frac{1}{Z} \prod_I \phi_I(x_I) \equiv \arg\max_x \tilde{p}(x) = \arg\max_x \prod_I \phi_I(x_I)$
  - ignore the normalization constant
  - aka max-product inference
- with evidence:
  $\arg\max_x p(x \mid e) = \arg\max_x \frac{p(x, e)}{p(e)} \equiv \arg\max_x p(x, e)$
- log domain:
  $\arg\max_x \tilde{p}(x) \equiv \arg\max_x \sum_I \ln \phi_I(x_I) \equiv \arg\min_x -\ln \tilde{p}(x)$
  - aka max-sum inference
  - aka min-sum inference (energy minimization)
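To make the equivalences concrete, here is a minimal brute-force sketch on a hypothetical three-variable binary chain with two made-up, strictly positive pairwise factors (`phi01`, `phi12`). Because $\ln$ is monotone, the max-product, max-sum and min-sum objectives select the same assignment.

```python
import math
from itertools import product

# Hypothetical pairwise factors on binary x0 - x1 - x2 (strictly positive,
# so every log is finite).
phi01 = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 3.0}
phi12 = {(0, 0): 1.0, (0, 1): 5.0, (1, 0): 2.0, (1, 1): 2.0}


def p_tilde(x):
    """Max-product objective: product of factors (Z is ignored)."""
    return phi01[(x[0], x[1])] * phi12[(x[1], x[2])]


def log_score(x):
    """Max-sum objective: sum of log-factors."""
    return math.log(phi01[(x[0], x[1])]) + math.log(phi12[(x[1], x[2])])


def energy(x):
    """Min-sum objective: negative log of p_tilde."""
    return -log_score(x)


states = list(product((0, 1), repeat=3))
x_max_product = max(states, key=p_tilde)
x_max_sum = max(states, key=log_score)
x_min_sum = min(states, key=energy)
print(x_max_product, x_max_sum, x_min_sum)  # (0, 0, 1) (0, 0, 1) (0, 0, 1)
```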
Max-marginals
- the marginal $\sum_{x \in Val(X)} \phi(x, y)$ used in sum-product inference is replaced with the max-marginal $\max_{x \in Val(X)} \phi(x, y)$
- example: $\phi'(a, c) = \max_b \phi(a, b, c)$
  [figure: tables for $\phi(a, b, c)$ and $\phi'(a, c)$]
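A sketch of max-marginalization for a small table factor, assuming binary $a, b, c$ and made-up entries for $\phi(a, b, c)$; the helper `eliminate_b` is hypothetical. Passing `max` yields the max-marginal $\phi'(a, c)$, and passing `sum` yields the ordinary marginal for contrast.

```python
from itertools import product

# Hypothetical factor phi(a, b, c) over binary variables, keyed by (a, b, c).
phi = {
    (0, 0, 0): 1, (0, 0, 1): 5, (0, 1, 0): 3, (0, 1, 1): 2,
    (1, 0, 0): 4, (1, 0, 1): 1, (1, 1, 0): 6, (1, 1, 1): 7,
}


def eliminate_b(factor, reduce):
    """Eliminate b from factor(a, b, c), combining the entries over b with `reduce`."""
    return {
        (a, c): reduce(factor[(a, b, c)] for b in (0, 1))
        for a, c in product((0, 1), repeat=2)
    }


phi_max = eliminate_b(phi, max)  # max-marginal phi'(a, c) = max_b phi(a, b, c)
phi_sum = eliminate_b(phi, sum)  # ordinary marginal sum_b phi(a, b, c), for contrast
print(phi_max)  # {(0, 0): 3, (0, 1): 5, (1, 0): 6, (1, 1): 7}
print(phi_sum)  # {(0, 0): 4, (0, 1): 7, (1, 0): 10, (1, 1): 8}
```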
distributive law for MAP inference
- $\max(ab, ac) = a \max(b, c)$ for $a \ge 0$ (true for nonnegative factors): max-product inference
- $\max(a + b, a + c) = a + \max(b, c)$: max-sum inference
- $\max(\min(a, b), \min(a, c)) = \min(a, \max(b, c))$: min-max inference
- $ab + ac = a(b + c)$: sum-product inference
- in each identity the left side takes 3 operations and the right side only 2
- we save computation by factoring the operations; the same idea in disguise:
  $\max_{x, y} f(x, y)\, g(y, z) = \max_y g(y, z) \max_x f(x, y)$
  assuming $|Val(X)| = |Val(Y)| = |Val(Z)| = d$, the complexity drops from $O(d^3)$ to $O(d^2)$
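The factoring trick can be checked numerically. This sketch assumes random nonnegative $d \times d$ tables `f` and `g` (nonnegativity is what allows pulling $g(y, z)$ out of the max) and compares the $O(d^3)$ loop with the $O(d^2)$ factored computation of $h(z) = \max_{x, y} f(x, y)\, g(y, z)$.

```python
import math
import random

d = 4
rng = random.Random(0)
# Hypothetical nonnegative tables f[x][y] and g[y][z], each d x d.
f = [[rng.random() for _ in range(d)] for _ in range(d)]
g = [[rng.random() for _ in range(d)] for _ in range(d)]


def naive(f, g):
    """h(z) = max over (x, y) of f(x, y) g(y, z): d^2 products for each of d values of z."""
    return [max(f[x][y] * g[y][z] for x in range(d) for y in range(d)) for z in range(d)]


def factored(f, g):
    """The same h(z) computed as max_y g(y, z) * max_x f(x, y)."""
    m = [max(f[x][y] for x in range(d)) for y in range(d)]            # d^2 work
    return [max(g[y][z] * m[y] for y in range(d)) for z in range(d)]  # d^2 work


print(all(math.isclose(a, b) for a, b in zip(naive(f, g), factored(f, g))))  # True
```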
Max-product variable elimination
- the procedure is similar to VE for sum-product inference: eliminate all the variables
- input: a set of factors (e.g. CPDs) $\Phi^{t=0} = \{\phi_1, \ldots, \phi_K\}$
- output: $\max_x \tilde{p}(x) = \max_x \prod_I \phi_I(x_I)$
- go over $x_{i_1}, \ldots, x_{i_n}$ in some order; at step $t$:
  - collect all the relevant factors: $\Psi^t = \{\phi \in \Phi^{t-1} \mid x_{i_t} \in \mathrm{Scope}[\phi]\}$
  - calculate their product: $\psi^t = \prod_{\phi \in \Psi^t} \phi$
  - max-marginalize out $x_{i_t}$: $\psi'^t = \max_{x_{i_t}} \psi^t$
  - update the set of factors: $\Phi^t = \Phi^{t-1} - \Psi^t + \{\psi'^t\}$
- return the product of the scalars in $\Phi^{t=n}$ as the maximizing value $\max_x \tilde{p}(x)$
  - similar to computing the partition function $Z = \sum_x \tilde{p}(x)$, with sum replaced by max
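Below is a minimal sketch of the max-product VE loop, assuming discrete variables and factors stored as `(scope, table)` pairs of Python tuples and dicts. This representation and the function names `multiply`, `max_out` and `max_product_ve` are hypothetical choices for illustration, not a library API.

```python
from itertools import product


def multiply(factors, card):
    """Product of a list of (scope, table) factors over the union of their scopes."""
    scope = tuple(sorted({v for s, _ in factors for v in s}))
    table = {}
    for assign in product(*(range(card[v]) for v in scope)):
        val = dict(zip(scope, assign))
        prod = 1.0
        for s, t in factors:
            prod *= t[tuple(val[v] for v in s)]
        table[assign] = prod
    return scope, table


def max_out(factor, var):
    """Max-marginalize `var` out of a factor: psi'(rest) = max over var of psi."""
    scope, table = factor
    i = scope.index(var)
    new_table = {}
    for assign, v in table.items():
        key = assign[:i] + assign[i + 1:]
        new_table[key] = max(new_table.get(key, v), v)
    return scope[:i] + scope[i + 1:], new_table


def max_product_ve(factors, order, card):
    """Return max_x prod_I phi_I(x_I); assumes `order` lists every variable."""
    phis = list(factors)                                # Phi^{t=0}
    for var in order:
        psi_set = [f for f in phis if var in f[0]]      # Psi^t
        if not psi_set:                                 # var appears in no factor
            continue
        rest = [f for f in phis if var not in f[0]]
        psi = multiply(psi_set, card)                   # psi^t
        phis = rest + [max_out(psi, var)]               # Phi^t
    result = 1.0                                        # product of the remaining scalars
    for _, table in phis:
        result *= table[()]
    return result


# Hypothetical example: binary chain x0 - x1 - x2 with two pairwise factors.
card = {"x0": 2, "x1": 2, "x2": 2}
phi01 = (("x0", "x1"), {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 3.0})
phi12 = (("x1", "x2"), {(0, 0): 1.0, (0, 1): 5.0, (1, 0): 2.0, (1, 1): 2.0})
best = max_product_ve([phi01, phi12], ["x0", "x1", "x2"], card)
print(best)  # 20.0
```

Eliminating in the reverse order `["x2", "x1", "x0"]` returns the same value; the order only affects the size of the intermediate factors.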
Decoding the max-value
- we need to recover the maximizing assignment $x^*$
- keep the factors $\{\psi^{t=1}, \ldots, \psi^{t=n}\}$ produced during inference
- start from the last eliminated variable: $\psi^{t=n}$ is a function of $x_{i_n}$ alone, so $x^*_{i_n} \leftarrow \arg\max_{x_{i_n}} \psi^{t=n}(x_{i_n})$
- at this point we have $x^*_{i_n}$; $\psi^{t=n-1}$ can only have $x_{i_{n-1}}, x_{i_n}$ in its domain, so $x^*_{i_{n-1}} \leftarrow \arg\max_{x_{i_{n-1}}} \psi^{t=n-1}(x_{i_{n-1}}, x^*_{i_n})$
- and so on: in general $\psi^t$ involves only $x_{i_t}$ and variables eliminated after it, whose maximizing values are already decoded
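A compact sketch of elimination followed by decoding, specialized (an assumption made to keep it short) to a chain $x_0 - x_1 - \cdots - x_{n-1}$ with pairwise factors given as nested lists and eliminated left to right. The hypothetical function `max_product_chain` stores every $\psi^t$, then walks backwards: first the $\arg\max$ of the last table, then each earlier table with the later variable fixed to its decoded value.

```python
def max_product_chain(pairwise, k):
    """Max-product VE with decoding on a chain x_0 - x_1 - ... - x_{n-1}.

    pairwise[t][a][b] = phi_t(x_t = a, x_{t+1} = b); every variable has k states.
    Variables are eliminated in the order x_0, x_1, ..., x_{n-1}.
    Returns (max value, maximizing assignment as a list).
    """
    n = len(pairwise) + 1
    psis = []            # stored psi^t tables, needed for decoding
    msg = [1.0] * k      # psi'^{t-1}(x_t): what earlier eliminations left on x_t
    for t in range(n - 1):
        # psi^t(x_t, x_{t+1}) = psi'^{t-1}(x_t) * phi_t(x_t, x_{t+1})
        psi = [[msg[a] * pairwise[t][a][b] for b in range(k)] for a in range(k)]
        psis.append(psi)
        # psi'^t(x_{t+1}) = max over x_t of psi^t
        msg = [max(psi[a][b] for a in range(k)) for b in range(k)]
    psis.append(msg)     # the last psi is a function of x_{n-1} alone

    # Decoding: start from the last eliminated variable and walk backwards.
    x = [0] * n
    x[n - 1] = max(range(k), key=lambda b: psis[n - 1][b])
    for t in range(n - 2, -1, -1):
        x[t] = max(range(k), key=lambda a: psis[t][a][x[t + 1]])
    return max(psis[n - 1]), x


# Hypothetical binary chain with two made-up pairwise factors.
p0 = [[4.0, 1.0], [2.0, 3.0]]
p1 = [[1.0, 5.0], [2.0, 2.0]]
val, x = max_product_chain([p0, p1], 2)
print(val, x)  # 20.0 [0, 0, 1]
```

Because each step conditions on the already-decoded later variables, ties in an $\arg\max$ do not break consistency: any choice still yields a maximizing assignment.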
Marginal-MAP variable elimination
- the procedure remains similar for $\max_{y_1, \ldots, y_m} \sum_{x_1, \ldots, x_n} \prod_I \phi_I(x_I)$
- max and sum do not commute: in general $\max_x \sum_y \phi(x, y) \ne \sum_y \max_x \phi(x, y)$
  - we cannot use an arbitrary elimination order
- first, eliminate $\{x_1, \ldots, x_n\}$ (sum-product VE)
- then eliminate $\{y_1, \ldots, y_m\}$ (max-product VE)
- decode the maximizing assignment
- [figure] example: exponential complexity despite low tree-width
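A toy sketch of why the order matters, assuming a hypothetical chain $y_1 - x - y_2$ with made-up pairwise factors, where the target is $\max_{y_1, y_2} \sum_x \phi_a(y_1, x)\, \phi_b(x, y_2)$. Summing out $x$ first gives the correct value; maximizing $y_1, y_2$ first and summing afterwards gives a different number.

```python
from itertools import product

K = 2  # every variable is binary in this made-up example

# Hypothetical factors: phi_a(y1, x) and phi_b(x, y2).
phi_a = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 2.0}
phi_b = {(0, 0): 0.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 1.0}

# Correct order: sum out x first (sum-product VE). The result tau couples
# y1 and y2, even though they are not neighbours in the original chain.
tau = {
    (y1, y2): sum(phi_a[(y1, x)] * phi_b[(x, y2)] for x in range(K))
    for y1, y2 in product(range(K), repeat=2)
}
# ... then max out (y1, y2) (max-product VE) and decode.
y_star = max(tau, key=tau.get)
mmap_value = tau[y_star]

# Wrong order: max out y1 and y2 first, then sum over x. This computes
# sum_x [max_{y1} phi_a(y1, x)] * [max_{y2} phi_b(x, y2)] instead.
swapped_value = sum(
    max(phi_a[(y1, x)] for y1 in range(K)) * max(phi_b[(x, y2)] for y2 in range(K))
    for x in range(K)
)

print(y_star, mmap_value, swapped_value)  # (0, 1) 9.0 13.0
```

The factor `tau` over $(y_1, y_2)$ is fill-in forced by the constrained order; on larger models this kind of fill-in is one way marginal MAP can become expensive despite low tree-width.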
Max-product BP
- in clique-trees, cluster-graphs and factor-graphs, the following remain the same:
  - building the chordal graph
  - building the clique-tree
  - tree-width (complexity of inference)
  - ...
- main differences:
  - replacing sum with max
  - decoding the maximizing assignment
  - variational interpretation
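Max-product BP
- example factor-graph: $p(x) = \frac{1}{Z} \prod_I \psi_I(x_I)$
  [figure: factor-graph over $x_1, \ldots, x_5$ with factors such as $\psi_{\{1,2,4\}}$ and $\psi_{\{3,5\}}$]
- variable-to-factor message: $\delta_{i \to I}(x_i) \propto \prod_{J \mid i \in J,\, J \ne I} \delta_{J \to i}(x_i)$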
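A minimal sketch of max-product BP on a hypothetical tree-shaped factor graph $x_0 - A - x_1 - B - x_2$ (not the graph in the figure). `var_to_factor` implements the variable-to-factor message above; `factor_to_var` uses the standard max-product factor-to-variable update $\delta_{I \to i}(x_i) \propto \max_{x_{I \setminus i}} \psi_I(x_I) \prod_{j \in I,\, j \ne i} \delta_{j \to I}(x_j)$, which is not shown on this slide. Messages are normalized to sum to one, which does not change any $\arg\max$.

```python
from itertools import product

K = 2  # all variables binary
# Hypothetical tree factor graph x0 - A - x1 - B - x2 (made-up, strictly positive tables).
factors = {
    "A": ((0, 1), {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 3.0}),
    "B": ((1, 2), {(0, 0): 1.0, (0, 1): 5.0, (1, 0): 2.0, (1, 1): 2.0}),
}
variables = (0, 1, 2)
nbrs = {i: [I for I, (scope, _) in factors.items() if i in scope] for i in variables}


def normalize(m):
    s = sum(m)
    return [v / s for v in m]


def var_to_factor(i, I, f2v):
    """delta_{i->I}(x_i): product of delta_{J->i}(x_i) over factors J containing i, J != I."""
    msg = [1.0] * K
    for J in nbrs[i]:
        if J != I:
            msg = [a * b for a, b in zip(msg, f2v[(J, i)])]
    return normalize(msg)


def factor_to_var(I, i, v2f):
    """delta_{I->i}(x_i): max over the other variables in I of
    psi_I(x_I) times the incoming messages delta_{j->I}(x_j), j != i."""
    scope, table = factors[I]
    msg = [0.0] * K
    for assign in product(range(K), repeat=len(scope)):
        val = table[assign]
        for j, xj in zip(scope, assign):
            if j != i:
                val *= v2f[(j, I)][xj]
        xi = assign[scope.index(i)]
        msg[xi] = max(msg[xi], val)
    return normalize(msg)


def belief(i):
    """Product of incoming factor messages; proportional to the max-marginal on a tree."""
    b = [1.0] * K
    for I in nbrs[i]:
        b = [a * m for a, m in zip(b, f2v[(I, i)])]
    return normalize(b)


edges = [(i, I) for i in variables for I in nbrs[i]]
v2f = {(i, I): [1.0] * K for i, I in edges}
f2v = {(I, i): [1.0] * K for i, I in edges}
for _ in range(5):  # synchronous sweeps; a few suffice on this small tree
    v2f = {(i, I): var_to_factor(i, I, f2v) for i, I in edges}
    f2v = {(I, i): factor_to_var(I, i, v2f) for i, I in edges}

x_map = tuple(max(range(K), key=lambda v: belief(i)[v]) for i in variables)
print(x_map)  # (0, 0, 1)
```

On a tree, the beliefs are proportional to the max-marginals, so decoding each variable by its own $\arg\max$ recovers the MAP assignment when it is unique; with ties, or on loopy graphs, this shortcut can fail.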