Probabilistic Graphical Models: MAP inference
Siamak Ravanbakhsh, Fall 2019
Learning objectives

- MAP inference and its complexity
- exact & approximate MAP inference
- max-product and max-sum message passing
- relationship to LP relaxation
- graph-cuts for MAP inference
Optimization

$x^* = \arg\max_x f(x)$

- may or may not have constraints: $g_c(x) \geq 0 \ \forall c$, $h_d(x) = 0 \ \forall d$
- continuous or discrete (combinatorial) ...

general-purpose approaches:
- local search heuristics: hill-climbing, beam search, tabu search, ...
- simulated annealing
- genetic algorithm
- integer program
- branch and bound: when you can efficiently upper-bound partial assignments

what if $f(x)$ is structured? $f(x) = \sum_I f_I(x_I)$: MAP inference in a graphical model
Definition & complexity

MAP: $\arg\max_x p(x)$
- given a Bayes-net, the decision problem "is $p(x) > c$ for some $x$?" is NP-complete
  (NP: a non-deterministic Turing machine that accepts if a single path accepts)

Marginal MAP: $\arg\max_x \sum_y p(x, y)$
- given a Bayes-net for $p(x, y)$, the decision problem "is $p(x) > c$ for some $x$?" is complete for $\mathrm{NP}^{\mathrm{PP}}$
  (PP: a non-deterministic Turing machine that accepts if the majority of paths accept; $\mathrm{NP}^{\mathrm{PP}}$: a non-deterministic Turing machine with access to a PP oracle)
- marginal MAP is NP-hard even for trees

[figure: side-chain prediction as MAP inference (Yanover & Weiss)]
Problem & terminology

MAP inference:
$\arg\max_x p(x) = \arg\max_x \frac{1}{Z} \prod_I \phi_I(x_I) \equiv \arg\max_x \tilde{p}(x) = \arg\max_x \prod_I \phi_I(x_I)$
- ignore the normalization constant
- aka max-product inference

with evidence:
$\arg\max_x p(x \mid e) = \arg\max_x \frac{p(x, e)}{p(e)} \equiv \arg\max_x p(x, e)$

log domain:
$\arg\max_x \tilde{p}(x) \equiv \arg\max_x \sum_I \ln \phi_I(x_I) \equiv \arg\min_x -\sum_I \ln \phi_I(x_I)$
- aka max-sum inference
- aka min-sum inference (energy minimization)
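A quick numeric check of the log-domain equivalence (a toy example of ours, not from the slides): for positive table factors, taking logs turns the product objective into a sum without changing the argmax.

```python
import numpy as np

rng = np.random.default_rng(0)
phis = np.stack([rng.random(4) for _ in range(3)])  # 3 positive factors over one variable with 4 states

x_prod = phis.prod(axis=0).argmax()        # max-product objective
x_sum = np.log(phis).sum(axis=0).argmax()  # max-sum (log-domain) objective
assert x_prod == x_sum                     # argmax is invariant under the monotone log
```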
Max-marginals

the marginal used in sum-product inference, $\sum_{x \in Val(X)} \phi(x, y)$, is replaced with the max-marginal $\max_{x \in Val(X)} \phi(x, y)$

example: $\phi'(a, c) = \max_b \phi(a, b, c)$
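In array terms (a toy table of our own), max-marginalizing a variable is a max over the corresponding axis, exactly where sum-product would sum:

```python
import numpy as np

phi = np.random.default_rng(0).random((2, 3, 2))  # phi[a, b, c] with |Val(B)| = 3

phi_prime = phi.max(axis=1)  # max-marginal: phi'(a, c) = max_b phi(a, b, c)
marginal = phi.sum(axis=1)   # sum-product would instead compute sum_b phi(a, b, c)
```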
distributive law for MAP inference

- $\max(ab, ac) = a \max(b, c)$: max-product inference
- $\max(a + b, a + c) = a + \max(b, c)$: max-sum inference
- $\max(\min(a, b), \min(a, c)) = \min(a, \max(b, c))$: min-max inference
- $ab + ac = a(b + c)$: sum-product inference

in each case the unfactored side takes 3 operations, the factored side 2: we save computation by factoring

the same operations in disguise:
$\max_{x, y} f(x, y) \, g(y, z) = \max_y g(y, z) \max_x f(x, y)$

assuming $|Val(X)| = |Val(Y)| = |Val(Z)| = d$, the complexity drops from $O(d^3)$ to $O(d^2)$
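A small NumPy check of this saving (toy tables and names of our own): the naive form materializes the full $d^3$ table, while the factored form only touches $d^2$ entries.

```python
import numpy as np

d = 10
rng = np.random.default_rng(0)
f = rng.random((d, d))  # f[x, y]
g = rng.random((d, d))  # g[y, z]

# naive: build f(x,y) * g(y,z) for all (x, y, z), then maximize: O(d^3)
naive = (f[:, :, None] * g[None, :, :]).max(axis=(0, 1))  # one value per z

# factored: max_{x,y} f(x,y) g(y,z) = max_y [ g(y,z) * max_x f(x,y) ]: O(d^2)
factored = (f.max(axis=0)[:, None] * g).max(axis=0)       # one value per z

assert np.allclose(naive, factored)
```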
Max-product variable elimination

the procedure is similar to VE for sum-product inference: eliminate all the variables

input: a set of factors (e.g. CPDs) $\Phi^{t=0} = \{\phi_1, \ldots, \phi_K\}$
output: $\max_x \tilde{p}(x) = \max_x \prod_I \phi_I(x_I)$

go over the variables in some order $x_{i_1}, \ldots, x_{i_n}$:
- collect all the relevant factors: $\Psi^t = \{\phi \in \Phi^{t-1} \mid x_{i_t} \in Scope[\phi]\}$
- calculate their product: $\psi^t = \prod_{\phi \in \Psi^t} \phi$
- max-marginalize out $x_{i_t}$: $\psi'^t = \max_{x_{i_t}} \psi^t$
- update the set of factors: $\Phi^t = \Phi^{t-1} - \Psi^t + \{\psi'^t\}$

return the scalar in $\Phi^{t=n}$ as $\max_x \tilde{p}(x)$

this mirrors the computation of the partition function $Z = \sum_x \tilde{p}(x)$ in sum-product VE
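A minimal sketch of max-product VE over table factors (the `(scope, table)` representation, the `align` helper, and the shared cardinality `d` are our own simplifying choices, not from the slides):

```python
import numpy as np

d = 3  # cardinality of every variable (simplifying assumption)

def align(scope_f, table, scope):
    """Broadcast a factor's table (axes ordered as scope_f) onto the joint scope."""
    t = table.reshape(table.shape + (1,) * (len(scope) - len(scope_f)))
    return np.moveaxis(t, list(range(len(scope_f))), [scope.index(v) for v in scope_f])

def max_product_ve(factors, order):
    """factors: list of (scope, table) pairs; order: elimination order of variables."""
    factors = list(factors)
    for x in order:
        psi_set = [f for f in factors if x in f[0]]            # collect Psi^t
        factors = [f for f in factors if x not in f[0]]
        scope = tuple(dict.fromkeys(v for s, _ in psi_set for v in s))
        psi = np.ones((d,) * len(scope))
        for s, t in psi_set:                                   # product psi^t
            psi = psi * align(s, t, scope)
        psi = psi.max(axis=scope.index(x))                     # max-marginalize x out
        factors.append((tuple(v for v in scope if v != x), psi))
    return float(np.prod([t for _, t in factors]))             # remaining scalars

# toy chain a - b - c, checked against brute force
rng = np.random.default_rng(0)
phi_ab = (("a", "b"), rng.random((d, d)))
phi_bc = (("b", "c"), rng.random((d, d)))
full = phi_ab[1][:, :, None] * phi_bc[1][None, :, :]           # joint table [a, b, c]
assert np.isclose(max_product_ve([phi_ab, phi_bc], ["a", "b", "c"]), full.max())
```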
Decoding the max-value

we need to recover the maximizing assignment $x^*$:
- keep the intermediate products $\psi^{t=1}, \ldots, \psi^{t=n}$ produced during inference
- start from the last eliminated variable: $\psi^{t=n}$ should be a function of $x_{i_n}$ alone, so $x^*_{i_n} \leftarrow \arg\max_{x_{i_n}} \psi^n(x_{i_n})$
- at this point we have $x^*_{i_n}$, and $\psi^{t=n-1}$ can only have $x_{i_{n-1}}, x_{i_n}$ in its domain, so $x^*_{i_{n-1}} \leftarrow \arg\max_{x_{i_{n-1}}} \psi^{n-1}(x_{i_{n-1}}, x^*_{i_n})$
- and so on, back to the first eliminated variable
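A traceback sketch reusing `align`, `d`, and the toy chain from the VE block above (same hypothetical representation): keep each $\psi^t$ before max-marginalizing, then back-substitute in reverse elimination order.

```python
import numpy as np

def max_product_decode(factors, order):
    """Max-product VE that also recovers the maximizing assignment x*."""
    factors, trace = list(factors), []
    for x in order:
        psi_set = [f for f in factors if x in f[0]]
        factors = [f for f in factors if x not in f[0]]
        scope = tuple(dict.fromkeys(v for s, _ in psi_set for v in s))
        psi = np.ones((d,) * len(scope))
        for s, t in psi_set:
            psi = psi * align(s, t, scope)
        trace.append((x, scope, psi))                          # keep psi^t for decoding
        factors.append((tuple(v for v in scope if v != x), psi.max(axis=scope.index(x))))
    assignment = {}
    for x, scope, psi in reversed(trace):
        # every other variable in psi's scope was eliminated later, hence already decoded:
        # fix those coordinates and argmax over the remaining axis
        idx = tuple(assignment[v] if v != x else slice(None) for v in scope)
        assignment[x] = int(psi[idx].argmax())
    return assignment

x_star = max_product_decode([phi_ab, phi_bc], ["a", "b", "c"])
assert (x_star["a"], x_star["b"], x_star["c"]) == np.unravel_index(full.argmax(), full.shape)
```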
Marginal-MAP variable elimination

the procedure remains similar for $\max_{y_1, \ldots, y_m} \sum_{x_1, \ldots, x_n} \prod_I \phi_I$

max and sum do not commute: $\max_x \sum_y \phi(x, y) \neq \sum_y \max_x \phi(x, y)$

so we cannot use an arbitrary elimination order:
- first, eliminate $\{x_1, \ldots, x_n\}$ (sum-prod VE)
- then eliminate $\{y_1, \ldots, y_m\}$ (max-prod VE)
- decode the maximizing value as before
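A toy check (our own numbers, again reusing the chain from the VE block) that max and sum do not commute, followed by marginal-MAP by elimination: sum out the marginalized variables first, then maximize.

```python
import numpy as np

# max and sum do not commute
phi = np.array([[3., 0.],
                [2., 2.]])                             # phi[x, y]
assert phi.sum(axis=1).max() != phi.max(axis=0).sum()  # 4 vs 5

# marginal MAP max_c sum_{a,b} phi_ab(a, b) phi_bc(b, c) on the toy chain:
tau_b = phi_ab[1].sum(axis=0)                          # sum-prod VE: eliminate a
tau_c = (tau_b[:, None] * phi_bc[1]).sum(axis=0)       # sum-prod VE: eliminate b
c_star = int(tau_c.argmax())                           # max-prod VE: maximize over c
assert np.isclose(tau_c[c_star], full.sum(axis=(0, 1)).max())
```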