CPCS Networks – Medical Diagnosis (noisy-OR model)

Test case: no evidence.

[Figure: anytime-mpe(0.0001), upper/lower bound ratio (U/L error) vs. time and parameter i (i = 1 to 21), for cpcs360b and cpcs422b; y-axis: Upper/Lower ratio, x-axis: time (sec), 1 to 1000.]

Time (sec):
  Algorithm                   cpcs360    cpcs422
  elim-mpe                    115.8      1697.6
  anytime-mpe(eps = 0.0001)   70.3       505.2
  anytime-mpe(eps = 0.1)      70.3       110.5
Outline
• Mini-bucket elimination
• Weighted Mini-bucket
• Mini-clustering
• Re-parameterization, cost-shifting
• Iterative Belief propagation
• Iterative-join-graph propagation
Decomposition for Sum
• Generalize the mini-bucket technique to summation via Hölder's inequality.
• Define the weighted (or powered) sum: $\sum_x^{w} f(x) = \left(\sum_x f(x)^{1/w}\right)^{w}$
• The weight acts as a "temperature" interpolating between sum and max: $w = 1$ gives the ordinary sum, and $\sum_x^{w} f(x) \to \max_x f(x)$ as $w \to 0^+$.
• Different weights do not commute: in general $\sum_x^{w_1} \sum_y^{w_2} f \ne \sum_y^{w_2} \sum_x^{w_1} f$.
The Power Sum and Hölder Inequality

Power sum: $\sum_x^{w} f(x) = \left(\sum_x f(x)^{1/w}\right)^{w}$. Hölder's inequality bounds the sum of a product by a product of power sums: for $w_1 + w_2 = 1$, $w_1, w_2 > 0$,
$\sum_x f_1(x)\,f_2(x) \le \left(\sum_x^{w_1} f_1(x)\right)\left(\sum_x^{w_2} f_2(x)\right)$.
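As a concrete check of the power sum's limiting behavior, here is a minimal NumPy sketch (the helper name `power_sum` is ours, not from the slides); it evaluates the operator in log space for numerical stability:

```python
import numpy as np

def power_sum(f, w):
    """Weighted ("power") sum (sum_x f(x)^(1/w))^w, computed in log space."""
    a = np.log(f) / w
    m = a.max()
    return np.exp(w * (m + np.log(np.exp(a - m).sum())))

f = np.array([0.5, 2.0, 3.0, 1.5])
print(power_sum(f, 1.0))    # w = 1: ordinary sum -> 7.0
print(power_sum(f, 1e-6))   # w -> 0+: approaches max_x f(x) -> 3.0
print(f.sum(), f.max())     # reference values
```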
Working Example
• Model: Markov network over variables A, B, C
• Task: compute the partition function Z
(Qiang Liu slides)
Mini-Bucket (Basic Principles)
• Upper bound: replace the sum over a split bucket by sum times max, e.g. $\sum_x f(x)\,g(x) \le \left(\sum_x f(x)\right)\left(\max_x g(x)\right)$
• Lower bound: replace max by min, e.g. $\sum_x f(x)\,g(x) \ge \left(\sum_x f(x)\right)\left(\min_x g(x)\right)$
(Qiang Liu slides)
Hölder Inequality
$\sum_x f_1(x)\,f_2(x) \le \left(\sum_x f_1(x)^{1/w_1}\right)^{w_1} \left(\sum_x f_2(x)^{1/w_2}\right)^{w_2}$
• where $w_1 + w_2 = 1$ and $w_1, w_2 > 0$.
• When $f_1^{1/w_1} \propto f_2^{1/w_2}$, equality is achieved.
(Qiang Liu slides)
G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge Univ. Press, London and New York, 1934.
Reverse Hölder Inequality
• If instead one weight is negative (e.g., $w_1 > 0$, $w_2 < 0$ with $w_1 + w_2 = 1$), the direction of the inequality reverses:
$\sum_x f_1(x)\,f_2(x) \ge \left(\sum_x f_1(x)^{1/w_1}\right)^{w_1} \left(\sum_x f_2(x)^{1/w_2}\right)^{w_2}$.
(Qiang Liu slides)
G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge Univ. Press, London and New York, 1934.
Weighted Mini-Bucket (for summation)

Exact bucket elimination in bucket C:
$\mu_C(b,d,e,f) = \sum_c g(b,c)\,g(c,d)\,g(c,e)\,g(c,f)$

Mini-buckets (split bucket C and apply the power sum):
$\sum_c^{w} g(b,c)\,g(c,d)\,g(c,e)\,g(c,f) \;\le\; \left(\sum_c^{w_1} g(b,c)\,g(c,d)\right)\left(\sum_c^{w_2} g(c,e)\,g(c,f)\right) = \mu_{C\to D}(b,d)\cdot\mu_{C\to E}(e,f)$

where $\sum_y^{w} g(y) = \left(\sum_y g(y)^{1/w}\right)^{w}$ is the weighted or "power" sum operator, and
$\sum_y^{w} g_1(y)\,g_2(y) \le \left(\sum_y^{w_1} g_1(y)\right)\left(\sum_y^{w_2} g_2(y)\right)$ where $w_1 + w_2 = w$ and $w_1, w_2 > 0$.

Processing the remaining buckets (D, E, F, B, A) the same way yields U = upper bound (a lower bound if $w_1 > 0$, $w_2 < 0$). [Liu and Ihler, 2011]
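The two bounds can be checked numerically on a single bucket with two factors over the eliminated variable c. A small sketch with assumed random tables, reusing the `power_sum` helper from above:

```python
import numpy as np

def power_sum(f, w):
    a = np.log(f) / w
    m = a.max()
    return np.exp(w * (m + np.log(np.exp(a - m).sum())))

rng = np.random.default_rng(0)
f = rng.uniform(0.1, 2.0, 5)   # mini-bucket 1: factor over c
g = rng.uniform(0.1, 2.0, 5)   # mini-bucket 2: factor over c

exact = (f * g).sum()                            # exact bucket: sum_c f(c) g(c)
upper = power_sum(f, 0.5) * power_sum(g, 0.5)    # w1 + w2 = 1, both positive
lower = power_sum(f, 2.0) * power_sum(g, -1.0)   # w1 > 0, w2 < 0 (reverse Holder)
print(lower, "<=", exact, "<=", upper)
```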
Weighted Mini-Bucket for Marginal MAP
Bucket Elimination for MMAP

Constrained elimination order: the SUM variables (B, C, D, E) are eliminated first, and the MAX variable (A) last; each bucket is processed with its own operator (SUM for buckets B–E, MAX for bucket A). MAP* is the marginal MAP value.
MB and WMB for Marginal MAP

Marginal MAP over the network A–F, with a constrained order (sum variables first, max variables last):

bucket C (sum, split into mini-buckets with weights $w_1 + w_2 = 1$):
  $\mu_{C\to D}(b,d) = \sum_c^{w_1} g(b,c)\,g(c,d)$
  $\mu_{C\to E}(e,f) = \sum_c^{w_2} g(c,e)\,g(c,f)$
buckets D and E (max): produce $\mu_{D\to F}(b,f)$ and $\mu_{E\to F}(b,f)$
bucket F (max): $\mu_{F\to B}(b) = \max_f \mu_{D\to F}(b,f)\,\mu_{E\to F}(b,f)$
final bucket (max): $U = \max_b g(b)\,\mu_{F\to B}(b)$, an upper bound on the marginal MAP value V

Can optimize over cost-shifting and weights (single-pass "MM" or iterative message passing). [Liu and Ihler, 2011; 2013] [Dechter and Rish, 2003]
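A tiny numeric illustration of the WMB upper bound for marginal MAP, on an assumed two-variable toy model (max over b, sum over c), splitting the sum over c with weights 1/2 + 1/2 = 1:

```python
import numpy as np

def power_sum(F, w, axis):
    """Power sum along one axis, in log space."""
    a = np.log(F) / w
    m = a.max(axis=axis, keepdims=True)
    s = w * (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)))
    return np.exp(s).squeeze(axis)

rng = np.random.default_rng(1)
f = rng.uniform(0.1, 1.0, (4, 5))   # f(b, c)
g = rng.uniform(0.1, 1.0, (4, 5))   # g(b, c)

V = (f * g).sum(axis=1).max()       # exact marginal MAP: max_b sum_c f g
U = (power_sum(f, 0.5, 1) * power_sum(g, 0.5, 1)).max()   # WMB upper bound
print(V, "<=", U)
```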
MBE-MAP: process the MAX buckets with max mini-buckets, and the SUM buckets with weighted mini-buckets.
Initial partitioning
Complexity and Tractability of MBE(i,m)
Outline
• Mini-bucket elimination
• Weighted Mini-bucket
• Mini-clustering
• Re-parameterization, cost-shifting
• Iterative Belief propagation
• Iterative-join-graph propagation
Join-Tree Clustering (Cluster-Tree Elimination)

Clusters 1:ABC, 2:BCDF, 3:BEF, 4:EFG, connected in a chain with separators BC, BF, EF. Messages:
$h_{(1,2)}(b,c) = \sum_a p(a)\,p(b|a)\,p(c|a,b)$
$h_{(2,1)}(b,c) = \sum_{d,f} p(d|b)\,p(f|c,d)\,h_{(3,2)}(b,f)$
$h_{(2,3)}(b,f) = \sum_{c,d} p(d|b)\,p(f|c,d)\,h_{(1,2)}(b,c)$
$h_{(3,2)}(b,f) = \sum_e p(e|b,f)\,h_{(4,3)}(e,f)$
$h_{(3,4)}(e,f) = \sum_b p(e|b,f)\,h_{(2,3)}(b,f)$
$h_{(4,3)}(e,f) = p(G = g_e \mid e,f)$

EXACT algorithm. Time and space: exp(cluster size) = exp(treewidth + 1).
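The message schedule above can be run directly with einsum. A self-contained sketch with random CPTs (our own toy parameterization; evidence G = 1 plays the role of g_e), verifying that CTE is exact by computing P(evidence) from two different clusters:

```python
import numpy as np

rng = np.random.default_rng(0)
def cpt(*shape):                       # random CPT; last axis is the child
    t = rng.uniform(0.1, 1.0, shape)
    return t / t.sum(axis=-1, keepdims=True)

pa    = cpt(2)          # p(a)
pb_a  = cpt(2, 2)       # p(b|a):   [a, b]
pc_ab = cpt(2, 2, 2)    # p(c|a,b): [a, b, c]
pd_b  = cpt(2, 2)       # p(d|b):   [b, d]
pf_cd = cpt(2, 2, 2)    # p(f|c,d): [c, d, f]
pe_bf = cpt(2, 2, 2)    # p(e|b,f): [b, f, e]
pg_ef = cpt(2, 2, 2)    # p(g|e,f): [e, f, g]

g_e = 1                                                   # evidence G = g_e
h12 = np.einsum('a,ab,abc->bc', pa, pb_a, pc_ab)          # cluster 1 -> 2
h43 = pg_ef[:, :, g_e]                                    # cluster 4 -> 3
h32 = np.einsum('bfe,ef->bf', pe_bf, h43)                 # cluster 3 -> 2
h21 = np.einsum('bd,cdf,bf->bc', pd_b, pf_cd, h32)        # cluster 2 -> 1
h23 = np.einsum('bd,cdf,bc->bf', pd_b, pf_cd, h12)        # cluster 2 -> 3
h34 = np.einsum('bfe,bf->ef', pe_bf, h23)                 # cluster 3 -> 4

Pe1 = np.einsum('a,ab,abc,bc->', pa, pb_a, pc_ab, h21)    # P(G=g_e) at cluster 1
Pe4 = np.einsum('ef,ef->', h43, h34)                      # P(G=g_e) at cluster 4
print(Pe1, Pe4)                                           # identical: CTE is exact
```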
We can replace the sum with the power sum, using weights that sum to 1 across the mini-buckets of each bucket.
Mini-Clustering, i-bound = 3

Cluster 1 (ABC): p(a), p(b|a), p(c|a,b)
  $h_{(1,2)}(b,c) = \sum_a p(a)\,p(b|a)\,p(c|a,b)$
Cluster 2 (BCDF), split into mini-clusters {p(d|b), h_{(1,2)}(b,c)} and {p(f|c,d)}:
  $h^1_{(2,3)}(b) = \sum_{c,d} p(d|b)\,h_{(1,2)}(b,c)$
  $h^2_{(2,3)}(f) = \max_{c,d} p(f|c,d)$
Cluster 3 (BEF): p(e|b,f), $h^1_{(2,3)}(b)$, $h^2_{(2,3)}(f)$
Cluster 4 (EFG): p(g|e,f)

APPROXIMATE algorithm. Time and space: exp(i-bound), where the i-bound caps the number of variables in a mini-cluster.
Mini-Clustering - Example

$H_{(1,2)}$:  $h^1_{(1,2)}(b,c) = \sum_a p(a)\,p(b|a)\,p(c|a,b)$
$H_{(2,1)}$:  $h^1_{(2,1)}(b) = \sum_{d,f} p(d|b)\,h^1_{(3,2)}(b,f)$,   $h^2_{(2,1)}(c) = \max_{d,f} p(f|c,d)$
$H_{(2,3)}$:  $h^1_{(2,3)}(b) = \sum_{c,d} p(d|b)\,h^1_{(1,2)}(b,c)$,   $h^2_{(2,3)}(f) = \max_{c,d} p(f|c,d)$
$H_{(3,2)}$:  $h^1_{(3,2)}(b,f) = \sum_e p(e|b,f)\,h^1_{(4,3)}(e,f)$
$H_{(3,4)}$:  $h^1_{(3,4)}(e,f) = \sum_b p(e|b,f)\,h^1_{(2,3)}(b)\,h^2_{(2,3)}(f)$
$H_{(4,3)}$:  $h^1_{(4,3)}(e,f) = p(G = g_e \mid e,f)$
Cluster Tree Elimination vs. Mini-Clustering (clusters 1:ABC, 2:BCDF, 3:BEF, 4:EFG):

  CTE messages:        MC messages:
  h(1,2)(b,c)          H(1,2) = { h1(b,c) }
  h(2,1)(b,c)          H(2,1) = { h1(b), h2(c) }
  h(2,3)(b,f)          H(2,3) = { h1(b), h2(f) }
  h(3,2)(b,f)          H(3,2) = { h1(b,f) }
  h(3,4)(e,f)          H(3,4) = { h1(e,f) }
  h(4,3)(e,f)          H(4,3) = { h1(e,f) }
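The mini-clustering approximation of cluster 2's outgoing message can be checked directly: the product of the two partial messages upper-bounds the exact CTE message elementwise. A sketch with assumed random tables:

```python
import numpy as np

rng = np.random.default_rng(0)
def cpt(*shape):
    t = rng.uniform(0.1, 1.0, shape)
    return t / t.sum(axis=-1, keepdims=True)

pd_b  = cpt(2, 2)                        # p(d|b):   [b, d]
pf_cd = cpt(2, 2, 2)                     # p(f|c,d): [c, d, f]
h12   = rng.uniform(0.1, 1.0, (2, 2))    # stand-in incoming message h(1,2)(b,c)

# exact CTE message from cluster 2 to cluster 3, scope {B, F}:
h23 = np.einsum('bd,cdf,bc->bf', pd_b, pf_cd, h12)

# mini-clustering, i-bound 3: split {p(d|b), h(1,2)} from {p(f|c,d)};
# sum in the first mini-cluster, max in the second
h1 = np.einsum('bd,bc->b', pd_b, h12)    # h1(2,3)(b)
h2 = pf_cd.max(axis=(0, 1))              # h2(2,3)(f)
print(np.all(h23 <= np.outer(h1, h2)))   # True: MC yields an upper bound
```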
Heuristics for Partitioning (Dechter and Rish, 2003; Rollon and Dechter, 2010)
• Scope-based Partitioning Heuristic (SCP): aims to minimize the number of mini-buckets in the partition by placing in each mini-bucket as many functions as possible while respecting the i-bound.
• Alternatively, use a greedy heuristic derived from a distance function to decide which functions go into the same mini-bucket.
Greedy Scope-based Partitioning
Heuristic for Partitioning

Scope-based Partitioning Heuristic (SCP). The scope-based partition heuristic aims to minimize the number of mini-buckets in the partition by including in each mini-bucket as many functions as possible, as long as the i-bound is satisfied. First, single-function mini-buckets are ordered by decreasing arity from left to right. Then each mini-bucket is absorbed into the left-most mini-bucket with which it can be merged. The time complexity of Partition(B, i), where B is the bucket to be partitioned and |B| the number of functions in the bucket, is O(|B| log|B| + |B|²) using the SCP heuristic. The scope-based heuristic is quite fast; its shortcoming is that it does not consider the actual information in the functions.
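A short sketch of the SCP procedure as described above (the data representation, scopes as Python sets, is our assumption):

```python
def scp_partition(bucket, i_bound):
    """Scope-based partitioning: bucket is a list of function scopes (sets of
    variables); returns mini-buckets whose union scopes respect the i-bound."""
    # single-function mini-buckets, ordered by decreasing arity
    minibuckets = [[s] for s in sorted(bucket, key=len, reverse=True)]
    result = []
    for mb in minibuckets:
        for target in result:                      # try left-most first
            if len(set().union(*target, *mb)) <= i_bound:
                target.extend(mb)                  # absorb the mini-bucket
                break
        else:
            result.append(mb)                      # no merge possible
    return result

# bucket C of the running example, i-bound 3: splits into two mini-buckets
print(scp_partition([{'b','c'}, {'c','d'}, {'c','e'}, {'c','f'}], 3))
```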
Greedy Partition as a function of a distance function h
Comparing Mini-Clustering against Belief Propagation. What is belief propagation?
Iterative Belief Propagation
• Belief propagation is exact for poly-trees
• IBP: applying BP iteratively to cyclic networks
[Diagram: one update step for node X1 with parents U1, U2, U3 and children X1, X2, combining π messages from the parents and λ messages from the children to compute BEL(U1)]
• No guarantees for convergence
• Works well for many coding networks
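For concreteness, a minimal sum-product loopy BP sketch on an assumed toy pairwise model (a 3-cycle), not the slides' coding networks; as the slide notes, there is no convergence guarantee in general:

```python
import numpy as np

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (0, 2)]                           # 3-cycle, binary vars
psi = {e: rng.uniform(0.5, 2.0, (2, 2)) for e in edges}    # psi[(i,j)][x_i, x_j]

# directed messages m[(i, j)](x_j), initialized uniform
m = {d: np.ones(2) / 2 for (i, j) in edges for d in [(i, j), (j, i)]}

for _ in range(50):                                    # iterate updates
    new = {}
    for (i, j) in m:
        P = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T
        incoming = np.ones(2)                          # from neighbors k != j
        for (k, l) in m:
            if l == i and k != j:
                incoming = incoming * m[(k, l)]
        msg = P.T @ incoming                           # sum out x_i
        new[(i, j)] = msg / msg.sum()
    m = new

bel = np.ones(2)                                       # belief at variable 0
for (k, l) in m:
    if l == 0:
        bel = bel * m[(k, l)]
print(bel / bel.sum())                                 # approximate marginal
```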
Linear Block Codes

[Diagram: input bits A–H; parity bits p1–p6, each the XOR (+) of a subset of the input bits; transmission through a Gaussian channel with noise σ yields received bits a–h and received parity bits.]
Probabilistic Decoding
• Error-correcting linear block code
• State of the art: an approximate algorithm, iterative belief propagation (IBP), i.e., Pearl's poly-tree algorithm applied to loopy networks
MBE-mpe vs. IBP
• MBE-mpe is better on low-w* codes
• IBP (or BP) is better on randomly generated (high-w*) codes
• Measured: bit error rate (BER) as a function of channel noise (σ)
Grid 15x15 - 10 evidence

[Four plots vs. i-bound comparing MC and IBP on Grid 15x15, evid=10, w*=22, 10 instances: NHD, absolute error, relative error, and time (seconds).]
Outline
• Mini-bucket elimination
• Weighted Mini-bucket
• Mini-clustering
• Iterative Belief propagation
• Iterative-join-graph propagation
• Re-parameterization, cost-shifting
Iterative Belief Propagation
• Belief propagation is exact for poly-trees
• IBP: applying BP iteratively to cyclic networks
• No guarantees for convergence
• Works well for many coding networks
• Let's combine the iterative nature of IBP with anytime behavior: IJGP
Iterative Join Graph Propagation
• Loopy Belief Propagation
  – cyclic graphs, iterative
  – converges fast in practice (no guarantees though)
  – very good approximations (e.g., turbo decoding, LDPC codes, SAT survey propagation)
• Mini-Clustering(i)
  – tree decompositions; only two sets of messages (inward, outward)
  – anytime behavior: can improve with more time by increasing the i-bound
• We want to combine:
  – the iterative virtues of loopy BP
  – the anytime behavior of Mini-Clustering(i)
IJGP - The basic idea
• Apply Cluster Tree Elimination to any join-graph
• Commit to graphs that are I-maps
• Avoid cycles as long as I-mapness is not violated
• Result: use minimal arc-labeled join-graphs
Tree Decomposition for Belief Updating

[Belief network over A–G with functions p(a), p(b|a), p(c|a,b), p(d|b), p(f|c,d), p(e|b,f), p(g|e,f)]
Tree Decomposition for Belief Updating

Clusters and their functions, with separators BC, BF, EF:
  ABC:  p(a), p(b|a), p(c|a,b)
  BCDF: p(d|b), p(f|c,d)
  BEF:  p(e|b,f)
  EFG:  p(g|e,f)
CTE: Cluster Tree Elimination

$h_{(1,2)}(b,c) = \sum_a p(a)\,p(b|a)\,p(c|a,b)$
$h_{(2,1)}(b,c) = \sum_{d,f} p(d|b)\,p(f|c,d)\,h_{(3,2)}(b,f)$
$h_{(2,3)}(b,f) = \sum_{c,d} p(d|b)\,p(f|c,d)\,h_{(1,2)}(b,c)$
$h_{(3,2)}(b,f) = \sum_e p(e|b,f)\,h_{(4,3)}(e,f)$
$h_{(3,4)}(e,f) = \sum_b p(e|b,f)\,h_{(2,3)}(b,f)$
$h_{(4,3)}(e,f) = p(G = g_e \mid e,f)$

Time: O(exp(w+1)). Space: O(exp(sep)). For each cluster, P(X|e) is computed, and also P(e).
Tree Decomposition: Definition and Example

A tree decomposition for a belief network BN = ⟨X, D, G, P⟩ is a triple ⟨T, χ, ψ⟩, where T = (V, E) is a tree and χ and ψ are labeling functions associating with each vertex v ∈ V two sets, χ(v) ⊆ X and ψ(v) ⊆ P, satisfying:
1. For each function p_i ∈ P there is exactly one vertex v ∈ V such that p_i ∈ ψ(v), and scope(p_i) ⊆ χ(v).
2. For each variable X_i, the set {v ∈ V | X_i ∈ χ(v)} forms a connected subtree (running intersection property).

Example: the belief network over A–G and its tree decomposition with clusters ABC {p(a), p(b|a), p(c|a,b)}, BCDF {p(d|b), p(f|c,d)}, BEF {p(e|b,f)}, EFG {p(g|e,f)} and separators BC, BF, EF.
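The two conditions translate into a direct checker. A sketch using the example decomposition above (clusters as sets and the tree as an edge list are our encoding choices):

```python
chi = {1: {'A','B','C'}, 2: {'B','C','D','F'}, 3: {'B','E','F'}, 4: {'E','F','G'}}
psi = {1: [{'A'}, {'A','B'}, {'A','B','C'}],   # scopes of p(a), p(b|a), p(c|a,b)
       2: [{'B','D'}, {'C','D','F'}],          # p(d|b), p(f|c,d)
       3: [{'B','E','F'}],                     # p(e|b,f)
       4: [{'E','F','G'}]}                     # p(g|e,f)
tree = [(1, 2), (2, 3), (3, 4)]

# Condition 1: each function sits in one vertex whose chi covers its scope
ok1 = all(scope <= chi[v] for v, scopes in psi.items() for scope in scopes)

def connected(vs):                             # BFS over the induced subgraph
    seen, frontier = {min(vs)}, [min(vs)]
    while frontier:
        u = frontier.pop()
        for a, b in tree:
            for w in ((b,) if a == u else (a,) if b == u else ()):
                if w in vs and w not in seen:
                    seen.add(w)
                    frontier.append(w)
    return seen == vs

# Condition 2: running intersection property for every variable
variables = set().union(*chi.values())
ok2 = all(connected({v for v in chi if x in chi[v]}) for x in variables)
print(ok1, ok2)                                # True True
```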
IJGP - The basic idea
• Apply Cluster Tree Elimination to any join-graph
• Commit to graphs that are I-maps
• Avoid cycles as long as I-mapness is not violated
• Result: use minimal arc-labeled join-graphs
Minimal Arc-Labeled Decomposition

a) Fragment of an arc-labeled join-graph: clusters ABCDE, BCE, CDEF with arc labels BC, CDE, CE.
b) Shrinking the labels makes it a minimal arc-labeled join-graph: labels BC, DE, CE.
• Use a DFS algorithm to eliminate cycles relative to each variable.
Minimal arc-labeled join-graph
Message Propagation

Cluster 1 = ABCDE with functions p(a), p(c), p(b|a,c), p(d|a,b,e), p(e|b,c) and incoming message h_(3,1)(b,c); cluster 2 = CDEF.

Minimal arc-labeled, sep(1,2) = {D,E}, elim(1,2) = {A,B,C}:
$h_{(1,2)}(d,e) = \sum_{a,b,c} p(a)\,p(c)\,p(b|a,c)\,p(d|a,b,e)\,p(e|b,c)\,h_{(3,1)}(b,c)$

Non-minimal arc-labeled, sep(1,2) = {C,D,E}, elim(1,2) = {A,B}:
$h_{(1,2)}(c,d,e) = \sum_{a,b} p(a)\,p(c)\,p(b|a,c)\,p(d|a,b,e)\,p(e|b,c)\,h_{(3,1)}(b,c)$
IJGP - Example

[Belief network over A–J and the corresponding loopy BP (dual) graph with clusters ABC, ABDE, BCE, CDEF, FGH, FGI, GHIJ]
Arc-Minimal Join-Graph

Arcs labeled with any single variable should form a TREE.
[Join-graph with clusters ABC, ABDE, BCE, CDEF, FGH, FGI, GHIJ and arc labels such as AB, BC, BE, C, DE, CE, F, FG, GH, H, GI]
Collapsing Clusters

[Merging clusters, e.g., ABC and ABDE into ABCDE, and FGH and FGI into FGHI, produces a join-graph with fewer, larger clusters]
Join-Graphs

[A spectrum of join-graphs over the same network, from the loopy BP dual graph with many small clusters to a join-tree with large clusters: more accuracy in one direction, less complexity in the other]
Bounded Decompositions
• We want arc-labeled decompositions such that the cluster size (internal width) is bounded by i (the accuracy parameter)
• Possible approaches to build such decompositions:
  – partition-based algorithms, inspired by the mini-bucket decomposition
  – grouping-based algorithms
Constructing Join-Graphs

a) Schematic mini-bucket(i), i = 3:
  G: (GFE)            P(G|F,E)
  E: (EBF) (EF)       P(E|B,F)
  F: (FCD) (BF)       P(F|C,D)
  D: (DB) (CD)        P(D|B)
  C: (CAB) (CB)       P(C|A,B)
  B: (BA) (AB) (B)    P(B|A)
  A: (A)              P(A)

b) Arc-labeled join-graph decomposition: clusters GFE, EBF, FCD, CDB, CAB, BA, A with separators EF, BF, CD, CB, BA, A.
IJGP properties
• IJGP(i) applies BP to a minimal arc-labeled join-graph whose cluster size is bounded by i
• On join-trees, IJGP finds exact beliefs
• IJGP is a Generalized Belief Propagation algorithm (Yedidia, Freeman, Weiss 2001)
• Complexity of one iteration: time O(deg · (n+N) · d^(i+1)), space O(N · d^i)
Empirical evaluation
• Algorithms: Exact, IBP, MC, IJGP
• Measures: absolute error, relative error, Kullback-Leibler (KL) distance, bit error rate, time
• Networks (all variables are binary): random networks, grid networks (MxM), CPCS 54, 360, 422, coding networks
Coding Networks – Bit Error Rate

[Four plots of BER vs. i-bound for IBP, MC, IJGP on coding networks, N=400, 500–1000 instances, 30 iterations, w*=43, at noise levels σ = 0.22, 0.32, 0.51, 0.65]
CPCS 422 – KL Distance

[Two plots of KL distance vs. i-bound for IJGP (30 iterations, at convergence), MC, and IBP (10 iterations, at convergence); CPCS 422, w*=23, one instance; left: evidence=0, right: evidence=30]
CPCS 422 – KL Distance vs. Iterations

[Two plots of KL distance vs. number of iterations for IJGP(3), IJGP(10), and IBP; CPCS 422, w*=23, one instance; left: evidence=0, right: evidence=30]
Coding Networks - Time

[Plot of time (seconds) vs. i-bound for IJGP (30 iterations), MC, and IBP (30 iterations); coding networks, N=400, 500 instances, w*=43]
More on the Power of Belief Propagation
• BP as local minima of the KL distance (read Darwiche)
• BP's power from a constraint propagation perspective
λ is the grounding for evidence e.
Theorem: Yedidia, Freeman and Weiss, 2005
Summary of IJGP so far
Outline
• Mini-bucket elimination
• Weighted Mini-bucket
• Mini-clustering
• Iterative Belief propagation
• Iterative-join-graph propagation
• Re-parameterization, cost-shifting
Cost-Shifting (Reparameterization)

Shift a function λ(B) from f(B,C) to f(A,B): with λ(b) = 3 and λ(g) = -1,

  A B | f(A,B) | f(A,B)+λ(B)      B C | f(B,C) | f(B,C)-λ(B)
  b b |   6    |   9               b b |   6    |   3
  b g |   0    |  -1               b g |   0    |  -3
  g b |   0    |   3               g b |   0    |   1
  g g |   6    |   5               g g |   6    |   7

  A B C | f(A,B,C) = f(A,B) + f(B,C)
  b b b | 12        g b b | 6
  b b g | 6         g b g | 0
  b g b | 0         g g b | 6
  b g g | 6         g g g | 12

Modify the individual functions, but keep the sum of functions the same.
Tightening the Bound
• Reparameterization (or "cost shifting")
• Decrease the bound without changing the overall function

F(A,B,C) = f1(A,B) + f2(B,C):

  A B | f1      B C | f2      A B C | F        A B C | F
  0 0 | 2.0     0 0 | 1.0     0 0 0 | 3.0      1 0 0 | 4.5
  0 1 | 1.0     0 1 | 0.0     0 0 1 | 2.0      1 0 1 | 3.5
  1 0 | 3.5     1 0 | 1.0     0 1 0 | 2.0      1 1 0 | 4.0
  1 1 | 3.0     1 1 | 3.0     0 1 1 | 4.0      1 1 1 | 6.0

Shifting λ(B) between f1 and f2 (adding λ(B) to f1 and subtracting it from f2) leaves F unchanged; the adjusting functions cancel each other. With λ(B=0) = -1 and λ(B=1) = +1, the decomposition bound max f1 + max f2 drops from 6.5 to 6.0 = max F, i.e., the decomposition bound becomes exact.
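The example can be replayed in a few lines; the λ values below are inferred from the slide's "+1 / -1" annotation, and the bound shown is the max-decomposition upper bound:

```python
import numpy as np

f1 = np.array([[2.0, 1.0], [3.5, 3.0]])   # f1[a, b]
f2 = np.array([[1.0, 0.0], [1.0, 3.0]])   # f2[b, c]
F  = f1[:, :, None] + f2[None, :, :]      # F(a,b,c) = f1(a,b) + f2(b,c)

lam = np.array([-1.0, 1.0])               # lambda(B=0) = -1, lambda(B=1) = +1
f1s = f1 + lam[None, :]                   # f1 + lambda(B)
f2s = f2 - lam[:, None]                   # f2 - lambda(B): shifts cancel
Fs  = f1s[:, :, None] + f2s[None, :, :]
print(np.allclose(F, Fs))                 # True: overall function unchanged

print(F.max())                            # 6.0: exact maximum of F
print(f1.max() + f2.max())                # 6.5: decomposition bound before shift
print(f1s.max() + f2s.max())              # 6.0: bound after shift is exact
```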
Dual Decomposition

Pairwise factors $g_{12}(y_1,y_2)$, $g_{13}(y_1,y_3)$, $g_{23}(y_2,y_3)$ on a cycle:
$G^* = \min_y \sum_\beta g_\beta(y) \;\ge\; \sum_\beta \min_y g_\beta(y)$
• Bound the solution using decomposed optimization
• Solve each factor independently: optimistic bound
Dual Decomposition (tightening)

Reparameterization via messages $\mu_{k\to\beta}(y_k)$ with $\sum_{\beta \ni k} \mu_{k\to\beta}(y_k) = 0$ for all $k$:
$G^* = \min_y \sum_\beta g_\beta(y) \;\ge\; \max_\mu \sum_\beta \min_y \Big[ g_\beta(y) + \sum_{j\in\beta} \mu_{j\to\beta}(y_j) \Big]$
• Bound the solution using decomposed optimization
• Solve independently: optimistic bound
• Tighten the bound by reparameterization: enforce the lost equality constraints via Lagrange multipliers
Dual Decomposition

Many names for the same class of bounds:
• Dual decomposition [Komodakis et al. 2007]
• TRW, MPLP [Wainwright et al. 2005; Globerson & Jaakkola 2007]
• Soft arc consistency [Cooper & Schiex 2004]
• Max-sum diffusion [Werner 2007]
Dual Decomposition

Many ways to optimize the bound:
• Sub-gradient descent [Komodakis et al. 2007; Jojic et al. 2010]
• Coordinate descent [Werner 2007; Globerson & Jaakkola 2007; Sontag et al. 2009; Ihler et al. 2012]
• Proximal optimization [Ravikumar et al. 2010]
• ADMM [Meshi & Globerson 2011; Martins et al. 2011; Forouzan & Ihler 2013]
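As one instance of the first bullet, a projected sub-gradient sketch for the triangle example (step sizes, iteration count, and the random tables are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
g = {b: rng.integers(0, 10, (2, 2)).astype(float)      # g_beta[y_i, y_j]
     for b in [(0, 1), (0, 2), (1, 2)]}

G_star = min(g[(0, 1)][y0, y1] + g[(0, 2)][y0, y2] + g[(1, 2)][y1, y2]
             for y0 in (0, 1) for y1 in (0, 1) for y2 in (0, 1))

mu = {(k, b): np.zeros(2) for b in g for k in b}       # sum_b mu[k, b] = 0

def reparam(b):                                        # g_beta plus its messages
    return g[b] + mu[(b[0], b)][:, None] + mu[(b[1], b)][None, :]

for t in range(1, 201):
    argmins = {b: np.unravel_index(np.argmin(reparam(b)), (2, 2)) for b in g}
    for b in g:                                        # supergradient ascent step
        for pos, k in enumerate(b):
            grad = np.zeros(2)
            grad[argmins[b][pos]] = 1.0
            mu[(k, b)] += grad / t
    for k in (0, 1, 2):                                # project onto sum-to-zero
        bs = [b for b in g if k in b]
        mean = sum(mu[(k, b)] for b in bs) / len(bs)
        for b in bs:
            mu[(k, b)] -= mean

bound = sum(reparam(b).min() for b in g)               # dual (lower) bound
print(bound, "<=", G_star)
```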