Abstraction Sampling in Graphical Models
Filjor Broka*, Rina Dechter, Alexander Ihler, and Kalev Kask
UCI
*In memory of Filjor (1985-2018)
Outline
❑ Background: graphical models, search, sampling
❑ Motivation and the main idea
❑ Abstraction sampling algorithm – OR
❑ The AND/OR case, properness
❑ Properties
❑ Experiments
❑ Conclusion and future directions
Graphical models
Examples: Bayesian networks (e.g., the ALARM network), Markov logic networks (the Friends/Smokes/Cancer example), and deep Boltzmann machines.
Graphical models
A graphical model consists of:
❑ variables (we'll assume discrete)
❑ domains
❑ functions or "factors"
and a combination operator. The combination operator defines an overall function from the factors, e.g., product ("x").

Example factor over two binary variables:
A B f(A,B)
0 0   2
0 1   4
1 0   3
1 1   1

Inference: compute quantities of interest about the distribution, e.g., marginals P(X_i) or the partition function Z = Σ_x Π_α f_α(x_α). The primal graph connects variables that appear together in a factor.
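The partition function can be made concrete with a small script. Below is a minimal sketch (not from the talk): it uses the slide's factor f(A,B) together with a made-up second factor f(B,C) and computes Z by brute-force enumeration with the product combination operator.

```python
from itertools import product

# f_AB matches the slide's table; f_BC is a hypothetical second factor
# added so the combination operator has something to combine over.
f_AB = {(0, 0): 2, (0, 1): 4, (1, 0): 3, (1, 1): 1}
f_BC = {(0, 0): 1, (0, 1): 5, (1, 0): 2, (1, 1): 2}

def partition_function():
    """Brute-force Z: combine the factors with 'x' and sum over all states."""
    return sum(f_AB[(a, b)] * f_BC[(b, c)]
               for a, b, c in product((0, 1), repeat=3))

print(partition_function())  # enumerates all 2^3 assignments
```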
Search trees and search graphs
For a model over variables A–F with the pseudo tree shown: the full OR search tree has 126 nodes, while the context-minimal OR search graph has 28 nodes; the full AND/OR search tree has 54 AND nodes, while the context-minimal AND/OR search graph has 18 AND nodes. Any query can be computed over any of these search spaces.
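To illustrate why the AND/OR space is smaller, here is a hypothetical sketch of computing Z by a depth-first traversal guided by a pseudo tree: once a variable is assigned, the subproblems below its children are independent, so their values are multiplied instead of enumerated jointly. The pseudo tree and the local cost table are made up for illustration.

```python
# Hypothetical pseudo tree: A has two independent children B and C,
# and B has child D.  All variables are binary.
CHILDREN = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': []}

# Placeholder local costs; a real model would multiply every factor whose
# scope becomes fully assigned when 'var' is set (costs may depend on ancestors).
LOCAL = {'A': [0.6, 0.4], 'B': [0.3, 0.7], 'C': [0.8, 0.2], 'D': [0.5, 0.5]}

def subproblem_value(var, context):
    """Z of the subproblem rooted at the OR node for 'var' (AND/OR traversal)."""
    total = 0.0
    for val in (0, 1):                               # OR node: sum over the domain
        ctx = dict(context, **{var: val})
        prod = LOCAL[var][val]                       # AND node: local cost
        for child in CHILDREN[var]:                  # independent child subproblems:
            prod *= subproblem_value(child, ctx)     # multiply, don't cross-product
        total += prod
    return total

print(subproblem_value('A', {}))   # Z = 1.0 for these (normalized) toy costs
```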
Search vs. sampling
◼ Search: enumerate states; no stone unturned, and none visited more than once.
◼ Sampling: exploit randomization and "typicality"; concentration inequalities.
(Heuristic) search performs structured enumeration over all possible states; (Monte Carlo) sampling uses randomization to estimate averages over the state space.
Motivation 1: From sampling to searching
Plain importance sampling draws single configurations; we can instead draw subtrees covering 2 configurations, 4 configurations, and so on, each probe yielding its own Z estimate. Moving along this spectrum means more searching and less sampling.
Motivation 2: From searching to sampling
◼ Merge nodes that root identical (or similar) subtrees; the figure shows two sampled subtrees.
Stratified sampling
◼ Knuth 1975, Chen 1992: estimate the size of a search space.
◼ Partially enumerate, partially sample: subdivide the space into parts, enumerate over the parts, and sample within each part.
◼ A "probe" is a random draw corresponding to multiple states.
Theorem (Rizzo 2007): the variance reduction moving from importance sampling (IS) to stratified IS with k strata (under some conditions) is k · var(Z_K).
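As a concrete reference point, here is a small sketch of Knuth's probing estimator on a hypothetical explicit tree: one probe walks a random root-to-leaf path and multiplies the branching factors it sees, and the average over probes estimates the number of leaves.

```python
import random

# A hypothetical explicit search tree: node -> list of children.
TREE = {
    'root': ['a', 'b', 'c'],
    'a': ['a1', 'a2'], 'b': [], 'c': ['c1'],
    'a1': [], 'a2': [], 'c1': [],
}

def probe(tree, node='root'):
    """One Knuth probe: a single random draw that 'stands for' many states."""
    weight = 1
    while tree[node]:                       # descend until a leaf
        weight *= len(tree[node])           # size of the stratum at this level
        node = random.choice(tree[node])    # sample one representative child
    return weight                           # unbiased estimate of the leaf count

estimates = [probe(TREE) for _ in range(20000)]
print(sum(estimates) / len(estimates))      # close to 4, the true leaf count
```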
Full OR tree: a worked example
An OR search tree over three binary variables A, B, C with edge weights (0.6/0.4 on A; 0.3/0.7 and 0.1/0.9 on B depending on A; and similar weights down to C). The value of a full configuration is the product of its edge weights, e.g., Z(A=0, B=1, C=1) = 0.6 * 0.7 * 0.8.
Method 1 – abstraction sampling on the OR tree
At each level the frontier nodes are grouped into abstract states; one representative per abstract state is sampled (e.g., with probability p = 2/4 = 1/2) and its weight is multiplied by 1/p, so weights grow 1 → 2 → 4 down the probe. The probe's estimate sums weight times path value over its leaves:
Z_est = 4*(0.6*0.7*0.8) + 4*(0.4*0.1*0.6) = 1.44
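The sketch below runs one such probe on a small hypothetical OR tree over three binary variables. It uses the simplest unbiased variant: within each abstract state the representative is chosen uniformly and its weight is multiplied by the state's size; the talk's method instead biases the choice by g·h and reweights accordingly (see the selection sketch after the proposal slide). The cost tables are illustrative, not the slide's exact numbers.

```python
import random
from itertools import product

VARS = ['A', 'B', 'C']
# Illustrative edge-cost tables (cost of assigning a variable given the
# current partial assignment); hypothetical, not the slide's exact model.
COST = {
    'A': lambda asg: (0.6, 0.4)[asg['A']],
    'B': lambda asg: ((0.3, 0.7), (0.1, 0.9))[asg['A']][asg['B']],
    'C': lambda asg: (0.8, 0.2)[asg['C']],
}

def abstraction(var, asg):
    # Group frontier nodes by the value just assigned: 2 abstract states per level.
    return asg[var]

def probe():
    # Frontier entries: (partial assignment, g = path cost, w = probe weight).
    frontier = [({}, 1.0, 1.0)]
    for var in VARS:
        expanded = []
        for asg, g, w in frontier:                    # expand every frontier node
            for val in (0, 1):
                child = dict(asg, **{var: val})
                expanded.append((child, g * COST[var](child), w))
        frontier = []
        for a in {abstraction(var, asg) for asg, _, _ in expanded}:
            group = [n for n in expanded if abstraction(var, n[0]) == a]
            asg, g, w = random.choice(group)           # uniform within the state
            frontier.append((asg, g, w * len(group)))  # w / p, with p = 1/|group|
    return sum(g * w for _, g, w in frontier)          # one probe's Z estimate

# Check against brute force: the average of many probes approaches Z.
Z = sum(COST['A'](s) * COST['B'](s) * COST['C'](s)
        for s in ({'A': a, 'B': b, 'C': c}
                  for a, b, c in product((0, 1), repeat=3)))
print(Z, sum(probe() for _ in range(20000)) / 20000)
```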
Abstraction sampling in AND/OR spaces: properness
In the full AND/OR search tree (here over variables B, A, C, D, with 16 solution trees), not every abstraction is legitimate. An improper abstraction can merge nodes so that the sampled AND/OR search tree is not a subset of the solution trees of the original space, and the resulting estimate Ẑ is biased. A proper abstraction guarantees that every sampled structure corresponds to valid solution trees.
The proposal distribution
❑ Our scheme is like any IS-based scheme: any proposal can be used.
❑ In our experiments we use the proposal p(s) ∝ w(s) · g(s) · h(s), where s is a frontier node with probe weight w(s), g(s) is the cost of the path to s, and h(s) ≥ Z(s) is a heuristic upper bound on the value below s.
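A small sketch of how this proposal would be used when picking one representative out of an abstract state; the node fields 'w', 'g', 'h' (weight, path cost, heuristic bound) are assumed names, not from the talk. The chosen node's weight is divided by its selection probability, which keeps the estimate unbiased.

```python
import random

def select_representative(group):
    """Pick one node from an abstract state with probability p(n) proportional
    to w(n)*g(n)*h(n), and return it with its weight updated to w(n)/p(n)."""
    masses = [n['w'] * n['g'] * n['h'] for n in group]
    total = sum(masses)
    idx = random.choices(range(len(group)), weights=masses, k=1)[0]
    chosen = dict(group[idx])
    chosen['w'] = total / (group[idx]['g'] * group[idx]['h'])  # = w(n) / p(n)
    return chosen

# Example abstract state with three frontier nodes (hypothetical numbers).
state = [{'w': 2.0, 'g': 0.42, 'h': 0.9},
         {'w': 2.0, 'g': 0.06, 'h': 1.0},
         {'w': 4.0, 'g': 0.10, 'h': 0.5}]
print(select_representative(state))
```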
Properties of AS
Theorem [unbiasedness]. The estimate Ẑ generated by AS is unbiased: E[Ẑ] = Z.
Theorem [exact proposal]. If h(n) = Z(n), then Ẑ is exact for any choice of abstraction function a.
Theorem [exact abstraction]. If the abstraction a is Z-isomorphic, namely a(n) = a(n') ⇒ Z(n) = Z(n'), then Ẑ is exact for any choice of proposal.
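A one-step sketch of why the reweighting preserves unbiasedness, written out for a single abstract state A (this is the standard importance-sampling argument, not copied from the slides). Selecting representative n with probability p(n) = w(n)·g(n)·h(n) / Σ_{m∈A} w(m)·g(m)·h(m) and setting its new weight to w'(n) = w(n)/p(n) gives

E[ w'(n) · g(n) · Z(n) ] = Σ_{n∈A} p(n) · (w(n)/p(n)) · g(n) · Z(n) = Σ_{n∈A} w(n) · g(n) · Z(n),

which is exactly the quantity the whole abstract state would have contributed to Z; induction over levels gives E[Ẑ] = Z.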
Experimental setup
◼ Four problem classes: grids, DBN, Promedas, pedigree.
◼ Weighted mini-bucket (WMB) is used to generate the heuristic h.
◼ Two context-based abstractions evaluated: randomized and relaxed.
◼ Competing algorithms: AS (OR and AND/OR), WMB-IS, IJGP-SS.
◼ Questions: the impact of AS on variance, OR vs. AND/OR, and comparison with the competition.
Abstractions based on context
◼ context(X): the ancestors of X in the pseudo tree that disconnect X's subtree from the rest of the problem. For the pseudo tree over A–F: context(A) = [], context(B) = [A], context(C) = [AB], context(E) = [AB], context(D) = [BC], context(F) = [AE].
◼ Context-based (CB) abstractions group nodes by assignments to context variables:
  Relaxed: the most recent subset of the context variables.
  Randomized: a random subset of the context variables.
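A small sketch of what the two context-based abstraction functions might look like in code; the function and parameter names are made up, and j controls how many context variables are kept (larger j means finer abstractions, so less merging).

```python
import random

def relaxed_abstraction(context_vars, assignment, j):
    """Abstract a node by the values of its j most recent context variables."""
    recent = context_vars[-j:] if j > 0 else []
    return tuple((v, assignment[v]) for v in recent)

def randomized_abstraction(context_vars, assignment, j, seed=0):
    """Abstract by a random subset of j context variables (seeded so that all
    nodes of the same variable share the same subset within a run)."""
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(context_vars)), min(j, len(context_vars))))
    return tuple((context_vars[i], assignment[context_vars[i]]) for i in picked)

# e.g. a node of variable D with context [B, C] under the assignment below:
asg = {'A': 0, 'B': 1, 'C': 0, 'D': 1}
print(relaxed_abstraction(['B', 'C'], asg, 1))     # (('C', 0),)
print(randomized_abstraction(['B', 'C'], asg, 1))  # one of (('B', 1),) / (('C', 0),)
```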
Future Directions
❑ Explore the choice of abstraction to reduce variance: relaxed-path-based, relaxed-context-based, and heuristic-based abstractions.
❑ Further explore the trade-offs between:
  ❑ the portion of the search space sampled in a probe vs. the number of probes,
  ❑ the accuracy of the sampling probability (heuristic) vs. the time/memory needed to compute it,
  ❑ sampling in OR space vs. AND/OR space,
  ❑ sampling search trees vs. search graphs.
THANK YOU