Anytime Approximate Inference in Graphical Models
Qi Lou
Final Defense, Dec. 5, 2018
Committee: Alexander Ihler (Chair), Rina Dechter, Sameer Singh
Core of This Thesis
Graphical Models
• Describe structure in large problems
  – Large complex systems
  – Made of "smaller", "local" interactions
  – Complexity emerges through interdependence
• More formally, a graphical model consists of:
  – variables (assumed discrete here)
  – domains
  – (non-negative) functions or "factors"
• Example: a chain model A – B – C with two factors:

  A B | f(A,B)      B C | f(B,C)
  0 0 | 0.24        0 0 | 0.12
  0 1 | 0.56        0 1 | 0.36
  1 0 | 1.1         1 0 | 0.3
  1 1 | 1.2         1 1 | 1.8
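To make the example concrete, here is a minimal sketch (mine, not from the slides) that represents the two factor tables as Python dictionaries and evaluates a configuration; the model's unnormalized joint value is simply the product of its local factors:

```python
# The two factors from the example above, as tables keyed by variable values.
fAB = {(0, 0): 0.24, (0, 1): 0.56, (1, 0): 1.1, (1, 1): 1.2}
fBC = {(0, 0): 0.12, (0, 1): 0.36, (1, 0): 0.3, (1, 1): 1.8}

def joint(a, b, c):
    """Unnormalized joint value of configuration (A=a, B=b, C=c):
    the product of all local factors that mention those variables."""
    return fAB[(a, b)] * fBC[(b, c)]

# e.g., the configuration (A=1, B=1, C=1) has value 1.2 * 1.8 = 2.16
print(joint(1, 1, 1))
```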
Graphical Models: Examples & Tasks
• Maximization (MAP): compute the most probable configuration [Yanover & Weiss 2002]
• Summation & marginalization: compute the "partition function" Z and the marginals p(x_i | y) given an observation y, e.g., image labeling (sky, cow, plane, grass) [Plath et al. 2009]
• Mixed inference (marginal MAP, MEU, ...): influence diagrams & optimal decision-making, e.g., the "oil wildcatter" problem (test and drill decisions under uncertainty about seismic information, the oil underground, and the market) [Raiffa 1968; Shachter 1986]
Inference Queries/Tasks
• Maximum a posteriori (MAP): NP-hard in general
• The partition function Z: #P-complete [Valiant 1979]
• Marginal MAP (MMAP): NP^PP-complete (decision version) [Park 2002]
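On the toy A–B–C model above, all three queries can be written by brute force (a sketch for intuition only; the point of the thesis is that real models are far too large for enumeration):

```python
from itertools import product

fAB = {(0, 0): 0.24, (0, 1): 0.56, (1, 0): 1.1, (1, 1): 1.2}
fBC = {(0, 0): 0.12, (0, 1): 0.36, (1, 0): 0.3, (1, 1): 1.8}
configs = list(product((0, 1), repeat=3))      # all (a, b, c) assignments

def val(a, b, c):
    return fAB[(a, b)] * fBC[(b, c)]

# MAP: the single most probable configuration (max over all states)
map_config = max(configs, key=lambda x: val(*x))

# Partition function: sum over all states
Z = sum(val(*x) for x in configs)

# Marginal MAP: maximize over A after summing out B and C
mmap_a = max((0, 1),
             key=lambda a: sum(val(a, b, c)
                               for b, c in product((0, 1), repeat=2)))

print(map_config, Z, mmap_a)
```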
Desired Properties: Guarantee, Anytime, Anyspace
• Guarantee
  – bounded error at any point in time
• Anytime
  – valid solution at any point
  – solution quality improves with additional computation
• Anyspace
  – runs within limited memory resources
Approximate Inference
• Three major paradigms:
  – Variational methods: reason over small subsets of variables at a time
  – Sampling: use randomization to estimate averages over the state space
  – Search: structured enumeration over all possible states
(Figure: icons for the three paradigms, with search drawn as a tree over binary variables.)
• Variational methods: e.g., tree-reweighted belief propagation [Wainwright et al. 2003], mini-bucket elimination [Dechter & Rish 2001]
• (Monte Carlo) sampling: e.g., importance-sampling-based methods [Bidyuk & Dechter 2007], approximate hash-based counting [Chakraborty et al. 2016]
• (Heuristic) search: e.g., [Henrion 1991; Viricel et al. 2016; Lou et al. 2017]
Main Contributions of This Thesis
Chapter 3: Best-first Search Aided by Variational Heuristics
• Variational methods provide pre-compiled heuristics for search
• AND/OR best-first search (AOBFS)
• Unified best-first search (UBFS)
Search Trees and Summation
• Organize / structure the state space
  – Leaf nodes = model configurations
  – "Value" of a node = sum of the configurations below it
(Figure: a full binary search tree over variables A–F.)
Search Trees and Summation
• Heuristic search for summation
  – A heuristic function upper-bounds the value (the sum below) at any node
  – Expand the tree and compute updated bounds, as in the sketch below
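The following sketch illustrates the idea on the toy A–B–C chain. It is not the thesis's AOBFS: the tree here is a plain OR tree, and a crude max-based bound stands in for the variational heuristic; but it shows how expanding the highest-bound frontier node tightens anytime bounds L ≤ Z ≤ U.

```python
import heapq

# Toy chain model A - B - C, with the two factors from the earlier slide.
fAB = {(0, 0): 0.24, (0, 1): 0.56, (1, 0): 1.1, (1, 1): 1.2}
fBC = {(0, 0): 0.12, (0, 1): 0.36, (1, 0): 0.3, (1, 1): 1.8}
ORDER = ['A', 'B', 'C']               # variable order used by the search
DOM = {v: (0, 1) for v in ORDER}

def upper_bound(assign):
    """Admissible upper bound on the summed value of all completions of
    `assign`: instantiated factors times (number of remaining
    configurations) x (max entry of each uninstantiated factor)."""
    a, b, c = assign.get('A'), assign.get('B'), assign.get('C')
    val = fAB[(a, b)] if (a is not None and b is not None) else max(fAB.values())
    val *= fBC[(b, c)] if (b is not None and c is not None) else max(fBC.values())
    n_rest = 1
    for v in ORDER:
        if v not in assign:
            n_rest *= len(DOM[v])
    return val * n_rest

def best_first_Z(budget=20):
    """Best-first expansion: always expand the frontier node with the
    largest upper bound; U = exact mass of finished leaves + bounds of
    the open frontier, and L = the exact mass alone (all terms >= 0)."""
    exact = 0.0
    frontier = [(-upper_bound({}), ())]       # max-heap via negated bounds
    for step in range(budget):
        if not frontier:
            break                             # tree exhausted: L = U = Z
        _, items = heapq.heappop(frontier)
        assign = dict(items)
        v = ORDER[len(assign)]                # next unassigned variable
        for x in DOM[v]:
            child = {**assign, v: x}
            if len(child) == len(ORDER):
                exact += upper_bound(child)   # at a leaf the bound is exact
            else:
                heapq.heappush(frontier,
                               (-upper_bound(child), tuple(child.items())))
        U = exact + sum(-p for p, _ in frontier)
        print(f"step {step}: L = {exact:.4f} <= Z <= U = {U:.4f}")
    return exact

best_first_Z()
```

Stopping the loop at any step still leaves valid bounds, which is exactly the anytime property the slide describes.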
AND/OR Best-first Search (AOBFS)
• Search space: AND/OR search tree
• Heuristic: weighted mini-bucket bounds
• Priority: best-first; expand the node that potentially reduces the bound gap U – L on Z the most
AND/OR Search Trees [Nilsson 1980; Dechter & Mateescu 2007]
• AND nodes capture decomposition: subproblems that are independent given the assignments above are searched separately (see the sketch below)
• A (full) solution tree corresponds to a complete configuration of all variables
(Figure: a primal graph over variables A–G and its AND/OR search tree.)
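A tiny sketch of why the AND/OR view pays off (illustrative only; fBD is a made-up third factor so that C and D become independent given B): the AND node below each value of B multiplies two independent branch sums instead of enumerating their product space.

```python
# Hand-rolled AND/OR value recursion for summation on a model
# A - B, with C and D independent given B (pseudo tree A -> B -> {C, D}).
fAB = {(0, 0): 0.24, (0, 1): 0.56, (1, 0): 1.1, (1, 1): 1.2}
fBC = {(0, 0): 0.12, (0, 1): 0.36, (1, 0): 0.3, (1, 1): 1.8}
fBD = {(0, 0): 0.5,  (0, 1): 1.5,  (1, 0): 2.0, (1, 1): 0.7}  # made-up factor

def or_value_A():
    # OR node: sum over the values of A.
    return sum(or_value_B(a) for a in (0, 1))

def or_value_B(a):
    # OR node: sum over the values of B, weighted by f(A,B).
    return sum(fAB[(a, b)] * and_value_B(b) for b in (0, 1))

def and_value_B(b):
    # AND node: C and D are independent given B, so their branch sums
    # multiply -- an OR tree would enumerate |C| x |D| combinations.
    return sum(fBC[(b, c)] for c in (0, 1)) * sum(fBD[(b, d)] for d in (0, 1))

print(or_value_A())   # the partition function of this small model
```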
Weighted Mini-Bucket (WMB) Heuristics [Liu & Ihler, ICML 2011]
• Formed by intermediately generated factors (called messages, e.g., λ_D(A), λ_C(B))
• Give an upper (or lower) bound on the node value
• Monotonic: resolving relaxations during search makes the heuristics more (no less) accurate
• Quality can be roughly controlled by the ibound
(Figure: buckets for variables A–G holding factors f(A), f(A,B), ..., with the messages passed between them.)
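To show the flavor of the bound, here is plain mini-bucket elimination (the max-based, unweighted special case of WMB; illustrative only, not the thesis code) on the toy A–B–C chain. A small ibound forces B's bucket {f(A,B), λ_C(B)} to split into two mini-buckets: summing B out of one and maximizing B out of the other yields an upper bound on Z.

```python
# Plain mini-bucket upper bound on the toy chain A - B - C.
fAB = {(0, 0): 0.24, (0, 1): 0.56, (1, 0): 1.1, (1, 1): 1.2}
fBC = {(0, 0): 0.12, (0, 1): 0.36, (1, 0): 0.3, (1, 1): 1.8}

# Bucket of C holds only f(B,C): eliminate C exactly -> message lam_C(B).
lam_C = {b: fBC[(b, 0)] + fBC[(b, 1)] for b in (0, 1)}

# Exact: the bucket of B holds both f(A,B) and lam_C(B).
Z = sum(sum(fAB[(a, b)] * lam_C[b] for b in (0, 1)) for a in (0, 1))

# Mini-bucket: split B's bucket; sum B out of the f(A,B) part and *max*
# B out of the lam_C part. The max dominates every matching term, so
# the result upper-bounds Z.
U = sum(fAB.values()) * max(lam_C.values())

print(Z, U)   # Z = 4.3392, U = 6.51, and indeed U >= Z
```

WMB replaces the hard max with weighted power sums and optimizes the weights, which is what makes the bounds "optimized" rather than fixed.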
Priority
• Intuition: expand the frontier node that potentially reduces the bound gap U – L the most (where L ≤ Z ≤ U)
(Figure: a search tree whose frontier nodes are annotated with "gap priority" and "upper priority".)
Overcoming the Memory Limit
• Main strategy (SMA*-like [Russell 1992]); a schematic sketch follows
  – Keep track of the lowest-priority node as well
  – When the memory limit is reached, delete the lowest-priority nodes and keep expanding the top-priority ones
(Figure: a search tree illustrating deletion of low-priority frontier nodes.)
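A schematic sketch of the memory-bounding policy (again mine, not the thesis's implementation): `expand()` is a synthetic stand-in that splits a node's bound between two children, and, unlike true SMA*, retired nodes are frozen permanently into the bound rather than backed up to the parent and regenerated later. The point it preserves is soundness: U stays a valid upper bound, it just stops improving on retired subtrees.

```python
import heapq
import random

random.seed(0)

def expand(ub):
    """Synthetic stand-in for node expansion: split a node's bound
    between two children whose bounds sum to at most 0.9 * ub."""
    split = random.uniform(0.3, 0.7)
    return [ub * split * 0.9, ub * (1 - split) * 0.9]

frontier = [(-1.0, 0)]        # max-heap of (negated upper bound, node id)
retired, next_id = 0.0, 1     # bound mass frozen from deleted nodes
LIMIT = 8                     # pretend memory limit on the frontier size

for step in range(30):
    neg_ub, _ = heapq.heappop(frontier)          # best-first: largest bound
    for child_ub in expand(-neg_ub):
        heapq.heappush(frontier, (-child_ub, next_id))
        next_id += 1
    while len(frontier) > LIMIT:                 # memory limit reached:
        worst = max(frontier)                    # smallest bound = lowest priority
        frontier.remove(worst)
        heapq.heapify(frontier)
        retired += -worst[0]                     # freeze its mass into U
    U = retired + sum(-p for p, _ in frontier)
    print(f"step {step:2d}: |frontier| = {len(frontier):2d}, U = {U:.4f}")
```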
Anytime Behavior of AOBFS (a) PIC’11/queen5_5_4 (b) Protein/1g6x 22
Aggregated Results
• Number of instances solved to a "tight" tolerance interval; the best (most solved) count for each setting is bolded.
Best-first Search Aided by Variational Heuristics
• Variational weighted mini-bucket (WMB) methods [Liu & Ihler, ICML 2011] provide optimized heuristics for search
• AND/OR best-first search (AOBFS) for Z
• Unified best-first search (UBFS) for marginal MAP
Unified Best-first Search (UBFS)
• Idea: unify max- and sum-inference in one search framework
  – avoids some unnecessary exact evaluation of conditional summation problems
• Principle: focus on reducing the upper bound of the MMAP value as quickly as possible
• How it works (a toy sketch follows):
  – Track the currently most promising (partial) MAP configuration, i.e., the one with the highest upper bound
  – Expand the most "influential" frontier node of that configuration: the frontier node that contributes most to its upper bound, identified by a specially designed "double-priority" system
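A toy illustration of the principle (a sketch, not the actual UBFS algorithm, which interleaves this with AND/OR search and WMB bounds, and refines bounds incrementally rather than jumping straight to exact sums): keep an upper bound per MAP assignment, always refine the most promising one, and stop once its bound is exact and still the best. Here the conditional sum for A = 0 is never evaluated.

```python
from itertools import product

# Toy MMAP on the A - B - C chain: maximize over A, sum over B and C.
fAB = {(0, 0): 0.24, (0, 1): 0.56, (1, 0): 1.1, (1, 1): 1.2}
fBC = {(0, 0): 0.12, (0, 1): 0.36, (1, 0): 0.3, (1, 1): 1.8}

def exact_cond_sum(a):
    """Exact conditional summation for the MAP choice A = a."""
    return sum(fAB[(a, b)] * fBC[(b, c)]
               for b, c in product((0, 1), repeat=2))

# Crude initial upper bounds: sum_B f(a,B) * max_b sum_C f(b,C).
g = {b: fBC[(b, 0)] + fBC[(b, 1)] for b in (0, 1)}
U = {a: (fAB[(a, 0)] + fAB[(a, 1)]) * max(g.values()) for a in (0, 1)}
is_exact = set()

while True:
    a_star = max(U, key=U.get)           # most promising MAP choice
    if a_star in is_exact:               # its bound is exact and still best:
        break                            # a_star is the MMAP solution
    U[a_star] = exact_cond_sum(a_star)   # refine only the promising branch
    is_exact.add(a_star)

print(a_star, U[a_star], is_exact)   # A=1 wins; A=0 was never summed exactly
```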