

PSDDs for Tractable Learning in Structured and Unstructured Spaces
Guy Van den Broeck
DeLBP, Aug 18, 2017

References:
Probabilistic Sentential Decision Diagrams. Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche. KR, 2014.
Learning …


  1. Boolean Constraints
[Two truth tables over the variables L, K, P, A: the unstructured space lists all 16 instantiations; in the structured space, 7 out of 16 instantiations are impossible.]
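The model count behind such a table is easy to check by brute force. A minimal sketch, assuming the constraint is the pair of prerequisites K ⇒ L and A ⇒ P (an assumption; the slide's exact constraint is not recoverable, but this one leaves exactly 9 of the 16 instantiations possible):

```python
from itertools import product

# Hypothetical constraint (assumption): K requires L, and A requires P.
def constraint(L, K, P, A):
    return (not K or L) and (not A or P)

# Enumerate the unstructured space (all 2^4 instantiations) and keep
# only those satisfying the constraint (the structured space).
structured = [v for v in product([0, 1], repeat=4) if constraint(*v)]

print(len(structured))        # 9 possible instantiations
print(16 - len(structured))   # 7 impossible instantiations
```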

  2. Combinatorial Objects: Rankings

  rank  sushi            rank  sushi
  1     fatty tuna       1     shrimp
  2     sea urchin       2     sea urchin
  3     salmon roe       3     salmon roe
  4     shrimp           4     fatty tuna
  5     tuna             5     tuna
  6     squid            6     squid
  7     tuna roll        7     tuna roll
  8     sea eel          8     sea eel
  9     egg              9     egg
  10    cucumber roll    10    cucumber roll

10 items: 3,628,800 rankings
20 items: 2,432,902,008,176,640,000 rankings
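The ranking counts are just factorials; a quick check:

```python
import math

# n distinct items admit n! total rankings.
print(math.factorial(10))  # 3628800
print(math.factorial(20))  # 2432902008176640000
```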

  3. Combinatorial Objects: Rankings
A_ij: item i at position j (n items require n² Boolean variables)
[Same two example sushi rankings as above]

  4. Combinatorial Objects: Rankings
A_ij: item i at position j (n items require n² Boolean variables)
An item may be assigned to more than one position.
A position may contain more than one item.
[Same two example sushi rankings as above]

  5. Encoding Rankings in Logic
A_ij: item i at position j

          pos 1  pos 2  pos 3  pos 4
  item 1  A_11   A_12   A_13   A_14
  item 2  A_21   A_22   A_23   A_24
  item 3  A_31   A_32   A_33   A_34
  item 4  A_41   A_42   A_43   A_44

  6. Encoding Rankings in Logic
A_ij: item i at position j
Constraint: each item i is assigned a unique position (n constraints)

          pos 1  pos 2  pos 3  pos 4
  item 1  A_11   A_12   A_13   A_14
  item 2  A_21   A_22   A_23   A_24
  item 3  A_31   A_32   A_33   A_34
  item 4  A_41   A_42   A_43   A_44

  7. Encoding Rankings in Logic
A_ij: item i at position j
Constraint: each item i is assigned a unique position (n constraints)
Constraint: each position j is assigned a unique item (n constraints)

          pos 1  pos 2  pos 3  pos 4
  item 1  A_11   A_12   A_13   A_14
  item 2  A_21   A_22   A_23   A_24
  item 3  A_31   A_32   A_33   A_34
  item 4  A_41   A_42   A_43   A_44
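Both families of uniqueness constraints can be emitted mechanically as CNF over the n² variables A_ij. A minimal sketch (the pairwise at-most-one encoding is my choice here, not necessarily the one on the slide):

```python
from itertools import combinations, product

n = 4
def var(i, j):            # DIMACS-style variable id for A_ij
    return i * n + j + 1

def exactly_one(lits):
    # one at-least-one clause plus pairwise at-most-one clauses
    return [list(lits)] + [[-a, -b] for a, b in combinations(lits, 2)]

cnf = []
for i in range(n):        # each item i occupies exactly one position
    cnf += exactly_one([var(i, j) for j in range(n)])
for j in range(n):        # each position j holds exactly one item
    cnf += exactly_one([var(i, j) for i in range(n)])

# Brute-force model count over the 2^16 assignments: exactly 4! = 24
# survive, one per permutation matrix.
def satisfies(assign, clauses):
    return all(any((l > 0) == assign[abs(l) - 1] for l in c) for c in clauses)

count = sum(satisfies(a, cnf) for a in product([False, True], repeat=n * n))
print(count)  # 24
```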


  9. Structured Space for Paths (cf. Nature paper)

  10. Structured Space for Paths (cf. Nature paper)
Good variable assignments (represent routes): 184

  11. Structured Space for Paths (cf. Nature paper)
Good variable assignments (represent routes): 184
Bad variable assignments (do not represent routes): 16,777,032

  12. Structured Space for Paths (cf. Nature paper)
Good variable assignments (represent routes): 184
Bad variable assignments (do not represent routes): 16,777,032
The space is easily encoded in logical constraints. See [Choi, Tavabi, Darwiche, AAAI 2016].

  13. Structured Space for Paths (cf. Nature paper)
Good variable assignments (represent routes): 184
Bad variable assignments (do not represent routes): 16,777,032
The space is easily encoded in logical constraints. See [Choi, Tavabi, Darwiche, AAAI 2016].
Unstructured probability space: 184 + 16,777,032 = 2^24
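The 184 / 16,777,032 split can be reproduced on a 4×4 grid: with one Boolean variable per edge there are 24 edges, hence 2^24 total assignments, of which only the "good" ones are routes. A small brute-force sketch, assuming the routes are self-avoiding paths between opposite corners (an assumption, but it matches the slide's counts):

```python
# Count self-avoiding paths between opposite corners of an n x n grid graph.
def count_simple_paths(n=4):
    target = (n - 1, n - 1)
    def dfs(v, visited):
        if v == target:
            return 1
        r, c = v
        total = 0
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < n and 0 <= nxt[1] < n and nxt not in visited:
                visited.add(nxt)
                total += dfs(nxt, visited)
                visited.remove(nxt)
        return total
    return dfs((0, 0), {(0, 0)})

good = count_simple_paths(4)
edges = 2 * 4 * 3              # a 4x4 grid graph has 24 edges
print(good)                    # 184 good assignments
print(2 ** edges - good)       # 16777032 bad assignments
```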

  14. “Deep Architecture” Logic + Probability

  15. Logical Circuits
[Logical circuit diagram over the literals L, ¬L, K, ¬K, P, ¬P, A, ¬A]

  16. Property: Decomposability
[Same logical circuit over the literals L, ¬L, K, ¬K, P, ¬P, A, ¬A]


  18. Property: Determinism
[Same logical circuit, evaluated on the input L, K, P, A]

  19. Sentential Decision Diagram (SDD)
[The same circuit, now viewed as an SDD, evaluated on the input L, K, P, A]



  22. Tractable for Logical Inference
• Is structured space empty? (SAT)
• Count size of structured space (#SAT)
• Check equivalence of spaces

  23. Tractable for Logical Inference
• Is structured space empty? (SAT)
• Count size of structured space (#SAT)
• Check equivalence of spaces
• Algorithms linear in circuit size (pass up, pass down, similar to backprop)
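Model counting (#SAT) in one bottom-up pass is easy to sketch on a toy example. Assuming a smooth, decomposable, deterministic circuit (so an AND multiplies and an OR adds its children's counts), here is K ⇒ L over {L, K}, which has 3 models:

```python
# Tiny smooth, decomposable, deterministic circuit as nested tuples:
# AND children range over disjoint variables, OR children are mutually
# exclusive, and every literal contributes a count of 1.
def model_count(node):
    kind = node[0]
    if kind == "lit":
        return 1
    counts = [model_count(c) for c in node[1:]]
    if kind == "and":
        return counts[0] * counts[1]
    return counts[0] + counts[1]          # "or"

# (L AND (K OR not-K)) OR (not-L AND not-K), i.e. K => L, smoothed
circuit = ("or",
           ("and", ("lit", "L"), ("or", ("lit", "K"), ("lit", "-K"))),
           ("and", ("lit", "-L"), ("lit", "-K")))
print(model_count(circuit))  # 3
```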


  25. PSDD: Probabilistic SDD
[The SDD circuit annotated with parameters: 0.1, 0.6, 0.3 at the root; 0.6/0.4, 0.8/0.2, 0.25/0.75, 0.9/0.1 at the decision nodes; 1/0 on the literal inputs]

  26. PSDD: Probabilistic SDD
[Same parameterized circuit]
Input: L, K, P, A

  27. PSDD: Probabilistic SDD
[Same parameterized circuit, with the wires selected by the input highlighted]
Input: L, K, P, A

  28. PSDD: Probabilistic SDD
[Same parameterized circuit, with the wires selected by the input highlighted]
Input: L, K, P, A
Pr(L, K, P, A) = 0.3 × 1.0 × 0.8 × 0.4 × 0.25 = 0.024
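Evaluating the PSDD on a complete input reduces to multiplying the parameters on the wires consistent with that input, exactly as the slide computes:

```python
from math import prod

# Parameters picked out by the input L, K, P, A
# (values copied from the slide's computation).
branch_params = [0.3, 1.0, 0.8, 0.4, 0.25]
print(prod(branch_params))  # ≈ 0.024
```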

  29. PSDD nodes induce a normalized distribution!
[Same parameterized PSDD circuit]


  31. PSDD nodes induce a normalized distribution!
[Same parameterized PSDD circuit]
Can read probabilistic independences off the circuit structure.

  32. Tractable for Probabilistic Inference
• MAP inference: find the most likely assignment (otherwise NP-complete)
• Computing conditional probabilities Pr(x|y) (otherwise PP-complete)
• Sampling from Pr(x|y)

  33. Tractable for Probabilistic Inference
• MAP inference: find the most likely assignment (otherwise NP-complete)
• Computing conditional probabilities Pr(x|y) (otherwise PP-complete)
• Sampling from Pr(x|y)
• Algorithms linear in circuit size (pass up, pass down, similar to backprop)

  34. Learning PSDDs Logic + Probability + ML

  35. Parameters are Interpretable
[Same parameterized PSDD circuit]
Explainable AI DARPA Program

  36. Parameters are Interpretable
[Same circuit, with one branch annotated "Student takes course L"]
Explainable AI DARPA Program

  37. Parameters are Interpretable
[Same circuit, with branches annotated "Student takes course L" and "Student takes course P"]
Explainable AI DARPA Program

  38. Parameters are Interpretable
[Same circuit; the parameter on the "Student takes course P" branch beneath "Student takes course L" reads as the probability of P given L]
Explainable AI DARPA Program

  39. Learning Algorithms
• Parameter learning: closed-form maximum likelihood from complete data; one pass over the data to estimate Pr(x|y)

  40. Learning Algorithms
• Parameter learning: closed-form maximum likelihood from complete data; one pass over the data to estimate Pr(x|y). Not a lot to say: very easy!

  41. Learning Algorithms
• Parameter learning: closed-form maximum likelihood from complete data; one pass over the data to estimate Pr(x|y). Not a lot to say: very easy!
• Circuit learning (naïve): compile the constraints to an SDD circuit
  – Use SAT-solver technology
  – The circuit does not depend on the data
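The closed-form parameter estimate is just an empirical conditional frequency, computable in a single pass. A minimal sketch with hypothetical data rows over (L, K, P, A):

```python
def mle_conditional(data, x, y):
    """One pass over complete data: estimate Pr(X = 1 | Y = 1)."""
    num = den = 0
    for row in data:
        if row[y] == 1:
            den += 1
            num += row[x]
    return num / den if den else 0.0

# Hypothetical complete data over (L, K, P, A).
data = [(1, 0, 1, 0), (1, 1, 1, 1), (1, 0, 0, 0), (0, 0, 1, 0)]
print(mle_conditional(data, x=2, y=0))  # Pr(P=1 | L=1) = 2/3
```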

  42. Learning Preference Distributions
PSDD vs. a special-purpose distribution, Mixture-of-Mallows:
– number of components from 1 to 20
– EM with 10 random seeds
– implementation of Lu & Boutilier

  43. Learning Preference Distributions
PSDD vs. a special-purpose distribution, Mixture-of-Mallows:
– number of components from 1 to 20
– EM with 10 random seeds
– implementation of Lu & Boutilier
This is the naïve approach: the circuit does not depend on the data!

  44. Learn Circuit from Data Even in unstructured spaces

  45. Tractable Learning
Markov networks, Bayesian networks

  46. Tractable Learning
Markov networks and Bayesian networks do not support linear-time exact inference.

  47. Tractable Learning
Historically: polytrees, Chow-Liu trees, etc. More recently: cutset networks and SPNs. Both are Arithmetic Circuits (ACs) [Darwiche, JACM 2003].

  48. PSDDs are Arithmetic Circuits
[A PSDD decision node with prime/sub pairs (p_1, s_1), …, (p_n, s_n) and parameters θ_1, …, θ_n corresponds to an AC sum node over product nodes: Σ_i θ_i · p_i · s_i]
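The correspondence is direct: a PSDD decision node with prime/sub pairs (p_i, s_i) and parameters θ_i computes a weighted sum of products, i.e. a sum node over product nodes in the AC. A minimal sketch with already-evaluated prime and sub values:

```python
def decision_node(thetas, primes, subs):
    # AC view of a PSDD decision node: sum node over theta * prime * sub.
    return sum(t * p * s for t, p, s in zip(thetas, primes, subs))

# Two-element example: 0.6 * 1.0 * 0.8 + 0.4 * 1.0 * 0.25
print(decision_node([0.6, 0.4], [1.0, 1.0], [0.8, 0.25]))  # ≈ 0.58
```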

  49. Tractable Learning
[Spectrum from representational freedom to strong properties]

  50. Tractable Learning
[DNNs placed on the spectrum from representational freedom to strong properties]


  52. Tractable Learning
[DNNs, SPNs, and cutset networks placed along the spectrum from representational freedom to strong properties]

  53. Tractable Learning
[DNNs, SPNs, and cutset networks placed along the spectrum from representational freedom to strong properties]
Perhaps the most powerful circuit proposed to date.

  54. PSDDs for the Logic-Phobic

  55. PSDDs for the Logic-Phobic
Bottom-up: each node is a distribution.



  58. PSDDs for the Logic-Phobic Multiply independent distributions


  60. PSDDs for the Logic-Phobic
Weighted mixture of lower-level distributions.



  63. Variable Trees (vtrees)
[PSDD-vtree correspondence diagram]

  64. Learning Variable Trees • How much do vars depend on each other? • Learn vtree by hierarchical clustering
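A hedged sketch of the idea: score variable pairs by empirical mutual information, then agglomerate greedily into a binary vtree (the linkage rule here, maximum cross-pair MI, is my simplification, not necessarily the one used in practice):

```python
import math
from collections import Counter
from itertools import combinations

def mutual_info(data, i, j):
    # Empirical mutual information between variables i and j.
    n = len(data)
    pij = Counter((r[i], r[j]) for r in data)
    pi = Counter(r[i] for r in data)
    pj = Counter(r[j] for r in data)
    return sum((c / n) * math.log((c * n) / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def learn_vtree(data, num_vars):
    clusters = [(v,) for v in range(num_vars)]   # leaves: single variables
    trees = list(range(num_vars))
    while len(clusters) > 1:
        # merge the pair of clusters with the strongest cross-pair MI
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: max(mutual_info(data, x, y)
                                     for x in clusters[p[0]]
                                     for y in clusters[p[1]]))
        clusters[a], trees[a] = clusters[a] + clusters[b], (trees[a], trees[b])
        del clusters[b], trees[b]
    return trees[0]

# Hypothetical data: variables 0 and 1 correlated, 2 and 3 correlated.
data = [(0, 0, 1, 1), (1, 1, 0, 0), (0, 0, 0, 0), (1, 1, 1, 1)]
print(learn_vtree(data, 4))  # ((0, 1), (2, 3))
```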


  66. Learning Primitives


  68. Learning Primitives Primitives maintain PSDD properties and structured space!

  69. LearnPSDD: (1) vtree learning; (2) construct the most naïve PSDD; (3) LearnPSDD (search for better structure)

  70. LearnPSDD: (1) vtree learning; (2) construct the most naïve PSDD; (3) LearnPSDD (search for better structure). The search loop: generate candidate operations, simulate the operations, execute the best.
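The outer loop (generate candidate operations, simulate each, execute the best) is plain greedy search. A hedged skeleton with a toy model and score, not the actual PSDD operations:

```python
def learn_structure(model, candidate_ops, score, max_steps=50):
    """Greedy LearnPSDD-style loop: generate -> simulate -> execute best."""
    best = score(model)
    for _ in range(max_steps):
        ops = candidate_ops(model)
        if not ops:
            break
        simulated = [(score(op(model)), op) for op in ops]  # simulate each
        s, op = max(simulated, key=lambda t: t[0])
        if s <= best:                                       # no improvement
            break
        model, best = op(model), s                          # execute the best
    return model, best

# Toy stand-in: the "model" is an integer, operations nudge it, and the
# score (a log-likelihood stand-in) peaks at 7.
ops = lambda m: [lambda x: x + 1, lambda x: x - 1]
print(learn_structure(0, ops, lambda m: -(m - 7) ** 2))  # (7, 0)
```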
