

  1. PSDDs for Tractable Learning in Structured and Unstructured Spaces Guy Van den Broeck UBC Jun 7, 2017

  2. References
     • Probabilistic Sentential Decision Diagrams. Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche. KR, 2014.
     • Learning with Massive Logical Constraints. Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche. ICML 2014 workshop.
     • Tractable Learning for Structured Probability Spaces. Arthur Choi, Guy Van den Broeck and Adnan Darwiche. IJCAI, 2015.
     • Tractable Learning for Complex Probability Queries. Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche, Guy Van den Broeck. NIPS, 2015.
     • Learning the Structure of PSDDs. Jessa Bekker, Yitao Liang and Guy Van den Broeck. Under review, 2017.
     • Towards Compact Interpretable Models: Learning and Shrinking PSDDs. Yitao Liang and Guy Van den Broeck. Under review, 2017.

  3. Structured vs. unstructured probability spaces?

  4. Running Example
     Courses:
     • Logic (L)
     • Knowledge Representation (K)
     • Probability (P)
     • Artificial Intelligence (A)
     Constraints:
     • Must take at least one of Probability or Logic.
     • Probability is a prerequisite for AI.
     • The prerequisite for KR is either AI or Logic.

  5. Probability Space (unstructured)
     All 16 instantiations of L, K, P, A:
     L K P A      L K P A      L K P A      L K P A
     0 0 0 0      0 1 0 0      1 0 0 0      1 1 0 0
     0 0 0 1      0 1 0 1      1 0 0 1      1 1 0 1
     0 0 1 0      0 1 1 0      1 0 1 0      1 1 1 0
     0 0 1 1      0 1 1 1      1 0 1 1      1 1 1 1

  6. Structured Probability Space
     • Must take at least one of Probability or Logic.
     • Probability is a prerequisite for AI.
     • The prerequisite for KR is either AI or Logic.
     The structured space keeps only the instantiations of L, K, P, A that satisfy the constraints:
     L K P A
     0 0 1 0
     0 0 1 1
     0 1 1 1
     1 0 0 0
     1 0 1 0
     1 0 1 1
     1 1 0 0
     1 1 1 0
     1 1 1 1
     7 out of 16 instantiations are impossible.
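The 7-of-16 count can be verified by brute force. A minimal sketch (the `valid` helper and variable encoding are ours, not from the talk):

```python
from itertools import product

def valid(L, K, P, A):
    # Must take at least one of Probability or Logic: P v L
    # Probability is a prerequisite for AI: A => P
    # The prerequisite for KR is either AI or Logic: K => (A v L)
    return (P or L) and ((not A) or P) and ((not K) or (A or L))

valid_rows = [r for r in product([0, 1], repeat=4) if valid(*r)]
print(len(valid_rows))  # 9, so 7 of the 16 instantiations are impossible
```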

  7. Learning with Constraints
     Data + Constraints (background knowledge, physics) → Learn → Statistical Model (distribution)
     Learn a statistical model that assigns zero probability to instantiations that violate the constraints.

  8. Example: Video [Lu, W. L., Ting, J. A., Little, J. J., & Murphy, K. P. (2013). Learning to track and identify players from broadcast sports videos.]

  9. Example: Language
     • Non-local dependencies: at least one verb in each sentence
     • Sentence compression: if a modifier is kept, its subject is also kept
     • Information extraction
     • Semantic role labeling
     • … and many more!
     [Chang, M., Ratinov, L., & Roth, D. (2008). Constraints as prior knowledge], …, [Chang, M. W., Ratinov, L., & Roth, D. (2012). Structured learning with constrained conditional models.], [https://en.wikipedia.org/wiki/Constrained_conditional_model]

  10. Example: Deep Learning
      [Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., et al. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471-476.]

  11. What are people doing now?
      • Ignore constraints
      • Handcraft into models
      • Use specialized distributions
      • Find non-structured encoding
      • Try to learn constraints
      • Hack your way around
      Each at a price:
      • Accuracy?
      • Specialized skill?
      • Intractable inference?
      • Intractable learning?
      • Waste parameters?
      • Risk predicting out of space?
      + you are on your own 

  12. Structured Probability Spaces
      • Everywhere in ML!
        – Configuration problems, inventory, video, text, deep learning
        – Planning and diagnosis (physics)
        – Causal models: cooking scenarios (interpreting videos)
        – Combinatorial objects: parse trees, rankings, directed acyclic graphs, trees, simple paths, game traces, etc.
      • Some representations: constrained conditional models, mixed networks, probabilistic logics.
      No statistical ML boxes out there that take constraints as input! 
      Goal: constraints as important as data! General purpose!

  13. Specification Language: Logic

  14. Structured Probability Space
      • Must take at least one of Probability or Logic.
      • Probability is a prerequisite for AI.
      • The prerequisite for KR is either AI or Logic.
      The structured space keeps only the instantiations of L, K, P, A that satisfy the constraints:
      L K P A
      0 0 1 0
      0 0 1 1
      0 1 1 1
      1 0 0 0
      1 0 1 0
      1 0 1 1
      1 1 0 0
      1 1 1 0
      1 1 1 1
      7 out of 16 instantiations are impossible.

  15. Boolean Constraints
      The same structured space, with the constraints read as Boolean formulas over L, K, P, A:
      L K P A
      0 0 1 0
      0 0 1 1
      0 1 1 1
      1 0 0 0
      1 0 1 0
      1 0 1 1
      1 1 0 0
      1 1 1 0
      1 1 1 1
      7 out of 16 instantiations are impossible.

  16. Combinatorial Objects: Rankings
      Two example rankings of 10 sushi items:
      rank  sushi            rank  sushi
      1     fatty tuna       1     shrimp
      2     sea urchin       2     sea urchin
      3     salmon roe       3     salmon roe
      4     shrimp           4     fatty tuna
      5     tuna             5     tuna
      6     squid            6     squid
      7     tuna roll        7     tuna roll
      8     sea eel          8     sea eel
      9     egg              9     egg
      10    cucumber roll    10    cucumber roll
      10 items: 3,628,800 rankings
      20 items: 2,432,902,008,176,640,000 rankings

  17. Combinatorial Objects: Rankings
      A_ij: item i at position j (n items require n² Boolean variables)
      Without constraints, a variable assignment need not encode a ranking:
      • An item may be assigned to more than one position
      • A position may contain more than one item

  18. Encoding Rankings in Logic
      A_ij: item i at position j
      Constraint: each item i is assigned to a unique position (n constraints)
      Constraint: each position j is assigned a unique item (n constraints)
              pos 1  pos 2  pos 3  pos 4
      item 1  A_11   A_12   A_13   A_14
      item 2  A_21   A_22   A_23   A_24
      item 3  A_31   A_32   A_33   A_34
      item 4  A_41   A_42   A_43   A_44
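The two families of constraints leave exactly the n! permutation matrices. A brute-force check for n = 3 (our own sketch, enumerating all 2^(n·n) assignments rather than building the SDD):

```python
from itertools import product

n = 3  # small enough to enumerate all 2**(n*n) assignments

def is_ranking(grid):
    # grid[i][j] == 1 iff item i sits at position j
    each_item_unique_pos = all(sum(row) == 1 for row in grid)
    each_pos_unique_item = all(sum(grid[i][j] for i in range(n)) == 1
                               for j in range(n))
    return each_item_unique_pos and each_pos_unique_item

count = sum(
    is_ranking([bits[i * n:(i + 1) * n] for i in range(n)])
    for bits in product([0, 1], repeat=n * n)
)
print(count)  # 6 = 3! rankings survive out of 2**9 = 512 assignments
```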

  19. Structured Space for Paths (cf. Nature paper)
      Good variable assignment (represents a route): 184
      Bad variable assignment (does not represent a route): 16,777,032
      Unstructured probability space: 184 + 16,777,032 = 2^24
      Space easily encoded in logical constraints  See [Choi, Tavabi, Darwiche, AAAI 2016]
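The numbers on the slide fit a 4×4 grid graph: 24 edges give 2^24 assignments, and the 184 good ones are the simple (self-avoiding) paths between opposite corners. A brute-force recount of the good assignments (our own sketch, not the SDD encoding of the cited paper):

```python
N = 4  # 4x4 grid graph: 24 edges, hence 2**24 edge assignments

def neighbors(r, c):
    # Up/down/left/right neighbors inside the grid.
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < N and 0 <= c + dc < N:
            yield r + dr, c + dc

def count_paths(cur, goal, visited):
    # Count self-avoiding paths from cur to goal.
    if cur == goal:
        return 1
    return sum(count_paths(nxt, goal, visited | {nxt})
               for nxt in neighbors(*cur) if nxt not in visited)

num_routes = count_paths((0, 0), (N - 1, N - 1), {(0, 0)})
print(num_routes)  # 184 good assignments; the remaining 16,777,032 are bad
```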

  20. “Deep Architecture” Logic + Probability

  21. Logical Circuits
      [circuit diagram: AND/OR circuit over the variables L, K, P, A and their negations]

  22. Property: Decomposability
      [same circuit: the children of each AND node range over disjoint sets of variables]

  23. Property: Determinism
      [same circuit, evaluated bottom-up: at most one child of each OR node is true]
      Input: L, K, P, A

  24. Sentential Decision Diagram (SDD)
      [the same decomposable and deterministic circuit, drawn as an SDD]
      Input: L, K, P, A

  25. Tractable for Logical Inference
      • Is the structured space empty? (SAT)
      • Count the size of the structured space (#SAT)
      • Check equivalence of spaces
      • Algorithms linear in circuit size  (pass up, pass down, similar to backprop)

  26. PSDD: Probabilistic SDD
      [the SDD, with a probability parameter on each decision-node branch, e.g. 0.1 / 0.6 / 0.3 at the root]

  27. PSDD: Probabilistic SDD
      [the same parameterized circuit, evaluated on an input]
      Input: L, K, P, A

  28. PSDD: Probabilistic SDD
      [the same parameterized circuit; the input's probability is the product of the parameters on the wires it activates]
      Input: L, K, P, A
      Pr(L, K, P, A) = 0.3 × 1.0 × 0.8 × 0.4 × 0.25 = 0.024

  29. PSDD nodes induce a normalized distribution!
      [parameterized circuit as on the previous slides]
      Can read probabilistic independences off the circuit structure
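Normalization follows from determinism (one live element per decision node) and from each node's parameters summing to 1. A toy illustration with a hypothetical two-variable PSDD (structure and parameter values are ours, not the circuit from the slides):

```python
from itertools import product

def lit(var, pos=True):
    # Literal node: indicator of var == pos.
    return lambda w: 1.0 if w[var] == pos else 0.0

def decide(elements):
    # Decision node: sum over (prime, sub, theta) elements; the primes are
    # mutually exclusive and exhaustive, and the thetas sum to 1.
    return lambda w: sum(p(w) * s(w) * t for p, s, t in elements)

true_node = lambda w: 1.0
sub_pos = decide([(lit('Y'), true_node, 0.7), (lit('Y', False), true_node, 0.3)])
sub_neg = decide([(lit('Y'), true_node, 0.4), (lit('Y', False), true_node, 0.6)])
root = decide([(lit('X'), sub_pos, 0.1), (lit('X', False), sub_neg, 0.9)])

total = sum(root({'X': x, 'Y': y}) for x, y in product([True, False], repeat=2))
print(round(total, 10))  # 1.0 -- the root induces a normalized distribution
```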

  30. Tractable for Probabilistic Inference
      • MAP inference: find the most likely assignment (otherwise NP-complete)
      • Computing conditional probabilities Pr(x | y) (otherwise PP-complete)
      • Sampling from Pr(x | y)
      • Algorithms linear in circuit size  (pass up, pass down, similar to backprop)
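For intuition, here is the brute-force baseline that the linear-time circuit passes replace, on the structured course space. The uniform weights are our illustrative stand-ins; a learned PSDD supplies real parameters:

```python
from itertools import product
from fractions import Fraction

def valid(L, K, P, A):
    # The three course constraints from the running example.
    return (P or L) and ((not A) or P) and ((not K) or (A or L))

rows = [r for r in product([0, 1], repeat=4) if valid(*r)]
weight = {r: Fraction(1, len(rows)) for r in rows}  # uniform over the 9 rows

def pr(event):
    # Probability of an event: sum the weights of the satisfying rows.
    return sum(w for r, w in weight.items() if event(*r))

# Conditional query Pr(A = 1 | K = 1): exponential here, linear on a PSDD.
answer = pr(lambda L, K, P, A: A and K) / pr(lambda L, K, P, A: K)
print(answer)  # 1/2 under the uniform weights
```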

  31. PSDDs are Arithmetic Circuits [Darwiche, JACM 2003]
      [diagram: a PSDD decision node with elements (p_i, s_i) and parameters θ_1 … θ_n, next to the equivalent arithmetic circuit: a sum node over products θ_i · p_i · s_i]
      Known in the ML literature as SPNs [ICML 2014]
      UAI 2011, NIPS 2012 best paper awards (SPNs equivalent to ACs)

  32. Learning PSDDs Logic + Probability + ML
