

  1. Querying Advanced Probabilistic Models: From Relational Embeddings to Probabilistic Programs. Guy Van den Broeck, StarAI Workshop @ AAAI, Feb 7, 2020

  2. The AI Dilemma Pure Learning Pure Logic

  3. The AI Dilemma: Pure Learning vs. Pure Logic • Slow thinking: deliberative, cognitive, model-based, extrapolation • Amazing achievements to this day • "Pure logic is brittle": noise, uncertainty, incomplete knowledge, …

  4. The AI Dilemma: Pure Learning vs. Pure Logic • Fast thinking: instinctive, perceptive, model-free, interpolation • Amazing achievements recently • "Pure learning is brittle": bias, algorithmic fairness, interpretability, explainability, adversarial attacks, unknown unknowns, calibration, verification, missing features, missing labels, data efficiency, shift in distribution, general robustness and safety; it fails to incorporate a sensible model of the world

  5. The FALSE AI Dilemma: So is all hope lost? Probabilistic World Models • Joint distribution P(X) • Wealth of representations: can be causal, relational, etc. • Knowledge + data • Reasoning + learning

  6. Probabilistic World Models Pure Learning Pure Logic A New Synthesis of Learning and Reasoning Tutorial on Probabilistic Circuits This afternoon: 2pm-6pm Sutton Center, 2nd floor

  7. Probabilistic World Models Pure Learning Pure Logic High-Level Probabilistic Representations 1 Probabilistic Databases Meets Relational Embeddings: Symbolic Querying of Vector Spaces 2 Modular Exact Inference for Discrete Probabilistic Programs

  8. What we’d like to do…

  9. What we’d like to do… ∃ x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)

  10. Einstein is in the Knowledge Graph

  11. Erdős is in the Knowledge Graph

  12. This guy is in the Knowledge Graph … and he published with both Einstein and Erdos!

  13. Desired Query Answer (e.g. Ernst Straus, rather than Barack Obama, Justin Bieber, …) 1. Must fuse uncertain information from the web ⇒ Embrace probability! 2. Cannot come from labeled data ⇒ Embrace query evaluation!

  14. Cartoon Motivation ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)? The cartoon pipeline: curate a Knowledge Graph, learn a Relational Embedding into vectors, then query in a DBMS. Many exceptions in the StarAI and PDB communities, but we need to embed…

  15. Probabilistic Databases • A probabilistic database: Scientist(x; P): Erdos 0.9, Einstein 0.7, Pauli 0.6. Coauthor(x, y; P): (Erdos, Renyi) 0.6, (Einstein, Pauli) 0.8, (Obama, Erdos) 0.1. • Learned from the web, large text corpora, ontologies, etc., using statistical machine learning. [VdB&Suciu’17]

  16. Probabilistic Database Semantics • All possible databases (possible worlds): Ω = {ω1, …, ωn} (figure: all subsets of the tuples {(A,B), (A,C), (B,C)}) • A probabilistic database P assigns a probability to each: P : Ω → [0,1] • Probabilities sum to 1: Σ_{ω ∈ Ω} P(ω) = 1 [VdB&Suciu’17]
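A minimal sketch (not on the slide) of these semantics: a probabilistic database is just an explicit map from possible worlds to probabilities. The candidate tuples and the uniform assignment below are arbitrary illustrations, not values from the talk.

```python
from itertools import combinations

# Hypothetical example: three candidate Coauthor tuples over entities A, B, C.
candidate_tuples = [("A", "B"), ("A", "C"), ("B", "C")]

# A possible world omega is any subset of the candidate tuples.
worlds = [frozenset(s) for r in range(len(candidate_tuples) + 1)
          for s in combinations(candidate_tuples, r)]

# A probabilistic database assigns a probability P(omega) to every world;
# here the mass is spread uniformly, but any assignment summing to 1 works.
P = {w: 1.0 / len(worlds) for w in worlds}

assert abs(sum(P.values()) - 1.0) < 1e-9   # probabilities sum to 1
```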

  17. Commercial Break • Survey book http://www.nowpublishers.com/article/Details/DBS-052 • IJCAI 2016 tutorial http://web.cs.ucla.edu/~guyvdb/talks/IJCAI16-tutorial/

  18. How to specify all these numbers? • Only specify marginals, e.g. P(Coauthor(Alice, Bob)) = 0.23 • Assume tuple-independence: with Coauthor tuples (A,B) with probability p1, (A,C) with p2, and (B,C) with p3, each possible world gets a product probability, e.g. the world containing all three tuples has p1·p2·p3, the world missing only (A,B) has (1−p1)·p2·p3, …, and the empty world has (1−p1)·(1−p2)·(1−p3). [VdB&Suciu’17]
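Under tuple-independence the world probabilities are just products of marginals. A small sketch: only the 0.23 value comes from the slide (and there it is attached to (Alice, Bob)); the other marginals are made up.

```python
from itertools import product

# Hypothetical tuple marginals p1, p2, p3 for (A,B), (A,C), (B,C).
marginals = {("A", "B"): 0.23, ("A", "C"): 0.5, ("B", "C"): 0.8}

def world_probability(world, marginals):
    """Tuple-independence: present tuples contribute p, absent ones (1 - p)."""
    prob = 1.0
    for t, p in marginals.items():
        prob *= p if t in world else (1.0 - p)
    return prob

# Enumerate all 2^n possible worlds; their probabilities must sum to 1.
facts = list(marginals)
worlds = [frozenset(t for t, keep in zip(facts, bits) if keep)
          for bits in product([False, True], repeat=len(facts))]
assert abs(sum(world_probability(w, marginals) for w in worlds) - 1.0) < 1e-9
```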

  19. Probabilistic Query Evaluation Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y). Scientist(x; P): A p1 (X1), B p2 (X2), C p3 (X3). Coauthor(x, y; P): (A,D) q1 (Y1), (A,E) q2 (Y2), (B,F) q3 (Y3), (B,G) q4 (Y4), (B,H) q5 (Y5). P(Q) = 1 − {1 − p1·[1 − (1−q1)(1−q2)]} · {1 − p2·[1 − (1−q3)(1−q4)(1−q5)]} (C has no Coauthor tuples, so its factor is 1.)
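The closed-form expression above can be checked numerically. A sketch with made-up values for the symbolic marginals p1..p3 and q1..q5 (the slide leaves them symbolic):

```python
# Hypothetical numbers for the symbolic probabilities on the slide.
p = {"A": 0.7, "B": 0.5, "C": 0.2}                       # Scientist marginals
q = {("A", "D"): 0.6, ("A", "E"): 0.4,                    # Coauthor marginals
     ("B", "F"): 0.3, ("B", "G"): 0.8, ("B", "H"): 0.1}

def prob_exists_coauthor(x):
    """P(exists y Coauthor(x, y)) = 1 - prod_y (1 - q_xy)."""
    prob_none = 1.0
    for (head, _), qv in q.items():
        if head == x:
            prob_none *= (1.0 - qv)
    return 1.0 - prob_none

# P(Q) = 1 - prod_x (1 - p_x * P(exists y Coauthor(x, y)));
# for C the inner probability is 0, so its factor is 1, as in the formula.
prob_no_witness = 1.0
for x, px in p.items():
    prob_no_witness *= (1.0 - px * prob_exists_coauthor(x))
print("P(Q) =", 1.0 - prob_no_witness)
```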

  20. Lifted Inference Rules. Preprocess Q (omitted), then apply rules (some have preconditions):
  • Negation: P(¬Q) = 1 − P(Q)
  • Decomposable ∧, ∨: P(Q1 ∧ Q2) = P(Q1) · P(Q2); P(Q1 ∨ Q2) = 1 − (1 − P(Q1)) · (1 − P(Q2))
  • Decomposable ∃, ∀: P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z]); P(∃z Q) = 1 − Π_{A ∈ Domain} (1 − P(Q[A/z]))
  • Inclusion/exclusion: P(Q1 ∧ Q2) = P(Q1) + P(Q2) − P(Q1 ∨ Q2); P(Q1 ∨ Q2) = P(Q1) + P(Q2) − P(Q1 ∧ Q2)
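A minimal sketch (not from the slides) of these rules as plain functions over subquery probabilities; the independence preconditions are assumed to have been checked elsewhere.

```python
from math import prod

def negation(p_q):                        # P(not Q) = 1 - P(Q)
    return 1.0 - p_q

def decomposable_and(p_q1, p_q2):         # P(Q1 and Q2) = P(Q1) * P(Q2)
    return p_q1 * p_q2

def decomposable_or(p_q1, p_q2):          # P(Q1 or Q2) = 1 - (1-P(Q1))(1-P(Q2))
    return 1.0 - (1.0 - p_q1) * (1.0 - p_q2)

def decomposable_forall(p_per_constant):  # P(forall z Q) = prod_A P(Q[A/z])
    return prod(p_per_constant)

def decomposable_exists(p_per_constant):  # P(exists z Q) = 1 - prod_A (1 - P(Q[A/z]))
    return 1.0 - prod(1.0 - p for p in p_per_constant)

def inclusion_exclusion_and(p_q1, p_q2, p_q1_or_q2):
    return p_q1 + p_q2 - p_q1_or_q2       # P(Q1 and Q2) = P(Q1)+P(Q2)-P(Q1 or Q2)
```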

  21. Example Query Evaluation Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y). Decomposable ∃-rule: P(Q) = 1 − Π_{A ∈ Domain} (1 − P(Scientist(A) ∧ ∃y Coauthor(A,y))). Check independence: Scientist(A) ∧ ∃y Coauthor(A,y) is independent of Scientist(B) ∧ ∃y Coauthor(B,y). So P(Q) = 1 − (1 − P(Scientist(A) ∧ ∃y Coauthor(A,y))) × (1 − P(Scientist(B) ∧ ∃y Coauthor(B,y))) × (1 − P(Scientist(C) ∧ ∃y Coauthor(C,y))) × (1 − P(Scientist(D) ∧ ∃y Coauthor(D,y))) × (1 − P(Scientist(E) ∧ ∃y Coauthor(E,y))) × (1 − P(Scientist(F) ∧ ∃y Coauthor(F,y))) × … Complexity: PTIME.

  22. Limitations H0 = ∀x ∀y Smoker(x) ∨ Friend(x,y) ∨ Jogger(y). The decomposable ∀-rule P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z]) does not apply: H0[Alice/x] and H0[Bob/x] are dependent, since ∀y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y)) and ∀y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y)) share the Jogger tuples. Lifted inference sometimes fails.

  23. Are the Lifted Rules Complete? Dichotomy Theorem for Unions of Conjunctive Queries / Monotone CNF: • If the lifted rules succeed, then the query is in PTIME • If the lifted rules fail, then the query is #P-hard The lifted rules are complete for UCQ! [Dalvi and Suciu; JACM’11]

  24. The Good, Bad, Ugly • We understand querying very well, and it is often efficient (a rare property!), but also often highly intractable • Tuple-independence is limiting unless we reduce from a more expressive model; we can reduce from MLNs, but then inference is intractable… • Where do the probabilities come from? An unspecified “statistical model”.

  25. Throwing Relational Embedding Models Over the Wall • Associate a vector with each relation R and each entity A, B, … • Score S(head, relation, tail), based on Euclidean distance, cosine similarity, … Example scores: Coauthor(x, y; S): (A,B) 0.6, (A,C) −0.1, (B,C) 0.4
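A sketch of one common choice of scoring function, a DistMult-style trilinear score (DistMult is the model the talk later compares against); the specific embedding vectors here are made up.

```python
import numpy as np

# Hypothetical learned embeddings: one vector per entity and per relation.
entity = {"A": np.array([0.9, 0.1]), "B": np.array([0.8, 0.3]),
          "C": np.array([-0.2, 0.7])}
relation = {"Coauthor": np.array([0.5, -0.4])}

def score(head, rel, tail):
    """DistMult-style score: S(head, relation, tail) = sum_i e_h[i] * w_r[i] * e_t[i]."""
    return float(np.sum(entity[head] * relation[rel] * entity[tail]))

print(score("A", "Coauthor", "B"))   # higher score = more plausible fact
```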

  26. Throwing Relational Embedding Models Over the Wall • Interpret scores as probabilities: high score ~ probability near 1; low score ~ probability near 0. Coauthor(x, y; S → P): (A,B) 0.6 → 0.9, (A,C) −0.1 → 0.1, (B,C) 0.4 → 0.5
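One simple way to realize this "high score ~ 1, low score ~ 0" mapping, not necessarily the one used in the talk, is a logistic squashing of the score; the temperature and bias below are illustrative.

```python
import math

def score_to_probability(s, temperature=1.0, bias=0.0):
    """Map an unbounded embedding score to a (0, 1) marginal probability
    with a sigmoid; high scores go to ~1, low scores to ~0."""
    return 1.0 / (1.0 + math.exp(-(s - bias) / temperature))

for s in (0.6, -0.1, 0.4):
    print(s, "->", round(score_to_probability(s), 2))
```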

  27. The Good, Bad, Ugly • Where do the probabilities come from? We finally know the “statistical model”! Both capture marginals: a good match. • We still understand querying very well, but it is often highly intractable. • Tuple-independence is limiting: relational embedding models do not attempt to capture dependencies in link prediction.

  28. A Second Attempt • Let’s simplify drastically! • Assume each relation has the form R(x, y) ⇔ T_R ∧ E(x) ∧ E(y) • That is, there are latent relations: T_R (one per relation R) to decide which relations can be true, and E to decide which entities participate. Example: E(x; P): A 0.2, B 0.5, C 0.3 and T with P = 0.2, approximating (~) Coauthor(x, y; P): (A,B) 0.9, (A,C) 0.1, (B,C) 0.5

  29. Can this do link prediction? • Predict Coauthor(Alice, Bob), with E(x; P): A 0.2, B 0.5, C 0.3 and T with P = 0.3 • Rewrite the query using R(x, y) ⇔ T_R ∧ E(x) ∧ E(y) • Apply the standard lifted inference rules • P(Coauthor(Alice, Bob)) = 0.3 · 0.2 · 0.5 = 0.03
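Under the rewriting R(x, y) ⇔ T_R ∧ E(x) ∧ E(y), a ground link-prediction query decomposes into a product of three independent marginals. A sketch using the slide's numbers, with A and B standing for Alice and Bob:

```python
# Marginals of the latent relations (numbers from the slide).
P_T = {"Coauthor": 0.3}                 # which relations can be true at all
P_E = {"A": 0.2, "B": 0.5, "C": 0.3}    # which entities participate

def link_probability(rel, x, y):
    """P(R(x, y)) = P(T_R) * P(E(x)) * P(E(y)) under the rewriting."""
    return P_T[rel] * P_E[x] * P_E[y]

print(link_probability("Coauthor", "A", "B"))   # 0.3 * 0.2 * 0.5 = 0.03
```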

  30. The Good, Bad, Ugly • Where do the probabilities come from? We finally know the “statistical model”! • We still understand querying very well: by rewriting R into E and T_R, every UCQ query becomes tractable! • Tuples sharing entities or relation symbols depend on each other. • But the model is not very expressive.

  31. A Third Attempt • Take mixture models of the second attempt: R(x, y) ⇔ T_R ∧ E(x) ∧ E(y), where now there are latent relations T_R and E for each mixture component • The Good: still a clear statistical model; every UCQ query is still tractable; it still captures tuple dependencies; and a mixture can approximate any distribution.

  32. Can this do link prediction? • Predict Coauthor(Alice, Bob) in each mixture component: P_1(Coauthor(Alice,Bob)) = 0.3 · 0.2 · 0.5, P_2(Coauthor(Alice,Bob)) = 0.9 · 0.1 · 0.6, etc. • Probability in a mixture of d components: P(Coauthor(Alice,Bob)) = (1/d) · 0.3 · 0.2 · 0.5 + (1/d) · 0.9 · 0.1 · 0.6 + ⋯
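A sketch of the mixture computation: each component is one "second attempt" model, and the overall marginal is their uniform average (weight 1/d). The two components below use the slide's numbers; any further components are placeholders.

```python
# Each mixture component has its own latent-relation and entity marginals.
components = [
    {"T": {"Coauthor": 0.3}, "E": {"Alice": 0.2, "Bob": 0.5}},
    {"T": {"Coauthor": 0.9}, "E": {"Alice": 0.1, "Bob": 0.6}},
    # ... d components in total
]

def mixture_link_probability(rel, x, y, components):
    """Uniform mixture: average the per-component products, each weighted 1/d."""
    d = len(components)
    return sum(c["T"][rel] * c["E"][x] * c["E"][y] for c in components) / d

print(mixture_link_probability("Coauthor", "Alice", "Bob", components))
# (0.3*0.2*0.5 + 0.9*0.1*0.6) / 2 = (0.03 + 0.054) / 2 = 0.042
```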

  33. How good is this? Does it look familiar? P(Coauthor(Alice,Bob)) = (1/d) · 0.3 · 0.2 · 0.5 + (1/d) · 0.9 · 0.1 · 0.6 + ⋯

  34. How good is this? • At link prediction: same as DistMult • At queries on a bio dataset [Hamilton]: competitive, while having a consistent underlying distribution. Ask Tal at his poster!

  35. How expressive is this? The GQE baseline consists of graph queries translated to linear algebra by Hamilton et al. [2018].

  36. First Conclusions • We can give probabilistic database semantics to relational embedding models, which gives more meaningful query results • In doing so, we resolve some annoyances of the theoretical PDB framework: tuple dependence and a clear connection to learning, while everything stays tractable and some intractable queries become tractable • This enables much more (training on queries Q, consistency)
