

Probabilistic Circuits: A New Synthesis of Logic and Machine Learning
Guy Van den Broeck, UCSD, May 14, 2018
Overview: Statistical ML (Probability), Connectionism (Deep), Symbolic AI (Logic), Probabilistic Circuits, References


  1. Combinatorial Objects: Rankings
     A_ij: item i at position j (n items require n^2 Boolean variables)
     Example ranking 1: 1 fatty tuna, 2 sea urchin, 3 salmon roe, 4 shrimp, 5 tuna, 6 squid, 7 tuna roll, 8 sea eel, 9 egg, 10 cucumber roll
     Example ranking 2: 1 shrimp, 2 sea urchin, 3 salmon roe, 4 fatty tuna, 5 tuna, 6 squid, 7 tuna roll, 8 sea eel, 9 egg, 10 cucumber roll

  2. Combinatorial Objects: Rankings
     A_ij: item i at position j (n items require n^2 Boolean variables)
     Without constraints, an item may be assigned to more than one position, and a position may contain more than one item (not every assignment encodes a ranking).
     (Same two example rankings as slide 1.)

  3. Encoding Rankings in Logic
     A_ij: item i at position j
               pos 1   pos 2   pos 3   pos 4
     item 1    A_11    A_12    A_13    A_14
     item 2    A_21    A_22    A_23    A_24
     item 3    A_31    A_32    A_33    A_34
     item 4    A_41    A_42    A_43    A_44

  4. Encoding Rankings in Logic
     A_ij: item i at position j (same 4 x 4 variable matrix as slide 3)
     Constraint: each item i is assigned to a unique position (n constraints)

  5. Encoding Rankings in Logic
     A_ij: item i at position j (same 4 x 4 variable matrix as slide 3)
     Constraint: each item i is assigned to a unique position (n constraints)
     Constraint: each position j is assigned a unique item (n constraints)

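To make the encoding concrete, here is a minimal sketch (not from the talk; the variable numbering, clause format, and function names are my own choices) that expands each "unique position" and "unique item" constraint into CNF clauses: one at-least-one clause plus pairwise at-most-one clauses.

```python
from itertools import combinations

def encode_ranking_constraints(n):
    """CNF (list of clauses, each a list of signed DIMACS-style ints) stating
    that the n*n Boolean variables A_ij ('item i at position j') encode a
    permutation: every item gets exactly one position and vice versa."""
    var = lambda i, j: i * n + j + 1          # arbitrary variable numbering

    def exactly_one(vs):
        clauses = [list(vs)]                                    # at least one
        clauses += [[-a, -b] for a, b in combinations(vs, 2)]   # at most one
        return clauses

    cnf = []
    for i in range(n):                        # each item i: unique position
        cnf += exactly_one([var(i, j) for j in range(n)])
    for j in range(n):                        # each position j: unique item
        cnf += exactly_one([var(i, j) for i in range(n)])
    return cnf

print(len(encode_ranking_constraints(4)))     # 56 clauses for the 4 x 4 grid
```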

  7. Structured Space for Paths cf. Nature paper

  8. Structured Space for Paths (cf. Nature paper) Good variable assignments (each represents a route): 184

  9. Structured Space for Paths (cf. Nature paper) Good variable assignments (represent routes): 184. Bad variable assignments (do not represent routes): 16,777,032

  10. Structured Space for Paths (cf. Nature paper) Good variable assignments (represent routes): 184. Bad variable assignments (do not represent routes): 16,777,032. Space easily encoded in logical constraints [Nishino et al.]

  11. Structured Space for Paths (cf. Nature paper) Good variable assignments (represent routes): 184. Bad variable assignments (do not represent routes): 16,777,032. Space easily encoded in logical constraints [Nishino et al.] Unstructured probability space: 184 + 16,777,032 = 2^24
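As a rough, much smaller illustration of these counts (the 4-node graph, the endpoints, and the path test below are my own choices, not the map from the slide): brute-force enumeration separates the edge-variable assignments that encode a simple route from the full unstructured space of 2^|edges| assignments.

```python
from itertools import chain, combinations

# Toy road map: a 4-node cycle a-b-d-c-a; each edge is one Boolean variable.
edges = [("a", "b"), ("b", "d"), ("d", "c"), ("c", "a")]
s, t = "a", "d"                                   # fixed start and end point

def is_route(chosen):
    """True iff the chosen edges form a simple path from s to t."""
    if not chosen:
        return False
    deg = {}
    for u, v in chosen:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    if deg.get(s) != 1 or deg.get(t) != 1:        # endpoints have degree 1
        return False
    if any(d != 2 for n, d in deg.items() if n not in (s, t)):
        return False                              # interior nodes have degree 2
    return len(chosen) == len(deg) - 1            # connected and acyclic

all_subsets = chain.from_iterable(combinations(edges, k)
                                  for k in range(len(edges) + 1))
good = sum(is_route(sub) for sub in all_subsets)
print(good, "routes out of", 2 ** len(edges), "assignments")   # 2 out of 16
```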

  12. Logical Circuits

  13. Logical Circuits: [circuit diagram: OR and AND gates over the literals L, ¬L, K, ¬K, P, ¬P, A, ¬A]

  14. Property: Decomposability [same circuit diagram as slide 13]


  16. Property: Decomposability [same circuit diagram as slide 13] Property: AND gates have disjoint input circuits (their children share no variables)

  17. Property: Determinism [the circuit evaluated bottom-up on an input] Input: L, K, P, A are true and ¬L, ¬K, ¬P, ¬A are false


  19. Property: Determinism [the circuit evaluated bottom-up on an input] Input: L, K, P, A are true and ¬L, ¬K, ¬P, ¬A are false. Property: OR gates have at most one true input wire

  20. Tractable for Logical Inference • Is structured space empty? (SAT) • Count size of structured space (#SAT) • Check equivalence of spaces

  21. Tractable for Logical Inference • Is structured space empty? (SAT) • Count size of structured space (#SAT) • Check equivalence of spaces • Algorithms linear in circuit size  (pass up, pass down, similar to backprop)

  22. Tractable for Logical Inference • Is structured space empty? (SAT) • Count size of structured space (#SAT) • Check equivalence of spaces • Algorithms linear in circuit size  (pass up, pass down, similar to backprop) • Compilation by exhaustive SAT solvers
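A minimal sketch of why model counting becomes a single linear pass (my own toy data structure, not the SDD package behind the talk; it also assumes the circuit is smooth, in addition to decomposable and deterministic): multiply counts at AND gates, add them at OR gates.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Lit:                   # leaf: a literal such as X or not-X
    var: str
    positive: bool = True

@dataclass
class Gate:                  # internal node: "AND" or "OR" over child circuits
    kind: str
    children: Tuple

def model_count(node):
    """#SAT by one bottom-up pass over a smooth, decomposable, deterministic circuit."""
    if isinstance(node, Lit):
        return 1                              # one model over the literal's variable
    counts = [model_count(c) for c in node.children]
    if node.kind == "AND":                    # decomposability: children use disjoint variables
        result = 1
        for c in counts:
            result *= c
        return result
    return sum(counts)                        # determinism: children have disjoint model sets

# Exactly-one of X, Y written as a decomposable, deterministic circuit:
circuit = Gate("OR", (
    Gate("AND", (Lit("X"), Lit("Y", False))),
    Gate("AND", (Lit("X", False), Lit("Y"))),
))
print(model_count(circuit))                   # 2 models: {X}, {Y}
```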

  23. Semantic Loss for Deep Learning

  24. [diagram: Data + Constraints (background knowledge, physics) are used to learn a Deep Neural Network that makes Deep Structured Output Predictions]

  25. [diagram: Data + Constraints (background knowledge, physics) are used to learn a Deep Neural Network that makes Deep Structured Output Predictions] [diagram: a neural network mapping Input to Output, with a Logical Constraint on the Output]


  27. Semantic Loss

  28. Semantic Loss • Output is a probability vector p, not logic! How close is the output to satisfying the constraint?

  29. Semantic Loss • Output is a probability vector p, not logic! How close is the output to satisfying the constraint? • Answer: Semantic loss function L(α, p)

  30. Semantic Loss • Output is a probability vector p, not logic! How close is the output to satisfying the constraint? • Answer: Semantic loss function L(α, p) • Axioms, for example:

  31. Semantic Loss • Output is a probability vector p, not logic! How close is the output to satisfying the constraint? • Answer: Semantic loss function L(α, p) • Axioms, for example: – If p is Boolean then L(p, p) = 0

  32. Semantic Loss • Output is a probability vector p, not logic! How close is the output to satisfying the constraint? • Answer: Semantic loss function L(α, p) • Axioms, for example: – If p is Boolean then L(p, p) = 0 – If α implies β then L(α, p) ≥ L(β, p)

  33. Semantic Loss • Output is a probability vector p, not logic! How close is the output to satisfying the constraint? • Answer: Semantic loss function L(α, p) • Axioms, for example: – If p is Boolean then L(p, p) = 0 – If α implies β then L(α, p) ≥ L(β, p) • Properties:

  34. Semantic Loss • Output is a probability vector p, not logic! How close is the output to satisfying the constraint? • Answer: Semantic loss function L(α, p) • Axioms, for example: – If p is Boolean then L(p, p) = 0 – If α implies β then L(α, p) ≥ L(β, p) • Properties: – If α is equivalent to β then L(α, p) = L(β, p)

  35. Semantic Loss • Output is a probability vector p, not logic! How close is the output to satisfying the constraint? • Answer: Semantic loss function L(α, p) • Axioms, for example: – If p is Boolean then L(p, p) = 0 – If α implies β then L(α, p) ≥ L(β, p) • Properties: – If α is equivalent to β then L(α, p) = L(β, p) – If p is Boolean and satisfies α then L(α, p) = 0

  36. Semantic Loss • Output is a probability vector p, not logic! How close is the output to satisfying the constraint? • Answer: Semantic loss function L(α, p) • Axioms, for example: – If p is Boolean then L(p, p) = 0 – If α implies β then L(α, p) ≥ L(β, p) • Properties: – If α is equivalent to β then L(α, p) = L(β, p) – If p is Boolean and satisfies α then L(α, p) = 0 [callout: Semantic Loss!]

  37. Semantic Loss: Definition
      Theorem: the axioms imply a unique semantic loss (up to scaling):
        L(α, p) ∝ -log Σ_{x ⊨ α} Π_{i : x ⊨ X_i} p_i Π_{i : x ⊨ ¬X_i} (1 - p_i)
      The product is the probability of getting state x after flipping coins with probabilities p; the sum is the probability of satisfying α after flipping coins with probabilities p.
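Read literally, this definition can be evaluated by brute force over all assignments. The sketch below (my own function and example numbers, exponential on purpose) just restates the formula before any circuits are involved.

```python
from itertools import product
from math import log, prod

def semantic_loss(constraint, p):
    """L(alpha, p) = -log of the sum, over states x satisfying alpha, of
       prod_i (p_i if x_i is true else 1 - p_i)."""
    satisfied = 0.0
    for x in product([0, 1], repeat=len(p)):
        if constraint(x):
            satisfied += prod(p_i if x_i else 1 - p_i for p_i, x_i in zip(p, x))
    return -log(satisfied)

exactly_one = lambda x: sum(x) == 1                   # example constraint alpha
print(semantic_loss(exactly_one, [0.9, 0.1, 0.1]))    # close to one-hot: small loss
print(semantic_loss(exactly_one, [0.5, 0.5, 0.5]))    # far from one-hot: larger loss
```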

  38. How to Compute Semantic Loss?

  39. How to Compute Semantic Loss? • In general: #P-hard 

  40. How to Compute Semantic Loss? • In general: #P-hard  • With a logical circuit for α : Linear!

  41. How to Compute Semantic Loss? • In general: #P-hard • With a logical circuit for α: Linear! • Example: exactly-one constraint: (y_2 ∨ y_3 ∨ y_4) ∧ (¬y_2 ∨ ¬y_3) ∧ (¬y_3 ∨ ¬y_4) ∧ (¬y_2 ∨ ¬y_4)

  42. How to Compute Semantic Loss? • In general: #P-hard • With a logical circuit for α: Linear! • Example: exactly-one constraint: (y_2 ∨ y_3 ∨ y_4) ∧ (¬y_2 ∨ ¬y_3) ∧ (¬y_3 ∨ ¬y_4) ∧ (¬y_2 ∨ ¬y_4) L(α, p) = L(circuit compiled from α, p)

  43. How to Compute Semantic Loss? • In general: #P-hard • With a logical circuit for α: Linear! • Example: exactly-one constraint: (y_2 ∨ y_3 ∨ y_4) ∧ (¬y_2 ∨ ¬y_3) ∧ (¬y_3 ∨ ¬y_4) ∧ (¬y_2 ∨ ¬y_4) L(α, p) = L(circuit compiled from α, p) = -log( p_2(1-p_3)(1-p_4) + (1-p_2)p_3(1-p_4) + (1-p_2)(1-p_3)p_4 )

  44. How to Compute Semantic Loss? • In general: #P-hard • With a logical circuit for α: Linear! • Example: exactly-one constraint: (y_2 ∨ y_3 ∨ y_4) ∧ (¬y_2 ∨ ¬y_3) ∧ (¬y_3 ∨ ¬y_4) ∧ (¬y_2 ∨ ¬y_4) L(α, p) = L(circuit compiled from α, p) = -log( p_2(1-p_3)(1-p_4) + (1-p_2)p_3(1-p_4) + (1-p_2)(1-p_3)p_4 ) • Why? Decomposability and determinism!
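For the exactly-one constraint the circuit computation collapses to the closed form below (the function is my own restatement; this direct version is quadratic, while a circuit like the one on the slide shares partial products and evaluates in time linear in its size): by determinism the branches "output i is the single true one" are disjoint, and by decomposability each branch is a plain product.

```python
from math import log, prod

def exactly_one_semantic_loss(p):
    """Semantic loss of the exactly-one constraint over probabilities p."""
    satisfied = sum(p[i] * prod(1 - q for j, q in enumerate(p) if j != i)
                    for i in range(len(p)))
    return -log(satisfied)

print(exactly_one_semantic_loss([0.9, 0.1, 0.1]))   # matches the brute force above
```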

  45. Supervised Learning • Predict shortest paths • Add semantic loss to objective

  46. Supervised Learning • Predict shortest paths • Add semantic loss to objective Is output a path?

  47. Supervised Learning • Predict shortest paths • Add semantic loss to objective Is output a path? Does output have true edges?

  48. Supervised Learning • Predict shortest paths • Add semantic loss to objective Is output a path? Is output the true path? Does output have true edges?
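In code, "add semantic loss to objective" is just a weighted extra term. The sketch below is a hedged illustration (the weight w, the function names, and passing the constraint loss in as a function are my own choices, not from the talk).

```python
from math import log

def supervised_loss(y_true, p, constraint_loss_fn, w=0.5):
    """Cross-entropy against the labels plus w times the semantic loss of the
    output constraint.  constraint_loss_fn maps predicted probabilities p to
    L(alpha, p); w is an assumed hyperparameter, not a value from the talk."""
    ce = -sum(y * log(q) + (1 - y) * log(1 - q) for y, q in zip(y_true, p))
    return ce + w * constraint_loss_fn(p)

# For path prediction, constraint_loss_fn would be the semantic loss of the
# "output is a path" circuit; a toy call with the exactly-one loss from the
# earlier sketch:
# supervised_loss([1, 0, 0], [0.8, 0.3, 0.1], exactly_one_semantic_loss)
```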

  49. Supervised Learning • Predict sushi preferences • Add semantic loss to objective [example ranking: 1 fatty tuna, 2 sea urchin, 3 salmon roe, 4 shrimp, 5 tuna, 6 squid, 7 tuna roll, 8 sea eel, 9 egg, 10 cucumber roll]

  50. Supervised Learning • Predict sushi preferences • Add semantic loss to objective [example ranking as on slide 49] Is output a ranking?

  51. Supervised Learning • Predict sushi preferences • Add semantic loss to objective [example ranking as on slide 49] Is output a ranking? Does output correctly rank individual sushis?

  52. Supervised Learning • Predict sushi preferences • Add semantic loss to objective [example ranking as on slide 49] Is output a ranking? Is output the true ranking? Does output correctly rank individual sushis?

  53. Semi-Supervised Learning • Unlabeled data must have some label



  56. Semi-Supervised Learning • Unlabeled data must have some label • Low semantic loss with exactly-one constraint

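A hedged sketch of the semi-supervised objective (function names and the weight w are my own assumptions): labeled examples keep their usual cross-entropy, while unlabeled examples contribute only the exactly-one semantic loss.

```python
from math import log, prod

def exactly_one_loss(p):
    """Semantic loss of 'exactly one output is true' (see the earlier sketch)."""
    return -log(sum(p[i] * prod(1 - q for j, q in enumerate(p) if j != i)
                    for i in range(len(p))))

def semi_supervised_loss(labeled, unlabeled, w=0.05):
    """labeled: list of (one-hot target, predicted class probabilities);
    unlabeled: list of predicted class probabilities only.
    Unlabeled examples have no label term, only the constraint term.
    The weight w is an assumed hyperparameter, not a value from the talk."""
    ce = -sum(y * log(q) + (1 - y) * log(1 - q)
              for target, p in labeled for y, q in zip(target, p))
    sl = sum(exactly_one_loss(p) for p in unlabeled)
    return ce + w * sl

# Toy usage: one labeled example (class 0 of 3) and one unlabeled prediction.
print(semi_supervised_loss(labeled=[([1, 0, 0], [0.7, 0.2, 0.1])],
                           unlabeled=[[0.4, 0.4, 0.2]]))
```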

  58. MNIST

  59. FASHION

  60. CIFAR10

  61. Semantic Loss Conclusions • Cares about meaning, not syntax • Elegant axiomatic approach

  62. Semantic Loss Conclusions • Cares about meaning, not syntax • Elegant axiomatic approach • If you have complex output constraints: use logical circuits to enforce them

  63. Semantic Loss Conclusions • Cares about meaning, not syntax • Elegant axiomatic approach • If you have complex output constraints: use logical circuits to enforce them • If you have unlabeled data (no constraints): get a lot of signal by minimizing the semantic loss of the exactly-one constraint

  64. Probabilistic Circuits

  65. Logical Circuits: [the logical circuit from before: OR and AND gates over the literals L, ¬L, K, ¬K, P, ¬P, A, ¬A]

  66. PSDD: Probabilistic SDD: [the same circuit, now with a probability on each OR-gate wire, e.g. 0.1, 0.6, 0.3 at the root and 0.6/0.4, 0.8/0.2, 0.25/0.75, 0.9/0.1 further down]

  67. PSDD: Probabilistic SDD: [same parameterized circuit] Input: L, K, P, A are true

  68. PSDD: Probabilistic SDD: [same parameterized circuit] Input: L, K, P, A are true. Pr(L, K, P, A) = 0.3 x 1 x 0.8 x 0.4 x 0.25 = 0.024
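The 0.024 above comes from one bottom-up evaluation: multiply across AND gates, and at each OR gate take the weighted value of its (by determinism, single) matching branch. The node classes and the two-variable toy circuit below are my own, not the circuit in the slide.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Lit:                        # leaf literal, e.g. K or not-K
    var: str
    positive: bool = True

@dataclass
class AndNode:
    children: Tuple

@dataclass
class OrNode:                     # weighted, deterministic OR gate
    weighted_children: Tuple      # pairs (weight, child)

def prob(node, assignment):
    """Probability of a complete assignment {var: bool} under the PSDD."""
    if isinstance(node, Lit):
        return 1.0 if assignment[node.var] == node.positive else 0.0
    if isinstance(node, AndNode):
        result = 1.0
        for child in node.children:
            result *= prob(child, assignment)
        return result
    return sum(w * prob(child, assignment) for w, child in node.weighted_children)

# Toy PSDD over K and A: branch 'K' (weight 0.3) forces A; branch 'not K'
# (weight 0.7) makes A true with probability 0.8.
toy = OrNode((
    (0.3, AndNode((Lit("K"), Lit("A")))),
    (0.7, AndNode((Lit("K", False),
                   OrNode(((0.8, Lit("A")), (0.2, Lit("A", False))))))),
))
print(prob(toy, {"K": True, "A": True}))     # 0.3
print(prob(toy, {"K": False, "A": True}))    # 0.7 * 0.8 = 0.56
# The four complete assignments get 0.3, 0.0, 0.56, 0.14: a normalized distribution.
```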

  69. PSDD nodes induce a normalized distribution! [same parameterized circuit]


  71. PSDD nodes induce a normalized distribution! [same parameterized circuit] Can read probabilistic independences off the circuit structure

  72. Tractable for Probabilistic Inference • MAP inference: find the most likely assignment (otherwise NP-complete) • Computing conditional probabilities Pr(x|y) (otherwise PP-complete) • Sample from Pr(x|y)
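These queries come down to the same kind of circuit passes. As one hedged example (it reuses the Lit/AndNode/OrNode classes and the `toy` circuit from the previous sketch, all of which are my own constructions), a conditional probability Pr(x|y) takes two evaluations with partial evidence: literals of unobserved variables evaluate to 1, which sums them out on a smooth, decomposable circuit.

```python
def value(node, evidence):
    """Evaluate the PSDD under a partial assignment; unobserved variables are
    summed out by letting both of their literals evaluate to 1.0.
    Reuses Lit, AndNode, OrNode and `toy` from the previous sketch."""
    if isinstance(node, Lit):
        if node.var not in evidence:
            return 1.0
        return 1.0 if evidence[node.var] == node.positive else 0.0
    if isinstance(node, AndNode):
        result = 1.0
        for child in node.children:
            result *= value(child, evidence)
        return result
    return sum(w * value(child, evidence) for w, child in node.weighted_children)

# Conditional probability by two linear-time passes: Pr(A = true | K = false)
pr_evidence = value(toy, {"K": False})             # Pr(not K) = 0.7
pr_joint = value(toy, {"K": False, "A": True})     # Pr(not K, A) = 0.56
print(pr_joint / pr_evidence)                      # 0.8
```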
