Combinatorial Objects: Rankings

A_ij : item i at position j (n items require n² Boolean variables)

Example: two sushi rankings
rank   sushi (ranking 1)   sushi (ranking 2)
1      fatty tuna          shrimp
2      sea urchin          sea urchin
3      salmon roe          salmon roe
4      shrimp              fatty tuna
5      tuna                tuna
6      squid               squid
7      tuna roll           tuna roll
8      sea eel             sea eel
9      egg                 egg
10     cucumber roll       cucumber roll

Without further constraints this encoding is too loose:
• An item may be assigned to more than one position
• A position may contain more than one item
Encoding Rankings in Logic

A_ij : item i at position j

         pos 1   pos 2   pos 3   pos 4
item 1   A_11    A_12    A_13    A_14
item 2   A_21    A_22    A_23    A_24
item 3   A_31    A_32    A_33    A_34
item 4   A_41    A_42    A_43    A_44

• Constraint: each item i is assigned to a unique position (n constraints)
• Constraint: each position j is assigned a unique item (n constraints)
  (a CNF sketch of these constraints follows below)
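As an illustration (not from the slides), here is a minimal Python sketch that generates these uniqueness constraints as CNF clauses; the variable numbering A(i, j) and the DIMACS-style signed-integer literals are assumptions made for the example.

```python
from itertools import combinations

def ranking_cnf(n):
    """Clauses forcing A[i][j] ("item i at position j") to encode a permutation of n items.
    Literals are signed integers: +v means A_ij is true, -v means it is false."""
    A = lambda i, j: i * n + j + 1                     # map (item, position) to a variable id
    clauses = []
    for i in range(n):
        clauses.append([A(i, j) for j in range(n)])    # item i sits at some position ...
        clauses += [[-A(i, j), -A(i, k)]               # ... and at no two positions
                    for j, k in combinations(range(n), 2)]
    for j in range(n):
        clauses.append([A(i, j) for i in range(n)])    # position j holds some item ...
        clauses += [[-A(i, j), -A(k, j)]               # ... and no two items
                    for i, k in combinations(range(n), 2)]
    return clauses

# For n = 4 this yields 8 at-least-one clauses and 48 at-most-one clauses over A_11 ... A_44.
print(len(ranking_cnf(4)))   # 56
```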
Structured Space for Paths (cf. Nature paper)

• Good variable assignments (represent a route): 184
• Bad variable assignments (do not represent a route): 16,777,032
• The structured space is easily encoded in logical constraints [Nishino et al.]
• Unstructured probability space: 184 + 16,777,032 = 2^24
Logical Circuits

[Circuit diagram: AND/OR gates over the literals L, ¬L, K, ¬K, P, ¬P, A, ¬A]
Property: Decomposability

[Same circuit diagram over L, K, P, A]

Property: AND gates have disjoint input circuits (their children mention disjoint sets of variables)
Property: Determinism

[Same circuit diagram, evaluated on the input where L, K, P, A are true and ¬L, ¬K, ¬P, ¬A are false]

Property: OR gates have at most one true input wire
Tractable for Logical Inference

• Is the structured space empty? (SAT)
• Count the size of the structured space (#SAT)
• Check equivalence of spaces
• Algorithms are linear in circuit size (pass up, pass down, similar to backprop); see the model-counting sketch below
• Compilation by exhaustive SAT solvers
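A hypothetical sketch of the linear upward pass for #SAT: on a smooth, decomposable, deterministic circuit, model counts multiply at AND gates and add at OR gates. The tiny node classes below are illustrative, not taken from the slides.

```python
class Lit:   # leaf: a (possibly negated) literal of variable v
    def __init__(self, v, positive=True):
        self.v, self.positive = v, positive

class And:   # decomposable AND: children mention disjoint variables
    def __init__(self, *children): self.children = children

class Or:    # deterministic OR: children are mutually exclusive
    def __init__(self, *children): self.children = children

def model_count(node):
    """One visit per node, so linear in circuit size.
    Assumes smoothness: both branches of every OR mention the same variables."""
    if isinstance(node, Lit):
        return 1                                    # one satisfying assignment of its variable
    if isinstance(node, And):
        c = 1
        for ch in node.children:
            c *= model_count(ch)                    # decomposability: counts multiply
        return c
    return sum(model_count(ch) for ch in node.children)   # determinism: add, no double counting

# Exactly-one over {X1, X2}: (X1 AND ¬X2) OR (¬X1 AND X2) has 2 models.
xor = Or(And(Lit(1), Lit(2, False)), And(Lit(1, False), Lit(2)))
print(model_count(xor))   # 2
```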
Semantic Loss for Deep Learning
Deep Structured Output Prediction

Learn a deep neural network from data + constraints (background knowledge, physics).

[Diagram: Input → Neural Network → Output, with a logical constraint imposed on the output]
Semantic Loss

• Output is a probability vector p, not logic! How close is the output to satisfying the constraint?
• Answer: semantic loss function L(α, p)
• Axioms, for example:
  – If p is Boolean then L(p, p) = 0
  – If α implies β then L(α, p) ≥ L(β, p)
• Properties:
  – If α is equivalent to β then L(α, p) = L(β, p)
  – If p is Boolean and satisfies α then L(α, p) = 0
Semantic Loss: Definition

Theorem: the axioms imply a unique semantic loss (up to a multiplicative constant):

L(α, p) ∝ −log Σ_{x ⊨ α} Π_{i : x ⊨ X_i} p_i · Π_{i : x ⊨ ¬X_i} (1 − p_i)

The inner product is the probability of getting state x after flipping coins with probabilities p; the outer sum is the probability of satisfying α after flipping coins with probabilities p.
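To make the definition concrete, here is a naive (exponential-time) evaluation that enumerates all states, taking the proportionality constant to be one; the function and constraint names are illustrative only.

```python
from itertools import product
import math

def semantic_loss(alpha, p):
    """L(alpha, p) = -log sum_{x |= alpha} prod_i p_i^{x_i} (1 - p_i)^{1 - x_i}.
    `alpha` is a predicate over Boolean tuples; `p` is the vector of predicted probabilities."""
    prob = 0.0
    for x in product([0, 1], repeat=len(p)):       # enumerate all 2^n states
        if alpha(x):                               # keep only states satisfying the constraint
            prob += math.prod(pi if xi else 1 - pi for xi, pi in zip(x, p))
    return -math.log(prob)

exactly_one = lambda x: sum(x) == 1
print(semantic_loss(exactly_one, [0.1, 0.7, 0.2]))   # small when mass concentrates on one-hot states
```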
How to Compute Semantic Loss?

• In general: #P-hard
• With a logical circuit for α: linear!
• Example: exactly-one constraint over y₂, y₃, y₄, as CNF:
  y₂ ∨ y₃ ∨ y₄      ¬y₂ ∨ ¬y₃      ¬y₃ ∨ ¬y₄      ¬y₂ ∨ ¬y₄
• Compile the constraint into a logical circuit; then
  L(α, p) = L(circuit, p) = −log( p₂(1−p₃)(1−p₄) + (1−p₂)p₃(1−p₄) + (1−p₂)(1−p₃)p₄ )
• Why linear? Decomposability and determinism! (see the sketch below)
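A small sketch of the closed form above, mirroring a single pass over the exactly-one circuit: determinism lets the disjoint one-hot cases be summed, and decomposability lets each case's probability be a product of per-variable factors. The helper name is an assumption for the example.

```python
import math

def exactly_one_semantic_loss(p):
    """Closed-form semantic loss of the exactly-one constraint over the variables in p."""
    sat = sum(pi * math.prod(1 - pj for j, pj in enumerate(p) if j != i)
              for i, pi in enumerate(p))           # sum of disjoint one-hot state probabilities
    return -math.log(sat)

print(exactly_one_semantic_loss([0.2, 0.5, 0.3]))  # matches the brute-force definition above
```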
Supervised Learning

• Predict shortest paths
• Add semantic loss to the objective
• Evaluation questions: Is the output a path? Does the output have the true edges? Is the output the true path?
Supervised Learning

• Predict sushi preferences (rankings of 10 sushi types, as in the earlier example)
• Add semantic loss to the objective
• Evaluation questions: Is the output a ranking? Does it correctly rank individual sushis? Is it the true ranking?
Semi-Supervised Learning

• Unlabeled data must have some label
• Encourage low semantic loss of the exactly-one constraint on unlabeled examples (training sketch below)
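A hedged sketch of how such an objective could look in PyTorch: cross-entropy on labeled data plus a weighted exactly-one semantic loss on unlabeled data. The weight `w`, the sigmoid output layer, and the network itself are placeholders, not details from the slides.

```python
import torch
import torch.nn.functional as F

def exactly_one_loss(probs):
    """-log P(exactly one output is 1) under independent coin flips with probabilities `probs`.
    probs: tensor of shape (batch, n)."""
    n = probs.shape[-1]
    one_hot = torch.eye(n, device=probs.device)                  # the n satisfying (one-hot) states
    p = probs.unsqueeze(-2)                                      # (batch, 1, n)
    state_probs = (p * one_hot + (1 - p) * (1 - one_hot)).prod(dim=-1)   # (batch, n)
    return -torch.log(state_probs.sum(dim=-1)).mean()

def objective(net, x_labeled, y_labeled, x_unlabeled, w=0.05):
    # standard supervised cross-entropy on labeled data ...
    ce = F.cross_entropy(net(x_labeled), y_labeled)
    # ... plus semantic loss of the exactly-one constraint on unlabeled data
    p_unlabeled = torch.sigmoid(net(x_unlabeled))
    return ce + w * exactly_one_loss(p_unlabeled)
```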
Experiments: semi-supervised classification on MNIST, FASHION, and CIFAR10 [results figures]
Semantic Loss Conclusions

• Cares about meaning, not syntax
• Elegant axiomatic approach
• If you have complex output constraints: use logical circuits to enforce them
• If you have unlabeled data (no constraints): get a lot of signal by minimizing the semantic loss of exactly-one
Probabilistic Circuits
Logical Circuits (recap)

[Circuit diagram over the literals L, ¬L, K, ¬K, P, ¬P, A, ¬A]
PSDD: Probabilistic SDD

• The logical circuit, annotated with a probability on each input wire of every OR gate (e.g., 0.1 / 0.6 / 0.3 at the root; 0.8 / 0.2, 0.25 / 0.75, 0.9 / 0.1, 0.6 / 0.4, and several 1 / 0 wires further down)
• Evaluating on the input where L, K, P, A are true:
  Pr(L, K, P, A) = 0.3 × 1 × 0.8 × 0.4 × 0.25 = 0.024
  (evaluation sketch below)
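A sketch of how such a number arises: evaluate the parameterized circuit bottom-up, with literal indicators at the leaves, products at AND gates, and parameter-weighted sums at OR gates. The tuple representation and the tiny two-variable circuit below are made up for illustration; they are not the circuit on the slide.

```python
def pr(node, assignment):
    """Probability of a complete assignment under a probabilistic circuit.
    `node` is ('lit', var, sign), ('and', children), or ('or', [(weight, child), ...])."""
    kind = node[0]
    if kind == 'lit':
        _, var, sign = node
        return 1.0 if assignment[var] == sign else 0.0       # indicator of the literal
    if kind == 'and':
        out = 1.0
        for child in node[1]:
            out *= pr(child, assignment)                      # decomposable AND: multiply
        return out
    return sum(w * pr(child, assignment) for w, child in node[1])   # weighted OR: mixture

# Tiny example: Pr(A) = 0.3, Pr(B | A) = 0.8, Pr(B | not A) = 0.25.
circuit = ('or', [(0.3, ('and', [('lit', 'A', True),
                                 ('or', [(0.8, ('lit', 'B', True)), (0.2, ('lit', 'B', False))])])),
                  (0.7, ('and', [('lit', 'A', False),
                                 ('or', [(0.25, ('lit', 'B', True)), (0.75, ('lit', 'B', False))])]))])
print(pr(circuit, {'A': True, 'B': True}))   # 0.3 * 0.8 = 0.24
```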
PSDD nodes induce a normalized distribution!

[Parameterized circuit diagram over L, K, P, A]

• The wire parameters of each OR gate sum to one, so every node induces a normalized distribution over its variables
• Probabilistic independences can be read off the circuit structure
Tractable for Probabilistic Inference

• MAP inference: find the most likely assignment (otherwise NP-complete)
• Compute conditional probabilities Pr(x | y) (otherwise PP-complete); see the marginal-inference sketch below
• Sample from Pr(x | y)
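A hedged sketch of why conditionals are tractable, reusing the tuple representation and the example `circuit` from the previous sketch: to marginalize a variable, set both of its literal indicators to 1 in the same upward pass, then take a ratio of two such passes.

```python
def marginal(node, evidence):
    """Pr(evidence), where `evidence` maps only the observed variables to True/False.
    Unobserved literals evaluate to 1, which sums them out in a single upward pass
    (assumes a smooth circuit, as in the previous sketch)."""
    kind = node[0]
    if kind == 'lit':
        _, var, sign = node
        if var not in evidence:
            return 1.0                                        # marginalize unobserved variable
        return 1.0 if evidence[var] == sign else 0.0
    if kind == 'and':
        out = 1.0
        for child in node[1]:
            out *= marginal(child, evidence)
        return out
    return sum(w * marginal(child, evidence) for w, child in node[1])

# Conditional query on the two-variable circuit above: Pr(B | A) = Pr(A, B) / Pr(A).
print(marginal(circuit, {'A': True, 'B': True}) / marginal(circuit, {'A': True}))   # 0.8
```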