Boolean Constraints
(figure: two truth tables over the variables L, K, P, A; the unstructured space lists all 16 instantiations, while in the structured space 7 out of 16 instantiations are impossible)
Combinatorial Objects: Rankings
(figure: two example rankings of 10 sushi items: fatty tuna, sea urchin, salmon roe, shrimp, tuna, squid, tuna roll, sea eel, egg, cucumber roll; the two rankings differ in the positions of fatty tuna and shrimp)
10 items: 3,628,800 rankings; 20 items: 2,432,902,008,176,640,000 rankings
A_ij: item i at position j (n items require n² Boolean variables)
Without further constraints, an item may be assigned to more than one position, and a position may contain more than one item.
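The counts on the slide are just factorials: n items have n! rankings. A quick sanity check:

```python
import math

# n items can be ranked in n! ways
print(math.factorial(10))  # 3628800
print(math.factorial(20))  # 2432902008176640000
```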
Encoding Rankings in Logic
A_ij: item i at position j
(figure: a 4×4 grid of Boolean variables A_11 through A_44, rows for items and columns for positions)
Constraint: each item i is assigned a unique position (n constraints)
Constraint: each position j is assigned a unique item (n constraints)
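These two constraint families can be emitted mechanically. A minimal sketch in DIMACS-style integer literals (the variable numbering and helper names here are my own, not from the slides):

```python
from itertools import combinations

def exactly_one(vars_):
    """Clauses forcing exactly one of vars_ true: one 'at least one'
    clause plus pairwise 'at most one' clauses (negative = negated literal)."""
    return [list(vars_)] + [[-a, -b] for a, b in combinations(vars_, 2)]

def ranking_cnf(n):
    # A[i][j] is the Boolean variable "item i at position j"
    A = [[i * n + j + 1 for j in range(n)] for i in range(n)]
    clauses = []
    for i in range(n):                                   # each item at a unique position
        clauses += exactly_one(A[i])
    for j in range(n):                                   # each position holds a unique item
        clauses += exactly_one([A[i][j] for i in range(n)])
    return clauses

print(len(ranking_cnf(4)))  # 56 clauses: 8 * (1 + C(4,2))
```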
Structured Space for Paths (cf. Nature paper)
Good variable assignments (represent a route): 184
Bad variable assignments (do not represent a route): 16,777,032
The space is easily encoded in logical constraints; see [Choi, Tavabi, Darwiche, AAAI 2016]
Unstructured probability space: 184 + 16,777,032 = 2^24
“Deep Architecture” Logic + Probability
Logical Circuits
(figure: a circuit of AND and OR gates over the variables L, K, P, A)
Property: Decomposability
(figure: the circuit, highlighting that the children of each AND gate mention disjoint sets of variables)
Property: Determinism
(figure: the circuit evaluated on an input over L, K, P, A; under any complete input, at most one child of each OR gate is true)
Sentential Decision Diagram (SDD)
(figure: the decomposable, deterministic circuit over L, K, P, A, evaluated on an input)
Tractable for Logical Inference
• Is the structured space empty? (SAT)
• Count the size of the structured space (#SAT)
• Check equivalence of spaces
• Algorithms are linear in circuit size (pass up, pass down, similar to backprop)
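The linear-time #SAT pass can be sketched on a toy circuit: replace literals by 1, evaluate AND as product and OR as sum. Decomposability makes the products correct and determinism (with smoothness) makes the sums correct. The circuit below is a made-up example encoding A XOR B, not the slides' circuit:

```python
# Model counting by one upward pass over a smooth, decomposable,
# deterministic circuit: literal -> 1, AND -> product, OR -> sum.
def count(node):
    kind = node[0]
    if kind == 'lit':
        return 1
    vals = [count(child) for child in node[1:]]
    if kind == 'and':
        result = 1
        for v in vals:
            result *= v
        return result
    return sum(vals)  # 'or' node

xor = ('or',
       ('and', ('lit', 'A', True),  ('lit', 'B', False)),
       ('and', ('lit', 'A', False), ('lit', 'B', True)))

print(count(xor))      # 2 models of A XOR B
print(count(xor) > 0)  # SAT check: True
```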
PSDD: Probabilistic SDD
(figure: the SDD with a probability on each OR-gate wire: 0.1/0.6/0.3 at the root decision node, and local parameters such as 0.6/0.4, 0.8/0.2, 0.25/0.75, and 0.9/0.1)
Input: L, K, P, A
Pr(L, K, P, A) = 0.3 × 1.0 × 0.8 × 0.4 × 0.25 = 0.024
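Because the circuit is deterministic, exactly one branch of each decision node is live for a complete input, so the probability is just the product of the parameters on the live wires. A minimal sketch using the numbers read off the slide:

```python
# Pr(input) = product of the parameters on the wires the input activates.
# The list below is the live parameters shown on the slide for this input.
def pr_of_input(live_params):
    p = 1.0
    for theta in live_params:
        p *= theta
    return p

print(round(pr_of_input([0.3, 1.0, 0.8, 0.4, 0.25]), 6))  # 0.024
```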
PSDD nodes induce a normalized distribution!
(figure: the parameterized circuit)
Can read probabilistic independences off the circuit structure
Tractable for Probabilistic Inference
• MAP inference: find the most likely assignment (otherwise NP-complete)
• Compute conditional probabilities Pr(x|y) (otherwise PP-complete)
• Sample from Pr(x|y)
• Algorithms are linear in circuit size (pass up, pass down, similar to backprop)
Learning PSDDs Logic + Probability + ML
Parameters are Interpretable
(figure: the PSDD with annotated elements: "student takes course L", "student takes course P", and a parameter labeled "probability of P given L")
Explainable AI DARPA Program
Learning Algorithms
• Parameter learning: closed-form maximum likelihood from complete data; one pass over the data to estimate Pr(x|y). Not a lot to say: very easy!
• Circuit learning (naïve): compile constraints to an SDD circuit using SAT-solver technology. The circuit does not depend on the data.
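The closed-form estimate is just relative-frequency counting: each PSDD parameter is a conditional probability obtained in a single pass over the data. A sketch with a made-up dataset and variable names:

```python
# Closed-form ML parameter estimation from complete data by counting.
# The tiny dataset below is illustrative only.
data = [
    {"L": 1, "P": 1}, {"L": 1, "P": 0}, {"L": 1, "P": 1},
    {"L": 0, "P": 0}, {"L": 1, "P": 1}, {"L": 0, "P": 1},
]

def estimate(data, x, y):
    """Relative-frequency estimate of Pr(x | y) from complete data."""
    in_y = [d for d in data if all(d[k] == v for k, v in y.items())]
    in_xy = [d for d in in_y if all(d[k] == v for k, v in x.items())]
    return len(in_xy) / len(in_y)

print(estimate(data, {"P": 1}, {"L": 1}))  # 0.75
```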
Learning Preference Distributions
PSDD vs. a special-purpose distribution, mixture-of-Mallows:
– number of components from 1 to 20
– EM with 10 random seeds
– implementation of Lu & Boutilier
This is the naïve approach: the circuit does not depend on the data!
Learn Circuit from Data Even in unstructured spaces
Tractable Learning
Markov networks and Bayesian networks do not support linear-time exact inference.
Tractable Learning
Historically: polytrees, Chow-Liu trees, etc.
Cutset networks and SPNs: both are arithmetic circuits (ACs) [Darwiche, JACM 2003]
PSDDs are Arithmetic Circuits
(figure: a PSDD decision node and its arithmetic-circuit form: a sum node with weights θ_1, ..., θ_n over products p_1·s_1, p_2·s_2, ..., p_n·s_n)
Tractable Learning
(figure: a spectrum between representational freedom and strong properties, with DNNs toward the freedom end and cutset networks and SPNs toward the strong-properties end)
Perhaps the most powerful circuit proposed to date
PSDDs for the Logic-Phobic
• Bottom-up: each node is a distribution
• Multiply independent distributions
• Weighted mixture of lower-level distributions
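The two composition rules can be sketched with made-up numbers: product nodes combine distributions over disjoint variables, and sum nodes take a weighted mixture of distributions over the same variables; both preserve normalization:

```python
def product(p_left, p_right):
    # joint over disjoint variable sets is just the product of marginals
    return {(a, b): pa * pb for a, pa in p_left.items() for b, pb in p_right.items()}

def mixture(weights, dists):
    # convex combination of distributions over the same values
    return {k: sum(w * d[k] for w, d in zip(weights, dists)) for k in dists[0]}

p_L = {0: 0.4, 1: 0.6}                 # a distribution over variable L
p_K = {0: 0.8, 1: 0.2}                 # a distribution over variable K
joint = product(p_L, p_K)              # a distribution over (L, K)
mixed = mixture([0.3, 0.7], [p_L, {0: 0.5, 1: 0.5}])

print(round(sum(joint.values()), 6))   # 1.0: still normalized
print(round(sum(mixed.values()), 6))   # 1.0: still normalized
```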
Variable Trees (vtrees)
(figure: the correspondence between a PSDD and its vtree)
Learning Variable Trees
• How much do variables depend on each other?
• Learn a vtree by hierarchical clustering
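One way to realize this, sketched below under my own assumptions (the slides do not specify the dependence measure or linkage): score pairwise dependence with empirical mutual information, then build the vtree bottom-up by greedily merging the most dependent pair of clusters with average linkage. The dataset and helper names are illustrative:

```python
import math
from itertools import combinations

def mutual_information(data, x, y):
    # empirical MI between binary variables x and y
    n = len(data)
    mi = 0.0
    for vx in (0, 1):
        for vy in (0, 1):
            pxy = sum(1 for d in data if d[x] == vx and d[y] == vy) / n
            px = sum(1 for d in data if d[x] == vx) / n
            py = sum(1 for d in data if d[y] == vy) / n
            if pxy > 0:
                mi += pxy * math.log(pxy / (px * py))
    return mi

def leaves(t):
    return [t] if isinstance(t, str) else leaves(t[0]) + leaves(t[1])

def learn_vtree(data, variables):
    clusters = list(variables)  # leaf = variable name, internal node = pair
    def link(a, b):
        la, lb = leaves(a), leaves(b)
        return sum(mutual_information(data, x, y)
                   for x in la for y in lb) / (len(la) * len(lb))
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]

# L and K are perfectly correlated, P is independent of both,
# so L and K end up under the same vtree node.
data = [{"L": a, "K": a, "P": b} for a in (0, 1) for b in (0, 1)]
print(learn_vtree(data, ["L", "K", "P"]))  # ('P', ('L', 'K'))
```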
Learning Primitives
Primitives maintain PSDD properties and the structured space!
LearnPSDD
1. Vtree learning
2. Construct the most naïve PSDD
3. LearnPSDD: search for better structure by generating candidate operations, simulating them, and executing the best one