Learning Logistic Circuits
Yitao Liang, Guy Van den Broeck
January 31, 2019
Which model to choose?
Neural networks: a "black box", but good performance on image classification.
Classical AI methods: clear modeling assumptions (the slide shows a decision diagram over questions such as Hungry?, $25?, Sleep?, Restaurant?, ...).
Starting Point: Probabilistic Circuits
A promising synthesis of the two (e.g., SPNs); state of the art on density estimation of Pr(X).
What if we only want to learn a classifier Pr(Y | X)?
Logical Circuits
[Figure: a logical circuit over variables A, B, C, D, built from AND and OR gates over the leaf literals A, ¬A, B, ¬B, C, ¬C, D, ¬D. Input: A=0, B=1, C=1, D=0. The circuit is evaluated bottom-up: a leaf outputs 1 if its literal agrees with the input, and gate values propagate up to the root.]
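Below is a minimal, illustrative sketch of this bottom-up evaluation; the node classes and the two-variable example circuit are my own additions, not taken from the slides.

```python
# Minimal sketch of bottom-up evaluation of a logical (AND/OR) circuit.
# The classes and the tiny example circuit are illustrative only.

class Literal:
    def __init__(self, var, positive=True):
        self.var, self.positive = var, positive
    def evaluate(self, assignment):
        # A leaf outputs 1 iff its literal agrees with the input assignment.
        return int(assignment[self.var] == self.positive)

class AndNode:
    def __init__(self, children):
        self.children = children
    def evaluate(self, assignment):
        return int(all(c.evaluate(assignment) for c in self.children))

class OrNode:
    def __init__(self, children):
        self.children = children
    def evaluate(self, assignment):
        return int(any(c.evaluate(assignment) for c in self.children))

# Example: (A AND B) OR (NOT A AND NOT B), evaluated on A=0, B=1
circuit = OrNode([AndNode([Literal('A'), Literal('B')]),
                  AndNode([Literal('A', False), Literal('B', False)])])
print(circuit.evaluate({'A': False, 'B': True}))  # prints 0
```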
Logical -> Probabilistic Circuits
[Figure: the same circuit, with parameters (shown in red) attached to the wires below each OR gate, e.g. 0.9/0.1, 0.2/0.8, 0.4/0.6. The red parameters are conditional probabilities.]
Logical -> Probabilistic Circuits
Evaluating Pr(A, B, C, D): multiply the parameters bottom-up. For the input A=0, B=1, C=1, D=0, each AND gate multiplies the values of its children (e.g., 0.24 = 0.8 * 0.3), each OR gate takes the parameter-weighted sum of its children (e.g., 0.1 = 0.1 * 1 + 0.9 * 0), and the root outputs Pr(A=0, B=1, C=1, D=0) = 0.096.
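A corresponding sketch of this probabilistic evaluation is below; the tuple-based node encoding and the two-variable example circuit are assumptions made for illustration, not the authors' code.

```python
# Bottom-up evaluation of a probabilistic circuit: AND nodes multiply the
# values of their children, OR nodes take the parameter-weighted sum.
from math import prod

def evaluate(node, assignment):
    kind = node[0]
    if kind == 'lit':                      # ('lit', var, positive)
        _, var, positive = node
        return 1.0 if assignment[var] == positive else 0.0
    if kind == 'and':                      # ('and', [child, ...])
        return prod(evaluate(c, assignment) for c in node[1])
    if kind == 'or':                       # ('or', [(weight, child), ...])
        return sum(w * evaluate(c, assignment) for w, c in node[1])

# Tiny example: Pr(A, B) = 0.8 * [A and B] + 0.2 * [not A and not B]
circuit = ('or', [(0.8, ('and', [('lit', 'A', True),  ('lit', 'B', True)])),
                  (0.2, ('and', [('lit', 'A', False), ('lit', 'B', False)]))])
print(evaluate(circuit, {'A': True, 'B': True}))   # Pr(A=1, B=1) = 0.8
```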
Evaluate Logistic Circuits
Pr(Y = 1 | A, B, C, D) = sigmoid(g(A, B, C, D))
[Figure: the same circuit structure, now with real-valued parameters on the wires (e.g., -2.6, 1.5, 3.9). For the input A=0, B=1, C=1, D=0, the parameters on active ("hot") wires are summed bottom-up to give g, and the logistic function is applied to the final output.]
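A sketch of this evaluation under that reading (sum the parameters on hot wires, then apply the sigmoid) is shown below; the node encoding, the helper g, and the example weights are illustrative assumptions, not the released implementation.

```python
# Sketch of logistic-circuit evaluation: collect the parameters of "hot" wires
# bottom-up and apply the logistic function at the root.
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def g(node, assignment):
    """Return (is_hot, sum of parameters on hot wires below this node)."""
    kind = node[0]
    if kind == 'lit':                              # ('lit', var, positive)
        _, var, positive = node
        return (assignment[var] == positive), 0.0
    if kind == 'and':                              # ('and', [child, ...])
        results = [g(c, assignment) for c in node[1]]
        hot = all(h for h, _ in results)
        return hot, (sum(s for _, s in results) if hot else 0.0)
    if kind == 'or':                               # ('or', [(theta, child), ...])
        for theta, child in node[1]:
            hot, s = g(child, assignment)
            if hot:                                # determinism: at most one hot child
                return True, theta + s
        return False, 0.0

# Hypothetical circuit: weight 1.7 when A and B agree, -0.4 when they differ.
circuit = ('or', [( 1.7, ('and', [('lit', 'A', True),  ('lit', 'B', True)])),
                  ( 1.7, ('and', [('lit', 'A', False), ('lit', 'B', False)])),
                  (-0.4, ('and', [('lit', 'A', True),  ('lit', 'B', False)])),
                  (-0.4, ('and', [('lit', 'A', False), ('lit', 'B', True)]))])
_, score = g(circuit, {'A': True, 'B': False})
print(sigmoid(score))                              # Pr(Y=1 | A=1, B=0) ≈ 0.40
```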
Are logistic circuits amenable to tractable learning?
Special Case: Logistic Regression
Pr(Y = 1 | A, B, C, D) = 1 / (1 + exp(-(A * w_A + ¬A * w_¬A + B * w_B + ...)))
What about other logistic circuits in more general forms?
Parameter Learning
Pr(Y = 1 | A = 0, B = 1, C = 1, D = 0): "hot" wires are active features.
Parameter Learning
Due to decomposability and determinism, parameter learning reduces to logistic regression: the feature associated with each wire is its "global circuit flow", and the resulting parameter learning problem is convex (see the sketch below).
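A minimal sketch of this reduction, assuming a hypothetical wire object with an is_hot(x) method and a parameter attribute, and using scikit-learn for the convex fit:

```python
# Sketch of the reduction to logistic regression (my reading of the slide, not
# the released code): each wire's "global circuit flow" on an example acts as a
# feature, so the wire parameters can be fit by ordinary logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

def wire_features(wires, x):
    # Hypothetical helper: which wires are "hot" when evaluating the circuit on x.
    return np.array([w.is_hot(x) for w in wires], dtype=float)

def learn_parameters(wires, X, y):
    F = np.stack([wire_features(wires, x) for x in X])   # n_examples x n_wires
    clf = LogisticRegression(max_iter=1000).fit(F, y)    # convex optimization
    for wire, weight in zip(wires, clf.coef_[0]):
        wire.parameter = weight        # learned coefficients become wire parameters
    return clf
```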
Structure Learning
Similar to LearnPSDD: generate candidate operations, calculate the variance for each, and execute the best one. Nodes are split so as to reduce the variance of the gradients (see the sketch below).
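A rough paraphrase of that loop is sketched below; generate_candidate_splits, gradient_variance, and apply_split are hypothetical helpers standing in for the actual operations, and the scoring detail is my reading of the slide.

```python
# Paraphrase of the structure-learning loop on the slide, not the authors' code.
def learn_structure(circuit, X, y, n_rounds=10):
    for _ in range(n_rounds):
        # Generate candidate operations (e.g., node splits), as in LearnPSDD.
        candidates = generate_candidate_splits(circuit)       # hypothetical helper
        # Score candidates by the variance of per-example gradients on the
        # affected wires, and execute the best one.
        best = max(candidates, key=lambda c: gradient_variance(circuit, c, X, y))
        circuit = apply_split(circuit, best)                  # hypothetical helper
        learn_parameters(circuit.wires, X, y)                 # convex re-fit (see above)
    return circuit
```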
Comparable Accuracy with Neural Nets
Significantly Smaller in Size
Better Data Efficiency
Probabilistic -> Logistic Circuits
Logistic circuits are the discriminative counterparts of probabilistic circuits: probabilities become log-odds.
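As a simple illustration (my own numbers, not from the slides): under this reading, a wire probability of 0.9 in the probabilistic circuit would correspond to a log-odds weight of log(0.9 / 0.1) ≈ 2.2 in the logistic circuit.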
What Do Features Mean?
This is the feature that contributes the most to this image's classification probability.
Feature value: 0.925. Feature weight: 3.489. Feature interpretation: curvy lines and a hollow center.
Conclusion
Logistic circuits:
• Synthesis of symbolic AI and statistical learning
• Discriminative counterparts of probabilistic circuits
• Convex parameter learning
• Simple heuristic for structure learning
• Good performance
• Easy to interpret
Thanks https://github.com/UCLA-StarAI/LogisticCircuit