Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits
Robert Peharz (Eindhoven University of Technology), Steven Lang (Technical University of Darmstadt), Antonio Vergari (University of California, Los Angeles), Karl Stelzner (Technical University of Darmstadt), Alejandro Molina (Technical University of Darmstadt), Martin Trapp (Graz University of Technology), Guy Van den Broeck (University of California, Los Angeles), Kristian Kersting (Technical University of Darmstadt), Zoubin Ghahramani (University of Cambridge; Uber AI Labs)
International Conference on Machine Learning (ICML), July 2020
In This Paper

Probabilistic Circuits (PCs) are just a special type of neural network. Yet, they are slow:
- Computational graphs are highly sparse and cluttered
- Operations are implemented in the log-domain
- ∼50 times slower than a neural net of comparable size

We propose Einsum Networks (EiNets):
- A PC architecture using a few monolithic einsum operations
- Run and train PCs up to two orders of magnitude faster
- Scale PCs to datasets previously out of reach (CelebA, SVHN)
Probabilistic Circuits
Probabilistic Circuits

Computational graph containing 3 types of operations: distributions (leaves), products, and weighted sums.
Probabilistic Circuits — Leaf Distributions

An arbitrary probability function (pdf, pmf, or mixed) over some set of random variables X. It should facilitate tractable inference routines, e.g. marginalization, conditioning, MAP, …

Exponential-family form: p(x) = h(x) exp(θᵀ T(x) − A(θ))

Examples: Gaussian, Bernoulli, Dirichlet, Poisson, Gamma, …
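As a concrete sketch of such a leaf, the univariate Gaussian can be written in the exponential-family form above. The function name and parameterization below are illustrative, not from the paper:

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma2):
    """Univariate Gaussian leaf in exponential-family form:
    log p(x) = log h(x) + theta^T T(x) - A(theta)."""
    # Natural parameters theta = (mu / sigma^2, -1 / (2 sigma^2))
    theta = np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])
    # Sufficient statistics T(x) = (x, x^2)
    T = np.array([x, x**2])
    # Log-partition function A(theta) and base measure h(x) = 1 / sqrt(2 pi)
    A = mu**2 / (2.0 * sigma2) + 0.5 * np.log(sigma2)
    log_h = -0.5 * np.log(2.0 * np.pi)
    return log_h + theta @ T - A
```

For a standard normal at x = 0 this reduces to log(1/√(2π)), matching the usual Gaussian log-density.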
Probabilistic Circuits — Products

Simply product units: a product node multiplies its children's values, e.g. children 0.5 and 1.4 yield 0.7 = 0.5 × 1.4.
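A minimal sketch of the product unit, using the slide's example values; note that in the log-domain the product becomes a sum of log-values, which is part of the per-node overhead EiNets later remove:

```python
import numpy as np

# A product unit multiplies its children's values: 0.5 * 1.4 = 0.7.
children = np.array([0.5, 1.4])
product = np.prod(children)
# In the log-domain the same unit is a sum of log-values.
log_product = np.sum(np.log(children))  # equals log(product)
```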
Probabilistic Circuits — Sums

Weighted sums: a sum node with children 3.14 and 42.0 outputs w1 · 3.14 + w2 · 42.0, with w_k ≥ 0 and ∑_k w_k = 1.
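Since the weights are nonnegative and sum to one, a sum unit is a mixture; in the log-domain it is typically evaluated with the log-sum-exp trick for numerical stability. A sketch, with the uniform weights being an assumption for the slide's example:

```python
import numpy as np

def log_weighted_sum(log_children, w):
    """Log-domain sum unit: log( sum_k w_k * exp(log_children[k]) ),
    stabilized by shifting with the maximum (log-sum-exp trick)."""
    a = np.log(w) + log_children
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

# Slide example with (assumed) uniform weights w = (0.5, 0.5)
val = log_weighted_sum(np.log(np.array([3.14, 42.0])), np.array([0.5, 0.5]))
```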
Probabilistic Circuits

Computational graph containing distributions, products, and weighted sums, whose output defines p(X1, X2, X3). Plus: structural properties!
- Smoothness: the children of a sum node have the same scope
- Decomposability: the children of a product node have disjoint scopes
Probabilistic Circuits — Inference

Example: marginalization and conditioning. Partition the variables as X = X_q ∪ X_m ∪ X_e (query, marginalized, and evidence variables). Then

p(X_q | x_e) = ∫ p(X_q, x'_m, x_e) dx'_m / ∫∫ p(x'_q, x'_m, x_e) dx'_q dx'_m

Smoothness and decomposability ⇒ a single bottom-up pass!

Check out our AAAI tutorial on Probabilistic Circuits! Upcoming tutorials at ECAI, ECML/PKDD, IJCAI!
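To make the single-pass claim concrete, here is a toy smooth and decomposable circuit: a two-component mixture over (X1, X2). Marginalizing X2 only requires replacing its leaves by their integral, 1, and re-evaluating the same graph bottom-up. All weights and leaf parameters below are made up for illustration:

```python
import numpy as np

def gauss(x, mu, s2):
    """Univariate Gaussian pdf used as a leaf distribution."""
    return np.exp(-(x - mu)**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

def circuit(x1, x2=None):
    """Toy PC: p(x1, x2) = 0.3 * f1(x1) g1(x2) + 0.7 * f2(x1) g2(x2).
    Passing x2=None marginalizes X2 by setting its leaves to 1."""
    f1, f2 = gauss(x1, -1.0, 1.0), gauss(x1, 1.0, 1.0)
    if x2 is None:
        g1 = g2 = 1.0      # each leaf over X2 integrates to 1
    else:
        g1, g2 = gauss(x2, 0.0, 1.0), gauss(x2, 0.0, 2.0)
    # Same bottom-up evaluation in both cases: products, then weighted sum
    return 0.3 * f1 * g1 + 0.7 * f2 * g2
```

Here `circuit(x1)` is exactly the marginal p(x1), a Gaussian mixture, obtained at the cost of one forward pass.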
The Problem
Einsum Networks
Step I – Vectorize Nodes
Step II – The Basic Einsum Operation

A single einsum operation (Einstein notation, summing over i and j):

S_k = W_kij N_i N'_j
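In NumPy notation, the operation S_k = W_kij N_i N'_j is one fused call; the shapes below are illustrative:

```python
import numpy as np

# A vectorized sum node combining two vectorized inputs N and N' (K values
# each) through a weight tensor W of shape (K, K, K):
#   S_k = sum_{i,j} W[k, i, j] * N[i] * N'[j]
K = 4
rng = np.random.default_rng(0)
W = rng.random((K, K, K))
W /= W.sum(axis=(1, 2), keepdims=True)  # normalized sum weights
N = rng.random(K)
Np = rng.random(K)

S = np.einsum('kij,i,j->k', W, N, Np)   # one fused product + weighted sum

# Equivalent explicit form: outer product of the children, then weighted sum
S_ref = (W * np.outer(N, Np)).sum(axis=(1, 2))
```

The einsum fuses the product layer and the sum layer into a single dense operation, which is the core idea behind EiNets' speedups.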
Step III – Einsum Layers

A single einsum operation for an entire layer of nodes:

S_lk = W_lkij N_li N'_lj
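The layered version batches all L einsum units into one call. A log-domain variant, in the spirit of the paper's log-einsum-exp trick but sketched here with assumed shapes, shifts each row by its maximum for numerical stability:

```python
import numpy as np

# Whole layer of L einsum units: S[l,k] = sum_{i,j} W[l,k,i,j] N[l,i] N'[l,j]
L, K = 8, 4
rng = np.random.default_rng(1)
W = rng.random((L, K, K, K))
W /= W.sum(axis=(2, 3), keepdims=True)
N = rng.random((L, K))
Np = rng.random((L, K))

S = np.einsum('lkij,li,lj->lk', W, N, Np)

# Log-domain evaluation: shift by per-row maxima, run the same einsum in
# linear space, then return to the log-domain.
logN, logNp = np.log(N), np.log(Np)
a = logN.max(axis=1, keepdims=True)
b = logNp.max(axis=1, keepdims=True)
S_log = np.log(np.einsum('lkij,li,lj->lk',
                         W, np.exp(logN - a), np.exp(logNp - b))) + a + b
```

`S_log` equals `log(S)` exactly (up to floating-point error), but avoids underflow when the child log-probabilities are very negative.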
Results
Runtime and Memory Comparison

[Figure: three panels plotting GPU memory (GB) against training time (sec/epoch) when varying K, the depth D, and the number of replicas R, comparing EiNets (×), SPFlow (+), and LibSPN (*).]
Generative Image Models
Conclusion

PCs sit at the intersection of classical graphical models and neural networks. Their crucial advantage: many exact inference routines. But they used to be painful to scale. In this paper, we made a big step to close the gap. More to come!

Code: https://github.com/cambridge-mlg/EinsumNetworks and https://github.com/SPFlow/SPFlow