  1. Probabilistic Modelling with Tensor Networks: From Hidden Markov Models to Quantum Circuits. Ryan Sweke, Freie Universität Berlin.

  2. The Big Picture. “Machine Learning” splits into classical ML and quantum ML, each approached either through heuristics or through statistical learning theory. Heuristic classical ML has sophisticated models and incredible results, but very little understanding; statistical learning theory works in abstract settings with simplified models and often gives loose bounds. Quantum ML has few models and very few results on the heuristic side, very little understanding on the theory side, and the quantum-vs-classical question is hard! Tensor networks provide a nice language to bridge heuristics with theory, and quantum with classical!

  3. What is this talk about? This talk is about probabilistic modelling. Given: samples {d_1, …, d_M} from an unknown discrete multivariate probability distribution P(X_1, …, X_N), where d_j = (X_1^j, …, X_N^j) and X_i ∈ {1, …, d}. Task: “learn” a parameterized model P(X_1, …, X_N | θ). This may mean many different things, depending on the task you are interested in: performing inference (i.e. calculating marginals), calculating expectation values, or generating samples. Depending on your goal, your model/approach may differ significantly!

  4. Probabilistic Modelling. I like to think of there being three distinct elements: (1) The model P(X_1, …, X_N | θ). Key question: expressivity? (2) The learning algorithm {d_1, …, d_M} → θ (model dependent!), typically by maximising the (log) likelihood ℒ = Σ_i log[P(d_i | θ)]. (3) The “task” algorithm (model dependent!), e.g. performing inference via belief propagation for probabilistic graphical models, computing expectation values via sampling for Boltzmann machines, or generating samples directly via a GAN.
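As a minimal sketch of element (2), the log-likelihood of a data set under a generic parameterized model could be computed as below; `model_prob` is a hypothetical placeholder for P(d | θ) and is not something defined in the talk.

```python
import numpy as np

def log_likelihood(samples, theta, model_prob):
    """Sum of log P(d_i | theta) over the data set.

    `model_prob(d, theta)` is a hypothetical callable returning the model
    probability of a single sample d under parameters theta.
    """
    return sum(np.log(model_prob(d, theta)) for d in samples)
```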

  5. Probabilistic Modelling. This overall picture is summarised quite nicely by the following “hierarchy of generative models” for P(X_1, …, X_N | θ): maximum-likelihood models split into explicit-density and implicit-density models. Explicit densities are either tractable, e.g. (some) probabilistic graphical models, or approximate, e.g. Boltzmann machines and VAEs; implicit-density models include GANs. We focus on tractable explicit densities here!

  6. Probabilistic Graphical Models. We will see that tensor networks provide a unifying framework for analyzing probabilistic graphical models: Bayesian networks (directed acyclic graphs) and Markov random fields (general graphs) are both unified as factor graphs, which in turn map onto tensor networks.

  7. Tensor Networks. Tensor network notation provides a powerful and convenient diagrammatic language for tensor manipulation. We represent tensors as boxes, with an “open leg” for each tensor index: a vector is a 1-tensor, and an element of the vector is a scalar (“close” the index); a matrix is a 2-tensor, and “vectorization” is very natural in this notation; a shared index denotes a contraction over that index.
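As an illustration (not from the slides), these diagrammatic rules map directly onto `numpy.einsum`: open legs are free indices, and a shared index is summed over. The dimensions below are arbitrary.

```python
import numpy as np

v = np.random.rand(3)        # a vector is a 1-tensor: one open leg
M = np.random.rand(3, 4)     # a matrix is a 2-tensor: two open legs

element = v[0]               # "closing" the index gives a scalar

# A shared index i denotes contraction over that index: (vM)_j = sum_i v_i M_ij.
w = np.einsum('i,ij->j', v, M)
print(w.shape)               # (4,) - one remaining open leg
```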

  8. Tensor Networks. A discrete multivariate probability distribution is naturally represented as an N-tensor: P(X_1, …, X_N) has d^N parameters! A tensor network decomposition of P is a decomposition into a network of contracted tensors, for example a Matrix Product State (MPS), which has only O(N d r^2) parameters. We call r the bond dimension; it is directly related to the underlying correlation structure, e.g. r = 1 for independent (uncorrelated) random variables. These representations are very well understood in the context of many-body quantum physics.
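A minimal sketch of an MPS in code, assuming open boundary conditions and illustrative dimensions d, N, r: the chain of cores stores only O(N d r^2) numbers, yet contracts to a full tensor with d^N entries.

```python
import numpy as np

d, N, r = 2, 4, 3  # local dimension, number of variables, bond dimension (illustrative)

# MPS cores: boundary cores of shape (d, r), bulk cores of shape (r, d, r).
cores = ([np.random.rand(d, r)]
         + [np.random.rand(r, d, r) for _ in range(N - 2)]
         + [np.random.rand(r, d)])

# Contract the bond indices to recover the full N-tensor (only feasible for small N).
T = cores[0]
for A in cores[1:-1]:
    T = np.einsum('...a,aib->...ib', T, A)
T = np.einsum('...a,ai->...i', T, cores[-1])
print(T.shape)  # (d,) * N: d**N entries from only O(N * d * r**2) parameters
```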

  9. Probabilistic Graphical Models: Bayesian Networks. Given a probability distribution P(X_1, …, X_N), a Bayesian network models this distribution via a directed acyclic graph expressing the structure of conditional dependencies, using fewer than d^N parameters. For example, a Hidden Markov Model: P(X_1, X_2, X_3, H_1, H_2, H_3) = P(X_1) P(H_1 | X_1) P(H_2 | H_1) P(X_2 | H_2) P(H_3 | H_2) P(X_3 | H_3). The probability of the “visible” variables is obtained via marginalisation: P(X_1, X_2, X_3) = Σ_{H_1, H_2, H_3} P(X_1, X_2, X_3, H_1, H_2, H_3).
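A small numerical sketch of this HMM factorization, with made-up dimensions and random (normalised) conditional probability tables; marginalising out H_1, H_2, H_3 is a single tensor contraction.

```python
import numpy as np

d, h = 2, 3  # sizes of visible and hidden variables (illustrative)

def cpt(rows, cols):
    """Random conditional probability table P(row | col): columns sum to 1."""
    t = np.random.rand(rows, cols)
    return t / t.sum(axis=0, keepdims=True)

P_X1 = np.random.dirichlet(np.ones(d))
P_H1_given_X1, P_H2_given_H1 = cpt(h, d), cpt(h, h)
P_X2_given_H2, P_H3_given_H2, P_X3_given_H3 = cpt(d, h), cpt(h, h), cpt(d, h)

# P(X1, X2, X3) = sum over H1 (i), H2 (j), H3 (k) of the product of factors.
P_vis = np.einsum('a,ia,ji,bj,kj,ck->abc',
                  P_X1, P_H1_given_X1, P_H2_given_H1,
                  P_X2_given_H2, P_H3_given_H2, P_X3_given_H3)
print(np.isclose(P_vis.sum(), 1.0))  # marginalisation preserves normalisation
```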

  10. Probabilistic Graphical Models: Markov Random Fields. Given a probability distribution P(X_1, …, X_N), a Markov random field models the distribution via a product of clique potentials (a clique being a maximal fully-connected subgraph) defined on a generic graph. For example: P(X_1, X_2, X_3, H_1, H_2) = (1/Z) g_1(X_1, H_1) g_2(H_1, X_2, H_2) g_3(H_2, X_3). NB: clique potentials are not normalised, so explicit normalisation is necessary!
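A corresponding sketch for this Markov random field example, with random non-negative clique potentials; unlike the Bayesian-network factors, these require explicit division by the partition function Z.

```python
import numpy as np

d, h = 2, 3  # sizes of visible and hidden variables (illustrative)

# Unnormalised, non-negative clique potentials.
g1 = np.random.rand(d, h)     # g1(X1, H1)
g2 = np.random.rand(h, d, h)  # g2(H1, X2, H2)
g3 = np.random.rand(h, d)     # g3(H2, X3)

# Unnormalised joint over (X1, X2, X3, H1, H2), then explicit normalisation.
joint = np.einsum('ai,ibj,jc->abcij', g1, g2, g3)
Z = joint.sum()
P = joint / Z
print(np.isclose(P.sum(), 1.0))
```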

  11. Probabilistic Graphical Models: Factor Graphs. Bayesian networks and Markov random fields are unified via factor graphs: P(X_1, …, X_N) = (1/Z) Π_j f_j(X⃗_j), where X⃗_j is the subset of variables attached to factor f_j. For Bayesian networks the factors are conditional probability distributions (inherently normalised); for Markov random fields the factors are clique potentials (explicit normalisation necessary). Explicitly, the HMM above becomes a factor graph with factors f_1, …, f_5 over X_1, X_2, X_3 and H_1, H_2, H_3, while the Markov random field becomes a factor graph with factors f_1, f_2, f_3 over X_1, X_2, X_3 and H_1, H_2.

  12. Probabilistic Graphical Models: Factor Graphs to Tensor Networks. Let’s consider the Hidden Markov Model in more detail. Marginalizing out the hidden variables H_1, H_2, H_3 means contracting the connected factor tensors! The resulting probability distribution over the visible variables X_1, X_2, X_3 is exactly equivalent to an MPS decomposition of the global probability tensor, with non-negative tensors!
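A hypothetical sketch of this correspondence in code: grouping the HMM factors attached to each visible variable yields non-negative MPS cores, and contracting the bond (hidden) indices reproduces P(X_1, X_2, X_3). The tables and dimensions are made up.

```python
import numpy as np

d, h = 2, 3  # illustrative dimensions

def cpt(rows, cols):
    t = np.random.rand(rows, cols)
    return t / t.sum(axis=0, keepdims=True)

P_X1, P_H1gX1 = np.random.dirichlet(np.ones(d)), cpt(h, d)
P_H2gH1, P_X2gH2, P_H3gH2, P_X3gH3 = cpt(h, h), cpt(d, h), cpt(h, h), cpt(d, h)

# Group factors into non-negative MPS cores (H3 is summed out inside A3).
A1 = np.einsum('a,ia->ai', P_X1, P_H1gX1)        # A1[x1, h1]
A2 = np.einsum('ji,bj->ibj', P_H2gH1, P_X2gH2)   # A2[h1, x2, h2]
A3 = np.einsum('kj,ck->jc', P_H3gH2, P_X3gH3)    # A3[h2, x3]

# Contracting the MPS over the bond indices gives the visible distribution.
P_mps = np.einsum('ai,ibj,jc->abc', A1, A2, A3)
print(np.isclose(P_mps.sum(), 1.0))
```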

  13. Probabilistic Graphical Models: Factor Graphs to Tensor Networks. The other direction also holds: applying an exact non-negative canonical polyadic decomposition to each core of a non-negative MPS over X_1, X_2, X_3 introduces hidden variables H_1, H_2, H_3, and contracting recovers a hidden Markov model. Hidden detail: r′ ≤ min(dr, r²). Hidden Markov Models and non-negative MPS are almost exactly equivalent.

  14. Probabilistic Graphical Models: Factor Graphs to Tensor Networks. Take-home message: we can use tensor networks to study and to generalise probabilistic graphical models! Any tensor network which yields a non-negative tensor when contracted defines a valid model of P(X_1, …, X_N), and this includes all probabilistic graphical models. See I. Glasser et al., “Supervised learning with generalised tensor networks” (formal connection and heuristic algorithms). Goal: by studying MPS-based decompositions, can we make rigorous claims concerning expressivity, draw connections to quantum circuits, and make claims concerning the expressivity of classical vs quantum models? Yes.

  15. Tensor Network Models: HMM are MPS. The first model we consider is the non-negative MPS, which we already showed is equivalent to an HMM: T(X_1, …, X_N) is given by contracting cores A_1, …, A_N with bond dimension r. NB: all tensors have only non-negative (real) entries! We call the minimal bond dimension r necessary to factorise T exactly the TT-rank_{ℝ≥0} (“tensor-train” rank). The bond dimension necessary to represent a class of tensors characterises the expressivity of the model!

  16. Tensor Network Models: HMM are MPS. Note that for probability distributions over two variables (matrices), the TT-rank_{ℝ≥0} is the non-negative rank: the smallest r such that T = AB with A and B non-negative. Not such an easy rank to determine! (It is NP-hard to determine whether the rank is equal to the non-negative rank.)
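For intuition (not part of the talk), an approximate non-negative factorisation T ≈ AB with a chosen inner dimension r can be found heuristically with multiplicative updates; note this does not compute the exact non-negative rank, which is the hard quantity the slide refers to.

```python
import numpy as np

def nmf(T, r, iters=500, eps=1e-9):
    """Approximate T ≈ A @ B with non-negative factors via multiplicative updates."""
    m, n = T.shape
    A, B = np.random.rand(m, r), np.random.rand(r, n)
    for _ in range(iters):
        A *= (T @ B.T) / (A @ B @ B.T + eps)
        B *= (A.T @ T) / (A.T @ A @ B + eps)
    return A, B

T = np.random.rand(4, 4)  # a toy non-negative two-variable "distribution"
T /= T.sum()
A, B = nmf(T, r=3)
print(np.linalg.norm(T - A @ B))  # small residual once r is large enough
```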

  17. Tensor Network Models: Born Machines. The second model we consider is the Born machine: T(X_1, …, X_N) is given by contracting an MPS with cores A_1, …, A_N and bond dimension r against its conjugate, with cores A_1†, …, A_N†. We can use either real or complex tensors! We call the minimal bond dimension r necessary to factorise T exactly the Born-rank_{ℝ/ℂ}.

  18. Tensor Network Models: Born Machines. In the case of only two variables this is the real/complex Hadamard (entry-wise) square-root rank: the smallest r such that T = |AB|^∘2, i.e. AB is an entry-wise square root of T. In the real case, r = min over sign choices of rank[ ±√t_11 … ±√t_1d ; ⋮ ; ±√t_d1 … ±√t_dd ], which involves 2^(d²) sign combinations (bad, fast!). In the complex case, r = min over phases θ of rank[ e^{iθ_11}√t_11 … e^{iθ_1d}√t_1d ; ⋮ ; e^{iθ_d1}√t_d1 … e^{iθ_dd}√t_dd ], which is even worse :(
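A minimal two-variable Born machine in code, with made-up dimensions: the entries of AB act as amplitudes, and the probabilities are their entry-wise squared moduli.

```python
import numpy as np

d, r = 3, 2  # illustrative dimensions

# Complex amplitude factors A (d x r) and B (r x d).
A = np.random.randn(d, r) + 1j * np.random.randn(d, r)
B = np.random.randn(r, d) + 1j * np.random.randn(r, d)

amps = A @ B                 # AB is an entry-wise "square root" of T
T = np.abs(amps) ** 2        # T = |AB|^o2, element-wise squared modulus
T /= T.sum()                 # normalise to a probability distribution
print(T.min() >= 0, np.isclose(T.sum(), 1.0))
```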

  19. Tensor Network Models: Born Machines. Outcome probabilities of a 2-local quantum circuit of depth D are described exactly by a Born machine of bond dimension d^(D+1): via SVDs and contractions, the circuit acting on |0⟩ ⊗ … ⊗ |0⟩ defines an MPS of bond dimension at most d^(D+1), and the probability of a measurement outcome is described by the Born machine defined via this circuit MPS: P(X_1, …, X_N) = |⟨X_1, …, X_N | ψ⟩|².
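A sketch of how such a Born machine assigns probabilities, with random complex cores standing in for the circuit MPS (they are not derived from an actual circuit here): each outcome's amplitude is obtained by contracting the selected slices of the MPS, and its probability is the squared modulus. The brute-force normalisation over all d^N outcomes is only for illustration.

```python
import numpy as np

d, N, r = 2, 4, 3  # local dimension, number of sites, bond dimension (illustrative)

# Random complex MPS cores standing in for the state prepared by the circuit.
cores = ([np.random.randn(d, r) + 1j * np.random.randn(d, r)]
         + [np.random.randn(r, d, r) + 1j * np.random.randn(r, d, r)
            for _ in range(N - 2)]
         + [np.random.randn(r, d) + 1j * np.random.randn(r, d)])

def amplitude(x):
    """Contract the MPS slices selected by outcome x = (x_1, ..., x_N)."""
    v = cores[0][x[0]]
    for A, xi in zip(cores[1:-1], x[1:-1]):
        v = v @ A[:, xi, :]
    return v @ cores[-1][:, x[-1]]

# Born rule: P(x) is proportional to |amplitude(x)|^2 (normalised by brute force here).
probs = np.array([abs(amplitude(x)) ** 2 for x in np.ndindex(*([d] * N))])
probs /= probs.sum()
print(np.isclose(probs.sum(), 1.0))
```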

  20. Tensor Network Models: Locally Purified States. The final model we consider is the locally purified state: T(X_1, …, X_N) is given by contracting cores A_1, …, A_N (with bond dimension r and an additional purification index μ) against their conjugates A_1†, …, A_N†. We can use either real or complex tensors! We call the minimal bond dimension r necessary to factorise T exactly the Puri-rank_{ℝ/ℂ}. In the case of only two variables this is the positive semidefinite rank.

  21. Tensor Network Models: Locally Purified States. In the case of only two variables this is the positive semidefinite (PSD) rank: given a matrix M, the PSD rank is the smallest r for which there exist positive semidefinite r × r matrices A_i and B_j such that M_ij = Tr(A_i B_j).
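A toy PSD factorisation in code (random matrices, illustrative sizes): any choice of positive semidefinite A_i, B_j yields an entry-wise non-negative matrix M via M_ij = Tr(A_i B_j).

```python
import numpy as np

d, r = 3, 2  # matrix M is d x d; PSD factors are r x r (illustrative)

def random_psd(r):
    X = np.random.randn(r, r) + 1j * np.random.randn(r, r)
    return X @ X.conj().T  # positive semidefinite by construction

A = [random_psd(r) for _ in range(d)]
B = [random_psd(r) for _ in range(d)]

# M_ij = Tr(A_i B_j); traces of products of PSD matrices are real and non-negative.
M = np.array([[np.trace(Ai @ Bj).real for Bj in B] for Ai in A])
print(M.min() >= 0)
```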
