Introduction to Markov Categories
Eigil Fjeldgren Rischel, University of Copenhagen
Categorical Probability and Statistics, June 2020
TLDR
◮ Consider a category where the maps are “stochastic functions”, or “parameterized probability distributions”.
◮ This is a symmetric monoidal category.
◮ Many important notions in probability/statistics are expressible as diagram equations in this category.
◮ We can axiomatize the structure of this category to do “synthetic probability”.
◮ Several theorems admit proofs in this purely synthetic setting.
Overview of talk
◮ Introduction
◮ Diagrams for probability
◮ Markov categories
◮ Kolmogorov’s 0-1 law
◮ Sufficient statistics
A graphical model
[Figure omitted: a string-diagram graphical model, stolen from Jacobs-Kissinger-Zanasi: Causal Inference by String Diagram Surgery]
Independence
A map I → X ⊗ Y is a “joint distribution”. When are the two variables “independent”?
◮ If the distribution is the product of the marginals (in symbols below).
◮ If you can generate X and Y separately and get the same result.
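In symbols (my notation, writing del for the discard map that the Markov category structure supplies): a joint distribution p : I → X ⊗ Y is independent when it equals the tensor of its marginals,

p = p_X ⊗ p_Y,   where p_X := (id_X ⊗ del_Y) ∘ p and p_Y := (del_X ⊗ id_Y) ∘ p.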
Deterministic
What does it mean that f : X → Y is deterministic? “If you run it twice with the same input, you get the same output.”
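Formally (Fritz’s definition, stated in my notation): f : X → Y is deterministic when it commutes with copying, i.e. copying the output of one run agrees with running f separately on two copies of the input:

copy_Y ∘ f = (f ⊗ f) ∘ copy_X.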
Markov categories
A Markov category (Fritz 2019) is a category with the structure needed to interpret these examples: a symmetric monoidal category whose monoidal unit is terminal, with a choice of commutative comonoid structure on every object. (Such categories have been considered by several different authors.)
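Spelled out (standard notation, not verbatim from the slide): the comonoid structure on an object X consists of a copy map copy_X : X → X ⊗ X and a discard map del_X : X → I, satisfying the commutative comonoid axioms (coassociativity, cocommutativity, counitality). Terminality of the unit I says that del_X ∘ f = del_A for every f : A → X, i.e. discarding the output of any map is the same as discarding its input.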
Examples of Markov categories
◮ Stoch: measurable spaces and Markov kernels.
◮ FinStoch: finite sets and stochastic matrices (see the sketch below).
◮ BorelStoch: standard Borel spaces and Markov kernels.
◮ Gauss: finite-dimensional real vector spaces and stochastic maps of the form “an affine map + Gaussian noise”.
◮ SetMulti: sets and multivalued functions.
◮ More exotic examples.
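A minimal numerical sketch of FinStoch (the helper names and matrices below are my own illustration, not part of the talk): morphisms are column-stochastic matrices, composition is matrix multiplication, and the tensor product is the Kronecker product.

```python
import numpy as np

def is_stochastic(m, tol=1e-9):
    """Each column of m should be a probability distribution."""
    return bool(np.all(m >= 0)) and np.allclose(m.sum(axis=0), 1.0, atol=tol)

# f : {0,1} -> {0,1,2} and g : {0,1,2} -> {0,1} as column-stochastic
# matrices, with entry [y, x] giving P(y | x).
f = np.array([[0.5, 0.1],
              [0.3, 0.2],
              [0.2, 0.7]])
g = np.array([[0.9, 0.4, 0.0],
              [0.1, 0.6, 1.0]])

gf = g @ f            # composition g ∘ f: again a stochastic matrix
fg = np.kron(f, g)    # tensor product f ⊗ g: the Kronecker product
assert all(is_stochastic(m) for m in (f, g, gf, fg))
```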
Kolmogorov’s 0-1 law (classical)
Theorem (Kolmogorov). Let X_1, X_2, ... be an infinite family of independent random variables. Suppose A ∈ σ(X_1, X_2, ...) (A is an event which depends “measurably” on these variables), and A is independent of any finite subset of the X_n s. Then P(A) ∈ {0, 1}.
Example: A is the event “the sequence X_i converges”. The theorem says either the sequence converges almost surely, or it diverges almost surely.
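A concrete illustration of a tail event (my own example, not from the talk): convergence of the random harmonic series Σ_n ε_n/n, with ε_n = ±1 independent fair signs, is unaffected by changing finitely many ε_n, so the 0-1 law applies. Here the probability is 1, since the variances Σ_n 1/n² converge (Kolmogorov’s two-series theorem).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
eps = rng.choice([-1.0, 1.0], size=n)              # independent fair signs
partial_sums = np.cumsum(eps / np.arange(1, n + 1))

# The tail of the partial-sum sequence barely moves, consistent with
# almost-sure convergence of the series:
print(partial_sums[[999, 9_999, 99_999]])
```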
Digression: Infinite tensor products
An “infinite tensor product” X_N := ⊗_{n∈N} X_n is the cofiltered limit of the finite tensor products X_F := ⊗_{n∈F} X_n over finite subsets F ⊂ N, if this limit exists and is preserved by tensoring − ⊗ Y.
An infinite tensor product is called a Kolmogorov product if all the projections to finite tensor products π_F : X_N → X_F are deterministic. (This somewhat technical condition is necessary to fix the comonoid structure on X_N.)
Kolmogorov’s 0-1 law (abstract)
With a suitable definition of infinite tensor products, we can prove:
Theorem (Fritz-R). Let p : A → ⊗_{i∈N} X_i and s : ⊗_{i∈N} X_i → T be maps, with s deterministic and p presenting the independence of all the Xs. Suppose in each diagram ⊗_{i∈F} X_i is independent of T (in symbols below). Then sp : A → T is deterministic.
Applying this theorem to BorelStoch recovers the classical statement.
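The independence hypothesis, written out (my notation, not verbatim from the talk): form the joint

q := (id_{X_N} ⊗ s) ∘ copy_{X_N} ∘ p : A → X_N ⊗ T,

with marginals q_F := (π_F ⊗ del_T) ∘ q : A → X_F and q_T := (del_{X_N} ⊗ id_T) ∘ q : A → T. The hypothesis is that for each finite F ⊂ N,

(π_F ⊗ id_T) ∘ q = (q_F ⊗ q_T) ∘ copy_A,

i.e. X_F and T are independent given A.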
Proof (sketch)
◮ First, we see that T is independent of the whole infinite product X_N as well.
◮ This statement means that two maps A → X_N ⊗ T agree.
◮ By assumption the codomain is a limit, so it suffices to check that all the projections A → X_N ⊗ T → X_F ⊗ T agree.
◮ This is true by assumption.
◮ A diagram manipulation now shows that T, being both independent of X_N and a deterministic function of it, is a deterministic function of A.
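In the notation introduced after the previous slide (again mine, sketching the step): the two maps in question are q and (q_X ⊗ q_T) ∘ copy_A, where q_X and q_T are the marginals of q onto X_N and T. Since X_N ⊗ T is a limit of the X_F ⊗ T, checking

(π_F ⊗ id_T) ∘ q = (π_F ⊗ id_T) ∘ (q_X ⊗ q_T) ∘ copy_A   for all finite F

already forces q = (q_X ⊗ q_T) ∘ copy_A.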
Sufficient statistics
◮ A “statistical model” is simply a map p : Θ → X.
◮ A “statistic” is a deterministic map s : X → V.
◮ A statistic is sufficient if X ⊥ Θ | V. That means that we have α : V → X making a certain diagram commute. [Diagram omitted; see the equation below.]
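The missing diagram, written as an equation (my rendering of Fritz’s conditional-independence condition, so treat the exact form as an assumption): the joint distribution of X and s(X) given θ factors through V via α,

(id_X ⊗ s) ∘ copy_X ∘ p = (α ⊗ id_V) ∘ copy_V ∘ s ∘ p.

In words: once the value of the statistic is known, X can be regenerated by α with no further reference to Θ.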
Fisher-Neyman
Classically: suppose we are in “a nice situation” (measures with density, ...).
Fisher-Neyman Theorem. A statistic s(x) is sufficient if and only if the density p_θ(x) factors as h(x) f_θ(s(x)).
Abstract version: suppose we are in “a nice Markov category”. Then:
Abstract Fisher-Neyman (Fritz). s is sufficient iff there is α : V → X so that αsp = p, and so that sα = 1_V almost surely.
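A concrete classical instance (my own illustration of the factorization, not from the talk): for n i.i.d. Bernoulli(θ) observations, the count s(x) = Σ_i x_i is sufficient, and the density factors with h(x) = 1 and f_θ(k) = θ^k (1−θ)^(n−k).

```python
import numpy as np
from math import prod

def density(theta, x):
    """Joint density p_theta(x) of i.i.d. Bernoulli(theta) observations."""
    return prod(theta if xi else 1 - theta for xi in x)

def f(theta, k, n):
    """The factor f_theta(s(x)) from the Fisher-Neyman factorization."""
    return theta**k * (1 - theta)**(n - k)

x = [1, 0, 1, 1, 0]  # a sample; h(x) = 1 here
for theta in (0.2, 0.5, 0.9):
    assert np.isclose(density(theta, x), 1.0 * f(theta, sum(x), len(x)))
```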
Thank you for listening!
Some papers mentioned:
◮ Fritz (2019): A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. arXiv:1908.07021.
◮ Fritz-Rischel (2020): Infinite products and zero-one laws in categorical probability. arXiv:1912.02769.
◮ Jacobs-Kissinger-Zanasi (2018): Causal inference by string diagram surgery. arXiv:1811.08338.