Temporal probability models
Markov processes (Markov chains)
Chapter 15, Sections 1–5


Outline

♦ Time and uncertainty
♦ Inference: filtering, prediction, smoothing
♦ Hidden Markov models
♦ Dynamic Bayesian networks

Time and uncertainty

The world changes; we need to track and predict it.
Diabetes management vs vehicle diagnosis.

Basic idea: copy state and evidence variables for each time step:
  X_t = set of unobservable state variables at time t,
        e.g., BloodSugar_t, StomachContents_t, etc.
  E_t = set of observable evidence variables at time t,
        e.g., MeasuredBloodSugar_t, PulseRate_t, FoodEaten_t

This assumes discrete time; the step size depends on the problem.

Notation: X_{a:b} = X_a, X_{a+1}, ..., X_{b-1}, X_b

Markov processes (Markov chains)

Construct a Bayes net from these variables: parents? CPTs?

Markov assumption: X_t depends on a bounded subset of X_{0:t-1}.

First-order Markov process:  P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})
Second-order Markov process: P(X_t | X_{0:t-1}) = P(X_t | X_{t-2}, X_{t-1})

[Figure: chains over X_{t-2}, X_{t-1}, X_t, X_{t+1}, X_{t+2}; the
first-order process links each X_t only to X_{t-1}, the second-order
process adds arcs from X_{t-2}.]

Stationary process: the transition model P(X_t | X_{t-1}) is fixed for all t.
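The definitions above can be made concrete with a short sketch. The following Python snippet (our own illustration, not part of the slides; names are hypothetical) samples a trajectory from a stationary first-order Markov process; the transition numbers happen to match the umbrella model used below.

import random

STATES = ['rain', 'dry']
TRANSITION = {'rain': [0.7, 0.3],   # P(X_t | X_{t-1} = rain)
              'dry':  [0.3, 0.7]}   # P(X_t | X_{t-1} = dry)

def sample_chain(x0, steps):
    """Sample X_0, ..., X_steps from a stationary first-order Markov
    process: each X_t is drawn using only X_{t-1} (Markov assumption),
    with the same transition model at every step (stationarity)."""
    chain = [x0]
    for _ in range(steps):
        chain.append(random.choices(STATES, weights=TRANSITION[chain[-1]])[0])
    return chain

print(sample_chain('rain', 10))   # e.g., ['rain', 'rain', 'dry', ...]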

Hidden Markov Model (HMM)

Sensor Markov assumption: P(E_t | X_{0:t}, E_{1:t-1}) = P(E_t | X_t)

Stationary process: the transition model P(X_t | X_{t-1}) and the sensor
model P(E_t | X_t) are fixed for all t.

An HMM is a special type of Bayes net in which X_t is a single discrete
random variable. The joint probability distribution is

  P(X_{0:t}, E_{1:t}) = P(X_0) ∏_{i=1}^{t} P(X_i | X_{i-1}) P(E_i | X_i)

Example

[Figure: Bayes net Rain_{t-1} → Rain_t → Rain_{t+1}, with Rain_t → Umbrella_t
at each step.]

  R_{t-1}  P(R_t)        R_t  P(U_t)
  t        0.7           t    0.9
  f        0.3           f    0.2

The first-order Markov assumption is not exactly true in the real world!
Possible fixes:
  1. Increase the order of the Markov process
  2. Augment the state, e.g., add Temp_t, Pressure_t
Example: robot motion. Augment position and velocity with Battery_t.

Inference tasks

Filtering: P(X_t | e_{1:t})
  the belief state: input to the decision process of a rational agent

Prediction: P(X_{t+k} | e_{1:t}) for k > 0
  evaluation of possible action sequences; like filtering without the evidence

Smoothing: P(X_k | e_{1:t}) for 0 ≤ k < t
  a better estimate of past states; essential for learning

Most likely explanation: arg max_{x_{1:t}} P(x_{1:t} | e_{1:t})
  speech recognition, decoding with a noisy channel

Filtering

Aim: devise a recursive state estimation algorithm:

  P(X_{t+1} | e_{1:t+1}) = f(e_{t+1}, P(X_t | e_{1:t}))

Conditioning on the new evidence, applying Bayes' rule (α is a
normalization constant), then using the sensor Markov assumption:

  P(X_{t+1} | e_{1:t+1}) = P(X_{t+1} | e_{1:t}, e_{t+1})
                         = α P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t})
                         = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})

I.e., prediction + estimation. Prediction by summing out X_t:

  P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1}, x_t | e_{1:t})
                         = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t, e_{1:t}) P(x_t | e_{1:t})
                         = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
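Here is a minimal runnable sketch of this update for the two-state umbrella model (the CPT numbers come from the example above; the function and variable names are our own). It reproduces the filtered values 0.818 and 0.883 shown in the filtering example below.

T = [[0.7, 0.3],      # row i: P(R_t | R_{t-1} = i), states (rain, no rain)
     [0.3, 0.7]]
SENSOR = [0.9, 0.2]   # P(U_t = true | R_t), per state

def forward(f, umbrella):
    """One filtering step: prediction (sum out x_t), then estimation
    (weight by the sensor model and normalize; normalizing is the alpha)."""
    pred = [sum(T[i][j] * f[i] for i in range(2)) for j in range(2)]
    like = [SENSOR[j] if umbrella else 1 - SENSOR[j] for j in range(2)]
    unnorm = [like[j] * pred[j] for j in range(2)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

f = [0.5, 0.5]              # prior P(R_0)
for e in [True, True]:      # umbrella observed on days 1 and 2
    f = forward(f, e)
    print(f)                # ~[0.818, 0.182], then ~[0.883, 0.117]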

In summary, writing f_{1:t} = P(X_t | e_{1:t}) for the forward message,

  f_{1:t+1} = Forward(f_{1:t}, e_{t+1})

with time and space requirements that are constant (independent of t).

Filtering example

[Figure: umbrella network unrolled over Rain_0, Rain_1, Rain_2 with
evidence Umbrella_1 = true, Umbrella_2 = true. Predictions
P(R_t | e_{1:t-1}) are (0.500, 0.500) and (0.627, 0.373); filtered values
P(R_t | e_{1:t}) are True: 0.500, 0.818, 0.883; False: 0.500, 0.182, 0.117.]

Most likely explanation

Most likely sequence ≠ sequence of most likely states!!!!

Most likely path to each x_{t+1}
  = most likely path to some x_t plus one more step:

  max_{x_1 ... x_t} P(x_1, ..., x_t, X_{t+1} | e_{1:t+1})
    = P(e_{t+1} | X_{t+1}) max_{x_t} ( P(X_{t+1} | x_t)
        max_{x_1 ... x_{t-1}} P(x_1, ..., x_{t-1}, x_t | e_{1:t}) )

Identical to filtering, except that f_{1:t} is replaced by

  m_{1:t} = max_{x_1 ... x_{t-1}} P(x_1, ..., x_{t-1}, X_t | e_{1:t}),

i.e., m_{1:t}(i) gives the probability of the most likely path to state i.
The update has the sum replaced by max, giving the Viterbi algorithm:

  m_{1:t+1} = P(e_{t+1} | X_{t+1}) max_{x_t} ( P(X_{t+1} | x_t) m_{1:t} )

Viterbi example

[Figure: state space over Rain_1 ... Rain_5 with umbrella evidence
true, true, false, true, true; bold arcs mark the most likely path to
each state. The messages are
  m_{1:1}   m_{1:2}   m_{1:3}   m_{1:4}   m_{1:5}
  .8182     .5155     .0361     .0334     .0210   (rain)
  .1818     .0491     .1237     .0173     .0024   (no rain) ]
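The Viterbi recursion is just as short. This sketch (again our own code, with hypothetical names) reproduces the message values in the figure above and recovers the most likely sequence by keeping back-pointers.

T = [[0.7, 0.3],      # P(R_t | R_{t-1}), rows indexed by previous state
     [0.3, 0.7]]
SENSOR = [0.9, 0.2]   # P(U_t = true | R_t)

def viterbi(evidence, prior=(0.5, 0.5)):
    """Return the Viterbi messages m_{1:t} and the most likely sequence.
    m_{1:1} equals the (normalized) filtered message, as in the figure."""
    pred = [sum(T[i][j] * prior[i] for i in range(2)) for j in range(2)]
    like = [SENSOR[j] if evidence[0] else 1 - SENSOR[j] for j in range(2)]
    m = [like[j] * pred[j] for j in range(2)]
    z = sum(m)
    m = [v / z for v in m]
    messages, back = [m], []
    for e in evidence[1:]:
        like = [SENSOR[j] if e else 1 - SENSOR[j] for j in range(2)]
        # best[j] = argmax over x_t of P(X_{t+1} = j | x_t) m_{1:t}(x_t)
        best = [max(range(2), key=lambda i: T[i][j] * m[i]) for j in range(2)]
        m = [like[j] * T[best[j]][j] * m[best[j]] for j in range(2)]
        messages.append(m)
        back.append(best)
    # Follow back-pointers from the best final state.
    j = max(range(2), key=lambda j: m[j])
    path = [j]
    for b in reversed(back):
        j = b[j]
        path.append(j)
    return messages, path[::-1]

msgs, path = viterbi([True, True, False, True, True])
print(msgs)   # ~[.8182,.1818], [.5155,.0491], [.0361,.1237], [.0334,.0173], [.0210,.0024]
print(path)   # [0, 0, 1, 0, 0]  with 0 = rain, 1 = no rain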

Implementation Issues

Viterbi message:
  m_{1:t+1} = P(e_{t+1} | X_{t+1}) max_{x_t} ( P(X_{t+1} | x_t) m_{1:t} )
or filtering update:
  P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})

What is 10^{-6} · 10^{-6} · 10^{-6}? What is the precision of floating-point
arithmetic? Repeated products of small probabilities shrink exponentially
and, over a long enough sequence, underflow to 0.

Answer?

Use either:
  – Rescaling: multiply the values by a (large) constant
  – The logsum trick (Assignment 5)

log is monotone increasing, so arg max f(x) = arg max log f(x),
and log(a · b) = log a + log b.

Therefore, work with sums of logarithms of probabilities rather than
products of probabilities:

  m_{1:t+1} = P(e_{t+1} | X_{t+1}) max_{x_t} ( P(X_{t+1} | x_t) m_{1:t} )
  →  log m_{1:t+1} = log P(e_{t+1} | X_{t+1})
                     + max_{x_t} ( log P(X_{t+1} | x_t) + log m_{1:t} )

Hidden Markov models

X_t is a single, discrete variable (usually E_t is too), with domain {1, ..., S}.

Transition matrix T_{ij} = P(X_t = j | X_{t-1} = i), e.g.,

  T = ( 0.7  0.3 )
      ( 0.3  0.7 )

Sensor matrix O_t for each time step, with diagonal elements P(e_t | X_t = i);
e.g., with U_1 = true,

  O_1 = ( 0.9  0   )
        ( 0    0.2 )

Forward messages as column vectors:  f_{1:t+1} = α O_{t+1} T^⊤ f_{1:t}
(a runnable sketch of this update appears after the summary below)

Dynamic Bayesian networks

X_t, E_t contain arbitrarily many variables in a replicated Bayes net.

[Figure: the umbrella DBN, Rain_0 → Rain_1 with Rain_1 → Umbrella_1,
prior P(R_0) = 0.7 and CPTs P(R_1 | R_0) (t: 0.7, f: 0.3) and
P(U_1 | R_1) (t: 0.9, f: 0.2); and a robot-motion DBN with
Battery_0 → Battery_1 → BMeter_1 and X_0 → X_1 → Z_1.]

Summary

Temporal models use state and sensor variables replicated over time.

The Markov and stationarity assumptions mean we need only
  – a transition model P(X_t | X_{t-1})
  – a sensor model P(E_t | X_t)

Tasks are filtering, prediction, smoothing, and most likely sequence;
all are done recursively with constant cost per time step.

Hidden Markov models have a single discrete state variable; used for
speech recognition.

Dynamic Bayes nets subsume HMMs; exact update is intractable.
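As promised above, the matrix form and the log-space trick can both be sketched in a few lines of NumPy (our own illustration; variable names are hypothetical):

import numpy as np

# Matrix-form forward update f_{1:t+1} = alpha O_{t+1} T^T f_{1:t}.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])           # T[i, j] = P(X_t = j | X_{t-1} = i)
O = {True:  np.diag([0.9, 0.2]),     # sensor matrix when U_t = true
     False: np.diag([0.1, 0.8])}     # sensor matrix when U_t = false

f = np.array([0.5, 0.5])             # prior P(X_0), states (rain, no rain)
for u in [True, True]:               # umbrella observed on days 1 and 2
    f = O[u] @ T.T @ f
    f = f / f.sum()                  # normalization: the alpha above
    print(f)                         # ~[0.818 0.182], then ~[0.883 0.117]

# Log-space Viterbi update: sums of logs instead of products, so long
# sequences do not underflow.
log_T = np.log(T)

def log_viterbi_step(log_m, log_sensor):
    # log m_{1:t+1}(j) = log P(e|j) + max_i (log T[i, j] + log m_{1:t}(i))
    return log_sensor + np.max(log_T.T + log_m, axis=1)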

Example Umbrella Problems

Filtering:
  f_{1:t+1} := P(X_{t+1} | e_{1:t+1})
             = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
Viterbi:
  m_{1:t+1} = P(e_{t+1} | X_{t+1}) max_{x_t} ( P(X_{t+1} | x_t) m_{1:t} )

  R_{t-1}  P(R_t = t)  P(R_t = f)      R_t  P(U_t = t)  P(U_t = f)
  t        0.7         0.3             t    0.9         0.1
  f        0.3         0.7             f    0.2         0.8

P(R_3 | ¬u_1, u_2, ¬u_3) = ?
arg max_{R_{1:3}} P(R_{1:3} | ¬u_1, u_2, ¬u_3) = ?
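One way to check answers to these problems is to reuse the forward() and viterbi() sketches defined earlier with the evidence sequence ¬u_1, u_2, ¬u_3:

evidence = [False, True, False]   # ~u1, u2, ~u3

f = [0.5, 0.5]                    # prior P(R_0)
for e in evidence:
    f = forward(f, e)
print(f)                          # P(R_3 | ~u1, u2, ~u3), as (rain, no rain)

_, path = viterbi(evidence)
print(path)                       # arg max over R_{1:3}; 0 = rain, 1 = no rain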
