

  1. Lecture 8: Graphical Models for Sequential Data
     Marco Chiarandini
     Department of Mathematics & Computer Science, University of Southern Denmark
     Slides by Stuart Russell and Peter Norvig

  2. Course Overview
     ✔ Introduction
     ✔ Artificial Intelligence
     ✔ Intelligent Agents
     ✔ Search
       ✔ Uninformed Search
       ✔ Heuristic Search
     Games and Adversarial Search
       Minimax search and Alpha-beta pruning
       Multiagent search
     Knowledge representation and Reasoning
       Propositional logic
       First order logic
       Inference
       Planning
     Uncertain knowledge and Reasoning
       ✔ Probability and Bayesian approach
       ✔ Bayesian Networks
       Hidden Markov Chains
       Kalman Filters
     Learning
       Supervised
         Learning Bayesian Networks
         Neural Networks
       Unsupervised
         EM Algorithm
     Reinforcement Learning

  3. Outline
     1. Uncertainty over Time

  4. Outline
     ♦ Time and uncertainty
     ♦ Inference: filtering, prediction, smoothing
     ♦ Hidden Markov models
     ♦ Kalman filters (a brief mention)
     ♦ Dynamic Bayesian networks (an even briefer mention)
     ♦ Particle filtering

  5. Time and uncertainty
     The world changes; we need to track and predict it.
     Diabetes management vs. vehicle diagnosis.
     Basic idea: copy state and evidence variables for each time step.
     X_t = set of unobservable state variables at time t,
       e.g., BloodSugar_t, StomachContents_t, etc.
     E_t = set of observable evidence variables at time t,
       e.g., MeasuredBloodSugar_t, PulseRate_t, FoodEaten_t.
     This assumes discrete time; the step size depends on the problem.
     Notation: X_{a:b} = X_a, X_{a+1}, ..., X_{b-1}, X_b

  6. Markov processes (Markov chains)
     Construct a Bayes net from these variables:
     - unbounded number of conditional probability tables
     - unbounded number of parents
     Markov assumption: X_t depends on a bounded subset of X_{0:t-1}.
     First-order Markov process:  Pr(X_t | X_{0:t-1}) = Pr(X_t | X_{t-1})
     Second-order Markov process: Pr(X_t | X_{0:t-1}) = Pr(X_t | X_{t-2}, X_{t-1})
     [Figure: first-order chain X_{t-2} → X_{t-1} → X_t → X_{t+1} → X_{t+2}; second-order chain with additional arcs skipping one step]
     Sensor Markov assumption: Pr(E_t | X_{0:t}, E_{0:t-1}) = Pr(E_t | X_t)
     Stationary process: the transition model Pr(X_t | X_{t-1}) and the sensor model Pr(E_t | X_t) are fixed for all t.

  7. Example
     Transition model:
       R_{t-1}   P(R_t)
       t         0.7
       f         0.3
     Sensor model:
       R_t   P(U_t)
       t     0.9
       f     0.2
     [Figure: DBN with Rain_{t-1} → Rain_t → Rain_{t+1} and Rain_t → Umbrella_t at each step]
     The first-order Markov assumption is not exactly true in the real world!
     Possible fixes:
     1. Increase the order of the Markov process
     2. Augment the state, e.g., add Temp_t, Pressure_t
     Example: robot motion. Augment position and velocity with Battery_t.
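To make the umbrella model concrete, here is a minimal sketch in Python/NumPy; the names T, O, and prior are our own, not from the slides:

    import numpy as np

    # State order: index 0 = rain, index 1 = no rain.
    # Transition matrix: T[i, j] = P(X_t = j | X_{t-1} = i).
    T = np.array([[0.7, 0.3],
                  [0.3, 0.7]])

    # Sensor model as diagonal matrices, one per observation value:
    # O[e][i, i] = P(U_t = e | X_t = i).
    O = {True:  np.diag([0.9, 0.2]),
         False: np.diag([0.1, 0.8])}

    prior = np.array([0.5, 0.5])   # P(R_0)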

  8. Inference tasks
     1. Filtering: Pr(X_t | e_{1:t})
        belief state, the input to the decision process of a rational agent
     2. Prediction: Pr(X_{t+k} | e_{1:t}) for k > 0
        evaluation of possible action sequences; like filtering without the evidence
     3. Smoothing: Pr(X_k | e_{1:t}) for 0 ≤ k < t
        better estimate of past states, essential for learning
     4. Most likely explanation: arg max_{x_{1:t}} P(x_{1:t} | e_{1:t})
        speech recognition, decoding with a noisy channel

  9. Filtering
     Aim: devise a recursive state estimation algorithm:
       Pr(X_{t+1} | e_{1:t+1}) = f(e_{t+1}, Pr(X_t | e_{1:t}))

       Pr(X_{t+1} | e_{1:t+1}) = Pr(X_{t+1} | e_{1:t}, e_{t+1})
                               = α Pr(e_{t+1} | X_{t+1}, e_{1:t}) Pr(X_{t+1} | e_{1:t})
                               = α Pr(e_{t+1} | X_{t+1}) Pr(X_{t+1} | e_{1:t})

     I.e., prediction + estimation. Prediction by summing out X_t:

       Pr(X_{t+1} | e_{1:t+1}) = α Pr(e_{t+1} | X_{t+1}) Σ_{x_t} Pr(X_{t+1} | x_t, e_{1:t}) P(x_t | e_{1:t})
                               = α Pr(e_{t+1} | X_{t+1}) Σ_{x_t} Pr(X_{t+1} | x_t) P(x_t | e_{1:t})

     f_{1:t+1} = Forward(f_{1:t}, e_{t+1}), where f_{1:t} = Pr(X_t | e_{1:t})
     Time and space are constant (independent of t) by keeping track of f.
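The last line of the derivation translates directly into code. A sketch of one Forward step on the umbrella model (self-contained; the function name is ours):

    import numpy as np

    T = np.array([[0.7, 0.3], [0.3, 0.7]])
    O = {True: np.diag([0.9, 0.2]), False: np.diag([0.1, 0.8])}

    def forward_step(f, evidence):
        """One filtering update: f_{1:t+1} = alpha * O_{t+1} T^T f_{1:t}."""
        f_next = O[evidence] @ T.T @ f   # sum out x_t, then weight by the sensor model
        return f_next / f_next.sum()     # normalization plays the role of alpha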

  10. Filtering example
      [Figure: umbrella model unrolled over Rain_0, Rain_1, Rain_2 with U_1 = U_2 = true; values listed as <P(true), P(false)>]
        filtered:   Rain_0 <0.500, 0.500>   Rain_1 <0.818, 0.182>   Rain_2 <0.883, 0.117>
        predicted:  Rain_1 <0.500, 0.500>   Rain_2 <0.627, 0.373>
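Applying the forward update twice, with umbrella observations on both days, reproduces the figure's values (a self-contained check):

    import numpy as np

    T = np.array([[0.7, 0.3], [0.3, 0.7]])
    O = {True: np.diag([0.9, 0.2]), False: np.diag([0.1, 0.8])}

    f = np.array([0.5, 0.5])          # f_{1:0}: the prior over Rain_0
    for e in [True, True]:            # U_1 = true, U_2 = true
        f = O[e] @ T.T @ f
        f = f / f.sum()
        print(f)                      # [0.818 0.182], then [0.883 0.117] (rounded)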

  11. Prediction
      Pr(X_{t+k+1} | e_{1:t}) = Σ_{x_{t+k}} Pr(X_{t+k+1} | x_{t+k}) P(x_{t+k} | e_{1:t})
      As k → ∞, P(x_{t+k} | e_{1:t}) tends to the stationary distribution of the Markov chain.
      The mixing time depends on how stochastic the chain is.
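Iterating the prediction step with no new evidence shows this convergence; for the umbrella chain the stationary distribution is <0.5, 0.5> (a small demonstration, not from the slides):

    import numpy as np

    T = np.array([[0.7, 0.3], [0.3, 0.7]])
    p = np.array([0.883, 0.117])      # start from the filtered estimate f_{1:2}
    for k in range(20):
        p = T.T @ p                   # one prediction step: sum out x_{t+k}
    print(p)                          # ~[0.5 0.5], the stationary distribution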

  12. Smoothing
      [Figure: chain X_0, X_1, ..., X_k, ..., X_t with evidence E_1, ..., E_k, ..., E_t]
      Divide evidence e_{1:t} into e_{1:k}, e_{k+1:t}:
        Pr(X_k | e_{1:t}) = Pr(X_k | e_{1:k}, e_{k+1:t})
                          = α Pr(X_k | e_{1:k}) Pr(e_{k+1:t} | X_k, e_{1:k})
                          = α Pr(X_k | e_{1:k}) Pr(e_{k+1:t} | X_k)
                          = α f_{1:k} b_{k+1:t}
      Backward message computed by a backwards recursion:
        Pr(e_{k+1:t} | X_k) = Σ_{x_{k+1}} Pr(e_{k+1:t} | X_k, x_{k+1}) Pr(x_{k+1} | X_k)
                            = Σ_{x_{k+1}} P(e_{k+1:t} | x_{k+1}) Pr(x_{k+1} | X_k)
                            = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) Pr(x_{k+1} | X_k)
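In matrix form the recursion is b_{k+1:t} = T O_{k+1} b_{k+2:t} (made explicit on slide 16). A sketch of one backward step on the umbrella model:

    import numpy as np

    T = np.array([[0.7, 0.3], [0.3, 0.7]])
    O = {True: np.diag([0.9, 0.2]), False: np.diag([0.1, 0.8])}

    def backward_step(b, evidence):
        """One backward update: b_{k+1:t} = T O_{k+1} b_{k+2:t}."""
        return T @ O[evidence] @ b

    b = np.ones(2)                    # b_{t+1:t} = <1, 1>
    print(backward_step(b, True))     # [0.69 0.41], as in the smoothing example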

  13. Smoothing example
      [Figure: umbrella model over Rain_0, Rain_1, Rain_2 with U_1 = U_2 = true; values as <P(true), P(false)>]
        forward:    <0.500, 0.500>   <0.818, 0.182>   <0.883, 0.117>   (prediction for Rain_2: <0.627, 0.373>)
        backward:                    <0.690, 0.410>   <1.000, 1.000>
        smoothed:                    <0.883, 0.117>   <0.883, 0.117>
      If we want to smooth the whole sequence:
      Forward-backward algorithm: cache forward messages along the way.
      Time linear in t (polytree inference), space O(t|f|).
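Combining the two recursions gives a compact forward-backward sketch (our own code; it reproduces the smoothed <0.883, 0.117> for both steps):

    import numpy as np

    T = np.array([[0.7, 0.3], [0.3, 0.7]])
    O = {True: np.diag([0.9, 0.2]), False: np.diag([0.1, 0.8])}

    def forward_backward(evidence, prior):
        fs = [prior]                            # cache f_{1:0}, ..., f_{1:t}
        for e in evidence:
            f = O[e] @ T.T @ fs[-1]
            fs.append(f / f.sum())
        smoothed = [None] * len(evidence)
        b = np.ones(len(prior))                 # b_{t+1:t}
        for k in range(len(evidence) - 1, -1, -1):
            s = fs[k + 1] * b                   # alpha * f_{1:k+1} * b_{k+2:t}
            smoothed[k] = s / s.sum()
            b = T @ O[evidence[k]] @ b          # extend the backward message
        return smoothed

    print(forward_backward([True, True], np.array([0.5, 0.5])))
    # [array([0.883, 0.117]), array([0.883, 0.117])]  (rounded)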

  14. Most likely explanation
      Most likely sequence ≠ sequence of most likely states (joint distribution)!
      Most likely path to each x_{t+1} = most likely path to some x_t plus one more step:
        max_{x_1 ... x_t} Pr(x_1, ..., x_t, X_{t+1} | e_{1:t+1})
          = Pr(e_{t+1} | X_{t+1}) max_{x_t} ( Pr(X_{t+1} | x_t) max_{x_1 ... x_{t-1}} P(x_1, ..., x_{t-1}, x_t | e_{1:t}) )
      Identical to filtering, except f_{1:t} is replaced by
        m_{1:t} = max_{x_1 ... x_{t-1}} Pr(x_1, ..., x_{t-1}, X_t | e_{1:t}),
      i.e., m_{1:t}(i) gives the probability of the most likely path to state i.
      The update has the sum replaced by max, giving the Viterbi algorithm:
        m_{1:t+1} = Pr(e_{t+1} | X_{t+1}) max_{x_t} ( Pr(X_{t+1} | x_t) m_{1:t} )
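A minimal Viterbi sketch in the same NumPy style (the names and back-pointer bookkeeping are ours; it returns the most likely state sequence):

    import numpy as np

    T = np.array([[0.7, 0.3], [0.3, 0.7]])
    O = {True: np.diag([0.9, 0.2]), False: np.diag([0.1, 0.8])}

    def viterbi(evidence, prior):
        m = O[evidence[0]] @ T.T @ prior           # first step, as in filtering
        m = m / m.sum()                            # normalize so m_{1:1} = f_{1:1}, as on the slides
        backptr = []
        for e in evidence[1:]:
            scores = T * m[:, None]                # scores[i, j] = Pr(X_{t+1} = j | x_t = i) m_{1:t}(i)
            backptr.append(scores.argmax(axis=0))  # best predecessor of each state j
            m = np.diag(O[e]) * scores.max(axis=0) # max replaces the sum of filtering
        path = [int(m.argmax())]                   # best final state ...
        for bp in reversed(backptr):
            path.append(int(bp[path[-1]]))         # ... then follow the back-pointers
        return list(reversed(path))                # 0 = rain, 1 = no rain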

  15. Viterbi example
      [Figure: state-space paths over Rain_1 ... Rain_5 (true/false at each step), umbrella observations true, true, false, true, true, and the most likely paths into each state]
        m_{1:1} = <.8182, .1818>   m_{1:2} = <.5155, .0491>   m_{1:3} = <.0361, .1237>
        m_{1:4} = <.0334, .0173>   m_{1:5} = <.0210, .0024>
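Running the message update from the previous sketch on this observation sequence reproduces the figure's m values (a self-contained check):

    import numpy as np

    T = np.array([[0.7, 0.3], [0.3, 0.7]])
    O = {True: np.diag([0.9, 0.2]), False: np.diag([0.1, 0.8])}
    umbrella = [True, True, False, True, True]

    m = np.diag(O[umbrella[0]]) * 0.5     # prediction from a uniform prior is uniform
    m = m / m.sum()
    print(m)                              # m_{1:1} = [0.8182 0.1818]
    for e in umbrella[1:]:
        m = np.diag(O[e]) * (T * m[:, None]).max(axis=0)
        print(m)                          # [0.5155 0.0491], [0.0361 0.1237],
                                          # [0.0334 0.0173], [0.0210 0.0024] (rounded)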

  16. Hidden Markov models
      X_t is a single, discrete variable (usually E_t is too).
      Domain of X_t is {1, ..., S}; it can be a macro-variable representing several state variables.
      HMMs allow for an elegant matrix representation.
      Transition matrix: T_ij = P(X_t = j | X_{t-1} = i), e.g.
        T = ( 0.7  0.3 )
            ( 0.3  0.7 )
      Sensor matrix O_t (for convenience) for each time step, with diagonal elements P(e_t | X_t = i), e.g. for U_1 = true:
        O_1 = ( 0.9  0   )
              ( 0    0.2 )
      Forward and backward messages as column vectors:
        f_{1:t+1} = α O_{t+1} T^T f_{1:t}
        b_{k+1:t} = T O_{k+1} b_{k+2:t}
      The forward-backward algorithm needs O(S²t) time and O(St) space.
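These matrix equations are exactly what the earlier sketches compute; as a one-line check of each direction (assumed NumPy arrays as before):

    import numpy as np

    T = np.array([[0.7, 0.3], [0.3, 0.7]])   # T[i, j] = P(X_t = j | X_{t-1} = i)
    O1 = np.diag([0.9, 0.2])                 # sensor matrix for U_1 = true

    f = O1 @ T.T @ np.array([0.5, 0.5])      # unnormalized forward message f_{1:1}
    print(f / f.sum())                       # [0.818 0.182]
    print(T @ O1 @ np.ones(2))               # backward message: [0.69 0.41]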

  17. Real HMM examples
      Speech recognition HMMs:
        Observations are acoustic signals (continuous valued)
        States are specific positions in specific words (so, tens of thousands)
      Machine translation HMMs:
        Observations are words (tens of thousands)
        States are translation options
      Robot tracking:
        Observations are features of the environment (discrete) or range readings (continuous)
        States are cells (discrete) or positions on a map (continuous)

  18. Localization
      [Figure: (a) possible locations of the robot after E_1 = NSW; (b) possible locations of the robot after E_1 = NSW, E_2 = NS]

  19. Localization
      [Figure: (a) posterior distribution over robot location after E_1 = NSW; (b) posterior distribution over robot location after E_1 = NSW, E_2 = NS]
      Pr(X_0 = i) = 1/n
      Pr(X_{t+1} = j | X_t = i) = T_ij = 1/N(i) if i is adjacent to j, 0 otherwise
      Pr(E_t = e_t | X_t = i) = O_ti = (1 − ε)^{4 − d_it} ε^{d_it}
      where N(i) is the number of neighbouring cells of i, d_it is the number of the four sensor bits (N, S, E, W) that are wrong, and ε is the per-bit sensor error rate.
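A sketch of how these three distributions could be built for a small grid maze; the grid, eps, and helper names are our own assumptions, not from the slides:

    import numpy as np

    # 0 = free cell, 1 = wall (an assumed example maze).
    grid = np.array([[0, 0, 0, 0],
                     [0, 1, 1, 0],
                     [0, 0, 0, 0]])
    free = [(r, c) for r in range(grid.shape[0])
                   for c in range(grid.shape[1]) if grid[r, c] == 0]
    index = {cell: i for i, cell in enumerate(free)}
    n = len(free)

    def neighbors(cell):
        r, c = cell
        return [p for p in [(r-1, c), (r+1, c), (r, c-1), (r, c+1)] if p in index]

    prior = np.full(n, 1.0 / n)               # Pr(X_0 = i) = 1/n

    # T_ij = 1/N(i) for adjacent free cells, 0 otherwise.
    T = np.zeros((n, n))
    for cell, i in index.items():
        for nb in neighbors(cell):
            T[i, index[nb]] = 1.0 / len(neighbors(cell))

    eps = 0.05                                # assumed per-bit sensor error rate
    def sensor_prob(cell, reading):           # reading: (N, S, E, W) obstacle bits
        r, c = cell
        truth = (int((r-1, c) not in index), int((r+1, c) not in index),
                 int((r, c+1) not in index), int((r, c-1) not in index))
        d = sum(a != b for a, b in zip(truth, reading))
        return (1 - eps) ** (4 - d) * eps ** d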

  20. Kalman filters
      Modelling systems described by a set of continuous variables,
      e.g., tracking a bird flying: X_t = (X, Y, Z, Ẋ, Ẏ, Ż).
      Airplanes, robots, ecosystems, economies, chemical plants, planets, ...
      [Figure: DBN with states X_t → X_{t+1} and observations Z_t, Z_{t+1}]
      Gaussian prior, linear Gaussian transition model and sensor model.

  21. Updating Gaussian distributions
      Prediction step: if Pr(X_t | e_{1:t}) is Gaussian, then the prediction
        Pr(X_{t+1} | e_{1:t}) = ∫ Pr(X_{t+1} | x_t) P(x_t | e_{1:t}) dx_t
      is Gaussian. If Pr(X_{t+1} | e_{1:t}) is Gaussian, then the updated distribution
        Pr(X_{t+1} | e_{1:t+1}) = α Pr(e_{t+1} | X_{t+1}) Pr(X_{t+1} | e_{1:t})
      is Gaussian.
      Hence Pr(X_t | e_{1:t}) is multivariate Gaussian N(µ_t, Σ_t) for all t.
      General (nonlinear, non-Gaussian) process: the description of the posterior grows unboundedly as t → ∞.
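In the scalar case these two steps have a simple closed form. A minimal 1-D Kalman filter sketch (our own example: a random-walk transition with assumed noise variances sx2 and sz2):

    # Transition X_{t+1} = X_t + N(0, sx2); sensor Z_t = X_t + N(0, sz2).
    def kalman_step(mu, sigma2, z, sx2=2.0, sz2=1.0):
        mu_pred, s2_pred = mu, sigma2 + sx2    # prediction: variances add
        k = s2_pred / (s2_pred + sz2)          # Kalman gain
        return mu_pred + k * (z - mu_pred), (1 - k) * s2_pred

    mu, s2 = 0.0, 1.0                          # Gaussian prior N(0, 1)
    mu, s2 = kalman_step(mu, s2, z=2.5)
    print(mu, s2)                              # 1.875 0.75: mean moves toward z, variance shrinks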
