Markov Chains and Hidden Markov Models




  1. Markov Chains and Hidden Markov Models. CE417: Introduction to Artificial Intelligence, Sharif University of Technology, Spring 2019, Soleymani. Slides are based on Klein and Abbeel, CS188, UC Berkeley.

  2. Reasoning over Time or Space
     - Often, we want to reason about a sequence of observations:
       - Speech recognition
       - Robot localization
       - User attention
       - Medical monitoring
     - Need to introduce time (or space) into our models

  3. Markov Models
     - The value of $X$ at a given time is called the state: $X_1 \to X_2 \to X_3 \to X_4$
     - Parameters, called transition probabilities or dynamics, specify how the state evolves over time (together with the initial state probabilities)
     - Stationarity assumption: transition probabilities are the same at all times
     - Same as an MDP transition model, but with no choice of action

  4. Joint Distribution of a Markov Model
     - Chain: $X_1 \to X_2 \to X_3 \to X_4$
     - Joint distribution:
       $P(X_1, X_2, X_3, X_4) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2)\, P(X_4 \mid X_3)$
     - More generally:
       $P(X_1, X_2, \dots, X_T) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2) \cdots P(X_T \mid X_{T-1}) = P(X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})$
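The factorization can be evaluated directly for a concrete state sequence. The sketch below borrows the two-state weather chain from the example slides (initial distribution 1.0 sun; transitions 0.9, 0.1, 0.3, 0.7); the names `INITIAL`, `TRANSITION`, and `joint_probability` are illustrative choices, not part of the slides.

```python
# Initial distribution P(X_1) and transition model P(X_t | X_{t-1})
# for the two-state weather chain used in the example slides.
INITIAL = {"sun": 1.0, "rain": 0.0}
TRANSITION = {
    "sun": {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}


def joint_probability(states, initial=INITIAL, transition=TRANSITION):
    """P(x_1, ..., x_T) = P(x_1) * prod_{t=2..T} P(x_t | x_{t-1})."""
    if not states:
        raise ValueError("need at least one state")
    prob = initial[states[0]]
    # Multiply in one transition factor per consecutive pair of states.
    for prev, curr in zip(states, states[1:]):
        prob *= transition[prev][curr]
    return prob


# 1.0 * 0.9 * 0.1 * 0.7, i.e. about 0.063
p = joint_probability(["sun", "sun", "rain", "rain"])
print(p)
```

Because the initial distribution puts all mass on sun, any sequence starting with rain has probability zero under this model.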

  5. Chain Rule and Markov Models
     - Chain: $X_1 \to X_2 \to X_3 \to X_4$
     - From the chain rule, every joint distribution over $X_1, X_2, \dots, X_T$ can be written as:
       $P(X_1, X_2, \dots, X_T) = P(X_1) \prod_{t=2}^{T} P(X_t \mid X_1, X_2, \dots, X_{t-1})$
     - Assuming that for all $t$:
       $X_t \perp X_1, \dots, X_{t-2} \mid X_{t-1}$
       gives us the expression posited on the earlier slide:
       $P(X_1, X_2, \dots, X_T) = P(X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})$

  6. Markov Models
     - Explicit assumption for all $t$:
       $X_t \perp X_1, \dots, X_{t-2} \mid X_{t-1}$
     - Consequence: the joint distribution can be written as
       $P(X_1, X_2, \dots, X_T) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2) \cdots P(X_T \mid X_{T-1}) = P(X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})$
     - Implied conditional independencies:
       - Past variables are independent of future variables given the present, i.e., if $t_1 < t_2 < t_3$ or $t_1 > t_2 > t_3$ then:
         $X_{t_1} \perp X_{t_3} \mid X_{t_2}$
     - Additional explicit assumption: $P(X_t \mid X_{t-1})$ is the same for all $t$

  7. Conditional Independence
     - Basic conditional independence:
       - Past and future are independent given the present
       - Each time step depends only on the previous one
       - This is called the (first-order) Markov property
     - Note that the chain is just a (growable) Bayesian network (BN)
       - We can always use generic BN reasoning on it if we truncate the chain at a fixed length

  8. Example Markov Chain: Weather
     - States: X = {rain, sun}
     - Initial distribution: 1.0 sun
     - CPT $P(X_t \mid X_{t-1})$:

       X_{t-1}   X_t    P(X_t | X_{t-1})
       sun       sun    0.9
       sun       rain   0.1
       rain      sun    0.3
       rain      rain   0.7

     - Two other ways of representing the same CPT
       [Figures: two diagrams over the states sun and rain, with edges labeled 0.9 (sun to sun), 0.1 (sun to rain), 0.3 (rain to sun), 0.7 (rain to rain)]

  9. Example Markov Chain: Weather
     - Initial distribution: 1.0 sun
     - [Figure: state diagram with edges sun to sun 0.9, sun to rain 0.1, rain to sun 0.3, rain to rain 0.7]
     - What is the probability distribution after one step?
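One way to answer the question is to sum over the possible previous states, using the initial distribution $P(X_1 = \text{sun}) = 1.0$ and the CPT from the previous slide. A worked version:

```latex
\begin{align*}
P(X_2 = \text{sun})
  &= P(\text{sun} \mid \text{sun})\, P(X_1 = \text{sun})
   + P(\text{sun} \mid \text{rain})\, P(X_1 = \text{rain}) \\
  &= 0.9 \cdot 1.0 + 0.3 \cdot 0.0 = 0.9 \\
P(X_2 = \text{rain})
  &= P(\text{rain} \mid \text{sun})\, P(X_1 = \text{sun})
   + P(\text{rain} \mid \text{rain})\, P(X_1 = \text{rain}) \\
  &= 0.1 \cdot 1.0 + 0.7 \cdot 0.0 = 0.1
\end{align*}
```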

  10. Mini-Forward Algorithm
      - Question: What's $P(X)$ on some day $t$?
      - Chain: $X_1 \to X_2 \to X_3 \to X_4$
      - Forward simulation:
        $P(x_t) = \sum_{x_{t-1}} P(x_{t-1}, x_t) = \sum_{x_{t-1}} P(x_t \mid x_{t-1})\, P(x_{t-1})$
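The update rule translates almost line for line into code. The following is a minimal sketch on the weather chain; `forward_step` and `mini_forward` are illustrative names, and distributions are represented as plain dictionaries for readability rather than efficiency.

```python
# Transition model P(X_t | X_{t-1}) for the weather chain.
TRANSITION = {
    "sun": {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}


def forward_step(dist, transition=TRANSITION):
    """One update: P(x_t) = sum over x_{t-1} of P(x_t | x_{t-1}) P(x_{t-1})."""
    return {
        nxt: sum(transition[prev][nxt] * p for prev, p in dist.items())
        for nxt in transition
    }


def mini_forward(initial, steps, transition=TRANSITION):
    """Return [P(X_1), P(X_2), ..., P(X_{steps+1})]."""
    history = [dict(initial)]
    for _ in range(steps):
        history.append(forward_step(history[-1], transition))
    return history


history = mini_forward({"sun": 1.0, "rain": 0.0}, steps=3)
for t, dist in enumerate(history, start=1):
    print(t, {state: round(p, 4) for state, p in dist.items()})
```

Starting from certain sun, the probability of sun drops to 0.9, then 0.84, then 0.804, and keeps moving toward a limiting value as more steps are taken.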

  11. Example Run of Mini-Forward Algorithm
      - From initial observation of sun: $P(X_1), P(X_2), P(X_3), P(X_4), \dots, P(X_\infty)$
      - From initial observation of rain: $P(X_1), P(X_2), P(X_3), P(X_4), \dots, P(X_\infty)$
      - From yet another initial distribution $P(X_1)$: $P(X_1), \dots, P(X_\infty)$
      [Demo: L13D1,2]

  12. Stationary Distributions
      - For most chains:
        - The influence of the initial distribution gets less and less over time
        - The distribution we end up in is independent of the initial distribution
      - Stationary distribution:
        - The distribution we end up with is called the stationary distribution $P_\infty$ of the chain
        - It satisfies
          $P_\infty(X) = P_{\infty+1}(X) = \sum_x P(X \mid x)\, P_\infty(x)$

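  13. Example: Stationary Distributions
      - Question: What's $P(X)$ at time $t = \infty$?
      - CPT $P(X_t \mid X_{t-1})$: sun to sun 0.9, sun to rain 0.1, rain to sun 0.3, rain to rain 0.7
      - Stationarity conditions:
        $P_\infty(\text{sun}) = P(\text{sun} \mid \text{sun})\, P_\infty(\text{sun}) + P(\text{sun} \mid \text{rain})\, P_\infty(\text{rain})$
        $P_\infty(\text{rain}) = P(\text{rain} \mid \text{sun})\, P_\infty(\text{sun}) + P(\text{rain} \mid \text{rain})\, P_\infty(\text{rain})$
      - Substituting the CPT values:
        $P_\infty(\text{sun}) = 0.9\, P_\infty(\text{sun}) + 0.3\, P_\infty(\text{rain})$
        $P_\infty(\text{rain}) = 0.1\, P_\infty(\text{sun}) + 0.7\, P_\infty(\text{rain})$
      - Simplifying:
        $P_\infty(\text{sun}) = 3\, P_\infty(\text{rain})$
        $P_\infty(\text{rain}) = \tfrac{1}{3}\, P_\infty(\text{sun})$
      - Also: $P_\infty(\text{sun}) + P_\infty(\text{rain}) = 1$, so
        $P_\infty(\text{sun}) = 3/4$
        $P_\infty(\text{rain}) = 1/4$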
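The same algebra can be checked mechanically. The sketch below solves the two-state case in closed form with exact fractions and then confirms that the result is a fixed point of the transition model; `stationary_two_state` and `step` are illustrative names, and the closed form assumes the two "leave" probabilities are not both zero.

```python
from fractions import Fraction

# P(X_t | X_{t-1}) for the weather chain, as exact fractions.
T = {
    "sun": {"sun": Fraction(9, 10), "rain": Fraction(1, 10)},
    "rain": {"sun": Fraction(3, 10), "rain": Fraction(7, 10)},
}


def stationary_two_state(transition, a, b):
    """Solve the stationarity equations for a two-state chain.

    The equation for state a rearranges to
        pi[a] * P(b | a) = pi[b] * P(a | b),
    and together with pi[a] + pi[b] = 1 this gives
        pi[a] = P(a | b) / (P(b | a) + P(a | b)).
    """
    leave_a = transition[a][b]
    leave_b = transition[b][a]
    pi_a = leave_b / (leave_a + leave_b)
    return {a: pi_a, b: 1 - pi_a}


def step(dist, transition):
    """Apply the transition model once to a distribution."""
    return {
        nxt: sum(transition[prev][nxt] * p for prev, p in dist.items())
        for nxt in transition
    }


pi = stationary_two_state(T, "sun", "rain")
print(pi["sun"], pi["rain"])  # 3/4 1/4
print(step(pi, T) == pi)      # the stationary distribution is unchanged by a step
```

For chains with more states, the same conditions form a linear system that would normally be solved numerically or approximated by repeated forward steps.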

  14. Inference in Ghostbusters
      - A ghost is in the grid somewhere
      - Sensor readings tell how close a square is to the ghost:
        - On the ghost: red
        - 1 or 2 away: orange
        - 3 or 4 away: yellow
        - 5+ away: green
      - Sensors are noisy, but we know P(Color | Distance):

        P(red | 3)   P(orange | 3)   P(yellow | 3)   P(green | 3)
        0.05         0.15            0.5             0.3

  15. Video of Demo: Ghostbusters, Basic Dynamics

  16. Video of Demo: Ghostbusters, Circular Dynamics

  17. Video of Demo: Ghostbusters, Whirlpool Dynamics

  18. Application of Stationary Distribution: Web Link Analysis
      - PageRank over a web graph
        - Each web page is a state
        - Initial distribution: uniform over pages
        - Transitions:
          - With prob. c, uniform jump to a random page (dotted lines, not all shown)
          - With prob. 1-c, follow a random outlink (solid lines)
      - Stationary distribution
        - Will spend more time on highly reachable pages
        - E.g. many ways to get to the Acrobat Reader download page
        - Somewhat robust to link spam
        - Google 1.0 returned the set of pages containing all your keywords in decreasing rank; now all search engines use link analysis along with many other factors (rank is actually getting less important over time)
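The random-surfer chain described above can be simulated by repeatedly applying its transition model to a uniform starting distribution. The sketch below is a toy illustration, not Google's implementation: the four-page graph, the value c = 0.15, the iteration count, and the handling of pages without outlinks are all assumptions made for the example.

```python
def pagerank(links, c=0.15, iterations=100):
    """Approximate the stationary distribution of the random-surfer chain.

    `links` maps every page to the list of pages it links to (each target
    must itself be a key). With probability c the surfer jumps to a uniformly
    random page; with probability 1 - c it follows a uniformly random outlink.
    Pages with no outlinks always jump uniformly (an assumption of this sketch).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # initial distribution: uniform
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in pages}
        for page, outlinks in links.items():
            mass = rank[page]
            if outlinks:
                for target in pages:          # random jump, prob. c
                    new_rank[target] += c * mass / n
                for target in outlinks:       # follow an outlink, prob. 1 - c
                    new_rank[target] += (1 - c) * mass / len(outlinks)
            else:
                for target in pages:          # dead end: jump uniformly
                    new_rank[target] += mass / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "download", none to "orphan".
graph = {
    "home": ["blog", "download"],
    "blog": ["download"],
    "download": ["home"],
    "orphan": ["download"],
}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this toy graph the page with the most inbound paths ends up with the largest share of the stationary distribution, while the page nobody links to keeps only the mass it receives from random jumps.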

  19. Hidden Markov Models

  20. Hidden Markov Models
      - Markov chains are not so useful for most agents
        - Need observations to update your beliefs
      - Hidden Markov models (HMMs)
        - Underlying Markov chain over states X
        - You observe outputs (effects) at each time step
      - [Figure: chain $X_1 \to X_2 \to X_3 \to X_4 \to X_5$, with each state $X_t$ emitting evidence $E_t$]

  21. Example: Weather HMM
      - [Figure: chain Rain_{t-1} $\to$ Rain_t $\to$ Rain_{t+1} with transition model $P(X_t \mid X_{t-1})$; each Rain_t emits Umbrella_t with emission model $P(E_t \mid X_t)$]
      - An HMM is defined by:
        - Initial distribution: $P(X_1)$
        - Transitions: $P(X_t \mid X_{t-1})$
        - Emissions: $P(E_t \mid X_t)$
      - Transition table:

        R_t   R_{t+1}   P(R_{t+1} | R_t)
        +r    +r        0.7
        +r    -r        0.3
        -r    +r        0.3
        -r    -r        0.7

      - Emission table:

        R_t   U_t   P(U_t | R_t)
        +r    +u    0.9
        +r    -u    0.1
        -r    +u    0.2
        -r    -u    0.8

  22. HMM: probabilistic model
      - Transition probabilities: probabilities of moving between states
        $A_{ij} \equiv P(X_t = j \mid X_{t-1} = i)$
      - Initial state distribution: start probabilities in different states
        $\pi_i \equiv P(X_1 = i)$
      - Observation model: emission probabilities associated with each state
        $P(E_t \mid X_t)$

  23. Joint Distribution of an HMM
      - [Figure: HMM graph over states $X_1, X_2, X_3, X_5$ and evidence $E_1, E_2, E_3, E_5$]
      - Joint distribution:
        $P(X_1, E_1, X_2, E_2, X_3, E_3) = P(X_1)\, P(E_1 \mid X_1)\, P(X_2 \mid X_1)\, P(E_2 \mid X_2)\, P(X_3 \mid X_2)\, P(E_3 \mid X_3)$
      - More generally:
        $P(X_1, E_1, \dots, X_T, E_T) = P(X_1)\, P(E_1 \mid X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})\, P(E_t \mid X_t)$
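To make the product concrete, the sketch below evaluates it for the weather HMM, using the transition and emission tables from the weather example slide. The slide does not give numbers for the initial distribution, so a uniform $P(X_1)$ is assumed here; `hmm_joint` is an illustrative name.

```python
# Weather HMM tables from the example slide.
TRANSITION = {"+r": {"+r": 0.7, "-r": 0.3}, "-r": {"+r": 0.3, "-r": 0.7}}
EMISSION = {"+r": {"+u": 0.9, "-u": 0.1}, "-r": {"+u": 0.2, "-u": 0.8}}
# Assumption: the slides do not state P(X_1); a uniform prior is used here.
INITIAL = {"+r": 0.5, "-r": 0.5}


def hmm_joint(states, evidence):
    """P(x_1, e_1, ..., x_T, e_T) for one state sequence and one evidence sequence."""
    if not states or len(states) != len(evidence):
        raise ValueError("states and evidence must be non-empty and equally long")
    # First time step: P(x_1) P(e_1 | x_1).
    prob = INITIAL[states[0]] * EMISSION[states[0]][evidence[0]]
    # Later time steps: P(x_t | x_{t-1}) P(e_t | x_t).
    for t in range(1, len(states)):
        prob *= TRANSITION[states[t - 1]][states[t]]
        prob *= EMISSION[states[t]][evidence[t]]
    return prob


# 0.5 * 0.9 * (0.7 * 0.9) * (0.3 * 0.8), i.e. about 0.068
p = hmm_joint(["+r", "+r", "-r"], ["+u", "+u", "-u"])
print(p)
```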

  24. Chain Rule and HMMs
      - [Figure: HMM graph over states $X_1, X_2, X_3$ and evidence $E_1, E_2, E_3$]
      - From the chain rule, every joint distribution over $X_1, E_1, X_2, E_2, X_3, E_3$ can be written as:
        $P(X_1, E_1, X_2, E_2, X_3, E_3) = P(X_1)\, P(E_1 \mid X_1)\, P(X_2 \mid X_1, E_1)\, P(E_2 \mid X_1, E_1, X_2)\, P(X_3 \mid X_1, E_1, X_2, E_2)\, P(E_3 \mid X_1, E_1, X_2, E_2, X_3)$
      - Assuming that
        $X_2 \perp E_1 \mid X_1$
        $E_2 \perp X_1, E_1 \mid X_2$
        $X_3 \perp X_1, E_1, E_2 \mid X_2$
        $E_3 \perp X_1, E_1, X_2, E_2 \mid X_3$
        gives us the expression posited on the previous slide:
        $P(X_1, E_1, X_2, E_2, X_3, E_3) = P(X_1)\, P(E_1 \mid X_1)\, P(X_2 \mid X_1)\, P(E_2 \mid X_2)\, P(X_3 \mid X_2)\, P(E_3 \mid X_3)$

  25. Conditional Independencies
      - [Figure: HMM graph over states $X_1, X_2, X_3$ and evidence $E_1, E_2, E_3$]
      - The state is independent of all past states and all past evidence given the previous state, i.e.:
        $X_t \perp X_1, E_1, \dots, X_{t-2}, E_{t-2}, E_{t-1} \mid X_{t-1}$
      - The evidence is independent of all past states and all past evidence given the current state, i.e.:
        $E_t \perp X_1, E_1, \dots, X_{t-2}, E_{t-2}, X_{t-1}, E_{t-1} \mid X_t$

  26. Conditional Independence
      - HMMs have two important independence properties:
        - Markov hidden process: the future depends on the past via the present
        - The current observation is independent of all else given the current state
      - [Figure: chain $X_1 \to X_2 \to X_3 \to X_4 \to X_5$, with each state $X_t$ emitting evidence $E_t$]
      - Quiz: does this mean that evidence variables are guaranteed to be independent?
        [No, they tend to be correlated by the hidden state]
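The quiz answer can be checked numerically on the weather HMM by summing the joint over all hidden state sequences. This is a brute-force sketch for illustration only: it assumes a uniform initial distribution (not given on the slides), and `evidence_probability` is an illustrative name.

```python
from itertools import product

STATES = ("+r", "-r")
TRANSITION = {"+r": {"+r": 0.7, "-r": 0.3}, "-r": {"+r": 0.3, "-r": 0.7}}
EMISSION = {"+r": {"+u": 0.9, "-u": 0.1}, "-r": {"+u": 0.2, "-u": 0.8}}
INITIAL = {"+r": 0.5, "-r": 0.5}  # assumption: uniform P(X_1)


def evidence_probability(evidence):
    """P(e_1, ..., e_T), summing the HMM joint over every hidden state sequence."""
    total = 0.0
    for states in product(STATES, repeat=len(evidence)):
        p = INITIAL[states[0]] * EMISSION[states[0]][evidence[0]]
        for t in range(1, len(evidence)):
            p *= TRANSITION[states[t - 1]][states[t]]
            p *= EMISSION[states[t]][evidence[t]]
        total += p
    return total


# Marginals of E_1 and E_2, and their joint, for "umbrella on both days".
p_first = evidence_probability(["+u"])
p_second = sum(evidence_probability([e1, "+u"]) for e1 in ("+u", "-u"))
p_both = evidence_probability(["+u", "+u"])

print(round(p_both, 4), round(p_first * p_second, 4))
```

Under these assumptions the joint probability of seeing an umbrella on both days (about 0.35) is larger than the product of the two marginals (about 0.30), so the evidence variables are not independent: seeing an umbrella today makes rain, and hence an umbrella tomorrow, more likely.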

  27. Example: Ghostbusters HMM
      - $P(X_1)$ = uniform:

        1/9  1/9  1/9
        1/9  1/9  1/9
        1/9  1/9  1/9

      - $P(X \mid X')$ = usually move clockwise, but sometimes move in a random direction or stay in place. For example, $P(X \mid X' = \langle 1,2 \rangle)$:

        1/6  1/6  1/2
        0    1/6  0
        0    0    0

      - $P(R_{ij} \mid X)$ = same sensor model as before: red means close, green means far away
      - [Figure: chain $X_1 \to X_2 \to X_3 \to X_4 \to X_5$, with sensor readings $R_{i,j}$ attached to the states]

  28. Video of Demo: Ghostbusters, Circular Dynamics (HMM)

  29. Filtering / Monitoring
      - Filtering, or monitoring, is the task of tracking the distribution $B_t(X) = P_t(X_t \mid e_1, \dots, e_t)$ (the belief state) over time
      - We start with $B_1(X)$ in an initial setting, usually uniform
      - As time passes, or we get observations, we update $B(X)$
      - The Kalman filter was invented in the 1960s and first implemented as a method of trajectory estimation for the Apollo program
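As a preview of how such a belief update can look, the sketch below tracks $B_t$ on the weather HMM with the standard forward recursion: push the belief through the transition model, weight it by the likelihood of the new observation, and renormalize. The slide itself does not spell out this update, so treat it as one common way to realize it; the uniform prior and the names `normalize` and `filter_beliefs` are assumptions of the example.

```python
STATES = ("+r", "-r")
TRANSITION = {"+r": {"+r": 0.7, "-r": 0.3}, "-r": {"+r": 0.3, "-r": 0.7}}
EMISSION = {"+r": {"+u": 0.9, "-u": 0.1}, "-r": {"+u": 0.2, "-u": 0.8}}
INITIAL = {"+r": 0.5, "-r": 0.5}  # assumption: uniform prior over X_1


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def filter_beliefs(observations):
    """Return [B_1, B_2, ...] with B_t(x) = P(X_t = x | e_1, ..., e_t)."""
    beliefs = []
    prior = INITIAL
    for t, obs in enumerate(observations):
        if t > 0:
            # Time passes: push the previous belief through the transition model.
            prior = {
                x: sum(TRANSITION[prev][x] * b for prev, b in beliefs[-1].items())
                for x in STATES
            }
        # Observation arrives: weight by P(e_t | x) and renormalize.
        beliefs.append(normalize({x: EMISSION[x][obs] * prior[x] for x in STATES}))
    return beliefs


beliefs = filter_beliefs(["+u", "+u"])
for t, belief in enumerate(beliefs, start=1):
    print(t, {state: round(p, 3) for state, p in belief.items()})
```

With an umbrella observed on both days, the belief in rain rises from the prior 0.5 to about 0.818 after the first observation and about 0.883 after the second.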
