  1. Robust Hidden Markov Models Inference in the Presence of Label Noise Benoît Frénay February 7, 2014

  2. What is Machine Learning?

  3. What is Machine Learning? Machine learning is about learning from data. A model is inferred from a training set to make predictions.

  4. What is Machine Learning? Machine learning is about learning from data. A model is inferred from a training set to make predictions.

  5. What is Machine Learning? Machine learning is about learning from data. A model is inferred from a training set to make predictions.

  6. Examples of Tasks: Regression Example: predict children's weight from anthropometric measures.

  7. Examples of Tasks: Regression Example: predict children's weight from anthropometric measures.

  8. Examples of Tasks: Regression Example: predict children's weight from anthropometric measures.

  9. Examples of Tasks: Classification Examples: disease diagnosis, spam filtering, image classification.

  10. Examples of Tasks: Classification Examples: disease diagnosis, spam filtering, image classification.

  11. Examples of Tasks: Classification Examples: disease diagnosis, spam filtering, image classification.

  12. What does it Mean for a Machine to Learn? Machine learning studies how machines can learn automatically. Learning means finding a model of the data. Three steps: (i) specify a type of model (e.g. a linear model), (ii) specify a criterion (e.g. mean square error), (iii) find the best model w.r.t. the criterion.

  13. Example of Learning Process: Linear Regression Model: linear model $f(x_1, \dots, x_d) = w_1 x_1 + \cdots + w_d x_d + w_0$. Criterion: mean square error $\sum_{i=1}^{n} (y_i - f(x_{i1}, \dots, x_{id}))^2$. Algorithm: linear regression, $\hat{w} = \arg\min_w \sum_{i=1}^{n} (y_i - f(x_{i1}, \dots, x_{id}))^2 = (X'X)^{-1} X'y$.
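For concreteness, here is a minimal sketch of this closed-form recipe in Python/NumPy. The synthetic data and all variable names are illustrative only; they are not taken from the presentation.

```python
# Minimal sketch of least-squares linear regression on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
true_w, true_w0 = np.array([2.0, -1.0, 0.5]), 4.0
y = X @ true_w + true_w0 + 0.1 * rng.normal(size=n)

# Append a constant column so the intercept w_0 is estimated jointly.
X1 = np.hstack([X, np.ones((n, 1))])

# Closed-form solution w = (X'X)^{-1} X'y, computed with lstsq for numerical stability.
w_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(w_hat)  # approximately [2.0, -1.0, 0.5, 4.0]
```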

  14. Overview of the Presentation Segmentation of electrocardiogram signals:

  15. Overview of the Presentation Segmentation of electrocardiogram signals: goal: allow automated diagnosis of heart disease

  16. Overview of the Presentation Segmentation of electrocardiogram signals: goal: allow automated diagnosis of heart disease; tools: hidden Markov models and wavelet transform

  17. Overview of the Presentation Segmentation of electrocardiogram signals: goal: allow automated diagnosis of heart disease; tools: hidden Markov models and wavelet transform; issue: robustness to label noise (i.e. expert errors)

  18. Overview of the Presentation Segmentation of electrocardiogram signals: goal: allow automated diagnosis of heart disease; tools: hidden Markov models and wavelet transform; issue: robustness to label noise (i.e. expert errors); solution: modelling of expert behaviour

  19. Overview of the Presentation Segmentation of electrocardiogram signals:

  20. Overview of the Presentation Segmentation of electrocardiogram signals: goal: allow automated diagnosis of heart disease

  21. Overview of the Presentation Segmentation of electrocardiogram signals: goal: allow automated diagnosis of heart disease; tools: hidden Markov models and wavelet transform

  22. Overview of the Presentation Segmentation of electrocardiogram signals: goal: allow automated diagnosis of heart disease; tools: hidden Markov models and wavelet transform; issue: robustness to label noise (i.e. expert errors)

  23. Overview of the Presentation Segmentation of electrocardiogram signals: goal: allow automated diagnosis of heart disease; tools: hidden Markov models and wavelet transform; issue: robustness to label noise (i.e. expert errors); solution: modelling of expert behaviour

  24. Electrocardiogram Signal Segmentation

  25. What is an Electrocardiogram Signal? An ECG is a measure of the electrical activity of the human heart. Patterns of interest: P wave, QRS complex, T wave, baseline.

  26. Where Does it Come from? The ECG results from the superposition of several signals.

  27. What it Looks Like in Real-World Cases Real ECGs are polluted by various sources of noise.

  28. What is our Goal in ECG Segmentation? Task: split/segment an entire ECG into patterns. Available data: a few manual segmentations from experts. Issue: some of the annotations of the experts are incorrect. Approach: a probabilistic model of sequences with labels, namely hidden Markov models (with the wavelet transform).

  29. Hidden Markov Models

  30. Hidden Markov Models in a Nutshell Hidden Markov models (HMMs) are probabilistic models of sequences. $S_1, \dots, S_T$ is the sequence of annotations (e.g. the state of the heart): $P(S_t = s_t \mid S_{t-1} = s_{t-1})$.

  31. Hidden Markov Models in a Nutshell Hidden Markov models (HMMs) are probabilistic models of sequences. $S_1, \dots, S_T$ is the sequence of annotations (e.g. the state of the heart): $P(S_t = s_t \mid S_{t-1} = s_{t-1})$. $O_1, \dots, O_T$ is the sequence of observations (e.g. the measured voltage): $P(O_t = o_t \mid S_t = s_t)$.
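To make these two distributions concrete, here is a small generative sketch in Python/NumPy. The number of states, the Gaussian emission model and every parameter value are made up for illustration; they are not the ones used in the presentation.

```python
# Sample a toy HMM: Markov transitions over 3 hidden states, Gaussian emissions.
import numpy as np

rng = np.random.default_rng(0)

q = np.array([0.6, 0.3, 0.1])                    # priors P(S_1 = i)
A = np.array([[0.80, 0.15, 0.05],                # transitions P(S_t = j | S_{t-1} = i)
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
means = np.array([-2.0, 0.0, 2.0])               # emission model P(O_t | S_t):
stds = np.array([0.5, 0.5, 0.5])                 # one Gaussian per state (illustrative)

T = 20
states = np.zeros(T, dtype=int)
obs = np.zeros(T)
states[0] = rng.choice(3, p=q)
obs[0] = rng.normal(means[states[0]], stds[states[0]])
for t in range(1, T):
    states[t] = rng.choice(3, p=A[states[t - 1]])            # Markov transition
    obs[t] = rng.normal(means[states[t]], stds[states[t]])   # conditional emission
print(states)
print(obs.round(2))
```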

  32. Hypotheses Behind Hidden Markov Models (1) Markov hypothesis: the next state depends only on the current state.

  33. Hypotheses Behind Hidden Markov Models (2) Observations are conditionally independent w.r.t. the hidden states: $P(O_1, \dots, O_T \mid S_1, \dots, S_T) = \prod_{t=1}^{T} P(O_t \mid S_t)$.

  34. Learning Hidden Markov Models Learning an HMM means estimating probabilities: $P(S_t)$ are the prior probabilities, $P(S_t \mid S_{t-1})$ are the transition probabilities, and $P(O_t \mid S_t)$ are the emission probabilities. Parameters $\Theta = (q, a, b)$: $q_i$ is the prior of state $i$, $a_{ij}$ is the transition probability from state $i$ to state $j$, and $b_i$ is the observation distribution for state $i$.

  35. Standard Inference Algorithms for HMMs Supervised learning: assumes the observed labels are correct; maximises the likelihood $P(S, O \mid \Theta)$; learns the correct concepts; sensitive to label noise. Baum-Welch algorithm: unsupervised, i.e. observed labels are discarded; iteratively (i) labels the samples and (ii) learns a model; may learn concepts which differ significantly; theoretically insensitive to label noise.

  36. Supervised Learning for Hidden Markov Models Supervised: uses annotations, which are assumed to be reliable. Maximises the likelihood $P(S, O \mid \Theta) = q_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t} \prod_{t=1}^{T} b_{s_t}(o_t)$.

  37. Supervised Learning for Hidden Markov Models Supervised: uses annotations, which are assumed to be reliable. Maximises the likelihood $P(S, O \mid \Theta) = q_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t} \prod_{t=1}^{T} b_{s_t}(o_t)$. Transition probabilities $P(S_t \mid S_{t-1})$ are estimated by counting: $a_{ij} = \#(\text{transitions from } i \text{ to } j) / \#(\text{transitions from } i)$. Emission probabilities $P(O_t \mid S_t)$ are obtained by PDF estimation; the standard models in ECG analysis are Gaussian mixture models (GMMs). A code sketch of these estimates follows.
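The sketch below illustrates the supervised estimates under simplifying assumptions: transition probabilities are obtained by counting, and a single Gaussian per state stands in for the GMM emission model used for ECG. The labelled sequence is toy data, not an actual ECG annotation.

```python
# Supervised HMM estimation from a labelled sequence (toy data, single Gaussian per state).
import numpy as np

def supervised_hmm(states, obs, K):
    """states: (T,) integer labels in {0,...,K-1}; obs: (T,) scalar observations."""
    T = len(states)
    q = np.bincount(states, minlength=K) / T                        # priors P(S_t = i)
    counts = np.zeros((K, K))
    for t in range(1, T):
        counts[states[t - 1], states[t]] += 1                       # count transitions i -> j
    A = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)   # a_ij = #(i->j) / #(i->*)
    means = np.array([obs[states == i].mean() for i in range(K)])   # per-state emission fit
    stds = np.array([obs[states == i].std() for i in range(K)])
    return q, A, means, stds

rng = np.random.default_rng(0)
states = np.repeat(rng.integers(0, 3, size=40), 5)        # toy annotation sequence
obs = rng.normal(loc=states.astype(float), scale=0.3)     # toy observations
q_hat, A_hat, mu_hat, sigma_hat = supervised_hmm(states, obs, K=3)
print(A_hat.round(2), mu_hat.round(2))
```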

  38. Unsupervised Learning for Hidden Markov Models (1) Unsupervised: uses only observations, guesses hidden states. Maximises the likelihood $P(O \mid \Theta) = \sum_{S} P(S, O \mid \Theta)$.

  39. Unsupervised Learning for Hidden Markov Models (1) Unsupervised: uses only observations, guesses hidden states. Maximises the likelihood $P(O \mid \Theta) = \sum_{S} P(S, O \mid \Theta)$. Non-convex function to optimise: $\log P(O \mid \Theta) = \log \sum_{S} q_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t} \prod_{t=1}^{T} b_{s_t}(o_t)$. Solution: the expectation-maximisation algorithm (a.k.a. Baum-Welch).

  40. Unsupervised Learning for Hidden Markov Models (2) The log-likelihood is intractable, but what about a tractable lower bound? (Figure source: Pattern Recognition and Machine Learning, C. Bishop, 2006.) Two steps: (i) find a tractable lower bound, (ii) maximise this lower bound w.r.t. $\Theta$.

  41. Unsupervised Learning for Hidden Markov Models (3) Idea: use Jensen's inequality to find a lower bound on the log-likelihood. $\log P(O \mid \Theta) = \log \sum_{S} P(S, O \mid \Theta)$

  42. Unsupervised Learning for Hidden Markov Models (3) Idea: use Jensen's inequality to find a lower bound on the log-likelihood. $\log P(O \mid \Theta) = \log \sum_{S} P(S, O \mid \Theta) = \log \sum_{S} q(S) \frac{P(S, O \mid \Theta)}{q(S)}$

  43. Unsupervised Learning for Hidden Markov Models (3) Idea: use Jensen's inequality to find a lower bound on the log-likelihood. $\log P(O \mid \Theta) = \log \sum_{S} P(S, O \mid \Theta) = \log \sum_{S} q(S) \frac{P(S, O \mid \Theta)}{q(S)} \geq \sum_{S} q(S) \log \frac{P(S, O \mid \Theta)}{q(S)}$

  44. Unsupervised Learning for Hidden Markov Models (3) Idea: use Jensen's inequality to find a lower bound on the log-likelihood. $\log P(O \mid \Theta) = \log \sum_{S} P(S, O \mid \Theta) = \log \sum_{S} q(S) \frac{P(S, O \mid \Theta)}{q(S)} \geq \sum_{S} q(S) \log \frac{P(S, O \mid \Theta)}{q(S)} = \sum_{S} q(S) \log \frac{P(S \mid O, \Theta)}{q(S)} + \text{const}$

  45. Unsupervised Learning for Hidden Markov Models (3) Idea: use Jensen's inequality to find a lower bound on the log-likelihood. $\log P(O \mid \Theta) = \log \sum_{S} P(S, O \mid \Theta) = \log \sum_{S} q(S) \frac{P(S, O \mid \Theta)}{q(S)} \geq \sum_{S} q(S) \log \frac{P(S, O \mid \Theta)}{q(S)} = \sum_{S} q(S) \log \frac{P(S \mid O, \Theta)}{q(S)} + \text{const}$. Best lower bound with $q(S) = P(S \mid O, \Theta)$.

  46. The Expectation-Maximisation / Baum-Welch Algorithm Expectation step: estimate the posteriors $\gamma_t(i) = P(S_t = i \mid O, \Theta^{\text{old}})$ and $\epsilon_t(i, j) = P(S_{t-1} = i, S_t = j \mid O, \Theta^{\text{old}})$.
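A hedged sketch of this expectation step follows, written as a scaled forward-backward pass in Python/NumPy. The tiny HMM at the bottom is invented for illustration, and B[t, i] stands for the emission likelihood $b_i(o_t)$, so the observation model itself is abstracted away.

```python
# E-step via scaled forward-backward: gamma_t(i) = P(S_t=i | O), eps_t(i,j) = P(S_{t-1}=i, S_t=j | O).
import numpy as np

def e_step(q, A, B):
    T, K = B.shape
    alpha = np.zeros((T, K))
    c = np.zeros(T)                                  # scaling factors P(o_t | o_1..t-1)
    alpha[0] = q * B[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):                            # forward pass (scaled)
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):                   # backward pass (scaled)
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                             # posteriors P(S_t = i | O)
    eps = np.zeros((T - 1, K, K))
    for t in range(1, T):                            # pairwise posteriors P(S_{t-1}=i, S_t=j | O)
        eps[t - 1] = alpha[t - 1][:, None] * A * (B[t] * beta[t])[None, :] / c[t]
    return gamma, eps

q = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.2], [0.6, 0.3], [0.1, 0.8]])   # toy b_i(o_t) for T=3, K=2
gamma, eps = e_step(q, A, B)
print(gamma.round(3), gamma.sum(axis=1))             # each row sums to 1
```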
