Robust Hidden Markov Models Inference in the Presence of Label Noise
Benoît Frénay
25 August 2014
Machine Learning in a Nutshell
Challenges in Machine Learning: Robust Inference
Overview of the Presentation
Segmentation of electrocardiogram signals:
goal: allow automated diagnosis of heart disease
tools: hidden Markov models and wavelet transform
issue: robustness to label noise (i.e. expert errors)
solution: modelling of expert behaviour
Electrocardiogram Signal Segmentation
What is an Electrocardiogram Signal?
An ECG is a measure of the electrical activity of the human heart.
Patterns of interest: P wave, QRS complex, T wave, baseline.
Where Does it Come from?
The ECG results from the superposition of several signals.
What it Looks Like in Real-World Cases
Real ECGs are polluted by various sources of noise.
What is our Goal in ECG Segmentation?
Task: split/segment an entire ECG into patterns.
Available data: a few manual segmentations from experts.
Issue: some of the annotations of the experts are incorrect.
Probabilistic model of sequences with labels: hidden Markov models (with wavelet transform).
Hidden Markov Models
Hidden Markov Models in a Nutshell
Hidden Markov models (HMMs) are probabilistic models of sequences.
S_1, ..., S_T is the sequence of annotations (ex.: state of the heart): P(S_t = s_t | S_{t-1} = s_{t-1})
O_1, ..., O_T is the sequence of observations (ex.: measured voltage): P(O_t = o_t | S_t = s_t)
Hypotheses Behind Hidden Markov Models (1)
Markov hypothesis: the next state only depends on the current state.
Hypotheses Behind Hidden Markov Models (2)
Observations are conditionally independent w.r.t. the hidden states:
P(O_1, ..., O_T | S_1, ..., S_T) = \prod_{t=1}^{T} P(O_t | S_t)
Learning Hidden Markov Models
Learning an HMM means estimating probabilities:
P(S_t) are prior probabilities
P(S_t | S_{t-1}) are transition probabilities
P(O_t | S_t) are emission probabilities.
Parameters Θ = (q, a, b):
q_i is the prior of state i
a_ij is the transition probability from state i to state j
b_i is the observation distribution for state i
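As an illustration (not from the original slides), a minimal way to hold the parameters Θ = (q, a, b) in code, assuming one Gaussian emission per state; the number of states and dimensions below are hypothetical:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMMParams:
    """Container for Theta = (q, a, b), with Gaussian emissions per state."""
    q: np.ndarray      # (K,)      prior probabilities q_i
    a: np.ndarray      # (K, K)    transition probabilities a_ij
    means: np.ndarray  # (K, D)    emission means, one per state
    covs: np.ndarray   # (K, D, D) emission covariances, one per state

K, D = 4, 7  # e.g. 4 ECG states (P, QRS, T, baseline), 7-dimensional observations
params = HMMParams(
    q=np.full(K, 1.0 / K),
    a=np.full((K, K), 1.0 / K),
    means=np.zeros((K, D)),
    covs=np.stack([np.eye(D)] * K),
)
```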
Standard Inference Algorithms for HMMs
Supervised learning: assumes the observed labels are correct; maximises the likelihood P(S, O | Θ); learns the correct concepts; sensitive to label noise.
Baum-Welch algorithm: unsupervised, i.e. observed labels are discarded; iteratively (i) labels samples and (ii) learns a model; may learn concepts which differ significantly; theoretically insensitive to label noise.
Supervised Learning for Hidden Markov Models
Supervised: uses annotations, which are assumed to be reliable.
Maximises the likelihood P(S, O | Θ) = q_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t} \prod_{t=1}^{T} b_{s_t}(o_t).
Transition probabilities P(S_t | S_{t-1}) are estimated by counting:
a_ij = #(transitions from i to j) / #(transitions from i)
Emission probabilities P(O_t | S_t) are obtained by PDF estimation;
standard models in ECG analysis: Gaussian mixture models (GMMs).
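A hedged sketch (not the author's code) of this supervised estimation, assuming integer state labels, several annotated sequences, enough samples per state, and scikit-learn's GaussianMixture for the per-state emission densities:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def supervised_hmm_fit(state_seqs, obs_seqs, n_states, n_mix=3):
    """Estimate (q, a, b) from labelled sequences by counting and PDF fitting.

    state_seqs: list of (T,) int arrays of expert annotations.
    obs_seqs:   list of (T, D) arrays of observations.
    """
    q = np.zeros(n_states)
    a = np.zeros((n_states, n_states))
    for s in state_seqs:
        q[s[0]] += 1                    # prior: frequency of initial states
        for t in range(1, len(s)):
            a[s[t - 1], s[t]] += 1      # count transitions i -> j
    q /= q.sum()
    a /= a.sum(axis=1, keepdims=True)   # a_ij = #(i -> j) / #(transitions from i)

    all_s = np.concatenate(state_seqs)
    all_o = np.concatenate(obs_seqs)
    # Emission densities: one GMM per state, fitted on that state's observations
    b = [GaussianMixture(n_components=n_mix).fit(all_o[all_s == i])
         for i in range(n_states)]
    return q, a, b
```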
Unsupervised Learning for Hidden Markov Models (1)
Unsupervised: uses only observations, guesses hidden states.
Maximises the likelihood P(O | Θ) = \sum_S P(S, O | Θ).
Non-convex function to optimise:
log P(O | Θ) = log \sum_S [ q_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t} \prod_{t=1}^{T} b_{s_t}(o_t) ]
Solution: expectation-maximisation algorithm (a.k.a. Baum-Welch).
Unsupervised Learning for Hidden Markov Models (2)
The log-likelihood is hard to maximise directly, but what about a tractable lower bound?
Source: Pattern Recognition and Machine Learning, C. Bishop, 2006.
Two steps:
find a tractable lower bound
maximise this lower bound w.r.t. Θ
Unsupervised Learning for Hidden Markov Models (3)
Idea: use Jensen's inequality to find a lower bound on the log-likelihood.
log P(O | Θ) = log \sum_S P(S, O | Θ)
             = log \sum_S q(S) P(S, O | Θ) / q(S)
             ≥ \sum_S q(S) log [ P(S, O | Θ) / q(S) ]
             = \sum_S q(S) log [ P(S | O, Θ) / q(S) ] + const
Best lower bound with q(S) = P(S | O, Θ).
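A short justification, following Bishop's standard decomposition rather than anything spelled out on the slide, of why q(S) = P(S | O, Θ) gives the tightest bound:

```latex
% Decomposition of the log-likelihood (Bishop, PRML, Ch. 9):
\log P(O \mid \Theta)
  = \underbrace{\sum_{S} q(S) \log \frac{P(S, O \mid \Theta)}{q(S)}}_{\text{lower bound } \mathcal{L}(q,\Theta)}
  + \underbrace{\sum_{S} q(S) \log \frac{q(S)}{P(S \mid O, \Theta)}}_{\mathrm{KL}\!\left(q \,\|\, P(S \mid O, \Theta)\right) \,\geq\, 0}
% The KL term vanishes exactly when q(S) = P(S | O, Theta),
% so the lower bound then touches the log-likelihood at the current Theta.
```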
The Expectation-Maximisation / Baum-Welch Algorithm
Expectation step: estimate the posteriors
γ_t(i) = P(S_t = i | O, Θ_old)
ε_t(i, j) = P(S_{t-1} = i, S_t = j | O, Θ_old)
Maximisation step for q_i and a_ij:
q_i = γ_1(i) / \sum_{i=1}^{|S|} γ_1(i)
a_ij = \sum_{t=2}^{T} ε_t(i, j) / [ \sum_{j=1}^{|S|} \sum_{t=2}^{T} ε_t(i, j) ]
The hidden states are estimated and used to compute the parameters.
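A minimal sketch of this M-step in code, assuming the posteriors γ and ε have already been produced by the forward-backward recursions (which are not shown here):

```python
import numpy as np

def m_step(gamma, eps):
    """Baum-Welch M-step for the priors and transition probabilities.

    gamma: (T, K) array, gamma[t, i] = P(S_t = i | O, Theta_old)
    eps:   (T, K, K) array, eps[t, i, j] = P(S_{t-1} = i, S_t = j | O, Theta_old);
           only the entries for t >= 1 (i.e. t = 2..T in slide notation) are used.
    """
    q = gamma[0] / gamma[0].sum()                   # q_i from the t = 1 posteriors
    counts = eps[1:].sum(axis=0)                    # expected transition counts i -> j
    a = counts / counts.sum(axis=1, keepdims=True)  # normalise each row over j
    return q, a
```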
Wavelet Transform
Why do we Need High-Dimensional Representations?
Using HMMs with raw ECG signals gives 70% accuracy.
The Markov and conditional independence hypotheses are strong:
transitions do not depend only on the current state
emissions are not independent, even when states are given
Solution: use a multi-dimensional representation of the ECG signal.
Example: O(t) → (O(t), O'(t), O''(t)) (see the sketch below).
the observation vector contains contextual information
numerical estimates of derivatives are unstable
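As an illustration (hypothetical, not from the slides), augmenting each sample with finite-difference derivatives; this also makes clear why such estimates are noisy:

```python
import numpy as np

def derivative_features(signal, dt=1.0):
    """Stack (O(t), O'(t), O''(t)) into a 3-dimensional observation vector.

    Finite differences amplify high-frequency noise, which is one reason the
    wavelet representation described next is preferred in practice.
    """
    d1 = np.gradient(signal, dt)                # first-derivative estimate
    d2 = np.gradient(d1, dt)                    # second-derivative estimate
    return np.column_stack([signal, d1, d2])    # shape (T, 3)
```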
Wavelet Transform in a Nutshell
Signals can be studied at different time scales (or frequencies).
The Fourier transform only considers the whole signal (no localisation):
\hat{f}(ω) = \int_{-∞}^{∞} f(t) e^{-2πiωt} dt
The wavelet transform uses a localised function ψ (a.k.a. wavelet):
f_ψ(a, b) = (1 / \sqrt{|a|}) \int_{-∞}^{∞} ψ((t - b) / a) f(t) dt
where b is the translation factor and a is the scale factor.
Example of Time-Frequency Analysis (1)
Source: A Wavelet Tour of Signal Processing, Stéphane Mallat, 1999.
Example of Time-Frequency Analysis (2)
Source: A Wavelet Tour of Signal Processing, Stéphane Mallat, 1999.
Information Extraction with Wavelet Transform
The ECG signal is:
filtered using a 3-30 Hz band-pass filter
transformed using a continuous wavelet transform
dyadic scales from 2^1 to 2^7 are kept and normalised
(a sketch of such a pipeline is given below)
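A hedged sketch of this preprocessing pipeline using SciPy and PyWavelets; the sampling rate, filter order, wavelet family and normalisation are illustrative assumptions, as the slide does not specify them:

```python
import numpy as np
import pywt
from scipy.signal import butter, sosfiltfilt

def ecg_wavelet_features(ecg, fs=250.0):
    """Band-pass filter the ECG, then take a CWT at dyadic scales 2^1..2^7.

    Returns a (T, 7) observation matrix: one normalised coefficient row per sample.
    """
    # 3-30 Hz band-pass filter (order 4 and fs are illustrative choices)
    sos = butter(4, [3.0, 30.0], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, ecg)

    # Continuous wavelet transform at dyadic scales 2, 4, ..., 128
    scales = 2.0 ** np.arange(1, 8)
    coefs, _ = pywt.cwt(filtered, scales, "mexh")   # shape (7, T)

    # Normalise each scale to zero mean and unit variance
    coefs = (coefs - coefs.mean(axis=1, keepdims=True)) / coefs.std(axis=1, keepdims=True)
    return coefs.T                                   # shape (T, 7)
```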
Label Noise-Tolerant Hidden Markov Models
Motivation
For real datasets, perfect labelling is difficult:
subjectivity of the labelling task;
lack of information;
communication noise.
In particular, label noise arises in biomedical applications.
Previous work by e.g. Lawrence et al. incorporated a noise model into a generative model for i.i.d. observations (classification); a generic sketch of such a noise model is given below.
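To fix ideas, a generic sketch of the kind of noise model referred to (illustrative only, not necessarily the exact formulation of Lawrence et al. or of this thesis): the observed label is treated as a possibly corrupted copy of the true class, governed by a flipping matrix.

```latex
% Generic label-noise model for i.i.d. classification (illustrative):
% the annotation \hat{S} is a noisy copy of the true class S,
% with flipping probabilities d_{ij}.
P(\hat{S} = j \mid S = i) = d_{ij}, \qquad \sum_{j} d_{ij} = 1
% The generative model of an (observation, observed label) pair then
% marginalises over the unknown true class:
P(\hat{S} = j, O = o \mid \Theta, d) = \sum_{i} P(S = i) \, d_{ij} \, b_i(o)
```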
Example of Label Noise in ECGs