Data mining for Obstructive Sleep Apnea Detection
18 October 2017
Konstantinos Nikolaidis
Introduction: What is Obstructive Sleep Apnea?
● Obstructive Sleep Apnea (OSA) is a relatively common sleep disorder characterized by recurrent episodes of partial or complete collapse of the upper airway during sleep.
● Estimates of disease prevalence are in the range of 3% to 7%. However, an estimated 70-80% of OSA cases remain undiagnosed.
● Factors that increase vulnerability to the disorder include age, male sex, obesity, family history, menopause, and certain health behaviors such as cigarette smoking and alcohol use.
Introduction: What is Obstructive Sleep Apnea?
Figure 1 [1]
Diagnosis:
● There is a variety of tools used to diagnose sleep apnea, ranging from the gold-standard polysomnography to questionnaires used for screening patients at higher risk.
● Polysomnography is usually performed during hospitalization in sleep laboratories, using polysomnographic instruments that run multiparametric tests. The diagnosis includes:
– sensors on the nose
– on the head (for monitoring ocular movement and brain activity (EEG))
– on the chest and abdomen (elastic belts for measuring respiration)
– on the finger for oxygen saturation
– also ECG (electrocardiograph) and EMG (electromyograph).
Diagnosis: Polysomnography
Figure 2 [2]
● The overall process of polysomnography diagnosis is resource-demanding and also intrusive for the patient.
Diagnosis:
● In recent years, many research teams have aimed at creating new hardware/software for easier and more patient-friendly OSA diagnosis. Some ideas include:
– Mobile applications which use sensors, with automated diagnosis based on data mining techniques.
– Portable devices created for this purpose.
– More exotic methods, such as smart T-shirts used as sensors, built-in phone microphones that detect OSA via snoring, or even detection during wakefulness.
Classification problem
● From the above it is clear that OSA detection can be described as a classification problem.
● Looking at OSA event detection, we have two possible classes:
– Periods with apnea events
– Periods with normal breathing
● Supervised learning problem: we need annotations!
Classification problem: General
● Under this assumption, we have studied how different data mining techniques compare for the classification of OSA on different datasets.
● These datasets (found on Physionet.org) include different patients who have been monitored using a variety of sensors, and whose breathing periods were classified by experts.
● So the datasets provide the annotations we need.
Classification problem: Sensors
● We studied how different sensor combinations affect the result of the classification.
● We focused on the less cumbersome sensors, especially those related to respiration (nose, abdomen, chest, and SpO2).
● We did not apply feature extraction before classification; instead, we trained our classifiers on the raw data, for the different sensor combinations and the different datasets.
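The comparison over sensor combinations can be sketched as a loop over subsets of channels, training a classifier on the concatenated raw windows of each subset. This is a minimal illustration, not the actual experimental pipeline: the sensor names, window sizes, and synthetic data below are all assumptions.

```python
# Sketch: score one classifier per sensor combination on raw signal windows.
# The sensor names and the random data are illustrative stand-ins.
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
sensors = ["nasal", "abdominal", "chest", "spo2"]    # assumed channel names
n_windows, window_len = 200, 30                      # 30 raw samples per window

# Synthetic stand-in: one raw-signal window per sensor, plus expert labels.
raw = {s: rng.normal(size=(n_windows, window_len)) for s in sensors}
labels = rng.integers(0, 2, size=n_windows)          # 0 = normal, 1 = apnea

scores = {}
for r in range(1, len(sensors) + 1):
    for combo in combinations(sensors, r):
        # Concatenate the raw windows of the chosen sensors into one vector.
        X = np.hstack([raw[s] for s in combo])
        scores[combo] = cross_val_score(
            KNeighborsClassifier(), X, labels, cv=3).mean()

best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

On random data all 15 combinations score near chance; with real annotated recordings the same loop reveals which sensor subsets carry the most signal.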
Classification problem: Data mining
● The data mining techniques we used were:
– KNN algorithm
– Neural network
– Decision tree
– Support Vector Machine
Classification problem: Data mining: K Nearest Neighbor
● Define proximity between instances, find the neighbors of a new instance, and assign the majority class.
● Case-based reasoning: useful when attributes are more complicated than real-valued ones.
● Pros:
+ Fast training
● Cons:
– Slow during application
– No feature selection
– Notion of proximity is vague
Classification problem: Data mining: K Nearest Neighbor
Figure 3 [3]
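The "store the data, vote among neighbors" idea above can be shown in a few lines with scikit-learn; the toy 2-D points are made up for illustration only.

```python
# Minimal k-NN illustration: training just stores the data, and a new
# instance is assigned the majority class among its k nearest neighbors.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
y_train = [0, 0, 1, 1]           # 0 = normal breathing, 1 = apnea event

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)        # fast "training": no model is built

# The 3 nearest neighbors of (0.95, 0.9) are two class-1 points and one
# class-0 point, so the majority vote yields class 1.
print(knn.predict([[0.95, 0.9]]))   # → [1]
```

Note how the cost structure matches the pros/cons on the slide: fitting is trivial, but every prediction scans the stored training set.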
Classification problem: Data mining: Support Vector Machine
● We want to find the optimal separating hyperplane which maximizes the margin between the 2 classes.
● A hyperplane can be defined by the equation w·x + b = 0, where w, b are parameters.
● We assume that the xi are our input vectors and the yi are the corresponding labels. We examine the linearly separable case for 2 classes, so yi = +1 or yi = -1.
Classification problem: Data mining: Support Vector Machine
Figure 4 [4]
● We select 2 parallel hyperplanes, w·x + b = 1 and w·x + b = -1, as the hyperplanes for the 2 classes.
● The region between these hyperplanes is called the margin, and we want to maximize this region.
● It can be proven that the total distance between these hyperplanes is 2/||w||.
● So, to maximize the margin, we have to minimize ||w||.
Classification problem: Data mining: Support Vector Machine
● We have to maximize the margin by minimizing ||w||, while also satisfying the class separation constraints:
– w·xi + b >= 1 for every xi with yi = 1
– w·xi + b <= -1 for every xi with yi = -1
● So, our problem can be defined as: minimize ||w|| subject to yi(w·xi + b) >= 1 for each i in our dataset.
Classification problem: Data mining: Support Vector Machine
● This is a quadratic programming problem, which is equivalent to solving the following (the primal Lagrangian of our problem):

min Lp = (1/2)||w||^2 - Σ_{i=1}^{l} ai [ yi (xi·w + b) - 1 ]

with ai >= 0, where l is the total number of training points.
● Setting the partial derivatives with respect to our parameters w and b equal to zero (local minimum), we get:

w = Σ_{i=1}^{l} ai yi xi and Σ_{i=1}^{l} ai yi = 0. b can be easily derived.
Classification problem: Data mining: Support Vector Machine
● We can also express our problem in the dual Lagrangian form by substituting these expressions back into the Lagrangian. The two forms are equivalent. We get:

max LD = Σ_{i=1}^{l} ai - (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} ai aj yi yj (xi·xj)

subject to ai >= 0 and Σ_{i=1}^{l} ai yi = 0.
● Now we can find the ai from this maximization problem, again by setting the partial derivatives equal to zero.
Classification problem: Data mining: Support Vector Machine
● We express the problem in this form because it is independent of w and b: it can be solved by computing only the inner products xi·xj.
● This is very useful for non-linearly separable cases, where we want to map our data to higher dimensions, via kernels, in order to make them linearly separable.
● Also, most of the ai (Lagrange multipliers) will be zero. The ones that are not zero correspond to the support vectors.
Classification problem: Data mining: Support Vector Machine
● Pros:
+ Reaches the global optimum
+ Not many parameters
+ Good for small datasets
● Cons:
– Choice of kernel
– Relatively slow training
– Does not scale well as the data grows
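The derivation above can be checked numerically on toy 2-D data. This is a hedged sketch (the points are invented, and a very large C approximates the hard-margin case): the fitted support vectors are exactly the points whose Lagrange multipliers ai are nonzero, and the total margin equals 2/||w||.

```python
# Linear SVM on four toy points; support vectors = points with ai > 0.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6)    # huge C ≈ hard-margin SVM
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print(clf.support_vectors_)          # the closest points from each class
margin = 2 / np.linalg.norm(w)       # total distance between the hyperplanes
print(margin)
```

Here only (0, 1) and (2, 2) end up as support vectors; the other two points have ai = 0 and could be deleted without changing the solution, which is exactly the sparsity noted on the slide.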
Classification problem: Data mining: Neural Networks
● Input nodes are connected to output nodes by a set of hidden nodes and edges.
● Inputs describe DB instances.
● Outputs are the categories we want to recognize.
● Hidden nodes assign weights to each edge, so they represent the weight of relationships between the input and the output over a large set of training data.
(Figure: input nodes, hidden nodes, output nodes; example output categories: Car, House, Sports, Music, Comic)
Classification problem: Data mining: Neural Networks
● Initializing:
– Normalize the data.
– Initialize the weights (set them to zero, or to uniformly distributed random values in (-1, 1)).
● Training phase:
– Forward phase, where the input vector xi is inserted and we get the output.
– Backpropagation: based on the error function

E(n) = (1/2) Σ_{i=1}^{Classes} ( yi(n) - di(n) )^2

propagate the error to the previous layers and update the weights via gradient descent.
● Mining/testing phase, where we test our model on unknown data.
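The training loop above can be sketched end to end for a tiny one-hidden-layer network. Everything here is illustrative: the XOR data, the sigmoid activation, the layer sizes, and the learning rate are all assumptions, but the loop follows the slide (forward phase, then backpropagation of E with gradient-descent updates).

```python
# Forward + backpropagation for a tiny one-hidden-layer network.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([[0], [1], [1], [0]], dtype=float)      # desired outputs

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Initialize weights uniformly in (-1, 1), as on the slide.
W1 = rng.uniform(-1, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.uniform(-1, 1, (4, 1)); b2 = np.zeros(1)

lr, first_loss = 0.5, None
for _ in range(10_000):
    # Forward phase: insert the inputs, obtain the outputs.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    loss = 0.5 * np.sum((y - d) ** 2)                # E(n) from the slide
    if first_loss is None:
        first_loss = loss
    # Backpropagation: push the error back, update by gradient descent.
    delta2 = (y - d) * y * (1 - y)
    delta1 = (delta2 @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ delta2; b2 -= lr * delta2.sum(axis=0)
    W1 -= lr * X.T @ delta1; b1 -= lr * delta1.sum(axis=0)

print(first_loss, loss)   # the training error shrinks over the iterations
```

The `y * (1 - y)` factors are the sigmoid derivatives; with a different activation, only those lines change.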
Classification problem: Data mining: Neural Networks
(Figure: a basic NN unit computing o = f(Σ_{i=1}^{n} wi xi) from inputs x1..xn with weights w1..wn, next to a more typical NN with input, hidden, and output nodes.)
Classification problem: Data mining: Neural Networks
Decision boundaries (panels: linear regression, classification tree, neural network)
● Useful for learning complex data, such as handwriting, speech, and image recognition.
● To obtain curved decision boundaries, however, we must use a nonlinear activation function.
Classification problem: Data mining: Neural Networks
● Pros:
+ Can learn more complicated boundaries
+ Can handle a large number of features
+ Fast application
● Cons:
– Slow training time
– Hard to interpret
– Hard to implement: trial and error for choosing the number of nodes
Classification problem: Data mining: Decision Trees
● A tree where internal nodes are simple decision rules on one or more attributes, and leaf nodes are predicted class labels.
(Figure: example tree — Salary < 1M? then Prof = teacher? → Good / Bad; else Age < 30? → Bad / Good)
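A tree of this shape can be learned directly from examples. The data below is invented to mirror the slide's style of rules (a salary threshold, then an age threshold); `export_text` prints the learned internal rules and leaf labels.

```python
# Toy decision tree: depth-2 rules on [salary, age], made-up labels.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[400_000, 25], [400_000, 45], [1_500_000, 25], [1_500_000, 45]]
y = ["Good", "Bad", "Bad", "Good"]   # illustrative labels only

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["salary", "age"]))
```

No single threshold separates these labels, so the learner must stack two rules, one per level, just like the nested Salary/Age tests in the slide's figure.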