Classifying HIV Vaccination Status with Regularized Logistic Regression
Ariful Azad, Arif Khan, Bartek Rajwa, Alex Pothen
Purdue University
FlowCAP-III, NIH, November 29-30, 2012
This research was supported by grant 1R21EB015707 from the National Institute of Biomedical Imaging and Bioengineering and NSF grant CCF-1218916.

Overview
Problem: Predict the vaccination status (pre- or post-vaccination) of samples from HIV patients. Half of the samples, with known vaccination status, are given as a training set.
Method: We used the fraction of cells in different combinations of Boolean gates, together with the median fluorescence intensity (MFI), as features (explanatory variables). We then trained a logistic regression model with Lasso regularization (RLR) on the training set and obtained a sparse model with four predictive features.
Results: The optimized RLR model performs well on the training set, with four misclassifications out of 37. On the test set, the model classifies 29 out of 37 samples with high confidence.

Problem Description: Dataset
An HIV vaccine was administered to 74 subjects, each sampled at two time points (before and after vaccination); 37 subjects are in the training set and 37 in the test set.
At each time point we have a POL-3 stimulated sample and two negative controls.
Each sample has six markers. CD3, CD4, and CD8 identify T cell subpopulations. The remaining markers are the cytokines TNFa, IFNg, and IL2.
[Diagram: for each subject (e.g., Subject 1), the "Before Vaccination" and "After Vaccination" time points each comprise a POL-3 stimulated sample and two negative controls.]

Preprocessing: Automated CD4+ and CD8+ T cell gating
We used norm1filter and norm2filter from flowCore to perform the automated gating.
[Figure: six gating panels. Top row: FSC.A against FSC.H (remove doublets), FSC.A against ViViD (remove dead cells), FSC.A against SSC.A. Bottom row: CD3 against CD8 (T cells), CD4 against CD8 (CD4+ T cells), CD4 against CD8 (CD8+ T cells).]

Preprocessing: Automated cytokine gating
We applied patient-specific normalization to all six samples from a particular subject and used norm2filter to identify TNFa+, IFNg+, and IL2+ cells.
Cytokine-positive cells are extremely rare among CD8+ cells, so we mainly used the CD8+ cells when the CD4+ cells were unable to classify a pair of samples.
[Figure: three panels for CD4+ T cells, showing TNFa, IFNg, and IL2 against SSC.A.]

Feature Selection
For each sample, we computed a Boolean (positive/negative) gate for each of the three cytokines.
The Boolean gates can then be combined in $3^3 = 27$ ways by considering positive, neutral, and negative levels of expression. However, we kept only those combinations with at least one positive cytokine.
We consider the fraction of cells within a Boolean gate combination as a feature.
In addition, we included the median fluorescence intensity (MFI) of the three cytokines as features in our model.
Hence, we have about 21 features.
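The gate combinations described above can be enumerated directly. The following is a minimal sketch, assuming each cytokine takes one of three levels, written here as "+", "-", and "" (neutral, meaning the cytokine is left out of the gate). The labels and the helper name `gate_combinations` are illustrative and not taken from the original analysis.

```python
from itertools import product

CYTOKINES = ("TNFa", "IFNg", "IL2")
LEVELS = ("+", "-", "")  # positive, negative, neutral (cytokine not considered)

def gate_combinations():
    """Return labels of all gate combinations with at least one positive cytokine."""
    combos = []
    for levels in product(LEVELS, repeat=len(CYTOKINES)):
        if "+" not in levels:
            continue  # drop combinations without any positive cytokine
        # Neutral cytokines are omitted from the label.
        label = " ".join(name + lvl for name, lvl in zip(CYTOKINES, levels) if lvl)
        combos.append(label)
    return combos

combos = gate_combinations()
print(len(combos))  # number of gate-fraction features under this reading
```

Under this reading, 8 of the 27 combinations contain no positive cytokine, which leaves 19 gate fractions. Adding the three MFI features gives 22, close to the "about 21" quoted on the slide, so the exact feature set used by the authors may have differed slightly.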

Feature Selection: Model selection
The dependent variable is the vaccination status of a sample (vaccinated or not-vaccinated), so this is a binary classification problem.
We used logistic regression for this classification.

Logistic Regression Model: Logistic Regression
Widely used for binary classification, e.g., vaccinated vs. not-vaccinated.
Explanatory variables $x_i$, such as the fraction of cells in a combination of Boolean gates, e.g., TNFa+ IFNg− IL2+.
Dependent variable $y_i$: $y_i = 1$ for vaccinated and $y_i = 0$ for not-vaccinated.
Probability of the $i$th sample being vaccinated: $p_i$.
Log odds for the event $y_i = 1$: $\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i}$.

Logistic Regression Model: Logistic Regression
$$\operatorname{logit}(p_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_d x_{id} = \beta_0 + x_i^T \beta$$
$$p_i = \frac{1}{1 + e^{-(\beta_0 + x_i^T \beta)}} \quad \text{(logistic function)}$$
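To make the two formulas concrete, here is a small sketch that evaluates the linear predictor and maps it through the logistic function. The coefficient and feature values are made up for illustration, and the function names are not from the original work.

```python
import math

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def predict_probability(beta0, beta, x):
    """Logistic function applied to the linear predictor beta0 + x^T beta."""
    eta = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and features: eta = -1.0 + 2.0*0.8 - 0.5*0.4 = 0.4
p = predict_probability(beta0=-1.0, beta=[2.0, -0.5], x=[0.8, 0.4])
print(round(p, 3), round(logit(p), 3))
```

Applying `logit` to the predicted probability recovers the linear predictor, which shows that the two expressions are inverses of each other.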

Logistic Regression Model: Maximum Likelihood Solution
The dependent variable follows a binomial distribution, $y_i \sim \mathrm{Bin}(1, p_i)$.
Maximize the log likelihood:
$$\max_{(\beta_0, \beta) \in \mathbb{R}^{d+1}} \sum_{i=1}^{n} \left\{ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right\}$$
which is equivalent to
$$\max_{(\beta_0, \beta) \in \mathbb{R}^{d+1}} \sum_{i=1}^{n} \left\{ y_i (\beta_0 + x_i^T \beta) - \log\left(1 + e^{\beta_0 + x_i^T \beta}\right) \right\}$$
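The equivalence of the two forms follows from substituting the logistic function for $p_i$. A short derivation for a single term, writing $\eta_i = \beta_0 + x_i^T \beta$ as shorthand (a symbol introduced here, not on the slide):

```latex
\begin{align*}
y_i \log p_i + (1 - y_i) \log(1 - p_i)
  &= y_i \log\frac{p_i}{1 - p_i} + \log(1 - p_i) \\
  &= y_i \eta_i + \log\frac{1}{1 + e^{\eta_i}} \\
  &= y_i \eta_i - \log\left(1 + e^{\eta_i}\right).
\end{align*}
```

The second line uses $\operatorname{logit}(p_i) = \eta_i$ and $1 - p_i = 1 / (1 + e^{\eta_i})$, both of which follow from the logistic function.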

Logistic Regression Model: Lasso Regularization
Pick the predictive features by penalizing models with too many parameters [Friedman et al. 2009].
Maximize the penalized log likelihood:
$$\max_{(\beta_0, \beta) \in \mathbb{R}^{d+1}} \left[ \sum_{i=1}^{n} \left\{ y_i (\beta_0 + x_i^T \beta) - \log\left(1 + e^{\beta_0 + x_i^T \beta}\right) \right\} - \lambda \|\beta\|_1 \right]$$
Select a sparse solution with few nonzero values for $\beta_i$.
We used the R package glmnet by Jerome Friedman, Trevor Hastie, and Rob Tibshirani.
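To illustrate how the L1 penalty produces exact zeros, the sketch below fits a Lasso-penalized logistic regression by proximal gradient ascent with soft thresholding. This is a stand-in chosen for brevity and is not the coordinate-descent algorithm that glmnet implements. It also averages the log likelihood over samples, so `lam` plays the role of $\lambda / n$ in the objective above. The toy data, step size, and function names are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Maximize mean log likelihood - lam * ||beta||_1 by proximal gradient ascent.

    The intercept beta0 is not penalized.
    """
    n, d = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * d
    for _ in range(n_iter):
        # Residuals y_i - p_i give the gradient of the log likelihood.
        resid = [yi - sigmoid(beta0 + sum(b * xj for b, xj in zip(beta, xi)))
                 for xi, yi in zip(X, y)]
        beta0 += step * sum(resid) / n
        for j in range(d):
            grad_j = sum(r * xi[j] for r, xi in zip(resid, X)) / n
            # Gradient step followed by shrinkage toward zero.
            beta[j] = soft_threshold(beta[j] + step * grad_j, step * lam)
    return beta0, beta

# Toy data (made up): feature 0 tracks the label, feature 1 does not.
X = [[0.1, 0.5], [0.2, 0.4], [0.3, 0.6], [0.4, 0.5],
     [0.6, 0.5], [0.7, 0.4], [0.8, 0.6], [0.9, 0.5]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

for lam in (0.02, 1.0):
    beta0, beta = fit_lasso_logistic(X, y, lam)
    print(lam, round(beta0, 3), [round(b, 3) for b in beta])
```

With a small penalty the informative feature keeps a positive coefficient. With a large penalty every coefficient is shrunk to exactly zero and only the intercept remains, which is the behavior that drives feature selection.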

Results: Model Parameter Selection
The model parameters to be selected are $\beta_0, \beta_1, \dots, \beta_d$ and $\lambda$.
For fixed $\lambda$, $\beta_0, \beta_1, \dots, \beta_d$ are estimated by maximizing the log likelihood.
$\lambda$ is selected by n-fold cross-validation (minimizing $\sum o_i \log(o_i / e_i)$).
[Figure: cross-validated binomial deviance (y-axis, roughly 0.9 to 1.4) against log(Lambda) (x-axis, −10 to −2). The top axis gives the number of features selected, falling from about 15 to 1 as lambda increases.]
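The selection criterion can be sketched as follows. For 0/1 labels, the deviance $2 \sum o_i \log(o_i / e_i)$, summed over both outcome categories, reduces to $-2$ times the log likelihood, and the constant factor does not change which $\lambda$ is chosen. The cross-validated deviance values in the example are hypothetical and only show the mechanics. The function names are illustrative.

```python
import math

def binomial_deviance(y_true, p_pred, eps=1e-12):
    """Mean binomial deviance: -2 * average log likelihood of the labels."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total / len(y_true)

def select_lambda(cv_deviance):
    """Return the lambda whose cross-validated deviance is smallest."""
    return min(cv_deviance, key=cv_deviance.get)

# Hypothetical cross-validated deviances, keyed by lambda.
cv_deviance = {0.001: 1.21, 0.01: 0.95, 0.1: 1.30}
best_lambda = select_lambda(cv_deviance)
print(best_lambda)

# A model that predicts 0.5 for every sample has deviance 2*log(2), about 1.386.
print(round(binomial_deviance([1, 0], [0.5, 0.5]), 3))
```

In practice each entry of `cv_deviance` would come from fitting the model on the training folds and evaluating `binomial_deviance` on the held-out fold, averaged over folds.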

Results: Significance of the selected features
A sparse solution that uses only four features.

Feature              Coefficient in the model
MFI TNFa+             2.293
TNFa+ IFNg+ IL2+      1.421
TNFa+ IFNg− IL2−      0.397
TNFa− IFNg− IL2+     −0.844

Table: Optimal solution of the regularized (Lasso) logistic regression

Results: Model verification by incremental feature selection
Build logistic regression models by adding features one at a time, so that each model extends the simpler one before it.
Misclassification decreases as features are included.

Incremental model features   p-value    AIC     Training misclassifications
MFI TNFa+                    2.46e-07   79.95   8
TNFa+ IFNg+ IL2+             2.20e-08   73.33   6
TNFa+ IFNg− IL2−             3.15e-08   72.81   5
TNFa− IFNg− IL2+             4.69e-09   67.93   4

Results: Predicting vaccination status
The RLR model predicts the probability of a sample being vaccinated: low probability for non-vaccinated samples and high probability for vaccinated samples.
From a pair of samples (before and after vaccination) from a patient, the sample with the higher probability is predicted as vaccinated.
Example: Let $p(s_1)$ and $p(s_2)$ be the probabilities predicted by a trained RLR model for a pair of samples, $s_1$ and $s_2$, from a patient. If $p(s_1) > p(s_2)$, the model predicts $s_1$ as vaccinated, and vice versa.
$|p(s_1) - p(s_2)|$ indicates the confidence in the prediction.
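The pairwise rule can be written in a few lines. This is a minimal sketch. The function name and the tie-breaking behavior (ties go to the second sample) are assumptions, and the slides do not state the cutoff used to separate high-confidence from low-confidence pairs, so none is applied here.

```python
def predict_pair(p_s1, p_s2):
    """Label the sample with the higher predicted probability as vaccinated.

    Returns (label_s1, label_s2, confidence), where confidence = |p_s1 - p_s2|.
    If the probabilities are equal, s2 is labeled vaccinated (arbitrary choice).
    """
    confidence = abs(p_s1 - p_s2)
    if p_s1 > p_s2:
        return "vaccinated", "not-vaccinated", confidence
    return "not-vaccinated", "vaccinated", confidence

# Hypothetical model outputs for one patient's pair of samples.
label_s1, label_s2, confidence = predict_pair(0.8, 0.3)
print(label_s1, label_s2, round(confidence, 2))
```

Because exactly one sample in each pair is labeled vaccinated, the rule only needs the ordering of the two probabilities. Their difference serves purely as a confidence score.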

Results: Prediction in the training set
Four misclassifications in the training set. Misclassified samples are marked with green circles.

Results: Prediction in the test set
Eight pairs of samples are predicted with low confidence (green circles). Thus 29 of the 37 pairs (about 78%) are classified with high confidence.

Summary
We used a logistic regression model with Lasso regularization (RLR) to classify samples into HIV vaccinated and not-vaccinated classes.
The RLR model automatically selected the features predictive of vaccination status.
Results: The optimized RLR model performs well on the training set, with four misclassifications out of 37. On the test set, the model classifies 29 out of 37 samples with high confidence.

Thank you!
