
Phenotyping and Robust Feature Selection for Flow Cytometry Data



  1. Phenotyping and Robust Feature Selection for Flow Cytometry Data. Nima Aghaeepour, CIHR/MSFHR Strategic Training Program in Bioinformatics for Health Research, University of British Columbia. Sep 22, 2011.

  2. Introduction. Problem statement: find cell populations that correlate with an external variable (e.g., a clinical outcome). Approach: flowType for phenotyping; FeaLect for feature selection for sample classification.

  3. Dataset. United States Military HIV Natural History Study: PBMCs of 466 HIV+ personnel and beneficiaries from the Army, Navy, Marines, and Air Force. 13 surface markers and KI-67 (cell proliferation). Clinical data: survival times, including 135 events (an event is defined as progression to AIDS or initiation of HAART).

  4. Manual Gates

  5. Manual Gating Results. Frequency of long-lived memory cells (CD127+) has a positive correlation. Frequency of cells with high proliferation (KI-67+) has a negative correlation.

  6. Results. Previous results: (1) frequency of long-lived memory cells (CD127+) has a positive correlation; (2) frequency of cells with high proliferation (KI-67+) has a negative correlation. New results: (1) frequency of "short-lived" cells with high proliferation (CD127-KI-67+) has a negative correlation; (2) frequency of terminal effector T-cells has a negative correlation; (3) frequency of transitional memory T-cells has a negative correlation. [Figure: three Kaplan-Meier plots of event-free proportion vs. years from cell sample (0 to 15), each comparing a Lowest and a Highest group: Lowest (371/86%) vs. Highest (59/14%), p < 8.6e-13; Lowest (387/90%) vs. Highest (43/10%), p < 1.8e-06; Lowest (356/83%) vs. Highest (74/17%), p < 4.6e-10.]
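The Kaplan-Meier curves above compare event-free survival between a low-frequency and a high-frequency group, and the Phenotyping slide lists a log-rank test among its steps. A minimal Python sketch of both computations (the product-limit estimator and a two-group log-rank test), assuming each patient is a (time in years, event flag) pair with 1 for progression to AIDS or HAART initiation and 0 for censoring. The function names and the toy data are hypothetical; this is not the study's analysis code, and the deck does not say how the Lowest/Highest split was chosen.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times:  follow-up time per patient (e.g. years from cell sample)
    events: 1 if the event occurred, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    curve, surv = [], 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)         # events, both groups
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy (made-up) groups: (times, events)
low = ([2, 5, 7, 9, 12, 15], [0, 0, 1, 0, 0, 0])
high = ([1, 2, 3, 4, 6, 8], [1, 1, 1, 0, 1, 1])
print(kaplan_meier(*high))
print(logrank(*low, *high))
```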

  7. Clustering. flowMeans was used on one dimension at a time. A marker can only be positive or negative; other (e.g., dim) populations will be resolved in other dimensions. A marker can also be neutral, i.e., not used to define the phenotype. [Figure: example on CD28 and CD45RO showing positive, negative, and neutral partitions.]
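flowMeans itself (an R/Bioconductor package) is not reproduced here. As a swapped-in technique, the sketch below splits one marker's intensities into a negative and a positive population with a plain 1-D two-means (Lloyd) iteration, then labels each cell + or -. The function names and the toy intensities are hypothetical; a neutral marker needs no code at all, because it is simply never checked when a phenotype is matched.

```python
def split_positive_negative(values, max_iter=100):
    """1-D two-means: return a threshold between the negative and positive modes."""
    lo, hi = min(values), max(values)  # start the two centres at the extremes
    if lo == hi:
        return lo  # no separation possible
    for _ in range(max_iter):
        threshold = (lo + hi) / 2
        neg = [v for v in values if v <= threshold]
        pos = [v for v in values if v > threshold]
        new_lo, new_hi = sum(neg) / len(neg), sum(pos) / len(pos)
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments stopped changing
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def label_cells(values, threshold):
    """'+' above the threshold, '-' otherwise."""
    return ["+" if v > threshold else "-" for v in values]


cd28 = [0.9, 1.0, 1.2, 4.8, 5.0, 5.3]  # toy CD28 intensities (made up)
t = split_positive_negative(cd28)
print(round(t, 3), label_cells(cd28, t))
```

Starting the centres at the minimum and maximum guarantees both clusters stay non-empty, so the means are always defined.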

  8. Phenotyping
     1. 3^10 ≈ 60,000 phenotypes
     2. Cox proportional hazards regression
     3. Log-rank test
     4. Multiple testing
     5. Sensitivity analysis
     6. 101 phenotypes remain statistically significant

     #    Phenotype               p-value   p-value CI            Adj. p-value  CPHR coef.  Cell freq.
     1    KI-67+CD8+CD27-         6.4e-07   (1.1e-12, 3.6e-03)    3e-04         35.2        0.00560
     2    KI-67+CD8+CD57-         1.1e-06   (2.7e-13, 3.5e-03)    2e-06         28.3        0.00648
     3    KI-67+CD45RO+           8.9e-07   (2.1e-14, 2.0e-03)    4e-05         15.4        0.01343
     4    KI-67+CD28-CD8-         8.3e-08   (6.9e-14, 1.6e-03)    2e-04         44.2        0.00523
     5    KI-67+CD28-CD27-        7.1e-08   (1.5e-13, 3.0e-03)    2e-05         26.3        0.00874
     6    KI-67+CD28-             1.9e-07   (3.9e-13, 3.3e-03)    2e-05         18.3        0.01053
     7    KI-67+CD28-CD27-CCR7-   3.3e-09   (6.6e-14, 8.6e-04)    4e-04         43.0        0.00647
     8    KI-67+CD28-CCR7-        3.3e-09   (3.2e-13, 7.6e-04)    3e-03         37.7        0.00739
     9    KI-67+CD57-CD27-CCR7-   1.2e-08   (1.3e-13, 3.4e-03)    1e-03         36.8        0.00762
     10   KI-67+CD57-CCR7-        2.7e-08   (5.3e-15, 1.2e-02)    2e-05         26.6        0.01008
     ...
     101  KI-67+CD8+CD27-         6.4e-07   (2.3e-14, 1.1e-02)    2e-02         35.2        0.00560
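The count 3^10 ≈ 60,000 follows from each of the 10 markers being positive, negative, or neutral (3^10 = 59,049, which includes the all-neutral phenotype covering every cell). A minimal sketch of the enumeration and of one sample's per-phenotype cell frequency, assuming cells are already labelled + or - per marker. The marker order is taken from the later slides; `enumerate_phenotypes`, `phenotype_name`, and `phenotype_frequency` are hypothetical helpers, not the flowType API, and the subsequent Cox regression and log-rank testing are not shown.

```python
from itertools import product

# Marker order as listed on the later slides (assumed to be the naming order).
MARKERS = ["KI-67", "CD28", "CD45RO", "CD8", "CD4", "CD57", "CCR5", "CD27", "CCR7", "CD127"]


def enumerate_phenotypes(markers):
    """Yield each phenotype as {marker: '+' or '-'}; neutral markers are left out."""
    for states in product("+-0", repeat=len(markers)):  # '0' stands for neutral
        yield {m: s for m, s in zip(markers, states) if s != "0"}


def phenotype_name(phenotype, markers=MARKERS):
    """Concatenate non-neutral markers and signs, e.g. 'KI-67+CD8+CD27-'."""
    return "".join(m + phenotype[m] for m in markers if m in phenotype)


def phenotype_frequency(cells, phenotype):
    """Fraction of cells (dicts of marker -> '+'/'-') matching every non-neutral marker."""
    hits = sum(all(cell[m] == s for m, s in phenotype.items()) for cell in cells)
    return hits / len(cells)


print(sum(1 for _ in enumerate_phenotypes(MARKERS)))            # 59049 = 3**10
print(phenotype_name({"KI-67": "+", "CD8": "+", "CD27": "-"}))  # KI-67+CD8+CD27-
```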

  9. Clustering the Phenotypes. Pearson correlation. [Figure: phenotype-by-phenotype Pearson correlation heatmap, with a color key and density plot of correlation values from 0 to 1.]

  10. Clustering the Phenotypes. Pearson correlation and phenotype names. [Figure: the phenotype correlation heatmap (color key and density plot, values 0 to 1) beside a phenotype-by-marker matrix showing whether each marker (KI-67, CD28, CD45RO, CD8, CD4, CD57, CCR5, CD27, CCR7, CD127) is positive, neutral, or negative in each phenotype.]
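The heatmaps group phenotypes that behave similarly across samples. A hedged sketch, assuming each phenotype is represented by its vector of cell frequencies across patients: compute pairwise Pearson correlation, then join phenotypes into groups by single linkage at a correlation cut-off. The slides do not state the clustering method or any cut-off, so both (and the helper names `pearson` and `group_phenotypes`) are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_phenotypes(freqs, cutoff=0.8):
    """Single-linkage groups: join two phenotypes whenever r >= cutoff (union-find).

    freqs: {phenotype name: list of cell frequencies, one per patient}
    """
    names = list(freqs)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(freqs[a], freqs[b]) >= cutoff:
                parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 1, 3, 2]}  # made-up frequencies
print(group_phenotypes(toy))  # [['A', 'B'], ['C']]
```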

  11. Marker Impacts. Impact value: force each marker to be neutral (remove it) and measure its contribution to the results. Interpretation: (1) Does removing the marker make the phenotype "less significant"? (2) We have to use an effect size (like root mean square error). [Figure: marker impact, colored positive, mixed, or negative, for each marker (KI-67, CD28, CD45RO, CD8, CD4, CD57, CCR5, CD27, CCR7, CD127) in each of the three phenotype groups.]
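One possible reading of "force each marker to be neutral and measure its contribution", sketched under assumptions: every phenotype has a significance score (here -log10 p-value), making marker m neutral maps a phenotype to the same phenotype without m, and the impact of m is the root-mean-square change in score over the group's phenotypes that use m. The positive/negative/mixed label records whether m appears only as +, only as -, or both. The deck names RMSE only as an example effect size; `marker_impact`, `key`, and the score table are hypothetical.

```python
import math


def marker_impact(scores, group, marker):
    """RMS change in score when `marker` is forced to neutral, plus its sign pattern.

    scores: {frozenset of (marker, sign) pairs: score}, e.g. score = -log10(p-value)
    group:  phenotypes (keys of `scores`) that belong to one phenotype group
    """
    diffs, signs = [], set()
    for pheno in group:
        used = [s for m, s in pheno if m == marker]
        if not used:
            continue  # marker is already neutral in this phenotype
        parent = frozenset((m, s) for m, s in pheno if m != marker)  # marker removed
        diffs.append(scores[pheno] - scores[parent])
        signs.add(used[0])
    if not diffs:
        return 0.0, "Neutral"
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    label = {"+": "Positive", "-": "Negative"}.get("".join(signs), "Mixed")
    return rms, label


def key(*pairs):
    """Build a phenotype key from (marker, sign) pairs."""
    return frozenset(pairs)


# Made-up scores, only to exercise the function.
scores = {key(("KI-67", "+"), ("CD4", "-")): 8.0, key(("KI-67", "+")): 5.0, key(("CD4", "-")): 1.0}
group = [key(("KI-67", "+"), ("CD4", "-"))]
print(marker_impact(scores, group, "CD4"))    # (3.0, 'Negative')
print(marker_impact(scores, group, "KI-67"))  # (7.0, 'Positive')
```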

  12. Marker Selection. Now we can select the markers with a significant impact [Figure: marker impact plots for the three phenotype groups, as on the previous slide] and extract 3 phenotypes:

     #   Phenotype                                    p-value   p-value CI            Adj. p-value  Cell freq.
     1   KI-67+CD4-CCR5+CD127-                        1.7e-10   (0.0e+00, 1.0e-05)    1.7e-08       0.00704
     2   CD45RO-CD8+CD4-CD57+CCR5-CD27+CCR7-CD127-    1.2e-07   (0.0e+00, 7.7e-05)    1.3e-05       0.00068
     3   CD28-CD45RO+CD4-CD57-CD27-CD127-             6.5e-08   (2.2e-16, 1.9e-05)    6.5e-06       0.02456
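Given per-marker impacts for one group, keeping the markers whose impact is large and giving each its dominant sign yields one phenotype per group. A minimal sketch, assuming a fixed impact threshold (the slides do not say how a "significant" impact was judged) and skipping markers whose sign is mixed; `select_phenotype` and the impact values are hypothetical, made up only so the output resembles phenotype 1 in the table above.

```python
MARKER_ORDER = ["KI-67", "CD28", "CD45RO", "CD8", "CD4", "CD57", "CCR5", "CD27", "CCR7", "CD127"]


def select_phenotype(impacts, threshold):
    """impacts: {marker: (impact, 'Positive'|'Negative'|'Mixed')} -> phenotype name.

    Markers below the threshold, or with a mixed sign, are left neutral (assumption).
    """
    parts = []
    for m in MARKER_ORDER:
        impact, label = impacts.get(m, (0.0, "Neutral"))
        if impact >= threshold and label in ("Positive", "Negative"):
            parts.append(m + ("+" if label == "Positive" else "-"))
    return "".join(parts)


# Made-up impacts for illustration only.
impacts = {"KI-67": (0.020, "Positive"), "CD4": (0.012, "Negative"),
           "CCR5": (0.011, "Positive"), "CD127": (0.015, "Negative"),
           "CD8": (0.002, "Mixed")}
print(select_phenotype(impacts, 0.01))  # KI-67+CD4-CCR5+CD127-
```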

  13. Marker Elimination. Some of the markers are redundant. Finding small cell populations is hard. In clinics/the developing world we can have a large number of measurements. Do we need all of the markers? [Figure: log10(p-value) as markers are eliminated one at a time from KI-67+CD4-CCR5+CD127- to KI-67+CD4-CD127-, KI-67+CD127-, and KI-67+.]
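The elimination plot on this slide appears to step the first phenotype down from KI-67+CD4-CCR5+CD127- through KI-67+CD4-CD127- and KI-67+CD127- to KI-67+. A hedged sketch of a greedy backward elimination that, at each step, makes neutral the marker whose removal costs the least significance. The scoring function is supplied by the caller; `eliminate_markers` and the additive toy weights are hypothetical, and the weights were chosen so that the toy path matches the slide's labels, not taken from the data.

```python
def eliminate_markers(phenotype, score):
    """Greedy backward elimination.

    phenotype: tuple of (marker, sign) pairs in display order
    score:     callable mapping such a tuple to a significance, e.g. -log10(p)
    Returns [(phenotype, score)] from the full phenotype down to one marker.
    """
    path = [(phenotype, score(phenotype))]
    current = phenotype
    while len(current) > 1:
        # Every child phenotype obtained by making one more marker neutral.
        children = [tuple(p for p in current if p != drop) for drop in current]
        current = max(children, key=score)  # lose as little significance as possible
        path.append((current, score(current)))
    return path


def name(phenotype):
    return "".join(m + s for m, s in phenotype)


# Made-up additive scores, chosen only to exercise the function.
weights = {("KI-67", "+"): 4.0, ("CD4", "-"): 2.0, ("CCR5", "+"): 1.0, ("CD127", "-"): 3.0}
start = (("KI-67", "+"), ("CD4", "-"), ("CCR5", "+"), ("CD127", "-"))
for pheno, s in eliminate_markers(start, lambda ph: sum(weights[p] for p in ph)):
    print(name(pheno), s)
```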

  14. Are We "Over-fitting"? Resampling-based sensitivity analysis. [Figure: bootstrap percentage for the candidate phenotypes in each of Groups 1, 2, and 3.]
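A resampling-based sensitivity analysis can be sketched as: draw patients with replacement, rerun the selection on the resample, and report how often each phenotype is selected. This assumes "bootstrap percentage" means the percentage of resamples in which a phenotype is selected, which the deck does not spell out; `bootstrap_percentages` and the selector `toy_select` are hypothetical.

```python
import random
from collections import Counter


def bootstrap_percentages(patients, select, n_boot=1000, seed=0):
    """Percentage of bootstrap resamples in which each phenotype is selected.

    patients: list of per-patient records (any type)
    select:   callable mapping a list of patients to selected phenotype names
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter()
    for _ in range(n_boot):
        resample = rng.choices(patients, k=len(patients))  # draw with replacement
        counts.update(set(select(resample)))  # count each phenotype once per resample
    return {name: 100.0 * c / n_boot for name, c in counts.items()}


# Toy selector: "A" is always chosen, "B" only when patient 0 is in the resample.
def toy_select(sample):
    return ["A"] + (["B"] if 0 in sample else [])


print(bootstrap_percentages(list(range(10)), toy_select, n_boot=200))
```

With 10 patients, patient 0 is missing from a resample with probability 0.9^10 ≈ 0.35, so "B" should land near 65%, while "A" is always at 100%.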

  15. [Figure: overview of the approach. (A) Population identification: each marker is partitioned into positive, negative, and neutral (example: CD45RO and CD28). (B) Statistical modeling: Cox proportional hazards regression, multiple testing correction, sensitivity analysis. (C) Grouping: phenotype correlation heatmap with color key and density plot; phenotype groups 1, 2, 3. (D) Marker selection: marker impact (positive, mixed, negative) of KI-67, CD28, CD45RO, CD8, CD4, CD57, CCR5, CD27, CCR7, CD127 for each group. (E) Marker elimination: -log10(p-value) and % cell frequency as markers are removed from each selected phenotype. (F) Kaplan-Meier curves: event-free proportion vs. years from cell sample; Lowest (371/86%) vs. Highest (59/14%), p < 8.6e-13; Lowest (387/90%) vs. Highest (43/10%), p < 1.8e-06; Lowest (356/83%) vs. Highest (74/17%), p < 4.6e-10.]

  16. A Cell Hierarchy Based on a Clinical Outcome. [Figure: hierarchy for CD45RO-CD8+CD4-CD57+CCR5-CD27+CCR7-CD127-.] A hierarchy based on predictive power, which explains the relationship between the markers. Thickness of arrows: increase in predictive power. Yellow: high predictive power.
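A hierarchy like this can be sketched as a graph whose nodes are the sub-phenotypes of the final phenotype and whose edges add exactly one marker, each weighted by the gain in predictive power (the quantity the arrow thickness encodes). The sketch assumes a caller-supplied score such as -log10(p-value); `build_hierarchy` is a hypothetical helper, and the placeholder score below is not real data.

```python
from itertools import combinations


def build_hierarchy(phenotype, score):
    """Edges (parent, child, gain) between all sub-phenotypes of `phenotype`.

    phenotype: tuple of (marker, sign) pairs; score: callable on such tuples.
    A child has exactly one more marker than its parent;
    gain = score(child) - score(parent).
    """
    edges = []
    for k in range(len(phenotype)):
        for parent in combinations(phenotype, k):  # keeps the display order
            for extra in phenotype:
                if extra in parent:
                    continue
                child = tuple(p for p in phenotype if p in parent or p == extra)
                edges.append((parent, child, score(child) - score(parent)))
    return edges


target = (("CD45RO", "-"), ("CD8", "+"), ("CD4", "-"), ("CD57", "+"),
          ("CCR5", "-"), ("CD27", "+"), ("CCR7", "-"), ("CD127", "-"))
edges = build_hierarchy(target, lambda ph: float(len(ph)))  # placeholder score
print(len(edges))  # 1024 edges for 8 markers (n * 2**(n - 1))
```

The full graph grows as n * 2^(n-1) edges; keeping only the largest-gain incoming edge of each node is one way to thin it for display, though the slide does not say how its hierarchy was pruned.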

  17. Phenotype Hierarchies. [Figure: hierarchy for CD28-CD45RO+CD4-CD57-CD27-CD127-.]
