flowBin: A Complete Pipeline for Feature Extraction and Classification of Multi-tube Flow Cytometry Data

  1. flowBin: A Complete Pipeline for Feature Extraction and Classification of Multi-tube Flow Cytometry Data. Kieran O’Neill, Terry Fox Laboratory (TFL), BC Cancer Agency. September 22, 2011.

  2. Background

  3. Background: Multi-tube/Multi-well Flow Cytometry
     ◮ Why? To measure more colours (markers) than fit in a single tube
     ◮ Parameters common to all tubes are used to identify populations
     ◮ The other parameters give further information, often compared against a negative control
     ◮ Two common use cases:
        1. Determine the immunophenotype of an identified population
        2. Determine the immunological response to a stimulus

  4. Background: Multiplexed Flow Cytometry
     [Figure: a bone marrow aspirate is split into aliquots. Every aliquot is measured on SSC and CD45, plus either isotype controls, cell-surface markers or intracellular markers, yielding the flow cytometry data.]

  5. Background: Typical Manual Expert’s Approach
     Tube 1: gate blasts on CD45/SSC, then set autofluorescence thresholds.
     Tube 2 (and subsequent tubes): gate blasts on CD45/SSC, then look at expression relative to autofluorescence.
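The manual procedure above can be sketched in a few lines of Python. This is a minimal illustration, not part of flowBin, and several details are assumptions: events are dicts of channel values, the blast gate is a hypothetical rectangle on CD45/SSC (the bounds `CD45_RANGE` and `SSC_RANGE` are invented), and the autofluorescence threshold is taken to be the 98th percentile of the gated control events on one channel.

```python
import math

# Hypothetical blast-gate bounds on the CD45/SSC plot; in practice the
# expert draws this gate by eye for each sample.
CD45_RANGE = (1.0, 2.0)
SSC_RANGE = (0.0, 300.0)


def gate_blasts(events):
    """Keep events that fall inside the rectangular CD45/SSC blast gate."""
    return [
        e for e in events
        if CD45_RANGE[0] <= e["CD45"] <= CD45_RANGE[1]
        and SSC_RANGE[0] <= e["SSC"] <= SSC_RANGE[1]
    ]


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def autofluorescence_threshold(control_events, channel, q=98):
    """Tube 1: threshold on one channel from the gated blasts' autofluorescence."""
    return percentile([e[channel] for e in gate_blasts(control_events)], q)


def percent_positive(stained_events, channel, threshold):
    """Tube 2 onwards: percentage of gated blasts expressing above the threshold."""
    blasts = gate_blasts(stained_events)
    if not blasts:
        raise ValueError("no events inside the blast gate")
    positive = sum(1 for e in blasts if e[channel] > threshold)
    return 100.0 * positive / len(blasts)
```

The channel name passed in (for example a hypothetical `"FL2"`) is whatever key the loaded events use. The percentile rule is only a stand-in for the expert's visual judgement of where autofluorescence ends.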

  6. Feature Extraction

  7. Feature Extraction: FlowBin Approach
     1. Bin a single tube in terms of the population-ID parameters (K-means with k = 100, inspired by flowMeans)
     2. Map the bins across tubes using 1-NN (after Pedreira et al.)
     3. Extract the immunophenotype of each bin in terms of the non-ID parameters
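The binning in step 1 can be sketched as plain Lloyd's K-means over the population-ID channels. This dependency-free illustration rests on our own assumptions: events are tuples, distance is squared Euclidean, and initial centroids are drawn at random with a fixed seed. It is not flowBin's implementation. Per the slide, flowBin uses k = 100, inspired by flowMeans, but a tiny k is enough for a toy check.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans_bins(points, k, max_iter=100, seed=0):
    """Lloyd's K-means: returns (centroids, bin label per event).

    points: events as tuples of population-ID channel values, e.g. (CD45, SSC).
    Initial centroids are k events drawn without replacement with a fixed seed.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    dims = len(points[0])
    labels = None
    for _ in range(max_iter):
        # Assignment step: every event joins its nearest centroid's bin.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the bins have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its events.
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        for p, c in zip(points, labels):
            counts[c] += 1
            for d in range(dims):
                sums[c][d] += p[d]
        # An empty bin keeps its previous centroid.
        centroids = [
            tuple(s / counts[c] for s in sums[c]) if counts[c] else centroids[c]
            for c in range(k)
        ]
    return centroids, labels
```

With real data the result depends on the seed, because Lloyd's algorithm only finds a local optimum. That is one reason the choice of k and of initialisation is worth testing.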

  8–11. Feature Extraction: Binning and KNN Mapping of Bins
     [Figure: Side Scatter vs CD45 PerCP scatter plot, built up over four slides with the steps (1) K-means cluster tube 1, (2) for each population, (3) KNN map across tubes.]
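The "KNN map across tubes" step can be sketched as follows. This assumes that each event of another tube takes the bin label of its single nearest event (1-NN) in the binned reference tube, measured on the population-ID channels shared by all tubes. The slide does not spell out these details, and the function name and data layout are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_bins_1nn(reference_events, reference_labels, query_events):
    """Label each query event with the bin of its nearest reference event.

    reference_events: binned events of the reference tube, as tuples of the
        population-ID channels shared by all tubes (e.g. (CD45, SSC)).
    reference_labels: bin label of each reference event (e.g. from K-means).
    query_events: events of another tube, on the same channels.
    Brute-force search: cost grows with len(reference_events) * len(query_events).
    """
    mapped = []
    for q in query_events:
        best = min(
            range(len(reference_events)),
            key=lambda i: squared_distance(q, reference_events[i]),
        )
        mapped.append(reference_labels[best])
    return mapped
```

A brute-force search like this scales with the product of the two tubes' sizes, which is one reason 1-NN mapping gets expensive as tubes grow. A k-d tree or a dedicated nearest-neighbour library is the usual speed-up.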

  12–13. Feature Extraction: Intra-sample, Inter-tube Variation and Quantile Normalization
     [Figure: three panels. Side Scatter vs CD45 PerCP for a surface-marker tube ("Surface") and for an intracellular-marker tube ("Intracellular"), and empirical CDFs of SSC.H for all tubes ("ECDF (all tubes)").]
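Quantile normalization of a shared channel (such as SSC.H) across tubes can be sketched as below. The details here are assumptions. This version maps each tube onto the pooled distribution of all tubes, since the "ECDF (all tubes)" panel suggests a pooled reference, but the slide does not specify the method. It also tolerates tubes with different event counts. The classic equal-length variant instead replaces each value with the mean of the same-rank values across tubes.

```python
import bisect


def quantile_normalize(tubes):
    """Quantile-normalize one channel across tubes.

    tubes: list of lists of values (one list per tube; lengths may differ).
    Each value is replaced by the pooled (all-tube) value at the same
    empirical quantile, so every tube ends up with approximately the
    pooled distribution while the order of its events is preserved.
    """
    pooled = sorted(v for tube in tubes for v in tube)
    n_pooled = len(pooled)
    normalized = []
    for tube in tubes:
        ordered = sorted(tube)
        n = len(ordered)
        out = []
        for v in tube:
            # Mid-rank empirical quantile of v within its own tube (ties share it).
            lo = bisect.bisect_left(ordered, v)
            hi = bisect.bisect_right(ordered, v)
            q = (lo + hi) / (2 * n)
            # Matching order statistic in the pooled data.
            idx = min(n_pooled - 1, int(q * n_pooled))
            out.append(pooled[idx])
        normalized.append(out)
    return normalized
```

Applied per channel, this removes tube-to-tube shifts in a shared parameter before binning and mapping. It also removes any genuine difference in that channel's distribution between tubes, which is why empirically checking the normalization is worthwhile.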

  14. Feature Extraction: Measuring Immunophenotype/Response
     For bin $k$, tube $l$ and channel $m$:
     $$\mathrm{expr}_{k,l,m} = \log\left[\operatorname{median}\left(x^{*}_{k,l,m}\right) - \operatorname{median}\left(x^{*}_{k,\mathrm{ctrl},m}\right)\right]$$
     ◮ I use the median fluorescence intensity (MFI), corrected against the negative control
     ◮ Options for other measures will be in the final package
     ◮ Option to include the MFIs of the population-ID parameters
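The expression measure can be computed per bin, tube and channel as below. This is a minimal sketch: the function name is ours, and `log` is taken to be the natural logarithm. The log is undefined when the control-corrected median is zero or negative. The slide does not say how flowBin handles that case, so returning NaN is our assumption.

```python
import math
import statistics


def expression(bin_values, control_values):
    """expr_{k,l,m} = log[median(x*_{k,l,m}) - median(x*_{k,ctrl,m})].

    bin_values: channel-m values of bin k's events in tube l.
    control_values: channel-m values of bin k's events in the negative-control tube.
    Returns NaN when the corrected median is not positive (assumed handling).
    """
    diff = statistics.median(bin_values) - statistics.median(control_values)
    return math.log(diff) if diff > 0 else math.nan
```

Evaluating this for every bin and every non-ID channel gives one feature table per sample, with bins as rows and channels as columns.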

  15. Feature Extraction: Results (for each sample)
     [Figure: heatmap of per-bin values for one sample, with a colour key and histogram (value axis 0 to 1.2). Rows are the bins sample_1946__1 to sample_1946__100. Columns are CD117, SSC, CD13, cytCD3, CD7, CD2, CD5, CD3, CD8, cytLactoferrin, CD61, CD19, CD20, CD10, cytCD79a, cytTdT, CD34, CD14, CD56, cytCD22, CD45, cytMPO, HLA, CD33, CD64, CD4 and FSC. Shown beside a Side Scatter vs CD45 PerCP plot.]

  16. Classification

  17. Classification: Collating Sample Data
     ◮ Problem: the classifier needs a measure common to all samples
     ◮ But bins are sample-specific
     ◮ First tried metaclustering (clustering the clusters)
     ◮ Unsatisfactory: it over-merges populations
     ◮ Solution: a voting SVM classifier
     ◮ Pass every bin to the classifier independently, labelled with its sample’s label
     ◮ When predicting, take a vote over each sample’s component bins
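The bin-expansion idea can be sketched as follows. Every bin becomes its own training row and inherits its sample's class label, while a parallel list records the source sample. The dictionary layout and function name are hypothetical. Keeping the sample ID matters for voting. It also lets cross-validation place all bins of one sample in the same fold, so the classifier is never tested on a sample whose other bins it was trained on.

```python
def expand_bins(samples):
    """Turn per-sample bin features into one labelled row per bin.

    samples: dict mapping sample_id -> (class_label, list of bin feature vectors).
    Returns (X, y, groups): X holds one feature row per bin, y gives each bin
    its sample's class label, and groups records the source sample so the
    bins can be re-assembled for voting or kept together in one CV fold.
    """
    X, y, groups = [], [], []
    for sample_id, (label, bins) in samples.items():
        for features in bins:
            X.append(list(features))
            y.append(label)
            groups.append(sample_id)
    return X, y, groups
```

Any bin-level classifier (an SVM, per the slide) can then be trained on `X` and `y` as if the bins were independent examples.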

  18. Classification: More Formally
     Training: for sample $j$ with bins $k$, set $C_{jk} = C_j$.
     Prediction:
     $$C_j = \begin{cases} 0 & \text{if } \sum_k P(C_{jk} = 0) > \sum_k P(C_{jk} = 1) \\ 1 & \text{otherwise} \end{cases}$$
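The prediction rule can be sketched as below. It assumes the bin-level classifier already provides the class probabilities P(C_jk = 0) and P(C_jk = 1) for each bin, for example an SVM fitted with probability estimates enabled. The function name and input layout are hypothetical.

```python
def predict_sample(bin_probabilities):
    """Sample-level class from per-bin class probabilities.

    bin_probabilities: list of (P(C_jk = 0), P(C_jk = 1)) pairs, one per bin k
    of sample j. Returns 0 when the summed probability of class 0 is strictly
    greater; otherwise 1 (so ties go to class 1, the "otherwise" branch).
    """
    p0 = sum(p for p, _ in bin_probabilities)
    p1 = sum(p for _, p in bin_probabilities)
    return 0 if p0 > p1 else 1
```

Summing probabilities is a soft vote. Confident bins weigh more than borderline ones, unlike counting hard bin-level predictions.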

  19. Classification: Results
     ◮ Works well when there is signal (see Challenge 2)
     ◮ And this is without any feature selection
     ◮ But tends to be biased towards one class when there is no signal (Challenge 1, FLT3-ITD)
     ◮ Some performance bottlenecks:
        ⊲ KNN mapping can take an hour or two for larger N (Challenge 1)
        ⊲ Grid search over SVM parameters under cross-validation is also slow with more samples (Challenge 2)

  20. Conclusions

  21. Conclusions: FlowBin Features
     ◮ In preparation for Bioconductor
     ◮ The user writes their own per-sample pre-processing and loading code
     ◮ Everything else, through to classification, is provided
     ◮ Closely mirrors biologists’ manual approach
     ◮ Treats each measured (non-ID) parameter independently

  22. Conclusions: In Progress / Near Future
     ◮ Inter-tube quality control
     ◮ Other QC plots/reports (e.g. bin removal)
     ◮ flowFP binning option
     ◮ Feature (FC parameter) selection
     ◮ Population selection
     ◮ Extracting relevant populations (at the original FCS level)
     ◮ Nested cross-validation

  23. Conclusions: Testing/Refinement
     ◮ What is a good value of k for K-means? (100 is arbitrary)
     ◮ K-means vs flowFP
     ◮ flowFP binning option
     ◮ Empirical measurement of quantile normalization (flowFP)
     ◮ Tuning of population and feature selection
     FlowCAP2 data provides an excellent test bed.
