

  1. FlowCAP-I: Results
     Ryan Brinkman
     Senior Scientist, Terry Fox Laboratory, BC Cancer Agency
     Associate Professor, Medical Genetics, UBC
     Sept 22, 2010

  2. Outline
     Sections:
     - What it means to be better (F-measure, ranking)
     - Challenge 1 results
     - Challenge 2 results
     - Challenge 3 results
     - Challenge 4 results
     - So, which method should you use?

  3. What it means to be better - Part I
     - Some comparisons are easy to quantify and understand intuitively
       - Does Raphael have more hair/cm² of skull than Richard?
     - Some aren't
       - Is Richard better looking than Raphael?
     - In which case you can use a gold standard

  4. Problems with gold standards
     1. It is possible they are flawed
        - You are unaware of intrinsic problems of your standard
        - You start over-optimizing for some qualities of the standard
        - Rogaine vs. steroids (remember this for later)
     2. You can never be better (looking) than the standard

  5. How to evaluate gating vs. the gold standard?

  6. How to evaluate gating vs. the gold standard?
     - Several categories of clustering comparison metrics:
       - Pair counting: measures the likelihood of grouping pairs of data points together
       - Set matching: measures the overlap between gold-standard "classes" and hypothesized "clusters"
       - Entropy-based: measures how well clusters contain data points from only a single class (i.e., homogeneity and completeness)
     - Several examples within each category: MCR, V-measure, VI, Rand index, F-measure (illustrated below)
     - The F-measure has the minimum overall error for flow data
     C. J. van Rijsbergen. Information Retrieval. Butterworths, London, 1979
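A minimal illustration of these metric categories (not from the original slides): it uses scikit-learn, an assumed dependency, to compute one pair-counting metric and the entropy-based scores on toy labels; the F-measure itself is sketched after the next slide.

```python
# Illustrative only: toy gold-standard classes vs. hypothesized clusters.
from sklearn.metrics import (rand_score, homogeneity_score,
                             completeness_score, v_measure_score)

classes  = [0, 0, 0, 1, 1, 1, 2, 2]   # manual gates ("classes")
clusters = [0, 0, 1, 1, 1, 1, 2, 2]   # algorithm output ("clusters")

print("Rand index (pair counting):  ", rand_score(classes, clusters))
print("Homogeneity (entropy-based): ", homogeneity_score(classes, clusters))
print("Completeness (entropy-based):", completeness_score(classes, clusters))
print("V-measure (entropy-based):   ", v_measure_score(classes, clusters))
```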

  7. F-measure
     Everything you need to know about the F-measure*
     - The mis-classification rate is generally used for evaluating classifiers
     - For FlowCAP, we performed cluster matching to label the clusters and calculate the misclassifications
     - But it is very time consuming to find the best cluster matching
     - The F-measure uses a heuristic cluster matching algorithm
       - Does not guarantee the best answer, but is significantly faster
     - The mis-classification rate is then normalized by the size of the cluster (sketched below)
     * Andrew Rosenberg and Julia Hirschberg. V-Measure: A conditional entropy-based external cluster evaluation measure.
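A minimal Python sketch of an F-measure with greedy (heuristic) cluster matching, along the lines described above. The function name, the greedy matching, and weighting the per-class scores by class size are illustrative assumptions, not the FlowCAP reference implementation.

```python
import numpy as np

def cluster_f_measure(true_labels, cluster_labels):
    """Per-class F-measure with greedy cluster matching, weighted by class size.

    For each gold-standard class, the cluster giving the highest F-measure
    (harmonic mean of precision and recall) is chosen; the per-class scores
    are then averaged with weights proportional to class size (an assumed
    convention, not necessarily the FlowCAP normalization).
    """
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    n = len(true_labels)
    total = 0.0
    for cls in np.unique(true_labels):
        in_class = true_labels == cls
        best_f = 0.0
        for clu in np.unique(cluster_labels):
            in_cluster = cluster_labels == clu
            tp = np.sum(in_class & in_cluster)   # events in both class and cluster
            if tp == 0:
                continue
            precision = tp / in_cluster.sum()
            recall = tp / in_class.sum()
            best_f = max(best_f, 2 * precision * recall / (precision + recall))
        total += (in_class.sum() / n) * best_f
    return total

# Toy example: perfect recovery of the manual gates gives F = 1.0
print(cluster_f_measure([0, 0, 1, 1, 2], [5, 5, 7, 7, 9]))
```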

  8. What it means to be better - Part II
     - Some differences are easy to test for significance
       - Null hypothesis: Raphael has significantly different hair thickness than Richard
       - Count the # of hairs in 30 random 1 cm² patches on Raphael's head
       - Count the # of hairs in 30 matched 1 cm² locations on Richard's
       - Do a paired t-test and check a significance table (see the sketch below)
     - Some aren't
       - H0: flowMeans' results are significantly different than SamSPECTRAL's
       - n = 5 (datasets) is too small
       - The gold standard is manual gating
       - Is an F-measure of 0.72 significantly different than 0.73? What does such a difference even mean?
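The paired t-test from the hair-count example as a short Python sketch; SciPy and the hair counts themselves are assumptions, made up purely for illustration.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(2010)
# Made-up hair counts per cm² at 30 matched scalp locations
raphael = rng.normal(loc=180, scale=25, size=30)
richard = rng.normal(loc=150, scale=25, size=30)

t_stat, p_value = ttest_rel(raphael, richard)   # paired t-test on matched patches
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")   # small p -> reject H0 of no difference
```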

  9. Scoring: fractional ranking and Borda count
     Reducing complex data by evaluating it using certain criteria:
     - Evaluate the match to human gating per sample using the F-measure
     - Rank F-measures from high to low
     - Score the "best" algorithm N points (N = # of algorithms)
     - The second-highest algorithm gets N-1 points, and so on
     - Group algorithms with overlapping F-measure 95% CIs
       - Give grouped algorithms the average score of the group
     - Sum scores across datasets (see the sketch below)
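Not the FlowCAP scoring code: a sketch of how fractional ranking with CI-based grouping could be implemented in Python. The `borda_scores` name and the greedy chaining of consecutive overlapping CIs into tie groups are assumptions.

```python
def borda_scores(results):
    """Fractional-ranking Borda scores for one dataset.

    `results` maps algorithm name -> (mean F-measure, CI low, CI high).
    The best algorithm gets N points (N = number of algorithms), the next
    N-1, and so on; consecutive algorithms whose 95% CIs overlap are
    grouped and each receives the group's average points.
    """
    ranked = sorted(results.items(), key=lambda kv: kv[1][0], reverse=True)
    n = len(ranked)
    points = list(range(n, 0, -1))              # N, N-1, ..., 1

    # Build tie groups by chaining consecutive algorithms with overlapping CIs
    groups, current = [], [0]
    for i in range(1, n):
        _, (_, prev_lo, prev_hi) = ranked[i - 1]
        _, (_, lo, hi) = ranked[i]
        if lo <= prev_hi and prev_lo <= hi:     # CIs overlap -> same tie group
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)

    # Each tied group shares the average of its points
    scores = {}
    for group in groups:
        avg = sum(points[i] for i in group) / len(group)
        for i in group:
            scores[ranked[i][0]] = avg
    return scores

# Toy example with three algorithms on one dataset
print(borda_scores({
    "A": (0.90, 0.87, 0.93),
    "B": (0.89, 0.86, 0.92),   # CI overlaps A's -> A and B share (3 + 2) / 2 = 2.5
    "C": (0.70, 0.66, 0.74),
}))
```

Summing these per-dataset scores across all datasets gives the overall rank scores reported in the tables that follow.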

  10. Challenge 1: Automated algorithms
      - Unsupervised clustering
      - The "we really don't know what we are looking for" challenge
      - Given: FCS files, markers (sometimes), general biology
      - No tweaking of algorithms across datasets
      - Compare to manual gates

  11. F-measure distributions: Challenge 1: GvHD
      Figure 1: Distributions of F-measures for the GvHD dataset, challenge 1.

  12. Example
      Boxplots of F-measure values of different algorithms for Challenge 4: GvHD.
      - There is general agreement between the algorithms and the manual analysis.
      - Sample 2 shows a sharp change in the F-measure values: the algorithms don't agree with the human expert.

  13. F-measure CIs: Challenge 1: GvHD
      Figure 2: Confidence intervals of F-measures for the GvHD dataset, challenge 1.

  14. Challenge 1: GvHD
      Algorithm          F-measure (95% CI)   Rank score
      FlowVB             0.85 (0.78, 0.90)    8.0
      FLOCK              0.84 (0.77, 0.90)    8.0
      flowMeans          0.88 (0.82, 0.93)    8.0
      FLAME              0.85 (0.76, 0.92)    8.0
      MM&PCA             0.84 (0.74, 0.93)    8.0
      MM                 0.83 (0.74, 0.91)    8.0
      SamSPECTRAL        0.87 (0.82, 0.93)    8.0
      CDP                0.52 (0.46, 0.57)    2.5
      FEK                0.64 (0.57, 0.71)    2.5
      flowClust/Merge    0.69 (0.56, 0.79)    2.5
      SWIFT              0.63 (0.56, 0.69)    2.5
      Table 1: Mean and 95% CIs for the F-measures and rank scores for the challenge 1 GvHD dataset

  15. Challenge 1: DLBCL
      Algorithm          F-measure (95% CI)   Rank score
      FLOCK              0.88 (0.85, 0.91)    8.80
      flowMeans          0.92 (0.90, 0.95)    8.80
      FLAME              0.91 (0.88, 0.93)    8.80
      MM                 0.90 (0.86, 0.92)    8.80
      SamSPECTRAL        0.86 (0.83, 0.90)    8.80
      FlowVB             0.87 (0.85, 0.90)    4.75
      CDP                0.85 (0.81, 0.88)    4.75
      flowClust/Merge    0.84 (0.81, 0.86)    4.75
      MM&PCA             0.85 (0.82, 0.88)    4.75
      FEK                0.79 (0.74, 0.83)    2.00
      SWIFT              0.67 (0.63, 0.71)    1.00
      Table 2: Mean and 95% CIs for the F-measures and rank scores for the challenge 1 DLBCL dataset

  16. Challenge 1: HSCT
      Algorithm          F-measure (95% CI)   Rank score
      flowMeans          0.92 (0.90, 0.94)    10
      FLAME              0.94 (0.92, 0.95)    10
      MM&PCA             0.91 (0.88, 0.94)    10
      FLOCK              0.86 (0.83, 0.89)    7
      flowClust/Merge    0.81 (0.77, 0.85)    7
      SamSPECTRAL        0.85 (0.82, 0.88)    7
      FlowVB             0.75 (0.70, 0.79)    4
      FEK                0.70 (0.65, 0.74)    4
      MM                 0.73 (0.66, 0.80)    4
      SWIFT              0.59 (0.55, 0.63)    2
      CDP                0.50 (0.48, 0.52)    1
      Table 3: Mean and 95% CIs for the F-measures and rank scores for the challenge 1 HSCT dataset

  17. Challenge 1: WNV
      Algorithm          F-measure (95% CI)   Rank score
      FLOCK              0.83 (0.80, 0.86)    10.5
      flowMeans          0.88 (0.86, 0.90)    10.5
      FlowVB             0.81 (0.78, 0.83)    7.0
      FEK                0.78 (0.75, 0.81)    7.0
      flowClust/Merge    0.77 (0.74, 0.79)    7.0
      FLAME              0.80 (0.76, 0.84)    7.0
      SamSPECTRAL        0.75 (0.61, 0.85)    7.0
      CDP                0.71 (0.67, 0.74)    2.5
      MM&PCA             0.64 (0.52, 0.72)    2.5
      MM                 0.69 (0.60, 0.75)    2.5
      SWIFT              0.69 (0.64, 0.74)    2.5
      Table 4: Mean and 95% CIs for the F-measures and rank scores for the challenge 1 WNV dataset

  18. Challenge 1: ND
      Algorithm          F-measure (95% CI)   Rank score
      SamSPECTRAL        0.92 (0.92, 0.93)    11.00
      FLOCK              0.91 (0.89, 0.92)    8.33
      flowMeans          0.85 (0.76, 0.92)    8.33
      FLAME              0.90 (0.89, 0.91)    8.33
      CDP                0.86 (0.81, 0.89)    7.50
      SWIFT              0.87 (0.86, 0.88)    7.50
      FEK                0.81 (0.80, 0.82)    4.00
      FlowVB             0.85 (0.84, 0.86)    3.00
      flowClust/Merge    0.73 (0.58, 0.85)    3.00
      MM&PCA             0.76 (0.75, 0.77)    2.50
      MM                 0.75 (0.74, 0.76)    2.50
      Table 5: Mean and 95% CIs for the F-measures and rank scores for the challenge 1 ND dataset

  19. Challenge 1: Overall (lots of choice for automated analysis)
      Algorithm          Rank score   Total runtime (dd:hh:mm:ss)
      flowMeans          45.6         00:04:23:27
      FLOCK              42.6         00:00:37:38
      FLAME              42.1         00:05:31:12
      SamSPECTRAL        41.8         00:07:21:44
      MM&PCA             27.8         00:00:04:35
      FlowVB             26.8         03:02:23:09
      MM                 25.8         00:00:13:00 (sorry)
      flowClust/Merge    24.2         10:13:00:00
      FEK                19.5         00:15:25:00
      CDP                18.2         00:01:48:06
      SWIFT              15.5         05:23:24:30
      Table 6: Total runtimes (dd:hh:mm:ss) and rank scores for challenge 1

  20. Challenge 1: Overall (lots of choice for automated analysis)

  21. Challenge 2: Tuned Algorithms (in the Absence of Example Human-Provided Gates)
      Same as challenge 1, and ...
      - Add in the number of clusters
      - You can tweak algorithm parameters to get a better "fit" to the data

  22. Challenge 2: GvHD
      Algorithm              F-measure (95% CI)   Rank score
      NMF-curvHDR            0.76 (0.69, 0.82)    5.0
      FLOCK                  0.84 (0.76, 0.90)    5.0
      FLAME                  0.81 (0.75, 0.87)    5.0
      SamSPECTRAL            0.87 (0.79, 0.93)    5.0
      SamSPECTRAL-Fixed-K    0.87 (0.80, 0.93)    5.0
      CDP                    0.59 (0.52, 0.64)    1.5
      flowClust/Merge        0.69 (0.54, 0.79)    1.5
      Table 7: Mean and 95% CIs for the F-measures and rank scores for the challenge 2 GvHD dataset

  23. Challenge 2: DLBCL
      Algorithm              F-measure (95% CI)   Rank score
      FLOCK                  0.88 (0.85, 0.91)    5.5
      flowClust/Merge        0.87 (0.85, 0.90)    5.5
      FLAME                  0.87 (0.84, 0.90)    5.5
      SamSPECTRAL            0.92 (0.89, 0.94)    5.5
      NMF-curvHDR            0.84 (0.82, 0.86)    2.5
      SamSPECTRAL-Fixed-K    0.85 (0.81, 0.89)    2.5
      CDP                    0.75 (0.69, 0.81)    1.0
      Table 8: Mean and 95% CIs for the F-measures and rank scores for the challenge 2 DLBCL dataset
