
Transfer Learning for Auto-gating of Flow Cytometry Data, by Gyemin Lee

Gyemin Lee, Lloyd Stoolman, Clayton Scott (University of Michigan). ICML 2011 Workshop on Unsupervised and Transfer Learning, July 2, 2011.


  1. Transfer Learning for Auto-gating of Flow Cytometry Data

     Gyemin Lee, Lloyd Stoolman, Clayton Scott (University of Michigan)

     ICML 2011 Workshop on Unsupervised and Transfer Learning, July 2, 2011

  2. Flow Cytometry

     A technique for rapidly quantifying physical and chemical properties of large numbers of cells, e.g. size, shape, and fluorescent antigen attributes.

     Applications: diagnosis of blood-related diseases such as acute leukemia, chronic lymphoproliferative disorders, and malignant lymphomas.

     | FS  | SS  | CD45 | CD4 | CD8 | CD3 |
     |-----|-----|------|-----|-----|-----|
     | 790 | 626 | 592  | 177 | 252 | 303 |
     | 496 | 477 | 675  | 485 | 306 | 383 |
     | 684 | 553 | 548  | 180 | 325 | 322 |
     | 681 | 588 | 563  | 221 | 258 | 272 |
     | 632 | 565 | 531  | 0   | 134 | 41  |
     | ... | ... | ...  | ... | ... | ... |

     - Each column corresponds to a measured feature.
     - Each row corresponds to a cell.
     - An experiment yields 10,000 to 100,000 cells (rows).

  3. Gating

     Typical flow cytometry data analysis involves visualizing multiple 2-dimensional scatter plots and manually selecting a subset of cells from them.

     Gating therefore amounts to assigning a binary label $y_i \in \{-1, 1\}$ to every cell $x_i$.
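To make the labeling view of gating concrete, here is a minimal sketch of a rectangular gate on two measured features. The function name `rectangular_gate`, the choice of FS and SS as the gated features, and the gate bounds are all hypothetical; the rows are the example values from the Flow Cytometry slide.

```python
def rectangular_gate(cells, x_col, y_col, x_range, y_range):
    """Label a cell +1 if it falls inside the rectangle, otherwise -1."""
    labels = []
    for cell in cells:
        inside = (x_range[0] <= cell[x_col] <= x_range[1]
                  and y_range[0] <= cell[y_col] <= y_range[1])
        labels.append(1 if inside else -1)
    return labels

# Example rows (columns: FS, SS, CD45, CD4, CD8, CD3).
cells = [
    [790, 626, 592, 177, 252, 303],
    [496, 477, 675, 485, 306, 383],
    [684, 553, 548, 180, 325, 322],
    [681, 588, 563, 221, 258, 272],
    [632, 565, 531, 0, 134, 41],
]

# Hypothetical gate on FS (column 0) and SS (column 1); bounds are made up.
labels = rectangular_gate(cells, 0, 1, (600, 800), (500, 700))
print(labels)
```

A manual gate drawn by an expert is usually a free-form region rather than a rectangle, but the output has the same shape: one label in {-1, 1} per cell.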

  4. Gating

     The distribution of cell populations differs from patient to patient.

  5. Automated Gating

     Problems of manual gating:

     - labor-intensive and time-consuming
     - highly subjective and not standardized
     - modern clinical laboratories see dozens of cases per day

     ⇒ It is highly desirable to automate gating.

     Automated gating:

     - In flow cytometry data analysis, more than 70% of studies focused on automated gating techniques (Bashashati & Brinkman, 2009).
     - In automatic gating, the majority of approaches rely on unsupervised clustering or mixture modeling.

  7. Auto-gating as a Transfer Learning Problem

     Given:

     - $M$ labeled source datasets $D_m = \{(x_{m,i}, y_{m,i})\}_{i=1}^{N_m} \sim P_m$ for $m = 1, \dots, M$
     - an unlabeled target dataset $T = \{x_{t,i}\}_{i=1}^{N_t} \sim P_t$

     Goal: assign labels $\{\hat{y}_{t,i}\}_{i=1}^{N_t}$ to $T$ with low misclassification.

     [Diagram: $D_1, D_2, \dots, D_M$ and $T$ ⇒ $\{\hat{y}_{t,i}\}_{i=1}^{N_t}$]

  8. Our Approach (1/2)

     Consider linear decision functions $f(x) = \langle w, x \rangle + b \gtrless 0$.

     Step 1. Summarize the expert knowledge $f_m$ from each of the $M$ source datasets $D_m$ to build a baseline classifier $f_0$.

     [Diagram: $D_1 \Rightarrow f_1$, $D_2 \Rightarrow f_2$, ..., $D_M \Rightarrow f_M$; together ⇒ $f_0 = \langle w_0, x \rangle + b_0 \gtrless 0$ (baseline)]

     where

     - $f_m : (w_m, b_m) \leftarrow \text{SVM}(D_m)$, $m = 1, \dots, M$
     - $f_0 : (w_0, b_0) \leftarrow \text{robust mean}(\{(w_m, b_m)\}_m)$
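The aggregation step can be sketched as follows. The slide does not say which robust mean is used, so this sketch substitutes the coordinate-wise median as a stand-in; that choice, the function names, and the toy hyperplanes (which would normally come from SVMs trained on each source dataset) are assumptions for illustration only.

```python
from statistics import median

def baseline_hyperplane(hyperplanes):
    """Combine per-source (w_m, b_m) pairs into a single (w_0, b_0).

    Assumption: the coordinate-wise median stands in for the
    unspecified "robust mean" on the slide.
    """
    ws = [w for w, _ in hyperplanes]
    bs = [b for _, b in hyperplanes]
    w0 = [median(coords) for coords in zip(*ws)]
    b0 = median(bs)
    return w0, b0

def decide(w, b, x):
    """Linear decision rule: +1 if <w, x> + b > 0, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

# Made-up source hyperplanes; the fourth one is a deliberate outlier.
source_hyperplanes = [
    ([1.0, 0.0], -1.0),
    ([0.9, 0.1], -1.1),
    ([1.1, -0.1], -0.9),
    ([-5.0, 8.0], 4.0),
    ([1.0, 0.05], -1.0),
]

w0, b0 = baseline_hyperplane(source_hyperplanes)
print(w0, b0)
```

With these toy values the outlier has no influence on the result, which is the property a robust mean is meant to provide; a plain average would be pulled toward it.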

  9. Our Approach (2/2)

     Step 2. Transfer the knowledge by adapting $f_0$ to the target task $T$ based on the low-density separation principle.

     [Diagram: $f_0$ and $T$ ⇒ $f_t = \langle w_t, x \rangle + b_t \gtrless 0$]

     Adjust the hyperplane parameters $(w, b)$ so that the decision boundary passes through a region where the marginal density of $T$ is low.

     Find $(w_t, b_t)$ near $(w_0, b_0)$ that minimizes the number of data points inside the margin:

     $$\sum_{i=1}^{N_t} I\left\{ \frac{|\langle w_t, x_{t,i} \rangle + b_t|}{\|w_t\|} < \Delta \right\}$$
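The margin criterion on this slide translates directly into a counting function. The sketch below is a plain transcription of the sum; the function name `margin_count` and the toy target points are hypothetical.

```python
import math

def margin_count(w, b, points, delta):
    """Count points whose distance to the hyperplane <w, x> + b = 0 is below delta."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    count = 0
    for x in points:
        distance = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        if distance < delta:
            count += 1
    return count

# Toy unlabeled target data: one cluster near x1 = 0, another near x1 = 4.
target = [[0.0, 0.0], [0.2, 1.0], [-0.1, -1.0],
          [4.0, 0.0], [4.1, 1.0], [3.9, -1.0]]
w = [1.0, 0.0]

through_cluster = margin_count(w, -0.1, target, 0.5)  # boundary at x1 = 0.1
between_clusters = margin_count(w, -2.0, target, 0.5)  # boundary at x1 = 2.0
print(through_cluster, between_clusters)
```

A boundary that cuts through a cluster traps points inside the margin, while one placed in the gap between clusters traps none, which is why minimizing this count steers the hyperplane toward low-density regions.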

  10. Auto-gating Example

     Comparison of the gating from the baseline ($f_0$) and the proposed transfer learning ($f_t$) to the gating by the expert (true).

     [Figure: three panels titled true, $f_0$, and $f_t$]

  11. Experiments - setup

     [Figure: number of cells ($\times 10^4$) per case for the 35 cases; series: total cells and (+) labeled cells]

     The 35 peripheral blood datasets were provided by the Department of Pathology, University of Michigan.

     Leave-one-out setting:

     - choose a dataset as the target task $T$
     - hide the labels of $T$
     - treat the other datasets as source tasks $D_m$, $m = 1, \dots, 34$
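The leave-one-out protocol can be sketched as a small generator. The data layout (each dataset as a pair of cells and labels) and the function name `leave_one_out` are assumptions made for this illustration.

```python
def leave_one_out(datasets):
    """Yield (target_index, unlabeled_target_cells, source_datasets) per dataset.

    Each dataset is assumed to be a (cells, labels) pair. The target's
    labels are withheld here and would only be used for scoring.
    """
    for t, (target_cells, _hidden_labels) in enumerate(datasets):
        sources = [d for m, d in enumerate(datasets) if m != t]
        yield t, target_cells, sources

# Three toy datasets of (cells, labels).
datasets = [
    ([[1.0], [2.0]], [1, -1]),
    ([[3.0], [4.0]], [-1, 1]),
    ([[5.0], [6.0]], [1, 1]),
]

splits = list(leave_one_out(datasets))
for t, target_cells, sources in splits:
    print(t, len(target_cells), len(sources))
```

With 35 datasets, as in the experiments, each of the 35 splits has 34 source datasets.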

  12. Experiments - results

     Our transfer learning approach:

     - $f_0$: baseline classifier with no adaptation
     - $f_t$: classifier adapted to $T$ by varying both the direction and the bias

     Reference approaches:

     - Pooling: merge all the source data and learn a classifier on the merged dataset
     - Oracle: standard SVM trained with the true labels of the target task data

  13. Experiments - results

     |         | Pool | Oracle | $f_0$ | $f_t$ |
     |---------|------|--------|-------|-------|
     | avg     | 9.81 | 3.70   | 2.49  | 2.12  |
     | std err | 1.68 | 0.54   | 0.30  | 0.27  |

     ⇒ Our strategy can successfully replicate what experts do in the field without a labeled training set for the target task.
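For readers who want to reproduce the two summary rows from per-dataset errors, here is a minimal sketch. It assumes the standard error is the sample standard deviation divided by the square root of the number of datasets, which the slide does not state; the error values below are made up and are not the study's data.

```python
import math
from statistics import mean, stdev

def summarize(errors):
    """Return (average, standard error) of per-dataset error values.

    Assumption: standard error = sample standard deviation / sqrt(n).
    """
    return mean(errors), stdev(errors) / math.sqrt(len(errors))

# Made-up per-dataset errors, for illustration only.
errors = [2.0, 4.0, 4.0, 6.0]
avg, std_err = summarize(errors)
print(avg, std_err)
```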

  14. Conclusion and Forthcoming work

     Conclusion:

     - We cast flow cytometry auto-gating as a transfer learning problem.
     - By combining transfer learning with the low-density separation criterion for class separation, our strategy can leverage expert-gated datasets for the automatic gating of a new unlabeled dataset.

     Forthcoming work (joint with Gilles Blanchard):

     - General kernel-based framework
     - Generalization error analysis

  15. Our Approach - detail

     2-1. Varying bias. For a grid of biases $\{s_j\}$:

     - count the points inside the margin: $c_j \leftarrow \sum_{i=1}^{N_t} I\left\{ \frac{|\langle w, x_{t,i} \rangle + b - s_j|}{\|w\|} < \Delta \right\}, \ \forall j$
     - smooth: $\hat{p}(z) \leftarrow \sum_j c_j \, \delta(z - s_j) * \frac{1}{\sqrt{2\pi}\,h} \exp\left(-\frac{z^2}{2h^2}\right)$
     - find the minimizing bias: $z^* \leftarrow \text{gradient descent}(\hat{p}(z), 0)$
     - update the bias: $b_{\text{new}} \leftarrow b - z^*$

     2-2. Varying normal vector. Let $w_t = w_0 + a_t v_t$ where $v_t = \text{eig}(\text{cov}([w_1, \dots, w_M]))$. For a grid of the amount of change $\{a_k\}$:

     - count the points inside the margin: $c_k \leftarrow \sum_{i=1}^{N_t} I\left\{ \frac{|\langle w_0 + a_k v_t, x_{t,i} \rangle + b|}{\|w_0 + a_k v_t\|} < 1 \right\}$
     - smooth: $g(a) \leftarrow \sum_k c_k \, \delta(a - a_k) * \frac{1}{\sqrt{2\pi}\,h} \exp\left(-\frac{a^2}{2h^2}\right)$
     - find the minimizing $a_t$: $a_t \leftarrow \text{gradient descent}(g(a), 0)$
     - update the direction: $w_{\text{new}} \leftarrow w_0 + a_t v_t$
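Step 2-1 (count, smooth, descend, update) can be sketched in a few functions. This is an illustrative reading of the slide, not the authors' implementation: the function names, the fixed step size, the iteration count, the grid, and the toy one-dimensional data are all assumptions. Convolving the weighted delta functions with a Gaussian is written out as a sum of Gaussian bumps centered at the grid points.

```python
import math

def margin_counts(w, b, points, grid, delta):
    """c_j: number of points within delta of the hyperplane shifted by s_j."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in points]
    return [sum(1 for s in scores if abs(s - sj) / norm < delta) for sj in grid]

def smoothed_gradient(z, grid, counts, h):
    """Derivative in z of p(z) = sum_j c_j * Gaussian(z - s_j; bandwidth h)."""
    scale = 1.0 / (math.sqrt(2.0 * math.pi) * h)
    return sum(
        c * scale * (-(z - sj) / (h * h))
        * math.exp(-((z - sj) ** 2) / (2.0 * h * h))
        for sj, c in zip(grid, counts)
    )

def adapt_bias(w, b, points, grid, delta, h, step=0.05, iters=500):
    """Gradient descent on the smoothed counts, started at z = 0 (no shift)."""
    counts = margin_counts(w, b, points, grid, delta)
    z = 0.0
    for _ in range(iters):
        z -= step * smoothed_gradient(z, grid, counts, h)
    return b - z  # b_new <- b - z*

# Toy 1-D target data: clusters near x = 0 and x = 4.
points = [[-0.2], [0.0], [0.2], [3.8], [4.0], [4.2]]
grid = [-3.0 + 0.25 * j for j in range(33)]  # biases from -3.0 to 5.0

# The baseline boundary sits at x = 1, close to the left cluster.
b_new = adapt_bias([1.0], -1.0, points, grid, delta=0.5, h=0.5)
print(b_new)
```

In this toy setting the adapted bias moves the boundary from x = 1 to roughly x = 2, the midpoint of the gap between the two clusters. Starting the descent at zero keeps the result close to the baseline, which matches the requirement that the adapted hyperplane stay near $(w_0, b_0)$. Step 2-2 follows the same pattern with the grid taken over the step $a_k$ along $v_t$.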
