SamSPECTRAL: Efficient spectral clustering on flow cytometry data - PowerPoint PPT Presentation

SamSPECTRAL: Efficient spectral clustering on flow cytometry data. Habil Zare, PhD Candidate, Terry Fox Laboratory, British Columbia Cancer Agency, and Department of Computer Science, University of British Columbia, Vancouver, Canada


  1. SamSPECTRAL: Efficient spectral clustering on flow cytometry data. Habil Zare, PhD Candidate, Terry Fox Laboratory, British Columbia Cancer Agency, and Department of Computer Science, University of British Columbia, Vancouver, Canada. Joint work with: Parisa Shooshtari. Supervisors: Dr. Arvind Gupta, Dr. Ryan Brinkman & Dr. Andrew Weng. FlowCAP summit, September 2010

  2. High dimensionality

  3. Biology: Identifying Cell Populations. Computer science: Clustering Data Points

  4. Mathematics. Challenge! Graph Theory, Spectral Clustering

  5. Technical, Spectral Clustering
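
As a rough illustration of the graph-theoretic idea named on this slide, here is a minimal sketch of a two-way spectral partition. It is not SamSPECTRAL's implementation: the Gaussian similarity, the parameter `sigma`, the function name `spectral_bipartition`, and the use of power iteration are all assumptions made for this toy example. It builds a similarity graph over the points, approximates the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian L = D - W), and splits the points by the sign of its entries.

```python
import math


def spectral_bipartition(points, sigma=1.0, iters=200):
    """Split points into two groups by the sign of an approximate Fiedler vector.

    Toy sketch only: needs at least two points that are not all mutually
    far apart relative to sigma (otherwise every similarity underflows to 0).
    """
    n = len(points)
    # Gaussian similarity graph with no self-loops (assumed similarity function).
    W = [[0.0 if i == j
          else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # 2 * max degree is an upper bound on the largest eigenvalue of L = D - W,
    # so the top eigenvectors of (c*I - L) are the bottom eigenvectors of L.
    c = 2.0 * max(deg)
    v = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iters):
        # Remove the constant component (the eigenvector of L for eigenvalue 0).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # One power-iteration step with (c*I - L) = (c*I - D) + W.
        w = [(c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]


# Two well-separated toy groups of 2-D "events".
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(spectral_bipartition(toy_points))
```

The dense n-by-n matrix `W` is the part that becomes expensive as the number of events grows.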

  6. Computational limitations: • 1000 events: Memory: 0.1 GB, Time: 1 minute (OK) • 100,000 events: Memory: 1000 GB, rent for 50 years!
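
The jump from 0.1 GB to 1000 GB is consistent with memory growing quadratically in the number of events, as it would for a dense n-by-n similarity matrix. The short calculation below checks that arithmetic; the quadratic model and the function name `scaled_memory_gb` are assumptions for illustration, not something stated on the slide.

```python
def scaled_memory_gb(base_events, base_memory_gb, target_events):
    """Extrapolate memory use assuming it grows with the square of the event count."""
    ratio = target_events / base_events
    return base_memory_gb * ratio ** 2


# Slide baseline: 1000 events need 0.1 GB. Extrapolate to 100,000 events.
estimate = scaled_memory_gb(1_000, 0.1, 100_000)
print(f"Estimated memory for 100,000 events: {estimate:.0f} GB")
```

Under this assumption, 100 times more events means 10,000 times more memory, which gives roughly the 1000 GB quoted above.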

  7. Statistics. Challenge! Sampling uniformly

  8. Creativity. Challenge! “Faithful” sampling

  9. Comparison with uniform sampling:

  10. Data reduction

  11. Technical, “Faithful” Sampling. Faithful (information-preserving) sampling algorithm, assuming the parameter h (neighborhood) is set:
      a. Label all data points as unregistered.
      b. Pick a random unregistered point p and find all unregistered data points within distance h from p.
      c. Put all of these points in a set called community p and label them as registered. p is called the representative of this community.
      d. Repeat the above two steps until no unregistered points are left.
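
The four steps above can be sketched in a few lines of Python. This is a minimal, unoptimized reading of the slide, assuming Euclidean distance and "within distance h" meaning distance at most h; the function name `faithful_sample`, the `seed` parameter, and the return format are hypothetical choices for this example, not part of SamSPECTRAL's interface.

```python
import math
import random


def faithful_sample(points, h, seed=0):
    """Group points into communities of radius h around randomly picked points.

    Returns (representatives, communities): the index of each picked point p
    and the sorted list of point indices registered to its community.
    """
    rng = random.Random(seed)
    # Step a: every point starts out unregistered.
    unregistered = set(range(len(points)))
    representatives = []
    communities = []
    # Step d: repeat until no unregistered points are left.
    while unregistered:
        # Step b: pick a random unregistered point p and collect all
        # unregistered points within distance h of it (p itself included).
        p = rng.choice(sorted(unregistered))
        members = [i for i in unregistered
                   if math.dist(points[i], points[p]) <= h]
        # Step c: these points form the community of p and become registered.
        unregistered.difference_update(members)
        representatives.append(p)
        communities.append(sorted(members))
    return representatives, communities


# Toy data: a tight group of three events and a tight group of two.
toy_events = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
reps, comms = faithful_sample(toy_events, h=1.0)
print(reps, comms)
```

Each point ends up in exactly one community, so a sparse region still contributes a representative, whereas uniform sampling could miss it entirely. The community sizes can be kept alongside the representatives so that density information is not lost.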

  12. Rare Populations: cancer stem cells, detection of fetal cells in maternal blood, leukemia and malaria diagnosis, etc. - Consist of only 0.1% to 2% of total events - SamSPECTRAL distinguished them correctly in 27/34 (79%) samples - Successful on all populations greater than 0.15% - FLAME [11/34 (32%)] and flowMerge [9/34 (26%)]

  13. Other Applications: vaccine design, leukemia classification, lymphoma diagnosis, … 300 * 7 * 5 → 1 month; SamSPECTRAL: 1 day. (Figure panels: SLL, follicular)

  14. Automatic identification of cell populations for lymphoma diagnosis • Tube: CD19, CD5 and CD3 • 5-dimensional clustering by SamSPECTRAL (Figure panels: SLL, follicular)

  15. 100 Patients •DLBC •Follicular •MCL •SLL

  16. MCL vs SLL: Three novel phenotypes for differential diagnosis between MCL and SLL • Verified on 110 lymphoma patients • Capable of correctly discriminating: - all 43/43 (100%) MCL cases - 65/67 (98%) SLL cases • Previously known flow cytometry signatures: - 27/43 (63%) MCL cases - 48/67 (72%) SLL cases

  17. FlowCAP Results:

  18. Future Work: • Improving SamSPECTRAL (spectral clustering is flexible) • Using SamSPECTRAL to make biological discoveries (analysis of thousands of lymphoma and leukemia patients is now possible, to build subtype classifiers and to discover novel biomarkers) • Facilitating clinical diagnosis based on flow cytometry. Biologist collaborators are most welcome!

  19. Reference: Data reduction for spectral clustering to analyze high throughput flow cytometry data. Thanks to the Brinkman lab, and to: The MITACS Network of Centres of Excellence, Canadian Cancer Society grant #700374, and NIH/NIBIB grant EB008400

  20. Supplementary slides …

  21. Information retrieval and number of spectral clusters

  22. Comparative Results (1):

  23. Comparative Results (2):

  24. 14 Resolution: 10 cells

  25. Microscope: 17th century. Flow Cytometer: 20th century

  26. Difficulty for humans in understanding high-dimensional data:

  27. Limitations of manual gating: • Time consuming • Which dimension to gate first? • “Unknown” populations, yet potentially interesting from a biological and clinical point of view. Challenges of computer-based clustering: • “Small” populations • Adjacent populations • Non-elliptical populations
