SamSPECTRAL: Efficient spectral clustering on flow cytometry data. Habil Zare, PhD Candidate, Terry Fox Laboratory, British Columbia Cancer Agency and Department of Computer Science, British Columbia University, Vancouver, Canada. Joint work with: Parisa Shooshtari. Supervisors: Dr. Arvind Gupta, Dr. Ryan Brinkman & Dr. Andrew Weng. FlowCAP summit, September 2010
High dimensionality
Biology: Identifying Cell Populations. Computer science: Clustering Data Points
Mathematics Challenge! Graph Theory: Spectral Clustering
Technical, Spectral Clustering
Computational limitations: •1000 events: Memory: 0.1 GB, Time: 1 minute, OK •100,000 events: Memory: 1000 GB, rent for 50 years!
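The two memory figures on this slide are consistent with memory growing as the square of the number of events, which is what a dense n × n similarity matrix would require. The following back-of-the-envelope sketch extrapolates from the slide's 1000-event data point. The function name `estimated_memory_gb` and the quadratic scaling law are assumptions made for illustration, not statements from the talk.

```python
def estimated_memory_gb(n_events, ref_events=1_000, ref_memory_gb=0.1):
    """Extrapolate memory use, ASSUMING it grows with the square of n_events.

    ref_events and ref_memory_gb are the slide's 1000-event data point.
    The quadratic law is an assumption (a dense n x n similarity matrix).
    """
    scale = n_events / ref_events
    return ref_memory_gb * scale ** 2


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} events: about {estimated_memory_gb(n):g} GB")
```

Under this assumption, a 100-fold increase in events costs 10,000 times the memory, which matches the jump from 0.1 GB to 1000 GB quoted above.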
Statistics Challenge! Sampling uniformly
Creativity Challenge! “Faithful” Sampling
Comparison with uniform sampling:
Data reduction
Technical, “Faithful” Sampling. Faithful (Information Preserving) Sampling Algorithm, assuming the parameter h (neighborhood) is set: a. Label all data points as unregistered. b. Pick a random unregistered point p and find all unregistered data points within distance h from p. c. Put all of these points in a set called community p and label them as registered. p is called the representative of this community. d. Repeat the above two steps until no unregistered points are left.
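Steps a to d above can be sketched in a few lines of Python. This is a minimal illustration, not the SamSPECTRAL implementation: the function name `faithful_sampling`, the use of Euclidean distance, the inclusive test `<= h`, and the toy data are all assumptions made for the example.

```python
import math
import random


def faithful_sampling(points, h, seed=0):
    """Partition points into communities of radius h around random picks.

    Returns (representatives, communities): the index of each picked
    point p and the sorted indices registered to its community.
    Euclidean distance is an assumption; the slide only says "distance".
    """
    rng = random.Random(seed)
    unregistered = set(range(len(points)))            # step a
    representatives, communities = [], []

    while unregistered:                               # step d
        p = rng.choice(sorted(unregistered))          # step b: random pick
        community = [i for i in unregistered
                     if math.dist(points[i], points[p]) <= h]
        unregistered.difference_update(community)     # step c: register
        representatives.append(p)
        communities.append(sorted(community))

    return representatives, communities


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: one large blob plus a small, distant "rare" blob.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
    data += [(rng.gauss(8, 0.3), rng.gauss(8, 0.3)) for _ in range(20)]
    reps, comms = faithful_sampling(data, h=0.5)
    print(len(data), "events reduced to", len(reps), "representatives")
```

Every point ends up in exactly one community, so a small population lying farther than h from all other points still produces at least one representative. Uniform subsampling gives no such guarantee.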
Rare Populations: cancer stem cells, detection of fetal cells in maternal blood, leukemia and malaria diagnosis, etc. - Consist of only 0.1% to 2% of total events - SamSPECTRAL distinguished them correctly in 27/34 (79%) samples - Successful on all populations greater than 0.15% - FLAME [11/34 (32%)] and flowMerge [9/34 (26%)]
Other Applications: vaccine design, leukemia classification, lymphoma diagnosis, … 300 * 7 * 5 → 1 month; SamSPECTRAL: 1 day. Figure panels: SLL, follicular
Automatic identification of cell populations for lymphoma diagnosis •Tube: CD19, CD5 and CD3 •5-dimensional clustering by SamSPECTRAL. Figure panels: SLL, follicular
100 Patients •DLBC •Follicular •MCL •SLL
MCL vs SLL: Three novel phenotypes for differential diagnosis between MCL and SLL •Verified on 110 lymphoma patients •Capable of correctly discriminating: - all 43/43 (100%) MCL cases - 65/67 (98%) SLL cases •Previously known flow cytometry signatures: - 27/43 (63%) MCL cases - 48/67 (72%) SLL cases
FlowCAP Results:
Future Work: •Improving SamSPECTRAL (spectral clustering is flexible) •Using SamSPECTRAL to make biological discoveries (analysis of thousands of lymphoma and leukemia patients is now possible, to build a subtype classifier and to discover novel biomarkers) •Facilitating clinical diagnosis based on flow cytometry. Biologist collaborators are most welcome!
Reference: Data reduction for spectral clustering to analyze high throughput flow cytometry data. Thanks to Brinkman lab, and to: The MITACS Network of Centres of Excellence, Canadian Cancer Society grant #700374, and NIH/NIBIB grant EB008400
Supplementary slides …
Information retrieval and number of spectral clusters
Comparative Results (1):
Comparative Results (2):
14 Resolution: 10 cells
Microscope: 17th century. Flow Cytometer: 20th century
Difficulty in understanding high-dimensional data for humans:
Limitations of manual gating: •Time consuming •Which dimension to gate first? •“Unknown” populations, yet potentially interesting from a biological and clinical point of view. Challenges of computer-based clustering: •“Small” populations •Adjacent populations •Non-elliptical populations