Radina Nikolic, BCIT
flowCAP 2010, NIH, Sep. 21-22, 2010

Self-Organizing Map (SOM)
• Kohonen (1981) network
• Idea: von der Malsburg
• Unsupervised learning
• Artificial neural network
• Two layers
• High-dimensional to 2D
• Topology preserving

Flow Cytometry (FCM) Data
• Multidimensional
• Large data sets
• High-throughput
• Growing amount of data
• Standard data format
• Data analysis
• Cell population identification

Why SOM for FCM?
• Efficient for large high-dimensional data sets
• No assumptions on underlying data distributions
• Neurobiological background
• Simple but extensible mathematical model
• Widely used in various domains
• Few attempts to use in flow cytometry

Neurobiological Background
• The most realistic computational model of brain functions
• Paradigm to explain functional structures of the brain
• Self-organization
• Adaptive features
• Multidimensional sensory inputs in the human cortex are represented as 2D maps, topology conserving
• To what extent can it be regarded as a biophysical model?

Mathematical Model
• Initialization
    – Randomly generate synaptic weight vector values ($w_i$)
    – Choose an initial learning rate $e$ and neighbourhood function ($d_x$)
• Sampling
    – Randomly select a cell ($x$) from the input data set (input layer)
• Competition
    – Determine the winning neuron $k$ (best matching unit) in the output layer
    – $\|w_k - x\| = \min_i \|w_i - x\|$
• Cooperation
    – Find the neighbourhood neurons
• Synaptic Adaptation
    – Update the weight vectors of the winning neuron and its neighbours
    – $w_i = w_i + e \cdot h \cdot d_x(x_j, w_i)$
• Convergence
    – Check criteria

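The steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the flowKoh implementation: the Gaussian form of the neighbourhood term h, the use of (x − w_i) for the d_x term, the linear decay of the learning rate e, the shrinking radius, and the fixed iteration count in place of a convergence check are all assumptions, and the names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def train_som(data, rows, cols, n_iter=1000, e_start=0.05, e_end=0.01, seed=0):
    """Train a small rectangular SOM; returns a dict {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialization: random synaptic weight vectors w_i, one per map node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius_start = max(rows, cols) / 2.0
    for t in range(n_iter):  # fixed iteration budget (assumed) instead of a convergence test
        frac = t / n_iter
        e = e_start + (e_end - e_start) * frac          # learning rate decays linearly (assumed)
        radius = max(radius_start * (1.0 - frac), 0.5)  # neighbourhood shrinks over time (assumed)
        # Sampling: pick one input vector x at random.
        x = rng.choice(data)
        # Competition: the winning neuron k minimises ||w_i - x||.
        k = min(weights, key=lambda node: math.dist(weights[node], x))
        # Cooperation and synaptic adaptation: pull the winner and its map neighbours toward x.
        for node, w in weights.items():
            grid_d2 = (node[0] - k[0]) ** 2 + (node[1] - k[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood (assumed)
            for j in range(dim):
                w[j] += e * h * (x[j] - w[j])
    return weights


def best_matching_unit(weights, x):
    """Return the map coordinates of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))
```

Calling `train_som(events, 3, 3)` on a list of equal-length numeric vectors returns the trained map; `best_matching_unit` then gives the node for any single event.
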
SOM Applications
• Importance of applying SOM in a proper way, as emphasized by Kohonen:
    – The SOM is a clustering, visualization and abstraction method
    – For classification, pattern recognition and decision support, Learning Vector Quantization (LVQ) should be used
    – For automatic feature extraction and invariant detection, use Adaptive-Subspace SOM (ASSOM)

Proposed Approach: flowKoh
• FCM data loading
    – flowCore method read.FCS
    – If Bioconductor software is not available: read.csv
• Data pre-processing
• Clustering
    – R kohonen package for SOM
    – Labels generated and saved
• Results visualization (optional)

Kohonen Package Parameters
• R kohonen package allows for learning in both
    – unsupervised mode: kohonen::som
    – supervised mode: kohonen::bdk, kohonen::xyf
• Number of iterations: rlen = 100
• Learning rate: alpha = c(0.05, 0.01)
• Neighbourhood function (radius): default
• Map topology: default (rectangular)

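The two-element alpha setting describes a learning rate that declines linearly from 0.05 to 0.01 over the training run. The Python sketch below shows one way such a schedule can be computed. It is an illustration only: the function name `linear_alpha` is hypothetical, and the exact interpolation used inside the kohonen package is not shown on the slide and may differ at the endpoints.

```python
def linear_alpha(step, total_steps, start=0.05, end=0.01):
    """Learning rate at update `step` (0-based) of `total_steps`, declining linearly."""
    if total_steps <= 1:
        return start
    # Interpolate so that step 0 gives `start` and the last step gives `end`.
    return start + (end - start) * step / (total_steps - 1)


# Schedule for 100 updates, mirroring rlen = 100 and alpha = c(0.05, 0.01).
schedule = [linear_alpha(s, 100) for s in range(100)]
```
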
Clustering Results Obtained
• Challenge 3
• Three flowCAP datasets clustered

Dataset    Samples  Events  Dims  Runtime (s)
GvHD       12       14000   6     21.86
DLBCL      30       10000   5     28.26
StemCell   30       5000    5     26.70

Behind the Map View
• SOM presents a simplified view of a highly complex data set
• Each node in the map is one cluster
• All the data associated with a given node may be made available via that node
• Position on the map may be representative of a wide variety of variables

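The idea that each node is one cluster, with its data reachable through the node, can be sketched as a nearest-node lookup. The example below assumes an already trained map given as a plain list of codebook vectors; the names `assign_to_nodes` and `events_by_node` and the toy values are hypothetical and do not come from the flowCAP datasets.

```python
import math
from collections import defaultdict


def assign_to_nodes(codebook, events):
    """Label each event with the index of its nearest codebook vector (map node)."""
    labels = []
    for x in events:
        nearest = min(range(len(codebook)), key=lambda i: math.dist(codebook[i], x))
        labels.append(nearest)
    return labels


def events_by_node(labels):
    """Group event indices by node label, so a node gives access to its data."""
    groups = defaultdict(list)
    for event_index, node in enumerate(labels):
        groups[node].append(event_index)
    return dict(groups)


# Toy map with two nodes and three two-dimensional events (illustrative values).
codebook = [[0.0, 0.0], [1.0, 1.0]]
events = [[0.1, 0.0], [0.9, 1.2], [0.2, 0.1]]
labels = assign_to_nodes(codebook, events)
groups = events_by_node(labels)
```

Here the node index doubles as the cluster label for each event.
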
Visualization and Feature Selection
• GvHD datasets
• FSC and SSC excluded
• Two populations
    – FL1.H/FL3.H
    – FL2.H/FL3.H/FL4.H
• Sample 11: exception
• Scientifically valid?

Interpretation and Assessment
• Simple to implement, difficult to analyze and interpret
• How accurate are the clustering results?
• How meaningful are the clusters identified?
• Does visualization help us in feature selection?
• Stability (convergence) and map plasticity?
• Is there any correlation between patterns observed and biological outcome (diagnosis)?

Resources
• Kohonen T., Self-Organizing Maps. Springer, May 2006.
• Von der Malsburg C., Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14:85–100, 1973.
• Wehrens R., Self- and Super-organizing Maps in R: The kohonen Package. Journal of Statistical Software, Volume 21, Issue 5, October 2007.
• Wilkins M. F., A comparison of some neural and non-neural methods for identification of phytoplankton from flow cytometry data. Computer Applications in the Biosciences, 12(1):9–18, 1996.

flowCAP Initiative
• From a software development perspective
• Collaboration with BCCA (SSL Project)
• Critical for both new and existing algorithms
• Standard dataset test cases
• Evaluation criteria: objective assessment measure(s)
• Important
    – Feature extraction
    – Scientific validation: guidelines
    – Set of criteria: how to be flowCAP compliant

Special Thanks
• BCCA Terry Fox Lab
• Oxford Bioinformatics Programme
• BCIT
• NIH
• Summit Audiences