Supervised Self-Organising Maps

Ron Wehrens, Institute of Molecules and Materials (IMM), Radboud University. Self-organising maps map high-dimensional data to a 2D grid of units according to similarity/distance (Kohonen, 1982).


  1. Supervised Self-Organising Maps
     Ron Wehrens, Institute of Molecules and Materials (IMM), Radboud University Nijmegen, The Netherlands
     Self-organising maps
     Map high-dimensional data to a 2D grid of “units” according to similarity/distance (Kohonen, 1982).
     “Spatially smooth version of k-means” (Ripley, PRNN, 1996).
     Training SOMs
     [Figure: initial state of the map; object 1]
     Data: 177 Italian wines

  2. Training SOMs
     [Figures: winner 1 for object 1; update 1 for object 1]
     Algorithm:
     ✎ Pick random object
     ✎ Determine winner in map
     ✎ Update winner and environment
     ✎ Periodically, decrease environment and learning rate
     Mapping
     [Figure: wines, codebook vectors and mapping of the objects onto the units]
     R code:
     > library(kohonen)
     > data(wines)
     > somnet <- som(scale(wines), gr = somgrid(5, 5), rlen=100)
     > plot(somnet, "codes")
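The four training steps listed on this slide can be sketched in base R, without the kohonen package. This is a minimal illustration under stated assumptions, not the package's implementation: the function names `train_som_sketch` and `map_som_sketch`, the rectangular grid, the linear decay of learning rate and neighbourhood radius at every iteration (rather than periodically), and the random demo data are all choices made here for illustration.

```r
# Minimal SOM training sketch in base R (hypothetical helper, not kohonen's code).
train_som_sketch <- function(X, xdim = 5, ydim = 5, rlen = 100,
                             alpha = c(0.05, 0.01)) {
  X <- as.matrix(X)
  n <- nrow(X)
  nunits <- xdim * ydim
  # Positions of the units on a rectangular 2D grid and their mutual distances
  pts <- as.matrix(expand.grid(x = seq_len(xdim), y = seq_len(ydim)))
  unit_dist <- as.matrix(dist(pts))
  # Initial state: codebook vectors are randomly chosen objects
  codes <- X[sample(n, nunits, replace = n < nunits), , drop = FALSE]
  rownames(codes) <- NULL
  niter <- rlen * n               # assumption: rlen passes over the data
  radius0 <- max(unit_dist)       # assumption: start with the whole map
  for (iter in seq_len(niter)) {
    frac <- (iter - 1) / niter
    lr <- alpha[1] - (alpha[1] - alpha[2]) * frac  # decreasing learning rate
    radius <- radius0 * (1 - frac)                 # shrinking environment
    x <- X[sample(n, 1), ]                         # pick random object
    # Determine winner: unit whose codebook vector is closest to x
    winner <- which.min(colSums((t(codes) - x)^2))
    # Update winner and environment: move them towards x
    hood <- which(unit_dist[winner, ] <= radius)
    target <- matrix(x, nrow = length(hood), ncol = ncol(X), byrow = TRUE)
    codes[hood, ] <- codes[hood, , drop = FALSE] +
      lr * (target - codes[hood, , drop = FALSE])
  }
  codes
}

# Mapping: assign each object to the unit with the closest codebook vector.
map_som_sketch <- function(X, codes) {
  apply(as.matrix(X), 1, function(x) which.min(colSums((t(codes) - x)^2)))
}

# Usage on random demo data (not the wines data from the slide)
set.seed(1)
X <- scale(matrix(rnorm(60 * 4), ncol = 4))
codes <- train_som_sketch(X, rlen = 20)
unit_of_object <- map_som_sketch(X, codes)
table(unit_of_object)
```

Because every update moves a codebook vector part of the way towards a data object, the codebook vectors stay inside the range of the data, which is why a trained map can be read as a smoothed summary of the objects.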

  3. Supervised SOMs
     ✎ use of all information
     ✎ better reproducibility
     ✎ better interpretability
     ✎ better predictions
     ✎ treat Y as a special (set of) variables
     ✎ separate range scaling of distances in X and Y
     ✎ explicit weighting of distances in X and Y
     ✎ for regression as well as classification
     R code:
     > library(kohonen)
     > data(wines)
     > xyfnet <- xyf(scale(wines), classvec2classmat(wine.classes),
     +               gr = somgrid(5, 5), rlen=100, xweight = .5)
     X-ray powder patterns
     Descriptor of crystal structure: similar patterns should correspond to similar structures
     [Figure: powder patterns labelled F18, E17, D16, C15, N19, N20, B12, B14, A2 and A6; intensity I against 2θ]
     W.J. Melssen, R. Wehrens and L.M.C. Buydens, Chemom. Intell. Lab. Syst. (2006), in press.
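The two points "separate range scaling of distances in X and Y" and "explicit weighting of distances in X and Y" can be illustrated by how a winner might be chosen for one object. The sketch below is an assumption-laden illustration, not the code inside `xyf`: the helper name `xyf_winner_sketch`, the squared Euclidean distance in both spaces, and scaling by the maximum distance are choices made here; only the role of `xweight` is taken from the slide.

```r
# Hypothetical winner selection for a supervised (X-Y fused) map.
# codesX, codesY: one row per unit, holding the X and Y parts of its codebook.
xyf_winner_sketch <- function(x, y, codesX, codesY, xweight = 0.5) {
  dX <- colSums((t(codesX) - x)^2)   # distance of the object to each unit in X
  dY <- colSums((t(codesY) - y)^2)   # distance of the object to each unit in Y
  # Separate range scaling: bring both sets of distances to [0, 1]
  scale01 <- function(d) if (max(d) > 0) d / max(d) else d
  # Explicit weighting of the two scaled distances
  d <- xweight * scale01(dX) + (1 - xweight) * scale01(dY)
  which.min(d)
}

# Three units; the object is closest to unit 1 in X but to unit 3 in Y
codesX <- rbind(c(0, 0), c(1, 1), c(5, 5))
codesY <- rbind(c(1, 0), c(0.5, 0.5), c(0, 1))
x <- c(0.1, 0.1)
y <- c(0, 1)
xyf_winner_sketch(x, y, codesX, codesY, xweight = 1)    # X only
xyf_winner_sketch(x, y, codesX, codesY, xweight = 0)    # Y only
xyf_winner_sketch(x, y, codesX, codesY, xweight = 0.5)  # compromise
```

In this toy case the X-only winner is unit 1, the Y-only winner is unit 3, and equal weighting selects unit 2, which is reasonably close in both spaces.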

  4. Package wccsom
     ✎ Self-organising maps for powder patterns
     ✎ Supervised and unsupervised mapping
     ✎ Special similarity function (WCC) with one parameter: triangle width
     Data set: steroids
     Space group   # compounds   label
     P212121              978      19
     P21                  843       4
     P1                    93       5
     C2                    99       1
     Total               2013
     Training set (1342 compounds) and a test set (671 compounds).
     Mapping using cell volume
     > xyfnet <- xyf(X[training,], Y[training],
     +               gr = somgrid(20, 20, "hexagonal"),
     +               rlen = 250, xweight = .5)
     > plot(xyfnet, "predict")
     [Figure: 20 × 20 hexagonal map coloured by log2(cell volume), scale from 8 to 14]
     Training time: 1 h 20’ (P 3.2GHz)
     Mapping using space group
     > sompredictions <-
     +   predict(somnet, trainY = classvec2classmat(Ycl[training]))
     > plot(somnet, "property",
     +      property = sompredictions$unit.predictions)
     > plot(xyfnet, "predict")
     [Figure: three 20 × 20 hexagonal maps coloured by space group label (5, 4, 19, 1). Panels: "SOM: no Space Group information", "XYF: including Space Group information", "XYF: Volume and Space Group information"]
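The slide names a similarity function (WCC) with a single parameter, the triangle width, but does not give its formula. The sketch below assumes the usual weighted cross-correlation form: cross-correlations at small shifts are summed with triangular weights and normalised by the corresponding weighted autocorrelations. The function name `wcc_sketch` and the convention that `width` counts data points are assumptions made here; this is not the wccsom implementation.

```r
# Hypothetical weighted cross-correlation between two patterns f and g
# sampled on the same grid. width = 1 reduces to the plain cosine similarity.
wcc_sketch <- function(f, g, width) {
  n <- length(f)
  stopifnot(length(g) == n, width >= 1, width <= n)
  lags <- -(width - 1):(width - 1)
  w <- 1 - abs(lags) / width          # triangular weights, 1 at lag 0
  # Cross-correlation of a and b at lag r: sum over i of a[i] * b[i + r]
  xcorr <- function(a, b, r) {
    if (r >= 0) sum(a[seq_len(n - r)] * b[seq_len(n - r) + r])
    else sum(a[seq_len(n + r) - r] * b[seq_len(n + r)])
  }
  wsum <- function(a, b) {
    sum(w * vapply(lags, function(r) xcorr(a, b, r), numeric(1)))
  }
  wsum(f, g) / sqrt(wsum(f, f) * wsum(g, g))
}

# Two single-peak patterns whose peaks are shifted by one point
f <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0)
g <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
wcc_sketch(f, g, width = 1)   # no overlap without shift tolerance: 0
wcc_sketch(f, g, width = 3)   # shifted peaks still count as similar: 2/3
```

The example shows why such a measure suits powder patterns: a point-wise comparison treats slightly shifted peaks as unrelated, while a wider triangle rewards them for being close.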

  5. Prediction results (test set)
     Volume prediction (correlation coefficients)
                              Seed 7   Seed 13   Seed 31
     SOM                        .01      -.04       .01
     XYF (class only)           .36       .41       .41
     XYF (class and volume)     .72       .28       .68
     Space group prediction (percentage correct)
                              Seed 7   Seed 13   Seed 31
     SOM                        43%       43%       24%
     XYF (class only)           87%       86%       85%
     XYF (class and volume)     79%       46%       66%
     Conclusions
     ✎ SOMs (supervised and unsupervised) are ideally suited for analysing databases of chemical structures
     ✎ Special distance measures can/must be used
     ✎ Supervised SOMs have many advantages: better predictions, easier to interpret, and better stability
     ✎ Training can take a long time but mapping is relatively fast
     ✎ Including space group information is important in predicting properties of crystals
     Acknowledgements
     ✎ René de Gelder
     ✎ Willem Melssen
     ✎ Egon Willighagen
     Library ’class’ by B.D. Ripley
     Edwards & Oman, RNews 3(3), 2003
