GRAVITATIONAL CLUSTERING OF THE SELF-ORGANIZING MAP - Nejc Ilc

ICANNGA 2011, Ljubljana
GRAVITATIONAL CLUSTERING OF THE SELF-ORGANIZING MAP
Nejc Ilc, Andrej Dobnikar
University of Ljubljana, Faculty of Computer and Information Science

INTRODUCTION
• Tools needed to deal with
  - data/web mining
  - huge (social) networks
  - gene expression data
  - image segmentation
[Figure: Visualization of the Internet. Credits: Opte Project]
[Figure: Connections between neurons in human brain. Credits: Van J. Wedeen, M.D., MGH/Harvard U.]
[Figure: Heat map of gene expression profile. Credits: Manfred Gessler]
[Figure: Image segmentation. Credits: T. Riklin-Raviv, N. Sochen and N. Kiryati]
ICANNGA, April 2011

CLUSTERING
• unsupervised process of organizing data into "natural" groups
• approaches
  - information theory
  - graphs
  - fuzzy logic
  - …
  - artificial neural networks

CLUSTERING WITH SOM
• Self-Organizing Map [Kohonen, 1982]
• Advantages
  - visualization of high-dimensional data
  - preserves topology and density of input data
• Problem
  - SOM is not a "true" clustering method
  - more neurons than the expected number of clusters
  - How to group neurons into clusters?

CLUSTERING OF SOM
• K-means, hierarchical [Vesanto & Alhoniemi, 2000]
• Emergent SOM [Ultsch, 2007]
  - watershed algorithm
  - number of neurons > 1000
• Surface flooding [Brugger et al., 2008]
  - automatically finds the number of clusters

GSOM – THE IDEA

GSOM – LEVEL ONE
• train SOM on input data
• identify winning neurons
• remove interpolating neurons
  n_j = [n_{j1}, n_{j2}, …, n_{jE}]
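The three Level One steps can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the linear decay schedules, and the reading of "interpolating neurons" as neurons that are never the best-matching unit for any input are all assumptions made for this example.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Train a small rectangular SOM with the classic online update rule."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of every neuron, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise prototypes from randomly chosen data points (an assumption).
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # neighbourhood radius shrinks towards 0.5
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
        weights += lr * h[:, None] * (x - weights)
    return weights

def winning_neurons(data, weights):
    """Keep only neurons that win at least one input; return them with hit counts."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)                           # best-matching unit per data point
    hits = np.bincount(bmu, minlength=len(weights))
    keep = hits > 0                                   # zero-hit neurons are dropped
    return weights[keep], hits[keep]

# Usage on two synthetic 2D blobs (illustrative data, not from the slides).
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
weights = train_som(data, rows=5, cols=4)
kept, hits = winning_neurons(data, weights)
```

The surviving prototypes in `kept` are what a second-level clustering step would then operate on.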

GSOM – LEVEL TWO
• Gravitational clustering [Wright, 1977; Gomez et al., 2003]
• BMU → mass point (m=1)
• "Move & merge" steps
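A simplified "move & merge" loop might look like the sketch below. It is an all-pairs variant written for illustration only; the parameter names (`G`, `dG`, `eps`, `max_step`) and the step-clamping rule are assumptions and are not claimed to match the parameters used in GSOM or in the cited papers.

```python
import numpy as np

def merge_close(pos, mass, members, eps):
    """Repeatedly merge the closest pair of particles while they are nearer than eps."""
    while len(pos) > 1:
        dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        if dist[i, j] >= eps:
            break
        total = mass[i] + mass[j]
        pos[i] = (mass[i] * pos[i] + mass[j] * pos[j]) / total  # centre of mass
        mass[i] = total
        members[i] = members[i] + members[j]
        pos = np.delete(pos, j, axis=0)
        mass = np.delete(mass, j)
        del members[j]
    return pos, mass, members

def gravitational_clustering(points, G=1e-3, dG=0.05, eps=0.05, max_step=0.02, n_iter=200):
    pos = np.asarray(points, dtype=float).copy()
    mass = np.ones(len(pos))                    # every point starts as a unit mass
    members = [[i] for i in range(len(pos))]    # original indices absorbed by each particle
    for _ in range(n_iter):
        if len(pos) == 1:
            break
        # Move: displace each particle along the net pull of all others.
        diff = pos[None, :, :] - pos[:, None, :]            # diff[i, j] = x_j - x_i
        dist = np.linalg.norm(diff, axis=2)
        np.fill_diagonal(dist, np.inf)
        dist = np.maximum(dist, 1e-12)                      # guard against coincident points
        step = (G * mass[None, :, None] * diff / dist[:, :, None] ** 3).sum(axis=1)
        norm = np.linalg.norm(step, axis=1, keepdims=True)
        step *= np.minimum(1.0, max_step / np.maximum(norm, 1e-12))  # clamp step length
        pos = pos + step
        # Merge: fuse particles that have come closer than eps.
        pos, mass, members = merge_close(pos, mass, members, eps)
        G *= 1.0 - dG                                        # gravity weakens over time
    labels = np.empty(len(points), dtype=int)
    for k, idx in enumerate(members):
        labels[idx] = k
    return labels, pos, mass

# Usage: two tight groups of four points each, far apart (illustrative data).
square = np.array([[0.0, 0.0], [0.02, 0.0], [0.0, 0.02], [0.02, 0.02]])
points = np.vstack([square, square + 5.0])
labels, centres, masses = gravitational_clustering(points)
```

Because gravity decays by a factor of `1 - dG` each iteration, distant groups stop approaching each other, so the number of surviving particles serves as the detected number of clusters.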

EXPERIMENT
• GSOM compared to
  - EM GMM [Dempster et al., 1977]
  - CS [Jenssen et al., 2003]
  - SOMkM [Vesanto & Alhoniemi, 2000]
• datasets
  - 6 artificial (2D with complex shapes)
  - 3 real from UCI (Iris, Wine, LettersABC)
• 100 runs of the algorithm; we measure:
  - Clustering Error (CE): minimal, average
  - elapsed time
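The slides do not spell out how CE is computed. One common definition, assumed here, is the fraction of points that remain misassigned under the best one-to-one matching between predicted clusters and true classes. The function name and the brute-force search over permutations are choices made for this sketch; the search is practical only for a handful of clusters.

```python
from itertools import permutations
import numpy as np

def clustering_error(true_labels, pred_labels):
    """Fraction of points misassigned under the best cluster-to-class matching."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids = np.unique(true_labels)
    pred_ids = np.unique(pred_labels)
    # counts[i, j] = number of points of true class i placed in predicted cluster j.
    counts = np.array([[np.sum((true_labels == t) & (pred_labels == p))
                        for p in pred_ids] for t in true_ids])
    # Pad to a square table so differing numbers of clusters can still be matched.
    k = max(counts.shape)
    padded = np.zeros((k, k), dtype=int)
    padded[:counts.shape[0], :counts.shape[1]] = counts
    best = max(sum(padded[i, perm[i]] for i in range(k))
               for perm in permutations(range(k)))
    return 1.0 - best / len(true_labels)

# Usage: label names do not matter, only the grouping does.
perfect = clustering_error([0, 0, 1, 1], [1, 1, 0, 0])
one_wrong = clustering_error([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

Over repeated runs, the minimal and average CE are then simply the minimum and mean of the per-run values.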

RESULTS – GIANT
[Figure: clustering of the Giant dataset by each method]
• EM GMM: CE = 0.0
• CS: CE = 0.219
• SOMkM: CE = 0.352
• GSOM: CE = 0.0

RESULTS – WAVE
[Figure: clustering of the Wave dataset by each method]
• EM GMM: CE = 0.280
• CS: CE = 0.130
• SOMkM: CE = 0.126
• GSOM: CE = 0.0

RESULTS – RANKS
[Figure: mean rank of each method]
• minimal CE
• average CE
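The values behind the rank chart are not preserved in this transcript. As an illustration of how a mean rank is typically computed, the sketch below ranks the four methods using only the CE values shown on the Giant and Wave slides, so its output is not the result from the chart, which presumably covers all nine datasets. The helper name and the tie rule (tied methods share the average of their ranks) are assumptions.

```python
def average_ranks(scores):
    """Rank scores ascending (1 = lowest error); ties share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

methods = ["EM GMM", "CS", "SOMkM", "GSOM"]
# CE values copied from the Giant and Wave result slides.
ce = {
    "Giant": [0.0, 0.219, 0.352, 0.0],
    "Wave": [0.280, 0.130, 0.126, 0.0],
}
per_dataset = [average_ranks(values) for values in ce.values()]
mean_rank = {m: sum(r[i] for r in per_dataset) / len(per_dataset)
             for i, m in enumerate(methods)}
```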

RESULTS – ELAPSED TIME
[Figure: elapsed time of each method]
• Hepta, N=212
• LettersABC, N=1719

RESULTS – NUMBER OF CLUSTERS
• number of detected clusters

dataset      true number   GSOM
Giant        2             2
Hepta        7             7
Ring         2             4
Wave         2             2
Moon         4             4
Flag         3             3
Iris         3             3
Wine         3             3
LettersABC   3             7

GSOM - SUMMARY
+ finds clusters of complex shapes, including linearly non-separable ones
+ insensitive to unbalanced density of clusters
+ number of clusters is detected automatically
+ uses topology relations (neighbourhood)
+ less computationally intensive
+ intuitive
- 8 parameters to adjust
- sometimes unstable behaviour

FUTURE WORK
• implementing heuristics for setting parameters automatically
• study of clustering ensembles based on GSOM
  - could the non-deterministic nature of GSOM be an advantage?
• application of GSOM to clustering of gene expression data

DATASETS PROPERTIES

dataset      number of points   number of dimensions   number of clusters
Giant        862                2                      2
Hepta        212                2                      7
Ring         800                2                      2
Wave         293                2                      2
Moon         514                2                      4
Flag         640                2                      3
Iris         150                4                      3
Wine         178                13                     3
LettersABC   1719               16                     3

GSOM PARAMETERS SETTING

dataset      SOM size   SOM grid   𝐇        𝚬𝐇      α      p
Giant        13 x 11    rect.      0.0008   0.045   0.01   0.1
Hepta        9 x 8      rect.      0.0008   0.060   0.01   0.1
Ring         11 x 10    rect.      0.0008   0.045   0.01   0.1
Wave         14 x 12    rect.      0.0008   0.045   0.01   0.1
Moon         20 x 10    rect.      0.0008   0.045   0.01   0.0
Flag         14 x 9     rect.      0.0008   0.045   0.01   0.1
Iris         12 x 5     rect.      0.0008   0.045   0.01   0.1
Wine         7 x 5      rect.      0.0008   0.030   0.01   0.1
LettersABC   12 x 9     rect.      0.0010   0.030   0.01   0.1
