How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need?
Fabrice Rossi
SAMM, Université Paris 1
WSOM 2014, Mittweida
“a little bit small compared to Paris”
Data complexity is increasing
Modern data are complex
◮ text everywhere (comments, messages, status, etc.)
◮ images everywhere
◮ relations (friends/contact, like/plus, ad hoc discussion, etc.)
◮ mixed data (buyers/items, listeners/songs, etc.)
The vector model...
◮ in which all objects $(x_i)_{1\le i\le N}$ live in a fixed vector space $\mathbb{R}^p$
◮ ...is less and less relevant
Solutions
1. specific solutions (e.g., probabilistic models for relational data)
2. generic solutions via a comparison measure
Dissimilarity/Kernel Data
Data model
◮ a data space $\mathcal{X}$ (might be implicit)
◮ $N$ observations $(x_i)_{1\le i\le N}$ from $\mathcal{X}$ (possibly with no attached description)
Dissimilarity
◮ a symmetric dissimilarity function $d$ from $\mathcal{X}^2$ to $\mathbb{R}^+$
◮ or a symmetric matrix $D = (d(x_i, x_j))_{1\le i\le N,\,1\le j\le N}$
Kernel
◮ a kernel function $k$ from $\mathcal{X}^2$ to $\mathbb{R}$, symmetric and positive definite
◮ or a symmetric positive definite matrix $K = (k(x_i, x_j))_{1\le i\le N,\,1\le j\le N}$
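As an illustration of objects with no vector description, here is a minimal sketch that builds the dissimilarity matrix $D$ for a few strings. It assumes the Levenshtein (edit) distance as the dissimilarity $d$; the helper names `levenshtein` and `dissimilarity_matrix` are hypothetical and not taken from the talk.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def dissimilarity_matrix(objects, d):
    """D = (d(x_i, x_j)) over all pairs (0-based indices in code)."""
    n = len(objects)
    return [[d(objects[i], objects[j]) for j in range(n)] for i in range(n)]


words = ["kitten", "sitting", "mitten", "fitting"]
D = dissimilarity_matrix(words, levenshtein)
for row in D:
    print(row)
```

Only $D$ is needed downstream: nothing in this sketch requires the strings to live in $\mathbb{R}^p$.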
SOM
Low-dimensional prior structure
◮ a regular lattice of $K$ units/neurons in $\mathbb{R}^2$: $(r_k)_{1\le k\le K}$
◮ a time-dependent neighborhood function $h_{kl}(t)$, e.g.
$$h_{kl}(t) = \exp\left(-\frac{\|r_k - r_l\|^2}{2\sigma^2(t)}\right)$$
Mapping
◮ each neuron $r_k$ is associated with a prototype/model $m_k$ in the data space
◮ each $m_k$/$r_k$ is responsible for a cluster of data points, $C_k$: quantization/clustering aspect
◮ if $r_k$ and $r_l$ are close according to $h_{kl}$, then $m_k$ and $m_l$ should be close: topology preservation aspect
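The lattice and the Gaussian neighborhood can be sketched as follows, assuming a rectangular grid with unit spacing. The geometric schedule `sigma_schedule` and its default values are a hypothetical choice: the slides only state that $\sigma$ depends on $t$.

```python
import math


def rectangular_lattice(rows, cols):
    """Positions r_k of K = rows * cols units on a unit-spaced grid in R^2."""
    return [(float(i), float(j)) for i in range(rows) for j in range(cols)]


def gaussian_neighborhood(lattice, sigma):
    """Matrix h[k][l] = exp(-||r_k - r_l||^2 / (2 sigma^2)) for sigma = sigma(t)."""
    h = []
    for (xk, yk) in lattice:
        row = []
        for (xl, yl) in lattice:
            sq = (xk - xl) ** 2 + (yk - yl) ** 2
            row.append(math.exp(-sq / (2.0 * sigma ** 2)))
        h.append(row)
    return h


def sigma_schedule(t, t_max, sigma_start=3.0, sigma_end=0.5):
    """Hypothetical schedule: sigma decays geometrically from sigma_start
    (at t = 0) to sigma_end (at t = t_max)."""
    return sigma_start * (sigma_end / sigma_start) ** (t / t_max)


lattice = rectangular_lattice(3, 3)
h = gaussian_neighborhood(lattice, sigma=1.0)
```

A large $\sigma$ makes each unit drag its lattice neighbors along (strong topology constraint); as $\sigma(t)$ shrinks, $h_{kl}$ tends to the identity matrix and only the quantization aspect remains.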
Training Algorithms
Stochastic/Online SOM
1. select a random data point $x$
2. find its best matching unit
$$c = \arg\min_{k\in\{1,\dots,K\}} \|x - m_k(t)\|^2$$
3. update all prototypes
$$m_k(t+1) = m_k(t) + \epsilon(t)\,h_{kc}(t)\,(x - m_k(t))$$
4. loop to 1 until convergence
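A minimal pure-Python sketch of the stochastic algorithm for vector data, under assumptions that are not in the slides: prototypes initialized on randomly drawn data points, geometrically decaying $\epsilon(t)$ and $\sigma(t)$, a unit-spaced lattice, and a fixed iteration budget in place of a convergence test. The function name `online_som` and all parameter defaults are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def online_som(data, lattice, n_iter=5000, eps0=0.5, eps1=0.01,
               sigma0=2.0, sigma1=0.3, seed=0):
    """Stochastic SOM for vector data; returns one prototype per lattice unit."""
    rng = random.Random(seed)
    # Assumption: prototypes start on randomly drawn data points.
    prototypes = [list(rng.choice(data)) for _ in lattice]
    for t in range(n_iter):
        # Assumption: geometric decay of epsilon(t) and sigma(t).
        frac = t / max(n_iter - 1, 1)
        eps = eps0 * (eps1 / eps0) ** frac
        sigma = sigma0 * (sigma1 / sigma0) ** frac
        # 1. select a random data point x
        x = rng.choice(data)
        # 2. best matching unit c = argmin_k ||x - m_k(t)||^2
        c = min(range(len(prototypes)), key=lambda k: sq_dist(x, prototypes[k]))
        # 3. m_k(t+1) = m_k(t) + eps(t) h_kc(t) (x - m_k(t)) for every unit k
        for k, m in enumerate(prototypes):
            h_kc = math.exp(-sq_dist(lattice[k], lattice[c]) / (2.0 * sigma ** 2))
            for dim in range(len(m)):
                m[dim] += eps * h_kc * (x[dim] - m[dim])
    # 4. a fixed iteration budget replaces the convergence test (assumption)
    return prototypes


rng = random.Random(1)
data = [(rng.random(), rng.random()) for _ in range(500)]
lattice = [(i, j) for i in range(5) for j in range(5)]
prototypes = online_som(data, lattice)
# mean distance from each data point to its closest prototype
qe = sum(min(math.sqrt(sq_dist(x, m)) for m in prototypes) for x in data) / len(data)
print(f"quantization error: {qe:.3f}")
```

Since $\epsilon(t)\,h_{kc}(t) \le 1$ here, each update moves $m_k$ along the segment towards $x$, so prototypes stay inside the convex hull of the data.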
Training Algorithms
Batch SOM
1. compute the best matching unit for all data points
$$c_i(t) = \arg\min_{k\in\{1,\dots,K\}} \|x_i - m_k(t)\|^2$$
2. update all prototypes
$$m_k(t+1) = \frac{\sum_{i=1}^{N} h_{k c_i(t)}(t)\, x_i}{\sum_{i=1}^{N} h_{k c_i(t)}(t)}$$
3. loop to 1 until convergence
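The batch algorithm can be sketched in the same spirit, again for vector data and under illustrative assumptions: initialization on $K$ distinct random data points, a geometrically decreasing $\sigma(t)$, a unit-spaced lattice, and a fixed number of epochs instead of a convergence test. The name `batch_som` and its defaults are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def batch_som(data, lattice, n_epochs=30, sigma0=2.0, sigma1=0.3, seed=0):
    """Batch SOM for vector data; returns one prototype per lattice unit."""
    rng = random.Random(seed)
    K, p = len(lattice), len(data[0])
    # Assumption: prototypes start on K distinct random data points.
    prototypes = [list(x) for x in rng.sample(data, K)]
    lat_sq = [[sq_dist(rk, rl) for rl in lattice] for rk in lattice]
    for t in range(n_epochs):
        sigma = sigma0 * (sigma1 / sigma0) ** (t / max(n_epochs - 1, 1))
        # 1. best matching unit c_i(t) for every data point x_i
        bmu = [min(range(K), key=lambda k: sq_dist(x, prototypes[k])) for x in data]
        # 2. m_k(t+1) = sum_i h_{k c_i}(t) x_i / sum_i h_{k c_i}(t)
        new_prototypes = []
        for k in range(K):
            w = [math.exp(-lat_sq[k][c] / (2.0 * sigma ** 2)) for c in bmu]
            total = sum(w)
            if total == 0.0:  # guard against underflow: keep the old prototype
                new_prototypes.append(prototypes[k])
                continue
            new_prototypes.append(
                [sum(wi * x[dim] for wi, x in zip(w, data)) / total for dim in range(p)])
        prototypes = new_prototypes
    # 3. a fixed number of epochs replaces the convergence test (assumption)
    return prototypes


rng = random.Random(1)
data = [(rng.random(), rng.random()) for _ in range(500)]
lattice = [(i, j) for i in range(5) for j in range(5)]
prototypes = batch_som(data, lattice)
# mean distance from each data point to its closest prototype
qe = sum(min(math.sqrt(sq_dist(x, m)) for m in prototypes) for x in data) / len(data)
print(f"quantization error: {qe:.3f}")
```

In the batch update each new $m_k$ is a convex combination of the data points $x_i$, with weights $h_{kc_i(t)}(t)/\sum_j h_{kc_j(t)}(t)$; there is no learning rate $\epsilon(t)$ to tune.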
Demo: simple 2D dataset
[Figure: the original grid]
[Figure: prototype positions in the data space]