Local Models: Localizing; Learning after Localizing
Steven J. Zeil
Old Dominion Univ.
Fall 2010
Outline
1. Localizing
   Competitive Learning
   Online k-Means
   Adaptive Resonance Theory
   Self-Organizing Maps
   Learning Vector Quantization
   Radial-Basis Functions
   Falling Between the Cracks
   Rule-Based Knowledge
2. Learning after Localizing
   Hybrid Learning
   Competitive Basis Functions
   Mixture of Experts (MoE)
Local Models
Piecewise approaches to regression: divide the input space into local regions and learn a simple model on each region.
- Localization can be supervised or unsupervised.
- Learning the local models is then supervised.
- Or both can be done at once.
1. Localizing
Competitive Learning
- Competitive methods assign x to one region and apply the function associated with that single region.
- Cooperative methods apply a mixture of functions, weighted by how likely x is to belong to each region.
Competitive Learning Techniques
Online k-Means

E(\{\vec{m}_i\}_{i=1}^k \mid X) = \sum_t \sum_i b_i^t \, \|\vec{x}^t - \vec{m}_i\|^2

b_i^t = 1 if \|\vec{x}^t - \vec{m}_i\| = \min_j \|\vec{x}^t - \vec{m}_j\|, and b_i^t = 0 otherwise

Batch k-means: \vec{m}_i = \frac{\sum_t b_i^t \vec{x}^t}{\sum_t b_i^t}

Online k-means: \Delta m_{ij} = -\eta \frac{\partial E^t}{\partial m_{ij}} = \eta \, b_i^t (x_j^t - m_{ij})
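Below is a minimal NumPy sketch of the online update above; the learning rate eta, the number of epochs, and initializing the centers from randomly chosen samples are assumptions, not part of the slide.

```python
import numpy as np

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Online k-means: for each sample, move only the winning center,
    Delta m_i = eta * (x - m_i)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k randomly chosen samples (an assumption).
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            i = np.argmin(np.linalg.norm(x - m, axis=1))  # winner: b_i^t = 1
            m[i] += eta * (x - m[i])                      # online update
    return m
```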
Winner-take-all Network
- Online k-means can be implemented via a variant of perceptrons.
- In the network diagram, blue lines are inhibitory connections that seek to suppress the other units' values; red lines are excitatory, reinforcing a unit's own output.
- With appropriate weights, these suppress all but the maximum.
Adaptive Resonance Theory (ART)
- Incrementally adds new cluster means.
- ρ denotes the vigilance (a distance threshold).
- If a new x lies outside the vigilance of all existing cluster centers, use that x as the center of a new cluster.
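A simplified, distance-based sketch of the idea on this slide, not the full ART architecture; the vigilance rho, the step size eta, and moving the winning center toward x are assumptions.

```python
import numpy as np

def art_like_clustering(X, rho, eta=0.5):
    """Incremental clustering with vigilance rho: a sample farther than rho
    from every existing center starts a new cluster of its own."""
    centers = [X[0].astype(float)]
    for x in X[1:]:
        d = np.array([np.linalg.norm(x - m) for m in centers])
        i = int(d.argmin())
        if d[i] <= rho:
            centers[i] += eta * (x - centers[i])   # x resonates with center i
        else:
            centers.append(x.astype(float))        # new cluster centered at x
    return np.array(centers)
```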
Self-Organizing Maps (SOM)
- Units (cluster means) have a neighborhood structure, most often 2D.
- Update not only the mean m_i closest to x, but also the means in m_i's neighborhood.
- The strength of the update falls off with the number of steps through the neighborhood:

\Delta \vec{m}_j = \eta \, e(j, i) (\vec{x}^t - \vec{m}_j)

e(j, i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( \frac{-(j - i)^2}{2\sigma^2} \right)
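A sketch of a single SOM update for a one-dimensional chain of units, using the neighborhood function e(j, i) above; the chain topology (a 2D grid would only change how the distance j − i is measured) and the parameter defaults are assumptions.

```python
import numpy as np

def som_step(m, x, eta=0.1, sigma=1.0):
    """One SOM update on a 1-D chain of units m (float array [n_units, dim]):
    every unit j is pulled toward x with strength eta * e(j, i), where i is
    the closest (winning) unit."""
    i = np.argmin(np.linalg.norm(x - m, axis=1))
    j = np.arange(len(m))
    e = np.exp(-(j - i) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    m += eta * e[:, None] * (x - m)
    return m
```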
Learning Vector Quantization (LVQ)
- A supervised technique.
- Assume that the existing cluster means are labeled with classes.
- If x^t is closest to m_i:

\Delta \vec{m}_i = \eta (\vec{x}^t - \vec{m}_i) \quad \text{if } \mathrm{label}(\vec{x}^t) = \mathrm{label}(\vec{m}_i)

\Delta \vec{m}_i = -\eta (\vec{x}^t - \vec{m}_i) \quad \text{otherwise}
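A sketch of one LVQ step; the representation of prototypes and their class labels as parallel arrays is an assumption.

```python
import numpy as np

def lvq_step(m, m_labels, x, x_label, eta=0.1):
    """One LVQ update: the closest prototype moves toward x when the class
    labels match and away from x otherwise."""
    i = np.argmin(np.linalg.norm(x - m, axis=1))
    sign = 1.0 if m_labels[i] == x_label else -1.0
    m[i] += sign * eta * (x - m[i])
    return m
```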
Radial-Basis Functions
- A basis function whose value falls off with the distance from a cluster mean:

p_h^t = \exp\left( \frac{-\|\vec{x}^t - \vec{m}_h\|^2}{2 s_h^2} \right)

- s_h is the "spread" around m_h.
Radial Functions and Perceptrons

p_h^t = \exp\left( \frac{-\|\vec{x}^t - \vec{m}_h\|^2}{2 s_h^2} \right)

y^t = \sum_{h=1}^H w_h p_h^t + w_0

- Note that the p_h take the usual place of the x_i as inputs to a perceptron-style output unit.
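A sketch of the forward computation on the last two slides, with the p_h as a new basis feeding a linear output unit; the array shapes are assumptions.

```python
import numpy as np

def rbf_predict(x, m, s, w, w0):
    """RBF network output y = sum_h w_h * p_h + w_0 for a single input x.
    m: centers [H, dim]; s: spreads [H]; w: weights [H]; w0: bias."""
    p = np.exp(-np.sum((x - m) ** 2, axis=1) / (2 * s ** 2))
    return w @ p + w0
```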
Using RBFs as a New Basis
Obtaining RBFs
Unsupervised:
- Use any prior technique to compute the means (e.g., k-means).
- Set the spread to cover the cluster: find the x^t belonging to cluster h that is farthest from m_h, and set s_h so that p_h^t ≈ 0.5 at that point.
Supervised:
- Because the p_h are differentiable, they can be trained together with the overall function.
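One way to realize the spread-setting rule above: requiring p_h ≈ 0.5 at the farthest member of the cluster, exp(−d²/(2s²)) = 0.5 gives s = d / √(2 ln 2). The hard cluster-assignment array assign is an assumption.

```python
import numpy as np

def fit_spreads(X, m, assign):
    """Set each spread s_h from the farthest member of cluster h so that
    p_h is about 0.5 there: s_h = d_max / sqrt(2 ln 2)."""
    s = np.zeros(len(m))
    for h in range(len(m)):
        d_max = np.max(np.linalg.norm(X[assign == h] - m[h], axis=1))
        s[h] = d_max / np.sqrt(2 * np.log(2))
    return s
```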
Falling Between the Cracks
- With RBFs it is possible for some x to fall outside the region of influence of all clusters.
- It may be useful to train an "overall" model and then train local exceptions:

y^t = \underbrace{\sum_{h=1}^H w_h p_h^t}_{\text{exceptions}} + \underbrace{\vec{v}^T \vec{x}^t + v_0}_{\text{default rule}}
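A sketch of the "default rule plus exceptions" output; it simply adds a global linear term to the RBF sum, with shapes as in the earlier sketch.

```python
import numpy as np

def rbf_with_default(x, m, s, w, v, v0):
    """y = sum_h w_h * p_h (exceptions) + v @ x + v0 (default rule)."""
    p = np.exp(-np.sum((x - m) ** 2, axis=1) / (2 * s ** 2))
    return w @ p + v @ x + v0
```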
Rule with Exceptions
Normalized Basis Functions
- Alternatively, normalize the basis functions so that they sum to 1.0.
- Then do a cooperative calculation over all of the units.
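A sketch of the normalized variant; treating the cooperative output as the normalized-basis-weighted sum of the w_h is an assumption about what the slide's "cooperative calculation" combines.

```python
import numpy as np

def normalized_rbf_predict(x, m, s, w):
    """Cooperative output with normalized basis functions: the p_h are
    rescaled to sum to 1, so every input receives full total weight."""
    p = np.exp(-np.sum((x - m) ** 2, axis=1) / (2 * s ** 2))
    g = p / p.sum()          # g_h = p_h / sum_j p_j
    return w @ g
```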
Rule-Based Knowledge
- Prior rules often give localized solutions. E.g.,

IF ((x_1 ≈ a) AND (x_2 ≈ b)) OR (x_3 ≈ c) THEN y = 0.1

can be encoded as

p_1 = \exp\left( \frac{-(x_1 - a)^2}{2 s_1^2} \right) \exp\left( \frac{-(x_2 - b)^2}{2 s_2^2} \right) \quad \text{with } w_1 = 0.1

p_2 = \exp\left( \frac{-(x_3 - c)^2}{2 s_3^2} \right) \quad \text{with } w_2 = 0.1
2. Learning after Localizing
Hybrid Learning
- Use unsupervised techniques to learn the centers (and spreads).
- Learn the second-layer weights by supervised gradient descent.
Fully Supervised
Training both levels at once:

E(\{\vec{m}_h, s_h, w_{ih}\}_{i,h} \mid X) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2

y_i^t = \sum_{h=1}^H w_{ih} p_h^t + w_{i0}

\Delta w_{ih} = \eta \sum_t (r_i^t - y_i^t) \, p_h^t

\Delta m_{hj} = \eta \sum_t \left[ \sum_i (r_i^t - y_i^t) w_{ih} \right] p_h^t \, \frac{x_j^t - m_{hj}}{s_h^2}

\Delta s_h = \eta \sum_t \left[ \sum_i (r_i^t - y_i^t) w_{ih} \right] p_h^t \, \frac{\|\vec{x}^t - \vec{m}_h\|^2}{s_h^3}
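A sketch of one stochastic-gradient step implementing the update equations above on a single example; the array shapes and the bias update (w_i0 treated like a weight on a constant basis function) are assumptions.

```python
import numpy as np

def rbf_supervised_step(x, r, m, s, W, w0, eta=0.01):
    """One fully supervised step. m: centers [H, dim]; s: spreads [H];
    W: output weights [outputs, H]; w0: biases [outputs]."""
    d2 = np.sum((x - m) ** 2, axis=1)              # ||x - m_h||^2
    p = np.exp(-d2 / (2 * s ** 2))                 # p_h
    err = r - (W @ p + w0)                         # r_i - y_i
    back = W.T @ err                               # sum_i (r_i - y_i) w_ih
    W += eta * np.outer(err, p)                    # Delta w_ih
    w0 += eta * err                                # bias update (assumed)
    m += eta * (back * p / s ** 2)[:, None] * (x - m)    # Delta m_hj
    s += eta * back * p * d2 / s ** 3                     # Delta s_h
    return m, s, W, w0
```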
Mixture of Experts
- In an RBF network, each local fit is a constant, w_{ih}.
- In MoE, each local fit is a linear function of x, a "local expert":

w_{ih}^t = \vec{v}_{ih}^T \vec{x}^t

- The g_h form a gating network.
Gating
- The gating network selects a mixture of models from the local experts (w_h).

Radial gating:

g_h^t = \frac{\exp\left( -\|\vec{x}^t - \vec{m}_h\|^2 / 2 s_h^2 \right)}{\sum_j \exp\left( -\|\vec{x}^t - \vec{m}_j\|^2 / 2 s_j^2 \right)}

Softmax gating:

g_h^t = \frac{\exp(\vec{m}_h^T \vec{x}^t)}{\sum_j \exp(\vec{m}_j^T \vec{x}^t)}
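A sketch of the MoE forward pass with softmax gating (radial gating would substitute the normalized p_h); the array shapes are assumptions.

```python
import numpy as np

def moe_predict(x, V, M):
    """Cooperative mixture-of-experts output.
    V: expert weights [outputs, H, dim] (w_ih = v_ih^T x)
    M: gating weights [H, dim]          (softmax gating)."""
    z = M @ x
    g = np.exp(z - z.max())
    g /= g.sum()                 # g_h
    w = V @ x                    # local expert outputs, shape [outputs, H]
    return w @ g                 # y_i = sum_h g_h * w_ih
```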
Cooperative MoE

E(\{\vec{m}_h, s_h, w_{ih}\}_{i,h} \mid X) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2

\Delta \vec{v}_{ih} = \eta \sum_t (r_i^t - y_i^t) \, g_h^t \, \vec{x}^t

\Delta m_{hj} = \eta \sum_t \sum_i (r_i^t - y_i^t)(w_{ih}^t - y_i^t) \, g_h^t \, x_j^t
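A sketch of one cooperative MoE step for a single example, assuming softmax gating so that the m_h are the gating weights; shapes follow the forward-pass sketch above.

```python
import numpy as np

def moe_cooperative_step(x, r, V, M, eta=0.1):
    """One cooperative MoE update on (x, r).
    V: expert weights [outputs, H, dim]; M: gating weights [H, dim]."""
    z = M @ x
    g = np.exp(z - z.max())
    g /= g.sum()                                   # g_h
    w = V @ x                                      # w_ih = v_ih^T x
    y = w @ g                                      # y_i
    err = r - y                                    # r_i - y_i
    # Delta v_ih = eta * (r_i - y_i) * g_h * x
    V += eta * err[:, None, None] * g[None, :, None] * x[None, None, :]
    # Delta m_h = eta * [sum_i (r_i - y_i)(w_ih - y_i)] * g_h * x
    coef = (err @ (w - y[:, None])) * g
    M += eta * coef[:, None] * x[None, :]
    return V, M
```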
Cooperative & Competitive MoE

Cooperative:

\Delta \vec{v}_{ih} = \eta \sum_t (r_i^t - y_i^t) \, g_h^t \, \vec{x}^t

\Delta m_{hj} = \eta \sum_t \sum_i (r_i^t - y_i^t)(w_{ih}^t - y_i^t) \, g_h^t \, x_j^t

Competitive:

\Delta \vec{v}_{ih} = \eta \sum_t (r_i^t - y_i^t) \, f_h^t \, \vec{x}^t

\Delta \vec{m}_h = \eta \sum_t (f_h^t - g_h^t) \, \vec{x}^t

f_h^t is the posterior probability of unit h, taking both the input and the output into account.
Cooperative vs. Competitive
- Cooperative is generally more accurate: the models overlap, giving a smoother fit.
- Competitive generally learns faster: generally only one expert at a time is active.