  1. Magnifying (unknown) rare clusters to increase the chance of detection, using unsupervised learning. Erzsébet Merényi, Department of Statistics and Department of Electrical and Computer Engineering, Rice University, Houston, Texas. erzsebet@rice.edu. Finding Rare Patterns, DarkMachines Workshop, April 9, 2019.

  2. Learning Without a Teacher (unsupervised learning)
     [Diagram: a Learner with input training patterns (representative instances x_i ∈ X) and output training patterns (representative instances y_i ∈ Y corresponding to each x_i).]

  3. Learning Without a Teacher (unsupervised learning)
     [Diagram: the Learner maps input training patterns (representative instances x_i ∈ X) to a model of the input space.]
     • No (explicit) cost function
     • An unsupervised learner captures some internal characteristics of the input data: structure, mixing components / latent variables, ...
       – Ex: clusters
       – Ex: principal components
       – Ex: independent components
     • Best for discovery: model-free
     • Some "model-free" methods have implicit assumptions

  4. Self-Organizing Map: model-free structure learner
     Machine learning analog of biological neural maps in the brain.
     Formation of the basic (Kohonen) SOM:
     • Input pattern (e.g., a spectrum), presented through the input buffer: x = (x_1, x_2, ..., x_d) ∈ M ⊂ R^d
     • Weight vector of neuron j (prototype j) on the SOM lattice: w_j = (w_j1, w_j2, ..., w_jd), j = 1, ..., P
     Learning: cycle through steps 1 and 2 many times.
     1. Competition: select a pattern x randomly; find the winning neuron c as
        c(x) = arg min_j ||x - w_j||, j = 1, ..., P (Euclidean distance in the data space M ⊂ R^d).
     2. Synaptic weight adaptation / cooperation:
        w_j(t+1) = w_j(t) + a(t) h_j,c(x)(t) (x - w_j(t))
        for all w_j in the influence region of node c in the SOM lattice, prescribed by h_j,c(x)(t).
        h(t) is most often a Gaussian centered on node c:
        h_j,c(x)(t) = exp(-(c - j)^2 / σ(t)^2), with the distance between c and j measured as the Manhattan distance in the SOM lattice.
     (A minimal code sketch of one learning cycle follows below.)
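The two steps above condense into a few lines of code. This is only a minimal sketch, assuming a rectangular lattice stored as a (P, 2) array of grid coordinates and NumPy arrays for the data; the function and variable names (ksom_step, grid, lr, sigma) are illustrative, not from the talk.

```python
import numpy as np

def ksom_step(x, W, grid, lr, sigma):
    """One basic (Kohonen) SOM learning cycle: competition + weight adaptation.
    x     : (d,) input pattern drawn at random from the data
    W     : (P, d) prototype (weight) vectors w_j, updated in place
    grid  : (P, 2) lattice coordinates of the P neurons
    lr    : learning rate a(t), decreased over time by the caller
    sigma : neighborhood width sigma(t), decreased over time by the caller
    """
    # 1. Competition: c(x) = argmin_j ||x - w_j||  (Euclidean distance in data space)
    c = int(np.argmin(np.linalg.norm(W - x, axis=1)))

    # 2. Cooperation: Gaussian neighborhood h_{j,c(x)} centered on the winner,
    #    with Manhattan distance between lattice positions, as on the slide.
    d_lattice = np.abs(grid - grid[c]).sum(axis=1)
    h = np.exp(-d_lattice**2 / sigma**2)

    # Adaptation: w_j(t+1) = w_j(t) + a(t) h_{j,c(x)}(t) (x - w_j(t))
    W += lr * h[:, None] * (x - W)
    return c
```

Training then amounts to drawing random patterns and calling ksom_step many times while gradually shrinking lr and sigma.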

  5. Self-Organizing Map: model-free structure learner
     (Same basic Kohonen SOM formation as on slide 4.) Two simultaneous actions:
     • Adaptive Vector Quantization (n-D binning): puts the prototypes in the "right" locations, encoding salient properties of the data distribution.
     • Ordering of the prototypes on the SOM grid according to their similarities: expresses the topology on a low-dimensional lattice.
     Finding the prototype groups: post-processing, i.e., segmentation of the SOM based on the SOM's knowledge (both the summarized distribution and the topology relations); an illustrative sketch follows below.
     Summarization of N data vectors by O(sqrt(N)) prototypes.
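The talk does not spell out a specific segmentation algorithm on this slide, and the methods used in the cited work also exploit the SOM's topology information. Purely as an illustrative stand-in, the learned prototypes can be grouped with ordinary hierarchical clustering and each input labeled by its winning prototype's group:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def segment_som(W, X, n_clusters):
    """Illustrative SOM post-processing (not the author's method):
    cluster the P learned prototypes, then label each of the N data
    vectors by the cluster of its best-matching prototype."""
    Z = linkage(W, method="ward")                               # group the prototypes
    proto_labels = fcluster(Z, n_clusters, criterion="maxclust")
    winners = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2),
                        axis=1)                                  # BMU of each input
    return proto_labels[winners]
```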

  6. Map magnification in SOMs (magnification of Vector Quantizers in general)
     The pdfs of the SOM weight vectors (VQ prototypes) and of the inputs are related by
        Q(w) = const · P(w)^α
     where Q(w) is the pdf of the prototype vectors, P(w) is the pdf of the input vectors, and α is the magnification exponent, an inherent property of a given Vector Quantizer (Zador, 1982; Bauer, Der, and Herrmann, 1996).

  7. What does α mean?
     If the data dimensionality is d:
     • α = 1: equiprobabilistic mapping (maximum-entropy mapping, the information-theoretical optimum)
     • α = d/(d+2): minimum-MSE-distortion quantization
     • α = d/(d+p): minimum distortion in the p norm
     • α < 0: enlarges the representation of low-frequency inputs
     Kohonen's SOM (KSOM) attains α = 2/3 (under certain conditions) (Ritter and Schulten, 1986); not ideal by any of the above measures. The Conscience SOM (CSOM) attains α = 1 (D. DeSieno, 1988). The α of KSOM or CSOM cannot be changed (it is not a parameter of the algorithm).

  8. BDH: modification of KSOM to allow control of α (Bauer, Der, and Herrmann, 1996)
     KSOM learning rule: w_j(t+1) = w_j(t) + ε(t) h_j,r(v)(t) (v - w_j(t)), where r(v) is the winner index and ε(t) is a time-decreasing learning rate.
     Idea: modify the learning rate ε(t) in KSOM to force the local adaptabilities to depend on the input density P at the lattice position r of prototype w_r. Require ε_r = ε_0 P(w_r)^m, where m is a free parameter that will allow control of α.
     How to do this when P(w_r) is unknown? Use the information already acquired by the SOM and exploit P(w_r) ≈ Q(w_r) P'(r), where P'(r) is the winning probability of the neuron at r.

  9. Approximate Q(w_r) and P'(r) by quantities the SOM has learnt so far
     Compute P(w_r) ≈ Q(w_r) P'(r) from:
     • Q(w_r) ≈ 1/vol, where vol is the volume of the Voronoi polyhedron of w_r, and vol ≈ |v - w_r|^d
     • P'(r) ≈ 1/Δt_r, where Δt_r ≈ (present t value - last time neuron r won)
     Substituting into P(w_r) ≈ Q(w_r) P'(r) gives the local learning rate
        ε_r = ε_0 P(w_r)^m ≈ ε_0 (|v - w_r|^d · Δt_r)^(-m)     (1)
     Update the weight vectors (prototypes) of ALL SOM lattice neighbors of the winner by using the ε_r of the winning neuron.
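Putting the two approximations together, here is a minimal sketch of how the local rate of eq. (1) might be computed during training; the bookkeeping (a per-neuron record of when it last won, the current step t) and the small guard constant are assumptions for illustration only.

```python
import numpy as np

def bdh_local_rate(x, w_c, d, t, last_won_c, eps0, m):
    """Local BDH learning rate eps_r for the current winner c (eq. (1) sketch).
    Approximations from the slide:
      Q(w_r) ~ 1 / vol,  vol ~ |v - w_r|^d      (Voronoi-volume proxy)
      P'(r)  ~ 1 / (t - last time neuron r won)
      P(w_r) ~ Q(w_r) * P'(r)
    so  eps_r = eps0 * P(w_r)^m ~ eps0 * (|v - w_r|^d * dt_r)^(-m).
    """
    dist = np.linalg.norm(x - w_c)           # |v - w_r| for the winning neuron
    dt = max(t - last_won_c, 1)              # steps since the winner last won
    p_est = 1.0 / (dist**d * dt + 1e-12)     # density estimate P(w_r)
    return eps0 * p_est**m
```

The returned ε_r then replaces the global ε(t) in the weight update for all lattice neighbors of the winner, as stated on the slide.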

  10. Controlling α through m in the learning-rate formula
     Given α = 2/3 for KSOM, it can be shown that a "desired" SOM magnification with exponent α' is related to m as
        Q(w) = const · P(w)^α' = const · P(w)^((2/3)(m+1))
     Now we have a free parameter to control α.
     • EXAMPLE: to achieve the maximum-entropy mapping we want α' = 1; α' = (2/3)(m+1) = 1, so set m = 3/2 - 1 = 0.5 in eq. (1).
     • EXAMPLE: to achieve α' = -1 (negative magnification), set m = -3/2 - 1 = -2.5.
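The relation α' = (2/3)(m+1) inverts to m = α'/(2/3) - 1; a one-line helper (illustrative naming) reproduces the slide's two examples:

```python
def bdh_m_for_alpha(alpha_desired, alpha_ksom=2.0 / 3.0):
    """Free parameter m that yields a desired magnification exponent alpha',
    from alpha' = alpha_ksom * (m + 1)  =>  m = alpha' / alpha_ksom - 1."""
    return alpha_desired / alpha_ksom - 1.0

print(bdh_m_for_alpha(1.0))   # 0.5  -> maximum-entropy mapping
print(bdh_m_for_alpha(-1.0))  # -2.5 -> negative magnification
```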

  11. Limitations of the BDH algorithm
     Theory guarantees success only for
     1. 1-D input data, or
     2. n-D data, if and only if P(v) = P(v_1) P(v_2) ... P(v_n) (i.e., the data are independent in the different dimensions).
     Cases 1 and 2: "allowed" data. Everything else: "forbidden" data.
     Central question: can BDH be used for "forbidden" data? Carefully designed controlled experiments suggest YES (Merényi, Jain, Villmann, IEEE TNN 2007).

  12. Magnification control for higher-dimensional data I: noiseless, 6-D, 5-class synthetic data cube
     128 × 128 pixel image where a 6-D vector is associated with each pixel (16,384 6-D patterns). 5 classes:
        Class | No. of inputs
        A     | 4095
        U     | 1  (rare class)
        C     | 4096
        E     | 4096
        K     | 4096
     Pairwise correlation coefficients range from 0.004 to 0.9924, i.e., "forbidden" data. (Merényi et al., IEEE TNN 2007)
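The exact class signatures are not reproduced here. Purely as an illustrative stand-in (made-up signatures and spatial layout, NOT the data of Merényi et al. 2007), a noiseless test cube with the same class counts, including a single-pixel rare class, could be assembled like this:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 128
d = 6

# Hypothetical, correlated 6-D class signatures (illustrative only).
base = rng.random(d)
signatures = {name: base + 0.05 * k * rng.standard_normal(d)
              for k, name in enumerate("AUCEK")}

labels = np.full((H, W), "A", dtype="U1")     # class A: columns 0-31 (minus 1 px)
labels[:, 32:64] = "C"
labels[:, 64:96] = "E"
labels[:, 96:] = "K"
labels[7, 7] = "U"                            # the rare, single-pixel class U

cube = np.stack([[signatures[c] for c in row] for row in labels])  # (128, 128, 6)
X = cube.reshape(-1, d)                       # 16,384 noiseless 6-D patterns
```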

  13. SOM visualization for >3-D data
     [Figure: 128 × 128 px image data cube with a 6-D spectrum (feature vector) at each pixel location; 5 synthetic, noiseless spectral classes (A, C, E, K, and the 1-px class U) with their class signatures; weight vectors of a 10 × 10 KSOM after learning. Merényi et al., IEEE TNN 2007.]

  14. SOM visualization for >3-D data
     [Figure: the same 128 × 128 px image of 6-D spectra (5 synthetic, noiseless spectral classes, including the 1-px class U) and the weight vectors of the 10 × 10 KSOM after learning. Merényi et al., IEEE TNN 2007.]

  15. SOM learning without and with magnification I: noiseless, 6-D, 5-class synthetic data cube
     • KSOM (no magnification): only 1 PE represents the rare class U (PE = Processing Element = neuron).
     • BDH with α_desired = -0.8: U is now represented by 10 PEs!
     (Merényi et al., IEEE TNN 2007)
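How many PEs "represent" a class can be checked on any labeled test set by assigning each prototype to the majority class of the inputs it wins; a small illustrative utility (the names and the majority-vote criterion are assumptions, not the paper's exact bookkeeping):

```python
import numpy as np
from collections import Counter

def pes_per_class(W, X, labels):
    """Count the prototypes (PEs) representing each class: a PE is credited to
    the majority class among the inputs for which it is the best-matching unit."""
    winners = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
    labels = np.asarray(labels)
    owners = []
    for j in range(len(W)):
        mapped = labels[winners == j]
        if mapped.size:
            owners.append(Counter(mapped.tolist()).most_common(1)[0][0])
    return Counter(owners)

# e.g. pes_per_class(W, X, labels.ravel())["U"] -> number of PEs representing class U
```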

  16. Magnification control for higher-dimensional data II: noiseless, 6-D, 20-class synthetic data set
     128 × 128 pixel image where each pixel is a 6-D vector (16,384 6-D patterns). 20 classes:
        Class                            | No. of inputs
        A, B, D, E, G, H, K, L, N, O, P  | 1024 each
        C                                | 1023
        F                                | 1008
        I                                | 979
        J                                | 844
        M                                | 924
        Q                                | 16
        R                                | 1
        S                                | 100
        T                                | 225
     Pairwise correlation coefficients range from 0.008 to 0.6, i.e., "forbidden" data. (Merényi et al., IEEE TNN 2007)
