Density-Based Fuzzy Clustering as a First Step to Learning Rules: Challenges and Solutions

Gözde Ulutagay (1) and Vladik Kreinovich (2)

(1) Department of Industrial Engineering, Izmir University, Izmir, Turkey, gozde.ulutagay@gmail.com
(2) University of Texas at El Paso, El Paso, Texas 79968, USA, vladik@utep.edu
1. Clustering is How We Humans Make Decisions

• Most algorithms for control and decision making take, as input, the values of the input parameters.
• In contrast, we humans normally use only a category to which a value belongs; e.g., when we select a place to eat:
  – instead of exact prices, we consider whether the restaurant is cheap, medium, or expensive;
  – instead of the details of the food, we check whether it is Mexican, Chinese, etc.
• When we select a hotel, we take into account how many stars it has and whether it is within walking distance of the conference site.
• First, we cluster possible situations, i.e., divide them into a few groups.
• Then, we make a decision based on the group to which the current situation belongs.
2. Clustering is a Natural First Step to Learning the Rules

• Computers process data much faster than we humans do.
• However, in tasks such as face recognition, we are still much better than the best known computer programs.
• It is thus reasonable to emulate the way we humans make the corresponding decisions; e.g.:
  – first cluster possible situations,
  – and then make a decision based on the cluster containing the current situation.
3. Clustering: Ideal Case

• Each known situation is described by the values x = (x_1, ..., x_n) of n known quantities.
• When we have many situations, we can talk about the density d(x): the number of situations per unit volume.
• Clusters are separated by voids: there are cats, there are dogs, but there is no continuous transition between them.
• Within each cluster, we have d(x) > 0.
• Outside the clusters, we have d(x) = 0. So:
  – once we know the density d(x) at each point x,
  – we can find each cluster as a connected component of the set {x : d(x) > 0}.
4. Clustering: A More Realistic Case

• We often have objects in between clusters.
• For example, coughing and sneezing patients can be classified into cold, allergy, flu, etc.
• However, there are also rare diseases.
• Let t be the density of such rare cases.
  – If d(x) < t, then most probably x is not in any major cluster.
  – If d(x) > t, then at least some of the nearby examples come from one of the clusters that we are trying to form.
• Resulting clustering algorithm (a small code sketch follows this slide):
  – we select a threshold t, and
  – we find each cluster as a connected component of the set {x : d(x) ≥ t}.
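The thresholded connected-component idea is easy to prototype once the density has been evaluated on a regular grid. Below is a minimal Python sketch, not the authors' implementation; the 2-D grid, the 4-neighbor connectivity, and the name label_superlevel_set are our own illustrative assumptions.

    import numpy as np
    from collections import deque

    def label_superlevel_set(density, t):
        """Label connected components of {x : density(x) >= t} on a 2-D grid.

        density: 2-D array of density values on a regular grid.
        t:       threshold separating cluster points from rare/background points.
        Returns an integer array: 0 = below threshold, 1, 2, ... = cluster labels.
        """
        above = density >= t
        labels = np.zeros(density.shape, dtype=int)
        current = 0
        for start in zip(*np.nonzero(above)):
            if labels[start]:
                continue                      # already assigned to some cluster
            current += 1                      # start a new cluster
            labels[start] = current
            queue = deque([start])
            while queue:                      # breadth-first flood fill
                i, j = queue.popleft()
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < above.shape[0] and 0 <= nj < above.shape[1]
                            and above[ni, nj] and labels[ni, nj] == 0):
                        labels[ni, nj] = current
                        queue.append((ni, nj))
        return labels

For example, label_superlevel_set(density, t) returns 0 for below-threshold grid cells and 1, 2, ... for the cells of each cluster.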
5. How to Estimate the Density d(x)

• In practice, we only have finitely many examples x^(1), ..., x^(N).
• The measured values x^(j) are, in general, different from the actual (unknown) values x.
• Let ρ(Δx) be the probability density of the measurement errors.
• Then, for each j, the probability density of the actual values is ρ(x^(j) − x).
• All N observations are equally probable, i.e., p(x^(j)) = 1/N, so

  d(x) = p(x^(1)) · ρ(x^(1) − x) + ... + p(x^(N)) · ρ(x^(N) − x) = (1/N) · Σ_{j=1}^{N} ρ(x^(j) − x).

• This formula is known as the Parzen window.
• The corresponding function ρ(x) is known as a kernel.
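A minimal sketch of this Parzen-window estimate with a Gaussian kernel is given below; the half-width sigma is a user-chosen parameter, and the function name parzen_density is our own (hypothetical) choice.

    import numpy as np

    def parzen_density(x, samples, sigma=1.0):
        """Parzen-window estimate d(x) = (1/N) * sum_j rho(x^(j) - x),
        with a Gaussian kernel rho of half-width sigma.

        x:       point (length-n array) at which the density is evaluated.
        samples: array of shape (N, n) with the observed examples x^(1), ..., x^(N).
        """
        samples = np.asarray(samples, dtype=float)
        diff = samples - np.asarray(x, dtype=float)      # x^(j) - x for every j
        sq_dist = np.sum(diff ** 2, axis=1)
        n = samples.shape[1]
        norm = (2.0 * np.pi * sigma ** 2) ** (n / 2.0)   # Gaussian normalization
        return np.mean(np.exp(-sq_dist / (2.0 * sigma ** 2))) / norm

    # example: density at the origin, estimated from 100 hypothetical 2-D samples
    # parzen_density([0.0, 0.0], np.random.randn(100, 2), sigma=0.5)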
6. Resulting Clustering Algorithm

• First, we select a kernel function ρ(x).
• Then, based on the observed examples x^(1), x^(2), ..., x^(N), we form the density function

  d(x) = (1/N) · Σ_{j=1}^{N} ρ(x^(j) − x).

• After that, we select a threshold t.
• We find the clusters as the connected components of the set {x : d(x) ≥ t}.
• For imprecise ("fuzzy") expert estimates, instead of probabilities, we have membership functions.
• So, we get similar formulas.
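Putting the steps together, one possible illustrative pipeline, relying on the parzen_density and label_superlevel_set sketches above and on assumed values of sigma, t, and the grid resolution, evaluates the density on a 2-D grid, thresholds it, labels the connected components, and assigns each observed example to a component:

    import numpy as np

    def cluster_samples(samples, sigma, t, grid_size=100):
        """Density-based clustering of 2-D samples, following the algorithm above:
        (1) Parzen-window density on a grid, (2) threshold at t,
        (3) connected components, (4) assign each sample to a component.
        Returns one label per sample: 0 = "rare case", 1, 2, ... = clusters."""
        samples = np.asarray(samples, dtype=float)
        lo = samples.min(axis=0) - 3.0 * sigma
        hi = samples.max(axis=0) + 3.0 * sigma
        xs = np.linspace(lo[0], hi[0], grid_size)
        ys = np.linspace(lo[1], hi[1], grid_size)
        density = np.array([[parzen_density((x, y), samples, sigma) for y in ys]
                            for x in xs])
        labels = label_superlevel_set(density, t)        # clusters on the grid
        # snap each sample to a nearby grid point and read off that point's label
        ix = np.clip(np.searchsorted(xs, samples[:, 0]), 0, grid_size - 1)
        iy = np.clip(np.searchsorted(ys, samples[:, 1]), 0, grid_size - 1)
        return labels[ix, iy]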
7. Discussion

• Empirical results:
  – The best kernel is the Gaussian function ρ(x) ∼ exp(−const · x²).
  – The best threshold t is the one for which the clustering is the most robust to the choice of t (one possible reading of this criterion is sketched below).
• Our 1st challenge is to provide a theoretical explanation for these empirical results.
• 2nd challenge: take into account that some observations may be erroneous.
• 3rd challenge: clustering algorithms should return degrees of belonging to different clusters.
• 4th challenge (hierarchy): animals should first be classified into dangerous and harmless, and only then into finer categories.
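One way to operationalize the robustness criterion for t, offered only as our own assumption about what "most robust" could mean in practice, is to sweep candidate thresholds and pick the middle of the longest range over which the number of clusters does not change:

    import numpy as np

    def most_robust_threshold(density, thresholds, count_clusters):
        """Pick the threshold in the middle of the longest run of candidate
        t-values that all yield the same number of clusters.

        density:        2-D grid of density values.
        thresholds:     increasing 1-D array of candidate t-values.
        count_clusters: function (density, t) -> number of clusters, e.g.
                        lambda d, t: label_superlevel_set(d, t).max().
        """
        counts = [count_clusters(density, t) for t in thresholds]
        best_start, best_len, start = 0, 0, 0
        for i in range(1, len(counts) + 1):
            if i == len(counts) or counts[i] != counts[start]:
                if i - start > best_len:                 # longer plateau found
                    best_start, best_len = start, i - start
                start = i
        return thresholds[best_start + best_len // 2]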
8. Why Gaussian Kernel: A Solution to the 1st Part of the 1st Challenge

• The Gaussian distribution of the measurement error indeed occurs frequently in practice.
• This empirical fact has a known explanation:
  – a measurement error usually consists of a large number of small independent components, and,
  – according to the Central Limit Theorem:
    ∗ the distribution of the sum of a large number of small independent components
    ∗ is close to Gaussian.
• Expert inaccuracy is also caused by a large number of relatively small independent factors.
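This Central Limit Theorem argument is easy to check numerically; the small simulation below (with an arbitrary choice of uniform error components) verifies that the sum of many small independent errors has nearly Gaussian skewness and kurtosis:

    import numpy as np

    rng = np.random.default_rng(0)

    # each simulated "measurement error" = sum of 100 small independent components,
    # here taken (arbitrarily) to be uniform on [-0.01, 0.01]
    components = rng.uniform(-0.01, 0.01, size=(100_000, 100))
    errors = components.sum(axis=1)

    # standardized moments: a Gaussian has skewness 0 and kurtosis 3
    z = (errors - errors.mean()) / errors.std()
    print("skewness:", np.mean(z ** 3))   # close to 0
    print("kurtosis:", np.mean(z ** 4))   # close to 3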
9. Alternative Explanation

• We start with the discrete empirical distribution d_N(x), in which each of the N values x^(j) occurs with equal probability 1/N.
• We "smoothen" d_N(x) by convolving it with the kernel function ρ(x):

  d(x) = ∫ d_N(y) · ρ(x − y) dy.

• This works if we properly select the half-width σ of the kernel:
  – if we select a very narrow half-width, then each original point x^(j) becomes its own cluster;
  – if we select a very wide half-width, then we end up with a single cluster.
• The choice of this half-width is usually performed empirically:
  – we start with a small value of the half-width, and
  – we gradually increase it.
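The effect of the half-width can be seen directly by re-running the thresholded clustering for several values of sigma on the same data. The self-contained 1-D sketch below, with hypothetical two-clump data and an arbitrarily chosen threshold, counts how many clusters result:

    import numpy as np

    def count_1d_clusters(samples, sigma, t, n_grid=2000):
        """Count connected components of {x : d(x) >= t} for a 1-D
        Gaussian-kernel Parzen density with half-width sigma."""
        samples = np.asarray(samples, dtype=float)
        grid = np.linspace(samples.min() - 1.0, samples.max() + 1.0, n_grid)
        d = np.mean(np.exp(-(grid[None, :] - samples[:, None]) ** 2
                           / (2.0 * sigma ** 2)), axis=0)
        d /= np.sqrt(2.0 * np.pi) * sigma
        above = d >= t
        # a new cluster starts wherever "above" switches from False to True
        return int(above[0]) + int(np.sum(above[1:] & ~above[:-1]))

    # hypothetical data: two well-separated clumps
    rng = np.random.default_rng(1)
    samples = np.concatenate([rng.normal(-2, 0.3, 50), rng.normal(2, 0.3, 50)])
    for sigma in (0.01, 0.3, 3.0):
        print(sigma, count_1d_clusters(samples, sigma, t=0.02))
    # narrow sigma -> the data splits into many small clusters;
    # wide sigma -> a single cluster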
10. Alternative Explanation (cont-d)

• Since the consecutive kernel functions are close to each other, the resulting convolutions are also close.
• So, it is computationally efficient to apply a small modifying convolution to the previous convolution result.
• The resulting convolution is thus the result of applying a large number of minor convolutions.
• From the mathematical viewpoint, a convolution means adding an independent random variable.
• Applying a large number of convolutions is therefore equivalent to adding many small random variables.
• Thus, it is equivalent to adding a Gaussian variable, i.e., to a Gaussian convolution.
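This equivalence can also be checked numerically: repeatedly convolving a narrow non-Gaussian (box) kernel with itself quickly produces a profile that is almost indistinguishable from a Gaussian with the same mean and standard deviation. The short check below is an illustration, not a proof:

    import numpy as np

    # start from a narrow, clearly non-Gaussian (box) kernel
    box = np.ones(5) / 5.0

    kernel = box.copy()
    for _ in range(30):                      # compose 30 small convolutions
        kernel = np.convolve(kernel, box)
    kernel /= kernel.sum()

    # compare with a Gaussian of the same mean and standard deviation
    x = np.arange(kernel.size)
    mean = np.sum(x * kernel)
    std = np.sqrt(np.sum((x - mean) ** 2 * kernel))
    gauss = np.exp(-(x - mean) ** 2 / (2.0 * std ** 2))
    gauss /= gauss.sum()
    print("max deviation from the Gaussian:", np.abs(kernel - gauss).max())  # small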