
Jeffreys centroids: A closed-form expression for positive histograms and a guaranteed tight approximation for frequency histograms. Frank Nielsen (Frank.Nielsen@acm.org), Sony Computer Science Laboratories, Inc. April 2013.


  1. Jeffreys centroids: A closed-form expression for positive histograms and a guaranteed tight approximation for frequency histograms. Frank Nielsen, Frank.Nielsen@acm.org, Sony Computer Science Laboratories, Inc. April 2013.

  2. Why histogram clustering?
Task: classify documents into categories using the Bag-of-Words (BoW) modeling paradigm [3, 6]:
◮ Define a word dictionary, and
◮ Represent each document by a word-count histogram.
Centroid-based k-means clustering [1]:
◮ Cluster document histograms to learn categories,
◮ Build visual vocabularies by quantizing image features: Compressed Histogram of Gradient descriptors [4].
→ histogram centroids.
Notation: $w_h = \sum_{i=1}^d h^i$ is the cumulative sum of the bin values, and $\tilde{\cdot}$ is the normalization operator.
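As a concrete illustration of the BoW representation and of the normalization operator (a minimal sketch, not from the slides; the helper names and the toy vocabulary are mine):

    from collections import Counter
    import numpy as np

    def word_count_histogram(tokens, vocabulary):
        # Positive histogram h: one bin per dictionary word; w_h is the sum of the bins.
        counts = Counter(tokens)
        return np.array([counts[w] for w in vocabulary], dtype=float)

    def normalize(h):
        # The "~" operator of the slides: h / w_h lies on the probability simplex.
        return h / h.sum()

    vocabulary = ["cat", "dog", "fish"]                        # toy dictionary
    h = word_count_histogram("cat dog dog fish cat".split(), vocabulary)
    print(h, normalize(h))                                     # [2. 2. 1.] [0.4 0.4 0.2]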

  3. Why Jeffreys divergence?
Distance between two frequency histograms $\tilde{p}$ and $\tilde{q}$: the Kullback-Leibler divergence, or relative entropy:
$KL(\tilde{p} : \tilde{q}) = H^{\times}(\tilde{p} : \tilde{q}) - H(\tilde{p})$
$H^{\times}(\tilde{p} : \tilde{q}) = \sum_{i=1}^d \tilde{p}^i \log \frac{1}{\tilde{q}^i}$ (cross-entropy)
$H(\tilde{p}) = H^{\times}(\tilde{p} : \tilde{p}) = \sum_{i=1}^d \tilde{p}^i \log \frac{1}{\tilde{p}^i}$ (Shannon entropy)
→ expected extra number of bits per datum that must be transmitted when using the "wrong" distribution $\tilde{q}$ instead of the true distribution $\tilde{p}$. $\tilde{p}$ is hidden by nature (and hypothesized), $\tilde{q}$ is estimated.
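A minimal numerical sketch of these definitions (not from the slides; it uses natural logarithms, so the values are in nats rather than bits, and the function names are mine):

    import numpy as np

    def cross_entropy(p, q):
        # H_x(p:q) = sum_i p_i * log(1/q_i)
        return float(np.sum(p * np.log(1.0 / q)))

    def kl(p, q):
        # KL(p:q) = H_x(p:q) - H(p), with H(p) = H_x(p:p)
        return cross_entropy(p, q) - cross_entropy(p, p)

    p = np.array([0.2, 0.3, 0.5])
    q = np.array([0.1, 0.4, 0.5])
    print(kl(p, q), kl(q, p))   # asymmetric: KL(p:q) != KL(q:p) in general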

  4. Why Jeffreys divergence?
When clustering histograms, all histograms play the same role → Jeffreys [8] divergence:
$J(p, q) = KL(p : q) + KL(q : p) = \sum_{i=1}^d (p^i - q^i) \log \frac{p^i}{q^i} = J(q, p)$
→ symmetrizes the KL divergence (also called the J-divergence, symmetrical Kullback-Leibler divergence, etc.).
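Continuing the sketch above (same assumptions), the symmetrized divergence in one line:

    def jeffreys(p, q):
        # J(p,q) = KL(p:q) + KL(q:p) = sum_i (p_i - q_i) * log(p_i / q_i)
        return float(np.sum((p - q) * np.log(p / q)))

    print(jeffreys(p, q), jeffreys(q, p))   # symmetric: both calls return the same value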

  5. Jeffreys centroids: frequency and positive centroids
A set $\mathcal{H} = \{h_1, ..., h_n\}$ of weighted histograms, with positive weights $\pi_j > 0$ such that $\sum_{j=1}^n \pi_j = 1$.
◮ Jeffreys positive centroid $c$:
$c = \arg\min_{x \in \mathbb{R}^d_+} \sum_{j=1}^n \pi_j J(h_j, x)$
◮ Jeffreys frequency centroid $\tilde{c}$:
$\tilde{c} = \arg\min_{x \in \Delta_d} \sum_{j=1}^n \pi_j J(\tilde{h}_j, x)$
$\Delta_d$: the probability $(d-1)$-dimensional simplex.
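The shared objective of both problems can be sketched as follows (reusing `jeffreys` from above; the function name and the array layout are mine). The positive centroid minimizes it over $\mathbb{R}^d_+$, the frequency centroid over the simplex:

    def jeffreys_objective(hists, weights, x):
        # J(H, x) = sum_j pi_j * J(h_j, x); hists is an (n, d) array, weights sum to 1.
        return float(sum(w * jeffreys(h, x) for h, w in zip(hists, weights)))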

  6. Prior work
◮ Histogram clustering wrt. the χ² distance [10]
◮ Histogram clustering wrt. the Bhattacharyya distance [11, 13]
◮ Histogram clustering wrt. the Kullback-Leibler distance as Bregman k-means clustering [1]
◮ Jeffreys frequency centroid [16] (Newton numerical optimization)
◮ Jeffreys frequency centroid as an equivalent symmetrized Bregman centroid [14]
◮ Mixed Bregman clustering [15]
◮ Smooth family of symmetrized KL centroids including the Jensen-Shannon centroids and the Jeffreys centroids in the limit case [12]

  7. Jeffreys positive centroid
$c = \arg\min_{x \in \mathbb{R}^d_+} J(\mathcal{H}, x) = \arg\min_{x \in \mathbb{R}^d_+} \sum_{j=1}^n \pi_j J(h_j, x)$
Theorem 1. The Jeffreys positive centroid $c = (c^1, ..., c^d)$ of a set $\{h_1, ..., h_n\}$ of $n$ weighted positive histograms with $d$ bins can be calculated component-wise exactly using the Lambert $W$ analytic function:
$c^i = \frac{a^i}{W(\frac{a^i}{g^i} e)}$,
where $a^i = \sum_{j=1}^n \pi_j h^i_j$ denotes the coordinate-wise arithmetic weighted mean and $g^i = \prod_{j=1}^n (h^i_j)^{\pi_j}$ the coordinate-wise geometric weighted mean.
Lambert analytic function [2]: $W(x) e^{W(x)} = x$ for $x \geq 0$.
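A minimal sketch of Theorem 1 (not the paper's reference code; it assumes SciPy's principal-branch Lambert W and strictly positive bin values):

    import numpy as np
    from scipy.special import lambertw

    def jeffreys_positive_centroid(hists, weights):
        # Theorem 1: c_i = a_i / W(a_i * e / g_i), with a and g the coordinate-wise
        # weighted arithmetic and geometric means of the positive histograms.
        hists = np.asarray(hists, dtype=float)   # shape (n, d), all entries > 0
        a = np.average(hists, axis=0, weights=weights)
        g = np.exp(np.average(np.log(hists), axis=0, weights=weights))
        return a / lambertw(a * np.e / g).real   # principal branch W_0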

  8. Jeffreys positive centroid (proof)
$\min_x \sum_{j=1}^n \pi_j J(h_j, x) = \min_x \sum_{j=1}^n \pi_j \sum_{i=1}^d (h^i_j - x^i)(\log h^i_j - \log x^i)$
$\equiv \min_x \sum_{i=1}^d \sum_{j=1}^n \pi_j (x^i \log x^i - x^i \log h^i_j - h^i_j \log x^i)$
$= \min_x \sum_{i=1}^d \left( x^i \log \frac{x^i}{g^i} - a^i \log x^i \right)$,
using $\log g^i = \sum_{j=1}^n \pi_j \log h^i_j$ and $a^i = \sum_{j=1}^n \pi_j h^i_j$.

  9. Jeffreys positive centroid (proof)
Coordinate-wise, minimize:
$\min_x \; x \log \frac{x}{g} - a \log x$
Setting the derivative to zero, we solve:
$\log \frac{x}{g} + 1 - \frac{a}{x} = 0$
and get
$x = \frac{a}{W(\frac{a}{g} e)}$
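A quick numeric check of this stationarity condition (illustrative only; the values of a and g are arbitrary):

    import numpy as np
    from scipy.special import lambertw

    a, g = 0.4, 0.3                          # any positive pair
    x = a / lambertw(a * np.e / g).real      # closed-form minimizer
    print(np.log(x / g) + 1.0 - a / x)       # ~0 up to floating-point error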

  10. Jeffreys frequency centroid: A guaranteed approximation
$\tilde{c} = \arg\min_{x \in \Delta_d} \sum_{j=1}^n \pi_j J(\tilde{h}_j, x)$
Relaxing $x$ from the probability simplex $\Delta_d$ to $\mathbb{R}^d_+$, we get the normalized approximation
$c' = \frac{c}{w_c}$, i.e. $c'^i = \frac{c^i}{w_c}$, with $c^i = \frac{a^i}{W(\frac{a^i}{g^i} e)}$.
Lemma 1. The cumulative sum $w_c$ of the bin values of the Jeffreys positive centroid $c$ of a set of frequency histograms is less than or equal to one: $0 < w_c \leq 1$.
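In code, the relaxed-then-renormalized approximation is one extra line on top of the positive-centroid sketch above (assuming the inputs are frequency histograms):

    def jeffreys_frequency_centroid_approx(freq_hists, weights):
        # c' = c / w_c: project the closed-form positive centroid back onto the
        # simplex; by Lemma 1, the normalization factor w_c is at most 1 here.
        c = jeffreys_positive_centroid(freq_hists, weights)
        return c / c.sum()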

  11. Proof of Lemma 1
From Theorem 1:
$w_c = \sum_{i=1}^d c^i = \sum_{i=1}^d \frac{a^i}{W(\frac{a^i}{g^i} e)}$
Arithmetic-geometric mean inequality: $a^i \geq g^i$.
Therefore $W(\frac{a^i}{g^i} e) \geq W(e) = 1$ and $c^i \leq a^i$. Thus
$w_c = \sum_{i=1}^d c^i \leq \sum_{i=1}^d a^i = 1$
(the last equality holds because each $\tilde{h}_j$ sums to one and $\sum_{j=1}^n \pi_j = 1$).

  12. Lemma 2
Lemma 2. For any positive histogram $x$ and frequency histogram $\tilde{h}$, we have
$J(x, \tilde{h}) = J(\tilde{x}, \tilde{h}) + (w_x - 1)(KL(\tilde{x} : \tilde{h}) + \log w_x)$,
where $w_x$ denotes the normalization factor ($w_x = \sum_{i=1}^d x^i$).
Averaging over a weighted set $\tilde{\mathcal{H}}$ of frequency histograms:
$J(x, \tilde{\mathcal{H}}) = J(\tilde{x}, \tilde{\mathcal{H}}) + (w_x - 1)(KL(\tilde{x} : \tilde{\mathcal{H}}) + \log w_x)$,
where $J(x, \tilde{\mathcal{H}}) = \sum_{j=1}^n \pi_j J(x, \tilde{h}_j)$ and $KL(\tilde{x} : \tilde{\mathcal{H}}) = \sum_{j=1}^n \pi_j KL(\tilde{x} : \tilde{h}_j)$ (with $\sum_{j=1}^n \pi_j = 1$).
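Lemma 2 is easy to verify numerically (an illustrative sketch reusing `kl` and `jeffreys` from above; the random data is mine):

    rng = np.random.default_rng(0)
    x = rng.uniform(0.1, 1.0, size=5)                  # positive histogram (unnormalized)
    h = rng.uniform(0.1, 1.0, size=5); h /= h.sum()    # frequency histogram
    w_x, x_t = x.sum(), x / x.sum()
    lhs = jeffreys(x, h)
    rhs = jeffreys(x_t, h) + (w_x - 1.0) * (kl(x_t, h) + np.log(w_x))
    print(lhs - rhs)                                   # ~0: the identity of Lemma 2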

  13. Proof of Lemma 2
Write $x^i = w_x \tilde{x}^i$. Then
$J(x, \tilde{h}) = \sum_{i=1}^d (w_x \tilde{x}^i - \tilde{h}^i) \log \frac{w_x \tilde{x}^i}{\tilde{h}^i}$
$= \sum_{i=1}^d \left( w_x \tilde{x}^i \log w_x + w_x \tilde{x}^i \log \frac{\tilde{x}^i}{\tilde{h}^i} - \tilde{h}^i \log w_x - \tilde{h}^i \log \frac{\tilde{x}^i}{\tilde{h}^i} \right)$
$= (w_x - 1) \log w_x + J(\tilde{x}, \tilde{h}) + (w_x - 1) \sum_{i=1}^d \tilde{x}^i \log \frac{\tilde{x}^i}{\tilde{h}^i}$
$= J(\tilde{x}, \tilde{h}) + (w_x - 1)(KL(\tilde{x} : \tilde{h}) + \log w_x)$,
since $\sum_{i=1}^d \tilde{h}^i = \sum_{i=1}^d \tilde{x}^i = 1$.

  14. Guaranteed approximation of $\tilde{c}$
Theorem 2. Let $\tilde{c}$ denote the Jeffreys frequency centroid and $\tilde{c}' = \frac{c}{w_c}$ the normalized Jeffreys positive centroid. Then the approximation factor $\alpha_{\tilde{c}'} = \frac{J(\tilde{c}', \tilde{\mathcal{H}})}{J(\tilde{c}, \tilde{\mathcal{H}})}$ satisfies $1 \leq \alpha_{\tilde{c}'} \leq \frac{1}{w_c}$ (with $w_c \leq 1$).

  15. Proof of Theorem 2
$J(c, \tilde{\mathcal{H}}) \leq J(\tilde{c}, \tilde{\mathcal{H}}) \leq J(\tilde{c}', \tilde{\mathcal{H}})$
From Lemma 2,
$J(\tilde{c}', \tilde{\mathcal{H}}) = J(c, \tilde{\mathcal{H}}) + (1 - w_c)(KL(\tilde{c}' : \tilde{\mathcal{H}}) + \log w_c)$,
and since $J(c, \tilde{\mathcal{H}}) \leq J(\tilde{c}, \tilde{\mathcal{H}})$:
$1 \leq \alpha_{\tilde{c}'} \leq 1 + \frac{(1 - w_c)(KL(\tilde{c}' : \tilde{\mathcal{H}}) + \log w_c)}{J(\tilde{c}, \tilde{\mathcal{H}})}$
Since $KL(\tilde{c}' : \tilde{\mathcal{H}}) = \frac{1}{w_c} KL(c : \tilde{\mathcal{H}}) - \log w_c$:
$\alpha_{\tilde{c}'} \leq 1 + \frac{1 - w_c}{w_c} \cdot \frac{KL(c : \tilde{\mathcal{H}})}{J(\tilde{c}, \tilde{\mathcal{H}})}$
Since $J(\tilde{c}, \tilde{\mathcal{H}}) \geq J(c, \tilde{\mathcal{H}})$ and $KL(c : \tilde{\mathcal{H}}) \leq J(c, \tilde{\mathcal{H}})$, we get
$\alpha_{\tilde{c}'} \leq \frac{1}{w_c}$.
When $w_c = 1$ the bound is tight.

  16. In practice...
$c$ is available in closed form → compute $w_c$, $KL(c, \tilde{\mathcal{H}})$, $J(c, \tilde{\mathcal{H}})$.
Bound the approximation factor $\alpha_{\tilde{c}'}$ as:
$\alpha_{\tilde{c}'} \leq 1 + \left( \frac{1}{w_c} - 1 \right) \frac{KL(c, \tilde{\mathcal{H}})}{J(c, \tilde{\mathcal{H}})} \leq \frac{1}{w_c}$
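A sketch of this data-dependent bound, reusing the helpers above (here $KL(c, \tilde{\mathcal{H}})$ is taken as $\sum_j \pi_j \sum_i c^i \log \frac{c^i}{\tilde{h}^i_j}$, as in the proof of Theorem 2; the function name is mine):

    def approx_factor_bound(freq_hists, weights):
        # Everything needed is available in closed form from the positive centroid c.
        c = jeffreys_positive_centroid(freq_hists, weights)
        w_c = c.sum()
        kl_cH = sum(w * kl(c, h) for h, w in zip(freq_hists, weights))
        j_cH = sum(w * jeffreys(c, h) for h, w in zip(freq_hists, weights))
        return 1.0 + (1.0 / w_c - 1.0) * kl_cH / j_cH   # never exceeds 1 / w_c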

  17. Fine approximation
From [16, 14], the minimization of the Jeffreys frequency centroid is equivalent to:
$\tilde{c} = \arg\min_{\tilde{x} \in \Delta_d} KL(\tilde{a} : \tilde{x}) + KL(\tilde{x} : \tilde{g})$
Lagrangian function enforcing $\sum_i \tilde{c}^i = 1$:
$\log \frac{\tilde{c}^i}{\tilde{g}^i} + 1 - \frac{\tilde{a}^i}{\tilde{c}^i} + \lambda = 0$
$\tilde{c}^i = \frac{\tilde{a}^i}{W(\frac{\tilde{a}^i}{\tilde{g}^i} e^{\lambda + 1})}$
$\lambda = -KL(\tilde{c} : \tilde{g}) \leq 0$
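A sketch of this λ-parameterized solution (assumptions: $\tilde{a}$ is the arithmetic mean of the frequency histograms and $\tilde{g}$ the normalized geometric mean, following the tilde notation of slide 2; the function name is mine):

    import numpy as np
    from scipy.special import lambertw

    def c_of_lambda(a_t, g_t, lam):
        # Per-coordinate Lagrangian solution: c_i(lambda) = a_i / W((a_i / g_i) * e^(lambda + 1))
        return a_t / lambertw((a_t / g_t) * np.exp(lam + 1.0)).real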

  18. Fine approximation: Bisection search
$\tilde{c}^i \leq 1 \;\Rightarrow\; \tilde{c}^i = \frac{\tilde{a}^i}{W(\frac{\tilde{a}^i}{\tilde{g}^i} e^{\lambda + 1})} \leq 1$
$\Rightarrow\; \lambda \geq \log(\tilde{g}^i e^{\tilde{a}^i}) - 1 \;\; \forall i$, hence $\lambda \in [\max_i \log(\tilde{g}^i e^{\tilde{a}^i}) - 1, \; 0]$
$s(\lambda) = \sum_{i=1}^d \tilde{c}^i(\lambda) = \sum_{i=1}^d \frac{\tilde{a}^i}{W(\frac{\tilde{a}^i}{\tilde{g}^i} e^{\lambda + 1})}$
The function $s$ is monotonically decreasing with $s(0) \leq 1$.
→ Bisection search for $s(\lambda^*) \simeq 1$ up to arbitrary precision.
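A sketch of the bisection search under the same assumptions as the previous snippet (normalized means $\tilde{a}$ and $\tilde{g}$; the tolerance and the function name are mine):

    def jeffreys_frequency_centroid(freq_hists, weights, tol=1e-12):
        freq_hists = np.asarray(freq_hists, dtype=float)
        a_t = np.average(freq_hists, axis=0, weights=weights)                # arithmetic mean
        g = np.exp(np.average(np.log(freq_hists), axis=0, weights=weights))
        g_t = g / g.sum()                                                    # normalized geometric mean
        lo = np.max(a_t + np.log(g_t)) - 1.0   # from the constraint c_i(lambda) <= 1
        hi = 0.0                               # s(0) <= 1
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if c_of_lambda(a_t, g_t, mid).sum() > 1.0:
                lo = mid                       # s is decreasing: a larger lambda is needed
            else:
                hi = mid
        return c_of_lambda(a_t, g_t, 0.5 * (lo + hi))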

  19. Experiments: Caltech-256
Caltech-256 [7]: 30607 images labeled into 256 categories (256 Jeffreys centroids).
Arbitrary floating-point precision: http://www.apfloat.org/
Veldhuis' approximation: $\tilde{c}'' = \frac{\tilde{a} + \tilde{g}}{2}$

      $\alpha_c$ (optimal positive)   $\alpha_{\tilde{c}'}$ (normalized approx.)   $w_c \leq 1$ (normalizing coeff.)   $\alpha_{\tilde{c}''}$ (Veldhuis' approx.)
avg   0.9648680345638155              1.0002205080964255                           0.9338228644308926                  1.065590178484613
min   0.906414219584823               1.0000005079528809                           0.8342819488534723                  1.0027707382095195
max   0.9956399220678585              1.0000031489541772                           0.9931975105809021                  1.3582296675397754

  20. Experiments: Synthetic data-sets
Random binary histograms.
$\alpha = \frac{J(\tilde{c}')}{J(\tilde{c})} \geq 1$
Performance: $\bar{\alpha} \sim 1.0000009$, $\alpha_{\max} \sim 1.00181506$, $\alpha_{\min} = 1.000000$.
Can we express a better worst-case upper bound on the performance?
