Bag-of-components: an online algorithm for batch learning of mixture models
  1. Bag-of-components: an online algorithm for batch learning of mixture models
     Olivier Schwander, Frank Nielsen
     Université Pierre et Marie Curie, Paris, France
     École polytechnique, Palaiseau, France
     October 29, 2015

  2. Exponential families
     Definition
       p(x; λ) = p_F(x; θ) = exp(⟨t(x) | θ⟩ − F(θ) + k(x))
     ◮ λ: source parameter
     ◮ t(x): sufficient statistic
     ◮ θ: natural parameter
     ◮ F(θ): log-normalizer
     ◮ k(x): carrier measure
     F is a strictly convex and differentiable function and ⟨·|·⟩ is a scalar product.
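As a concrete instance of the definition, the following Python sketch (not part of the talk; the univariate Gaussian and all variable names are our own illustration) writes N(μ, σ²) in canonical form with t(x) = (x, x²), θ = (μ/σ², −1/(2σ²)), k(x) = 0, and checks it against the usual density formula.

```python
import numpy as np

# Univariate Gaussian N(mu, sigma^2) in canonical exponential-family form:
#   p_F(x; theta) = exp(<t(x), theta> - F(theta) + k(x))
# with t(x) = (x, x^2), theta = (mu/sigma^2, -1/(2 sigma^2)), k(x) = 0.

def t(x):
    return np.array([x, x * x])

def F(theta):
    # log-normalizer of the univariate Gaussian
    t1, t2 = theta
    return -t1 * t1 / (4.0 * t2) + 0.5 * np.log(-np.pi / t2)

def pdf_canonical(x, theta):
    return np.exp(t(x) @ theta - F(theta))

mu, sigma2 = 1.5, 0.7
theta = np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])

x = 0.3
usual = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
print(pdf_canonical(x, theta), usual)   # the two densities agree
```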

  3. Multiple parameterizations: dual parameter spaces
     ◮ Source parameters (not unique): λ_1 ∈ Λ_1, λ_2 ∈ Λ_2, ..., λ_n ∈ Λ_n
     ◮ Two canonical parameterizations, linked by the Legendre transform (F, Θ) ↔ (F*, H):
       natural parameters θ ∈ Θ and expectation parameters η ∈ H,
       with η = ∇F(θ) and θ = ∇F*(η)
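A small sketch of the two canonical coordinate systems, again on the assumed univariate Gaussian example (helper names are our own): η = ∇F(θ) = E[t(X)] = (μ, μ² + σ²), and ∇F* maps back to the natural parameters.

```python
import numpy as np

# Dual parameterizations of the univariate Gaussian:
# natural theta = (mu/sigma^2, -1/(2 sigma^2)),
# expectation eta = grad F(theta) = E[t(X)] = (mu, mu^2 + sigma^2).

def theta_of(mu, sigma2):
    return np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])

def eta_of_theta(theta):          # eta = grad F(theta)
    t1, t2 = theta
    mu = -t1 / (2.0 * t2)
    sigma2 = -1.0 / (2.0 * t2)
    return np.array([mu, mu * mu + sigma2])

def theta_of_eta(eta):            # theta = grad F*(eta)
    e1, e2 = eta
    sigma2 = e2 - e1 * e1
    return np.array([e1 / sigma2, -1.0 / (2.0 * sigma2)])

theta = theta_of(1.5, 0.7)
eta = eta_of_theta(theta)
print(np.allclose(theta, theta_of_eta(eta)))   # True: the two gradient maps are inverses
```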

  4. Bregman divergences
     Definition and properties
       B_F(x ‖ y) = F(x) − F(y) − ⟨x − y, ∇F(y)⟩
     ◮ F is a strictly convex and differentiable function
     ◮ No symmetry!
     Contains a lot of common divergences
     ◮ Squared Euclidean, Mahalanobis, Kullback-Leibler, Itakura-Saito...
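A minimal sketch of the generic definition, using two classical generators (our own choice, not from the slides): F(x) = ½‖x‖² yields the squared Euclidean distance, and F(x) = Σ_i x_i log x_i yields the (generalized) Kullback-Leibler divergence; the lack of symmetry is visible numerically.

```python
import numpy as np

# Generic Bregman divergence B_F(x || y) = F(x) - F(y) - <x - y, grad F(y)>.

def bregman(F, gradF, x, y):
    return F(x) - F(y) - (x - y) @ gradF(y)

# F(x) = 1/2 ||x||^2  ->  squared Euclidean distance
F_eucl = lambda x: 0.5 * x @ x
g_eucl = lambda x: x

# F(x) = sum x_i log x_i  ->  (generalized) Kullback-Leibler divergence
F_kl = lambda x: np.sum(x * np.log(x))
g_kl = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.1, 0.6, 0.3])
print(bregman(F_eucl, g_eucl, p, q))                          # = 0.5 * ||p - q||^2
print(bregman(F_kl, g_kl, p, q), bregman(F_kl, g_kl, q, p))   # KL(p||q) != KL(q||p)
```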

  5. Bregman centroids
     Right-sided centroid: c_R = arg min_c Σ_i ω_i B_F(x_i ‖ c)
     Left-sided centroid:  c_L = arg min_c Σ_i ω_i B_F(c ‖ x_i)
     Closed forms
       c_R = Σ_i ω_i x_i          c_L = ∇F*( Σ_i ω_i ∇F(x_i) )
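The closed forms are straightforward to implement. The sketch below (our own illustration, not the talk's code) uses the generator F(x) = Σ_i x_i log x_i, for which the right-sided centroid is the weighted arithmetic mean and the left-sided centroid turns out to be a weighted geometric mean.

```python
import numpy as np

# Closed-form Bregman centroids for F(x) = sum_i x_i log x_i:
#   right-sided centroid  c_R = sum_i w_i x_i
#   left-sided centroid   c_L = grad F* ( sum_i w_i grad F(x_i) )

gradF     = lambda x: np.log(x) + 1.0
gradFstar = lambda y: np.exp(y - 1.0)        # inverse map of gradF

def centroids(points, weights):
    w = weights / weights.sum()
    c_right = w @ points                      # arithmetic mean
    c_left = gradFstar(w @ gradF(points))     # here: a geometric mean
    return c_right, c_left

pts = np.array([[0.2, 0.8], [0.5, 0.5], [0.7, 0.3]])
w = np.array([1.0, 1.0, 2.0])
print(centroids(pts, w))
```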

  6. Link with exponential families [Banerjee 2005]
     Bijection with exponential families
       log p_F(x; θ) = −B_F*(t(x) ‖ η) + F*(t(x)) + k(x)
     Kullback-Leibler divergence between members of the same exponential family
       KL(p_F(x; θ_1) ‖ p_F(x; θ_2)) = B_F(θ_2 ‖ θ_1) = B_F*(η_1 ‖ η_2)
     Kullback-Leibler centroids
     ◮ In closed form through the Bregman divergence
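The identity between the Kullback-Leibler divergence and the Bregman divergence of the log-normalizer can be checked numerically. A sketch under the assumed univariate-Gaussian example (all helper names are our own): the closed-form Gaussian KL on one side, B_F on swapped natural parameters on the other.

```python
import numpy as np

# Check KL(p_F(.; theta1) || p_F(.; theta2)) = B_F(theta2 || theta1) on univariate Gaussians.

def F(theta):                      # log-normalizer
    t1, t2 = theta
    return -t1 * t1 / (4.0 * t2) + 0.5 * np.log(-np.pi / t2)

def gradF(theta):                  # expectation parameters eta = (mu, mu^2 + sigma^2)
    t1, t2 = theta
    mu = -t1 / (2.0 * t2)
    s2 = -1.0 / (2.0 * t2)
    return np.array([mu, mu * mu + s2])

def bregman_F(x, y):
    return F(x) - F(y) - (x - y) @ gradF(y)

def theta_of(mu, s2):
    return np.array([mu / s2, -1.0 / (2.0 * s2)])

def kl_gauss(mu1, s1, mu2, s2):    # closed-form KL between univariate Gaussians (variances)
    return 0.5 * (np.log(s2 / s1) + (s1 + (mu1 - mu2) ** 2) / s2 - 1.0)

th1, th2 = theta_of(0.5, 1.2), theta_of(-1.0, 0.4)
print(kl_gauss(0.5, 1.2, -1.0, 0.4), bregman_F(th2, th1))   # the two values match
```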

  7. Maximum likelihood estimator: a Bregman centroid
       η̂ = arg max_η Σ_i log p_F(x_i; η)
         = arg min_η Σ_i [ B_F*(t(x_i) ‖ η) − F*(t(x_i)) − k(x_i) ]   (the last two terms do not depend on η)
         = arg min_η Σ_i B_F*(t(x_i) ‖ η)
         = (1/N) Σ_i t(x_i)
     and then θ̂ = ∇F*(η̂).
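In code the estimator is essentially one line: average the sufficient statistics, then map back with ∇F*. A sketch on the assumed univariate Gaussian:

```python
import numpy as np

# MLE of a univariate Gaussian via expectation parameters:
# eta_hat = average of the sufficient statistics t(x) = (x, x^2), then theta_hat = grad F*(eta_hat).

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=np.sqrt(0.7), size=10_000)

eta_hat = np.array([x.mean(), (x * x).mean()])        # eta_hat = (1/N) sum_i t(x_i)

mu_hat = eta_hat[0]
sigma2_hat = eta_hat[1] - eta_hat[0] ** 2
theta_hat = np.array([mu_hat / sigma2_hat, -1.0 / (2.0 * sigma2_hat)])  # grad F*(eta_hat)

print(mu_hat, sigma2_hat)   # close to (1.5, 0.7)
```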

  8. Mixtures of exponential families
       m(x; ω, θ) = Σ_{1≤i≤k} ω_i p_F(x; θ_i),   with weights ω_i such that Σ_i ω_i = 1
     Fixed parameters
     ◮ Family of the components p_F
     ◮ Number of components k (chosen with model selection techniques)
     Learning a mixture
     ◮ Input: observations x_1, ..., x_N
     ◮ Output: weights ω_i and component parameters θ_i
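Evaluating such a mixture is just a weighted sum of component densities. A minimal sketch with Gaussian components given in source parameters (μ, σ²); the weights and parameters below are arbitrary illustrative values.

```python
import numpy as np

# Evaluate m(x; w, theta) = sum_i w_i p_F(x; theta_i) for univariate Gaussian components.

def gauss_pdf(x, mu, sigma2):
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def mixture_pdf(x, weights, mus, sigma2s):
    return sum(w * gauss_pdf(x, m, s2) for w, m, s2 in zip(weights, mus, sigma2s))

weights = [0.3, 0.7]                  # sum_i w_i = 1
mus, sigma2s = [-1.0, 2.0], [0.5, 1.5]
print(mixture_pdf(0.0, weights, mus, sigma2s))
```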

  9. Bregman soft clustering: EM for exponential families [Banerjee 2005]
     E-step
       p(i, j) = ω_j p_F(x_i; θ_j) / m(x_i)
     M-step
       η_j = arg max_η Σ_i p(i, j) log p_F(x_i; θ_j)
           = arg min_η Σ_i p(i, j) [ B_F*(t(x_i) ‖ η) − F*(t(x_i)) − k(x_i) ]   (the last two terms do not depend on η)
           = Σ_i p(i, j) t(x_i) / Σ_u p(u, j)
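A compact sketch of Bregman soft clustering for univariate Gaussian components (our own simplified implementation, not the authors' code): the E-step computes the posterior matrix p(i, j) and the M-step averages the sufficient statistics t(x) = (x, x²) in expectation parameters.

```python
import numpy as np

def em(x, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    s2 = np.full(k, x.var())
    for _ in range(n_iter):
        # E-step: posterior responsibilities p(i, j)
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
        post = w * dens
        post /= post.sum(axis=1, keepdims=True)
        # M-step: eta_j = sum_i p(i,j) t(x_i) / sum_u p(u,j), then back to (mu, sigma^2)
        nj = post.sum(axis=0)
        eta1 = post.T @ x / nj
        eta2 = post.T @ (x * x) / nj
        mu, s2, w = eta1, eta2 - eta1 ** 2, nj / n
    return w, mu, s2

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 0.5, 500)])
print(em(data, k=2))
```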

  10. Joint estimation of mixture models
      Exploit shared information between multiple pointsets
      ◮ to improve quality
      ◮ to improve speed
      Inspiration: dictionary methods
      Efficient algorithms for
      ◮ Building
      ◮ Transfer learning
      ◮ Comparing

  11. Co-Mixtures: sharing the components of all the mixtures
        m_1(x; ω^(1), η) = Σ_{i=1}^{k} ω_i^(1) p_F(x; η_i)
        ...
        m_S(x; ω^(S), η) = Σ_{i=1}^{k} ω_i^(S) p_F(x; η_i)
      ◮ Same component parameters η_1, ..., η_k everywhere
      ◮ Different weights ω^(l) for each mixture

  12. co-Expectation-Maximization
      Maximize the mean of the likelihoods of the S mixtures
      E-step: a posterior matrix for each dataset
        p^(l)(i, j) = ω_j^(l) p_F(x_i^(l); θ_j) / m(x_i^(l); ω^(l), η)
      M-step: maximization on each dataset
        η_j^(l) = Σ_i p^(l)(i, j) t(x_i^(l)) / Σ_u p^(l)(u, j)
      then aggregation
        η_j = (1/S) Σ_{l=1}^{S} η_j^(l)
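A sketch of co-EM under the same univariate-Gaussian assumption (our own simplified reading of the slide, not the authors' implementation): each dataset runs its own E-step and per-dataset M-step, then the per-dataset expectation parameters are averaged so that all mixtures keep sharing the same components while keeping their own weights.

```python
import numpy as np

def co_em(datasets, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    S = len(datasets)
    all_x = np.concatenate(datasets)
    mu = rng.choice(all_x, size=k, replace=False)
    s2 = np.full(k, all_x.var())
    weights = [np.full(k, 1.0 / k) for _ in range(S)]
    for _ in range(n_iter):
        eta1_acc = np.zeros(k)
        eta2_acc = np.zeros(k)
        for l, x in enumerate(datasets):
            # E-step on dataset l: posterior matrix p^(l)(i, j)
            dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
            post = weights[l] * dens
            post /= post.sum(axis=1, keepdims=True)
            nj = post.sum(axis=0)
            # per-dataset M-step in expectation parameters
            eta1_acc += post.T @ x / nj
            eta2_acc += post.T @ (x * x) / nj
            weights[l] = nj / len(x)
        # aggregation: eta_j = (1/S) sum_l eta_j^(l)
        eta1, eta2 = eta1_acc / S, eta2_acc / S
        mu, s2 = eta1, eta2 - eta1 ** 2
    return weights, mu, s2

rng = np.random.default_rng(2)
d1 = np.concatenate([rng.normal(-2, 1, 400), rng.normal(3, 0.5, 100)])
d2 = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 0.5, 400)])
print(co_em([d1, d2], k=2))
```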

  13. Variational approximation of the Kullback-Leibler divergence [Hershey, Olsen 2007]
        KL_var(m_1 ‖ m_2) = Σ_{i=1}^{K} ω_i^(1) log ( Σ_j ω_j^(1) e^{−KL(p_F(·; θ_i) ‖ p_F(·; θ_j))} / Σ_j ω_j^(2) e^{−KL(p_F(·; θ_i) ‖ p_F(·; θ_j))} )
      With shared parameters
      ◮ Precompute D_ij = e^{−KL(p_F(·; η_i) ‖ p_F(·; η_j))}
      Fast version
        KL_var(m_1 ‖ m_2) = Σ_i ω_i^(1) log ( Σ_j ω_j^(1) D_ij / Σ_j ω_j^(2) D_ij )
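With shared components the matrix D is computed once and reused for any pair of weight vectors. A sketch with univariate Gaussian components (the closed-form Gaussian KL and all names are our own illustration):

```python
import numpy as np

# Variational KL approximation between two mixtures sharing the same components.
# D[i, j] = exp(-KL(p_i || p_j)) is precomputed once and reused for every pair of weights.

def kl_gauss(mu1, s1, mu2, s2):
    return 0.5 * (np.log(s2 / s1) + (s1 + (mu1 - mu2) ** 2) / s2 - 1.0)

def precompute_D(mus, s2s):
    k = len(mus)
    D = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            D[i, j] = np.exp(-kl_gauss(mus[i], s2s[i], mus[j], s2s[j]))
    return D

def kl_var(w1, w2, D):
    # KL_var(m1 || m2) = sum_i w1_i log( (sum_j w1_j D_ij) / (sum_j w2_j D_ij) )
    return np.sum(w1 * np.log((D @ w1) / (D @ w2)))

mus, s2s = np.array([-2.0, 0.0, 3.0]), np.array([1.0, 0.5, 2.0])
D = precompute_D(mus, s2s)
w1, w2 = np.array([0.2, 0.3, 0.5]), np.array([0.5, 0.4, 0.1])
print(kl_var(w1, w2, D), kl_var(w2, w1, D))   # asymmetric, as expected
```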

  14. co-Segmentation
      Segmentation from 5D RGBxy mixtures
      [Figure: original images, EM segmentation, co-EM segmentation]

  15. Transfer learning
      Increase the quality of one particular mixture of interest
      ◮ First image: only 1% of the points
      ◮ Two other images: full set of points
      ◮ Not enough points for EM on the first image alone

  16. Bag of Components
      Training step
      ◮ Run Comix on some training set
      ◮ Keep the component parameters: D = {θ_1, ..., θ_K}
      ◮ Costly, but offline
      Online learning of mixtures
      ◮ For a new pointset, for each observation x_j arriving, pick
          arg max_{θ ∈ D} p_F(x_j; θ)   or   arg min_{θ ∈ D} B_F(t(x_j) ‖ θ)
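A sketch of the online step under the same Gaussian assumption (illustrative names only, not the authors' code): the dictionary is fixed, each incoming point votes for its maximum-likelihood component, and the weights of the new mixture are the normalized counts.

```python
import numpy as np

# Bag-of-Components online step: the dictionary of component parameters is learned
# offline; each incoming observation only updates the weight of its best component.

def gauss_logpdf(x, mu, s2):
    return -0.5 * ((x - mu) ** 2 / s2 + np.log(2.0 * np.pi * s2))

def boc_online(stream, mus, s2s):
    counts = np.zeros(len(mus))
    for x in stream:
        j = np.argmax(gauss_logpdf(x, mus, s2s))   # arg max over the dictionary of p_F(x; theta)
        counts[j] += 1
    return counts / counts.sum()                   # mixture weights for the new pointset

# dictionary D = {theta_1, ..., theta_K} kept from the offline (co-)EM training
mus, s2s = np.array([-2.0, 0.0, 3.0]), np.array([1.0, 0.5, 2.0])
rng = np.random.default_rng(3)
new_points = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1.4, 700)])
print(boc_online(new_points, mus, s2s))
```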

  17. Nearest neighbor search
      Naive version
      ◮ Linear search
      ◮ O(number of samples × number of components)
      ◮ Same order of magnitude as one step of EM
      Improvements
      ◮ Computational Bregman geometry to speed up the search
      ◮ Bregman ball trees
      ◮ Hierarchical clustering
      ◮ Approximate nearest neighbors

  18. Image segmentation
      Segmentation on a random subset of the pixels
      [Figure: EM and BoC segmentations using 100%, 10% and 1% of the pixels]

  19. Computation times
      [Figure: bar chart comparing the training step and the run times of EM and BoC on 100%, 10% and 1% of the pixels]

  20. Summary
      Comix
      ◮ Mixtures with shared components
      ◮ Compact description of a large collection of mixtures
      ◮ Fast KL approximations
      ◮ Dictionary-like methods
      Bag of Components
      ◮ Online method
      ◮ Predictable time (no iterations)
      ◮ Works with only a few points
      ◮ Fast
