A convex relaxation for weakly supervised classifiers
Armand Joulin and Francis Bach
SIERRA group, INRIA - École Normale Supérieure
ICML 2012
Weakly supervised classification
We address the problem of weak supervision:
- Instances are grouped into bags, and each bag is associated with an observable partial labelling.
- We assume that each instance possesses its own true latent label.
Example
- Bags = images; instances = pixels.
- Set of true labels = {horse, human, background}; set of partial labellings = 2^{horse, human, background} (the subsets of the true label set).
(Figure: partially labeled images with bag labels y = {horse, background}, y = {background}, y = {human, background}, y = {horse, background}, contrasted with fully labeled data.)
Weakly supervised classification: examples
- Semi-supervised learning
- Multiple instance learning
- Unsupervised learning
(Figure: the latent true label set and examples of the partial labelling sets arising in each of these weakly supervised problems; see the sketch below.)
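To make the three settings concrete, here is a minimal sketch (with hypothetical label names standing in for the figure's icons) of the partial labellings each setting produces:

```python
# True latent label set (hypothetical names standing in for the figure's icons).
labels = {"horse", "human", "background"}

# Semi-supervised learning: some bags are fully labeled (singletons),
# the rest carry the uninformative label (the whole label set).
semi_supervised = [{"horse"}, {"background"}, labels]

# Multiple instance learning: each bag is either "positive"
# (may contain the class of interest) or "negative" (does not).
mil = [{"horse", "background"}, {"background"}]

# Unsupervised learning: every bag carries the full label set.
unsupervised = [labels, labels, labels]
```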
Inferring the labels and learning the model
The goal is to jointly estimate the true latent labels and learn a classifier based on them.
- This usually leads to non-convex formulations.
- They are typically optimized with an EM procedure, which only converges to a local minimum.
(Figure: a latent true labelling set and the learned classifier.)
Our approach
We propose:
- a general weakly supervised framework based on the likelihood of a probabilistic model,
- a convex relaxation of the related cost function,
- a dedicated optimization scheme.
Notations
- Bags: I bags of instances.
- Each instance n is associated with:
  - a feature $x_n \in \mathcal{X}$,
  - a weight $\pi_n$,
  - a partial label $y_n$, common to a bag,
  - a latent label $z_n$, depending on $y_n$.
(Figure: a partial labelling set and its bags.)
Discriminative classifier
We consider a regularized discriminative classifier:
$$L(z, w^\top \phi(x) + b) + \frac{\lambda}{2}\|w\|_F^2$$
Discriminative classifier
We consider a regularized discriminative classifier:
$$L(z, w^\top \phi(x) + b) + \frac{\lambda}{2}\|w\|_F^2,$$
where the loss function $L(z, w, b)$ is the reweighted soft-max loss:
$$L(z, w, b) = -\sum_{n=1}^{N} \pi_n \sum_{l \in \mathcal{L}} y_{nl} \sum_{p \in \mathcal{P}_l} z_{np} \log\left( \frac{\exp(w_p^\top \phi(x_n) + b_p)}{\sum_{k \in \mathcal{P}_l} \exp(w_k^\top \phi(x_n) + b_k)} \right).$$
This cost function is equivalent to the negative log-likelihood of a multinomial model.
Legend: $(w, b)$ = parameters, $x$ = feature, $\phi$ = feature map, $y_n$ = partial label, $z_n$ = latent label, $\pi_n$ = weight.
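As a reference point, the loss can be transcribed directly into numpy. This is a sketch, not the authors' code; `label_sets[n]` stands for the admissible class set $\mathcal{P}_l$ given by the partial label of instance $n$'s bag:

```python
import numpy as np

def weighted_softmax_loss(z, W, b, phi, pi, label_sets):
    """Reweighted soft-max loss L(z, w, b) (sketch).

    z          : (N, K) latent label indicators, rows in the simplex
    W, b       : (d, K) weights and (K,) biases
    phi        : (N, d) feature maps phi(x_n)
    pi         : (N,) instance weights
    label_sets : label_sets[n] = indices of the classes allowed by
                 the partial label of instance n's bag (the set P_l)
    """
    scores = phi @ W + b                      # (N, K): w_k^T phi(x_n) + b_k
    loss = 0.0
    for n, P in enumerate(label_sets):
        s = scores[n, P]
        # numerically stable log-softmax restricted to the allowed classes
        log_softmax = s - (s.max() + np.log(np.exp(s - s.max()).sum()))
        loss -= pi[n] * (z[n, P] @ log_softmax)
    return loss
```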
Cluster size balancing term
In unsupervised learning or MIL, assigning the same latent label to all the instances yields perfect separation (a degenerate solution).
Cluster size balancing term
We penalize using the entropy of the proportions of instances per class and per bag (Joulin et al., 2010):
$$H(z) = -\sum_{i \in \mathcal{I}} \sum_{k \in \mathcal{P}} \left( \sum_{n \in \mathcal{N}_i} \pi_n z_{nk} \right) \log\left( \sum_{n \in \mathcal{N}_i} \pi_n z_{nk} \right)$$
This penalization is related to a graphical model ($x \to z \to y$):
- No additional parameter.
Legend: $i$ = bag, $n$ = instance, $x_n$ = feature, $\mathcal{I}$ = set of bags, $z_n$ = latent label, $\pi_n$ = weight.
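The balancing term is cheap to evaluate. A sketch under the same notation (the small `eps` guarding $0 \log 0$ is an implementation detail, not part of the slides):

```python
import numpy as np

def balancing_entropy(z, pi, bags, eps=1e-12):
    """Cluster size balancing term H(z): entropy of the weighted
    class proportions, summed over bags (sketch).

    bags : list of index arrays, one per bag (the sets N_i)
    """
    H = 0.0
    for bag in bags:
        p = pi[bag] @ z[bag]                  # (K,): sum_{n in N_i} pi_n z_nk
        H -= np.sum(p * np.log(p + eps))      # -sum_k p_k log p_k
    return H
```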
Overall problem
Our overall problem is formulated as:
$$\min_{z \in \mathcal{P}^N} \min_{w, b} \; f(z, w, b) = L(z, w, b) - H(z) + \frac{\lambda}{2}\|w\|_F^2$$
This objective is not jointly convex in $z$ and $(w, b)$.
Legend: $H$ = cluster size balancing term, $L$ = cost function, $(w, b)$ = classifier parameters, $\lambda$ = regularization parameter, $\mathcal{P}$ = set of latent labels, $z$ = latent labels.
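Putting the pieces together, the objective is the sum of the two previous sketches plus the squared Frobenius regularizer (again a sketch, reusing the hypothetical helpers above):

```python
import numpy as np

def f(z, W, b, phi, pi, label_sets, bags, lam):
    """Overall objective f(z, w, b) = L(z, w, b) - H(z) + (lam/2) ||W||_F^2 (sketch)."""
    return (weighted_softmax_loss(z, W, b, phi, pi, label_sets)
            - balancing_entropy(z, pi, bags)
            + 0.5 * lam * np.sum(W ** 2))
```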
Convex relaxation - Overview
$$\min_{z \in \mathcal{P}^N} \min_{w, b} \; f(z, w, b) = L(z, w, b) - H(z) + \frac{\lambda}{2}\|w\|_F^2$$
- We use a dual formulation based on Fenchel duality.
- We reparametrize the problem following Guo and Schuurmans (2008).
- Finally, we relax it to a semidefinite program (SDP).
Duality with Fenchel conjugate
The Fenchel conjugate of the log-partition function gives:
$$\log\left( \sum_k e^{t_k} \right) = \max_{q \in \mathcal{S}} \sum_k q_k t_k - \sum_k q_k \log(q_k)$$
The minimization in $(w, b)$ leading to the dual formulation is in closed form:
$$\min_{z \in \mathcal{P}^N} \; \max_{\substack{q \in \mathcal{S}_\mathcal{P}^N \\ (q - z)^\top \pi = 0}} \; -H(z) + \sum_{i \in \mathcal{I}} \sum_{n \in \mathcal{N}_i} \pi_n h(q_n) - \frac{1}{2\lambda} \operatorname{tr}\left( (q - z)(q - z)^\top K \right),$$
where $K_{nm} = \langle \pi_n \phi(x_n), \pi_m \phi(x_m) \rangle$.
Legend: $i$ = bag, $n$ = instance, $x_n$ = feature, $\phi$ = feature map, $\pi_n$ = weight, $z_n$ = latent label, $\mathcal{S}_\mathcal{P}$ = simplex, $h$ = entropy, $(w, b)$ = classifier parameters.
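The conjugacy identity is easy to verify numerically: the maximizer is the softmax of $t$, and plugging it back recovers the log-partition exactly. A self-contained sanity check (not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.standard_normal(5)

q = np.exp(t - t.max())
q /= q.sum()                                        # softmax(t) attains the maximum

lhs = t.max() + np.log(np.exp(t - t.max()).sum())   # log-partition log(sum_k e^{t_k})
rhs = q @ t - np.sum(q * np.log(q))                 # <q, t> + entropy h(q)
assert np.isclose(lhs, rhs)
```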
Sources of non-convexity
$$\min_{z \in \mathcal{P}^N} \; \max_{\substack{q \in \mathcal{S}_\mathcal{P}^N \\ (q - z)^\top \pi = 0}} \; -H(z) + \sum_{i \in \mathcal{I}} \sum_{n \in \mathcal{N}_i} \pi_n h(q_n) - \frac{1}{2\lambda} \operatorname{tr}\left( (q - z)(q - z)^\top K \right)$$
Two sources of non-convexity:
- a constraint coupling a variable of the convex problem with a variable of the concave problem,
- a function that is not jointly convex-concave in $z$ and $q$.
Proposed solution (Guo and Schuurmans, 2008):
- reparametrization in $q$,
- SDP relaxation in $z$.
Reparametrization in q
We reparametrize the problem by introducing an $N \times N$ matrix $\Omega$ such that $q = \Omega z$.
The constraints on $q$ become convex constraints on $\Omega$, and the problem becomes:
$$\min_{z \in \mathcal{P}^N} \; \max_{\substack{\Omega \in \mathbb{R}_+^{N \times N} \\ \Omega^\top \pi = \pi, \; \Omega 1_N = 1_N}} \; -H(z) + \sum_{i \in \mathcal{I}} \sum_{n \in \mathcal{N}_i} \pi_n h(\Omega_n z) - \frac{1}{2\lambda} \operatorname{tr}\left( (I - \Omega) z z^\top (I - \Omega)^\top K \right)$$
Legend: $q$ = dual variables, $z$ = latent labels, $\pi$ = weights.
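The convex constraint set on $\Omega$ (nonnegative entries, rows summing to one, $\pi$ as a left fixed point) is straightforward to check; note that the identity matrix is always feasible and recovers $q = z$. A sketch:

```python
import numpy as np

def is_feasible(Omega, pi, tol=1e-8):
    """Check the convex constraints on Omega: Omega >= 0,
    Omega 1_N = 1_N and Omega^T pi = pi (sketch)."""
    return (Omega.min() >= -tol
            and np.allclose(Omega.sum(axis=1), 1.0, atol=tol)
            and np.allclose(Omega.T @ pi, pi, atol=tol))

# The identity is always feasible and recovers q = z.
N = 4
pi = np.full(N, 1.0 / N)
assert is_feasible(np.eye(N), pi)
```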
Tight upper-bound on the entropy
The entropy term is not convex in $\Omega$ and $z$. We use the following upper bound:
$$\sum_{i \in \mathcal{I}} \sum_{n \in \mathcal{N}_i} \pi_n h(\Omega_n z) \;\le\; -\sum_n \pi_n h(\Omega_n) + H(z) + C_0.$$
This upper bound is tight for discrete values of $z$. It leads to:
$$\min_{z \in \mathcal{P}^N} \; \max_{\substack{\Omega \in \mathbb{R}_+^{N \times N} \\ \Omega^\top \pi = \pi, \; \Omega 1_N = 1_N}} \; -\sum_n \pi_n h(\Omega_n) - \frac{1}{2\lambda} \operatorname{tr}\left( (I - \Omega) z z^\top (I - \Omega)^\top K \right)$$
Legend: $h$ = entropy, $z$ = latent labels, $\pi$ = weights.