A convex relaxation for weakly supervised classifiers
Armand Joulin and Francis Bach
SIERRA group, INRIA - École Normale Supérieure
ICML 2012
Weakly supervised classification
We address the problem of weak supervision:
- Instances are grouped into bags, and each bag is associated with an observable partial labelling.
- We assume that each instance possesses its own true latent label.
Example
- Bags = images; instances = pixels.
- Set of true labels = {horse, human, background}; set of partial labellings = 2^{horse, human, background} (the subsets of the true label set).
(Figure: partially labeled images with bag labels y = {horse, background}, y = {background}, y = {human, background}, y = {horse, background}, contrasted with fully labeled data.)
Weakly supervised classification: examples
- Semi-supervised learning
- Multiple instance learning
- Unsupervised learning
(Figure: the latent true label set and examples of the partial labelling sets arising in each of these weakly supervised problems; see the sketch below.)
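To make the three settings concrete, here is a minimal sketch (with hypothetical label names standing in for the figure's icons) of the partial labellings each setting produces:

```python
# True latent label set (hypothetical names standing in for the figure's icons).
labels = {"horse", "human", "background"}

# Semi-supervised learning: some bags are fully labeled (singletons),
# the rest carry the uninformative label (the whole label set).
semi_supervised = [{"horse"}, {"background"}, labels]

# Multiple instance learning: each bag is either "positive"
# (may contain the class of interest) or "negative" (does not).
mil = [{"horse", "background"}, {"background"}]

# Unsupervised learning: every bag carries the full label set.
unsupervised = [labels, labels, labels]
```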
Inferring the labels and learning the model
The goal is to jointly estimate the true latent labels and learn a classifier based on them.
- This usually leads to non-convex formulations.
- They are typically optimized with an EM procedure, which only converges to a local minimum.
(Figure: a latent true labelling set and the learned classifier.)
Our approach
We propose:
- a general weakly supervised framework based on the likelihood of a probabilistic model,
- a convex relaxation of the related cost function,
- a dedicated optimization scheme.
Notations
- Bags: I bags of instances.
- Each instance n is associated with:
  - a feature $x_n \in \mathcal{X}$,
  - a weight $\pi_n$,
  - a partial label $y_n$, common to a bag,
  - a latent label $z_n$, depending on $y_n$.
(Figure: a partial labelling set and its bags.)
Discriminative classifier
We consider a regularized discriminative classifier:
$$L(z, w^\top \phi(x) + b) + \frac{\lambda}{2}\|w\|_F^2$$
Discriminative classifier
We consider a regularized discriminative classifier:
$$L(z, w^\top \phi(x) + b) + \frac{\lambda}{2}\|w\|_F^2,$$
where the loss function $L(z, w, b)$ is the reweighted soft-max loss:
$$L(z, w, b) = -\sum_{n=1}^{N} \pi_n \sum_{l \in \mathcal{L}} y_{nl} \sum_{p \in \mathcal{P}_l} z_{np} \log\left( \frac{\exp(w_p^\top \phi(x_n) + b_p)}{\sum_{k \in \mathcal{P}_l} \exp(w_k^\top \phi(x_n) + b_k)} \right).$$
This cost function is equivalent to the negative log-likelihood of a multinomial model.
Legend: $(w, b)$ = parameters, $x$ = feature, $\phi$ = feature map, $y_n$ = partial label, $z_n$ = latent label, $\pi_n$ = weight.
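As a reference point, the loss can be transcribed directly into numpy. This is a sketch, not the authors' code; `label_sets[n]` stands for the admissible class set $\mathcal{P}_l$ given by the partial label of instance $n$'s bag:

```python
import numpy as np

def weighted_softmax_loss(z, W, b, phi, pi, label_sets):
    """Reweighted soft-max loss L(z, w, b) (sketch).

    z          : (N, K) latent label indicators, rows in the simplex
    W, b       : (d, K) weights and (K,) biases
    phi        : (N, d) feature maps phi(x_n)
    pi         : (N,) instance weights
    label_sets : label_sets[n] = indices of the classes allowed by
                 the partial label of instance n's bag (the set P_l)
    """
    scores = phi @ W + b                      # (N, K): w_k^T phi(x_n) + b_k
    loss = 0.0
    for n, P in enumerate(label_sets):
        s = scores[n, P]
        # numerically stable log-softmax restricted to the allowed classes
        log_softmax = s - (s.max() + np.log(np.exp(s - s.max()).sum()))
        loss -= pi[n] * (z[n, P] @ log_softmax)
    return loss
```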
Cluster size balancing term
In unsupervised learning or MIL, assigning the same latent label to all the instances yields perfect separation (a degenerate solution).
Cluster size balancing term
We penalize using the entropy of the proportions of instances per class and per bag (Joulin et al., 2010):
$$H(z) = -\sum_{i \in \mathcal{I}} \sum_{k \in \mathcal{P}} \left( \sum_{n \in \mathcal{N}_i} \pi_n z_{nk} \right) \log\left( \sum_{n \in \mathcal{N}_i} \pi_n z_{nk} \right)$$
This penalization is related to a graphical model ($x \to z \to y$):
- No additional parameter.
Legend: $i$ = bag, $n$ = instance, $x_n$ = feature, $\mathcal{I}$ = set of bags, $z_n$ = latent label, $\pi_n$ = weight.
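The balancing term is cheap to evaluate. A sketch under the same notation (the small `eps` guarding $0 \log 0$ is an implementation detail, not part of the slides):

```python
import numpy as np

def balancing_entropy(z, pi, bags, eps=1e-12):
    """Cluster size balancing term H(z): entropy of the weighted
    class proportions, summed over bags (sketch).

    bags : list of index arrays, one per bag (the sets N_i)
    """
    H = 0.0
    for bag in bags:
        p = pi[bag] @ z[bag]                  # (K,): sum_{n in N_i} pi_n z_nk
        H -= np.sum(p * np.log(p + eps))      # -sum_k p_k log p_k
    return H
```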
Overall problem
Our overall problem is formulated as:
$$\min_{z \in \mathcal{P}^N} \min_{w, b} \; f(z, w, b) = L(z, w, b) - H(z) + \frac{\lambda}{2}\|w\|_F^2$$
This objective is not jointly convex in $z$ and $(w, b)$.
Legend: $H$ = cluster size balancing term, $L$ = cost function, $(w, b)$ = classifier parameters, $\lambda$ = regularization parameter, $\mathcal{P}$ = set of latent labels, $z$ = latent labels.
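Putting the pieces together, the objective is the sum of the two previous sketches plus the squared Frobenius regularizer (again a sketch, reusing the hypothetical helpers above):

```python
import numpy as np

def f(z, W, b, phi, pi, label_sets, bags, lam):
    """Overall objective f(z, w, b) = L(z, w, b) - H(z) + (lam/2) ||W||_F^2 (sketch)."""
    return (weighted_softmax_loss(z, W, b, phi, pi, label_sets)
            - balancing_entropy(z, pi, bags)
            + 0.5 * lam * np.sum(W ** 2))
```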
Convex relaxation - Overview
$$\min_{z \in \mathcal{P}^N} \min_{w, b} \; f(z, w, b) = L(z, w, b) - H(z) + \frac{\lambda}{2}\|w\|_F^2$$
- We use a dual formulation based on Fenchel duality.
- We reparametrize the problem following Guo and Schuurmans (2008).
- Finally, we relax it to a semidefinite program (SDP).
Duality with Fenchel conjugate
The Fenchel conjugate of the log-partition function gives:
$$\log\left( \sum_k e^{t_k} \right) = \max_{q \in \mathcal{S}} \sum_k q_k t_k - \sum_k q_k \log(q_k)$$
The minimization in $(w, b)$ leading to the dual formulation is in closed form:
$$\min_{z \in \mathcal{P}^N} \; \max_{\substack{q \in \mathcal{S}_\mathcal{P}^N \\ (q - z)^\top \pi = 0}} \; -H(z) + \sum_{i \in \mathcal{I}} \sum_{n \in \mathcal{N}_i} \pi_n h(q_n) - \frac{1}{2\lambda} \operatorname{tr}\left( (q - z)(q - z)^\top K \right),$$
where $K_{nm} = \langle \pi_n \phi(x_n), \pi_m \phi(x_m) \rangle$.
Legend: $i$ = bag, $n$ = instance, $x_n$ = feature, $\phi$ = feature map, $\pi_n$ = weight, $z_n$ = latent label, $\mathcal{S}_\mathcal{P}$ = simplex, $h$ = entropy, $(w, b)$ = classifier parameters.
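The conjugacy identity is easy to verify numerically: the maximizer is the softmax of $t$, and plugging it back recovers the log-partition exactly. A self-contained sanity check (not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.standard_normal(5)

q = np.exp(t - t.max())
q /= q.sum()                                        # softmax(t) attains the maximum

lhs = t.max() + np.log(np.exp(t - t.max()).sum())   # log-partition log(sum_k e^{t_k})
rhs = q @ t - np.sum(q * np.log(q))                 # <q, t> + entropy h(q)
assert np.isclose(lhs, rhs)
```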
Sources of non-convexity
$$\min_{z \in \mathcal{P}^N} \; \max_{\substack{q \in \mathcal{S}_\mathcal{P}^N \\ (q - z)^\top \pi = 0}} \; -H(z) + \sum_{i \in \mathcal{I}} \sum_{n \in \mathcal{N}_i} \pi_n h(q_n) - \frac{1}{2\lambda} \operatorname{tr}\left( (q - z)(q - z)^\top K \right)$$
Two sources of non-convexity:
- a constraint coupling a variable of the convex problem with a variable of the concave problem,
- a function that is not jointly convex-concave in $z$ and $q$.
Proposed solution (Guo and Schuurmans, 2008):
- reparametrization in $q$,
- SDP relaxation in $z$.
Reparametrization in q
We reparametrize the problem by introducing an $N \times N$ matrix $\Omega$ such that $q = \Omega z$.
The constraints on $q$ become convex constraints on $\Omega$, and the problem becomes:
$$\min_{z \in \mathcal{P}^N} \; \max_{\substack{\Omega \in \mathbb{R}_+^{N \times N} \\ \Omega^\top \pi = \pi, \; \Omega 1_N = 1_N}} \; -H(z) + \sum_{i \in \mathcal{I}} \sum_{n \in \mathcal{N}_i} \pi_n h(\Omega_n z) - \frac{1}{2\lambda} \operatorname{tr}\left( (I - \Omega) z z^\top (I - \Omega)^\top K \right)$$
Legend: $q$ = dual variables, $z$ = latent labels, $\pi$ = weights.
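The convex constraint set on $\Omega$ (nonnegative entries, rows summing to one, $\pi$ as a left fixed point) is straightforward to check; note that the identity matrix is always feasible and recovers $q = z$. A sketch:

```python
import numpy as np

def is_feasible(Omega, pi, tol=1e-8):
    """Check the convex constraints on Omega: Omega >= 0,
    Omega 1_N = 1_N and Omega^T pi = pi (sketch)."""
    return (Omega.min() >= -tol
            and np.allclose(Omega.sum(axis=1), 1.0, atol=tol)
            and np.allclose(Omega.T @ pi, pi, atol=tol))

# The identity is always feasible and recovers q = z.
N = 4
pi = np.full(N, 1.0 / N)
assert is_feasible(np.eye(N), pi)
```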
Tight upper-bound on the entropy
The entropy term is not convex in $\Omega$ and $z$. We use the following upper bound:
$$\sum_{i \in \mathcal{I}} \sum_{n \in \mathcal{N}_i} \pi_n h(\Omega_n z) \;\le\; -\sum_n \pi_n h(\Omega_n) + H(z) + C_0.$$
This upper bound is tight for discrete values of $z$. It leads to:
$$\min_{z \in \mathcal{P}^N} \; \max_{\substack{\Omega \in \mathbb{R}_+^{N \times N} \\ \Omega^\top \pi = \pi, \; \Omega 1_N = 1_N}} \; -\sum_n \pi_n h(\Omega_n) - \frac{1}{2\lambda} \operatorname{tr}\left( (I - \Omega) z z^\top (I - \Omega)^\top K \right)$$
Legend: $h$ = entropy, $z$ = latent labels, $\pi$ = weights.