


1. Regularization for Multi-Output Learning. Regularization Methods for High Dimensional Learning. Francesca Odone and Lorenzo Rosasco (odone@disi.unige.it, lrosasco@mit.edu). June 10, 2011.

2. About This Class. Goal: in many practical problems it is convenient to model the object of interest as a function with multiple outputs. In machine learning this problem typically goes under the name of multi-task or multi-output learning. We present some concepts and algorithms to solve this kind of problem.

3. Plan. Examples and set-up; Tikhonov regularization for multiple-output learning; regularizers and kernels; vector fields; multiclass; conclusions.

4. Customer Modeling. The goal is to model the buying preferences of several people based on their previous purchases. Borrowing strength: people with similar tastes tend to buy similar items, so their buying histories are related. The idea is then to predict the preferences of all individuals simultaneously by solving a multi-output learning problem: each customer is modeled as a task, and their previous purchases form the corresponding training set.

5. Multi-task Learning. We are given $T$ scalar tasks. For each task $j = 1, \dots, T$, we are given a set of examples $S_j = (x_i^j, y_i^j)_{i=1}^{n_j}$ sampled i.i.d. according to a distribution $P_j$. The goal is to find $f_j(x) \sim y$ for each $j = 1, \dots, T$.

6. Multi-task Learning. [Figure: two panels, Task 1 and Task 2, each plotting the output $Y$ against the input $X$.]

7. Pharmacological Data. Blood concentration of a medicine across different times; each task is a patient. [Figure: single-task vs. multi-task estimates; red dots are test points and black dots are training points. Pictures from Pillonetto et al. 08.]

8. Names and Applications. Related problems: conjoint analysis, transfer learning, collaborative filtering, co-kriging. Examples of applications: geophysics; music recommendation (Dinuzzo 08); pharmacological data (Pillonetto et al. 08); binding data (Jacob et al. 08); movie recommendation (Abernethy et al. 08); HIV therapy screening (Bickel et al. 08).

9. Multi-task Learning: Remarks. The framework is very general: the input spaces can be different, the output spaces can be different, and the hypothesis spaces can be different.

10. How Can We Design an Algorithm? In all the above problems one can hope to improve performance by exploiting the relations among the different outputs. A possible way to do this is penalized empirical risk minimization,
$$\min_{f_1, \dots, f_T} \mathrm{ERR}[f_1, \dots, f_T] + \lambda\, \mathrm{PEN}(f_1, \dots, f_T).$$
Typically, the error term is the sum of the empirical risks, and the penalty term enforces similarity among the tasks.

11. Error Term. We choose the square loss to measure errors:
$$\mathrm{ERR}[f_1, \dots, f_T] = \sum_{j=1}^{T} \frac{1}{n_j} \sum_{i=1}^{n_j} \big( y_i^j - f_j(x_i^j) \big)^2.$$

12. MTL. Let $f_j : X \to \mathbb{R}$, $j = 1, \dots, T$. Then
$$\mathrm{ERR}[f_1, \dots, f_T] = \sum_{j=1}^{T} I_{S_j}[f_j], \qquad \text{with} \qquad I_S[f] = \frac{1}{n} \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2.$$
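The error term is straightforward to compute. Below is a minimal numpy sketch (not from the original slides): each task is given by a candidate function, its inputs, and its outputs, and all names are illustrative.

```python
import numpy as np

def empirical_risk(f, X, y):
    """I_S[f]: the empirical square-loss risk of one task on its sample S = (X, y)."""
    return np.mean((y - f(X)) ** 2)

def multitask_error(fs, Xs, ys):
    """ERR[f_1, ..., f_T]: the sum of the per-task empirical risks I_{S_j}[f_j]."""
    return sum(empirical_risk(f_j, X_j, y_j) for f_j, X_j, y_j in zip(fs, Xs, ys))

# Illustrative usage: two toy tasks with different sample sizes (n_1 = 20, n_2 = 30).
rng = np.random.default_rng(0)
Xs = [rng.uniform(0, 1, 20), rng.uniform(0, 1, 30)]
ys = [np.sin(3 * X) + 0.1 * rng.standard_normal(X.size) for X in Xs]
fs = [np.sin, np.cos]  # two candidate task functions f_1, f_2
print(multitask_error(fs, Xs, ys))
```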

13. Building Regularizers. We assume that the input, output, and hypothesis spaces are the same across tasks, i.e. $X_j = X$, $Y_j = Y$, and $H_j = H$ for all $j = 1, \dots, T$. We also assume $H$ to be an RKHS with kernel $K$.

14. Regularizers: Mixed Effect. For each component/task, the solution is the same (mean) function plus a task-specific component:
$$\mathrm{PEN}(f_1, \dots, f_T) = \lambda \sum_{j=1}^{T} \| f_j \|_K^2 + \gamma \sum_{j=1}^{T} \Big\| f_j - \frac{1}{T} \sum_{s=1}^{T} f_s \Big\|_K^2.$$
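To make the penalty concrete, here is a hedged numpy sketch (not from the slides). It assumes each task function is a kernel expansion $f_j = \sum_i C_{ji} K(\cdot, x_i)$ over a shared set of points, so that $\|f_j\|_K^2 = c_j^\top K c_j$; the Gaussian kernel and all names are illustrative choices.

```python
import numpy as np

def gram(X, sigma=1.0):
    """Gaussian kernel Gram matrix on 1-d inputs (an assumed choice of K)."""
    D = X[:, None] - X[None, :]
    return np.exp(-D ** 2 / (2 * sigma ** 2))

def mixed_effect_penalty(C, K, lam, gamma):
    """C[j] holds the expansion coefficients of f_j = sum_i C[j, i] K(., x_i),
    so ||f_j||_K^2 = C[j] @ K @ C[j]; the mean function has coefficients C.mean(0)."""
    c_bar = C.mean(axis=0)
    norms = np.einsum('ji,ik,jk->j', C, K, C)            # ||f_j||_K^2 per task
    devs = C - c_bar
    dev_norms = np.einsum('ji,ik,jk->j', devs, K, devs)  # ||f_j - f_bar||_K^2 per task
    return lam * norms.sum() + gamma * dev_norms.sum()

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 15)          # shared expansion points
C = rng.standard_normal((4, 15))   # T = 4 tasks
print(mixed_effect_penalty(C, gram(X), lam=0.1, gamma=1.0))
```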

15. Regularizers: Graph Regularization. We can define a regularizer that, in addition to a standard regularization on the single components, forces stronger or weaker similarity between tasks through a $T \times T$ positive weight matrix $M$:
$$\mathrm{PEN}(f_1, \dots, f_T) = \gamma \sum_{\ell, q = 1}^{T} \| f_\ell - f_q \|_K^2 \, M_{\ell q} + \lambda \sum_{\ell = 1}^{T} \| f_\ell \|_K^2 \, M_{\ell \ell}.$$
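A sketch of the graph penalty under the same coefficient representation as above (not from the slides; the weight matrix M is assumed symmetric and nonnegative, and all names are illustrative):

```python
import numpy as np

def graph_penalty(C, K, M, lam, gamma):
    """gamma * sum_{l,q} ||f_l - f_q||_K^2 M_lq + lam * sum_l ||f_l||_K^2 M_ll,
    with f_l = sum_i C[l, i] K(., x_i) over shared expansion points."""
    T = C.shape[0]
    pairwise = sum(M[l, q] * ((C[l] - C[q]) @ K @ (C[l] - C[q]))
                   for l in range(T) for q in range(T))
    single = sum(M[l, l] * (C[l] @ K @ C[l]) for l in range(T))
    return gamma * pairwise + lam * single

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 10)
K = np.exp(-(X[:, None] - X[None, :]) ** 2)   # assumed Gaussian kernel
C = rng.standard_normal((3, 10))
M = np.array([[1., .5, 0.], [.5, 1., .5], [0., .5, 1.]])  # task-similarity weights
print(graph_penalty(C, K, M, lam=0.1, gamma=1.0))
```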

16. Regularizers: Cluster. The components/tasks are partitioned into $c$ clusters; components in the same cluster should be similar. Let $m_r$, $r = 1, \dots, c$, be the cardinality of each cluster, and $I(r)$, $r = 1, \dots, c$, the index set of the components that belong to cluster $r$. Then
$$\mathrm{PEN}(f_1, \dots, f_T) = \gamma \sum_{r=1}^{c} \sum_{l \in I(r)} \| f_l - \bar f_r \|_K^2 + \lambda \sum_{r=1}^{c} m_r \| \bar f_r \|_K^2,$$
where $\bar f_r$, $r = 1, \dots, c$, is the mean of the tasks in cluster $r$.
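A sketch of the cluster penalty, again assuming each task function is a kernel expansion over shared points (cluster assignments and all names are illustrative):

```python
import numpy as np

def cluster_penalty(C, K, clusters, lam, gamma):
    """Cluster regularizer: tasks in the same cluster are pulled toward the
    cluster mean. clusters[r] is the index set I(r) of tasks in cluster r."""
    within, means = 0.0, 0.0
    for idx in clusters:
        c_mean = C[idx].mean(axis=0)                   # coefficients of f_bar_r
        for l in idx:
            d = C[l] - c_mean
            within += d @ K @ d                        # ||f_l - f_bar_r||_K^2
        means += len(idx) * (c_mean @ K @ c_mean)      # m_r ||f_bar_r||_K^2
    return gamma * within + lam * means

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, 10)
K = np.exp(-(X[:, None] - X[None, :]) ** 2)
C = rng.standard_normal((5, 10))
print(cluster_penalty(C, K, [[0, 1], [2, 3, 4]], lam=0.1, gamma=1.0))
```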

17. How Can We Find the Solution? Taking the first (mixed effect) regularizer as an example, we have to solve
$$\min_{f_1, \dots, f_T} \Big\{ \frac{1}{n} \sum_{j=1}^{T} \sum_{i=1}^{n} \big( y_i^j - f_j(x_i) \big)^2 + \lambda \sum_{j=1}^{T} \| f_j \|_K^2 + \gamma \sum_{j=1}^{T} \Big\| f_j - \frac{1}{T} \sum_{s=1}^{T} f_s \Big\|_K^2 \Big\}.$$
The theory of RKHS gives us a way to do this using what we already know from the scalar case.

18. Tikhonov Regularization. We now show that for all the above penalties we can define a suitable RKHS with kernel $Q$ (and re-index the sums in the error term), so that
$$\min_{f_1, \dots, f_T} \Big\{ \sum_{j=1}^{T} \frac{1}{n_j} \sum_{i=1}^{n_j} \big( y_i^j - f_j(x_i) \big)^2 + \lambda\, \mathrm{PEN}(f_1, \dots, f_T) \Big\}$$
can be written as
$$\min_{f \in \mathcal{H}} \Big\{ \frac{1}{nT} \sum_{i=1}^{nT} \big( y_i - f(x_i, t_i) \big)^2 + \lambda \| f \|_Q^2 \Big\}.$$
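Once the problem is rewritten over the joint kernel $Q$, the scalar machinery applies verbatim: the representer theorem gives $f = \sum_i c_i Q(\cdot, (x_i, t_i))$ with coefficients solving the usual regularized linear system. A minimal sketch (not from the slides; it anticipates the separable kernel of slide 20, and all names are illustrative):

```python
import numpy as np

def joint_rls_fit(q, XT, y, lam):
    """Scalar regularized least squares over augmented inputs (x, t).
    q(a, b) is the joint kernel Q evaluated on pairs a = (x, t), b = (x', t').
    Representer theorem: f = sum_i c_i Q(., (x_i, t_i)) with (Q + lam*N*I) c = y."""
    N = len(XT)
    Q = np.array([[q(a, b) for b in XT] for a in XT])
    c = np.linalg.solve(Q + lam * N * np.eye(N), y)
    return lambda xt: sum(ci * q(xt, b) for ci, b in zip(c, XT))

# Illustrative usage with a separable joint kernel K(x, x') * A_{t, t'}.
rng = np.random.default_rng(3)
A = np.array([[1.0, 0.5], [0.5, 1.0]])  # assumed output-similarity matrix
q = lambda a, b: np.exp(-(a[0] - b[0]) ** 2) * A[a[1], b[1]]
XT = [(x, t) for t in range(2) for x in rng.uniform(0, 1, 10)]
y = np.array([np.sin(3 * x) + 0.3 * t for x, t in XT])
f = joint_rls_fit(q, XT, y, lam=1e-2)
print(f((0.5, 0)), f((0.5, 1)))  # predictions for the two tasks at x = 0.5
```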

19. Kernels to the Rescue. Consider a (joint) kernel $Q : (X \times \Pi) \times (X \times \Pi) \to \mathbb{R}$, where $\Pi = \{1, \dots, T\}$ is the index set of the output components. A function in the space is
$$f(x, t) = \sum_i Q\big( (x, t), (x_i, t_i) \big)\, c_i,$$
with norm
$$\| f \|_Q^2 = \sum_{i, j} Q\big( (x_j, t_j), (x_i, t_i) \big)\, c_i c_j.$$

20. A Useful Class of Kernels. Let $A$ be a $T \times T$ positive definite matrix and $K$ a scalar kernel. Consider the kernel $Q : (X \times \Pi) \times (X \times \Pi) \to \mathbb{R}$ defined by
$$Q\big( (x, t), (x', t') \big) = K(x, x')\, A_{t, t'}.$$
Then the norm of a function is
$$\| f \|_Q^2 = \sum_{i, j} K(x_i, x_j)\, A_{t_i t_j}\, c_i c_j.$$
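When every task shares the same $n$ inputs, the Gram matrix of this $Q$ over all $nT$ input-task pairs is the Kronecker product $A \otimes K$, which numpy builds directly. A small sketch (not from the slides) verifying this against the definition, with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, 8)                       # n inputs shared by all T tasks
K = np.exp(-(X[:, None] - X[None, :]) ** 2)    # scalar kernel Gram matrix
A = np.array([[1.0, 0.5], [0.5, 1.0]])         # T x T positive definite matrix

# Gram matrix of Q((x,t),(x',t')) = K(x,x') A_{t,t'} over the n*T pairs,
# ordered task-major: (x_1,1),...,(x_n,1),(x_1,2),...,(x_n,2).
Q = np.kron(A, K)

# Entrywise check against the definition.
n, T = len(X), len(A)
Q_check = np.array([[K[i % n, j % n] * A[i // n, j // n]
                     for j in range(n * T)] for i in range(n * T)])
assert np.allclose(Q, Q_check)
```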

21-24. Regularizers and Kernels. If we fix $t$, then $f_t(x) = f(x, t)$ is one of the tasks. The norm $\| \cdot \|_Q$ can be related to the scalar products among the tasks:
$$\| f \|_Q^2 = \sum_{s, t} A^{\dagger}_{s, t}\, \langle f_s, f_t \rangle_K.$$
This implies that a regularizer of the form $\sum_{s, t} A^{\dagger}_{s, t} \langle f_s, f_t \rangle_K$ defines a kernel $Q$, and, conversely, the norm induced by a kernel $Q$ of the form $K(x, x') A$ can be seen as a regularizer. The matrix $A$ encodes the relations among the outputs.
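This identity can be checked numerically for the separable kernel. With coefficients $C$ over a shared grid of points, ordered task-major as in the Kronecker construction above, the expansion coefficients of the task function $f_t$ are row $t$ of $AC$. A hedged sketch (assumptions: shared points, separable kernel, positive definite $A$, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(5)
n, T = 8, 3
X = rng.uniform(0, 1, n)
K = np.exp(-(X[:, None] - X[None, :]) ** 2)                # scalar kernel Gram matrix
A = np.array([[2., .5, 0.], [.5, 2., .5], [0., .5, 2.]])   # positive definite

C = rng.standard_normal((T, n))   # one coefficient per (task, point) pair
c = C.ravel()                     # task-major ordering, matching kron(A, K)

# Left-hand side: ||f||_Q^2 = c^T (A kron K) c.
lhs = c @ np.kron(A, K) @ c

# Right-hand side: f_t = sum_i (A C)[t, i] K(., x_i), so
# <f_s, f_t>_K = (A C)[s] @ K @ (A C)[t]; then contract with A^dagger.
B = A @ C
G = B @ K @ B.T                    # G[s, t] = <f_s, f_t>_K
rhs = np.sum(np.linalg.pinv(A) * G)  # sum_{s,t} A†_{s,t} <f_s, f_t>_K
assert np.allclose(lhs, rhs)
```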
