

  1. Applications to high dimensional problems Francesca Odone and Lorenzo Rosasco RegML 2013

  2. Application domains
  Machine Learning systems are trained on examples rather than being programmed.

  3. Some success stories
  Pedestrian detection, face detection, OCR, speech recognition.

  4. Plan
  ✤ bioinformatics: gene selection (elastic net)
  ✤ computer vision: object detection (l1 regularization)
  ✤ human-robot interaction: action recognition (dictionary learning and multi-class categorization)
  ✤ video-surveillance: pose detection (semi-supervised learning)

  5. Microarray analysis
  Goals:
  ✤ Design methods able to identify a gene signature, i.e., a panel of genes potentially interesting for further screening
  ✤ Learn the gene signatures, i.e., select the most discriminant subset of genes on the available data

  6. Microarray analysis
  A typical “-omics” scenario:
  ✤ High dimensional data, few samples per class: tens of samples vs. tens of thousands of genes → variable selection
  ✤ High risk of selection bias, i.e., data distortion arising from the way the data are collected, due to the small amount of data available → model assessment needed
  ✤ Find ways to incorporate prior knowledge
  ✤ Deal with data visualization

  7. Gene selection
  THE PROBLEM
  ✤ Select a small subset of input variables (genes) to be used for building classifiers
  ADVANTAGES:
  ✤ it is cheaper to measure fewer variables
  ✤ the resulting classifier is simpler and potentially faster
  ✤ prediction accuracy may improve by discarding irrelevant variables
  ✤ identifying relevant variables gives useful information about the nature of the corresponding classification problem (biomarker detection)

  8. Elastic net and gene selection
  $\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2 + \tau\left(\|\beta\|_1 + \varepsilon\|\beta\|_2^2\right)$
  ✤ Consistency guaranteed: the more samples available, the better the estimator
  ✤ Multivariate: it takes into account many genes at once
  Output:
  ✤ A one-parameter family of nested lists with equivalent prediction ability and increasing correlation among genes
  ✤ $\varepsilon \to 0$: minimal list of prototype genes
  ✤ $\varepsilon_1 < \varepsilon_2 < \varepsilon_3 < \ldots$: longer lists including correlated genes
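A minimal sketch of this nested-lists behavior using scikit-learn's ElasticNet. Caveat: scikit-learn penalizes $\alpha(\text{l1\_ratio}\,\|\beta\|_1 + \tfrac{1-\text{l1\_ratio}}{2}\|\beta\|_2^2)$, so the slide's $(\tau, \varepsilon)$ maps onto (alpha, l1_ratio) only qualitatively; all data below are synthetic placeholders.

```python
# Sketch: nested gene lists from the elastic net (synthetic data:
# tens of samples, thousands of genes, labels coded as -1/+1).
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5000))
y = np.sign(X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(30))

# l1_ratio close to 1 ~ eps -> 0: a short list of prototype genes;
# smaller l1_ratio ~ larger eps: longer lists that pull in correlated genes.
for l1_ratio in (0.99, 0.7, 0.3):
    model = ElasticNet(alpha=0.1, l1_ratio=l1_ratio, max_iter=10000).fit(X, y)
    print(f"l1_ratio={l1_ratio}: {np.count_nonzero(model.coef_)} genes selected")
```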

  9. Double optimization approach (Mosci et al., 2008)
  ✤ Variable selection step (elastic net): $\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2 + \tau\left(\|\beta\|_1 + \varepsilon\|\beta\|_2^2\right)$
  ✤ Classification step (RLS): $\min_{\beta} \|Y - X\beta\|_2^2 + \lambda\|\beta\|_2^2$
  For each $\varepsilon$ we have to choose $\lambda$ and $\tau$; the combination prevents the elastic net shrinking effect.
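A sketch of the two-step scheme, assuming the same synthetic data as above: the elastic net picks the support, then an RLS (ridge) classifier is refit on the selected genes only, so their weights no longer suffer the l1 shrinkage. Names and default values are illustrative.

```python
# Sketch: elastic-net selection followed by an RLS refit on the support.
import numpy as np
from sklearn.linear_model import ElasticNet, Ridge

def select_then_classify(X, y, tau=0.1, eps_like=0.9, lam=1.0):
    # Step 1: variable selection with the elastic net.
    en = ElasticNet(alpha=tau, l1_ratio=eps_like, max_iter=10000).fit(X, y)
    support = np.flatnonzero(en.coef_)
    if support.size == 0:              # nothing survived the l1 penalty
        return support, None
    # Step 2: regularized least squares on the selected variables only.
    rls = Ridge(alpha=lam).fit(X[:, support], y)
    return support, rls
```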

  10. Dealing with selection bias (Barla et al., 2008)
  $\lambda \to (\lambda_1, \ldots, \lambda_A)$ and $\tau \to (\tau_1, \ldots, \tau_B)$: the optimal pair $(\lambda^*, \tau^*)$ is one of the possible $A \cdot B$ pairs $(\lambda, \tau)$.
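A sketch of the assessment loop over all $A \cdot B$ pairs, scoring each pair on held-out data so the chosen pair does not suffer from selection bias. It reuses `select_then_classify`, `X`, and `y` from the previous sketches; the grids and the validation split are illustrative.

```python
# Sketch: grid search over all (lambda, tau) pairs with held-out scoring.
import numpy as np
from sklearn.model_selection import train_test_split

lambdas = np.logspace(-3, 1, 20)       # A = 20 values for the RLS parameter
taus = np.logspace(-3, 1, 20)          # B = 20 values for the elastic net

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
best = None
for tau in taus:
    for lam in lambdas:
        support, rls = select_then_classify(X_tr, y_tr, tau=tau, lam=lam)
        if rls is None:                # empty support at this tau
            continue
        err = np.mean(np.sign(rls.predict(X_val[:, support])) != y_val)
        if best is None or err < best[0]:
            best = (err, lam, tau)
```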

  11. Computational issues
  ✤ Computational time for LOO (for one task): $t_{\text{1-optim}} = 2.5\,\text{s to } 25\,\text{s}$, depending on the correlation parameter
  ✤ Total time $= A \cdot B \cdot N_{\text{samples}} \cdot t_{\text{1-optim}} = 20 \cdot 20 \cdot 30 \cdot t_{\text{1-optim}} \sim 2 \cdot 10^4\,\text{s to } 2 \cdot 10^5\,\text{s}$
  ✤ 6 tasks → 1 week!
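A quick back-of-envelope script reproducing the order of magnitude of the slide's estimate; the per-run times are taken from the slide, everything else is the straightforward product.

```python
# Back-of-envelope: A*B grid pairs, one leave-one-out refit per sample.
A, B, n_samples = 20, 20, 30
runs = A * B * n_samples                       # optimizations per task
for t_one_optim in (2.5, 25.0):                # seconds per optimization
    total_s = runs * t_one_optim
    print(f"{t_one_optim:>4} s/run: {total_s:.1e} s (~{total_s / 86400:.1f} days)")
```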

  12. Image understanding
  ✤ Image understanding is still largely unsolved; today we are starting to answer more specific questions such as object detection, image categorization, ...
  ✤ Machine learning has been the key to solving these kinds of problems: it deals with noise and intra-class variability by collecting appropriate data and finding suitable descriptions
  ✤ Notice that images are relatively easy to gather (but not to label!): many benchmark datasets, labeling tools and services
  ✤ Image representations are very high dimensional: curse of dimensionality, computational cost at run time (while we often need real-time performance)

  13. Adaptive representations from fixed dictionaries
  ✤ Overcomplete, general-purpose sets of features are effective for modeling visual information
  ✤ Many object classes have peculiar intrinsic structures that can be better appreciated if one looks for symmetries or local geometries
  ✤ Examples of dictionaries: wavelets, ranklets, chirplets, banks of filters, ...
  ✤ See for instance: face detection [Heisele et al.; Viola & Jones; Destrero et al.], pedestrian detection [Oren et al.; Dalal & Triggs], car detection [Papageorgiou & Poggio]

  14. Object detection in images
  ✤ Object detection is a binary classification problem: image regions of variable size are classified as instances of the object or not
  ✤ Unbalanced classes: in this 380×220 px image we perform ~6.5×10⁵ tests and should find only 11 positives
  ✤ The training set contains images of positive examples (instances of the object) and negative examples (background)
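A sketch of where that count comes from: sliding windows at multiple scales. The base window size, scale factor, and stride below are illustrative assumptions, not the settings behind the slide's figure; they give the same order of magnitude and show why negatives vastly outnumber the 11 positives.

```python
# Sketch: counting sliding-window tests in a 380x220 image across scales.
def count_windows(W=380, H=220, base=24, scale=1.2, n_scales=10):
    total, size = 0, float(base)
    for _ in range(n_scales):
        w = h = int(size)
        if w <= W and h <= H:
            total += (W - w + 1) * (H - h + 1)   # stride-1 windows at this scale
        size *= scale
    return total

print(count_windows())   # on the order of 10^5 candidate regions
```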

  15. Object detection in images
  $x_i \mapsto (\phi_1(x_i), \ldots, \phi_p(x_i))$
  Image processing and computer vision offer a variety of local and global features for different purposes.

  16. Feature selection on fixed dictionaries
  ✤ We start off from an overcomplete dictionary of features $D = \{\phi_\gamma : X \to \mathbb{R},\ \gamma \in \Gamma\}$
  ✤ We assume $\Phi\beta = Y$, where $\Phi = \{\Phi_{ij}\}$ is the data matrix, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the vector of unknown weights to be estimated, and $Y = (y_1, \ldots, y_n)^T$ are the output labels
  ✤ Usually $p$ is big; existence of the solution is ensured, uniqueness is not. Overcomplete dictionaries contain many correlated features, so the problem is ill-posed
  ✤ Selection of meaningful feature subsets: l1 regularization allows us to select a sparse subset of meaningful features for the problem, with the aim of discarding correlated ones:
  $\min_{\beta \in \mathbb{R}^p} \|Y - \Phi\beta\|_2^2 + \tau\|\beta\|_1$
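A minimal sketch of this l1 selection step with synthetic placeholder data, where `Phi` stands for the $n \times p$ matrix of dictionary responses $\Phi_{ij} = \phi_j(x_i)$. Note that scikit-learn's Lasso minimizes the slide's objective up to a $1/(2n)$ rescaling of the data term.

```python
# Sketch: l1 selection over a fixed, overcomplete feature dictionary.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
Phi = rng.standard_normal((500, 5000))        # overcomplete: p >> n
Y = np.sign(Phi[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(500))

lasso = Lasso(alpha=0.05, max_iter=5000).fit(Phi, Y)
support = np.flatnonzero(lasso.coef_)
print(f"kept {support.size} of {Phi.shape[1]} features")
```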

  17. An example of a fixed-size dictionary (Destrero et al., 2009)
  ✤ Rectangle features, aka Haar-like features (Viola & Jones), are one of the most effective representations of images for face detection
  ✤ Size of the initial dictionary: a 19×19 px image is mapped into a 64,000-dimensional feature vector!
  (figure: the selected features)
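A sketch of the mechanics behind that mapping: each of the ~64,000 dimensions is one rectangle feature, computed in constant time from an integral image. The rectangle coordinates below are illustrative.

```python
# Sketch: a two-rectangle Haar-like feature via an integral image.
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img over rows < y and columns < x (zero-padded)
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y, x, h, w):
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

patch = np.random.rand(19, 19)                 # a 19x19 px face candidate
ii = integral_image(patch)
# A horizontal two-rectangle feature: left half minus right half.
feature = rect_sum(ii, 4, 2, 8, 5) - rect_sum(ii, 4, 7, 8, 5)
```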

  18. The role of prior knowledge
  ✤ Many image features have a characteristic internal structure: an image patch is divided into regions or cells, each region is represented according to the specific descriptor, then all representations are concatenated
  ✤ Many features used in computer vision share this common structure (SIFT, HOG, LBP, ...)
  ✤ In such cases it is beneficial to select groups of features belonging to the same region (so-called Group Lasso)

  19. Selecting feature groups (Zini & Odone, 2010; Fusco et al., 2013)
  ✤ Pedestrian detection with HOG features (binary classification):
  $\beta^* = \arg\min_{\beta} \|y - \Phi\beta\|_2^2 + \tau \sum_{g=1}^{G} \|\beta_{I_g}\|_2$
  ✤ Face recognition with LBP features (multi-class categorization):
  $B^* = \arg\min_{B} \|Y - \Phi B\|_F^2 + \tau \sum_{g=1}^{G} \|B_{I_g}\|_2$
  (figure: miss rate vs. false positives per window, comparing fixed-size HOG with 105 blocks [DalTri05] against group lasso with 50, 104, 210, and 387 blocks)
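A minimal sketch of how the binary objective can be solved by proximal gradient descent with block soft-thresholding, assuming `groups` lists the index sets $I_g$ (e.g., all HOG dimensions of one block); the step size, $\tau$, and iteration budget are illustrative, and the data term carries the usual 1/2 scaling.

```python
# Sketch: proximal gradient descent for the group lasso
# (1/2)||y - Phi b||^2 + tau * sum_g ||b_{I_g}||_2.
import numpy as np

def group_lasso(Phi, y, groups, tau=0.1, n_iter=500):
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1/L for the quadratic term
    beta = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = beta - step * (Phi.T @ (Phi @ beta - y))   # gradient step
        for idx in groups:                             # block soft-thresholding
            norm = np.linalg.norm(z[idx])
            if norm > 0:
                z[idx] *= max(0.0, 1.0 - step * tau / norm)
        beta = z
    return beta   # whole groups come out exactly zero and are discarded
```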

  20. Adaptive dictionaries
  (figure: keypoints and clusters)

  21. Adaptive dictionaries
  ✤ Sparse codes: $\min_{D, u} \|x - Du\|_2^2 + \lambda\|u\|_1$ subject to $\|d_i\|_2 \le 1$
  ✤ Fixed vs. adaptive
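A sketch of learning such an adaptive dictionary with scikit-learn, which optimizes this same kind of objective (squared reconstruction error plus l1-sparse codes, with norm-constrained atoms). The patch data and sizes are synthetic placeholders.

```python
# Sketch: dictionary learning with sparse codes on image patches.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
patches = rng.standard_normal((2000, 64))     # e.g., flattened 8x8 image patches

dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0,   # alpha ~ lambda
                                   transform_algorithm="lasso_lars",
                                   random_state=0)
codes = dico.fit(patches).transform(patches)  # sparse codes u, one row per patch
atoms = dico.components_                      # learned dictionary atoms d_i
```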

  22. HRI: iCub recognizing actions (Gori, Fanello et al., 2012)

  23. Semi-supervised pose classification (Noceti & Odone, in preparation)
  ✤ The capability of classifying people with respect to their orientation in space is important for a number of tasks; an example is the analysis of collective activities, where the reciprocal orientation of people within a group is an important feature
  ✤ The typical approach relies on quantizing the possible orientations into 8 main angles
  ✤ Appearance changes very smoothly and labeling may be subjective
  (figure: example frames annotated with the 8 quantized orientation labels, e.g., Front, Front Left, Left, Back, Right, ...)
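The slide names the technique but not the algorithm; a minimal sketch of one standard graph-based semi-supervised approach (label spreading) is below. Unlabeled examples carry the label -1; the descriptors, counts, and kernel settings are illustrative placeholders, not the authors' pipeline.

```python
# Sketch: semi-supervised classification of the 8 orientation classes.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 100))   # appearance descriptors of detected people
y = rng.integers(0, 8, size=1000)      # 8 quantized orientation classes
y[200:] = -1                           # only the first 200 samples keep labels

model = LabelSpreading(kernel="knn", n_neighbors=10).fit(X, y)
propagated = model.transduction_       # labels spread to the unlabeled samples
```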
