Order parameters and model selection in Machine Learning: model characterization and feature selection
Romaric Gaudel
Advisor: Michèle Sebag; Co-advisor: Antoine Cornuéjols
PhD defense, December 14, 2010
Supervised Machine Learning: Background

Unknown distribution P(x, y) on X × Y.

Objective: find h* minimizing the generalization error
    Err(h) = E_{(x,y)~P}[ ℓ(h(x), y) ]
where ℓ(h(x), y) is the cost of an error on example x.

Given: training examples L = {(x_1, y_1), ..., (x_n, y_n)}, with (x_i, y_i) ~ P(x, y), i = 1, ..., n.

(Figure: decision regions of h*, with h*(x) > 0, h*(x) = 0 and h*(x) < 0.)
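Not part of the original slides: a minimal sketch of the empirical error Err_n(h) used below as an estimate of Err(h). The hypothesis, the 0/1 loss and the toy data are placeholders chosen for illustration only.

```python
import numpy as np

# Minimal sketch (not from the slides): empirical error Err_n(h) with 0/1 loss,
# as an estimate of the generalization error Err(h).
def empirical_error(h, X, y):
    """Err_n(h) = (1/n) * sum_i loss(h(x_i), y_i), here with 0/1 loss."""
    predictions = np.sign([h(x) for x in X])
    return np.mean(predictions != y)

# Toy placeholder hypothesis and data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0])                 # true concept: sign of the first feature
h = lambda x: x[0] + 0.1 * x[1]      # an imperfect linear hypothesis
print(empirical_error(h, X, y))
```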
Supervised Machine Learning (2) (Vapnik-Chervonenkis; Bottou & Bousquet, 08)

Approximation error (a.k.a. bias)
  - The learned hypothesis belongs to H:  h*_H = argmin_{h ∈ H} Err(h)

Estimation error (a.k.a. variance)
  - Err is estimated by the empirical error Err_n(h) = (1/n) Σ_i ℓ(h(x_i), y_i):  h_n = argmin_{h ∈ H} Err_n(h)

Optimization error
  - The learned hypothesis is returned by an optimization algorithm A:  ĥ_n = A(L)

(Figure: h*, h*_H, h_n and ĥ_n, with the approximation, estimation and optimization gaps.)
Focus of the thesis: combinatorial optimization problems hidden in Machine Learning

Relational representation ⇒ combinatorial optimization problem
  - Example: Mutagenesis database

Feature Selection ⇒ combinatorial optimization problem
  - Example: microarray data
Outline
1. Relational Kernels
2. Feature Selection
Relational Learning / Inductive Logic Programming: Position

Relational database
  - X: keys in the database
  - Background knowledge

H: set of logical formulas
  - Expressive language
  - Actual covering test: Constraint Satisfaction Problem (CSP)
CSP within Inductive Logic Programming: consequences of the Phase Transition

Complexity
  - Worst case: NP-hard
  - Average case: "easy" except in the Phase Transition region (Cheeseman et al., 91)

Phase Transition in Inductive Logic Programming
  - Existence (Giordana & Saitta, 00)
  - Impact: learners fail in the Phase Transition region (Botta et al., 03)
Multiple Instance Problems: the missing link between Relational and Propositional Learning

Multiple Instance Problems (MIP) (Dietterich et al., 89)
  - An example: a set of instances
  - An instance: a vector of features
  - Target concept: there exists an instance satisfying a predicate P
        pos(x) ⟺ ∃ I ∈ x, P(I)

Example of MIP: key rings and a locked door
  - A positive key ring contains a key which can unlock the door; a negative key ring does not.
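Not part of the original slides: a minimal Python sketch of the MIP labelling rule pos(x) ⟺ ∃ I ∈ x, P(I), using the key-ring example; the key representation and the predicate are made up for illustration.

```python
# Minimal sketch (hypothetical example, not from the slides) of the MIP rule:
# a key ring (example) is positive iff at least one of its keys (instances)
# satisfies the predicate P ("opens the door").
def positive_example(key_ring, opens_door):
    """pos(x) <=> there exists I in x such that P(I)."""
    return any(opens_door(key) for key in key_ring)

# Toy predicate: a key is identified by its bitting code.
DOOR_CODE = (3, 1, 4, 1)
opens_door = lambda key: key == DOOR_CODE

print(positive_example([(2, 2, 2, 2), (3, 1, 4, 1)], opens_door))  # True
print(positive_example([(2, 2, 2, 2), (5, 5, 5, 5)], opens_door))  # False
```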
Support Vector Machines

A convex optimization problem (dual form):
    max_{α ∈ R^n}  Σ_{i=1}^n α_i − (1/2) Σ_{i,j=1}^n α_i α_j y_i y_j ⟨x_i, x_j⟩
    s.t.  Σ_{i=1}^n α_i y_i = 0,   0 ≤ α_i ≤ C, i = 1, ..., n

Kernel trick: replace ⟨x_i, x_j⟩ by K(x_i, x_j).

Kernel-based propositionalization (differs from the RKHS framework):
    given L = {(x_1, y_1), ..., (x_n, y_n)} and a kernel K,
    Φ: x ↦ (K(x_1, x), ..., K(x_n, x))

(Figure: soft-margin separator ĥ_n, with slack values ξ_i = 0, 0 < ξ_i < 1, ξ_i > 1 and level lines ĥ_n(x) = −1, 0, 1.)
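Not part of the original slides: a minimal sketch of the kernel-based propositionalization map Φ described above; the RBF kernel and the data are placeholders.

```python
import numpy as np

# Sketch (placeholder kernel and data) of kernel-based propositionalization:
# each example x is re-described by its kernel values against the training set.
def propositionalize(X_train, X, kernel):
    """Phi(x) = (K(x_1, x), ..., K(x_n, x)) for every x in X."""
    return np.array([[kernel(xi, x) for xi in X_train] for x in X])

# Placeholder RBF kernel on vectors (for illustration only).
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 3))
X_new = rng.normal(size=(2, 3))
print(propositionalize(X_train, X_new, rbf).shape)   # (2, 5): n_new x n_train
```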
SVM and MIP: the averaging kernel for MIP (Gärtner et al., 02)

Given a kernel k on instances,
    K(x, x') = ( Σ_{x_i ∈ x} Σ_{x_j ∈ x'} k(x_i, x_j) ) / ( norm(x) · norm(x') )

Question
  - MIP target concept: an existential property
  - Averaging kernel: an average property
  - Do averaging kernels sidestep the limitations of Relational Learning?
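Not part of the original slides: a short sketch of the averaging kernel on bags of instances. The instance kernel k and the normalization choice norm(x) = sqrt(Σ_{i,j ∈ x} k(x_i, x_j)) are assumptions made here for illustration.

```python
import numpy as np

# Sketch of the averaging (set) kernel: sum of pairwise instance-kernel values,
# divided by a per-bag normalization term.  The linear instance kernel and the
# normalization norm(x) = sqrt(sum_{i,j in x} k(x_i, x_j)) are assumptions.
def instance_kernel(a, b):
    return float(np.dot(a, b))                   # placeholder linear kernel

def raw_set_kernel(bag1, bag2):
    return sum(instance_kernel(a, b) for a in bag1 for b in bag2)

def averaging_kernel(bag1, bag2):
    norm1 = np.sqrt(raw_set_kernel(bag1, bag1))
    norm2 = np.sqrt(raw_set_kernel(bag2, bag2))
    return raw_set_kernel(bag1, bag2) / (norm1 * norm2)

x  = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]   # a bag of two instances
xp = [np.array([0.0, 1.0])]                         # a bag of one instance
print(averaging_kernel(x, xp))
```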
Methodology, inspired from Phase Transition studies

Usual Phase Transition framework
  - Generate data according to control parameters
  - Observe the results
  - Draw a phase diagram: results w.r.t. the order parameters

This study
  - Generalized Multiple Instance Problems
  - Experimental results of averaging-kernel-based propositionalization
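Not part of the original slides: the skeleton of such a phase-diagram study, sweeping two order parameters, generating data for each setting, learning and recording the result. The generator, learner and evaluation are placeholders.

```python
import itertools
import numpy as np

# Skeleton (placeholders only) of a phase-transition study: sweep two order
# parameters, generate data for each setting, learn, and record the error.
def phase_diagram(param1_values, param2_values, generate, learn, evaluate):
    grid = np.zeros((len(param1_values), len(param2_values)))
    for (i, p1), (j, p2) in itertools.product(
            enumerate(param1_values), enumerate(param2_values)):
        train, test = generate(p1, p2)       # data generated from the parameters
        model = learn(train)
        grid[i, j] = evaluate(model, test)   # e.g. test error
    return grid                              # the phase diagram

# Toy placeholders just to make the sketch executable.
toy_generate = lambda p1, p2: (None, None)
toy_learn = lambda train: None
toy_evaluate = lambda model, test: 0.0
print(phase_diagram([1, 2], [1, 2], toy_generate, toy_learn, toy_evaluate))
```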
Outline
1. Relational Kernels
   - Theoretical failure region
   - Lower bound on the generalization error
   - Empirical failure region
2. Feature Selection
Generalized Multiple Instance Problems (Weidmann et al., 03)

  - An example: a set of instances
  - An instance: a vector of features
  - Target concept: a conjunction of predicates P_1, ..., P_m
        pos(x) ⟺ ∃ I_1, ..., I_m ∈ x, ∧_{i=1}^m P_i(I_i)

Example of Generalized MIP
  - A molecule: a set of sub-graphs
  - Bioactivity: involves several sub-graphs

(Figure: a molecule and its decomposition into sub-graphs.)
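Not part of the original slides: a minimal sketch of the generalized-MIP labelling rule, checking that a bag contains instances I_1, ..., I_m satisfying all predicates; the predicates and instances are placeholders.

```python
# Sketch (placeholder predicates/instances) of the generalized MIP rule:
# pos(x) <=> there exist instances I_1, ..., I_m in x with P_i(I_i) for all i.
def positive_example(bag, predicates):
    # Each predicate must be satisfied by at least one instance of the bag
    # (the I_i need not be distinct in this reading of the formula).
    return all(any(P(instance) for instance in bag) for P in predicates)

# Toy predicates on 2-dimensional instances (made up for illustration).
P1 = lambda z: z[0] > 0.8          # "contains sub-structure 1"
P2 = lambda z: z[1] < 0.2          # "contains sub-structure 2"

print(positive_example([(0.9, 0.5), (0.3, 0.1)], [P1, P2]))   # True
print(positive_example([(0.9, 0.5), (0.3, 0.9)], [P1, P2]))   # False
```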
Control Parameters

Instances
  |Σ|   Size of the alphabet Σ, a ∈ Σ
  d     Number of numerical features, I = (a, z), z ∈ [0, 1]^d

Examples
  M+    Number of instances per positive example
  M−    Number of instances per negative example
  m+    Number of instances in a predicate, for a positive example
  m−    Number of instances in a predicate, for a negative example

Concept
  P_m   Number of predicates "missed" by each negative example
  P     Number of predicates
  ε     Radius of each predicate (ε-ball)
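Not part of the original slides: a sketch of an artificial-data generator driven by these control parameters. The concrete sampling choices (uniform numerical features, random predicate centers, how instances are planted in the ε-balls) are assumptions for illustration, not the generator actually used in the thesis.

```python
import numpy as np

# Sketch of a generator driven by the control parameters (the categorical
# letter a in Sigma is ignored here).  Sampling choices are assumptions.
rng = np.random.default_rng(0)

def make_concept(P, d, eps):
    """P predicates, each an eps-ball around a random center in [0, 1]^d."""
    return [(center, eps) for center in rng.uniform(size=(P, d))]

def satisfies(instance, predicate):
    center, eps = predicate
    return np.linalg.norm(instance - center) <= eps

def positive_bag(concept, M_pos, m_pos, d):
    """M_pos instances; for each predicate, m_pos of them lie in its eps-ball."""
    planted = []
    for center, eps in concept:
        offsets = rng.uniform(-eps, eps, size=(m_pos, d)) / np.sqrt(d)
        planted.extend(center + offsets)            # stays inside the eps-ball
    n_random = max(M_pos - len(planted), 0)
    return planted + list(rng.uniform(size=(n_random, d)))

concept = make_concept(P=3, d=5, eps=0.3)
bag = positive_bag(concept, M_pos=10, m_pos=2, d=5)
print(sum(satisfies(I, concept[0]) for I in bag))   # at least m_pos = 2
```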