Knowledge-Uncertainty Axiomatized Framework with Support Vector Machines for Hyperparameter Optimization
Marcin Orchel
AGH University of Science and Technology, Poland
1 / 54
1 Introduction
2 Problem Definition
3 Solution
4 SVM
5 Measures of Knowledge and Uncertainty
6 Experiments
7 Summary
2 / 54
Introduction 3 / 54
Introduction
There are multiple reformulations of support vector machines (SVM). Reformulations concern the objective function, the constraints, and the representation of a solution (the kernel function). How can reformulations be grouped into one framework? We introduce a framework of knowledge and uncertainty: a multi-objective optimization problem with two goals, maximizing knowledge and minimizing uncertainty. We generalize the regularization term in SVM to an uncertainty measure and the hinge loss to a knowledge measure. How can the generalization performance or simplicity of SVM be improved?
4 / 54
Introduction
Define the most efficient measures of knowledge and uncertainty. Define the concepts of knowledge and uncertainty with a set of axioms. Requirement: use the existing SVM optimization problem. Define a knowledge-uncertainty framework for selecting optimal values of hyperparameters. Select optimal values of hyperparameters over a finite set of candidates generated by a double grid search method.
5 / 54
Introduction
There are no practical methods for selecting the value of the hyperparameter C. We use the double grid search method, or global optimization methods such as evolutionary computation, for selecting values of C and σ. We use cross validation for comparing different sets of hyperparameter values. We minimize statistical generalization bounds. We could also aggregate solutions for multiple values of hyperparameters.
6 / 54
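The double grid search mentioned above can be sketched as a coarse logarithmic grid over (C, σ) followed by a finer grid centered on the coarse optimum. In this minimal sketch, `cross_val_score` is a hypothetical stand-in for the cross-validation accuracy of an SVM, and the grid ranges are illustrative assumptions, not values from the slides.

```python
import itertools

def cross_val_score(C, sigma):
    # Hypothetical surrogate for the cross-validation accuracy of an
    # SVM trained with hyperparameters (C, sigma); a real version
    # would train and validate the classifier on held-out folds.
    return -((C - 1.0) ** 2 + (sigma - 0.1) ** 2)

def grid_search(C_grid, sigma_grid):
    # Return the (C, sigma) pair with the best validation score.
    return max(itertools.product(C_grid, sigma_grid),
               key=lambda p: cross_val_score(*p))

# Pass 1: coarse logarithmic grids (illustrative ranges).
C0, s0 = grid_search([2.0 ** k for k in range(-5, 6, 2)],
                     [2.0 ** k for k in range(-7, 2, 2)])

# Pass 2: finer grid centered on the coarse optimum.
steps = [2.0 ** k for k in (-1.0, -0.5, 0.0, 0.5, 1.0)]
C_best, s_best = grid_search([C0 * t for t in steps],
                             [s0 * t for t in steps])
```

The second pass refines only the neighborhood of the first-pass winner, which keeps the number of trained models small compared to a single dense grid.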
Introduction
An idea similar to uncertainty is risk in financial economics, although risk relates more to potential loss than to uncertainty. Knowledge has been axiomatized in epistemic modal logic, but those axioms are based on logic rather than mathematical spaces.
7 / 54
Problem Definition 8 / 54
Classification problem
Definition 1 (Classification problem with a training set)
For a universe $X$ of objects, a set $C$ of classes, and a set of mappings $M_T : X_T \subset X \to C \setminus \{c_0\}$ called a training set $T$, a classification problem is to find classes for all elements of $X$.
Definition 2 (Classification problem with hypotheses)
For a classification problem with a training set, we additionally define a space of hypotheses $H$. Each hypothesis is a function $h : X \to C$. Finding a class for all elements of $X$ is replaced by finding a hypothesis.
9 / 54
Knowledge set
Definition 3 (knowledge set)
A knowledge set $K$ is a tuple $K = (X, C, M_T; U, c)$; shortly, without environment objects, it is a pair $K = (U, c)$, where $c \neq c_0$. It is a set $U \subset X$ of points with the information that every $u \in U$ maps to $c \in C \setminus \{c_0\}$. The $c$ is called the class of the knowledge set. The difference between a hypothesis and a knowledge set is that a knowledge set defines mappings only for some objects, while a hypothesis does so for all objects. We also define an unknowledge set as a pair $U = (U, c_0)$.
Definition 4 (knowledge setting)
A knowledge setting is a pair of knowledge sets $(K_1, K_2)$, shortly $(K_{1,2})$, where $K_1 = (U_1, c_1)$, $K_2 = (U_2, c_2)$, $c_1, c_2 \in C \setminus \{c_0\}$, $c_1 \neq c_2$.
10 / 54
Knowledge set
We can generalize knowledge settings to multiple knowledge sets, including the case of a single knowledge set. A special type of knowledge setting is one consistent with a training set, denoted $(T_{1,2})$. We define the operation of inclusion as inclusion of mappings: $(K_{1,2}) \subseteq (L_{1,2})$ if and only if $M(K_{1,2}) \subseteq M(L_{1,2})$. The difference between knowledge settings is the difference between the corresponding sets of mappings.
11 / 54
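The set operations on knowledge settings reduce to set operations on their mappings, which can be sketched directly in code. Representing a knowledge setting as a dict from objects to classes is a hypothetical choice made here for illustration; the slides do not prescribe a data structure.

```python
# A knowledge setting is modelled as a dict mapping objects to classes;
# the dict plays the role of the mapping set M(K_{1,2}).

def mappings(setting):
    """Return the set of (object, class) mappings of a knowledge setting."""
    return set(setting.items())

def included(k, l):
    """(K_{1,2}) is included in (L_{1,2}) iff M(K_{1,2}) is a subset of M(L_{1,2})."""
    return mappings(k) <= mappings(l)

def difference(k, l):
    """Difference of settings = difference of their mapping sets."""
    return mappings(k) - mappings(l)

K = {"x1": 1, "x2": 2}            # K_1 = ({x1}, c1), K_2 = ({x2}, c2)
L = {"x1": 1, "x2": 2, "x3": 1}   # a larger setting containing K
```

With these hypothetical settings, `included(K, L)` holds while `included(L, K)` does not, and `difference(L, K)` contains only the extra mapping for `x3`.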
Knowledge and uncertainty measures
We define a space of knowledge measures $\mathcal{K}_K$ for knowledge settings. Each knowledge measure is a function $k_K : (K_{1,2}) \to \mathbb{R}$. The goal is to find a knowledge setting with maximal knowledge measure. We define a space of uncertainty measures $\mathcal{U}_K$ for knowledge settings. Each uncertainty measure is a function $u_K : (K_{1,2}) \to \mathbb{R}$. The goal is to find a knowledge setting with minimal uncertainty measure. A special type of uncertainty measure is one that depends on $U_1$ and $U_2$ but is independent of the mappings. More formally, it is a measure on an uncertain setting $(U_{1,2})$, for example on $(X, c_0)$.
12 / 54
Knowledge measure axioms
Axiom 1 (monotonicity of a knowledge measure)
When $(K_{1,2}) \subseteq (L_{1,2})$, then
$$k_K((K_{1,2})) \leq k_K((L_{1,2})). \quad (1)$$
Axiom 2 (strict monotonicity of a knowledge measure)
When $(K_{1,2}) \subset (L_{1,2})$ and $((L_{1,2}) \setminus (K_{1,2})) \cap (T_{1,2}) \neq \emptyset$, then
$$k_K((K_{1,2})) < k_K((L_{1,2})). \quad (2)$$
Axiom 3 (non-negativity)
$$k_K((K_{1,2})) \geq 0. \quad (3)$$
13 / 54
Knowledge measure axioms
Axiom 4 (null empty set)
$$k_K((\emptyset, \emptyset)) = 0. \quad (4)$$
Axiom 5 (knowledge in a training set)
$$k_K((K_{1,2}) \setminus (T_{1,2})) = 0. \quad (5)$$
Axiom 6 (knowledge in a training set 2)
When $(K_{1,2}) \subset (L_{1,2})$ and $((L_{1,2}) \setminus (K_{1,2})) \cap (T_{1,2}) = \emptyset$,
$$k_K((K_{1,2})) = k_K((L_{1,2})). \quad (6)$$
14 / 54
Knowledge measure axioms
Axiom 7
The maximal value of $k_K$ exists, that is, $k_K < \infty$.
Axiom 8
When $(K_{1,2}) \cap (T_{1,2}) \neq \emptyset$,
$$k_K((K_{1,2})) > 0. \quad (7)$$
Axiom 9
$$k_K((T_{1,2})) = k_K((U_1', c_1)) + k_K((U_2', c_2)) + k_K((K_{1,2})) \quad (8)$$
15 / 54
Knowledge measure axioms
Axiom 10 (optional, additivity)
When $(K_{1,2}) \cap (L_{1,2}) = \emptyset$,
$$k_K((K_{1,2}) \cup (L_{1,2})) = k_K((K_{1,2})) + k_K((L_{1,2})). \quad (10)$$
16 / 54
Knowledge measure axioms
Example 1
An example of a knowledge measure is
$$k_K((K_{1,2})) = |(T_{1,2}) \cap (K_{1,2})|. \quad (11)$$
The knowledge $k_K$ might be interpreted as an upper bound on the number of correctly classified training examples included in a knowledge setting. In general, knowledge $k_K$ is a quality measure.
17 / 54
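The measure of Example 1 counts shared mappings between a knowledge setting and the training setting, and can be sketched in a few lines. The dict representation of settings and the particular objects used are illustrative assumptions.

```python
# Example 1 from the slides: k_K((K_{1,2})) = |(T_{1,2}) ∩ (K_{1,2})|,
# with the intersection taken over (object, class) mappings.
# Settings-as-dicts is a hypothetical representation.

def k_measure(K, T):
    return len(set(K.items()) & set(T.items()))

T = {"x1": 1, "x2": 2, "x3": 1}   # training knowledge setting (T_{1,2})
K = {"x1": 1, "x2": 2, "x4": 2}   # a knowledge setting; x4 is outside T
```

Here `k_measure(K, T)` is 2: only the mappings for `x1` and `x2` agree with the training set, while the mapping for `x4` contributes no knowledge, consistent with Axiom 5. Adding the training mapping for `x3` to `K` raises the measure, consistent with strict monotonicity (Axiom 2).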
Uncertainty measure axioms
Axiom 11 (monotonicity of an uncertainty measure)
When $(K_{1,2}) \subseteq (L_{1,2})$, then
$$u_K((K_{1,2})) \leq u_K((L_{1,2})). \quad (12)$$
Axiom 12 (non-negativity)
$$u_K((K_{1,2})) \geq 0. \quad (13)$$
Axiom 13 (null empty set)
$$u_K((\emptyset, \emptyset)) = 0. \quad (14)$$
18 / 54
Uncertainty measure axioms
Axiom 14 (uncertainty outside a training set)
When $(K_{1,2}) \not\subset (T_{1,2})$,
$$u_K((K_{1,2})) > 0. \quad (15)$$
Axiom 15 (uncertainty outside a training set 2)
When $(K_{1,2}) \subset (L_{1,2})$ and $(L_{1,2}) \setminus (K_{1,2}) \not\subset (T_{1,2})$,
$$u_K((K_{1,2})) < u_K((L_{1,2})). \quad (16)$$
Axiom 16
The maximal value of $u_K$ exists, that is, $u_K < \infty$.
19 / 54
Uncertainty measure axioms
Axiom 17
$$u_K((K_{1,2})) = u_K((X, c_0)) - u_K(((U_1 \cup U_2)', c_0)) \quad (17)$$
Axiom 18 (optional, additivity)
When $(K_{1,2}) \cap (L_{1,2}) = \emptyset$,
$$u_K((K_{1,2}) \cup (L_{1,2})) = u_K((K_{1,2})) + u_K((L_{1,2})). \quad (18)$$
20 / 54
Uncertainty measure axioms
Example 2
An example of an uncertainty measure is
$$u_K((K_{1,2})) = |(K_{1,2})|. \quad (19)$$
The uncertainty $u_K$ might be related to uncertain knowledge in a knowledge setting: the classification might be incorrect. It could also be interpreted as an upper bound on the knowledge in a knowledge setting that might be incorrect. Uncertainty 0 means that we are sure about the correct classification.
21 / 54
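Placing the example measures of Examples 1 and 2 side by side shows the trade-off between the two objectives. The dict representation and the concrete settings below are illustrative assumptions.

```python
def k_measure(K, T):
    # Example 1: k_K = |(T_{1,2}) ∩ (K_{1,2})|, intersection of mappings.
    return len(set(K.items()) & set(T.items()))

def u_measure(K):
    # Example 2: u_K = |(K_{1,2})|: every asserted mapping may be incorrect.
    return len(K)

T = {"x1": 1, "x2": 2, "x3": 1}
inside  = {"x1": 1, "x2": 2}            # a subset of the training setting
outside = {"x1": 1, "x2": 2, "x9": 1}   # one mapping outside (T_{1,2})
```

The extra mapping for `x9` raises the uncertainty from 2 to 3 without adding any knowledge, illustrating Axiom 15: asserting mappings outside the training set only makes the setting more uncertain.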
Solution 22 / 54
Solution
We define a solution for the classification problem as a solution of a multi-objective optimization problem for knowledge settings.
Optimization problem (OP) 1
$$(K_{1,2}) \in K_s : \max k_K((K_{1,2})), \ \min u_K((K_{1,2})). \quad (20)$$
We maximize the knowledge $k_K$ and minimize the uncertainty $u_K$. We are interested in the Pareto optimal set. For a finite set $K_s$, a solution always exists. The objective function is lower bounded by $(0, 0)$.
23 / 54
Solution
We are interested only in Pareto optimal solutions. This is an a priori assumption and might be considered an axiom.
Axiom 19
The best solutions for OP 1 are Pareto optimal solutions.
When we have multiple Pareto optimal solutions, they must be compared by using an oracle for knowledge settings in order to obtain the best single solution. By selecting only Pareto optimal solutions, we limit the number of knowledge settings to validate with the oracle.
24 / 54
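Extracting the Pareto optimal set of OP 1 can be sketched as filtering out every dominated setting. The dict representation and the cardinality measures of Examples 1 and 2 are illustrative assumptions.

```python
def dominates(a, b):
    # a = (k, u) dominates b when a has at least as much knowledge and
    # at most as much uncertainty, and is strictly better in one of them.
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def pareto_front(settings, k_measure, u_measure):
    scored = [(s, (k_measure(s), u_measure(s))) for s in settings]
    return [s for s, v in scored
            if not any(dominates(w, v) for _, w in scored)]

# Illustrative candidate settings with the cardinality measures.
T = {"x1": 1, "x2": 2}
candidates = [{"x1": 1}, {"x1": 1, "x2": 2}, {"x1": 1, "x9": 1}]
front = pareto_front(candidates,
                     lambda s: len(set(s.items()) & set(T.items())),
                     len)
```

The setting with the mapping for `x9` scores (1, 2): it is dominated by `{"x1": 1}` with score (1, 1), so only the two subsets of the training setting survive, in line with Proposition 1.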
Solution
Proposition 1
The set of Pareto optimal solutions includes only subsets of a training knowledge setting $(T_{1,2})$, if they are included in $K_s$, assuming axioms Ax. 6 and Ax. 15.
Not all subsets are Pareto optimal solutions in general. We could have two separated knowledge settings, where one has more knowledge and less uncertainty.
25 / 54
Online setting
Proposition 2
After adding a new object to a training set, we only need to check the old Pareto optimal solutions and all of them with the new object added, in order to solve OP 1, assuming Ax. 10 and Ax. 18.
26 / 54
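The online step of Proposition 2 can be sketched as follows: the candidate pool after a new object arrives consists of the old Pareto optimal settings plus each of them extended with the new mapping. The dict representation and the cardinality measures of Examples 1 and 2 are again illustrative assumptions, not part of the proposition.

```python
def pareto_front(settings, score):
    # Keep the settings whose (k, u) score is not dominated.
    scored = [(s, score(s)) for s in settings]
    def dominated(v):
        return any(w[0] >= v[0] and w[1] <= v[1] and w != v
                   for _, w in scored)
    return [s for s, v in scored if not dominated(v)]

def online_step(pareto_old, new_obj, new_class, training):
    training = dict(training, **{new_obj: new_class})  # updated (T_{1,2})
    score = lambda s: (len(set(s.items()) & set(training.items())),  # k_K
                       len(s))                                       # u_K
    extended = [dict(s, **{new_obj: new_class}) for s in pareto_old]
    return pareto_front(pareto_old + extended, score)

T = {"x1": 1, "x2": 2}
old_front = [{}, {"x1": 1}, {"x1": 1, "x2": 2}]   # subsets of (T_{1,2})
new_front = online_step(old_front, "x3", 1, T)
```

Only the old solutions and their extensions are scored, so the online step avoids re-enumerating the whole candidate space.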