On Discriminative Learning of Prediction Uncertainty
Vojtěch Franc, Daniel Průša
Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague
Selective classifier:
$(h, c)(x) = \begin{cases} h(x) & \text{with probability } c(x), \\ \text{reject} & \text{with probability } 1 - c(x), \end{cases}$
where $h\colon \mathcal{X} \to \mathcal{Y}$ is a classifier and $c\colon \mathcal{X} \to [0, 1]$ is a selection function.

Example: linear SVM
$h(x) = \operatorname{sign}(\langle \phi(x), w \rangle + b)$
$c(x) = [\![\, |\langle \phi(x), w \rangle + b| \geq \theta \,]\!]$

Coverage:
$\phi(c) = \mathbb{E}_{x \sim p}[\, c(x) \,]$

Selective risk:
$R_S(h, c) = \dfrac{\mathbb{E}_{(x, y) \sim p}[\, \ell(y, h(x))\, c(x) \,]}{\phi(c)}$

[Figure: risk-coverage curves (selective risk [%] vs. coverage [%]) comparing "SVM + distance to hyperplane" (R_S = 2.1%) with "SVM + learned selection function" (R_S = 0.2%).]

In our paper we show:
1) what the optimal selection function c(x) is, and
2) how to learn c(x) discriminatively.
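Below is a minimal sketch, not taken from the slides or the paper, of how the SVM-plus-threshold selective classifier and the empirical coverage and selective-risk estimates defined above could be computed. The synthetic data, the scikit-learn LinearSVC model, and the threshold values theta are illustrative assumptions.

```python
# Sketch of a linear-SVM selective classifier that rejects inputs whose
# distance to the hyperplane is below a threshold theta, with empirical
# estimates of coverage phi(c) and selective risk R_S(h, c) under 0/1 loss.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic binary data: two Gaussian blobs (stand-in for the features phi(x)).
X = np.vstack([rng.normal(-1.0, 1.0, size=(500, 2)),
               rng.normal(+1.0, 1.0, size=(500, 2))])
y = np.hstack([-np.ones(500), np.ones(500)])

svm = LinearSVC(C=1.0).fit(X, y)

def selective_classifier(X, theta):
    """h(x) = sign(<w, phi(x)> + b);  c(x) = [[ |<w, phi(x)> + b| >= theta ]]."""
    score = svm.decision_function(X)              # <w, phi(x)> + b
    h = np.sign(score)                            # prediction
    c = (np.abs(score) >= theta).astype(float)    # 1 = accept, 0 = reject
    return h, c

def coverage_and_selective_risk(X, y, theta):
    """Empirical coverage E[c(x)] and selective risk E[l(y,h(x)) c(x)] / phi(c)."""
    h, c = selective_classifier(X, theta)
    phi = c.mean()
    risk = (c * (h != y)).mean() / max(phi, 1e-12)
    return phi, risk

for theta in [0.0, 0.5, 1.0]:
    phi, risk = coverage_and_selective_risk(X, y, theta)
    print(f"theta={theta:.1f}  coverage={100 * phi:.1f}%  selective risk={100 * risk:.1f}%")
```

Sweeping theta traces out a risk-coverage curve of the kind shown in the figure; the paper's point is that a discriminatively learned c(x) can achieve lower selective risk at the same coverage than the plain distance-to-hyperplane rule.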