Active Learning by the Naive Credal Classifier
Alessandro Antonucci∗, Giorgio Corani∗, Sandra Gabaglio†
∗ Istituto “Dalle Molle” di Studi sull’Intelligenza Artificiale - Lugano (Switzerland)
† ISIN/SUPSI - Lugano (Switzerland)
PGM’12 - Granada, September 20, 2012
Active Learning

Class C (values in C), attributes A := (A_1, ..., A_k). Three sets of data:
- Training dataset (supervised): instances (c^(1), a_1^(1), ..., a_k^(1)), ..., (c^(n), a_1^(n), ..., a_k^(n)).
- Active set (unsupervised): instances (∗, a_1^(a), ..., a_k^(a)), (∗, a_1^(b), ..., a_k^(b)), (∗, a_1^(c), ..., a_k^(c)), ..., whose class labels can be requested from an annotator.
- Test set/instance (unsupervised): (∗, ã_1, ..., ã_k).

A classifier is learned from the training dataset and assigns an active learning (AL) score to each active-set instance (e.g., .3, .8, .5 in the diagram). The instance with the highest score (here instance b, with score .8) is sent to the annotator, its class label c_b is obtained, and the labelled instance is moved to the training dataset. Retraining on the enlarged dataset yields an actively learned classifier, which should be more accurate.
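A minimal sketch of this pool-based loop is given below. The callables `fit`, `al_score` and `oracle` are placeholders for the classifier, the scoring rule and the human annotator; none of these names come from the paper.

```python
import numpy as np

def active_learning_loop(train_X, train_y, pool_X, oracle, fit, al_score, n_queries=10):
    """Generic pool-based active learning loop (sketch, assumed interfaces).

    fit(X, y)            -> a trained classifier
    al_score(clf, pool)  -> one AL score per active-set instance (higher = harder)
    oracle(x)            -> the true class label of x, i.e. the human annotator
    """
    pool = [np.asarray(x) for x in pool_X]
    for _ in range(n_queries):
        if not pool:
            break
        clf = fit(train_X, train_y)                 # learn from the current training set
        scores = al_score(clf, np.asarray(pool))    # score every active-set instance
        b = int(np.argmax(scores))                  # pick the highest-scoring instance
        x_b = pool.pop(b)
        y_b = oracle(x_b)                           # annotation: ask for its class label
        train_X = np.vstack([train_X, x_b])         # move it to the training set
        train_y = np.append(train_y, y_b)
    return fit(train_X, train_y), train_X, train_y  # the actively learned classifier
```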
Accuracy Trajectories

With a constant AL score, the pick among the active-set instances is random; as instances are labelled and moved to the training set, the variance error decreases and accuracy increases.

[Plot: accuracy on the test set vs. training-set size N, N+d, N+2d, ..., N+kd, i.e., from a full to an empty active set. Two trajectories are compared: the random-pick baseline and the AL algorithm. AL algorithms should do better than random pick.]
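The evaluation protocol behind the plot can be sketched as follows; it assumes `fit` returns an object with a `predict` method and that `al_score` is the scoring rule under test (random scores reproduce the random-pick baseline). All names are illustrative, not from the paper.

```python
import numpy as np

def accuracy_trajectory(train_X, train_y, pool_X, pool_y, test_X, test_y, fit, al_score):
    """Record test-set accuracy each time one active-set instance is labelled (sketch)."""
    pool_X, pool_y = [np.asarray(x) for x in pool_X], list(pool_y)
    sizes, accs = [], []
    while True:
        clf = fit(train_X, train_y)
        accs.append(float(np.mean(clf.predict(test_X) == test_y)))  # accuracy on the test set
        sizes.append(len(train_y))
        if not pool_X:
            break
        scores = al_score(clf, np.asarray(pool_X))
        b = int(np.argmax(scores))                  # hardest remaining instance
        train_X = np.vstack([train_X, pool_X.pop(b)])
        train_y = np.append(train_y, pool_y.pop(b))
    return sizes, accs
```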
Naive Classifiers

Structure: the class C is the parent of every attribute A_1, A_2, ..., A_k; given the class C, the attributes are independent.

NAIVE BAYES (NBC)
- A BN quantified from data with a flat Dirichlet prior Dir(st).
- Joint model: P(c, a) = P(c) · ∏_{i=1}^{k} P(a_i | c).
- Given a test instance a, it assigns the class c∗ := arg max_{c ∈ C} P(c | a).
- For any c′, c′′ ∈ C, the dominance test is P(c′ | a) / P(c′′ | a) = P(c′, a) / P(c′′, a) > 1.

NAIVE CREDAL (NCC)
- A set of BNs quantified by a set of Dirichlet priors Dir(st), the imprecise Dirichlet model: T ≡ { Dir(st) : t_i > 0, Σ_i t_i = 1 }.
- It returns a set C∗ ⊆ C of optimal (undominated) classes.
- Conservative dominance test: min_{t ∈ T} P_t(c′, a) / P_t(c′′, a) > 1.
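A compact sketch of both classifiers on categorical data follows. The class name `NaiveClassifiers` and the sampling-based approximation of the credal dominance test are mine: the actual NCC performs an exact optimisation over the IDM priors, so treat this as an illustration under simplified (local IDM) assumptions rather than the authors' implementation.

```python
import numpy as np

class NaiveClassifiers:
    """Naive Bayes (NBC) and an approximate naive credal classifier (NCC) for
    categorical data. X is an (n, k) integer array, y an (n,) integer array.

    The credal part approximates the IDM dominance test by sampling prior
    parameters t, instead of the exact optimisation used in the original NCC.
    """

    def __init__(self, n_classes, n_values, s=1.0):
        self.n_classes, self.n_values, self.s = n_classes, n_values, s

    def fit(self, X, y):
        k = X.shape[1]
        self.class_counts = np.bincount(y, minlength=self.n_classes).astype(float)
        self.cond_counts = np.zeros((k, self.n_classes, self.n_values))
        for i in range(k):
            for c in range(self.n_classes):
                self.cond_counts[i, c] = np.bincount(X[y == c, i], minlength=self.n_values)
        return self

    def _log_joint(self, a, t_class, t_cond):
        """log P_t(c, a) for every class c, under prior parameters t.

        t_class: (n_classes,) prior mass on each class (sums to 1).
        t_cond:  (k, n_classes) prior mass on the observed value a_i under class c.
        """
        s, n_c = self.s, self.class_counts
        log_p = np.log((n_c + s * t_class) / (n_c.sum() + s))      # class prior term
        for i, v in enumerate(a):
            log_p += np.log((self.cond_counts[i, :, v] + s * t_cond[i]) / (n_c + s))
        return log_p

    def nbc_log_joint(self, a):
        """NBC joint P(c, a) (in log) under a single flat Dirichlet prior."""
        t_class = np.full(self.n_classes, 1.0 / self.n_classes)
        t_cond = np.full((len(a), self.n_classes), 1.0 / self.n_values)
        return self._log_joint(a, t_class, t_cond)

    def nbc_predict(self, a):
        """NBC: the single most probable class c* = arg max_c P(c | a)."""
        return int(np.argmax(self.nbc_log_joint(a)))

    def ncc_sampled_log_joints(self, a, n_samples=200, rng=None):
        """log P_t(c, a) for n_samples priors t drawn from the (local) IDM set."""
        rng = np.random.default_rng(rng)
        out = []
        for _ in range(n_samples):
            t_class = rng.dirichlet(np.ones(self.n_classes))
            t_cond = rng.uniform(1e-6, 1 - 1e-6, size=(len(a), self.n_classes))
            out.append(self._log_joint(a, t_class, t_cond))
        return np.array(out)                         # shape (n_samples, n_classes)

    def ncc_predict(self, a, n_samples=200, rng=None):
        """NCC: the set of undominated classes under the sampled priors."""
        lj = self.ncc_sampled_log_joints(a, n_samples, rng)
        undominated = []
        for c2 in range(self.n_classes):
            # c2 is dominated if some c1 has P_t(c1, a) > P_t(c2, a) for every sampled t
            dominated = any(np.all(lj[:, c1] > lj[:, c2])
                            for c1 in range(self.n_classes) if c1 != c2)
            if not dominated:
                undominated.append(c2)
        return undominated
```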
Uncertainty Samplings

The AL score(a) measures how hard to classify an instance is: difficult/ambiguous instances give a better contribution to learning.

Uncertainty Sampling
- Based on the NBC posterior P(C | a).
- The smaller the probability of the most probable class, the more hard to classify the instance is:
  score(a) ≡ −P(c∗ | a).

Credal Uncertainty Sampling
- Based on the set of NCC posteriors P(C | a).
- The weaker the dominances, the more hard to classify the instance is.
- If C = {c′, c′′} (binary class):
  score(a) ≡ −max { min_{t ∈ T} P_t(c′ | a) / P_t(c′′ | a), min_{t ∈ T} P_t(c′′ | a) / P_t(c′ | a) }.
- More than two classes? Take the max over all pairs (c′, c′′) ∈ C².
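Both scores can be computed from the quantities produced by the classifier sketch above; the function names are mine. `us_score` normalises the NBC log-joint into the posterior, while `credal_us_score` takes the sampled log-joints (ratios of posteriors equal ratios of joints, since the evidence P_t(a) cancels) and approximates the minima over T by minima over the samples.

```python
import numpy as np

def us_score(nbc_log_joint):
    """Uncertainty sampling: score(a) = -P(c* | a), from the NBC log-joint over classes."""
    p = np.exp(nbc_log_joint - np.max(nbc_log_joint))
    p /= p.sum()                                    # normalise into the posterior P(C | a)
    return -float(np.max(p))

def credal_us_score(log_joints):
    """Credal uncertainty sampling score from sampled NCC log-joints (sketch).

    log_joints: (n_samples, n_classes) array of log P_t(c, a), one row per sampled
    prior t (e.g. from NaiveClassifiers.ncc_sampled_log_joints above).
    """
    n_classes = log_joints.shape[1]
    best = -np.inf
    for c1 in range(n_classes):
        for c2 in range(n_classes):
            if c1 == c2:
                continue
            # min over t of P_t(c1 | a) / P_t(c2 | a), approximated over the samples
            min_ratio = np.exp(np.min(log_joints[:, c1] - log_joints[:, c2]))
            best = max(best, min_ratio)             # strongest dominance over all pairs
    return -best
```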