New Regularized Algorithms for Transductive Learning
Partha Pratim Talukdar (University of Pennsylvania, USA)
Koby Crammer (Technion, Israel)
Graph-based Semi-Supervised Learning
[Figure: graph over labeled (seed) and unlabeled nodes, with edge weights between 0.1 and 0.3]
Graph-based Semi-Supervised Learning
• Various methods: LP (Zhu et al., 2003); QC (Bengio et al., 2007); Adsorption (Baluja et al., 2008)
Adsorption Algorithm
• Successfully used in YouTube video recommendation [Baluja et al., 2008] and semantic classification [Talukdar et al., 2008]
• It has not been analyzed so far
  • Is it optimizing an objective? If so, what?
  • Motivation for the proposed work
Adsorption Algorithm [Baluja et al., WWW 2008]
[Figure: label propagation through the graph, including a dummy label; edge weights 0.2–0.3]
Characteristics of Adsorption
• Highly scalable and iterative
• Main difference from previous methods: all nodes are not equal; high-degree nodes are discounted
• Two equivalent views: Label Diffusion and Random Walk
[Figure: the two views over labeled (L) and unlabeled (U) nodes]
Random Walk View
[Figure: random walk from node U, currently at node V; what next?]
• Continue the walk with probability p_v^cont
• Assign V's seed label to U with probability p_v^inj
• Abandon the random walk with probability p_v^abnd, assigning U a dummy label with score p_v^abnd
Discounting High-Degree Nodes
• High-degree nodes can be unreliable: do not allow propagation/walk through them
• Solution: increase the abandon probability on high-degree nodes: p_v^abnd ∝ degree(v)
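A minimal sketch of how the three per-node random-walk probabilities could be assigned, with the abandon probability growing with degree as on this slide. The normalization scheme, the `beta` knob, and the 0.5 injection split are illustrative assumptions, not the exact heuristic from the Adsorption paper (which derives these probabilities differently):

```python
import numpy as np

def node_probabilities(degrees, beta=2.0, is_seed=None):
    """Assign (p_cont, p_inj, p_abnd) per node so that they sum to 1.

    degrees : array of node degrees (assumed positive)
    beta    : assumed knob controlling how fast p_abnd grows with degree
    is_seed : boolean array; only seed nodes receive injection probability
    """
    degrees = np.asarray(degrees, dtype=float)
    n = len(degrees)
    if is_seed is None:
        is_seed = np.zeros(n, dtype=bool)

    # Abandon probability grows with degree (p_abnd proportional to degree),
    # capped so the three probabilities can still sum to 1.
    p_abnd = np.minimum(degrees / (degrees + beta * degrees.max()), 0.9)

    # Seed nodes keep some injection mass; unlabeled nodes have none.
    p_inj = np.where(is_seed, 0.5 * (1.0 - p_abnd), 0.0)

    # The remaining mass continues the walk.
    p_cont = 1.0 - p_abnd - p_inj
    return p_cont, p_inj, p_abnd
```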
Is Adsorption Optimizing an Objective?
• Under certain assumptions, NO!
  • Theorem in the paper
• Our Goal:
  • Retain Adsorption's desirable properties
  • But do so using a well-defined optimization
• Proposed Solution: MAD (next slide)
Modified Adsorption (MAD) [This Paper]

MAD Objective:

$$\min_{\hat{Y}} \sum_{l} \Big[\, \mu_1 \sum_{v} p_v^{inj} \, (y_{vl} - \hat{y}_{vl})^2 \;+\; \mu_2 \sum_{u,v} w_{uv} \, (\hat{y}_{ul} - \hat{y}_{vl})^2 \;+\; \mu_3 \sum_{v} (r_{vl} - \hat{y}_{vl})^2 \,\Big]$$

min { Seed Label Loss (if any) + Smoothness Loss Across Edges + Label Prior Loss (e.g., prior on dummy label) }

• High-degree node discounting is enforced through the third term
• Results in an Adsorption-like iterative update; scalable
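A sketch of the Adsorption-like iterative update that minimizes the MAD objective. The code below is a plain Jacobi iteration obtained by zeroing the objective's gradient per node, with simplified edge weights (no p_cont modulation and constant factors folded into the mu coefficients), so it is illustrative rather than the paper's exact update rule:

```python
import numpy as np

def mad_iterate(W, Y, R, p_inj, mu1=1.0, mu2=0.01, mu3=0.01, iters=50):
    """Jacobi-style updates for the MAD objective.

    W     : (n, n) symmetric edge-weight matrix
    Y     : (n, L) seed label matrix (zero rows for unlabeled nodes)
    R     : (n, L) label prior (e.g., mass on a dummy-label column)
    p_inj : (n,) injection probability per node
    """
    n, L = Y.shape
    Y_hat = np.zeros((n, L))
    # Per-node normalizer: mu1 * p_inj + mu2 * (weighted degree) + mu3
    M = mu1 * p_inj + mu2 * W.sum(axis=1) + mu3
    for _ in range(iters):
        # Neighbor-smoothed scores plus seed injection and prior terms.
        D = W @ Y_hat
        Y_hat = (mu1 * p_inj[:, None] * Y + mu2 * D + mu3 * R) / M[:, None]
    return Y_hat
```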
Extension to Dependent Labels
• Labels are not always mutually exclusive.
[Figure: labels, label similarities, and the resulting label graph over beer labels (Ale, ScotchAle, BrownAle, PaleAle, Porter, White, TopFermentedBeer) with similarity edges weighted 0.8–1.0]
MAD with Dependent Labels (MADDL)

MADDL Objective:

min { Seed Label Loss (if any) + Edge Smoothness Loss + Label Prior Loss (e.g., prior on dummy label) + Dependent Label Loss }

• Dependent Label Loss: penalize if similar labels (e.g., BrownAle and Ale, similarity 1.0) are assigned different scores on a node
• The MADDL objective results in a scalable iterative update, with a convergence guarantee.
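A plausible form of the added dependent-label term, sketched from the slide's description (penalize score differences between similar labels on the same node). The notation is an assumption here: $C_{ll'}$ denotes the label-similarity weight between labels $l$ and $l'$, and $\mu_4$ its trade-off coefficient added on top of the MAD objective $C_{\mathrm{MAD}}$:

```latex
\min_{\hat{Y}} \; C_{\mathrm{MAD}}(\hat{Y}) \;+\; \mu_4 \sum_{v} \sum_{l, l'} C_{l l'} \, \big( \hat{y}_{vl} - \hat{y}_{vl'} \big)^2
```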
Experimental Setup

I. Classification Experiments
• WebKB (4 classes) [Subramanya and Bilmes, 2008]
• Sentiment Classification (4 classes) [Blitzer, Dredze and Pereira, 2007]
• k-Nearest Neighbor graph (k is tuned; see the sketch after this list)

II. Smoother Sentiment Ranking with MADDL
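A minimal sketch of the k-nearest-neighbor graph construction used in this setup; the cosine metric and the symmetrization step are assumptions, since the slide only states that a kNN graph is used and k is tuned:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_knn_graph(X, k=10):
    """Build a symmetric kNN similarity graph over feature rows X."""
    # 'distance' mode gives distance-weighted edges; convert to similarities.
    A = kneighbors_graph(X, n_neighbors=k, mode="distance", metric="cosine")
    A.data = 1.0 - A.data  # cosine distance -> cosine similarity
    # Symmetrize: keep an edge if either endpoint selects the other.
    A = A.maximum(A.T)
    return A
```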
[Figure: PRBEP (macro-averaged) on the WebKB dataset, 3148 test instances; comparison includes LP (Zhu et al., 2003)]
[Figure: Precision on 3568 Sentiment test instances; comparison includes LP (Zhu et al., 2003)]
II. Smooth Sentiment Ranking
• Prefer smooth predictions over non-smooth predictions across ranks 1–4
[Figure: MADDL label constraints, with similarity 1.0 between adjacent rating labels]
[Figure: counts of the top predicted label pair in MAD output vs. MADDL output, over label pairs (1–4) × (1–4)]
MADDL generates a smoother ranking, while preserving prediction accuracy.
Conclusion
• Presented Modified Adsorption (MAD), an Adsorption-like algorithm with a well-defined optimization.
• Extended MAD to MADDL, which can handle labels that are not mutually exclusive.
• Demonstrated the effectiveness of MAD and MADDL on real-world datasets.
• Future Work: apply MADDL in other domains with dependent labels, e.g., Information Extraction.
Thanks!