New Regularized Algorithms for Transductive Learning


  1. New Regularized Algorithms for Transductive Learning. Partha Pratim Talukdar (University of Pennsylvania, USA) and Koby Crammer (Technion, Israel).

  2. Graph-based Semi-Supervised Learning
     [Figure: a graph whose nodes are partly labeled (seed) and partly unlabeled, connected by weighted edges (weights between 0.1 and 0.3)]

  3. Graph-based Semi-Supervised Learning. Various methods: LP (Zhu et al., 2003); QC (Bengio et al., 2007); Adsorption (Baluja et al., 2008).

  4. Adsorption Algorithm
     • Successfully used in YouTube video recommendation [Baluja et al., 2008] and semantic classification [Talukdar et al., 2008]
     • It has not been analyzed so far: is it optimizing an objective, and if so, what?
     • This question is the motivation for the proposed work

  5. Adsorption Algorithm [Baluja et al., WWW 2008]
     [Figure: a worked example of Adsorption on a small graph, with edge weights 0.3, 0.3, 0.2 and a dummy label]

  6. Characteristics of Adsorption
     • Highly scalable and iterative
     • Main difference from previous methods: not all nodes are equal; high-degree nodes are discounted
     • Two equivalent views: label diffusion and random walk [Figure: the two views on a small graph of labeled (L) and unlabeled (U) nodes]

  7. Random Walk View
     [Figure: a walk currently at node V, having arrived from node U: what next?]
     • Continue the walk with probability p_v^cont
     • Inject: assign V's seed label to U with probability p_v^inj
     • Abandon the random walk with probability p_v^abnd, assigning U the dummy label with score p_v^abnd
     (A single step of this walk is sketched in code below.)
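A minimal sketch of one step of this walk, assuming the graph is an adjacency dict and the three per-node probabilities (p_cont, p_inj, p_abnd, summing to one at each node) are given; the uniform neighbor choice and the dict representation are illustrative assumptions, not the paper's exact transition kernel:

```python
import random

def adsorption_step(v, came_from, graph, labels, p_cont, p_inj, p_abnd):
    """One step of the Adsorption random walk, currently at node v.

    graph[v]  -> list of neighbors of v
    labels[v] -> seed label of v, or None if v is unlabeled
                 (for unlabeled nodes p_inj[v] is expected to be 0)
    Returns ("continue", next_node), ("inject", seed_label),
    or ("abandon", "__dummy__").
    """
    r = random.random()
    if r < p_cont[v]:
        # Continue the walk: move to a neighbor of v.
        # (Uniform choice here; an edge-weighted choice is equally plausible.)
        return ("continue", random.choice(graph[v]))
    elif r < p_cont[v] + p_inj[v] and labels[v] is not None:
        # Injection: stop and assign v's seed label to the start node.
        return ("inject", labels[v])
    else:
        # Abandon: stop and emit the dummy label.
        return ("abandon", "__dummy__")
```

Repeating such walks from an unlabeled node and averaging the emitted labels recovers the label distribution that Adsorption assigns to that node.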

  8. Discounting High-Degree Nodes
     • High-degree nodes can be unreliable: do not allow propagation/walks through them
     • Solution: increase the abandon probability on high-degree nodes, p_v^abnd ∝ degree(v) (a sketch follows below)
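The slide fixes only the proportionality p_v^abnd ∝ degree(v), so a concrete normalization must be chosen; in the sketch below the 0.9 cap and the even split of the remaining mass between continuing and injecting are assumptions (the paper derives these probabilities differently, via a heuristic on each node's neighborhood):

```python
def walk_probabilities(graph, cap=0.9):
    """Illustrative per-node walk probabilities with p_abnd proportional to degree.

    graph maps each node to its list of neighbors. High-degree nodes get a
    larger abandonment probability, so walks rarely pass through hub nodes.
    """
    max_degree = max(len(nbrs) for nbrs in graph.values())
    p_cont, p_inj, p_abnd = {}, {}, {}
    for v, nbrs in graph.items():
        p_abnd[v] = cap * len(nbrs) / max_degree  # p_abnd grows with degree(v)
        rest = 1.0 - p_abnd[v]
        p_cont[v] = 0.5 * rest  # 50/50 split of the rest is an assumption
        p_inj[v] = 0.5 * rest
    return p_cont, p_inj, p_abnd
```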

  9. Is Adsorption Optimizing an Objective?
     • Under certain assumptions, no (theorem in the paper)
     • Our goal: retain Adsorption's desirable properties, but do so with a well-defined optimization
     • Proposed solution: MAD (next slide)

  10. Modified Adsorption (MAD) [this paper]. The MAD objective:

      $$\min_{\{\hat{y}_{vl}\}} \; \sum_{l} \Big[ \; \mu_1 \sum_{v} p_v^{\mathrm{inj}} \, (y_{vl} - \hat{y}_{vl})^2 \;+\; \mu_2 \sum_{u,v} w_{uv} \, (\hat{y}_{ul} - \hat{y}_{vl})^2 \;+\; \mu_3 \sum_{v} (r_{vl} - \hat{y}_{vl})^2 \; \Big]$$

  11. Reading the three terms of the MAD objective:
      min [ Seed Label Loss (if any) + Smoothness Loss Across Edges + Label Prior Loss (e.g., prior on the dummy label) ]
      • High-degree node discounting is enforced through the third term
      • Results in an Adsorption-like, scalable iterative update (sketched below)
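Setting the derivative of the objective above with respect to each ŷ_vl to zero yields a Jacobi-style fixed-point iteration, matching the slide's "Adsorption-like iterative update". The sketch below follows mechanically from the displayed objective; the paper's actual update may fold the walk-continuation probabilities into the edge weights, so treat the details as a simplified reading:

```python
import numpy as np

def mad(W, Y, R, p_inj, mu1=1.0, mu2=1.0, mu3=1.0, iters=100, tol=1e-6):
    """Jacobi-style iterative minimization of the MAD objective.

    W     : (n, n) edge-weight matrix w_uv (need not be symmetric)
    Y     : (n, m) seed label scores y_vl (zero rows for unlabeled nodes)
    R     : (n, m) label priors r_vl (e.g., a dummy-label column)
    p_inj : (n,)  injection probabilities p_v^inj
    Returns the estimated label-score matrix Y_hat of shape (n, m).
    """
    Ws = W + W.T  # each edge contributes w_uv + w_vu to the smoothness term
    denom = mu1 * p_inj + mu2 * Ws.sum(axis=1) + mu3  # per-node normalizer M_v
    Y_hat = Y.astype(float).copy()
    for _ in range(iters):
        # Fixed point of d(objective)/d(y_hat_vl) = 0:
        # y_vl <- (mu1 p_v^inj y_vl + mu2 sum_u (w_uv + w_vu) y_ul + mu3 r_vl) / M_v
        Y_new = (mu1 * p_inj[:, None] * Y + mu2 * Ws @ Y_hat + mu3 * R) / denom[:, None]
        if np.abs(Y_new - Y_hat).max() < tol:
            return Y_new
        Y_hat = Y_new
    return Y_hat
```

Note how discounting can enter through R: for the dummy-label column, r_v can be set to p_v^abnd, so high-degree nodes lean toward the dummy label rather than propagating unreliable labels.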

  12. Extension to Dependent Labels
      • Labels are not always mutually exclusive
      [Figure: a label graph over beer-type labels (Ale, BrownAle, PaleAle, ScotchAle, Porter, White, TopFermentedBeer) with pairwise similarity weights between 0.8 and 1.0]

  13. MAD with Dependent Labels (MADDL). The MADDL objective:
      min [ Seed Label Loss (if any) + Smoothness Loss Across Edges + Label Prior Loss (e.g., prior on the dummy label) + Dependent Label Loss ]
      • The new term penalizes a node if similar labels (e.g., BrownAle and Ale, with similarity 1.0) are assigned different scores on it (one possible form is sketched below)
      • The MADDL objective also yields a scalable iterative update, with a convergence guarantee
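The deck does not write the new term out; one natural form consistent with "penalize if similar labels are assigned different scores on a node", given label-similarity weights $C_{ll'}$ from the label graph, is the quadratic penalty below (the coefficient name $\mu_4$ and the exact indexing are assumptions, not taken from the paper):

```latex
\min_{\{\hat{y}_{vl}\}} \;\;
  \underbrace{\cdots}_{\text{MAD objective}}
  \;+\;
  \mu_4 \sum_{v} \sum_{l,\,l'} C_{ll'} \, \big( \hat{y}_{vl} - \hat{y}_{vl'} \big)^2
```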

  14. Experimental Setup
      I. Classification experiments
         • WebKB (4 classes) [Subramanya and Bilmes, 2008]
         • Sentiment classification (4 classes) [Blitzer, Dredze and Pereira, 2007]
         • k-nearest-neighbor graph (k is tuned; a construction sketch follows below)
      II. Smoother sentiment ranking with MADDL
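The slides name only "k-nearest-neighbor graph (k is tuned)"; the cosine similarity, top-k selection, and max-symmetrization below are assumptions made so the sketch is concrete:

```python
import numpy as np

def knn_graph(X, k=10):
    """Build a symmetric k-NN similarity graph from a feature matrix.

    X: (n, d) feature matrix, one row per instance.
    Returns an (n, n) weight matrix with cosine similarities on kept edges.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    S = Xn @ Xn.T                                      # cosine similarities
    np.fill_diagonal(S, -np.inf)                       # forbid self-loops
    W = np.zeros_like(S)
    for v in range(S.shape[0]):
        nbrs = np.argpartition(-S[v], k)[:k]           # indices of top-k neighbors
        W[v, nbrs] = S[v, nbrs]
    return np.maximum(W, W.T)                          # symmetrize the graph
```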

  15. [Chart: PRBEP (macro-averaged) on the WebKB dataset, 3148 test instances; baselines include LP (Zhu et al., 2003)]

  16. [Chart: precision on 3568 sentiment test instances; baselines include LP (Zhu et al., 2003)]

  17. II. Smooth Sentiment Ranking
      • Prefer smooth predictions over non-smooth predictions (illustrated with ratings from rank 1 to rank 4)
      • Achieved with MADDL label constraints [Figure: unit-weight (1.0) ties between rating labels in the label graph; see the sketch below]
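The "1.0 MADDL label constraints" figure suggests tying rating labels together with unit weights in the label graph; exactly which pairs are tied is not fully legible from the extraction, so the adjacent-rank chain below is an illustrative assumption:

```python
import numpy as np

# Label-similarity matrix C over the four sentiment ranks 1..4:
# adjacent ranks (1,2), (2,3), (3,4) are tied with weight 1.0, so the
# dependent-label loss pushes their scores together on each node.
ranks = 4
C = np.zeros((ranks, ranks))
for l in range(ranks - 1):
    C[l, l + 1] = C[l + 1, l] = 1.0
```

Plugging such a C into the dependent-label loss pushes adjacent ranks toward similar scores on each node, which is what makes the MADDL predictions "smooth".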

  18. [Charts: count of the top predicted label pair in the MAD output vs. the MADDL output, over pairs of labels 1-4]
      MADDL generates a smoother ranking while preserving prediction accuracy.

  19. Conclusion
      • Presented Modified Adsorption (MAD), an Adsorption-like algorithm with a well-defined optimization
      • Extended MAD to MADDL, which can handle non-mutually-exclusive labels
      • Demonstrated the effectiveness of MAD and MADDL on real-world datasets
      • Future work: apply MADDL in other domains with dependent labels, e.g., information extraction

  20. Thanks!
