
Multi-view Active Learning, Ion Muslea, University of Southern California (PowerPoint presentation)



  1. Multi-view Active Learning Ion Muslea University of Southern California

  2. Outline • Multi-view active learning • Robust multi-view learning • View validation as meta-learning • Related Work • Contributions • Future work

  3. Background & Terminology • Inductive machine learning – algorithms that learn concepts from labeled examples • Active learning: minimize the need for training data – detect & ask the user to label only the most informative examples • Multi-view learning (MVL) – disjoint sets of features, each sufficient for learning • Speech recognition: sound vs. lip motion – previous multi-view learners are semi-supervised • exploit the distribution of the unlabeled examples • boost accuracy by bootstrapping the views from each other

  4. Thesis of the Thesis Multi-view active learning maximizes the accuracy of the learned hypotheses while minimizing the amount of labeled training data.

  5. Outline • Multi-view active learning – The intuition – The Co-Testing family of algorithms – Empirical evaluation • Robust multi-view learning • View validation as meta-learning • Related Work • Contributions • Future work

  6. A Simple Multi-View Problem • Features: – salary – office number • Concept: Is Faculty? – View-1: salary > 50K – View-2: office < 300 • GOAL: minimize the amount of labeled data [Figure: examples plotted in the salary/office plane, with decision thresholds at salary 50K and office 300]
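The two views in this toy problem can be written as a pair of one-feature classifiers. A minimal sketch; the function and field names are illustrative, not from the talk:

```python
# Toy "Is Faculty?" problem from the slide: two disjoint views,
# each sufficient on its own. Names here are illustrative.

def view1_salary(example):
    """View-1 looks only at the salary feature."""
    return example["salary"] > 50_000

def view2_office(example):
    """View-2 looks only at the office-number feature."""
    return example["office"] < 300

faculty = {"salary": 60_000, "office": 120}
student = {"salary": 20_000, "office": 415}

# On examples consistent with the target concept, the views agree.
assert view1_salary(faculty) and view2_office(faculty)
assert not view1_salary(student) and not view2_office(student)
```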

  7.-9. [Figures: Co-Testing on the salary/office example; labeled and unlabeled examples (?) plotted in the Salary and Office views. Across the three slides, contention points (unlabeled examples the two views label differently) are queried and become labeled.]

  10. The Co-Testing Family of Algorithms • REPEAT – Learn one hypothesis in each view – Query one of the contention points (CPs): unlabeled examples on which the views disagree • Algorithms differ by: – output hypothesis: winner-takes-all, majority / weighted vote – query selection strategy: • Naïve: randomly chosen CP • Conservative: equal-confidence CP • Aggressive: maximum-confidence CP
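The REPEAT loop above can be sketched in a few lines. This is a naive-strategy sketch under simplifying assumptions: `learn1`/`learn2` are arbitrary single-view learners, `oracle` stands in for the human labeler, and the toy threshold learner is not from the thesis:

```python
import random

def co_test(learn1, learn2, labeled, unlabeled, oracle, n_queries):
    """Naive Co-Testing sketch: each round, learn one hypothesis per view
    and query a randomly chosen contention point."""
    for _ in range(n_queries):
        h1, h2 = learn1(labeled), learn2(labeled)
        # Contention points: unlabeled examples the views label differently.
        cps = [x for x in unlabeled if h1(x) != h2(x)]
        if not cps:
            break
        q = random.choice(cps)  # naive strategy; conservative/aggressive
                                # strategies would rank cps by view confidence
        labeled.append((q, oracle(q)))
        unlabeled.remove(q)
    return learn1(labeled), learn2(labeled)

def threshold_learner(i):
    """Stand-in base learner: one-feature midpoint threshold."""
    def learn(labeled):
        pos = [x[i] for x, y in labeled if y]
        neg = [x[i] for x, y in labeled if not y]
        t = (min(pos) + max(neg)) / 2
        return lambda x: x[i] > t
    return learn

# Tiny demo: true label is (a + b) > 10; view 1 sees a, view 2 sees b.
random.seed(0)
h1, h2 = co_test(
    threshold_learner(0), threshold_learner(1),
    labeled=[((0, 0), False), ((10, 10), True)],
    unlabeled=[(8, 1), (2, 9), (8, 9), (1, 2)],
    oracle=lambda x: x[0] + x[1] > 10,
    n_queries=4,
)
assert h1((20, 20)) and h2((20, 20))
assert not h1((0, 0)) and not h2((0, 0))
```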

  11. When does Co-Testing work? Assumptions: • 1. Uncorrelated views – for any example <x1, x2, l>: given the label l, x1 and x2 are uncorrelated – the views are unlikely to make the same mistakes => contention points • 2. Compatible views – perfect learning is possible in both views – contention points are fixable mistakes • Under these assumptions, there are classes of learning problems for which Co-Testing converges faster than single-view active learners

  12. Experiments: four real-world domains
  • Ad [Kushmerick '99] – base learner: IB – remove advertisements: "is this image an ad?"
  • Parse [Marcu et al. '00] – base learner: C4.5 – learn a shift-reduce parser that converts a Japanese discourse tree into an equivalent English one
  • Courses [Blum+Mitchell '98] – base learner: Naïve-Bayes – discriminate between course homepages and other pages
  • Wrapper [Kushmerick '00] – base learner: Stalker – extract relevant data from Web pages
  • Algorithms compared on each domain (marked wins / works / cannot-be-applied): Random Sampling, Uncertainty Sampling, Query-by-Committee, Query-by-Boosting, Query-by-Bagging, Naïve Co-Testing, Conservative Co-Testing, Aggressive Co-Testing

  13. Main Application: Wrapper Induction • Extract phone number: find its start & end
  … Hilton <p> Phone: <b> (211) 111-1111 </b> Fax: (211) 121-1…   SkipTo(Phone: <b>)   SkipTo(</b>)
  … Phone (toll free) : <i> (800) 171-1771 </i> Fax: (800) 777-1…   SkipTo(Phone) SkipTo(Html) SkipTo(Html)

  14. Co-Testing for Wrapper Induction • Views: tokens before & after the extraction point
  … Hilton <p> Phone: <b> (211) 111-1111 </b> Fax: <b> (211) …   forward: SkipTo(Phone) SkipTo(<b>)   backward: BackTo(Fax) BackTo((Nmb))
  … Motel 6 <p> Phone : <b> (311) 101-1110 </b> Fax: <b> (311) …
  … Phone (toll free) : <i> (800) 171-1771 </i> Fax: <b> (111) …
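The forward and backward views can be sketched as two scanning functions over a token list. A simplified sketch: landmarks here are literal tokens, whereas the slide's rules also use token-class wildcards such as Nmb:

```python
def skip_to(tokens, landmarks):
    """Forward view: consume landmarks left-to-right and return the
    index just past the last one (the extraction start)."""
    i = 0
    for lm in landmarks:
        i = tokens.index(lm, i) + 1
    return i

def back_to(tokens, landmarks):
    """Backward view: consume landmarks right-to-left and return the
    index of the last one matched."""
    i = len(tokens)
    for lm in landmarks:
        r = tokens[::-1].index(lm, len(tokens) - i)  # nearest match left of i
        i = len(tokens) - 1 - r
    return i

# The slide's example, tokenized: both views locate the phone number.
toks = ["Hilton", "<p>", "Phone", ":", "<b>",
        "(211)", "111-1111", "</b>", "Fax", ":", "<b>"]
start_fwd = skip_to(toks, ["Phone", "<b>"])  # SkipTo(Phone) SkipTo(<b>)
start_bwd = back_to(toks, ["Fax", "(211)"])  # BackTo(Fax) BackTo((Nmb)),
                                             # with "(211)" for the Nmb wildcard
assert start_fwd == start_bwd == 5           # toks[5] == "(211)"
```

Because the two scans use independent landmarks, they rarely make the same mistake on an unseen page, which is exactly what Co-Testing's contention points exploit.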

  15. Results on 33 tasks: 2 random examples + queries [Histogram: number of tasks (0-20) vs. queries until 100% accuracy (1-17, 18+) for Random Sampling]

  16. Results on 33 tasks: 2 random examples + queries [Histogram: number of tasks (0-20) vs. queries until 100% accuracy (1-17, 18+) for Naïve Co-Testing and Random Sampling]

  17. Results on 33 tasks: 2 random examples + queries [Histogram: number of tasks (0-20) vs. queries until 100% accuracy (1-17, 18+) for Aggressive Co-Testing, Naïve Co-Testing, and Random Sampling]

  18. Co-Testing vs. Single-View Sampling [Histogram: number of tasks (0-25) vs. queries until 100% accuracy (1-17, 18+) for Aggressive Co-Testing and Query-by-Bagging]

  19. First Contribution • Co-Testing: multi-view active learning – queries contention points – converges faster than single-view active learners across a variety of domains & base learners

  20. Outline • Multi-view active learning • Robust multi-view learning – motivation – Co-EMT = active + semi-supervised learning – robustness to assumption violations • View validation as meta-learning • Related Work • Contributions • Future work

  21. Motivation • Active learning: – queries only the most informative examples – ignores all remaining (unlabeled) examples • Semi-supervised learning (previous MVL): – few labeled + many unlabeled examples – unlabeled examples: model the examples' distribution – use this model to boost the accuracy of a small training set • Best of both worlds: 1. Active: make queries 2. Semi-supervised: use the remaining (unlabeled) examples

  22. Co-EMT = Co-Testing + Co-EM • Given: – views V1 & V2 – L & U, sets of labeled & unlabeled examples • Co-Testing: REPEAT – use the labeled examples in L to learn h1 and h2 – query a contention point: h1(u) ≠ h2(u) • Co-EMT: REPEAT – use Co-EM(L, U) to learn h1 and h2 – query a contention point: h1(u) ≠ h2(u) • Co-EM is semi-supervised MVL: – few labeled + many unlabeled examples – uses the unlabeled examples to bootstrap the views from each other
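The interleaving above can be sketched as a Co-Testing outer loop that calls Co-EM instead of a plain learner. A sketch under simplifying assumptions: the toy midpoint-threshold learner stands in for the real base learners, hard labels replace Co-EM's probabilistic ones, and `cps[0]` stands in for a real query-selection strategy:

```python
def threshold_learner(i):
    """Stand-in base learner: one-feature midpoint threshold."""
    def learn(labeled):
        pos = [x[i] for x, y in labeled if y]
        neg = [x[i] for x, y in labeled if not y]
        t = (min(pos) + max(neg)) / 2
        return lambda x: x[i] > t
    return learn

def co_em(learn1, learn2, labeled, unlabeled, rounds=5):
    """Co-EM sketch: each view labels all of U for the other view."""
    h1 = learn1(labeled)
    for _ in range(rounds):
        h2 = learn2(labeled + [(x, h1(x)) for x in unlabeled])
        h1 = learn1(labeled + [(x, h2(x)) for x in unlabeled])
    return h1, h2

def co_emt(learn1, learn2, labeled, unlabeled, oracle, n_queries):
    """Co-EMT sketch: Co-Testing outer loop, Co-EM as the learner."""
    for _ in range(n_queries):
        h1, h2 = co_em(learn1, learn2, labeled, unlabeled)
        cps = [x for x in unlabeled if h1(x) != h2(x)]  # contention points
        if not cps:
            break
        q = cps[0]  # stand-in for a real query-selection strategy
        labeled.append((q, oracle(q)))
        unlabeled.remove(q)
    return co_em(learn1, learn2, labeled, unlabeled)

# Tiny demo: true label is (a + b) > 10; view 1 sees a, view 2 sees b.
h1, h2 = co_emt(
    threshold_learner(0), threshold_learner(1),
    labeled=[((0, 0), False), ((10, 10), True)],
    unlabeled=[(8, 1), (2, 9), (8, 9), (1, 2)],
    oracle=lambda x: x[0] + x[1] > 10,
    n_queries=4,
)
assert h1((20, 20)) and h2((20, 20))
assert not h1((0, 0))
```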

  23. The Co-EMT Synergy • 1. Co-Testing boosts Co-EM: better examples – stand-alone Co-EM uses random examples – Co-Testing provides more informative examples • 2. Co-EM helps Co-Testing: better hypotheses – stand-alone Co-Testing uses only labeled examples – Co-EM also exploits the unlabeled examples

  24. Two real-world domains [Bar charts: error rate (%) for semi-supervised EM, Co-Training, Co-EM, Co-Testing, and Co-EMT; ADS domain (error axis 4-9%) and COURSES domain (error axis 3.5-5.5%)]

  25. Semi-supervised MVL: bootstrapping the views • Task: is a Web page a course homepage (+) or not (-)? • V1: words in the page ("… Spring teaching …", "… favorite class …") • V2: words in hyperlinks pointing to the page ("… my favorite class …")

  26. Assumption: compatible, independent views

  27. Incompatible views • CS-511: Neural Nets … neural nets … • Neural nets papers: … neural nets …

  28. Correlated views: domain clumpiness [Figure: examples grouped into clumps: A.I., Theory, Systems, Faculty, Students, Admin]

  29. A Controlled Experiment [Plot: EM, Co-Training, and Co-EM vs. view incompatibility (0-40%), for 1, 2, and 4 clumps per class]

  30. Co-EMT is robust! [Plot: Co-EMT added to the same experiment: EM, Co-Training, Co-EM, and Co-EMT vs. view incompatibility (0-40%), for 1, 2, and 4 clumps per class]

  31. Second Contribution • Co-EMT: robust multi-view learning – interleaves active & semi-supervised MVL

  32. Outline • Multi-view active learning • Robust multi-view learning • View validation as meta-learning – Motivation – Adaptive view validation – Empirical results • Related Work • Contributions • Future work

  33. Motivation: Wrapper Induction • One inadequate view – example: V1: 100% accurate, V2: 53% accurate • In MVL, the same views may be: – adequate for some tasks – inadequate for other tasks [Histogram: Aggressive Co-Testing, number of domains (0-12) vs. queries until 100% accuracy (1-17, 18+)]

  34. The Need for View Validation • Not only for wrapper induction: • Speech recognition: sound vs. lip motion – Task-1: recognize Tom Brokaw's speech – Task-2: recognize Ozzy Osbourne's speech – … • Web page classification: hyperlink vs. page words – Task-1: terrorism / economics news – Task-2: faculty / student homepage – … • Solution: meta-learning – from past experiences, learn to predict whether MVL is adequate for a new, unseen task

  35. Meta-learner: Adaptive View Validation • GIVEN: – labeled tasks [Task1, L1], [Task2, L2], …, [Taskn, Ln] • FOR EACH Taski DO – generate a view validation example ei = <Meta-F1, Meta-F2, …, Li> • Train C4.5 on e1, e2, …, en • For each new, unseen task, use the learned decision tree to predict whether MVL is adequate for the task.
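The meta-learning step can be sketched as follows. The meta-features here (view-1 training error and cross-view disagreement rate) are illustrative stand-ins for the thesis's meta-features, and a one-level decision stump stands in for C4.5:

```python
def meta_example(h1, h2, labeled, unlabeled, views_adequate):
    """Turn one past task into a view-validation example.
    Meta-features (illustrative): view-1 training error and the rate
    at which the two views disagree on the unlabeled examples."""
    err1 = sum(h1(x) != y for x, y in labeled) / len(labeled)
    disagree = sum(h1(x) != h2(x) for x in unlabeled) / len(unlabeled)
    return ([err1, disagree], views_adequate)

def train_stump(meta_examples, feature=1):
    """One-level decision stump as a stand-in for C4.5: adequate views
    tend to disagree less, so split on the disagreement feature."""
    pos = [f[feature] for f, adequate in meta_examples if adequate]
    neg = [f[feature] for f, adequate in meta_examples if not adequate]
    t = (max(pos) + min(neg)) / 2
    return lambda feats: feats[feature] < t

# Past tasks, each reduced to (meta-features, was-MVL-adequate).
history = [([0.00, 0.10], True), ([0.05, 0.20], True), ([0.40, 0.60], False)]
predict = train_stump(history)
assert predict([0.10, 0.15])      # low disagreement: MVL looks adequate
assert not predict([0.30, 0.70])  # high disagreement: MVL looks inadequate
```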
