Never-Ending Language Learning Tom Mitchell, William Cohen, and Many Collaborators Carnegie Mellon University
Key Idea 1: Coupled semi-supervised training of many functions [Dinesh R]
[Figure: learning a single function from noun phrase to person is a hard (underconstrained) semi-supervised learning problem; learning many coupled functions is a much easier (more constrained) semi-supervised learning problem]
Type 1 Coupling: Co-Training, Multi-View Learning
Supervised training of 1 function. Minimize: [equation not recovered]
[Figure: one function mapping NP to person]
Type 1 Coupling: Co-Training, Multi-View Learning [Anshul]
Coupled training of 2 functions. Minimize: [equation not recovered]
[Figure: two functions mapping NP to person]
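The two objectives on these slides did not survive extraction. The following is only a sketch of the usual co-training form, not the slide's own equations. All symbols are assumptions introduced here: L is the labeled set, U the unlabeled set, and x_1, x_2 are two views of the same noun phrase x.

```latex
% Supervised training of one function (assumed form)
\min_{f_1} \sum_{\langle x, y \rangle \in L} \left| f_1(x) - y \right|

% Coupled training of two functions (assumed form):
% each function fits the labels, and the two must agree on unlabeled data
\min_{f_1, f_2}
  \sum_{\langle x, y \rangle \in L} \left| f_1(x_1) - y \right|
  + \sum_{\langle x, y \rangle \in L} \left| f_2(x_2) - y \right|
  + \sum_{x \in U} \left| f_1(x_1) - f_2(x_2) \right|
```

The third term is what makes the training coupled: unlabeled examples contribute a penalty whenever the two views disagree.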
Type 1 Coupling: Co-Training, Multi-View Learning
[Blum & Mitchell, 98] [Dasgupta et al., 01] [Ganchev et al., 08] [Sridharan & Kakade, 08] [Wang & Zhou, ICML10]
[Figure: functions mapping NP to person]
Multi-view, Multi-Task Coupling [Rishab]
[Blum & Mitchell, 98] [Dasgupta et al., 01] [Ganchev et al., 08] [Sridharan & Kakade, 08] [Wang & Zhou, ICML10] [Taskar et al., 2009] [Carlson et al., 2009]
[Figure: categories person, sport, athlete, coach, team, each predicted from several views of the NP: NP text context distribution, NP morphology, NP HTML contexts]
• athlete(NP) → person(NP)
• athlete(NP) → NOT sport(NP)
• NOT athlete(NP) ← sport(NP)
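The three constraints on this slide can be read as a subset rule (athlete implies person) and a mutual-exclusion rule (athlete and sport cannot both hold). A minimal sketch of checking a set of category predictions against such rules follows. The table names and the `violations` function are hypothetical and are not NELL's actual implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical constraint tables built from the slide's example rules.
SUBSET_OF: Dict[str, str] = {"athlete": "person"}  # athlete(NP) -> person(NP)
MUTUALLY_EXCLUSIVE: List[Tuple[str, str]] = [("athlete", "sport")]


def violations(labels: Dict[str, bool]) -> List[str]:
    """Return the coupling constraints broken by the predictions for one NP."""
    broken = []
    # Subset constraint: a positive child category requires a positive parent.
    for child, parent in SUBSET_OF.items():
        if labels.get(child, False) and not labels.get(parent, False):
            broken.append(f"{child} -> {parent}")
    # Mutual exclusion: both categories cannot be positive at once.
    for a, b in MUTUALLY_EXCLUSIVE:
        if labels.get(a, False) and labels.get(b, False):
            broken.append(f"{a} excludes {b}")
    return broken
```

In coupled training, predictions that violate such constraints would be penalized or filtered out rather than added to the training set.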
Type 3 Coupling: Relation Argument Types
[Figure: relations over a noun-phrase pair (NP1, NP2): playsSport(a,s), coachesTeam(c,t), playsForTeam(a,t), teamPlaysSport(t,s)]
Type 3 Coupling: Relation Argument Types [Happy] [Dinesh K.]
playsSport(NP1,NP2) → athlete(NP1), sport(NP2)
[Figure: relations playsSport(a,s), coachesTeam(c,t), playsForTeam(a,t), teamPlaysSport(t,s) over the pair (NP1, NP2), coupled to the categories person, sport, athlete, team, coach for each argument]
over 2500 coupled functions in NELL
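A rough sketch of how an argument-type constraint could be checked follows. Only the playsSport signature is stated on the slide. The other three signatures are assumptions inferred from the argument letters (a, s, c, t), and `ARG_TYPES` and `is_type_consistent` are hypothetical names.

```python
from typing import Dict, Set, Tuple

# Relation -> (type of NP1, type of NP2).
# Only playsSport is given on the slide; the rest are assumed.
ARG_TYPES: Dict[str, Tuple[str, str]] = {
    "playsSport": ("athlete", "sport"),
    "coachesTeam": ("coach", "team"),
    "playsForTeam": ("athlete", "team"),
    "teamPlaysSport": ("team", "sport"),
}


def is_type_consistent(
    relation: str, np1: str, np2: str, categories: Dict[str, Set[str]]
) -> bool:
    """True if both arguments carry the categories the relation requires."""
    type1, type2 = ARG_TYPES[relation]
    return type1 in categories.get(np1, set()) and type2 in categories.get(np2, set())
```

A candidate belief such as playsSport(NP1, NP2) would be kept only when the category classifiers also accept NP1 as an athlete and NP2 as a sport.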
If coupled learning is the key, how can we get new coupling constraints?
Key Idea 2: Discover New Coupling Constraints [Barun]
• learn Horn clause rules/constraints:
0.93 athletePlaysSport(?x,?y) ← athletePlaysForTeam(?x,?z), teamPlaysSport(?z,?y)
– learned by data mining the knowledge base
– connect previously uncoupled relation predicates
– infer new unread beliefs
– modified version of FOIL [Quinlan]
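To illustrate how a learned rule of this kind can infer unread beliefs, here is a minimal sketch that applies the slide's example rule to two small sets of known facts. The function name `apply_rule`, the pair representation, and the idea of attaching the rule weight directly to each inferred belief are assumptions for illustration. NELL's actual rule learner and evidence integration are not described on the slide.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def apply_rule(
    plays_for_team: Set[Pair],
    team_plays_sport: Set[Pair],
    rule_conf: float = 0.93,
) -> Dict[Pair, float]:
    """Apply athletePlaysSport(x, y) <- athletePlaysForTeam(x, z), teamPlaysSport(z, y).

    Returns candidate (athlete, sport) beliefs, each tagged with the rule weight.
    """
    inferred: Dict[Pair, float] = {}
    for athlete, team in plays_for_team:
        for other_team, sport in team_plays_sport:
            # Join the two body literals on the shared variable ?z.
            if team == other_team:
                inferred[(athlete, sport)] = rule_conf
    return inferred
```

The rule connects two relations that were previously uncoupled: any (athlete, team) fact combined with a (team, sport) fact yields a candidate athletePlaysSport belief that was never read directly from text.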
Key Idea 3: Automatically extend ontology
Dinesh R.: only 62 new ontologies added [Dinesh R, Surag, Ankit, Barun, Dhruvin]
Example Discovered Relations [Mohamed et al. EMNLP 2011]
Columns: Category Pair; Frequent Instance Pairs; Text Contexts; Suggested Name
• MusicInstrument, Musician; (sitar, George Harrison), (tenor sax, Stan Getz), (trombone, Tommy Dorsey), (vibes, Lionel Hampton); "ARG1 master ARG2", "ARG1 virtuoso ARG2", "ARG1 legend ARG2", "ARG2 plays ARG1"; Master
• Disease, Disease; (pinched nerve, herniated disk), (tennis elbow, tendonitis), (blepharospasm, dystonia); "ARG1 is due to ARG2", "ARG1 is caused by ARG2"; IsDueTo
• CellType, Chemical; (epithelial cells, surfactant), (neurons, serotonin), (mast cells, histamine); "ARG1 that release ARG2", "ARG2 releasing ARG1"; ThatRelease
• Mammals, Plant; (koala bears, eucalyptus), (sheep, grasses), (goats, saplings); "ARG1 eat ARG2", "ARG2 eating ARG1"; Eat
• River, City; (Seine, Paris), (Nile, Cairo), (Tiber river, Rome); "ARG1 in heart of ARG2", "ARG1 which flows through ARG2"; InHeartOf
NELL Architecture
• Knowledge Base (latent variables): Beliefs, Candidate Beliefs
• Evidence Integrator
• Text context patterns (CPL)
• Orthographic classifier (CML)
• URL specific HTML patterns (SEAL)
• Human advice
• Actively search for web text (OpenEval)
• Infer new beliefs from old (PRA)
• Image classifier (NEIL)
• Ontology extender (OntExt)
Haroun
Evaluation
NELL Is Improving Over Time (Jan 2010 to Nov 2014)
• number of NELL beliefs vs. time: all beliefs (10’s of millions), high conf. beliefs (millions)
• reading accuracy vs. time (average over 31 predicates): mean avg. precision over top 1000, precision@10
• human feedback vs. time (average 2.4 feedbacks per predicate per month)
Limitations
• Self-reflection and an explicit agenda of learning subgoals: NELL has only a very weak ability to monitor its own performance and progress.
• Pervasive plasticity: NELL’s method for detecting noun phrases in text is a fixed procedure that is not open to learning, so it risks reaching a performance plateau.
• Representation and reasoning: NELL lacks methods for representing and reasoning about time and space.
• Heavy reliance on redundancy across the web: NELL’s redundancy-based reading methods tend to extract the most frequently mentioned beliefs earlier.
Other Limitations/Possible Improvements
• No framework for forgetting previously learnt wrong relations [Anshul, Swarandeep]
• Extension beyond simple Horn clauses [Anshul, Ankit]
• Evaluation on the tail of the distribution [Happy]
• Categorizing phrases/sentences as sarcastic/rhetorical questions [Happy]
• Can NELL learn increasingly complex algorithms from simple algorithms? [Ankit]
Other Limitations/Possible Improvements
• Incorporating degrees of truth, variation of truth with time, and fuzzy categories [Anshul]
• Word sense disambiguation module [Surag]
• Reading over evolving domains such as Twitter [Dinesh K]
Consistency and correctness
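[Platanios, Blum, Mitchell, UAI 2014]
Problem setting:
• have N different estimates of the target function
• agreement between $f_i$, $f_j$: $a_{ij} = \Pr[f_i(x) = f_j(x)]$
Key insight: errors and agreement rates are related
$a_{ij} = \Pr[\text{neither makes error}] + \Pr[\text{both make error}] = 1 - e_i - e_j + 2 e_{ij}$
where $a_{ij}$ is the prob. $f_i$ and $f_j$ agree, $e_i$ is the prob. of $f_i$ error, $e_j$ is the prob. of $f_j$ error, and $e_{ij}$ is the prob. $f_i$ and $f_j$ both make an error.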
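The relation between agreement and error rates can be spelled out with inclusion-exclusion. The symbols $a_{ij}$, $e_i$, $e_j$, $e_{ij}$ are labels chosen here for the four probabilities named on the slide (agreement, the two individual errors, joint error). The step below assumes binary-valued functions, so that two functions agree exactly when both are right or both are wrong.

```latex
\begin{align*}
\Pr[\text{neither makes error}]
  &= 1 - \Pr[\text{at least one makes error}] \\
  &= 1 - (e_i + e_j - e_{ij}) \\[4pt]
a_{ij}
  &= \Pr[\text{neither makes error}] + \Pr[\text{both make error}] \\
  &= (1 - e_i - e_j + e_{ij}) + e_{ij} \\
  &= 1 - e_i - e_j + 2 e_{ij}
\end{align*}
```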
Estimating Error from Unlabeled Data
1. IF $f_1$, $f_2$, $f_3$ make indep. errors, and accuracies > 0.5
THEN $a_{ij} = 1 - e_i - e_j + 2 e_i e_j$
→ Measure errors from unlabeled data:
- use unlabeled data to estimate $a_{12}$, $a_{13}$, $a_{23}$
- solve three equations for three unknowns $e_1$, $e_2$, $e_3$
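As a worked sketch of the "three equations, three unknowns" step: assuming independent errors, the agreement rate is taken to be a_ij = 1 - e_i - e_j + 2 e_i e_j, which rearranges to 2 a_ij - 1 = (1 - 2 e_i)(1 - 2 e_j). With all accuracies above 0.5 each factor is positive, so the system has a closed-form solution. The function names below are hypothetical, and this closed form is one way to solve the system, not necessarily the procedure used in the cited paper.

```python
import math
from typing import Tuple


def agreement(e_i: float, e_j: float) -> float:
    """Agreement rate of two binary functions with independent errors."""
    return 1 - e_i - e_j + 2 * e_i * e_j


def errors_from_agreements(
    a12: float, a13: float, a23: float
) -> Tuple[float, float, float]:
    """Solve for (e1, e2, e3) given pairwise agreement rates.

    Uses c_ij = 2*a_ij - 1 = (1 - 2*e_i) * (1 - 2*e_j), so that
    (1 - 2*e_1)^2 = c12 * c13 / c23, and similarly for e_2 and e_3.
    """
    c12, c13, c23 = 2 * a12 - 1, 2 * a13 - 1, 2 * a23 - 1
    if min(c12, c13, c23) <= 0:
        raise ValueError("agreement rates must exceed 0.5 under these assumptions")
    d1 = math.sqrt(c12 * c13 / c23)  # equals 1 - 2*e1; positive root since accuracy > 0.5
    d2 = math.sqrt(c12 * c23 / c13)  # equals 1 - 2*e2
    d3 = math.sqrt(c13 * c23 / c12)  # equals 1 - 2*e3
    return (1 - d1) / 2, (1 - d2) / 2, (1 - d3) / 2
```

In practice the agreement rates would be estimated by counting how often each pair of functions gives the same output on unlabeled examples, so the recovered error rates inherit that sampling noise.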
Estimating Error from Unlabeled Data
1. IF $f_1$, $f_2$, $f_3$ make indep. errors, accuracies > 0.5
THEN $a_{ij} = 1 - e_i - e_j + 2 e_i e_j$
2. but if errors not independent