Linguists get the abstraction, machines get the details
Hal Daumé III
Computer Science / Linguistics, University of Maryland, College Park
me@hal3.name
NLP's use of linguists, a caricature
➢ Linguists develop theory
➢ Linguists richly annotate data (e.g. treebank)
➢ NLP people train systems (e.g. parser)
➢ Parser output fed into machine translation system
➢ Machine translation system has no idea what the input symbols mean
➢ NP, VP, VBD, ... might as well be X1, X2, X3, …
Where does this model work?
➢ Works when the entire pipeline is learned from data
➢ And we make no use of prior knowledge
Where does this model not work?
➢ Pretty much any other time
Inferring Tags from the Structure
➢ INPUT: The man ate a big sandwich
➢ OUTPUT: D N V D J N
➢ Baseline: random guessing, 4% accuracy
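(The ~4% figure implies a tagset of roughly 25 tags, since a uniform random guess is right about 1/25 of the time. Below is a minimal Python sketch of that baseline; the tagset and helper names are hypothetical, chosen only to match the ~4% number.)

    import random

    # Hypothetical 25-tag set; D/N/V/J/P are the tags used on the slide,
    # the rest are placeholders (25 tags is consistent with a ~4% random baseline).
    TAGS = ["D", "N", "V", "J", "P"] + [f"X{i}" for i in range(20)]

    def random_tagger(tokens):
        """Baseline: assign every token a tag uniformly at random."""
        return [random.choice(TAGS) for _ in tokens]

    def accuracy(predicted, gold):
        """Token-level accuracy: fraction of positions where the tags match."""
        return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

    tokens = "The man ate a big sandwich".split()
    gold = ["D", "N", "V", "D", "J", "N"]  # OUTPUT from the slide

    # Expected accuracy of random guessing is 1 / len(TAGS) = 0.04.
    print(accuracy(random_tagger(tokens), gold))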
Sources of Knowledge
➢ Seeds (frequent words for each tag)
  ➢ N: membro, milhoes, obras
  ➢ D: as [the, 2f], o [the, 1m], os [the, 2m]
  ➢ V: afector, gasta, juntar
  ➢ P: com, como, de, em
➢ Typological rules:
  ➢ Art ← Noun
  ➢ Prp → Noun
➢ Tag knowledge:
  ➢ Open class
  ➢ Closed class
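(One way to read this slide: each knowledge source acts as a constraint that prunes the candidate tags for a word before any unsupervised learning runs. The sketch below is illustrative only and assumes hypothetical data structures and helper names; it is not the actual system from the talk.)

    # Illustrative sketch: seeds, open/closed-class knowledge, and typological
    # rules expressed as constraints on candidate tags (hypothetical encoding).
    SEEDS = {                          # frequent Portuguese words fixed to a tag
        "N": {"membro", "milhoes", "obras"},
        "D": {"as", "o", "os"},
        "V": {"afector", "gasta", "juntar"},
        "P": {"com", "como", "de", "em"},
    }
    OPEN_CLASS = {"N", "V", "J"}       # open-class tags may cover unseen words
    CLOSED_CLASS = {"D", "P"}          # closed-class tags only via seed words

    def candidate_tags(word):
        """Tags a word may take after applying seed and open/closed constraints."""
        for tag, words in SEEDS.items():
            if word in words:
                return {tag}           # seed words are pinned to their tag
        return set(OPEN_CLASS)         # unseen words get open-class tags only

    def violates_typology(tag_seq, i):
        """Crude stand-in for the typological rules (Art <- Noun, Prp -> Noun):
        a determiner (Art = D) or preposition (Prp = P) should have a noun
        somewhere to its right."""
        if tag_seq[i] in {"D", "P"}:
            return "N" not in tag_seq[i + 1:]
        return False

In a real system these constraints would feed an unsupervised learner (for example a constrained HMM trained with EM); here they only restrict the space of tag sequences the learner may consider.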
Preliminary Results
[Bar chart: tagging accuracy (0–60) with and without open/closed-class knowledge, for No Seeds vs. Seeds]
Preliminary Results: Open/Closed
[Two bar charts (No Seeds, Seeds): accuracy (20–60) under No Rules, Art←Noun, Prp→Noun, and Both rules]
I'd like NLP to use more linguistics, but...
➢ Linguistic models are often developed without any reference to computation
➢ Many NLP students do not learn (or appreciate) much beyond Syntax I
➢ Linguistic theories seem to be good in the abstract, but (perhaps) not so much in the details