
Transfer Learning in Language, Part II. Hal Daumé III (PowerPoint presentation)



  1. Part 1: Transfer Learning in Language, Part II. Hal Daumé III

  2. Typical NLP pipeline
     ● Input: The man ate a sandwich
     ● Morphology: The man eat+ a sandwich (past)
     ● Tagging: DT NN VB DT NN
     ● Parsing
     ● Role labeling: N N V, Agent / Theme
     ● Interpretation: ∃a ∃t ∃e . man(a) & sandwich(t) & eat(e,a,t) & past(e)
     [Figure: MT interlingua triangle. Source Words → Source Morphology → Source Syntax → Source Shallow Semantics → Source Semantics → Interlingua → Target Semantics → Target Shallow Semantics → Target Syntax → Target Morphology → Target Words; Analysis up the source side, Generation down the target side]
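The stages on this slide can be sketched as a toy program. Everything below (the lemma table, the tag lexicon, the role heuristic) is a made-up stand-in for illustration, not any real NLP system:

```python
# Toy pipeline sketch for "The man ate a sandwich".
# All rules and lexicons are hypothetical stand-ins.

def morphology(tokens):
    # Map inflected forms to stem plus feature, e.g. "ate" -> ("eat+", "past").
    lemmas = {"ate": ("eat+", "past")}
    stems, feats = [], []
    for t in tokens:
        stem, feat = lemmas.get(t, (t, None))
        stems.append(stem)
        if feat:
            feats.append(feat)
    return stems, feats

def tag(tokens):
    # Tiny lookup tagger covering only the example sentence.
    lexicon = {"The": "DT", "man": "NN", "eat+": "VB", "a": "DT", "sandwich": "NN"}
    return [lexicon.get(t, "NN") for t in tokens]

def roles(tokens, tags):
    # Crude role labeling: Agent = noun before the verb, Theme = noun after.
    v = tags.index("VB")
    agent = next(tokens[i] for i in range(v - 1, -1, -1) if tags[i] == "NN")
    theme = next(tokens[i] for i in range(v + 1, len(tags)) if tags[i] == "NN")
    return {"pred": tokens[v], "Agent": agent, "Theme": theme}

tokens = "The man ate a sandwich".split()
stems, feats = morphology(tokens)
tags = tag(stems)
frame = roles(stems, tags)
print(stems, feats, tags, frame)
```

Each stage consumes the previous stage's output, which is exactly why errors propagate down the pipeline (the theme of the next slide).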

  3. Pipeline models break down (sorta)
     ➢ Tagging + Parsing: +0% / +3%
     ➢ Parsing + Named Entities: +0.5% / +4%
     ➢ Parsing + Role Identification: +0% / -0.3% (upper bound: +13%)
     ➢ Named Entities + Coreference: +0.3% / +1.3% (upper bound: +8%)
     Why? Maybe the simpler model already has a lot of the fancier information? Maybe some of these tasks are more related than others?

  4. Tree-based model of task relatedness

  5. A probabilistic model for trees

  6. From trees to priors...

  7. Inference

  8. Experiments (selected)

  9. Learning task relationships [Saha, Rai, Daumé III, Venkatasubramanian, DuVall; AIStats 2011]

  10. Task Relationship Learning [Saha, Rai, Daumé III, Venkatasubramanian, DuVall; AIStats 2011]

  11. Joint learning of relationships [Saha, Rai, Daumé III, Venkatasubramanian, DuVall; AIStats 2011]

  12. Experimental Results (sample) [Saha, Rai, Daumé III, Venkatasubramanian, DuVall; AIStats 2011]

  13. Transfer in Learning Language, aka: why everything I've told you so far isn't useful for some problems...

  14. Domains really are different
     ● Can you guess what domain each of these sentences is drawn from?
     ● News: Many factors contributed to the French and Dutch objections to the proposed EU constitution
     ● Parliament: Please rise, then, for this minute's silence
     ● Medical: Latent diabetes mellitus may become manifest during thiazide therapy
     ● Science: Statistical machine translation is based on sets of text to build a translation model
     ● Step-mother: I forgot to mention in yesterdays post that I also trimmed an overgrown HUGE hedge that spams the entire length of the front of my house and is about 3' accrossed.

  15. S4 ontology of adaptation effects
     ● Seen: never seen this word before (news to medical: "diabetes mellitus")
     ● Sense: never seen this word used in this way (news to technical: "monitor")
     ● Score: the wrong output is scored higher (news to medical: "manifest")
     ● Search: decoding/search erred (ignored)
     (inside = old domain, outside = new domain)

  16. Translating across domains is hard
     Old Domain (Parliament)
     ● Original: monsieur le président, les pêcheurs de homard de la région de l'atlantique sont dans une situation catastrophique.
     ● Reference: mr. speaker, lobster fishers in atlantic canada are facing a disaster.
     ● System: mr. speaker, the lobster fishers in atlantic canada are in a mess.
     New Domain
     ● Original: comprimés pelliculés blancs pour voie orale.
     ● Reference: white film-coated tablets for oral use.
     ● System: white pelliculés tablets to oral.
     New Domain
     ● Original: mode et voie(s) d'administration
     ● Reference: method and route(s) of administration
     ● System: fashion and voie(s) of directors
     Key question: what went wrong?

  17. Adaptation effects in MT
     ● Quick observations (consistent in movie subtitles, scientific pubs, PHP tech docs):
        ● New-domain language model helps (10%-63% improvement)
        ● Tuning on new-domain data helps (10%-90% improvement)
        ● Weighting new-domain data helps (4%-150% improvement)
     ● Identifying errors in MT (without parallel new-domain data):
        ● Seen: old-only model + unseen input word pairs
        ● Sense: old-only model + seen input / unseen output pairs
        ● Score: intersect old and mixed model, score from old
     Error breakdown (as measured by BLEU score):
              News            Medical
     Seen     little effect   ~40% of error
     Sense    little effect   ~40% of error
     Score    ~90% of error   ~20% of error
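The Seen/Sense diagnostics described above can be sketched with a toy phrase table; the table entries and test word pairs below are invented for illustration, not real system output:

```python
# Hedged sketch of the slide's Seen/Sense diagnostics: classify each
# new-domain source word by whether the old-domain model has seen the
# word at all (Seen) or has seen it, but never paired with the output
# the reference uses (Sense). The phrase table is a made-up toy.

old_table = {          # (source, target) pairs seen in old-domain training
    ("fenêtre", "window"),
    ("courir", "run"),
    ("virus", "virus"),
}
old_src_vocab = {s for s, _ in old_table}

def diagnose(src_word, ref_word):
    if src_word not in old_src_vocab:
        return "Seen"   # never saw this source word in the old domain
    if (src_word, ref_word) not in old_table:
        return "Sense"  # saw the word, but never with this translation
    return "OK"

print(diagnose("pelliculés", "film-coated"))  # a Seen error candidate
print(diagnose("courir", "execute"))          # a Sense error candidate
print(diagnose("fenêtre", "window"))          # covered by the old domain
```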

  18. Translating across domains is hard
     Domain (OOV rate): most frequent OOV words
     ● News (17%): behavior favor neighbors fueled neighboring abe wwii favored favorable zhao ahmedinejad bernanke favorite phelps ccp skeptical
     ● Medical (49%): renal hepatic subcutaneous irbesartan ribavirin olanzapine serum patienten dl eine sie pharmacokinetics ritonavir hydrochlorothiazide erythropoietin efavirenz
     ● Movies (44%): gonna yeah mom hi b**** daddy s*** later f*****g f*** gotta wanna uh namely bye dude
     [Daumé III & Jagarlamudi, 2011]
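OOV rates like the ones in this table are easy to compute: count the fraction of new-domain tokens whose type never occurred in the old-domain text. The two corpora below are tiny stand-ins, not the actual data:

```python
# Minimal sketch: out-of-vocabulary rate of a new domain relative to an
# old-domain vocabulary. Both "corpora" are toy examples.

from collections import Counter

old_corpus = "the man ate a sandwich and the man ran".split()
new_corpus = "renal hepatic irbesartan dose and the serum level".split()

old_vocab = set(old_corpus)
oov = [w for w in new_corpus if w not in old_vocab]
rate = len(oov) / len(new_corpus)
print(f"OOV rate: {rate:.0%}")
print("most frequent OOV:", Counter(oov).most_common(3))
```

On real data the same count, run per domain, yields figures like the 17% (news) vs. 49% (medical) on the slide.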

  19. Dictionary mining for "seen" errors [Haghighi, Liang & Klein, 2009; Daumé III & Jagarlamudi, 2011]
     ● Find frequent terms in the new domain
     ● Use those that exist in the old domain as "training data"
     ● Extract context and orthographic features
     ● Find a low-dimensional subspace on the training data (CCA)
     ● Pair input words with <=5 output words
     ● Add four features to the SMT model
     ● Rerun parameter tuning
     [Figure: old-domain (DE) and new-domain (FR) word spaces projected into a shared subspace]
     BLEU score improvements: News +0.80 / +0.36; Emea +1.44 / +1.51; Subs +0.13 / +0.61; PHP +0.28 / +0.68
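The subspace step can be sketched in a few lines. Below, an SVD of the cross-covariance matrix stands in as a simplified proxy for full CCA (no per-view whitening), and all feature vectors are random toy data rather than real context/orthographic features:

```python
# Sketch of the shared-subspace step: project old- and new-domain word
# feature vectors into a common low-dimensional space, then pair each
# new-domain word with its nearest old-domain words. The SVD of the
# cross-covariance is a simplified stand-in for CCA; data is synthetic.

import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, k = 50, 8, 6, 2          # paired words, feature dims, subspace dim

X = rng.normal(size=(n, d1))                      # old-domain features
Y = X[:, :d2] + 0.1 * rng.normal(size=(n, d2))    # correlated new-domain view

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
U, _, Vt = np.linalg.svd(Xc.T @ Yc)  # directions of maximal cross-covariance
A, B = U[:, :k], Vt[:k].T            # projection maps for the two views

def embed_old(v): return v @ A
def embed_new(v): return v @ B

# Pair one new-domain word with its <= 5 closest old-domain words.
new_word = Yc[0]
dists = np.linalg.norm(embed_old(Xc) - embed_new(new_word), axis=1)
print("closest old-domain word indices:", np.argsort(dists)[:5])
```

The mined pairs would then feed the SMT model as extra features, as the slide describes.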

  20. Senses are domain/language specific
     English    French               Japanese
     run        courir, éxécuter     走る
     virus      virus                病原体, ウィルス
     window     fenêtre              窓, ウィンドウ

  21. Automatically identifying new senses
     ● Context + existence of translations in comparable data
     [Figure: example contexts from comparable data
        English "window": a window of opportunity / have a window of opportunity / the browser window 's / in the window to give
        English "run": in the run up to / we run the risk , / time to run when applied / not found or have run vcvars.bat ,
        French "fenêtre": via une fenêtre insérée . / vers ma fenêtre ou vers / dans la fenêtre . cet / dans la fenêtre . </s>
        French "courir": voulons pas courir le risque , / sans courir le risque
        French "éxécuter": ne pouvez éxécuter que les / pour l' éxécuter elle va]

  22. Spotting New Senses
     Formulation:
     ● Given: a joint p(x,y) in the old domain; marginals q(x) and q(y) in the new domain
     ● Recover: the joint q(x,y) in the new domain
     ● We formulate this as an L1-regularized linear program
     ● Easier alternative: we have many such q(x)s and q(y)s
     Binary classification problem:
     ● +ve: French token has a previously unseen sense
     ● -ve: French token is used in a known way
     Lots of features considered...
     ● Frequency of words/translations in each domain
     ● Language model perplexities across domains
     ● Topic model "mismatches"
     ● Marginal matching features
     ● Translation "flow" impedance
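The recovery problem above can be posed directly as a linear program: minimize the L1 distance between q(x,y) and the old-domain joint p(x,y), subject to q matching the new-domain marginals. The joint and marginals below are made-up toy numbers, and this is one plausible reading of the slide's LP, not the paper's exact formulation:

```python
# Sketch: recover a new-domain joint q(x,y) close to the old-domain joint
# p(x,y) in L1 norm, subject to matching new-domain marginals q(x), q(y).
# Encoded as an LP with slack variables t_ij >= |q_ij - p_ij|. Toy data.

import numpy as np
from scipy.optimize import linprog

p = np.array([[0.3, 0.1, 0.0],
              [0.1, 0.2, 0.1],
              [0.0, 0.1, 0.1]])      # old-domain joint (toy)
qx = np.array([0.5, 0.3, 0.2])      # new-domain marginal over x (toy)
qy = np.array([0.2, 0.4, 0.4])      # new-domain marginal over y (toy)
m, n = p.shape
N = m * n

# Variables z = [q (flattened), t]; objective: minimize sum of slacks t.
c = np.concatenate([np.zeros(N), np.ones(N)])
I = np.eye(N)

row_op = np.kron(np.eye(m), np.ones(n))   # row sums of flattened q
col_op = np.tile(np.eye(n), m)            # column sums of flattened q
A_eq = np.zeros((m + n, 2 * N))
A_eq[:m, :N], A_eq[m:, :N] = row_op, col_op
b_eq = np.concatenate([qx, qy])

# q - t <= p  and  -q - t <= -p  encode t >= |q - p|.
A_ub = np.block([[I, -I], [-I, -I]])
b_ub = np.concatenate([p.ravel(), -p.ravel()])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (2 * N))
q = res.x[:N].reshape(m, n)
print("recovered q(x,y):\n", q.round(3))
print("L1 distance to p:", round(abs(q - p).sum(), 3))
```

The recovered q can then be compared against p per word pair; large per-cell shifts flag candidate new senses.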

  23. Experimental Results
     [Chart: accuracy (roughly 50-75) for Constant / One Feature / Two Features / Three Features / All Features on EMEA, Science, Subs]
     Selected features:
     ● EMEA: ppl || matchm flow || matchm topics flow
     ● Science: ppl || matchm ppl || matchm topics ppl
     ● Subs: topics || matchm topics || matchm topics flow

  24. Conclusions
     ● Transfer learning...
        ● Assuming fixed task/domain relatedness is a bad idea
        ● Key question: what type of representation is "right"?
        ● Can do subspaces, trees, clusters, etc.
     ● In language...
        ● ML addresses only part of the adaptation picture
        ● So far, specialized approaches for addressing other parts:
           - Mining translations from comparable data
           - Automatically spotting new word senses
     Thanks! Questions?
