Machine Learning for NLP: New Developments and Challenges
Dan Klein
Computer Science Division, University of California at Berkeley
NOTE: These slides are still incomplete. A more complete version will be posted at a later date at: http://www.cs.berkeley.edu/~klein/nips-tutorial

What is NLP?
• Fundamental goal: deep understanding of broad language
• End systems that we want to build:
  • Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering…
  • Modest: spelling correction, text categorization…
• Sometimes we’re also doing computational linguistics

Speech Systems
• Automatic Speech Recognition (ASR)
  • Audio in, text out
  • [Example audio: “speech lab”]
  • SOTA: 0.3% error for digit strings, 5% for dictation, 50%+ for TV speech
• Text to Speech (TTS)
  • Text in, audio out
  • SOTA: totally intelligible (if sometimes unnatural)

Machine Translation
• Translation systems encode:
  • Something about fluent language
  • Something about how two languages correspond
• SOTA: for easy language pairs, better than nothing, but more an understanding aid than a replacement for human translators

Information Extraction
• Information Extraction (IE): unstructured text to database entries
• Example input: “New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent.”
• Example output:

  Person           | Company                  | Post                           | State
  Russell T. Lewis | New York Times newspaper | president and general manager  | start
  Russell T. Lewis | New York Times newspaper | executive vice president       | end
  Lance R. Primis  | New York Times Co.       | president and CEO              | start

• SOTA: perhaps 70% accuracy for multi-sentence templates, 90%+ for single easy fields
Question Answering
• Question Answering: more than search
• Ask general comprehension questions of a document collection
• Can be really easy: “What’s the capital of Wyoming?”
• Can be harder: “How many US states’ capitals are also their largest cities?”
• Can be open ended: “What are the main issues in the global warming debate?”
• SOTA: can do factoids, even when the text isn’t a perfect match

Goals of this Tutorial
• Introduce some of the core NLP tasks
• Present the basic statistical models
• Highlight recent advances
• Highlight recurring constraints on the use of ML techniques
• Highlight ways this audience could really help out

Recurring Issues in NLP Models
• Inference on the training set is slow enough that discriminative methods can be prohibitive
• Need to scale to millions of features
  • Indeed, we tend to have more features than data points, and it all works out ok
• Kernelization is almost always too expensive, so everything’s done with primal methods
• Need to gracefully handle unseen configurations and words at test time
• Severe non-stationarity when systems are deployed in practice
• Pipelined systems, so we need relatively calibrated probabilities; also, errors often cascade

Outline
• Language Modeling
• Syntactic / Semantic Parsing
• Machine Translation
• Information Extraction
• Unsupervised Learning

Speech in a Slide
• Frequency gives pitch; amplitude gives volume
• [Figure: waveform and spectrogram of “speech lab”, amplitude and frequency against time]
• Frequencies at each time slice are processed into observation vectors

The Noisy-Channel Model
• We want to predict a sentence given acoustics: w* = argmax_w P(w | a)
• The noisy channel approach: w* = argmax_w P(a | w) P(w)  (a toy decoder is sketched after this slide)
• Acoustic model: HMMs over word positions with mixtures of Gaussians as emissions
• Language model: distributions over sequences of words (sentences)
• [Figure: HMM state diagram with transition probabilities a_ij]
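To make the noisy-channel decomposition concrete, here is a minimal sketch of a decoder that scores candidate sentences by log P(a | w) + log P(w). Everything in it is an illustrative assumption rather than the tutorial's system: the candidate sentences, the acoustic scores, and the language-model scores are invented; a real recognizer would get them from HMM acoustic models and a smoothed n-gram language model.

```python
# Toy noisy-channel decoder: pick the sentence w maximizing
#   log P(a | w) + log P(w)
# All numbers below are made up for illustration.

# Hypothetical acoustic-model log-likelihoods, log P(a | w)
acoustic_logprob = {
    "recognize speech": -12.0,
    "wreck a nice beach": -11.5,   # acoustically almost as good a match
}

# Hypothetical language-model log-probabilities, log P(w)
lm_logprob = {
    "recognize speech": -8.0,
    "wreck a nice beach": -14.0,   # fluent, but much less likely a priori
}

def decode(candidates):
    """Return the candidate with the best combined channel + source score."""
    return max(candidates, key=lambda w: acoustic_logprob[w] + lm_logprob[w])

if __name__ == "__main__":
    print(decode(list(acoustic_logprob)))  # "recognize speech" wins on P(w)
```

The point of the factorization is exactly this division of labor: the acoustic model alone cannot separate the two hypotheses, and the language model breaks the tie.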
Language Models
• In general, we want to place a distribution over sentences
• Classic solution: n-gram models
• N-gram models are (weighted) regular languages
• Natural language is not regular
  • Many linguistic arguments
  • Long-distance effects: “The computer which I had just put into the machine room on the fifth floor crashed.”
• N-gram models often work well anyway (esp. with large n)

Language Model Samples
• Unigram:
  • [fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter]
  • [that, or, limited, the]
  • []
  • [after, any, on, consistently, hospital, lake, of, of, other, and, factors, raised, analyst, too, allowed, mexico, never, consider, fall, bungled, davison, that, obtain, price, lines, the, to, sass, the, the, further, board, a, details, machinists, ……, nasdaq]
• Bigram:
  • [outside, new, car, parking, lot, of, the, agreement, reached]
  • [although, common, shares, rose, forty, six, point, four, hundred, dollars, from, thirty, seconds, at, the, greatest, play, disingenuous, to, be, reset, annually, the, buy, out, of, american, brands, vying, for, mr., womack, currently, share, data, incorporated, believe, chemical, prices, undoubtedly, will, be, as, much, is, scheduled, to, conscientious, teaching]
  • [this, would, be, a, record, november]
• PCFG (later):
  • [This, quarter, ‘s, surprisingly, independent, attack, paid, off, the, risk, involving, IRS, leaders, and, transportation, prices, .]
  • [It, could, be, announced, sometime, .]
  • [Mr., Toseland, believes, the, average, defense, economy, is, drafted, from, slightly, more, than, 12, stocks, .]

Smoothing
• Dealing with sparsity well: smoothing / shrinkage
• For most histories P(w | h), we have relatively few observations
• Very intricately explored for the speech n-gram case
• Easy to do badly
• Example counts for P(w | denied the): allegations 3, reports 2, claims 1, request 1 (7 total); after smoothing: allegations 2.5, reports 1.5, claims 0.5, request 0.5, other 2 (7 total)
• [Figure: fraction of test n-grams seen in training (unigrams, bigrams, rules) as a function of training size, 0 to 1,000,000 words]
• All the details you could ever want: [Chen and Goodman, 98]

Interpolation / Dirichlet Priors
• Problem: P(w | denied the) is supported by few counts
• Solution: share counts with related histories, e.g. P(w | the) and P(w)  (see the interpolation sketch below)
• Despite the classic mixture formulation, this can be viewed as a hierarchical Dirichlet prior [MacKay and Peto, 94]
  • Each level’s distribution is drawn from a prior centered on its back-off distribution
  • The strength of the prior is related to the mixing weights
• Problem: this kind of smoothing doesn’t work well empirically

Kneser-Ney: Discounting
• N-grams occur more in training than they will later:

  Count in 22M Words | Avg in Next 22M | Good-Turing c*
  1                  | 0.448           | 0.446
  2                  | 1.25            | 1.26
  3                  | 2.24            | 2.24
  4                  | 3.23            | 3.24

• Absolute discounting: save ourselves some time and just subtract 0.75 (or some d) from each count
• Maybe have a separate value of d for very low counts

Kneser-Ney: Details
• Kneser-Ney smoothing combines several ideas (sketched in code below)
  • Absolute discounting
  • Lower-order models take a special form
• KN smoothing has repeatedly proven effective
  • But we’ve never been quite sure why
  • And therefore never known how to make it better
• [Teh, 2006] shows KN smoothing is a kind of approximate inference in a hierarchical Pitman-Yor process (and better approximations are superior to basic KN)
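To make the n-gram estimation and interpolation ideas concrete, here is a minimal sketch of a maximum-likelihood bigram model mixed with its unigram back-off. The toy corpus, the fixed mixing weight lam, and the function names are assumptions for illustration; real systems tune the weights on held-out data (or, in the hierarchical Dirichlet view, derive them from the strength of the prior).

```python
from collections import Counter

# Interpolated bigram language model (minimal sketch):
#   P(w | h) = lam * P_ML(w | h) + (1 - lam) * P_ML(w)
# The fixed weight lam stands in for weights tuned on held-out data.

corpus = "the cat sat on the mat the cat ate".split()  # toy corpus (assumption)

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
total_tokens = sum(unigram_counts.values())

def p_unigram(w):
    return unigram_counts[w] / total_tokens

def p_bigram_ml(w, h):
    # Maximum-likelihood estimate; zero for unseen (h, w) pairs.
    return bigram_counts[(h, w)] / unigram_counts[h] if unigram_counts[h] else 0.0

def p_interpolated(w, h, lam=0.7):
    # Share counts with the shorter history: mix the bigram estimate
    # with the unigram distribution so unseen bigrams keep some mass.
    return lam * p_bigram_ml(w, h) + (1 - lam) * p_unigram(w)

print(p_interpolated("cat", "the"))  # seen bigram: mostly the ML estimate
print(p_interpolated("mat", "ate"))  # unseen bigram: falls back to unigram mass
```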
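The absolute-discounting and continuation-count ideas behind Kneser-Ney can likewise be sketched in a few lines. This is a simplified, bigram-only version under stated assumptions (a toy corpus, the single discount d = 0.75 mentioned on the slide, illustrative function names); it omits refinements such as separate discounts for low counts and the full recursion to higher orders.

```python
from collections import Counter

# Sketch of interpolated Kneser-Ney for bigrams:
#   P_KN(w | h) = max(c(h, w) - d, 0) / c(h) + lambda(h) * P_cont(w)
# where lambda(h) = d * |{w' : c(h, w') > 0}| / c(h), and P_cont(w) is
# proportional to the number of distinct histories preceding w
# (the continuation count), not to w's raw frequency.

corpus = "san francisco is near san jose new york is near new jersey".split()
d = 0.75  # absolute discount (the fixed value mentioned on the slide)

bigrams = Counter(zip(corpus, corpus[1:]))
history_tokens = Counter(h for (h, _w) in bigrams.elements())  # c(h) as a history
continuations = Counter(w for (_h, w) in bigrams)              # distinct histories per w
total_bigram_types = len(bigrams)

def p_continuation(w):
    return continuations[w] / total_bigram_types

def p_kneser_ney(w, h):
    c_h = history_tokens[h]
    if c_h == 0:                          # unseen history: use the lower order only
        return p_continuation(w)
    discounted = max(bigrams[(h, w)] - d, 0.0) / c_h
    lam = d * sum(1 for (hh, _ww) in bigrams if hh == h) / c_h
    return discounted + lam * p_continuation(w)

print(p_kneser_ney("york", "new"))       # seen bigram: discounted count + back-off mass
print(p_kneser_ney("francisco", "new"))  # unseen bigram: continuation mass only
```

The special lower-order form is the key departure from plain interpolation: in a realistic corpus a word like “francisco” has a high raw count but a low continuation count, because it almost always follows the same history.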