The Task Diachronics Dan Klein UC Berkeley Includes joint work with - PDF document

12/1/2014 Natural Language Processing The Task Diachronics Dan Klein – UC Berkeley Includes joint work with Alex Bouchard ‐ Cote, Tom Griffiths, and David Hall Lexical Reconstruction Tree of Languages Latin focus  We assume the phylogeny is known  Much work in French Spanish Italian Portuguese biology, e.g. work by Warnow, Felsenstein, feu fuego fuoco fogo Steele…  Also in linguistics, e.g. Warnow et al., Gray and Atkinson… http://andromeda.rutgers.edu/~jlynch/language.html Evolution through Sound Changes Changes are Systematic Eng. camera from Latin, camera / kamera / numerus / numerus / “camera obscura” camera / kamera / Latin e  _ e  _ Deletion: / e /, / a / camra / kamra / numrus / numrus / Change: / k / .. / t ṏ / .. / ṏ / Insertion: / b / chambre / ṏ amb Й / French Eng. chamber from Old Fr. before the initial / t / dropped 1

12/1/2014 Changes Have Structure Changes are Contextual camera / kamera / camra / kamra / e  _ _  b e  _ / after stress _  b / m_r _  [ stop x ] / [ nasal x ]_r camra / kamra / cambra / kambra / Diachronic Evidence Changes are Systematic English Great Vowel Shift (Simplified!) Yahoo! Answers [ca 2000] Appendix Probi [ca 300] “time” = teem “time” = taim i e tonight not tonite tonitru non tonotru a Synchronic (Comparative) Evidence The Data Key idea: changes occur uniformly across the lexicon 2

12/1/2014 The Data The Data  Data sets  Data sets  Small: Romance  Small: Romance  French, Italian, Portuguese, Spanish  French, Italian, Portuguese, Spanish  2344 words  2344 words  Complete cognate sets  Complete cognate sets FR IT PT ES FR IT PT ES  Target: (Vulgar) Latin  Target: (Vulgar) Latin  Large: Austronesian  637 languages  140K words  Incomplete cognate sets  Target: Proto ‐ Austronesian Austronesian Austronesian Examples From the Austronesian Basic Vocabulary Database Simple Model: Single Characters G The Model G C G G C C C C C G G [cf. Felsenstein 81] 3

12/1/2014 Changes are Systematic Parameters are Branch ‐ Specific focus  ES LA  IB /fokus/ /fokus/ /fokus/ /kentrum/  IT  PT /fogo/ /fogo/ IB /fogo/ /sentro/ fuoco fuego fogo /fw Ꜽ ko/ /fwe ɋ o/ /fogo/ /fw Ꜽ ko/ /fwe ɋ o/ /fogo/ /fw Ꜽ ko/ /fwe ɋ o/ /fogo/ /t ṏ ƌ ntro/ /sentro/ /sentro/ IT ES PT [Bouchard ‐ Cote, Griffiths, Klein, 07] Edits are Contextual, Structured # f o /fokus/ Ꜽ # f w Inference  IT /fw Ꜽ ko/ Learning: Objective Learning: EM  M ‐ Step  Find parameters which fit /fokus/ /fokus/ z (expected) sound change counts /fogo/ /fogo/  Easy: gradient ascent on theta /fw Ꜽ ko/ /fwe ɋ o/ /fogo/ /fw Ꜽ ko/ /fwe ɋ o/ /fogo/ w  E ‐ Step  Find (expected) change /fokus/ counts given parameters  Hard: variables are string ‐ /fogo/ valued /fw Ꜽ ko/ /fwe ɋ o/ /fogo/ 4

12/1/2014 Computing Expectations A Gibbs Sampler Standard approach, e.g. [Holmes 2001]: Gibbs sampling each sequence ‘grass’ ‘grass’ [Holmes 01, Bouchard ‐ Cote, Griffiths, Klein 07] A Gibbs Sampler A Gibbs Sampler ‘grass’ ‘grass’ Getting Stuck Getting Stuck ? How could we jump to a state where the liquids /r/ and /l/ have a common ancestor? 5

12/1/2014 Efficient Sampling: Vertical Slices Single Sequence Resampling Results Ancestry Resampling [Bouchard ‐ Cote, Griffiths, Klein, 08] Results: Romance Learned Rules / Mutations Learned Rules / Mutations Results: Austronesian 6

12/1/2014 Examples: Austronesian Result: More Languages Help Distance from Blust [1993] Reconstructions Mean edit distance Number of modern languages used [Bouchard ‐ Cote, Hall, Griffiths, Klein, 13] Visualization: Learned Universals Regularity and Functional Load In a language, some pairs of sounds are more contrastive than others (higher functional load) Example: English p/d versus t/th High Load: p/d: pot/dot, pin/din dress/press, pew/dew, ... Low Load: t/th: thin/tin *The model did not have features encoding natural classes Functional Load: Timeline Regularity and Functional Load 1955: Functional Load Hypothesis (FLH): Sound changes are Data: only 4 languages from the Austronesian data less frequent when they merge phonemes with high functional load [Martinet, 55] Merger posterior probability 1967: Previous research within linguistics: “FLH does not seem to be supported by the data” [King, 67] (Based on 4 Each dot is a sound change languages as noted by [Hocket, 67; Surandran et al., 06]) identified by the system Our approach: we reexamined the question with two orders of magnitude more data [Bouchard ‐ Cote, Hall, Griffiths, Klein, 13] Functional load as computed by [King, 67] 7

12/1/2014 Regularity and Functional Load Data: all 637 languages from the Austronesian data Merger posterior probability Extensions Functional load as computed by [King, 67] Cognate Detection Grammar Induction GL Avg rel gain: 29% ‘fire’ IE  G RM 70 WG NG 60 50 Portuguese Swedish Spanish Slovene Chinese English Danish Dutch /fw Ꜽ ko/ /v ƌ rbo/ /t ṏ ƌ ntro/ 40 30 /sentro/ /ber Ǎ o/ /fwe ɋ o/ 20 /v ƌ rbo/ /fogo/ /s ƌ ntro/ 10 0 [Hall and Klein, 11] [Berg ‐ Kirkpatrick and Klein, 07] Language Diversity Why are the languages of the world so similar? Universal grammar answer: Hardware constraints Common source answer: Not much time has passed [Rafferty, Griffiths, and Klein, 09] 8

The Task Diachronics Dan Klein UC Berkeley Includes joint work with - PDF document

12/1/2014 Natural Language Processing The Task Diachronics Dan Klein UC Berkeley Includes joint work with Alex Bouchard Cote, Tom Griffiths, and David Hall Lexical Reconstruction Tree of Languages Latin focus We assume the phylogeny

Bond Task Force Draft Bond Task Force Recommendations Tuesday, February 27 , 2018 Bond Task

Task 1d: River basin management Task leader: LNEC; Involved partners EU: ISPRA, DTU, EWA Task

p wered Yva productivity AI Task Manager @nerdybff Task Management Task Management Todoist

CGO Task Presentation CGO Task Presentation CGO Task Presentation Effective Task Presentation

IEA Bioenergy IEA BIOENERGY Task 42 Biorefinery 5 th Task Meeting Dublin, Ireland, 25/26 March

WASC 2019 Findings Presentation to stakeholders February 2019 2 3 Process Task 1 Task 2

Telematics Task Force Telematics Task Force Charlie Gorman Charlie Gorman Talking Points

iDASH - Secure Genome Analysis Task 1A Competition Using ObliVM Task 1B Set union Task 2A Xiao

AU Task Force: 2018 Consultation Bob Dony Chair, AU Task Force April 5, 2018 Outline About

Task-Centered Design Task-Centered Process Creating a Task Scenario Scenario-based Walk-throughs

Somers Central School District Safety Task Force Report Committee Report to the Board of

Delaware River Basin Delaware River Basin Flood Mitigation Task Force: Flood Mitigation Task

Delaware River Basin Delaware River Basin Flood Mitigation Task Force: Flood Mitigation Task

Traffic Study May 23, 2019 Project Team DBS Scope of work Task 1 Data Collection Task 2

Delaware River Basin Delaware River Basin Flood Mitigation Task Force: Flood Mitigation Task

RIVER ECOLOGY & GOVERNANCE TASK FORCE FULL TASK FORCE MEETING APRIL 30, 2019 RIVER ECOLOGY

Generative Lexicon Theory: Integrating Theoretical and Empirical Methods James Pustejovsky

EMERGENCE AND REDUCTION: GO HAND IN HAND? Katie Robertson University of Birmingham 1 THE

SNA & ancient literature Libanius' Epistolary Ego-Network Dr Lieve Van Hoof Fellow des

Double oblique case and agreement across two dialects of Wakhi Daniel Kaufman een College,

Natural Language Processing Diachronics Dan Klein UC Berkeley Includes joint work with Alex

Natural Language Processing with Deep Learning Footprint of Societal Biases in NLP Navid

The scope of linguistics John Goldsmith Origins of linguistics In several cases, the roots

Staying Regular? Alan Hjek ALI G: So what is the chances that me will eventually die? C. EVERETT

Sambuz

Useful Links

Newsletter

Mail Us

The Task Diachronics Dan Klein UC Berkeley Includes joint work with - PDF document

12/1/2014 Natural Language Processing The Task Diachronics Dan Klein UC Berkeley Includes joint work with Alex Bouchard Cote, Tom Griffiths, and David Hall Lexical Reconstruction Tree of Languages Latin focus We assume the phylogeny

Bond Task Force Draft Bond Task Force Recommendations Tuesday, February 27 , 2018 Bond Task

Task 1d: River basin management Task leader: LNEC; Involved partners EU: ISPRA, DTU, EWA Task

p wered Yva productivity AI Task Manager @nerdybff Task Management Task Management Todoist

CGO Task Presentation CGO Task Presentation CGO Task Presentation Effective Task Presentation

IEA Bioenergy IEA BIOENERGY Task 42 Biorefinery 5 th Task Meeting Dublin, Ireland, 25/26 March

WASC 2019 Findings Presentation to stakeholders February 2019 2 3 Process Task 1 Task 2

Telematics Task Force Telematics Task Force Charlie Gorman Charlie Gorman Talking Points

iDASH - Secure Genome Analysis Task 1A Competition Using ObliVM Task 1B Set union Task 2A Xiao

AU Task Force: 2018 Consultation Bob Dony Chair, AU Task Force April 5, 2018 Outline About

Task-Centered Design Task-Centered Process Creating a Task Scenario Scenario-based Walk-throughs

Somers Central School District Safety Task Force Report Committee Report to the Board of

Delaware River Basin Delaware River Basin Flood Mitigation Task Force: Flood Mitigation Task

Delaware River Basin Delaware River Basin Flood Mitigation Task Force: Flood Mitigation Task

Traffic Study May 23, 2019 Project Team DBS Scope of work Task 1 Data Collection Task 2

Delaware River Basin Delaware River Basin Flood Mitigation Task Force: Flood Mitigation Task

RIVER ECOLOGY &amp; GOVERNANCE TASK FORCE FULL TASK FORCE MEETING APRIL 30, 2019 RIVER ECOLOGY

Generative Lexicon Theory: Integrating Theoretical and Empirical Methods James Pustejovsky

EMERGENCE AND REDUCTION: GO HAND IN HAND? Katie Robertson University of Birmingham 1 THE

SNA &amp; ancient literature Libanius' Epistolary Ego-Network Dr Lieve Van Hoof Fellow des

Double oblique case and agreement across two dialects of Wakhi Daniel Kaufman een College,

Natural Language Processing Diachronics Dan Klein UC Berkeley Includes joint work with Alex

Natural Language Processing with Deep Learning Footprint of Societal Biases in NLP Navid

The scope of linguistics John Goldsmith Origins of linguistics In several cases, the roots

Staying Regular? Alan Hjek ALI G: So what is the chances that me will eventually die? C. EVERETT

Sambuz

Useful Links

Newsletter

Mail Us

RIVER ECOLOGY & GOVERNANCE TASK FORCE FULL TASK FORCE MEETING APRIL 30, 2019 RIVER ECOLOGY

SNA & ancient literature Libanius' Epistolary Ego-Network Dr Lieve Van Hoof Fellow des