Phylogenetic Inference for Language Nicholas Andrews, Jason Eisner, Mark Dredze Department of Computer Science, CLSP, HLTCOE Johns Hopkins University Baltimore, Maryland 21218 noa@jhu.edu April 23, 2013

Outline
1. Phylogenetic inference?
2. Generative model
3. A sampler sketch
4. Variational EM
5. Experiments

Phylogenetic inference?
Language evolution: e.g. sound change (Bouchard-Côté et al., 2007)

Phylogenetic inference?
Bibliographic entry variation:
Steven Abney, Robert E. Schapire, & Yoram Singer (1999). Boosting applied to tagging and PP attachment. Proc. EMNLP-VLC. New Brunswick, New Jersey: Association for Computational Linguistics
• Abbreviate names: Abney, S., Schapire, R. E., & Singer, Y. (1999). Boosting applied to tagging and PP attachment. Proc. EMNLP-VLC. New Brunswick, New Jersey: Association for Computational Linguistics
• Initials first; shorten to ACL: S. Abney, R. E. Schapire & Y. Singer (1999). Boosting applied to tagging and PP attachment. In Proc. EMNLP-VLC. New Brunswick, New Jersey. ACL.
• Delete location, shorten venue: Abney, S., Schapire, R. E., & Singer, Y. (1999). Boosting applied to tagging and PP attachment. EMNLP.

Phylogenetic inference?
Paraphrase: Papa ate the caviar
• Add "with a spoon": Papa ate the caviar with a spoon
• Substitute "devoured": Papa devoured the caviar
• Active to passive: The caviar was devoured by papa

Phylogenetic inference?
One Entity, Many Names (Spence et al., NAACL 2012)
• Qaddafi, Muammar
• Al-Gathafi, Muammar
• al-Qadhafi, Muammar
• Al Qathafi, Mu’ammar
• Al Qathafi, Muammar
• El Gaddafi, Moamar
• El Kadhafi, Moammar
• El Kazzafi, Moamer

Phylogenetic inference?
In each example, there are systematic changes over time:
• Sound change: assimilation, metathesis, etc.
• Bibliographic variation: typos, abbreviations, punctuation, etc.
• Paraphrase: synonyms, voice change, re-arrangements, etc.
• Name variation: nicknames, titles, initials, etc.
This talk: name variation

What’s a name phylogeny?
A phylogeny is a directed tree rooted at ♦.
Figure: A cherry-picked fragment of a phylogeny learned by our model. Names in the fragment: Khawaja Gharibnawaz Muinuddin Hasan Chisty; Khwaja Muin al-Din Chishti; Khwaja Gharib Nawaz; Khwaja Moinuddin Chishti; Ghareeb Nawaz; Khwaja gharibnawaz; Muinuddin Chishti.

Objects in the model
Names are mentioned in context:

Observed?   Description      Example
yes         Name             Justin
no          Parent x         13
no          Entity e         44 (= Justin Bieber)
yes         Type             person
no          Topic            6 (= music)
yes         Document d       20
yes         Language         English
no          Token position   100
yes         Index            729

Generative model
Step 1: Sample a topic z at each position in each document (for all documents in the corpus): z_1 z_2 z_3 z_4 z_5 ... This is just like latent Dirichlet allocation (LDA).
Step 2: Sample either (1) a context word or (2) a named-entity type at each position, conditioned on the topic: Beliebers held up infinity signs at PERSON ...
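Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_document`, the two topics, and every probability below are made up for the example and are not taken from the talk.

```python
import random

def sample_document(length, doc_topic_probs, topic_emissions, rng):
    """Sample a topic, then a context word or entity type, at each position."""
    topics = list(range(len(doc_topic_probs)))
    tokens = []
    for _ in range(length):
        # Step 1: draw a topic for this position from the document's topic mixture.
        z = rng.choices(topics, weights=doc_topic_probs)[0]
        # Step 2: draw a context word or a named-entity type, conditioned on the topic.
        symbols, weights = zip(*topic_emissions[z].items())
        tokens.append((z, rng.choices(symbols, weights=weights)[0]))
    return tokens

# Hypothetical two-topic setup; all numbers are invented for illustration.
emissions = [
    {"Beliebers": 0.4, "signs": 0.3, "PERSON": 0.3},  # topic 0 (say, music)
    {"senate": 0.5, "vote": 0.3, "PERSON": 0.2},      # topic 1 (say, politics)
]
doc = sample_document(7, [0.9, 0.1], emissions, random.Random(0))
```

Each element of `doc` is a (topic, symbol) pair; positions whose symbol is `PERSON` are the ones that later receive a name.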

Generative model
Step 3: For the nth named-entity mention y, pick a parent x:
1. Pick ♦ with probability $\alpha / (n + \alpha)$
2. Pick a previous mention with probability proportional to $\exp(\phi \cdot f(x, y))$
Features of x and y: topic, entity type, language
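A minimal sketch of Step 3 follows. It assumes mentions are dictionaries, that the features are three match indicators (topic, entity type, language), and that the first mention always attaches to the root because nothing precedes it. The names `choose_parent`, `features`, and `PHI`, and the weight values, are hypothetical.

```python
import math
import random

ROOT = None  # stands in for the root symbol of the phylogeny

def features(x, y):
    # Assumed feature vector: 1.0 where x and y agree on an attribute, else 0.0.
    return [float(x[k] == y[k]) for k in ("topic", "type", "language")]

def choose_parent(previous, y, alpha, phi, rng):
    """Pick a parent for the n-th mention y, where n = len(previous) + 1."""
    n = len(previous) + 1
    # Attach to the root with probability alpha / (n + alpha),
    # or unconditionally when there is no previous mention.
    if not previous or rng.random() < alpha / (n + alpha):
        return ROOT
    # Otherwise pick a previous mention x with weight exp(phi . f(x, y)).
    weights = [
        math.exp(sum(p * f for p, f in zip(phi, features(x, y))))
        for x in previous
    ]
    return rng.choices(previous, weights=weights)[0]

PHI = [1.0, 0.5, 0.5]  # made-up feature weights
prev = [
    {"topic": 6, "type": "person", "language": "English"},
    {"topic": 2, "type": "person", "language": "English"},
]
y = {"topic": 6, "type": "person", "language": "English"}
parent = choose_parent(prev, y, 1.0, PHI, random.Random(0))
```

Because the weights are exponentiated dot products, a previous mention that shares the topic of y is favored over one that does not, but every previous mention keeps nonzero probability.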

Generative model
Step 4: Generate a name conditioned on the selected parent
1. If the parent is ♦, generate a name from scratch: ♦ → Justin Bieber
2. Otherwise:
   • copy with probability 1 − µ: Justin Bieber → Justin Bieber
   • mutate with probability µ: Justin Bieber → J.B.
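The copy-or-mutate choice in Step 4 can be sketched as below. The helpers `from_scratch` and `to_initials` are toy stand-ins invented for this example; in the talk, mutation is done by a transducer, not by an initials rule.

```python
import random

def generate_name(parent_name, mu, mutate, from_scratch, rng):
    """Generate a name given the parent's name (None means the root)."""
    if parent_name is None:
        # Parent is the root: generate a fresh name.
        return from_scratch(rng)
    if rng.random() < mu:
        # Mutate the parent's name with probability mu.
        return mutate(parent_name, rng)
    # Otherwise copy it unchanged, with probability 1 - mu.
    return parent_name

def from_scratch(rng):
    # Toy stand-in for a "name from scratch" model.
    return rng.choice(["Justin Bieber", "Barack Obama"])

def to_initials(name, rng):
    # Toy stand-in for the mutation model: abbreviate each word to an initial.
    return "".join(word[0] + "." for word in name.split())

rng = random.Random(0)
copied = generate_name("Justin Bieber", 0.0, to_initials, from_scratch, rng)
mutated = generate_name("Justin Bieber", 1.0, to_initials, from_scratch, rng)
```

With µ = 0 the name is always copied, and with µ = 1 it is always mutated, which makes the two branches easy to check in isolation.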

Generative model
Name variation as mutations
“Mutations” capture different types of name variation:
1. Transcription errors: Barack → barack
2. Misspellings: Barack → Barrack
3. Abbreviations: Barack Obama → Barack O.
4. Nicknames: Barack → Barry
5. Dropping words: Barack Obama → Barack

Generative model
Mutation via probabilistic finite-state transducers
The mutation model is a probabilistic finite-state transducer with four character operations: copy, substitute, delete, insert
• Character operations are conditioned on the right input character
• Latent regions of contiguous edits
• Back-off smoothing
Transducer parameters θ determine the probability of being in different regions, and of the different character operations

Generative model
Example: Mutating a name (Mr. Robert Kennedy → Mr. Bobby Kennedy)
Input: M r . _ R o b e r t _ K e n n e d y $
Output after each step:
1. Beginning of edit region: M r . _[
2. 1 substitution operation, (R, B): M r . _[B
3. 2 copy operations, (o, o), (b, b): M r . _[B o b
4. 3 deletion operations, (e, ε), (r, ε), (t, ε): M r . _[B o b
5. 2 insertion operations, (ε, b), (ε, y): M r . _[B o b b y
6. End of edit region: M r . _[B o b b y]
7. Remaining characters copied: M r . _[B o b b y]_ K e n n e d y $
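The edit sequence in this example can be replayed with a short sketch. It represents each character operation as an (input, output) pair, with the empty string standing in for ε, and it ignores the edit-region brackets, the end symbol $, and all probabilities. The helper names are assumptions made for this illustration.

```python
EPS = ""  # stands in for the empty symbol

def apply_edits(source, edits):
    """Apply (input_char, output_char) operations to source, left to right."""
    out, pos = [], 0
    for inp, outp in edits:
        if inp != EPS:
            # Copy, substitute and delete each consume one input character.
            assert source[pos] == inp, f"expected {inp!r} at position {pos}"
            pos += 1
        # Copy, substitute and insert emit a character; delete emits nothing.
        out.append(outp)
    assert pos == len(source), "edits must consume the whole input"
    return "".join(out)

def copy(text):
    # A copy operation pairs each character with itself.
    return [(c, c) for c in text]

edits = (
    copy("Mr. ")
    + [("R", "B")]                           # 1 substitution
    + copy("ob")                             # 2 copies
    + [("e", EPS), ("r", EPS), ("t", EPS)]   # 3 deletions
    + [(EPS, "b"), (EPS, "y")]               # 2 insertions
    + copy(" Kennedy")
)
result = apply_edits("Mr. Robert Kennedy", edits)
```

Here `result` is the string "Mr. Bobby Kennedy". Many other edit sequences produce the same output from the same input, which is why the alignment is a latent variable in the mutation model.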

Inference
The latent variables in the model are:
• The spanning tree over tokens p
• The token permutation i
• The topics of all named-entity and context tokens z
(The mutation model also has latent alignments.)
Inference requires marginalizing over the latent variables:
$\Pr_{\phi,\theta}(x) = \sum_{p,i,z} \Pr_{\phi,\theta}(x, z, i, p)$
This sum is intractable to compute. But we can sample from the posterior!
$\Pr_{\phi,\theta}(x) \approx \frac{1}{N} \sum_{n=1}^{N} \Pr_{\phi,\theta}(x, z_n, i_n, p_n)$

A block sampler
Key idea: sampling (p, i, z) jointly is hard, but sampling from the conditional for each variable is easy(ier)
Procedure:
• Initialize (p, i, z).
• For n = 1 to N:
  1. Resample a permutation i given all other variables.
  2. Resample the topic vector z, similarly.
  3. Resample the phylogeny p, similarly.
  4. Output the current sample (p, i, z).
Steps 1 and 2 are Metropolis-Hastings proposals.
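The control flow of the procedure can be sketched as a generator. Only the loop structure is shown: the three resampling functions are placeholders supplied by the caller, and the toy ones used here just increment a counter. The real conditional updates, including any Metropolis-Hastings accept/reject step, are not specified in this sketch.

```python
def block_sampler(state, resamplers, num_iterations):
    """Cycle through per-variable updates and yield a sample per iteration."""
    for _ in range(num_iterations):
        # Order follows the procedure: permutation i, topics z, phylogeny p.
        for name in ("i", "z", "p"):
            # Each update sees the current values of all other variables.
            state[name] = resamplers[name](state)
        # Output a snapshot of the current sample (p, i, z).
        yield dict(state)

# Toy placeholders: each "resampler" only increments its own variable.
toy_resamplers = {
    "i": lambda s: s["i"] + 1,
    "z": lambda s: s["z"] + 1,
    "p": lambda s: s["p"] + 1,
}
samples = list(block_sampler({"p": 0, "i": 0, "z": 0}, toy_resamplers, 3))
```

Yielding a copy of the state on every pass keeps earlier samples from being overwritten by later updates.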

Sampling topics
Step 1: Run belief propagation with messages M_ij directed from the leaves to the root ♦ (in the figure, vertices y and z send messages M_yx and M_zx to x)
Step 2: Sample topics z from ♦ downwards proportional to the belief at each vertex, conditioned on previously sampled topics
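The two steps correspond to a standard upward-messages, downward-sampling pass on a tree, sketched below. The potentials are assumptions for illustration: `unary[v][k]` scores topic k at vertex v and `pairwise[a][b]` scores a parent with topic a having a child with topic b. How the root ♦ itself is treated is not specified here, so the example simply roots the tree at x.

```python
import random

def sample_topics(children, root, unary, pairwise, rng):
    """Pass messages from leaves to root, then sample topics root-downwards."""
    K = len(pairwise)
    inward = {}   # inward[v][k]: unary score at v times messages from v's children
    message = {}  # message[v][a]: message from v to its parent with topic a

    def upward(v):
        score = list(unary[v])
        for c in children.get(v, []):
            upward(c)
            for k in range(K):
                score[k] *= message[c][k]
        inward[v] = score
        # Sum out v's own topic b for each possible parent topic a.
        message[v] = [
            sum(pairwise[a][b] * score[b] for b in range(K)) for a in range(K)
        ]

    upward(root)

    # Sample the root from its belief, then each child given its parent's topic.
    topics = {root: rng.choices(range(K), weights=inward[root])[0]}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            weights = [pairwise[topics[v]][b] * inward[c][b] for b in range(K)]
            topics[c] = rng.choices(range(K), weights=weights)[0]
            stack.append(c)
    return topics

# Toy tree x -> {y, z}; the leaf y can only take topic 1, and the identity
# pairwise potential forces children to share their parent's topic.
tree = {"x": ["y", "z"]}
unary = {"x": [1.0, 1.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
same_topic = [[1.0, 0.0], [0.0, 1.0]]
topics = sample_topics(tree, "x", unary, same_topic, random.Random(0))
```

In this toy case the evidence at y travels up to x through the message and back down to z, so all three vertices are assigned topic 1. Sampling downwards from the inward scores yields an exact joint sample for a tree-structured model.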

Sampling permutations
Figure: two phylogenies over mentions x and y. (a) Compatible with both (x, y) and (y, x). (b) Compatible with a single permutation: (x, y).
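Compatibility between a permutation and a phylogeny can be checked directly, since a parent must be a previous mention. The sketch below assumes that panel (a) shows x and y as siblings under the root and that panel (b) shows y as a child of x; those tree shapes are inferred from the captions, and the function names are hypothetical.

```python
from itertools import permutations

ROOT = None  # stands in for the root symbol of the phylogeny

def is_compatible(order, parent):
    """True if every mention appears after its parent in the given order."""
    position = {mention: idx for idx, mention in enumerate(order)}
    return all(
        p is ROOT or position[p] < position[m] for m, p in parent.items()
    )

def compatible_orders(parent):
    # Brute-force enumeration; only sensible for very small examples.
    return [o for o in permutations(parent) if is_compatible(o, parent)]

siblings = {"x": ROOT, "y": ROOT}  # assumed shape of panel (a)
chain = {"x": ROOT, "y": "x"}      # assumed shape of panel (b)
orders_a = compatible_orders(siblings)
orders_b = compatible_orders(chain)
```

Under these assumptions `orders_a` holds both orderings of x and y, while `orders_b` holds only the ordering in which x precedes y.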
