An Unsupervised Model for Joint Phrase Alignment and Extraction


  1. An Unsupervised Model for Joint Phrase Alignment and Extraction
     Graham Neubig 1,2, Taro Watanabe 2, Eiichiro Sumita 2, Shinsuke Mori 1, Tatsuya Kawahara 1
     1 Graduate School of Informatics, Kyoto University
     2 National Institute of Information and Communications Technology

  2. Phrase Table Construction

  3. The Phrase Table
     ● The most important element of phrase-based SMT
     ● Consists of scored bilingual phrase pairs:

         Source        Target     Scores
         le            it         0.05  0.20  0.005  1
         le admettre   admit it   1.0   1.0   1e-05  1
         admettre      admit      0.4   0.5   0.02   1
         …

     ● Usually learned from a parallel corpus aligned at the sentence level → phrases must be aligned
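To make the table concrete, here is a minimal sketch of a phrase table as a Python dictionary built from the example rows above. Reading the four score columns as Moses-style forward/backward phrase and lexical translation probabilities is an assumption, not something the slide states:

```python
# A minimal sketch: a phrase table as a dict from source phrase to scored
# target candidates, using the example entries above. Interpreting the
# four scores as Moses-style forward/backward phrase and lexical
# translation probabilities is an assumption.
phrase_table = {
    ("le",): [(("it",), (0.05, 0.20, 0.005, 1.0))],
    ("le", "admettre"): [(("admit", "it"), (1.0, 1.0, 1e-05, 1.0))],
    ("admettre",): [(("admit",), (0.4, 0.5, 0.02, 1.0))],
}

def lookup(source_phrase):
    """Return all scored target candidates for a source phrase."""
    return phrase_table.get(tuple(source_phrase), [])

print(lookup(["le", "admettre"]))  # [(('admit', 'it'), (1.0, 1.0, 1e-05, 1.0))]
```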

  4. Traditional Phrase Table Construction: 1-to-1 Alignment, Combination, Extraction
     [Pipeline: Parallel Text → one-to-many word alignments in both directions, f→e and e→f (GIZA++) → combined into a many-to-many alignment → phrase extraction → Phrase Table]
     + Generally quite effective; the default for Moses
     - Complicated, with lots of heuristics
     - Does not directly acquire phrases, which are the final goal of alignment
     - The phrase table is exhaustively extracted and thus large

  5. Previous Work: Many-to-Many Alignment
     [Pipeline: Parallel Text → many-to-many phrase alignment → phrase extraction → Phrase Table]
     ● Significant recent research on many-to-many alignment [Zhang+ 08, DeNero+ 08, Blunsom+ 10]
     + The model is simplified, with gains in accuracy
     ● Short phrases are aligned, then combined into longer phrases during the extraction step
     - Some issues still remain: a large phrase table, heuristics, and no direct modeling of the extracted phrases

  6. Proposed Model for Joint Phrase Alignment and Extraction
     [Pipeline: Parallel Text → hierarchical phrase alignment → Phrase Table]
     ● Phrases of multiple granularities are directly modeled
     + No mismatch between the alignment goal and the final goal
     + Completely probabilistic model, no heuristics
     + Competitive accuracy with a smaller phrase table
     ● Uses a hierarchical model based on Inversion Transduction Grammars (ITGs)

  7. Phrasal Inversion Transduction Grammars (Previous Work)

  8. Inversion Transduction Grammar (ITG)
     ● Like a CFG over two languages
     ● Has non-terminals for regular and inverted productions, one pre-terminal, and terminals specifying phrase pairs
     [Example trees: a regular production combines I/il me and hate/coûte into "I hate" / "il me coûte"; an inverted production combines admit/admettre and it/le into "admit it" / "le admettre"]
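To make the two production types concrete, here is a minimal sketch of an ITG derivation as nested Python tuples, with a function that reads off the word sequences in both languages; the encoding is illustrative, not from the paper:

```python
# An ITG derivation as nested tuples: "reg" keeps child order in both
# languages, "inv" swaps the children on the French side, and "term"
# holds an aligned phrase pair.
def flatten(node):
    """Return the (English, French) word sequences yielded by a derivation."""
    kind = node[0]
    if kind == "term":
        _, eng, fra = node
        return list(eng), list(fra)
    _, left, right = node
    e1, f1 = flatten(left)
    e2, f2 = flatten(right)
    if kind == "reg":            # regular: same order in both languages
        return e1 + e2, f1 + f2
    else:                        # inverted: swap order on the French side
        return e1 + e2, f2 + f1

# "admit it" / "le admettre" uses an inverted production:
tree = ("inv", ("term", ["admit"], ["admettre"]), ("term", ["it"], ["le"]))
print(flatten(tree))  # (['admit', 'it'], ['le', 'admettre'])
```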

  9. Biparsing-based Alignment with ITGs
     ● Uses the non-/pre-terminal distribution P_x and the phrase distribution P_t
     [Figure: the sentence pair ⟨e,f⟩ = "i hate to admit it" / "il me coûte de le admettre" is analyzed by a derivation d built from P_x(reg), P_x(inv), and P_x(term) choices and the phrase probabilities P_t(i/il me), P_t(hate/coûte), P_t(to/de), P_t(admit/admettre), P_t(it/le); the derivation determines the alignment a]
     ● Viterbi parsing and sampling are both possible in O(n^6)

  10. Learning Phrasal ITGs with Blocked Gibbs Sampling [Blunsom+ 10]
      1) Choose a sentence pair ⟨e_i, f_i⟩ from the corpus
      2) Subtract the counts of its current derivation d_i from the symbol counts c_x and biphrase counts c_t: c_x(d_i)--, c_t(d_i)--
      3) Perform biparsing to sample a new d_i using P_x and P_t
      4) Add the counts of the new d_i: c_x(d_i)++, c_t(d_i)++
      5) Replace d_i in the corpus and repeat
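The count bookkeeping of steps 2–4 can be sketched as below; the O(n^6) biparsing of step 3 is abstracted behind a placeholder, so this shows only the structure of the sampler, not a working aligner:

```python
import random
from collections import Counter

c_x = Counter()  # symbol counts (reg / inv / term)
c_t = Counter()  # biphrase counts

def counts_of(derivation):
    """Hypothetical accessor: the symbols and biphrases a derivation contains."""
    return derivation["symbols"], derivation["biphrases"]

def sample_derivation(pair, c_x, c_t):
    """Placeholder for biparsing-based sampling; a real sampler uses P_x and P_t."""
    return pair["derivation"]

def gibbs_iteration(corpus):
    for pair in random.sample(corpus, len(corpus)):  # 1) choose a sentence pair
        sym, phr = counts_of(pair["derivation"])
        c_x.subtract(sym)                            # 2) subtract current counts
        c_t.subtract(phr)
        pair["derivation"] = sample_derivation(pair, c_x, c_t)  # 3) resample d_i
        sym, phr = counts_of(pair["derivation"])
        c_x.update(sym)                              # 4) add the new counts
        c_t.update(phr)                              # 5) d_i stays in the corpus
```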

  11. Calculating Probabilities given Counts
      ● Example counts: c_t(it/le) = 12, c_t(I/il me) = 3, c_t(hate/coûte) = 0, …; c_x(reg) = 415, c_x(inv) = 43, c_x(term) = 312
      ● Adopt a Bayesian approach: assume the probabilities were generated from a Pitman-Yor process and a Dirichlet distribution:

          P_t \sim \mathrm{PY}(d, \alpha_t, P_{base}) \qquad P_x \sim \mathrm{Dirichlet}(\alpha_x)

      ● Marginal probabilities can be calculated (in the example, ignoring the discount d of the PY process):

          P_x(x) = \frac{c_x(x) + \alpha_x / 3}{\sum_{x'} c_x(x') + \alpha_x} \qquad P_t(\langle e,f \rangle) = \frac{c_t(\langle e,f \rangle) + \alpha_t P_{base}(\langle e,f \rangle)}{\sum_{\langle e',f' \rangle} c_t(\langle e',f' \rangle) + \alpha_t}
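A direct transcription of the two marginals, with the example counts from this slide; the hyperparameter values are illustrative, and the Pitman-Yor discount d is ignored as on the slide:

```python
from collections import Counter

alpha_x, alpha_t = 1.0, 1.0  # illustrative concentration parameters
c_x = Counter({"reg": 415, "inv": 43, "term": 312})
c_t = Counter({("it", "le"): 12, ("I", "il me"): 3})

def p_x(x):
    # symmetric Dirichlet prior, its mass split over the 3 symbols
    return (c_x[x] + alpha_x / 3) / (sum(c_x.values()) + alpha_x)

def p_t(e, f, p_base):
    # counts smoothed by the base measure
    return (c_t[(e, f)] + alpha_t * p_base) / (sum(c_t.values()) + alpha_t)

print(round(p_x("reg"), 3))         # 0.539
print(p_t("hate", "coûte", 1e-4))   # unseen pair falls back to the base measure
```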

  12. Base Measure

          P_t(\langle e,f \rangle) = \frac{c_t(\langle e,f \rangle) + \alpha_t P_{base}(\langle e,f \rangle)}{\sum_{\langle e',f' \rangle} c_t(\langle e',f' \rangle) + \alpha_t}

      ● P_base has the effect of smoothing the probabilities, particularly for low-frequency pairs
      ● To bias towards good phrase pairs, use the geometric mean of word-based Model 1 probabilities [DeNero+ 08]:

          P_{base}(\langle e,f \rangle) = \left( P_{m1}(f \mid e)\, P_{uni}(e)\, P_{m1}(e \mid f)\, P_{uni}(f) \right)^{1/2}

      ● A good word match in both directions = a good phrase match
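A sketch of the base measure, assuming P_uni of a phrase is the product of unigram word probabilities; the translation and unigram tables are hypothetical lookups that a real system would estimate with EM on the parallel text:

```python
import math

def p_model1(src, trg, t_table):
    """IBM Model 1: each target word aligns to some source word or NULL."""
    prob = 1.0
    for w in trg:
        prob *= sum(t_table.get((s, w), 1e-10) for s in src + ["NULL"]) / (len(src) + 1)
    return prob

def p_base(e, f, t_fe, t_ef, uni_e, uni_f):
    """Geometric mean of the Model 1 probabilities in both directions."""
    fwd = p_model1(e, f, t_fe) * math.prod(uni_e.get(w, 1e-10) for w in e)
    bwd = p_model1(f, e, t_ef) * math.prod(uni_f.get(w, 1e-10) for w in f)
    return math.sqrt(fwd * bwd)
```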

  13. Calculating Counts given Derivations
      ● Elements generated from each distribution P_x and P_t are added to the counts used to calculate the probabilities
      [Example: the derivation of "i hate to admit it" / "il me coûte de le admettre" adds c_x(reg) += 3, c_x(inv) += 1, c_x(term) += 5, and c_t(i/il me)++, c_t(hate/coûte)++, c_t(to/de)++, c_t(admit/admettre)++, c_t(it/le)++, where hate/coûte is generated from the base measure P_base]
      ● Problem: only minimal phrases are added → they must still be heuristically combined into multiple granularities
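A sketch of this counting pass over a derivation, using the same nested-tuple encoding as earlier (illustrative, not the paper's data structure):

```python
from collections import Counter

def add_counts(node, c_x, c_t):
    """Count one symbol per node; only leaf phrase pairs reach c_t."""
    kind = node[0]
    if kind == "term":
        _, e, f = node
        c_x["term"] += 1
        c_t[(e, f)] += 1         # only minimal phrases are added
    else:
        _, left, right = node
        c_x[kind] += 1           # "reg" or "inv"
        add_counts(left, c_x, c_t)
        add_counts(right, c_x, c_t)

c_x, c_t = Counter(), Counter()
tree = ("inv", ("term", "admit", "admettre"), ("term", "it", "le"))
add_counts(tree, c_x, c_t)
print(c_x)  # Counter({'term': 2, 'inv': 1})
print(c_t)  # Counter({('admit', 'admettre'): 1, ('it', 'le'): 1})
```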

  14. Joint Phrase Alignment and Extraction (Our Work)

  15. Basic Idea
      ● Run the generative story in reverse order
      ● Traditional ITG model:
        ● Generate branches (reordering structure) from P_x
        ● Generate leaves (phrase pairs) from P_t
      ● Proposed ITG model:
        ● From the top, try to generate the whole phrase pair from P_t
        ● Divide and conquer using P_x to handle sparsity

  16. Derivation in the Proposed Model
      ● Phrases of many granularities are generated from P_t and added to c_t
      [Example: the same sentence pair now adds c_t(i hate to admit it/il me coûte de le admettre)++, c_t(i hate/il me coûte)++, c_t(to admit it/de le admettre)++, c_t(admit it/le admettre)++, c_t(i/il me)++, c_t(hate/coûte)++, c_t(to/de)++, c_t(admit/admettre)++, c_t(it/le)++, along with c_x(reg) += 3, c_x(inv) += 1, c_x(base) += 1]
      ● No extraction step is needed, as multiple granularities are already included!
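The counting pass changes in one place: every node, not just the leaves, contributes its spanned phrase pair to c_t, so all granularities are collected in one walk. A simplified sketch (leaves here are generated directly from the base pattern):

```python
from collections import Counter

def add_counts_joint(node, c_x, c_t):
    """Return the (e, f) phrase pair spanned by node, counting as we go."""
    kind = node[0]
    if kind == "base":
        _, e, f = node           # leaf generated from P_base
    else:
        _, left, right = node
        e1, f1 = add_counts_joint(left, c_x, c_t)
        e2, f2 = add_counts_joint(right, c_x, c_t)
        e = e1 + " " + e2
        f = f1 + " " + f2 if kind == "reg" else f2 + " " + f1
    c_x[kind] += 1
    c_t[(e, f)] += 1             # every granularity is added
    return e, f

c_x, c_t = Counter(), Counter()
tree = ("inv", ("base", "admit", "admettre"), ("base", "it", "le"))
add_counts_joint(tree, c_x, c_t)
print(c_t)  # includes ('admit it', 'le admettre') as well as both sub-pairs
```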

  17. Recursive Base Measure
      ● Previous work: high-probability words = high-probability phrases
      ● Proposed: build new phrase pairs by combining existing phrase pairs in P_dac ("divide-and-conquer"), e.g. if P_t(I/il me) and P_t(hate/coûte) are high, then P_dac(I hate/il me coûte) is high

          P_t(\langle e,f \rangle) = \frac{c_t(\langle e,f \rangle) + \alpha_t P_{dac}(\langle e,f \rangle)}{\sum_{\langle e',f' \rangle} c_t(\langle e',f' \rangle) + \alpha_t}

      ● High-probability sub-phrases → high-probability phrases
      ● P_t is included in P_dac, and P_dac is included in P_t

  18. Details of P_dac
      ● Choose from P_x one of three patterns for P_dac, as in the ITG:
        Regular:  P_x(reg) · P_t(I/il me) · P_t(hate/coûte) → I hate/il me coûte
        Inverted: P_x(inv) · P_t(admit/admettre) · P_t(it/le) → admit it/le admettre
        Base:     P_x(base) · P_base(hate/coûte) → hate/coûte
      ● P_base is the same as before
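A minimal sketch of the mutual recursion, assuming P_dac sums over all binary splits of the phrase pair; alpha_t and the P_x values are illustrative, and p_base stands in for the Model 1 base measure above. The recursion terminates because every split yields strictly shorter sub-phrases:

```python
from collections import Counter
from functools import lru_cache

alpha_t = 1.0
p_x = {"reg": 0.4, "inv": 0.2, "base": 0.4}
c_t = Counter()    # biphrase counts (empty here; filled by sampling in practice)
c_t_total = 0

def p_base(e, f):
    return 1e-4    # stand-in for the Model 1 base measure

@lru_cache(maxsize=None)   # caching assumes the counts are fixed
def p_t(e, f):
    return (c_t[(e, f)] + alpha_t * p_dac(e, f)) / (c_t_total + alpha_t)

@lru_cache(maxsize=None)
def p_dac(e, f):
    score = p_x["base"] * p_base(e, f)              # base pattern
    for i in range(1, len(e)):                      # divide and conquer:
        for j in range(1, len(f)):                  # all ways to split both sides
            e1, e2, f1, f2 = e[:i], e[i:], f[:j], f[j:]
            score += p_x["reg"] * p_t(e1, f1) * p_t(e2, f2)
            score += p_x["inv"] * p_t(e1, f2) * p_t(e2, f1)
    return score

print(p_t(("admit", "it"), ("le", "admettre")))
```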

  19. Phrase Extraction
      ● Traditional heuristics: exhaustively combine and count all neighboring phrases, then score with
          P(e|f) = c(e,f) / c(f),  P(f|e) = c(e,f) / c(e)
        → O(n^2) phrases per sentence
      ● Model probabilities: calculate the phrase table directly from the model probabilities, for pairs with c(e,f) ≥ 1:
          P(e|f) = P_t(e,f) / P_t(f),  P(f|e) = P_t(e,f) / P_t(e)
        → O(n) phrases per sentence
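A sketch of the model-based scoring, assuming the joint probabilities P_t(e,f) and the counts c(e,f) from the final sample are available as dictionaries (names are illustrative):

```python
def model_scores(joint, c):
    """joint: {(e, f): P_t(e,f)}; c: {(e, f): count in the final sample}."""
    p_e, p_f = {}, {}
    for (e, f), p in joint.items():
        p_e[e] = p_e.get(e, 0.0) + p   # marginal P_t(e)
        p_f[f] = p_f.get(f, 0.0) + p   # marginal P_t(f)
    table = {}
    for (e, f), p in joint.items():
        if c.get((e, f), 0) >= 1:      # keep only pairs actually used
            table[(e, f)] = (p / p_f[f], p / p_e[e])  # P(e|f), P(f|e)
    return table
```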

  20. Experiments

  21. Tasks/Data
      ● 2 tasks, 4 language pairs: es-en, de-en, fr-en, ja-en
      ● de-en, es-en, fr-en: WMT10 news-commentary
      ● ja-en: NTCIR08 patent translation
      ● Data was lowercased and tokenized, and only sentences of length 40 and under were used

                WMT                                        NTCIR
                de       es       fr       en                  ja       en
        TM      1.85M    1.82M    1.56M    1.80M/1.62M/1.35M   2.78M    2.38M
        LM      -        -        -        52.7M               -        44.7M
        Tune    47.2k    52.6k    55.4k    49.8k               80.4k    68.9k
        Test    62.7k    68.1k    72.6k    65.6k               48.7k    40.4k
