Unsupervised Vocabulary Induction
1. Infant Language Acquisition (Saffran et al., 1997)
• 8-month-old babies exposed to a stream of syllables
• Stream composed of synthetic words (pabikumalikiwabufa)
• After only 2 minutes of exposure, infants can distinguish words from non-words (e.g., pabiku vs. kumali)

Today: Unsupervised Vocabulary Induction
• Vocabulary Induction from Unsegmented Text
• Vocabulary Induction from Speech Signal
  – Sequence Alignment Algorithms

Vocabulary Induction
Task: Unsupervised learning of word boundary segmentation
• Simple: Ourenemiesareinnovativeandresourceful,andsoarewe. Theyneverstopthinkingaboutnewwaystoharmourcountryandourpeople,andneitherdowe.
• More ambitious: learning the vocabulary directly from the speech signal

2. Word Segmentation (Ando & Lee, 2000)
Key idea: for each candidate boundary, compare the frequency of the n-grams adjacent to the proposed boundary with the frequency of the n-grams that straddle it.
• Example string: T I N G | E V I D, with the candidate boundary between G and E
• For N = 4, consider the 6 questions of the form "Is #(s_i) ≥ #(t_j)?", where #(x) is the number of occurrences of x
• Example: Is "TING" more frequent in the corpus than "INGE"?

Algorithm for Word Segmentation
Notation:
• s_1^n, s_2^n: the non-straddling n-grams to the left and to the right of location k
• t_j^n: the straddling n-gram with j characters to the right of location k
• I_≥(y, z): indicator function that is 1 when y ≥ z, and 0 otherwise
1. Calculate the fraction of affirmative answers for each n ≤ N:
   v_n(k) = 1 / (2(n − 1)) · Σ_{i=1}^{2} Σ_{j=1}^{n−1} I_≥(#(s_i^n), #(t_j^n))
2. Average the contributions of each n-gram order:
   v_N(k) = (1 / |N|) · Σ_{n ∈ N} v_n(k)

Algorithm for Word Segmentation (Cont.)
Place a boundary at all locations l such that either:
• l is a local maximum: v_N(l) > v_N(l − 1) and v_N(l) > v_N(l + 1)
• v_N(l) ≥ t, a threshold parameter

Experimental Framework
• Corpus: 150 megabytes of 1993 Nikkei newswire
• Manual annotations: 50 sequences for the development set (parameter tuning) and 50 sequences for the test set
• Baseline algorithms: Chasen and Juman morphological analyzers (115,000 and 231,000 words)
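To make the statistic concrete, here is a minimal Python sketch of v_n(k) and v_N(k), assuming the corpus is a single unsegmented Python string and ignoring edge effects near the ends of the string; the function and variable names are mine, not Ando & Lee's.

```python
from collections import Counter

def ngram_counts(text, orders):
    """Count every character n-gram of each requested order in the unsegmented text."""
    return {n: Counter(text[i:i + n] for i in range(len(text) - n + 1))
            for n in orders}

def v_n(text, counts, k, n):
    """Fraction of affirmative answers to "is #(s_i) >= #(t_j)?" at location k
    for one n-gram order n: s_1, s_2 are the non-straddling n-grams on either
    side of k, and t_1 .. t_{n-1} are the n-grams straddling k."""
    s_grams = [text[k - n:k], text[k:k + n]]
    yes = 0
    for s in s_grams:
        for j in range(1, n):                       # j characters to the right of k
            t = text[k - (n - j):k + j]
            yes += counts[n][s] >= counts[n][t]
    return yes / (2 * (n - 1))

def v_N(text, k, orders=(2, 3, 4)):
    """Average v_n(k) over the chosen n-gram orders.
    (In practice, compute the counts once and reuse them for every k.)"""
    counts = ngram_counts(text, orders)
    return sum(v_n(text, counts, k, n) for n in orders) / len(orders)

# Boundaries are then placed wherever v_N is a local maximum or exceeds the threshold t.
```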

3. Evaluation
• Precision (P): the percentage of proposed brackets that exactly match word-level brackets in the annotation
• Recall (R): the percentage of word-level annotation brackets that are proposed by the algorithm
• F = 2PR / (P + R)
• F = 82% (improvement of 1.38% over Juman and of 5.39% over Chasen)

Performance on other datasets
  Orwell (English): 79.8
  Song lyrics (Romaji): 67.6 — Cheng & Mitzenmacher
  Goethe (German): 75.2
  Verne (French): 72.9
  Arrighi (Italian): 73.1

Today: Unsupervised Vocabulary Induction
• Vocabulary Induction from Unsegmented Text
• Vocabulary Induction from Speech Signal
  – Sequence Alignment Algorithms

Aligning Two Sequences
Given two possibly related strings S1 and S2, find the longest common subsequence.
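As a warm-up for the alignment algorithms that follow, a small sketch of the standard dynamic program for the longest common subsequence the slide asks for; this is illustrative code with names of my choosing, not from the slides.

```python
def lcs(s1, s2):
    """Length of the longest common subsequence of s1 and s2 (plus one such
    subsequence), via the standard O(len(s1) * len(s2)) dynamic program."""
    n, m = len(s1), len(s2)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Traceback to recover one longest common subsequence.
    out, i, j = [], n, m
    while i and j:
        if s1[i - 1] == s2[j - 1]:
            out.append(s1[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return L[n][m], ''.join(reversed(out))

# e.g. lcs("pabiku", "kumali")[0] == 2
```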

4. How Can We Compute the Best Alignment?
• We need a scoring system for ranking alignments:
  – Substitution cost, e.g. for the DNA alphabet:

          A      G      T      C
    A     1     0.5    -1     -1
    G   -0.5     1     -1     -1
    T    -1     -1      1    -0.5
    C    -1     -1    -0.5     1

  – Gap (insertion & deletion) cost

Key Insight: Score is Additive
Compute the best alignment recursively.
• For a given aligned pair (i, j), the best alignment is:
  Best alignment of S1[1...i] and S2[1...j] + Best alignment of S1[i...n] and S2[j...m]

Can We Simply Enumerate All Possible Alignments?
• Naive enumeration is prohibitively expensive: the number of alignments of sequences of lengths n and m is roughly C(n + m, m) = (n + m)! / (n!·m!); for n = m this is (2m)! / (m!)^2 ≈ 2^(2m) / √(πm)

    n = m    Number of alignments
    10       184,756
    20       ≈ 1.4 × 10^11
    100      ≈ 9.0 × 10^58

• Alignment using dynamic programming can be done in O(n·m)

Alignment Matrix
Alignment of two sequences can be modeled as the task of finding the path with the highest weight in a matrix.
Example alignment of HEAGAWG with PAW:

    H E A G A W G
    - - P - A W -

The corresponding path steps through the marked cells of the matrix whose rows are P, A, W and whose columns are H, E, A, G, A, W, G (shown as a figure on the slide).
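The blow-up in the enumeration table above can be checked directly (math.comb requires Python 3.8+):

```python
import math

# Number of global alignments of two length-m sequences: C(2m, m) = (2m)! / (m!)^2
for m in (10, 20, 100):
    print(m, math.comb(2 * m, m))
# 10 -> 184756, 20 -> 137846528820 (about 1.4e11), 100 -> about 9.05e58
```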

5. Global Alignment: Needleman-Wunsch Algorithm
• To align two strings x, y, we construct a matrix F
  – F(i, j): the score of the best alignment between the initial segment x_1...i of x up to x_i and the initial segment y_1...j of y up to y_j
• We compute F recursively, starting from F(0, 0) = 0: each cell F(i, j) is reached from F(i−1, j−1) with score s(x_i, y_j), or from F(i−1, j) or F(i, j−1) with gap penalty −d

Dynamic Programming Formulation
s(x_i, y_j): similarity between x_i and y_j; d: gap penalty

  F(i, j) = max { F(i−1, j−1) + s(x_i, y_j),  F(i−1, j) − d,  F(i, j−1) − d }

Boundary conditions:
• The top row: F(i, 0) = −i·d; F(i, 0) represents alignments of a prefix of x to all gaps in y
• The left column: F(0, j) = −j·d

Traceback:
• We know how to compute the best score
  – It is the number at the bottom-right entry (i.e., F(n, m))
• But we need to remember where it came from
  – Keep a pointer to the choice we made at each step
• Retrace the path through the matrix
  – Need to remember all the pointers
• Time: O(m·n)

Local Alignment: Smith-Waterman Algorithm
• Global alignment: find the best match between sequences from one end to the other
• Local alignment: find the best match between subsequences of two sequences
  – Useful for comparing highly divergent sequences when only local similarity is expected
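A minimal implementation sketch of the Needleman-Wunsch recurrence and traceback above, assuming a caller-supplied similarity function s(a, b) and a linear gap penalty d; helper names are mine.

```python
def needleman_wunsch(x, y, s, d):
    """Global alignment of strings x and y.
    s(a, b): substitution score; d: linear gap penalty."""
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                 # first column: prefix of x aligned to gaps
        F[i][0], ptr[i][0] = -i * d, 'up'
    for j in range(1, m + 1):                 # first row: prefix of y aligned to gaps
        F[0][j], ptr[0][j] = -j * d, 'left'
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            choices = {'diag': F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                       'up':   F[i - 1][j] - d,
                       'left': F[i][j - 1] - d}
            ptr[i][j] = max(choices, key=choices.get)
            F[i][j] = choices[ptr[i][j]]
    # Traceback from the bottom-right entry F(n, m) using the stored pointers.
    ax, ay, i, j = [], [], n, m
    while i or j:
        move = ptr[i][j]
        if move == 'diag':
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == 'up':
            ax.append(x[i - 1]); ay.append('-'); i -= 1
        else:
            ax.append('-'); ay.append(y[j - 1]); j -= 1
    return F[n][m], ''.join(reversed(ax)), ''.join(reversed(ay))
```

Calling it with a simple score such as `lambda a, b: 1 if a == b else -1` and a small gap penalty aligns two short strings end to end.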

6. Dynamic Programming Formulation (Local Alignment)

  F(i, j) = max { 0,  F(i−1, j−1) + s(x_i, y_j),  F(i−1, j) − d,  F(i, j−1) − d }

Boundary conditions: F(i, 0) = F(0, j) = 0
Finding the best local alignment:
• Find the highest value of F(i, j), and start the traceback from there
• The traceback ends when a cell with value 0 is found

Local vs. Global Alignment
Similarity matrix s(x_i, y_j):

           H    E    A    G    A    W    G
     P    -2   -1   -1   -2   -1   -4   -2
     A    -2   -1    5    0    5   -3    0
     W    -3   -3   -3   -3   -3   15   -3

Global alignment matrix (gap penalty d = 8):

                H    E    A    G    A    W    G
           0   -8  -16  -24  -32  -40  -48  -56
     P    -8   -2   -9  -17  -25  -33  -42  -49
     A   -16  -10   -3   -4  -12  -20  -28  -36
     W   -24  -18  -11   -6   -7  -15   -5  -13

Local alignment matrix:

                H    E    A    G    A    W    G
           0    0    0    0    0    0    0    0
     P     0    0    0    0    0    0    0    0
     A     0    0    0    5    0    5    0    0
     W     0    0    0    0    2    0   20   12

Today: Unsupervised Vocabulary Induction
• Vocabulary Induction from Unsegmented Text
• Vocabulary Induction from Speech Signal
  – Sequence Alignment Algorithms

Finding Words in Speech
• Traditional approaches to speech recognition are supervised:
  – Recognizers are trained using a large corpus of speech with corresponding transcripts
  – During the training process, a recognizer is provided with a vocabulary
• Is it possible to learn the vocabulary directly from the speech signal?
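Returning to the alignment machinery for a moment: the local (Smith-Waterman) variant, which the speech pipeline below reuses, differs from the global sketch only in the 0 inside the max, the all-zero boundary, and a traceback that starts at the best cell and stops at a zero cell. Same assumptions as above (caller-supplied s and d, names mine).

```python
def smith_waterman(x, y, s, d):
    """Local alignment: cell values are clamped at 0, the boundary row and
    column are 0, and the traceback starts at the highest-scoring cell and
    stops at a cell with value 0."""
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_ij = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0.0,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
            if F[i][j] > best:
                best, best_ij = F[i][j], (i, j)
    # Traceback, re-deriving which choice produced each cell.
    ax, ay = [], []
    i, j = best_ij
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append('-'); i -= 1
        else:
            ax.append('-'); ay.append(y[j - 1]); j -= 1
    return best, ''.join(reversed(ax)), ''.join(reversed(ay))
```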

7. Vocabulary Induction: Outline

Spectral Vectors
• A spectral vector is a vector in which each component is a measure of energy in a particular frequency band
• We divide the acoustic signal (a one-dimensional waveform) into short overlapping intervals (25 msec with 15 msec overlap)
• We convert each overlapping window using the Fourier transform

Comparing Acoustic Signals

Example of Spectral Vectors
[Spectrograms (frequency in Hz vs. time in sec) of two utterances: "he too was diagnosed with paranoid schizophrenia" and "were willing to put nash's schizophrenia on record"]
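A rough sketch of the spectral-vector computation described above, assuming a mono waveform sampled at 16 kHz held in a NumPy array; 25 ms windows with 15 ms overlap correspond to a 10 ms hop. Real front ends typically add mel filterbanks and log compression on top of this.

```python
import numpy as np

def spectral_vectors(signal, sr=16000, win_ms=25, hop_ms=10):
    """Short-time Fourier magnitude spectra: one vector of frequency-band
    energies per 25 ms window, with windows starting every 10 ms
    (i.e. 15 ms of overlap between consecutive windows)."""
    win = int(sr * win_ms / 1000)      # samples per window
    hop = int(sr * hop_ms / 1000)      # samples between window starts
    frames = [signal[i:i + win] * np.hanning(win)
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))   # shape: (num_frames, win // 2 + 1)
```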

8. Comparing Spectral Vectors
• Divide the acoustic signal into "word segments" based on pauses
• Compute spectral vectors for each segment
• Build a distance matrix for each pair of "word segments"
  – use the Euclidean distance to compare spectral vectors

Computing Local Alignment

Example of Distance Matrix

Clustering Similar Utterances
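A hedged sketch of that pairwise distance matrix for two "word segments", each represented as a frames-by-bands array of spectral vectors (e.g. from the spectral_vectors sketch above); flipping the sign of the distances gives a similarity that can be fed to a local-alignment routine like the smith_waterman sketch.

```python
import numpy as np

def distance_matrix(seg_a, seg_b):
    """Euclidean distance between every spectral vector of segment A
    (shape: frames_a x bands) and every spectral vector of segment B
    (shape: frames_b x bands); result has shape (frames_a, frames_b)."""
    diff = seg_a[:, None, :] - seg_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# For alignment, turn distances into similarities, e.g. sim = offset - distance_matrix(a, b),
# so that acoustically close frame pairs get high (positive) scores.
```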

9. Examples of Computed Clusters
