Improving Phonetic Alignment by Handling Secondary Sequence Structures
Johann-Mattis List
Institute for Romance Languages and Literature, Heinrich Heine University Düsseldorf
2012/08/10
Structure of the Talk
1 Historical Linguistics: Keys to the Past, Comparative Method, Sound Correspondences
2 Sequence Comparison: Sequences, Alignment Analyses, Alignment Modes
3 Secondary Alignment: Secondary Sequence Structures, Secondary Alignment Problem, Secondary Alignment Algorithm
4 Phonetic Alignment: SCA, Paradigmatic Aspects, Syntagmatic Aspects
5 Evaluation: Evaluation Measures, Gold Standard, Results
Historical Linguistics
Keys to the Past
Charles Lyell on Languages

The Geological Evidences of the Antiquity of Man, with Remarks on Theories of the Origin of Species by Variation. By Sir Charles Lyell. London: John Murray, Albemarle Street, 1863.

"If we knew nothing of the existence of Latin, if all historical documents previous to the fifteenth century had been lost, if tradition even was silent as to the former existence of a Roman empire, a mere comparison of the Italian, Spanish, Portuguese, French, Wallachian, and Rhaetian dialects would enable us to say that at some time there must have been a language, from which these six modern dialects derive their origin in common."
Historical Scenarios

German                  ʦ aː n -
* Proto-Germanic        t a n d
English                 t ʊː θ -
** Proto-Indo-European  d o n t
Italian                 d ɛ n t e
* Proto-Romance         d e n t
French                  d ɑ̃ - -

German                  ʦ aː n - -
* Proto-Germanic        t a n d
English                 t ʊː - θ -
** Proto-Indo-European  d o n t
Italian                 d ɛ n t e
* Proto-Romance         d e n t
French                  d ɑ̃ - - -

German                  ʦ aː n - -
Proto-Germanic          t a n θ -
English                 t ʊː - θ -
** Proto-Indo-European  d o n t
Italian                 d ɛ n t e
Proto-Romance           d e n t e
French                  d ɑ̃ - - -

German                  ʦ aː n -
Proto-Germanic          t a n θ -
English                 t ʊː - θ
** Proto-Indo-European  d o n t
Italian                 d ɛ n t e
Proto-Romance           d e n t e
French                  d ɑ̃ - -

German                  ʦ aː n -
Proto-Germanic          t a n θ -
English                 t ʊː - θ
Proto-Indo-European     d e n t -
Italian                 d ɛ n t ə
Proto-Romance           d e n t e
French                  d ɑ̃ - -

German                  ʦ aː n -
* Proto-Germanic        t a n d
English                 t ʊː - θ
Proto-Indo-European     d e n t
Italian                 d ɛ n t ə
* Proto-Romance         d e n t
French                  d ɑ̃ - -

German                  ʦ aː n
Proto-Germanic          t a n θ
English                 t ʊː θ
Proto-Indo-European     d e n t
Italian                 d ɛ n t e
Proto-Romance           d e n t e
French                  d ɑ̃
The Comparative Method
1. Compile an initial list of putative cognate sets.
2. Extract an initial list of putative sets of sound correspondences from the initial cognate list.
3. Refine the cognate list and the correspondence list by adding and deleting cognate sets from the cognate list, depending on whether they are consistent with the correspondence list or not, and adding and deleting correspondence sets from the correspondence list, depending on whether they are consistent with the cognate list or not.
4. Finish when the results are satisfying enough.
Sound Correspondences

Sequence similarity is determined on the basis of systematic sound correspondences, as opposed to similarity based on surface resemblances of phonetic segments. Lass (1997) calls this notion of similarity genotypic, as opposed to a phenotypic notion of similarity. The most crucial aspect of correspondence-based similarity is that it is language-specific: genotypic similarity is never defined in general terms but always with respect to the language systems being compared.

German          Dutch          English         Meaning
Zahn [ʦaːn]     tand [tɑnt]    tooth [tʊːθ]    “tooth”
zehn [ʦeːn]     tien [tiːn]    ten [tɛn]       “ten”
Zunge [ʦʊŋə]    tong [tɔŋ]     tongue [tʌŋ]    “tongue”

Shanghai        Beijing        Guangzhou       Meaning
[ʨiɤ³⁵]         [ʨiou²¹⁴]      [kɐu³⁵]         “nine”
[ʨiŋ⁵⁵ʦɔ²¹]     [ʨiɚ⁵⁵]        [kɐm⁵³jɐt²]     “today”
[koŋ⁵⁵ʨi²¹]     [kuŋ⁵⁵ʨi⁵⁵]    [kɐi⁵⁵koŋ⁵⁵]    “rooster”
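As a rough illustration of what "systematic" means here, the following sketch counts how often the initial segments of cognate words correspond across two languages, using the German, Dutch and English forms from the table. The segmentation into sounds and the names `COGNATES` and `initial_correspondences` are assumptions made for this example; they are not taken from the talk.

```python
from collections import Counter

# Cognate sets from the Germanic table, hand-segmented into sounds
# (the segmentation is an assumption made for this illustration).
COGNATES = {
    "tooth": {
        "German": ["ʦ", "aː", "n"],
        "Dutch": ["t", "ɑ", "n", "t"],
        "English": ["t", "ʊː", "θ"],
    },
    "ten": {
        "German": ["ʦ", "eː", "n"],
        "Dutch": ["t", "iː", "n"],
        "English": ["t", "ɛ", "n"],
    },
    "tongue": {
        "German": ["ʦ", "ʊ", "ŋ", "ə"],
        "Dutch": ["t", "ɔ", "ŋ"],
        "English": ["t", "ʌ", "ŋ"],
    },
}


def initial_correspondences(cognates, lang_a, lang_b):
    """Count pairs of word-initial segments across all cognate sets."""
    counts = Counter()
    for forms in cognates.values():
        counts[(forms[lang_a][0], forms[lang_b][0])] += 1
    return counts


if __name__ == "__main__":
    print(initial_correspondences(COGNATES, "German", "English"))
    print(initial_correspondences(COGNATES, "Dutch", "English"))
```

German [ʦ] and English [t] differ on the surface, yet the pair recurs in every cognate set of the table, which is the kind of regularity that correspondence-based similarity rests on.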
Sequence Comparison
Sequences

Definition 1: Given an alphabet (a non-empty finite set, whose elements are called characters), a sequence is an ordered list of characters drawn from the alphabet. The elements of sequences are called segments. (cf. Böckenhauer & Bongartz 2003: 30f)
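One practical consequence of the definition is that the same written string yields different sequences depending on the alphabet. The sketch below splits a string into segments by greedy longest match against a given alphabet. The function name `segmentize` and the greedy strategy are assumptions chosen for illustration, not a procedure described in the talk.

```python
def segmentize(text, alphabet):
    """Split text into segments by greedy longest match against alphabet.

    alphabet is a non-empty finite set of strings (the "characters" of
    Definition 1); a character may span several Unicode code points.
    """
    if not alphabet:
        raise ValueError("the alphabet must be non-empty")
    longest = max(len(char) for char in alphabet)
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in alphabet:
                segments.append(candidate)
                i += size
                break
        else:
            raise ValueError(f"no character of the alphabet matches at position {i}")
    return segments


if __name__ == "__main__":
    # With a long vowel in the alphabet, "ʦaːn" has three segments.
    print(segmentize("ʦaːn", {"ʦ", "a", "aː", "n"}))
    # With the length mark as its own character, it has four.
    print(segmentize("ʦaːn", {"ʦ", "a", "ː", "n"}))
```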
Sequences

Baked Rabbit
1 rabbit
1 1/2 tsp. salt
1/8 tsp. pepper
1 1/2 c. onion slices
• Rub salt and pepper on rabbit pieces.
• Place on large sheet of aluminium foil.
• Place onion slices on rabbit.
• Bake at 350 degrees.
• Eat when done and tender.
Alignment Analyses

Definition 2: An alignment of two sequences s and t is a two-row matrix in which both sequences are arranged in such a way that all matching and mismatching segments occur in the same column, while empty cells, resulting from empty matches, are filled with gap symbols. (cf. Kruskal 1983)
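To make the definition concrete, the following sketch treats an alignment as two equally long rows and labels each column as a match, a mismatch, or a gap (an empty match). The rows used in the example are one of the German/English alignments of the "tooth" forms shown on the Historical Scenarios slide. The helper name `classify_columns` and the label strings are assumptions introduced for illustration.

```python
GAP = "-"


def classify_columns(row_s, row_t):
    """Label every column of a two-row alignment matrix."""
    if len(row_s) != len(row_t):
        raise ValueError("both rows of an alignment must have the same length")
    labels = []
    for seg_s, seg_t in zip(row_s, row_t):
        if seg_s == GAP and seg_t == GAP:
            raise ValueError("a column may not consist of gap symbols only")
        if GAP in (seg_s, seg_t):
            labels.append("gap")        # empty match
        elif seg_s == seg_t:
            labels.append("match")
        else:
            labels.append("mismatch")
    return labels


if __name__ == "__main__":
    german = ["ʦ", "aː", "n", "-"]
    english = ["t", "ʊː", "-", "θ"]
    for column in zip(german, english, classify_columns(german, english)):
        print(*column, sep="\t")
```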
Global Alignment

Mode      Alignment
global    G R E E N C A T F I S H H U N T E R
          A F A T C A T - - - - H U N T E R

Global alignment analyses are the most basic way to compare sequences. The traditional Needleman-Wunsch algorithm (Needleman and Wunsch 1970) conducts global alignment analyses, and the Levenshtein distance (edit distance, Levenshtein 1965) is defined for global alignments.
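A minimal sketch of global alignment by dynamic programming is given below. It uses unit costs for substitutions and gaps, so the score it minimizes is the Levenshtein distance; the original Needleman-Wunsch formulation maximizes a similarity score instead, and the scoring used in the talk is not specified here. The function name `global_align` and the tie-breaking order in the traceback are assumptions.

```python
def global_align(s, t, gap="-"):
    """Globally align two sequences with unit edit costs.

    Returns (distance, row_s, row_t), where the two rows form an
    alignment matrix as in Definition 2.
    """
    n, m = len(s), len(t)
    # dist[i][j] holds the edit distance between s[:i] and t[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = dist[i - 1][j - 1] + (s[i - 1] != t[j - 1])
            dist[i][j] = min(substitution, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_s, row_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            row_s.append(s[i - 1])
            row_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            row_s.append(s[i - 1])
            row_t.append(gap)
            i -= 1
        else:
            row_s.append(gap)
            row_t.append(t[j - 1])
            j -= 1
    return dist[n][m], row_s[::-1], row_t[::-1]


if __name__ == "__main__":
    distance, row_s, row_t = global_align(list("CATFISH"), list("CAT"))
    print(distance)
    print(" ".join(row_s))
    print(" ".join(row_t))
```

Because every segment of both sequences must appear in the result, the two rows always have the same length and, with gaps removed, reproduce the input sequences in full. This is what distinguishes the global mode from modes that align only parts of the sequences.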