Recovering dialect geography from an unaligned comparable corpus

  1. Recovering dialect geography from an unaligned comparable corpus (1 / 24)
     Yves Scherrer
     LATL, Department of Linguistics, University of Geneva, Switzerland
     LINGVIS & UNCLH Workshop, EACL 2012, Avignon

  2. Overview (2 / 24)
     1 Introduction
     2 The Archimob corpus
     3 Cognate and identical word pairs
     4 Recovering dialect geography
     5 Conclusion

  3. Introduction (3 / 24)
     1 Cognate identification
       Find cognate word pairs in texts from multiple dialects.
       Use these word pairs to determine dialect distance.
     2 Dialectometric analysis
       Use statistical and mathematical methods to discover the geographical distribution of dialect similarities.
     Typical data source: dialectological surveys
     Our data source: transcribed texts from multiple Swiss German dialects

  6. Introduction (4 / 24)
     Typical dialectological data:
     A data matrix lists the realizations of different linguistic phenomena (columns) at different inquiry points (rows).

                        hier   Leute   (sie) machen
     Flawil (SG)        doo    Lüüt    mached
     Horgen (ZH)        daa    Lüüt    mached
     Köniz (BE)         hie    Lüt     mache
     Niederwald (VS)    hie    Lit     machend

     All realizations of a given phenomenon can be retrieved and compared easily: they are in the same column.
     Our data set:
     A comparable multidialectal corpus: 16 Swiss German texts
     Unaligned: we don’t know which are the phenomena and their respective realizations
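To make the contrast with an unaligned corpus concrete, here is a minimal sketch of the aligned data matrix from this slide as a nested mapping. The cell values are the slide's example words; the names `DATA_MATRIX`, `column` and `share_identical` are hypothetical and only serve this illustration.

```python
# Rows are inquiry points, columns are phenomena (labelled by Standard German glosses).
# The nested-dict layout is an assumption made for this sketch.
DATA_MATRIX = {
    "Flawil (SG)":     {"hier": "doo", "Leute": "Lüüt", "(sie) machen": "mached"},
    "Horgen (ZH)":     {"hier": "daa", "Leute": "Lüüt", "(sie) machen": "mached"},
    "Köniz (BE)":      {"hier": "hie", "Leute": "Lüt",  "(sie) machen": "mache"},
    "Niederwald (VS)": {"hier": "hie", "Leute": "Lit",  "(sie) machen": "machend"},
}


def column(matrix, phenomenon):
    """Return every realization of one phenomenon, keyed by inquiry point."""
    return {place: row[phenomenon] for place, row in matrix.items()}


def share_identical(matrix, place_a, place_b):
    """Fraction of phenomena realized identically at two inquiry points."""
    row_a, row_b = matrix[place_a], matrix[place_b]
    same = sum(1 for phenomenon in row_a if row_a[phenomenon] == row_b[phenomenon])
    return same / len(row_a)
```

With aligned data, comparing two places is a column-by-column lookup. In the unaligned corpus no such columns exist, which is why the cognate pairs have to be found first.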

  8. The idea (5 / 24)
     Text in Dialect A (bag of words): gsìì, autersheim, dienscht, vom, het, vatter
     Text in Dialect B (bag of words): hät, altershaim, vom, gsii, vatter, schlosser
     1 Determine cognate word pairs (and discard non-cognate words)
     2 Partition cognate word pairs into identical and non-identical ones
     The proportion between identical and non-identical cognate pairs is used as a measure of dialect similarity.
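The two steps on this slide can be sketched as follows. This is not the author's method: the slide does not say how cognates are detected, so the sketch assumes a simple stand-in (normalized Levenshtein distance with an arbitrary threshold of 0.5), and it assumes that "proportion" means identical pairs divided by all cognate pairs. All function names are hypothetical.

```python
def levenshtein(a, b):
    """Plain edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word."""
    return levenshtein(a, b) / max(len(a), len(b), 1)


def cognate_pairs(bag_a, bag_b, threshold=0.5):
    """Step 1: pair each word of bag_a with its closest word in bag_b.

    ASSUMPTION: a pair counts as cognate when its normalized edit distance
    is at most `threshold`; words without such a partner are discarded.
    Pairing is one-sided and not forced to be one-to-one.
    """
    candidates = sorted(set(bag_b))
    if not candidates:
        return []
    pairs = []
    for word in sorted(set(bag_a)):
        best = min(candidates, key=lambda other: normalized_distance(word, other))
        if normalized_distance(word, best) <= threshold:
            pairs.append((word, best))
    return pairs


def dialect_similarity(bag_a, bag_b, threshold=0.5):
    """Step 2: share of identical pairs among all cognate pairs (0.0 if none)."""
    pairs = cognate_pairs(bag_a, bag_b, threshold)
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return identical / len(pairs)


# The two bags of words shown on the slide.
bag_a = ["gsìì", "autersheim", "dienscht", "vom", "het", "vatter"]
bag_b = ["hät", "altershaim", "vom", "gsii", "vatter", "schlosser"]
similarity = dialect_similarity(bag_a, bag_b)
```

Under these assumptions, dienscht finds no partner and is discarded, vom and vatter form identical pairs, and the remaining three pairs are non-identical cognates, so the similarity for the slide's example is 2 / 5. A different cognate criterion or threshold would change that value.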

  12. Questions (6 / 24)
      The proportion between identical and non-identical cognate pairs is used as a measure of dialect similarity.
        What is a cognate word pair?
        What is an identical word pair?
      Dialect similarity can then be computed between every text pair.
        How do these dialect similarity values compare with the geographical proximity of the dialects?
          Numerically: correlation with geographical distance
          Visually: cluster analysis, multidimensional scaling
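The numerical comparison can be sketched as a Pearson correlation between pairwise dialect similarity and pairwise geographical distance. The slides do not specify the correlation coefficient or the distance formula, so Pearson's r and the haversine great-circle distance are assumptions here, and the function names and input layout are hypothetical. The visual analyses (cluster analysis, multidimensional scaling) are not covered.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def similarity_vs_distance(coords, similarity):
    """Correlate similarity with distance over all unordered text pairs.

    coords:     {text_id: (lat, lon)}
    similarity: {(id_a, id_b): value} with id_a < id_b (hypothetical layout)
    """
    ids = sorted(coords)
    sims, dists = [], []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sims.append(similarity[(a, b)])
            dists.append(haversine_km(coords[a], coords[b]))
    return pearson(sims, dists)
```

If similarity decreases with distance, the coefficient is negative. Because the entries of a distance matrix are not independent of each other, a permutation procedure such as the Mantel test is the usual way to assess significance; the plain coefficient above does not do that.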

  14. Overview (7 / 24)
      1 Introduction
      2 The Archimob corpus
      3 Cognate and identical word pairs
      4 Recovering dialect geography
      5 Conclusion

  15. The Archimob corpus (8 / 24)
      Archimob is an oral history project about the Second World War period in Switzerland.
        555 interviews in all Swiss language regions
        16 Swiss German interviews transcribed (University of Zurich)
      Transcription:
        A single transcriber for all texts
        Dieth spelling guidelines
        6 500 to 16 700 words per interview

  16. The Archimob corpus (9 / 24)
      The geographic location of the 16 transcribed texts:
      [Map; text labels: BS1057, AG1147, AG1063, SG1198, BL1073, ZH1270, ZH1143, GL1207, LU1195, LU1261, BE1142, GL1048, SZ1209, NW1007, BE1170, VS1212]
