Introduction to Dialectometry
Wilbert Heeringa
Språkbanken, University of Gothenburg
30 January 2019
Introduction
What is dialectometry?
• ‘The measure of dialect’ (Jean Séguy).
• It measures the degree of difference or similarity between dialects.
• This reveals patterns in the dialect landscape.
Why dialectometry?
• For the record of cultural history: to reveal migrations, contacts with other peoples, and internal cultural divisions.
• It may be of use to language learners, publishers, broadcasters, educators and language planners.
Isogloss method
• The primary tool of traditional dialectology has been the isogloss.
• Greek isos means ‘equal’; Greek glōssa means ‘language’.
Nucleus in ripe: [rip(ə)] (west), [rɛˑp] (central), [rip(ə)] (east)
Coda in cold: [kɔˑut] (west), [kɔˑlt] (east)
Nucleus in ripe & coda in cold
Isogloss method
Overlay the isogloss maps of 14 phenomena:
 1 [ʋɛrk] vs. [ʋɛrək]
 2 [splɪntər] vs. [splɪntəʀ]
 3 [kni] vs. [kneː] vs. [knɛːi] vs. [knɪbəl]
 4 [ziˑn] vs. [əziˑn] vs. [ɣəziˑn] vs. [jəziˑn]
 5 [steˑn̩] vs. [steˑnə] vs. [stɪˑəs]
 6 [meːstər] vs. [miˑəstər] vs. [mɛˑstər]
 7 [rip] vs. [rɛˑip]
 8 [zɛs] vs. [sɛs] vs. [sɛz]
 9 [kɔˑut] vs. [kɔˑlt]
10 [roːzn̩] vs. [roːzən] vs. [roːzə]
11 [lɑdər] vs. [liˑər(ə)]
12 [bruːr] vs. [brœːijər] vs. [bruˑrə]
13 [brʏx] vs. [brʏɣ(ə)] vs. [brʏg]
14 [blɔˑw] vs. [blɑːt]
Isoglosses of 14 phenomena. Isogloss bundles represent dialect boundaries.
Isogloss method
• Dialect borders are hard to decide on, except by selecting isoglosses that coincide.
Dialectometry
We need a methodology that:
• is purely linguistic;
• includes all linguistic levels;
• uses a representative data set of contemporary spoken dialect;
• includes all data without making subjective selections;
• utilizes the data maximally;
• allows comparisons regardless of whether varieties are geographically close or not;
• produces results that are unambiguous.
Use dialectometry?
Relative difference value
• The term ‘dialectometry’ was coined by Jean Séguy.
• He was director of the Atlas linguistique de la Gascogne.
• He was assisted and inspired by Henri Guiter.
• Dialect distance: the number of items on which two dialects differ, expressed as a percentage.
Relative difference value
• Example: calculate the lexical relative difference value between Middelstum and Ommen on the basis of six items:
    Item     Middelstum  Ommen      Difference
    friend   kɑmərʊˑt    kɑmərɔːt   0
    ship     sxɪp        sxɪp       0
    far      vɛːr        ʋit        1
    are      bɪn         bɪnt       0
    still    nɔx         nɔx        0
    push     støˑtn̩      drʏkŋ̍      1
    total                           2
• Distance: 2/6 = 0.33. Percentage: 33%.
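The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `relative_difference` and the lexeme labels (`friend_a`, `far_b`, and so on) are hypothetical stand-ins that encode which items share a lexeme, not labels from the slides.

```python
def relative_difference(lexemes_a, lexemes_b):
    """Share of items for which two dialects use different lexemes."""
    if len(lexemes_a) != len(lexemes_b):
        raise ValueError("both dialects need the same list of items")
    differing = sum(a != b for a, b in zip(lexemes_a, lexemes_b))
    return differing / len(lexemes_a)


# Hypothetical lexeme labels: equal labels mean "same lexeme" for that item.
middelstum = ["friend_a", "ship_a", "far_a", "are_a", "still_a", "push_a"]
ommen = ["friend_a", "ship_a", "far_b", "are_a", "still_a", "push_b"]

distance = relative_difference(middelstum, ommen)
print(f"{distance:.2f} ({distance:.0%})")
```

Because every item counts as either 0 or 1, the result only reflects how many lexemes differ, not how strongly the pronunciations differ.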
Relative difference value
• We call this the ‘relative difference value’.
• Can be used for all linguistic levels.
• No gradual distances between items.
• Goebl (1982 and later) measured dialect similarity and called this Relative Identity Value (RIV).
Weighted difference value
• Goebl (1984) introduced the Weighted Identity Value (WIV).
• Basic idea: similarity in rare lexemes contributes more strongly to the overall similarity between two local dialects than similarity in common lexemes.
• Since we focus on distances rather than on similarity, we present the ‘weighted difference value’.
Weighted difference value
• Example: in a set of 360 dialects we find the following lexemes for schip ‘ship’: schip (353), boot (2), lager (1), schuit (4). In terms of distances:
    schip vs. schip:   353/360 = 0.981
    schuit vs. schuit: 4/360 = 0.011
    boot vs. boot:     2/360 = 0.006
• The distance between different lexemes (for example schip versus boot) is always 1.
Weighted difference value
• Example: calculate the lexical weighted difference value between Middelstum and Ommen on the basis of six items:
    Item     Middelstum  Ommen      Frequency  Distance
    friend   kɑmərʊˑt    kɑmərɔːt   140/354    0.40
    ship     sxɪp        sxɪp       353/360    0.98
    far      vɛːr        ʋit                   1
    are      bɪn         bɪnt       176/360    0.49
    still    nɔx         nɔx        354/355    1.00
    push     støˑtn̩      drʏkŋ̍                 1
    total                                      4.87
• Distance: 4.87/6 = 0.81. Percentage: 81%.
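A minimal sketch of the weighted difference value, assuming the per-item distance is the relative frequency of a shared lexeme and 1 for different lexemes, as in the example. The function names are hypothetical; the counts are the ones given on the slides.

```python
from collections import Counter


def weighted_item_distance(lexeme_a, lexeme_b, all_lexemes):
    """1 for different lexemes, otherwise the shared lexeme's relative frequency."""
    if lexeme_a != lexeme_b:
        return 1.0
    counts = Counter(all_lexemes)
    return counts[lexeme_a] / len(all_lexemes)


def weighted_difference(item_distances):
    """Mean of the per-item weighted distances."""
    return sum(item_distances) / len(item_distances)


# Lexemes for 'ship' across the 360 dialects, using the counts from the example.
ship = ["schip"] * 353 + ["boot"] * 2 + ["lager"] * 1 + ["schuit"] * 4
print(round(weighted_item_distance("schip", "schip", ship), 3))
print(round(weighted_item_distance("schuit", "schuit", ship), 3))
print(weighted_item_distance("schip", "boot", ship))

# Per-item distances for Middelstum vs. Ommen, taken from the table.
items = [140 / 354, 353 / 360, 1.0, 176 / 360, 354 / 355, 1.0]
print(round(weighted_difference(items), 2))
```

A shared rare lexeme such as schuit yields a distance close to 0, while a shared common lexeme such as schip yields a distance close to 1, so agreement on rare lexemes pulls two dialects together much more strongly.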
Levenshtein distance
[Map: pronunciations of ‘milk’ in Groningen, Grouw, Haarlem, Almelo, Polsbroek, Renesse, Venray, Mechelen, Alveringem and Kerkrade]
How to quantify differences between the dialect pronunciations?
Levenshtein distance
• Levenshtein distance was introduced in dialectology by Brett Kessler.
• In 1995 he measured linguistic distances between Irish Gaelic dialects.
• Later it was applied by others to Dutch, Sardinian, Norwegian, American English, German, Bulgarian and Bantu dialect/language varieties.
• Calculate the cost of changing one string into another.
Levenshtein distance
• Example: milk may be pronounced as [mɛlək] in the dialect of Haarlem and as [mɔlkə] in the dialect of Grouw.
• Change the first pronunciation into the other:
    mɛlək   substitute ɛ/ɔ   1
    mɔlək   delete ə         1
    mɔlk    insert ə         1
    mɔlkə
    total                    3
• Many sequences of operations map [mɛlək] → [mɔlkə]. Levenshtein distance = cost of the cheapest mapping.
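The cheapest mapping is usually found with dynamic programming. The following is a minimal sketch with binary weights (every insertion, deletion and substitution costs 1); the function name `levenshtein` is only illustrative, and each character of the transcription is treated as one segment.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions and substitutions (cost 1 each)."""
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,             # delete s
                current[j - 1] + 1,          # insert t
                previous[j - 1] + (s != t),  # match (0) or substitution (1)
            ))
        previous = current
    return previous[-1]


print(levenshtein("mɛlək", "mɔlkə"))  # 3
```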
Levenshtein distance
• Alignment:
    1  2  3  4  5  6
    m  ɛ  l  ə  k
    m  ɔ  l     k  ə
       1     1     1
• We keep track of the alignment length.
• If multiple alignments all have the minimum cost, we calculate the length of the longest alignment.
• The longest alignment has the greatest number of matches and is linguistically most plausible.
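One way to keep track of the alignment length is to store a (cost, length) pair in every cell of the dynamic-programming table and to break ties in favour of the longer alignment. The sketch below assumes binary weights; the function name `levenshtein_with_length` is hypothetical.

```python
def levenshtein_with_length(source, target):
    """Return (cost, length) of the cheapest alignment; ties prefer the longest."""
    rows, cols = len(source) + 1, len(target) + 1
    # Each cell holds (cost, -length): min() then picks the lowest cost and,
    # among equal costs, the longest alignment.
    table = [[(0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, -i)
    for j in range(1, cols):
        table[0][j] = (j, -j)
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = int(source[i - 1] != target[j - 1])
            steps = (
                (table[i - 1][j], 1),                 # deletion
                (table[i][j - 1], 1),                 # insertion
                (table[i - 1][j - 1], substitution),  # match or substitution
            )
            table[i][j] = min(
                (cost + step, neg_length - 1) for (cost, neg_length), step in steps
            )
    cost, neg_length = table[-1][-1]
    return cost, -neg_length


cost, length = levenshtein_with_length("mɛlək", "mɔlkə")
print(cost, length, cost / length)  # 3 6 0.5
```

Dividing the cost by the alignment length gives a distance that does not grow simply because the words are long.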
Alignment
• In a linguistic alignment we ensure that the minimum cost is based on an alignment in which:
  ◦ a vowel matches with a vowel
  ◦ a consonant matches with a consonant
  ◦ the [j] or [w] matches with a vowel
  ◦ the [i] or [u] matches with a consonant
  ◦ the schwa matches with a sonorant
• A pair of pronunciations compared with Levenshtein distance preferably consists of cognates, as in all of the examples.
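These constraints can be approximated by making disallowed substitutions infinitely expensive, so that the algorithm falls back on a deletion plus an insertion. The sketch below is one possible reading of the rules; the segment inventories `VOWELS` and `SONORANTS` are assumed and far from exhaustive, and all function names are hypothetical.

```python
VOWELS = set("aeiouyɑɛɪɔʊʏøœə")  # assumed inventory, not exhaustive
SONORANTS = set("mnŋlrjw")       # assumed inventory, not exhaustive
FORBIDDEN = float("inf")


def may_align(a, b):
    """True if segments a and b may be matched under the rules above."""
    a_is_vowel, b_is_vowel = a in VOWELS, b in VOWELS
    if a_is_vowel == b_is_vowel:
        return True  # vowel with vowel, or consonant with consonant
    vowel, consonant = (a, b) if a_is_vowel else (b, a)
    if consonant in ("j", "w"):
        return True  # [j] or [w] with a vowel
    if vowel in ("i", "u"):
        return True  # [i] or [u] with a consonant
    return vowel == "ə" and consonant in SONORANTS  # schwa with a sonorant


def substitution_cost(a, b):
    if a == b:
        return 0
    return 1 if may_align(a, b) else FORBIDDEN


def constrained_levenshtein(source, target):
    """Levenshtein distance in which disallowed substitutions are never used."""
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                         # deletion
                current[j - 1] + 1,                      # insertion
                previous[j - 1] + substitution_cost(s, t),
            ))
        previous = current
    return previous[-1]


print(constrained_levenshtein("a", "k"))  # 2: delete [a], insert [k]
```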
Levenshtein distance
• Variation among dialects is usually not measured on the basis of a single word, but on a set of words.
• Assume that for two dialects we calculate the Levenshtein distance for n word pairs.
• How do we combine them into one distance, i.e. how do we calculate the aggregated distance?
Calculating the aggregate
• Example: calculate the distance in the sound components between Middelstum and Ommen on the basis of six words:
    Item     Middelstum  Ommen     Sum of weights  Alignment length
    ship     sxɪp        sxɪp      0               4
    cap      pɛt         pɛtə      1               4
    called   rɔupm       ərupm     2               6
    jump     sprɪŋ       sprɪŋkt   2               7
    cellar   kɛlər       kɛldər    1               6
    house    hus         hys       1               3
    total                          7               30
• The ‘raw distance’ is 7/6 = 1.17; the normalized distance is 7/30 = 0.233 = 23.3%.
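The two aggregates can be checked with a short sketch that takes the sum of weights and the alignment length per word from the table. The variable names are illustrative only.

```python
# (sum of weights, alignment length) per word, copied from the table.
word_pairs = {
    "ship": (0, 4),
    "cap": (1, 4),
    "called": (2, 6),
    "jump": (2, 7),
    "cellar": (1, 6),
    "house": (1, 3),
}

total_cost = sum(cost for cost, _ in word_pairs.values())
total_length = sum(length for _, length in word_pairs.values())

raw_distance = total_cost / len(word_pairs)    # mean cost per word
normalized_distance = total_cost / total_length  # cost per alignment slot

print(total_cost, total_length)
print(round(raw_distance, 2), round(normalized_distance, 3))
```

The normalized version divides by the summed alignment lengths rather than by the number of words, so long words do not dominate the aggregate.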
Operation weights
• In the examples above we used binary weights:
  ◦ the weight is 0 (match of two sounds) or 1 (substitution of one sound by another);
  ◦ when a sound is inserted or deleted, the weight is also 1.
• Refinement: use gradual PMI distances as operation weights.
PMI-based Levenshtein distance
• Introduced in dialectology by Martijn Wieling, Jelena Prokić and John Nerbonne in 2009.
• Pointwise Mutual Information (PMI) assesses the degree of dependence between aligned segments.
Procedure:
  repeat
  ◦ compare each dialect with every other dialect using Levenshtein distance (the first time with binary weights, later with the newly calculated weights);
  ◦ find new weights by analyzing the alignments: the more frequently segments co-occur in an alignment, the smaller the distance weight;
  until the weights no longer change.
• Alignments made by PMI Levenshtein are better; see Wieling, Prokić and Nerbonne (2009).
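A single weight-update step of this procedure might look like the sketch below. It assumes the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) and then rescales the values linearly to [0, 1] so that a high PMI becomes a small weight. That rescaling, the function name `pmi_weights` and the toy data are assumptions for illustration, not the exact scheme of Wieling, Prokić and Nerbonne (2009).

```python
import math
from collections import Counter


def pmi_weights(aligned_pairs):
    """Turn aligned segment pairs into weights in [0, 1] (one update step).

    aligned_pairs: list of (x, y) tuples collected from Levenshtein alignments.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in aligned_pairs)
    segment_counts = Counter(seg for pair in aligned_pairs for seg in pair)
    n_pairs = sum(pair_counts.values())
    n_segments = sum(segment_counts.values())

    pmi = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / n_pairs
        p_x = segment_counts[x] / n_segments
        p_y = segment_counts[y] / n_segments
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))

    highest, lowest = max(pmi.values()), min(pmi.values())
    span = (highest - lowest) or 1.0
    # Assumed rescaling: frequent co-occurrence (high PMI) gives a small weight.
    return {pair: (highest - value) / span for pair, value in pmi.items()}


# Toy alignment counts, invented purely to exercise the function.
toy_pairs = [("a", "o")] * 6 + [("a", "k")] * 1 + [("k", "g")] * 5
weights = pmi_weights(toy_pairs)
for pair, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(pair, round(weight, 3))
```

In the full procedure these weights would replace the binary substitution costs, the dialects would be realigned, and the step would be repeated until the weights stop changing.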
Application
• Reeks Nederlandse Dialectatlassen, compiled by E. Blancquaert and W. Pée.
• Texts from 1922–1975, 1956 local dialects, 139 sentences each.
• We selected 361 dialects and 125 words.
Distribution of the 361 dialects in the Dutch dialect area.
Beam maps
• Introduced by Goebl (±1983).
• Distances between dialects are represented by lines between local dialects on a map.
• Each local dialect is connected by a straight line with every other dialect.
• Darker lines represent smaller distances, lighter lines represent larger distances.
Beam maps: lexical relative difference values (left), lexical weighted difference values (middle) and pronunciation Levenshtein distances (right).
Honeycomb maps
• In use since Haag (1898), and ‘reintroduced’ by Goebl (±1983).
• They show distances between geographically neighboring dialects.
• Related dialects are separated by lighter lines, and more remote dialects are separated by darker lines.
• Cartographic inversion of beam maps.
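The shading of both map types can be thought of as one linear mapping from distance to grey value, inverted for honeycomb maps. The sketch below is only an illustration of that idea; the function name `line_grey` and the linear scaling are assumptions, not the procedure used to draw the maps.

```python
def line_grey(distance, smallest, largest, darker_means_closer=True):
    """Return a grey value in [0, 1], where 0 is black and 1 is white."""
    if largest == smallest:
        scaled = 0.0
    else:
        scaled = (distance - smallest) / (largest - smallest)
    # Beam maps: small distance -> dark line.
    # Honeycomb maps: large distance -> dark line.
    return scaled if darker_means_closer else 1.0 - scaled


print(line_grey(0.2, 0.2, 0.8))                             # beam map, closest pair
print(line_grey(0.8, 0.2, 0.8, darker_means_closer=False))  # honeycomb, most remote pair
```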
Honeycomb maps: lexical relative difference values (left), lexical weighted difference values (middle) and pronunciation Levenshtein distances (right).