Informativeness: A review of work by Regier and colleagues (and a response)
Jon W. Carr
Centre for Language Evolution
School of Philosophy, Psychology and Language Sciences
University of Edinburgh
What shapes language?
[Diagram: Language is shaped by Learning, associated with compressibility and simplicity, and by Communication, associated with expressivity and informativeness.]
How do learning and communication shape the structure of semantic categories?
How do learning and communication shape the structure of semantic categories? Learning exerts a pressure for simplicity; communication exerts a pressure for informativeness.
Kinship terms are simple and informative (Kemp & Regier, 2012)
[Plot: kinship systems, including English and Northern Paiute, placed on axes of complexity (0 to 100; lower = simpler) and communicative cost (0 to 4; lower = more informative).]
Learning and communication in the CLE framework
[Plot: the same simplicity and informativeness axes, showing languages produced in iterated-learning experiments.]
Learning only (Kirby, Cornish, & Smith, 2008): languages collapse onto a few labels reused across many meanings (e.g. "tuge", "tupim", "miniku", "poi"); they become highly simple but uninformative.
Communication only (Kirby, Tamariz, Cornish, & Smith, 2015): languages keep a distinct holistic word for each meaning (e.g. "pihino", "nemone", "piga", "kawake"); they stay informative but are not simple.
Learning and communication together (Kirby, Tamariz, Cornish, & Smith, 2015): languages become compositional (e.g. "egewawu", "egewawa", "egewuwu", "megawawa", "gamenewuwu"), achieving both simplicity and informativeness.
Summary

Pressure from learning
  CLE: Compressibility. To what extent can the language be compressed? Measure: MDL, gzip, entropy.
  Regier: Simplicity. How many words does an individual need to remember? Measure: number of words, number of rules.

Pressure from communication
  CLE: Expressivity. How many meaning distinctions does the language allow? Measure: number of words.
  Regier: Informativeness. How effectively can a meaning be transmitted? Measure: communicative cost.
Summary

Pressure from learning: Compressibility. To what extent can the language be compressed? Measure: MDL, gzip, entropy (bits required to represent the language).

Pressure from communication: Informativeness. How effectively can a meaning be transmitted? Measure: communicative cost (bits lost during communication).
Communicative cost
Communicative cost: High-level overview
Communicative cost: Low-level details

To compute the cost of a category partition, we start by considering an individual target meaning t and ask how much error would be incurred in trying to reconstruct that target.

Reconstruction error is defined as the Kullback-Leibler divergence between the speaker distribution s and the listener distribution l:

$$D_{KL}(s \,\|\, l) = \sum_{i \in U} s(i) \log_2 \frac{s(i)}{l(i)} = \log_2 \frac{1}{l(t)}$$

(The simplification holds because the speaker distribution places all of its probability mass on the target t.)

Summing the divergences over all targets, weighted by their need probabilities p(t), yields the communicative cost of the partition:

$$k = \sum_{t \in U} p(t)\, D_{KL}(s \,\|\, l) = \sum_{t \in U} p(t) \log_2 \frac{1}{l(t)}$$
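As a concrete illustration of the formula above, here is a minimal Python sketch of the cost calculation. It is hypothetical code, not taken from the slides: the names `need_probs` and `listener_prob` are my own, and it assumes the speaker distribution is a point mass on the target, so each KL term collapses to log2(1/l(t)).

```python
import math

def communicative_cost(need_probs, listener_prob):
    """Expected reconstruction error, in bits, over all target meanings.

    need_probs   : dict mapping each meaning t to its need probability p(t)
    listener_prob: function returning l(t), the probability the listener
                   assigns to the target t after hearing the speaker's signal
    """
    # Because the speaker distribution is a point mass on t, D_KL(s || l)
    # collapses to log2(1 / l(t)).
    return sum(p * math.log2(1.0 / listener_prob(t)) for t, p in need_probs.items())
```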
Communicative cost: Example of a discrete categorizer

Universe: U = {i_1, i_2, ..., i_16}, with uniform need probabilities p = [1/16, ..., 1/16].
Category partition: P = {C_1, C_2, C_3, C_4} = {{i_1, ..., i_4}, {i_5, ..., i_8}, {i_9, ..., i_12}, {i_13, ..., i_16}}.
Speaker's lexicon S: each category maps to a distinct signal; listener's lexicon L: each signal maps back to its category.
Speaker distributions (one per meaning): a point mass on the target, e.g. s_1 = [1, 0, ..., 0], s_2 = [0, 1, 0, ..., 0], ..., s_16 = [0, ..., 0, 1].
Listener distributions (one per category): uniform over the category's members, e.g. l_{C_1} = [1/4, 1/4, 1/4, 1/4, 0, ..., 0], and likewise for C_2, C_3, C_4.

$$k = \sum_{t \in U} p(t) \log_2 \frac{1}{l(t)} = \sum_{t \in U} \frac{1}{16} \log_2 \frac{1}{1/4} = 16 \left( \frac{1}{16} \log_2 4 \right) = 2 \text{ bits}$$

Why 2 bits? An ideal system would give every one of the 16 meanings its own 4-bit signal (0000, 0001, ..., 1111). The actual system uses only four 2-bit signals (00, 01, 10, 11), one per category, because the pressure from learning prefers a more compressed system. The result is a loss of information on every communicative episode: 4 bits - 2 bits = 2 bits.
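To check the arithmetic, here is a self-contained sketch of the discrete example (my own illustration, not the authors' code), with 16 equiprobable meanings and four categories of four:

```python
import math

U = list(range(16))                                        # universe of 16 meanings
partition = [set(range(i, i + 4)) for i in (0, 4, 8, 12)]  # four categories of four
p = {t: 1 / 16 for t in U}                                 # uniform need probabilities

def listener_prob(t):
    # A discrete categorizer spreads probability uniformly over the named category.
    category = next(C for C in partition if t in C)
    return 1 / len(category)                               # l(t) = 1/4

cost = sum(p[t] * math.log2(1 / listener_prob(t)) for t in U)
print(cost)  # 2.0 bits, i.e. log2(4)
```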
Communicative cost: Listener distributions

Humans aren't discrete categorizers; in human cognition we see two effects: (a) within-category prototypicality and (b) across-category fuzziness. Instead, the listener distributions can be modelled as Gaussians over the category's members:

$$l_C(i) \propto \sum_{j \in C} e^{-\gamma d(i,j)}$$

where d(i, j) is the distance between meanings i and j, and γ allows you to model various types of categorizer, from a discrete categorizer (large γ) through a fuzzy categorizer to a non-categorizer (γ = 0).

[Figure: listener distributions over the 16 meanings for a discrete categorizer, a fuzzy categorizer, and a non-categorizer.]
Communicative cost: Example of a fuzzy categorizer

Same universe U = {i_1, ..., i_16}, partition P = {C_1, C_2, C_3, C_4}, lexicons, uniform need probabilities, and point-mass speaker distributions as before, but the listener distributions are now fuzzy, spreading some probability over every meaning:

l_{C_1} = [.079, .082, .082, .079, .071, .064, .058, .053, .048, .045, .045, .048, .053, .058, .064, .071]
l_{C_2} = [.053, .058, .064, .071, .079, .082, .082, .079, .071, .064, .058, .053, .048, .045, .045, .048]
l_{C_3} = [.048, .045, .045, .048, .053, .058, .064, .071, .079, .082, .082, .079, .071, .064, .058, .053]
l_{C_4} = [.071, .064, .058, .053, .048, .045, .045, .048, .053, .058, .064, .071, .079, .082, .082, .079]

$$k = \sum_{t \in U} p(t) \log_2 \frac{1}{l(t)} = 3.636 \text{ bits}$$
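The values above can be reproduced with a similarity-based sketch like the following. The circular distance metric and the setting gamma = 0.1 are my assumptions, chosen because they reproduce the listener distributions and the 3.636-bit cost shown above; they are not stated in the slides.

```python
import math

U = list(range(16))
partition = [set(range(i, i + 4)) for i in (0, 4, 8, 12)]
p = {t: 1 / 16 for t in U}
gamma = 0.1  # assumed fuzziness parameter

def distance(i, j, n=16):
    # Assumed circular distance between meanings i and j.
    return min(abs(i - j), n - abs(i - j))

def listener_distribution(C):
    # Probability of each meaning decays with its distance from the category's members.
    weights = [sum(math.exp(-gamma * distance(i, j)) for j in C) for i in U]
    total = sum(weights)
    return [w / total for w in weights]

cost = sum(p[t] * math.log2(1 / listener_distribution(C)[t])
           for C in partition for t in C)
print(round(cost, 3))  # 3.636 bits
```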
Communicative cost: Six predictions

Expressivity: A system of many categories is more informative than a system of few categories.
Balanced categories: A system of equally sized categories is more informative than a system of unequally sized categories.
Dimensionality: A system that uses many dimensions is less (?) informative than a system that uses few dimensions.
Convexity: A system of convex categories is more informative than a system of nonconvex categories.
Discreteness: A system of discrete categories is more informative than a system of fuzzy categories.
Compactness: A system of compact categories is more informative than a system of noncompact categories.
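As a quick illustration of the expressivity prediction, the discrete-categorizer sketch from earlier can be rerun with two categories of eight meanings instead of four categories of four; the coarser partition incurs a higher communicative cost. This snippet is my own illustration, not from the slides.

```python
import math

def discrete_cost(partition, n=16):
    # Cost for a discrete categorizer over n equiprobable meanings: each target
    # is reconstructed uniformly from within its own category, so l(t) = 1/|C|.
    return sum((1 / n) * math.log2(len(C)) for C in partition for _ in C)

four_categories = [set(range(i, i + 4)) for i in (0, 4, 8, 12)]
two_categories = [set(range(0, 8)), set(range(8, 16))]
print(discrete_cost(four_categories))  # 2.0 bits
print(discrete_cost(two_categories))   # 3.0 bits: fewer categories, less informative
```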