Projecting observed data onto a lower-dimensional manifold
Gerhard Jäger (U Tübingen), Power laws, Freiburg, January 19, 2011
Smoothing the partitions
- from smoothed extensions we can recover smoothed partitions
- each pixel is assigned to the category in which it has the highest degree of membership
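The assignment rule above amounts to an argmax over membership degrees. A minimal sketch, assuming the smoothed extensions are available as one list of membership degrees per category; the layout, the category names, and the numbers are made up for illustration and do not come from the WCS data:

```python
def smoothed_partition(membership):
    """Assign each pixel to the category with the highest membership degree.

    membership: dict mapping a category name to a list of membership
    degrees, one per pixel (hypothetical data layout).
    """
    categories = list(membership)
    n_pixels = len(membership[categories[0]])
    return [
        max(categories, key=lambda c: membership[c][i])
        for i in range(n_pixels)
    ]


# Toy smoothed extensions for three categories over four pixels.
membership = {
    "red":   [0.9, 0.6, 0.1, 0.0],
    "green": [0.1, 0.3, 0.7, 0.2],
    "blue":  [0.0, 0.1, 0.2, 0.8],
}
partition = smoothed_partition(membership)
print(partition)
```

Ties are broken in favor of the category listed first, which is an arbitrary choice of this sketch.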
Smoothed partitions of the color space
Convexity
- note: so far, we only used information from the WCS
- the location of the 330 Munsell chips in L*a*b* space played no role so far
- still, apparently partition cells always form continuous clusters in L*a*b* space
- Hypothesis (Gärdenfors): extensions of color terms always form convex regions of L*a*b* space
Support Vector Machines
- supervised learning technique
- smart algorithm to classify data in a high-dimensional space by a (for instance) linear boundary
- minimizes number of mis-classifications if the training data are not linearly separable
[Figure: SVM classification plot; two classes, red and green; axes x and y, each from -3 to 3]
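To make the idea of a linear boundary concrete, here is a minimal sketch of a soft-margin linear classifier trained by sub-gradient descent on the hinge loss. This is a stand-in chosen for brevity, not the solver used for the plot on the slide; the function names, the toy data, and the parameters `lam`, `lr`, and `epochs` are assumptions of this example.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Full-batch sub-gradient descent on the soft-margin objective
    lam/2 * |w|^2 + mean(max(0, 1 - y * (w.x + b)))."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # hinge loss is active for this point
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the boundary w.x + b = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data: label +1 stands for "red", -1 for "green".
points = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(w, b, [predict(w, b, x) for x in points])
```

With data that are not linearly separable, the hinge term penalizes points on the wrong side of the margin instead of forbidding them, which is the trade-off the last bullet refers to.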
Convex partitions
- a binary linear classifier divides an n-dimensional space into two convex half-spaces
- the intersection of two convex sets is itself convex
- hence: intersection of k binary classifications leads to convex sets
- procedure:
  - if a language partitions the Munsell space into m categories, train m(m - 1)/2 binary SVMs, one for each pair of categories in L*a*b* space
  - leads to m convex sets (which need not split the L*a*b* space exhaustively)
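The procedure can be illustrated with hand-picked boundaries instead of trained SVMs. In the sketch below, a point belongs to the convex cell of a category only if every pairwise classifier involving that category votes for it; the three categories, the two-dimensional space, and the hyperplane parameters are all made up for illustration.

```python
from itertools import combinations

# Hypothetical pairwise linear classifiers (w, b): a positive score
# w.x + b votes for the first category of the pair, otherwise the second.
classifiers = {
    ("A", "B"): ((1.0, 0.0), 0.0),
    ("A", "C"): ((0.0, 1.0), 0.0),
    ("B", "C"): ((1.0, 1.0), 1.0),
}


def winner(pair, point):
    """Category that the classifier for this pair assigns to the point."""
    w, b = classifiers[pair]
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return pair[0] if score > 0 else pair[1]


def convex_cell(point, categories):
    """Category whose half-spaces all contain the point, or None."""
    for c in categories:
        pairs = [p for p in classifiers if c in p]
        if all(winner(p, point) == c for p in pairs):
            return c
    return None


categories = ["A", "B", "C"]
n_classifiers = len(list(combinations(categories, 2)))  # m(m - 1)/2
print(n_classifiers)
print(convex_cell((1, 1), categories))
print(convex_cell((2, -1), categories))  # no category wins all its pairs
```

Each cell is an intersection of half-spaces and therefore convex. The point (2, -1) shows why the cells need not cover the space: A beats B, B beats C, and C beats A there, so it falls into no cell.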
Convex approximation
- on average, 93.7% of all Munsell chips are correctly classified by convex approximation
[Figure: proportion of correctly classified Munsell chips; vertical axis from 0.80 to 0.95]
Convex approximation
- compare to the outcome of the same procedure without PCA, and with PCA but using a random permutation of the Munsell chips
[Figure: degree of convexity (%), vertical axis from 20 to 100, for the three conditions 1, 2, 3]
Convex approximation
- choice of m = 10 is somewhat arbitrary
- outcome does not depend very much on this choice though
[Figure: mean degree of convexity (%), vertical axis from 50 to 100, against no. of principal components used, from 0 to 50]
Implicative universals
- first six features correspond nicely to the six primary colors white, black, red, green, blue, yellow
- according to Kay et al. (1997) (and many other authors): simple system of implicative universals regarding possible partitions of the primary colors
Implicative universals
Stage  Partition of the primary colors
I      white / red / yellow; black / green / blue
II     white; red / yellow; black / green / blue
III    white; red / yellow; green / blue; black
III    white; red; yellow; black / green / blue
III    white; red / yellow; green; black / blue
III    white; red; yellow / green / blue; black
III    white; red; yellow / green; black / blue
IV     white; red; yellow; green / blue; black
IV     white; red; yellow; green; black / blue
IV     white; red; yellow / green; blue; black
V      white; red; yellow; green; blue; black
source: Kay et al. (1997)
Partition of the primary colors
- each speaker/term pair can be projected to a 15-dimensional vector
- primary colors correspond to first 6 entries
- each primary color is assigned to the term for which it has the highest value
- defines for each speaker a partition over the primary colors
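The extraction step can be sketched as follows. The order of the primaries within the first six entries, the term names, and the vector values are assumptions made for illustration; only the rule itself (assign each primary color to the term with the highest value) is taken from the slide.

```python
# Assumed order of the primaries among the first six vector entries.
PRIMARIES = ["white", "black", "red", "green", "blue", "yellow"]


def primary_partition(term_vectors):
    """Partition the primaries according to the best-scoring term.

    term_vectors: dict mapping each term of one speaker to its
    15-dimensional vector (hypothetical data layout).
    """
    cells = {}
    for i, color in enumerate(PRIMARIES):
        best_term = max(term_vectors, key=lambda t: term_vectors[t][i])
        cells.setdefault(best_term, []).append(color)
    # Term names are dropped: only the grouping of the colors matters.
    return frozenset(frozenset(cell) for cell in cells.values())


# Toy speaker with four terms; entries 7 to 15 are padded with zeros.
speaker = {
    "t1": [0.9, 0.0, 0.1, 0.0, 0.0, 0.6] + [0.0] * 9,
    "t2": [0.0, 0.8, 0.1, 0.2, 0.3, 0.0] + [0.0] * 9,
    "t3": [0.1, 0.0, 0.9, 0.0, 0.0, 0.3] + [0.0] * 9,
    "t4": [0.0, 0.1, 0.0, 0.7, 0.6, 0.1] + [0.0] * 9,
}
partition = primary_partition(speaker)
print(sorted(sorted(cell) for cell in partition))
```

Representing a partition as a set of sets makes two speakers comparable even when their languages use different words for the same grouping.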
Partition of the primary colors
- for instance: sample speaker from Pirahã (see above):
[Figure: Munsell chart with rows A to J and columns 1 to 40]
- extracted partition: white / yellow; red; green / blue; black
- supposedly impossible, but occurs 61 times in the database
Partition of primary colors
most frequent partition types:
1. {white}, {red}, {yellow}, {green, blue}, {black} (41.9%)
2. {white}, {red}, {yellow}, {green}, {blue}, {black} (25.2%)
3. {white}, {red, yellow}, {green, blue, black} (6.3%)
4. {white}, {red}, {yellow}, {green}, {black, blue} (4.2%)
5. {white, yellow}, {red}, {green, blue}, {black} (3.4%)
6. {white}, {red}, {yellow}, {green, blue, black} (3.2%)
7. {white}, {red, yellow}, {green, blue}, {black} (2.6%)
8. {white, yellow}, {red}, {green, blue, black} (2.0%)
9. {white}, {red}, {yellow}, {green, blue, black} (1.6%)
10. {white}, {red}, {green, yellow}, {blue, black} (1.2%)
Partition of primary colors
- 87.1% of all speaker partitions obey Kay et al.’s universals
- the ten partitions that conform to the universals occupy ranks 1, 2, 3, 4, 6, 7, 9, 10, 16, 18
- decision what counts as an exception seems somewhat arbitrary on the basis of these counts
Partition of primary colors
- more fundamental problem: partition frequencies are distributed according to a power law
- frequency ∼ rank^(-1.99)
- no natural cutoff point to distinguish regular from exceptional partitions
[Figure: log-log plot of frequency (1 to 500) against rank (1 to 50)]
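A relation of the form frequency ∼ rank^(-a) is a straight line with slope -a on a log-log plot, so the exponent can be estimated by a least-squares fit of log frequency on log rank. The slides do not say how the reported exponents were obtained; the sketch below is one simple possibility, with function names chosen for illustration. Least squares on log-log data is known to be a rough estimator, and maximum-likelihood fitting is usually preferred.

```python
import math
from collections import Counter


def rank_frequency(items):
    """Frequencies of the distinct items, sorted from most to least frequent."""
    return sorted(Counter(items).values(), reverse=True)


def fit_power_law_exponent(frequencies):
    """Least-squares line through (log rank, log frequency).

    Returns (slope, intercept); for frequency ~ rank^(-a) the slope is -a.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic check: frequencies that follow rank^(-2) exactly.
synthetic = [1000.0 * rank ** -2 for rank in range(1, 21)]
slope, intercept = fit_power_law_exponent(synthetic)
print(round(slope, 6))
```

For real data, `rank_frequency` would be applied to the list of per-speaker partitions first, and the resulting frequencies passed to the fit.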
Partition of seven most important colors
- frequency ∼ rank^(-1.64)
[Figure: log-log plot of frequency (1 to 500) against rank (1 to 100)]
Partition of eight most important colors
- frequency ∼ rank^(-1.46)
[Figure: log-log plot of frequency (1 to 200) against rank (1 to 200)]
Power laws
from Newman 2006
Power laws are not everywhere
Other linguistic power law distributions
number of vowel systems and their frequency of occurrence
vowels  frequencies of the attested systems
3       14
4       14, 5, 4, 2
5       97, 3
6       26, 12, 12
7       23, 6, 5, 4, 3
8       6, 3, 3, 2
9       7, 7, 3
(from Schwartz et al. 1997,
Other linguistic power law distributions
- frequency ∼ rank^(-1.06)
[Figure: log-log plot of frequency (2 to 100) against rank (1 to 20)]
Other linguistic power law distributions
- size of language families
- source: Ethnologue
- frequency ∼ rank^(-1.32)
[Figure: log-log plot of frequency (1 to 500) against rank (1 to 100)]
Other linguistic power law distributions
- number of speakers per language
- source: Ethnologue
- frequency ∼ rank^(-1.01)
[Figure: log-log plot of frequency (in million, 5 to 1000) against rank (1 to 200)]
The World Atlas of Language Structures
- large-scale typological database, compiled mainly at the MPI EVA, Leipzig
- 2,650 languages in total are used
- 142 features, with between 120 and 1,370 languages per feature
- available online
The World Atlas of Language Structures
- Maslova 2008, “Meta-typological distributions”
- hypothesis:
  - pick a random value for each feature
  - estimate the probability that a random language has this value
  - the likelihood that an arbitrarily chosen feature value has a probability x is proportional to a power of x
- only holds for the most frequent 30% of all types
- for the entire range of type frequencies, the hypothesis can be rejected
[Figure: log-log plot of x (0.005 to 1.000) against P(p(type)<=x) (0.01 to 1.00)]
The World Atlas of Language Structures
- however, Maslova is perhaps right in the assumption that languages are power-law distributed across WALS types
- worth testing within features rather than across features
- problem: number of feature values usually too small for statistical evaluation
- solution: cross-classification of two (randomly chosen) features
- only feature pairs that lead to at least 30 non-empty feature value combinations are considered
- pilot study with 10 such feature pairs
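The cross-classification step can be sketched as below, assuming each feature is available as a mapping from language to feature value. The data layout, the function names, and the toy values are hypothetical; only the threshold of 30 non-empty combinations comes from the slide.

```python
from collections import Counter


def cross_classify(feature1, feature2):
    """Count languages per combination of values of two features.

    feature1, feature2: dicts mapping a language name to its value
    (hypothetical layout). Only languages coded for both features count.
    """
    shared = feature1.keys() & feature2.keys()
    return Counter((feature1[lang], feature2[lang]) for lang in shared)


def is_usable_pair(feature1, feature2, min_types=30):
    """True if the pair yields at least min_types non-empty combinations."""
    return len(cross_classify(feature1, feature2)) >= min_types


# Toy example with made-up languages and values.
order = {"L1": "OV", "L2": "VO", "L3": "OV", "L4": "VO"}
cases = {"L1": "none", "L2": "none", "L3": "many", "L5": "few"}
table = cross_classify(order, cases)
print(table)
print(is_usable_pair(order, cases))
```

The counts in `table` are the type frequencies whose distribution is then examined for each feature pair.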
The World Atlas of Language Structures
[Figures: for each feature pair, a log-log plot of Pr(X ≥ x) against x]
1. Feature 1: Consonant-Vowel Ratio; Feature 2: Subtypes of Asymmetric Standard Negation; Kolmogorov-Smirnov test: positive
2. Feature 1: Weight Factors in Weight-Sensitive Stress Systems; Feature 2: Ordinal Numerals; Kolmogorov-Smirnov test: positive
3. Feature 1: Third Person Zero of Verbal Person Marking; Feature 2: Subtypes of Asymmetric Standard Negation; Kolmogorov-Smirnov test: positive
4. Feature 1: Relationship between the Order of Object and Verb and the Order of Adjective and Noun; Feature 2: Expression of Pronominal Subjects; Kolmogorov-Smirnov test: positive
5. Feature 1: Plurality in Independent Personal Pronouns; Feature 2: Asymmetrical Case-Marking; Kolmogorov-Smirnov test: positive
6. Feature 1: Locus of Marking: Whole-language Typology; Feature 2: Number of Cases; Kolmogorov-Smirnov test: positive
7. Feature 1: Prefixing vs. Suffixing in Inflectional Morphology; Feature 2: Coding of Nominal Plurality; Kolmogorov-Smirnov test: positive
8. Feature 1: Prefixing vs. Suffixing in Inflectional Morphology; Feature 2: Ordinal Numerals; Kolmogorov-Smirnov test: positive
9. Feature 1: Coding of Nominal Plurality; Feature 2: Asymmetrical Case-Marking; Kolmogorov-Smirnov test: positive
10. Feature 1: Position of Case Affixes; Feature 2: Ordinal Numerals; Kolmogorov-Smirnov test: negative
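The slides report only the outcome of the Kolmogorov-Smirnov test, not how it was set up. As a rough sketch of the ingredients such a test needs, the code below fits a continuous power law p(x) ∼ x^(-alpha) for x ≥ xmin by maximum likelihood and measures the largest gap between the empirical and the fitted Pr(X ≥ x). The continuous approximation, the cutoff `xmin`, and the function names are assumptions of this example; turning the distance into a positive or negative verdict would additionally require a significance threshold, which is not shown.

```python
import math


def fit_alpha(xs, xmin=1.0):
    """Maximum-likelihood exponent of a continuous power law on x >= xmin.

    Requires at least one observation strictly above xmin.
    """
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(xs, alpha, xmin=1.0):
    """Largest gap between empirical and model Pr(X >= x) on the tail."""
    tail = sorted(x for x in xs if x >= xmin)
    n = len(tail)
    distance = 0.0
    for i, x in enumerate(tail):
        model = (x / xmin) ** (-(alpha - 1.0))
        # The empirical curve is a step function: compare the model with
        # its value on both sides of the step at x.
        at_x = (n - i) / n
        after_x = (n - i - 1) / n
        distance = max(distance, abs(at_x - model), abs(after_x - model))
    return distance


# Toy counts of languages per type (made-up numbers).
counts = [120, 45, 30, 12, 9, 7, 4, 3, 2, 2, 1, 1, 1]
alpha = fit_alpha(counts)
print(alpha, ks_distance(counts, alpha))
```

A small distance means the fitted power law tracks the observed distribution closely over the whole tail.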
Why power laws? Critical states
- Power laws are characteristic of critical states
  - only small ice crystals in water above freezing point
  - one big ice crystal in water below freezing point
  - during transition from liquid to solid state: ice crystals of many sizes, power-law distributed
- similar effect for all kinds of phase transitions in physics
- power laws are considered the fingerprint of criticality
Why power laws? Self-organized criticality
- some systems tend to return into a critical state due to their internal dynamics (see Bak et al. 1987)
- well-studied effect in computer simulations of cellular automata
- candidates for real-life examples are
  - earthquakes
  - forest fires
  - breakdowns of electricity networks
  - landscape formation
  - avalanches
  - ...
Why power laws? The sandpile model
- cellular automaton; loosely inspired by real sand piles
- each cell has a certain value, its slope
- single grains are added at random, increasing the slope
- if the slope of a cell exceeds a critical value:
  - its slope is reduced by r
  - the slope of the four neighboring cells is increased by 1
- this may turn neighboring cells into the critical state, leading to further shifts
- see the computer simulation
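A minimal simulation of such an automaton might look as follows. It assumes a reduction of r = 4 per toppling (so that each toppling hands exactly one grain to each of the four neighbors), a critical value of 3, and a square grid where grains pushed over the edge are lost. These values, the grid size, and the function names are assumptions of this sketch; it is not the simulation shown in the talk.

```python
import random


def topple(grid, critical=3):
    """Relax the grid in place and return the number of topplings."""
    size = len(grid)
    topplings = 0
    unstable = [(i, j) for i in range(size) for j in range(size)
                if grid[i][j] > critical]
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] <= critical:
            continue  # already relaxed by an earlier step
        grid[i][j] -= 4  # assumed reduction r = 4
        topplings += 1
        if grid[i][j] > critical:
            unstable.append((i, j))
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < size and 0 <= nj < size:  # off-grid grains are lost
                grid[ni][nj] += 1
                if grid[ni][nj] > critical:
                    unstable.append((ni, nj))
    return topplings


def simulate(size=10, grains=2000, seed=0):
    """Drop grains at random cells and record the size of each avalanche."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    avalanche_sizes = []
    for _ in range(grains):
        grid[rng.randrange(size)][rng.randrange(size)] += 1
        avalanche_sizes.append(topple(grid))
    return grid, avalanche_sizes


grid, avalanche_sizes = simulate()
print(max(avalanche_sizes), sum(1 for s in avalanche_sizes if s == 0))
```

Counting how often each avalanche size occurs in `avalanche_sizes` and plotting the counts on log-log axes is the usual way to inspect the resulting distribution.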