The semantics of color terms: a quantitative cross-linguistic investigation
Gerhard Jäger (gerhard.jaeger@uni-tuebingen.de)
May 20, 2010, University of Leipzig


  1. Statistical feature extraction

     First step: representation of the raw data in a contingency matrix.

     - rows: color terms from various languages
     - columns: Munsell chips
     - cells: number of test persons who used the row-term for the column-chip

     Excerpt:

     term    A0  B0  B1  B2  ...  I38  I39  I40  J0
     red      0   0   0   0  ...    0    0    2   0
     green    0   0   0   0  ...    0    0    0   0
     blue     0   0   0   0  ...    0    0    0   0
     black    0   0   0   0  ...   18   23   21  25
     white   25  25  22  23  ...    0    0    0   0
     rot      0   0   0   0  ...    1    0    0   0
     grün     0   0   0   0  ...    0    0    0   0
     gelb     0   0   0   1  ...    0    0    0   0
     rouge    0   0   0   0  ...    0    0    0   0
     vert     0   0   0   0  ...    0    0    0   0

     Further processing:
     - divide each row by the number n of test persons using the corresponding term
     - duplicate each row n times
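The two processing steps can be sketched in Python. The counts and the per-term numbers of test persons below are toy values for illustration, not the actual WCS data:

```python
import numpy as np

# Toy contingency matrix: rows = color terms, columns = Munsell chips;
# cell (i, j) = number of test persons who used term i for chip j.
counts = np.array([
    [25, 22,  0,  0],   # e.g. "white"
    [ 0,  0, 18, 23],   # e.g. "black"
    [ 0,  1,  2,  0],   # e.g. "gelb"
], dtype=float)

# Number of test persons who used each term at all (assumed given).
n_users = np.array([25, 25, 3])

# Step 1: divide each row by n, the number of test persons using that term,
# turning raw counts into relative choice frequencies.
freqs = counts / n_users[:, None]

# Step 2: duplicate each row n times, so that widely used terms carry
# proportionally more weight in the subsequent PCA.
data = np.repeat(freqs, n_users, axis=0)

print(data.shape)  # (25 + 25 + 3) rows, 4 columns
```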

  2. Principal Component Analysis

     Technique to reduce the dimensionality of data.
     Input: a set of vectors in an n-dimensional space.

     First step: rotate the coordinate system such that the new n
     coordinates are orthogonal to each other and the variations of the
     data along the new coordinates are stochastically independent.

     Second step: choose a suitable m < n and project the data onto those
     m new coordinates along which the data have the highest variance.

  3. Principal Component Analysis

     Alternative formulation:
     - choose an m-dimensional linear sub-manifold of your n-dimensional space
     - project your data onto this manifold
     - when doing so, pick your sub-manifold such that the average squared
       distance of the data points from the sub-manifold is minimized

     Intuition behind this formulation:
     - data are "actually" generated in an m-dimensional space
     - observations are disturbed by n-dimensional noise
     - PCA is a way to reconstruct the underlying data distribution

     Applications: picture recognition, latent semantic analysis,
     statistical data analysis in general, data visualization, ...
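This formulation can be sketched with a minimal SVD-based PCA. The data are synthetic, following exactly the intuition above: points generated in a 2-dimensional latent space and observed with noise in 5 dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data "actually" generated in m = 2 dimensions, observed in n = 5
# dimensions with a small amount of noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

m = 2
Xc = X - X.mean(axis=0)                      # centre the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:m]                          # top-m principal directions
projected = Xc @ components.T                # coordinates on the sub-manifold
reconstructed = projected @ components + X.mean(axis=0)

# The top-m components capture almost all the variance of the noisy data,
# i.e. they recover the underlying 2-dimensional distribution.
explained = (S[:m] ** 2).sum() / (S ** 2).sum()
print(round(float(explained), 3))
```

Projecting onto the top-m right singular vectors is exactly the choice of sub-manifold that minimizes the average squared distance of the data points from it.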

  4. Statistical feature extraction: PCA

     [Scree plot: proportion of variance explained per principal component]

     The first 15 principal components jointly explain 91.6% of the total
     variance. The choice of m = 15 is determined by using "Kaiser's
     stopping rule".
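Kaiser's stopping rule can be sketched as follows; one common form of the rule retains the components whose eigenvalue exceeds the average eigenvalue (for standardized data this is the familiar "eigenvalue > 1" criterion). The data here are synthetic, not the WCS matrix:

```python
import numpy as np

def kaiser_m(X):
    """Number of principal components retained by Kaiser's rule:
    components whose eigenvalue exceeds the mean eigenvalue."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
    return int(np.sum(eigvals > eigvals.mean()))

rng = np.random.default_rng(1)
# Toy data: 3 strong latent directions embedded among 7 pure-noise dimensions.
latent = rng.normal(size=(500, 3)) * 5.0
X = np.hstack([latent, rng.normal(size=(500, 7))])
print(kaiser_m(X))  # → 3
```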

  5. Statistical feature extraction: PCA

     After some post-processing ("varimax" algorithm):

     [Figure: the resulting components displayed on the Munsell chart,
     rows A-J, columns 1-40]

  6. Projecting observed data onto the lower-dimensional manifold

     Noise removal: project the observed data onto the lower-dimensional
     sub-manifold that was obtained via PCA.

     In our case: noisy binary categories are mapped to smoothed fuzzy
     categories (= probability distributions over Munsell chips).

     Some examples:

  7.-26. Projecting observed data onto the lower-dimensional manifold
     [Figures: examples of smoothed fuzzy categories]

  27. Smoothing the partitions

     From the smoothed extensions we can recover smoothed partitions:
     each pixel is assigned to the category in which it has the highest
     degree of membership.
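The assignment step is a simple argmax over category memberships; the membership values below are invented for illustration:

```python
import numpy as np

# Fuzzy category extensions: rows = categories, columns = Munsell chips;
# cell (i, j) = degree of membership of chip j in category i.
membership = np.array([
    [0.7, 0.4, 0.1],   # e.g. "white"
    [0.2, 0.5, 0.2],   # e.g. "red"
    [0.1, 0.1, 0.7],   # e.g. "black"
])

# Hard partition: each chip goes to the category where its membership peaks.
partition = membership.argmax(axis=0)
print(partition.tolist())  # → [0, 1, 2]
```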

  28.-37. Smoothed partitions of the color space
     [Figures: smoothed partitions, one per example]

  38. Convexity

     Note: so far we only used information from the WCS; the location of
     the 330 Munsell chips in L*a*b* space has played no role.

     Still, the partition cells apparently always form continuous
     clusters in L*a*b* space.

     Hypothesis (Gärdenfors): the extensions of color terms always form
     convex regions of L*a*b* space.

  39. Support Vector Machines

     - supervised learning technique
     - smart algorithm to classify data in a high-dimensional space by a
       (for instance) linear boundary
     - minimizes the number of mis-classifications if the training data
       are not linearly separable

     [Figure: SVM classification plot, "red" vs. "green" points separated
     by a linear boundary in the x-y plane]
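A minimal linear-SVM sketch with scikit-learn; the two point clouds ("red" vs. "green") are toy data, not actual color coordinates, and the soft-margin parameter C is what tolerates mis-classifications when the data are not linearly separable:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Two Gaussian point clouds in the plane.
red = rng.normal(loc=[1.5, 1.5], size=(50, 2))
green = rng.normal(loc=[-1.5, -1.5], size=(50, 2))
X = np.vstack([red, green])
y = np.array([0] * 50 + [1] * 50)

# Linear kernel: the decision boundary is a straight line (a hyperplane
# in higher dimensions); C controls the softness of the margin.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.score(X, y))   # training accuracy; near 1.0 for well-separated clouds
```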

  40. Convex partitions

     - a binary linear classifier divides an n-dimensional space into two
       convex half-spaces
     - the intersection of two convex sets is itself convex
     - hence: the intersection of k binary classifications leads to
       convex sets

     Procedure: if a language partitions the Munsell space into m
     categories, train m(m-1)/2 binary SVMs in L*a*b* space, one for
     each pair of categories. This leads to m convex sets (which need not
     split the L*a*b* space exhaustively).
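The pairwise procedure can be sketched as follows, with toy 2-D points standing in for L*a*b* coordinates and `LinearSVC` as the binary linear classifier. A point belongs to category c only if it falls on c's side of every boundary involving c, so each resulting region is an intersection of half-spaces and therefore convex:

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
# m = 3 toy categories as Gaussian clouds in the plane.
centers = np.array([[0, 3], [3, -2], [-3, -2]])
X = np.vstack([c + rng.normal(size=(40, 2)) for c in centers])
y = np.repeat(np.arange(3), 40)

m = 3
# One binary linear SVM per pair of categories: m(m-1)/2 classifiers.
classifiers = {}
for a, b in combinations(range(m), 2):
    mask = (y == a) | (y == b)
    classifiers[(a, b)] = LinearSVC().fit(X[mask], y[mask])

def assign(point):
    """Category that wins every pairwise comparison, or None if the
    point lies in no convex cell (the cells need not be exhaustive)."""
    for c in range(m):
        if all(classifiers[tuple(sorted((c, other)))].predict([point])[0] == c
               for other in range(m) if other != c):
            return c
    return None

# Fraction of points correctly captured by their convex cell.
correct = np.mean([assign(p) == t for p, t in zip(X, y)])
print(correct)
```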

  41.-50. Convex approximation
     [Figures: convex approximations of the smoothed partitions]

  51. Convex approximation

     [Boxplot: proportion of correctly classified Munsell chips]

     On average, 93.7% of all Munsell chips are correctly classified by
     the convex approximation.

  52. Convex approximation

     Compare to the outcome of the same procedure (1) without PCA, and
     (2) with PCA but using a random permutation of the Munsell chips.

     [Boxplot: degree of convexity (%) for the three conditions]

  53. Convex approximation

     The choice of m = 10 is somewhat arbitrary; the outcome does not
     depend very much on this choice, though.

     [Plot: mean degree of convexity (%) against the number of principal
     components used, 0-50]

  54. Implicative universals

     The first six features correspond nicely to the six primary colors
     white, black, red, green, blue, and yellow.

     According to Kay et al. (1997) (and many other authors), there is a
     simple system of implicative universals regarding possible
     partitions of the primary colors.

  55. Implicative universals

     Possible partitions of the primary colors, by stage:

     I:   {white/red/yellow}, {black/green/blue}
     II:  {white}, {red/yellow}, {black/green/blue}
     III: {white}, {red/yellow}, {green/blue}, {black}
          {white}, {red}, {yellow}, {black/green/blue}
          {white}, {red}, {yellow/green/blue}, {black}
          {white}, {red}, {yellow/green}, {black/blue}
     IV:  {white}, {red}, {yellow}, {green/blue}, {black}
          {white}, {red}, {yellow}, {green}, {black/blue}
          {white}, {red}, {yellow/green}, {blue}, {black}
     V:   {white}, {red}, {yellow}, {green}, {blue}, {black}

     source: Kay et al. (1997)

  56. Partition of the primary colors

     - each speaker/term pair can be projected to a 15-dimensional vector
     - the primary colors correspond to the first 6 entries
     - each primary color is assigned to the term for which it has the
       highest value
     - this defines, for each speaker, a partition over the primary colors
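The steps above can be sketched as follows; the feature values for this hypothetical speaker's terms are invented for illustration:

```python
import numpy as np

# The first 6 entries of each speaker/term vector correspond to the
# primary colors (order assumed here for illustration).
PRIMARIES = ["white", "black", "red", "green", "blue", "yellow"]

# Rows = one speaker's terms, columns = the 6 primary-color features
# (the remaining 9 of the 15 dimensions are omitted here).
speaker_terms = np.array([
    [0.9, 0.0, 0.1, 0.0, 0.0, 0.4],   # term 1
    [0.0, 0.1, 0.8, 0.0, 0.0, 0.3],   # term 2
    [0.0, 0.2, 0.0, 0.7, 0.6, 0.0],   # term 3
    [0.1, 0.8, 0.0, 0.2, 0.3, 0.0],   # term 4
])

# Assign each primary color to the term with the highest value there;
# grouping primaries by winning term yields this speaker's partition.
winner = speaker_terms.argmax(axis=0)
partition = {}
for primary, term in zip(PRIMARIES, winner):
    partition.setdefault(int(term), []).append(primary)
print(sorted(partition.values()))
# → [['black'], ['green', 'blue'], ['red'], ['white', 'yellow']]
```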

  57. Partition of the primary colors

     For instance, a sample speaker (from Pirahã):

     [Figure: this speaker's color-naming pattern on the Munsell chart]

     Extracted partition: {white/yellow}, {red}, {green/blue}, {black}

     Supposedly impossible, but it occurs 61 times in the database.

  58. Partition of primary colors

     Most frequent partition types:

      1. {white}, {red}, {yellow}, {green, blue}, {black}       (41.9%)
      2. {white}, {red}, {yellow}, {green}, {blue}, {black}     (25.2%)
      3. {white}, {red, yellow}, {green, blue, black}           (6.3%)
      4. {white}, {red}, {yellow}, {green}, {black, blue}       (4.2%)
      5. {white, yellow}, {red}, {green, blue}, {black}         (3.4%)
      6. {white}, {red}, {yellow}, {green, blue, black}         (3.2%)
      7. {white}, {red, yellow}, {green, blue}, {black}         (2.6%)
      8. {white, yellow}, {red}, {green, blue, black}           (2.0%)
      9. {white}, {red}, {yellow}, {green, blue, black}         (1.6%)
     10. {white}, {red}, {green, yellow}, {blue, black}         (1.2%)

  59. Partition of primary colors

     - 87.1% of all speaker partitions obey Kay et al.'s universals
     - the ten partitions that conform to the universals occupy ranks
       1, 2, 3, 4, 6, 7, 9, 10, 16, 18
     - on the basis of these counts, the decision about what counts as an
       exception seems somewhat arbitrary
