Counting Words — Populations & Samples
Type-rich populations, samples, and statistical models
Baroni & Evert

Outline:
  The type population: type probabilities, population models, ZM & fZM
  Sampling from the population: random samples, expectation, mini-example
  Parameter estimation: trial & error, automatic estimation
  A practical example


  1. . . . and its solution
  ➥ We need a model for the population
  ◮ This model embodies our hypothesis that the distribution of type probabilities has a certain general shape (more precisely, we speak of a family of models)
  ◮ The exact form of the distribution is then determined by a small number of parameters (typically 2 or 3)
  ◮ These parameters can be estimated with relative ease

  2. Examples of population models
  [Figure: four example population models, plotted as type probability π_k against population rank k for k = 1…50]

  3. The Zipf-Mandelbrot law as a population model
  What is the right family of models for lexical frequency distributions?
  ◮ We have already seen that the Zipf-Mandelbrot law captures the distribution of observed frequencies very well, across many phenomena and data sets
  ◮ Re-phrase the law for type probabilities instead of frequencies:

      π_k := C / (k + b)^a

  ◮ Two free parameters: a > 1 and b ≥ 0
  ◮ C is not a parameter but a normalization constant, needed to ensure that Σ_k π_k = 1 (see the sketch below)
  ➥ the Zipf-Mandelbrot population model
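The slides leave the computation of C implicit, so here is a minimal sketch of how the ZM type probabilities might be evaluated; using scipy's Hurwitz zeta function for the infinite normalizing sum is my own choice, not something stated in the slides.

```python
import numpy as np
from scipy.special import zeta  # zeta(a, q) = sum_{n>=0} 1/(n+q)^a  (Hurwitz zeta)

def zm_probabilities(a, b, k):
    """Type probabilities pi_k = C / (k + b)^a of the (infinite) ZM model.

    C is the normalization constant 1 / zeta(a, 1 + b), which makes the
    probabilities over k = 1, 2, 3, ... sum to 1 (requires a > 1).
    """
    C = 1.0 / zeta(a, 1.0 + b)
    return C / (np.asarray(k, dtype=float) + b) ** a

# two of the parameter settings shown on the following slides
k = np.arange(1, 51)
print(zm_probabilities(a=2.0, b=10.0, k=k)[:5])   # first five type probabilities
print(zm_probabilities(a=1.2, b=1.5, k=k).sum())  # < 1: the tail k > 50 carries the rest
```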

  4. The parameters of the Zipf-Mandelbrot model
  [Figure: π_k against k (k = 1…50) for four parameter settings: a = 1.2, b = 1.5; a = 2, b = 10; a = 2, b = 15; a = 5, b = 40]

  5. The parameters of the Zipf-Mandelbrot model
  [Figure: the same four parameter settings (a = 1.2, b = 1.5; a = 2, b = 10; a = 2, b = 15; a = 5, b = 40) plotted on double-logarithmic axes, π_k against k for k = 1…100]

  6. The finite Zipf-Mandelbrot model
  ◮ The Zipf-Mandelbrot population model characterizes an infinite type population: there is no upper bound on k, and the type probabilities π_k can become arbitrarily small
  ◮ π = 10^−6 (once every million words), π = 10^−9 (once every billion words), π = 10^−12 (once on the entire Internet), π = 10^−100 (once in the universe?)
  ◮ Alternative: finite (but often very large) number of types in the population
  ◮ We call this the population vocabulary size S (and write S = ∞ for an infinite type population)

  7. The finite Zipf-Mandelbrot model
  ◮ The finite Zipf-Mandelbrot model simply stops after the first S types (w_1, …, w_S)
  ◮ S becomes a new parameter of the model
  ➜ the finite Zipf-Mandelbrot model has 3 parameters
  ◮ NB: C will not have the same value as for the corresponding infinite ZM model (see the sketch below)
  Abbreviations: ZM for the Zipf-Mandelbrot model, and fZM for the finite Zipf-Mandelbrot model
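A small sketch illustrating the remark about C: in the finite model the normalization constant is the reciprocal of a finite sum over k = 1…S rather than of the infinite sum used above. The function and variable names are mine.

```python
import numpy as np
from scipy.special import zeta

def fzm_probabilities(a, b, S):
    """Type probabilities of a finite Zipf-Mandelbrot (fZM) model with S types."""
    k = np.arange(1, S + 1)
    weights = 1.0 / (k + b) ** a
    return weights / weights.sum()           # C = 1 / sum_{k=1}^{S} (k + b)^(-a)

a, b, S = 2.0, 10.0, 1000
C_fzm = 1.0 / np.sum(1.0 / (np.arange(1, S + 1) + b) ** a)
C_zm = 1.0 / zeta(a, 1.0 + b)                # normalization of the corresponding infinite ZM model
print(C_fzm > C_zm)                          # True: the finite sum is smaller, so C must be larger
```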

  8. The next steps
  Once we have a population model . . .
  ◮ We still need to estimate the values of its parameters
    ◮ we’ll see later how we can do this
  ◮ We want to simulate random samples from the population described by the model
    ◮ basic assumption: real data sets (such as corpora) are random samples from this population
    ◮ this allows us to predict vocabulary growth, the number of previously unseen types as more text is added to a corpus, the frequency spectrum of a larger data set, etc.
    ◮ it will also allow us to estimate the model parameters

  9. Outline
  The type population
  Sampling from the population
  Parameter estimation
  A practical example

  10. Sampling from a population model
  Assume we believe that the population we are interested in can be described by a Zipf-Mandelbrot model:
  [Figure: the model with a = 3, b = 50, shown as π_k against k on linear axes (k = 1…50) and on double-logarithmic axes (k = 1…100)]
  Use computer simulation to sample from this model:
  ◮ Draw N tokens from the population such that in each step, type w_k has probability π_k of being picked (see the sketch below)
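A minimal way to run such a simulation, under the simplifying assumption (mine, not the slides') that the infinite population is truncated at some large K so that the probability vector can be materialized explicitly:

```python
import numpy as np

def sample_tokens(a, b, N, K=100_000, rng=None):
    """Draw N tokens (as population ranks k = 1..K) from a ZM model truncated at K."""
    rng = rng or np.random.default_rng()
    k = np.arange(1, K + 1)
    pi = 1.0 / (k + b) ** a
    pi /= pi.sum()                       # renormalize over the truncated support
    return rng.choice(k, size=N, p=pi)   # each draw picks type w_k with probability pi_k

tokens = sample_tokens(a=3.0, b=50.0, N=1000, rng=np.random.default_rng(42))
print(tokens[:9])   # a listing of population ranks, as on the next slide
```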

  11. Sampling from a population model
  #1:   1  42  34  23 108  18  48  18   1 . . .
        time order room school town course area course time . . .
  #2: 286  28  23  36   3   4   7   4   8 . . .
  #3:   2  11 105  21  11  17  17   1  16 . . .
  #4:  44   3 110  34 223   2  25  20  28 . . .
  #5:  24  81  54  11   8  61   1  31  35 . . .
  #6:   3  65   9 165   5  42  16  20   7 . . .
  #7:  10  21  11  60 164  54  18  16 203 . . .
  #8:  11   7 147   5  24  19  15  85  37 . . .
  . . .

  12. Sampling from a population model
  In this way, we can . . .
  ◮ draw samples of arbitrary size N
    ◮ the computer can do it efficiently even for large N
  ◮ draw as many samples as we need
  ◮ compute type frequency lists, frequency spectra and vocabulary growth curves from these samples (see the sketch below)
    ◮ i.e., we can analyze them with the same methods that we have applied to the observed data sets
  Here are some results for samples of size N = 1000 . . .
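A sketch of these analyses on one simulated sample; the truncated sampler and the step size of the growth curve are my own illustrative choices.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(1)
k = np.arange(1, 100_001)
pi = 1.0 / (k + 50.0) ** 3                  # ZM model with a = 3, b = 50, truncated at K = 100,000
pi /= pi.sum()
tokens = rng.choice(k, size=1000, p=pi)     # one sample of N = 1000 tokens

type_freqs = Counter(tokens)                # type k -> sample frequency f_k
spectrum = Counter(type_freqs.values())     # frequency class m -> V_m (number of types with f = m)

def vocabulary_growth(tokens, step=100):
    """V(N) for N = step, 2*step, ...: number of distinct types among the first N tokens."""
    seen, growth = set(), []
    for i, t in enumerate(tokens, start=1):
        seen.add(t)
        if i % step == 0:
            growth.append((i, len(seen)))
    return growth

print("V =", len(type_freqs), " V_1 =", spectrum[1], " V_2 =", spectrum[2])
print(vocabulary_growth(tokens)[-3:])       # the tail of the vocabulary growth curve
```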

  13. Samples: type frequency list & spectrum (sample #1)

     rank r   f_r   type k        m   V_m
          1    37        6        1    83
          2    36        1        2    22
          3    33        3        3    20
          4    31        7        4    12
          5    31       10        5    10
          6    30        5        6     5
          7    28       12        7     5
          8    27        2        8     3
          9    24        4        9     3
         10    24       16       10     3
         11    23        8        .     .
         12    22       14        .     .
          .     .        .

  14. Samples: type frequency list & spectrum (sample #2)

     rank r   f_r   type k        m   V_m
          1    39        2        1    76
          2    34        3        2    27
          3    30        5        3    17
          4    29       10        4    10
          5    28        8        5     6
          6    26        1        6     5
          7    25       13        7     7
          8    24        7        8     3
          9    23        6       10     4
         10    23       11       11     2
         11    20        4        .     .
         12    19       17        .     .
          .     .        .

  15. Random variation in type-frequency lists
  [Figure: top row — Zipf rankings r ↔ f_r for Sample #1 and Sample #2; bottom row — the same frequencies plotted in population order, k ↔ f_k]

  16. Random variation in type-frequency lists
  ◮ Random variation leads to different type frequencies f_k in every new sample
    ◮ particularly obvious when we plot them in population order (bottom row, k ↔ f_k)
  ◮ Different ordering of types in the Zipf ranking for every new sample
    ◮ Zipf rank r in the sample ≠ population rank k!
    ◮ leads to severe problems with statistical methods
  ◮ Individual types are irrelevant for our purposes, so let us take a perspective that abstracts away from them
    ◮ frequency spectrum
    ◮ vocabulary growth curve
  ➥ considerable amount of random variation still visible

  17. Random variation: frequency spectrum
  [Figure: frequency spectra V_m for Samples #1–#4]

  18. Random variation: vocabulary growth curve
  [Figure: vocabulary growth curves V(N) and V_1(N) for Samples #1–#4, N = 0…1000]

  19. Expected values
  ◮ There is no reason why we should choose a particular sample to make a prediction for the real data – each one is equally likely or unlikely
  ➥ Take the average over a large number of samples (a brute-force sketch follows below)
  ◮ Such averages are called expected values or expectations in statistics (frequentist approach)
  ◮ Notation: E[V(N)] and E[V_m(N)]
    ◮ indicates that we are referring to expected values for a sample of size N
    ◮ rather than to the specific values V and V_m observed in a particular sample or a real-world data set
    ◮ Usually we can omit the sample size: E[V] and E[V_m]
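Such expectations can be approximated directly, if inefficiently, by averaging over many simulated samples; the number of replicates and the truncated population below are illustrative assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
k = np.arange(1, 100_001)
pi = 1.0 / (k + 50.0) ** 3           # ZM model with a = 3, b = 50, truncated at K = 100,000
pi /= pi.sum()

def V_and_V1(sample):
    """Vocabulary size V and number of hapax legomena V_1 of one sample."""
    types, counts = np.unique(sample, return_counts=True)
    return len(types), int(np.sum(counts == 1))

N, runs = 1000, 200
stats = np.array([V_and_V1(rng.choice(k, size=N, p=pi)) for _ in range(runs)])
print("E[V(N)]   ≈", stats[:, 0].mean())   # average vocabulary size over 200 samples
print("E[V_1(N)] ≈", stats[:, 1].mean())   # average number of hapax legomena
```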

  20. The expected frequency spectrum
  [Figure: observed spectra V_m for Samples #1–#4, each compared with the expected spectrum E[V_m]]

  21. The expected vocabulary growth curve
  [Figure: V(N) and V_1(N) for Sample #1 compared with the expected curves E[V(N)] and E[V_1(N)]]

  22. Great expectations made easy
  ◮ Fortunately, we don’t have to take many thousands of samples to calculate expectations: there is a (relatively simple) mathematical solution (➜ Wednesday)
  ◮ This solution also allows us to estimate the amount of random variation ➜ variance and confidence intervals
    ◮ example: expected VGCs with confidence intervals
    ◮ we won’t pursue variance any further in this course

  23. Confidence intervals for the expected VGC
  [Figure: expected vocabulary growth curves E[V(N)] and E[V_1(N)] with confidence bands, compared with V(N) and V_1(N) for Sample #1]

  24. A mini-example
  ◮ G. K. Zipf claimed that the distribution of English word frequencies follows Zipf’s law with a ≈ 1
    ◮ a ≈ 1.5 seems a more reasonable value when you look at larger text samples than Zipf did
  ◮ The most frequent word in English is the with π ≈ .06
  ◮ A Zipf-Mandelbrot law with a = 1.5 and b = 7.5 yields a population model where π_1 ≈ .06 (by trial & error; see the check below)
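The trial-and-error claim for π_1 is easy to verify numerically; a minimal check (the Hurwitz-zeta normalization is my addition, as above):

```python
from scipy.special import zeta

a, b = 1.5, 7.5
C = 1.0 / zeta(a, 1.0 + b)       # normalization constant of the infinite ZM model
pi_1 = C / (1.0 + b) ** a
print(round(pi_1, 3))            # should come out close to .06, matching pi for "the"
```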

  25. A mini-example
  ◮ How many different words do we expect to find in a 1-million-word text?
    ◮ N = 1,000,000 ➜ E[V(N)] = 33026.7
    ◮ 95% confidence interval: V(N) = 32753.6 … 33299.7
  ◮ How many do we really find?
    ◮ Brown corpus: 1 million words of edited American English
    ◮ V = 45215 ➜ the ZM model is not quite right
  ◮ Physicists (and some mathematicians) are happy as long as they get the order of magnitude right . . .
  ☞ Model was not based on actual data!

  26. Outline
  The type population
  Sampling from the population
  Parameter estimation
  A practical example

  27. Estimating model parameters
  ◮ Parameter settings in the mini-example were based on general assumptions (claims from the literature)
  ◮ But we also have empirical data on the word frequency distribution of English available (the Brown corpus)
  ◮ Choose parameters so that the population model matches the empirical distribution as well as possible
  ◮ E.g. by trial and error . . .
    ◮ guess parameters
    ◮ compare model predictions for a sample of size N_0 with the observed data (N_0 tokens)
    ◮ based on the frequency spectrum or vocabulary growth curve
    ◮ change parameters & repeat until satisfied
  ◮ This process is called parameter estimation

  28. Parameter estimation by trial & error
  [Figures: observed frequency spectrum V_m vs. expected spectrum E[V_m], and observed vocabulary growth curve V(N) vs. expected curve E[V(N)] for N up to 10^6, shown for a sequence of guessed ZM parameter settings: a = 1.5, b = 7.5; a = 1.3, b = 7.5; a = 1.3, b = 0.2; a = 1.5, b = 7.5; a = 1.7, b = 7.5; a = 1.7, b = 80; a = 2, b = 550]

  29. Automatic parameter estimation
  ◮ Parameter estimation by trial & error is tedious
    ➜ let the computer do the work!
  ◮ Need a cost function to quantify the “distance” between model expectations and observed data
    ◮ based on vocabulary size and frequency spectrum (these are the most convenient criteria)
  ◮ The computer estimates parameters by automatic minimization of the cost function (see the sketch below)
    ◮ clever algorithms exist that find out quickly in which direction they have to “push” the parameters to approach the minimum
    ◮ implemented in standard software packages
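A minimal sketch of what such an automatic estimation might look like. The truncation at K types, the plain mean-squared-error cost over the first M spectrum elements, and the choice of scipy's Nelder–Mead optimizer are all simplifying assumptions of mine; dedicated packages implement this much more carefully.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom

K, M = 200_000, 15                      # truncation of the type population; spectrum elements used

def expected_spectrum(a, b, N):
    """E[V_m(N)] for m = 1..M under a truncated ZM model: sum over types of P(f_k = m)."""
    k = np.arange(1, K + 1)
    pi = 1.0 / (k + b) ** a
    pi /= pi.sum()
    return np.array([binom.pmf(m, N, pi).sum() for m in range(1, M + 1)])

def cost(params, observed_spectrum, N):
    a, b = params
    if a <= 1.0 or b < 0.0:             # keep the search inside the valid parameter region
        return 1e12
    diff = expected_spectrum(a, b, N) - observed_spectrum[:M]
    return float(np.mean(diff ** 2))    # mean squared error over V_1 .. V_M

# observed_spectrum would be the V_m values counted from a real corpus of N_0 tokens:
# result = minimize(cost, x0=[1.5, 7.5], args=(observed_spectrum, N_0), method="Nelder-Mead")
# print(result.x)                       # estimated (a, b)
```

The observed spectrum would be counted from the corpus at hand (as on the trial-and-error slides); the cost function itself is the subject of the next slide.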

  30. Cost functions for parameter estimation
  ◮ Cost functions compare the expected frequency spectrum E[V_m(N_0)] with the observed spectrum V_m(N_0)
  ◮ Choice #1: how to weight differences
    ◮ absolute values of differences: Σ_{m=1}^{M} |V_m − E[V_m]|
    ◮ mean squared error: (1/M) Σ_{m=1}^{M} (V_m − E[V_m])²
    ◮ chi-squared criterion: scale the differences by their estimated variances
  ◮ Choice #2: how many spectrum elements to use
    ◮ typically between M = 2 and M = 15
    ◮ what happens if M < number of parameters?
  ◮ For many applications, it is important to match V precisely: additional constraint E[V(N_0)] = V(N_0)
    ◮ general principle: you can match as many constraints as there are free parameters in the model
  ◮ A felicitous choice of cost function and M can substantially improve the quality of the estimated model
    ◮ It isn’t a science, it’s an art . . .

  31. Goodness-of-fit
  ◮ The automatic estimation procedure minimizes the cost function until no further improvement can be found
    ◮ this is a so-called local minimum of the cost function
    ◮ not necessarily the global minimum that we want to find
  ◮ Key question: is the estimated model good enough?
  ◮ In other words: does the model provide a plausible explanation of the observed data as a random sample from the population?
  ◮ Can be measured by a goodness-of-fit test
    ◮ use special tests for such models (Baayen 2001)
    ◮ the p-value specifies whether the model is plausible
    ◮ small p-value ➜ reject the model as an explanation for the data
    ➥ we want to achieve a high p-value
  ◮ Typically, we find p < .001 – but the models can still be useful for many purposes!
