Counting Words: Playtime
The zipfR Toolkit
Marco Baroni & Stefan Evert
Málaga, 10 August 2006

Outline
◮ zipfR
◮ A guided tour
◮ Playtime

zipfR
◮ http://purl.org/stefan.evert/zipfR
◮ http://www.r-project.org/

Loading zipfR

    library(zipfR)
    ?zipfR
    data(package="zipfR")

Importing data

    data(ItaRi.spc)
    data(ItaRi.emp.vgc)
    my.spc <- read.spc("my.spc.txt")
    my.vgc <- read.vgc("my.vgc.txt")
    my.tfl <- read.tfl("my.tfl.txt")
    my.spc <- tfl2spc(my.tfl)

Looking at spectra

    summary(ItaRi.spc)
    print(ItaRi.spc)
    N(ItaRi.spc)
    V(ItaRi.spc)
    Vm(ItaRi.spc,1)
    Vm(ItaRi.spc,1:5)
    # Baayen’s P
    Vm(ItaRi.spc,1) / N(ItaRi.spc)
    plot(ItaRi.spc)
    plot(ItaRi.spc, log="x")
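To make the quantities on this slide concrete, here is a minimal base-R sketch that computes N, V, the V_m counts and Baayen's P by hand. The toy token vector is invented for illustration and is not part of the zipfR data.

```r
# Toy sample (hypothetical tokens, not taken from ItaRi)
tokens <- c("rifare", "ridire", "rifare", "rivedere",
            "rifare", "ridire", "rileggere")

type.freqs <- table(tokens)     # frequency of each type
N  <- length(tokens)            # sample size (number of tokens)
V  <- length(type.freqs)        # vocabulary size (number of types)
spectrum <- table(type.freqs)   # V_m: number of types seen exactly m times
V1 <- sum(type.freqs == 1)      # hapax legomena
P  <- V1 / N                    # Baayen's P

print(spectrum)
cat("N =", N, " V =", V, " V1 =", V1, " P =", round(P, 3), "\n")
```

The spc objects in zipfR store the same kind of information (m and V_m, plus N and V), so N(), V() and Vm() can be read as accessors for these counts.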

Looking at vgcs

    summary(ItaRi.emp.vgc)
    print(ItaRi.emp.vgc)
    N(ItaRi.emp.vgc) # NB! returns a vector of sample sizes
    plot(ItaRi.emp.vgc, add.m=1)

Creating vgcs with binomial interpolation

    # interpolated vgc
    ItaRi.bin.vgc <- vgc.interp(ItaRi.spc, N(ItaRi.emp.vgc), m.max=1)
    summary(ItaRi.bin.vgc)
    # comparison
    plot(ItaRi.emp.vgc, ItaRi.bin.vgc,
         legend=c("observed","interpolated"))
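For intuition about what binomial interpolation computes, the sketch below applies the textbook expectations to a toy spectrum: if each token is kept independently with probability p = n/N0, a type of frequency m is still present with probability 1 - (1-p)^m. This illustrates the formula only and makes no claim about how vgc.interp is implemented; the function name interp.V and the toy numbers are hypothetical.

```r
# Expected V and V_1 in a subsample of size n, by binomial interpolation
# from a spectrum (m, Vm) observed at sample size N0.
interp.V <- function(m, Vm, N0, n) {
  p   <- n / N0                              # retention probability per token
  EV  <- sum(Vm * (1 - (1 - p)^m))           # type occurs at least once
  EV1 <- sum(Vm * m * p * (1 - p)^(m - 1))   # type occurs exactly once
  c(EV = EV, EV1 = EV1)
}

# Toy spectrum: 2 hapaxes, 1 type with f = 2, 1 type with f = 3
m  <- 1:3
Vm <- c(2, 1, 1)
N0 <- sum(m * Vm)                            # 7 tokens in total

half <- interp.V(m, Vm, N0, n = N0 / 2)      # halfway through the sample
full <- interp.V(m, Vm, N0, n = N0)          # at the full sample size
print(half)
print(full)
```

At n = N0 the function returns the observed V and V_1, which is a quick sanity check on the formula.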

Estimating LNRE models

    # ZM model
    ItaRi.zm <- lnre("zm", ItaRi.spc)
    summary(ItaRi.zm)
    # ZM estimated fitting V and V_1 only
    ItaRi.mmax1.zm <- lnre("zm", ItaRi.spc, m.max=1)
    summary(ItaRi.mmax1.zm)
    # fZM model
    ItaRi.fzm <- lnre("fzm", ItaRi.spc, exact=FALSE) # NB!
    summary(ItaRi.fzm)

Observed/expected spectra at estimation size 1

    # expected spectra
    ItaRi.zm.spc <- lnre.spc(ItaRi.zm, N(ItaRi.zm))
    ItaRi.mmax1.zm.spc <- lnre.spc(ItaRi.mmax1.zm, N(ItaRi.mmax1.zm))
    ItaRi.fzm.spc <- lnre.spc(ItaRi.fzm, N(ItaRi.fzm))

Observed/expected spectra at estimation size 2

    # compare
    plot(ItaRi.spc, ItaRi.zm.spc,
         ItaRi.mmax1.zm.spc, ItaRi.fzm.spc,
         legend=c("observed","zm","zm1","fzm"))
    # plot first 10 elements only
    plot(ItaRi.spc, ItaRi.zm.spc,
         ItaRi.mmax1.zm.spc, ItaRi.fzm.spc,
         legend=c("observed","zm","zm1","fzm"),
         m.max=10)

Expected spectra at 10 times the estimation size

    # extrapolated spectra
    ItaRi.zm.spc <- lnre.spc(ItaRi.zm, 10*N(ItaRi.zm))
    ItaRi.fzm.spc <- lnre.spc(ItaRi.fzm, 10*N(ItaRi.fzm))
    # compare
    plot(ItaRi.zm.spc, ItaRi.fzm.spc,
         legend=c("zm","fzm"))

Evaluating extrapolation quality 1

    # taking a subsample and estimating a model (if you
    # repeat, you’ll get a different sample and a different
    # model!)
    ItaRi.sub.spc <- sample.spc(ItaRi.spc, N=700000)
    ItaRi.sub.fzm <- lnre("fzm", ItaRi.sub.spc, exact=FALSE)
    ItaRi.sub.fzm
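If the subsample needs to be reproducible (for instance, to compare results with a neighbour), one option is to fix R's random seed first. This sketch assumes that sample.spc draws from R's standard random number generator; the seed value 42 is arbitrary.

```r
library(zipfR)
data(ItaRi.spc)

set.seed(42)                               # arbitrary seed, for illustration only
sub1 <- sample.spc(ItaRi.spc, N=700000)
set.seed(42)                               # reset to the same seed
sub2 <- sample.spc(ItaRi.spc, N=700000)

# With the same seed, both subsamples should have the same V and V_1
c(V(sub1), V(sub2))
c(Vm(sub1, 1), Vm(sub2, 1))
```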

Evaluating extrapolation quality 2

    # extrapolate vgc up to original sample size
    ItaRi.sub.fzm.vgc <- lnre.vgc(ItaRi.sub.fzm, N(ItaRi.emp.vgc))
    # compare
    plot(ItaRi.bin.vgc, ItaRi.sub.fzm.vgc,
         N0=N(ItaRi.sub.fzm),
         legend=c("interpolated","fZM"))

Compare growth of two categories 1

    # the ultra- prefix
    data(ItaUltra.spc)
    summary(ItaUltra.spc) # cf. summary(ItaRi.spc)
    # estimating model
    ItaUltra.fzm <- lnre("fzm", ItaUltra.spc, exact=FALSE)
    ItaUltra.fzm

Compare growth of two categories 2

    # extrapolation of V to ri- sample size
    ItaUltra.ext.vgc <- lnre.vgc(ItaUltra.fzm, N(ItaRi.emp.vgc))
    # compare
    plot(ItaUltra.ext.vgc, ItaRi.bin.vgc,
         N0=N(ItaUltra.fzm),
         legend=c("ultra-","ri-"))
    # zooming in
    plot(ItaUltra.ext.vgc, ItaRi.bin.vgc,
         N0=N(ItaUltra.fzm),
         legend=c("ultra-","ri-"),
         xlim=c(0,1e+5))

Now, try it yourself
◮ Pick comparable datasets
◮ Explore spc, empirical vgc, interpolated vgc
◮ Compute LNRE model(s)
◮ Compare vgc and spectra of classes at different sample sizes

Data
◮ data(package="zipfR")
◮ E.g.:
  ◮ Brown adjectives vs. verbs
  ◮ Tiger NP vs. PP rules
  ◮ Great Expectations vs. Oliver Twist
  ◮ ...
◮ Or import your own frequency lists
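When your own counts already live in R rather than in a file, a type frequency list can probably be built directly instead of going through read.tfl. The sketch below assumes that the tfl() constructor accepts a frequency vector f and optional type labels, as described in ?tfl; check the help page of your installed version. The counts are made up.

```r
library(zipfR)

# Hypothetical counts for four types (not real corpus data)
freqs <- c(rifare=3, ridire=2, rivedere=1, rileggere=1)

# Build a type frequency list, then convert it to a frequency spectrum
my.tfl <- tfl(f=unname(freqs), type=names(freqs))
my.spc <- tfl2spc(my.tfl)

summary(my.spc)
N(my.spc)   # total number of tokens
V(my.spc)   # number of types
```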

Explore
◮ Remember: ?zipfR
◮ Summaries, spectrum plots
◮ Empirical and interpolated vgcs
◮ Plot vgcs of two classes together

LNRE modeling
◮ Try more than one model
◮ Play with exact and m.max arguments
◮ Look at goodness of fit, expected V and V_m
◮ Comparative spc plots at estimation size and larger sizes
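One possible way through this exercise, using the ItaUltra data from the guided tour: fit two models and compare expected with observed values at the estimation size. EV() and EVm() are the zipfR accessors for expected V and V_m; the choice of dataset and of the first five spectrum elements is only for illustration.

```r
library(zipfR)
data(ItaUltra.spc)

# Fit two LNRE models to the same spectrum
zm  <- lnre("zm",  ItaUltra.spc)
fzm <- lnre("fzm", ItaUltra.spc, exact=FALSE)

summary(zm)    # summaries report parameters and goodness of fit
summary(fzm)

N0 <- N(ItaUltra.spc)

# Observed vs. expected vocabulary size at the estimation size
c(observed=V(ItaUltra.spc), zm=EV(zm, N0), fzm=EV(fzm, N0))

# Observed vs. expected V_1 ... V_5
rbind(observed=Vm(ItaUltra.spc, 1:5),
      zm=EVm(zm, 1:5, N0),
      fzm=EVm(fzm, 1:5, N0))
```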

Class comparison
◮ Extrapolate class with shorter sample
◮ Extrapolate both classes to very large sample size
◮ Look at spectra for matching sample sizes
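As a sketch of the last point, the ultra- model from the guided tour can be used to compute an expected spectrum at the ri- sample size, so that both spectra refer to the same N. The object name ItaUltra.ext.spc and the legend labels are just illustrative choices.

```r
library(zipfR)
data(ItaUltra.spc)
data(ItaRi.spc)

# Model for the class with the shorter sample
ItaUltra.fzm <- lnre("fzm", ItaUltra.spc, exact=FALSE)

# Expected ultra- spectrum at the ri- sample size
ItaUltra.ext.spc <- lnre.spc(ItaUltra.fzm, N(ItaRi.spc))

# Compare observed ri- with extrapolated ultra- at matching N
plot(ItaRi.spc, ItaUltra.ext.spc,
     legend=c("ri- (observed)","ultra- (fZM, extrapolated)"))
```

Keep in mind that the ultra- spectrum here is a model prediction far beyond its estimation size, so differences between the two curves depend on how well the fZM model extrapolates.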

Already done?
Try Case Study 2 from the tutorial (or go to get some lunch!)
