
R in Psychometrics and Psychometrics in R

Jan de Leeuw, UCLA Statistics


  1. R in Psychometrics and Psychometrics in R. Jan de Leeuw, UCLA Statistics.

     In psychometrics, and in the closely related fields of quantitative methods for the social and educational sciences, R is not yet used very often. Traditional mainframe packages such as SAS and SPSS are still dominant at the user level, Stata has made inroads at the teaching level, and Matlab is quite prominent at the research level. In this paper we define the most visible techniques in the psychometrics area, we give an overview of what is available in R, and we discuss what is missing. We then outline a strategy and a project to fill in the gaps. The outcome will hopefully be a more prominent position of R in the social and behavioral sciences, and as a result less of a gap between these disciplines and mainstream statistics.

     1. What is Psychometrics? How is it related to other Foometrics?
     2. How much R is there in Psychometrics? Can there be more? Should there be more?
     3. How much Psychometrics is there in R? Will there be more? What is missing?

     A recent overview of what psychometricians themselves think about psychometrics is in Statistica Neerlandica, 60, 2006, 135-144.

  2. If Foo is a science then Foo often has both an area Foometrics and an area Mathematical Foo. Mathematical Foo applies mathematical modeling to the Foo subject area, while Foometrics develops and studies data analysis techniques for empirical data collected in Foo. What we call statistics is the union of the various Foometrics over all Foo. Not the intersection, but the union.

     Each of the social and behavioural sciences has a form of Foometrics, although they may not all use a name in this family. Clearly Economics, Psychology, Biology, Archeology, Anthropology, and Environmental Science have their own Foometrics. And then there are various recent upstarts such as Cliometrics, Informetrics, Bibliometrics, Behaviormetrics, Ecolometrics, Cybermetrics, and Scientometrics.

     Sociology would like to have Sociometrics, but the name was already in use for something quite different. Historiometrics and Archeometrics are there, but struggling. Education does not really have Educometrics, but we'll use it anyway. Social sciences in which data are less prominent usually have books and conferences with titles such as Statistics in Foo -- they will have their very own Foometrics in the future.

     In this presentation we'll look at Psychometrics and Educometrics, with a dash of Sociometrics and Econometrics. Psychometrics and Educometrics have been around for a long time, at least since Galton, and their development has been very closely linked; often the two have been indistinguishable. So we do not distort reality too much if we simply call the body of techniques we discuss Psychometrics.

  3. [Figure: diagram relating Psychometrics, Educometrics, and Sociometrics to the techniques MDS, 3Mode, IRT, CA, FA, SEM, LogLin, and HLM.]

     R in Psychometrics

     Traditionally psychologists doing data analysis use SPSS, some use SAS. Psychometricians developing data analysis techniques use Matlab; sociometricians and econometricians (at least in the US) tend to use Stata. The situation in France or England may be quite different.

     This has mainly historical reasons -- it has to do with where these packages originated. But it also has to do with the rather large distance between areas such as psychometrics and (academic) statistics, which again has historical reasons, most of them silly. Typically, there is not much interaction, despite institutions like ETS and Bell Labs. And thus the R revolution has largely passed psychometrics by.

     Psychometric software is often distributed by incorporating it as modules in the standard packages (SPSS, SAS, Stata), using either native matrix routines if available or linking in compiled code. This guarantees good distribution, some money, but certainly not efficient computation. Examples are CATEGORIES for CA in SPSS, PROC CALIS for SEM and PROC GLM for MLA in SAS, and gllamm for SEM and MLA in Stata.

  4. In addition, psychometricians tend to write stand-alone packages for specific families of techniques. This is often compiled code combined with a suitable GUI. The prototypical examples are SEM packages like LISREL, EQS, M-PLUS, AMOS, or MLA packages such as HLM or ML-WIN -- but there are many similar stand-alone packages for IRT and CA and LLA as well. In fact the number of CA packages in marketing, for example, is staggering.

     Writing stand-alone compiled packages often means that the psychometrician is a small company, trying to make money. It also means a certain form of competition, which does not really belong in academia. And it means proprietary software, which costs money. More serious, perhaps, is that this approach means black-box software, in which the machinery is almost completely hidden. This means the user often will not even try to understand what is going on.

     The techniques implemented in the black-box packages are often complicated (many parameters, complicated optimizations, doubtful standard errors). This is necessarily true: simpler techniques are already implemented in SAS or SPSS and usually the institution has a site license for those. Thus we have Deus Ex Machina software: it transforms large datasets into rather mysterious pictures or tables that are nevertheless accepted, and often even encouraged, by peers and journals.

     Promoting the teaching and the use of R in psychometrics has some major advantages.

     1. The distance to academic statistics becomes smaller.
     2. Software is more transparent -- driven by interpreted code. Reproducible results are more likely.
     3. One can teach with R. One can teach SAS, but one cannot teach with SAS (or LISREL).
     4. Software should be free.

  5. The psychoR project.

     I have been writing and planning a substantial number of psychometric techniques in R. Eventually they will grow up to be packages. They are not intended to replace existing packages: let a thousand flowers bloom. They are written following the familiar programming philosophy that you can write FORTRAN in any language. You can find them at http://www.cuddyvalley.org/psychoR

     Psychometrics in R

     We give a quick inventory of the psychometric software now available or soon to be available in R. I shall concentrate on CRAN, of course, while mentioning some additional easily available packages on other servers. We shall see there is quite an abundance, although in most cases all forms of organization are lacking and duplications abound.

     1. Simple and Multiple Correspondence Analysis.

     There is CA and MCA in MASS, in ade4, in FactoMineR, and in homals. Many variations (Canonical CA, Fuzzy CA, Detrended CA, Multiway CA, Discriminant CA, Co-CA) are in ade4, PTAk, cocorresp, vegan, made4. At least three more CA packages (Greenacre, Beh, De Leeuw) with various options are currently being prepared. An embarrassment of riches.

     JSS (www.jstatsoft.org) is planning a number of special issues, with appropriate guest editors, and names such as

     -- R in Psychometrics
     -- R in Econometrics
     -- R in Sociometrics

     and whatever else anyone suggests along these lines. Of course there is an inherent risk in actually making constructive suggestions -- you may wind up being a guest editor.

  6. The homals (soon gifi) package does what SPSS Categories does, and more. It has many forms of multivariate analysis with optimal scaling, organized as extensions of MCA. But it is rather poorly documented.

     $$\min_{X'X=I}\ \min_{Y_j\in\mathcal{Y}_j}\ \sum_{j=1}^{m}\operatorname{tr}\,(X-G_jY_j)'(X-G_jY_j)$$

     CA and MCA are extended in the psychoR project with distance association models (distassoc, scalassoc, singlepeaked, logithom), which also generalize many common IRT models.

     2. Item Response Theory

     ltm fits the simple Rasch model, the graded logistic model for polytomous data, and the linear multidimensional logistic model. mprobit fits the multivariate binary probit model. Logistic IRT is related to Gaussian ordination, implemented in various forms in VGAM. More Rasch model fitting packages are on their way.

     In psychoR we have

     $$\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{\ell=1}^{k_j} y_{ij\ell}\,\log\frac{\beta_{j\ell}\exp(\eta(x_i,y_{j\ell}))}{\sum_{\nu=1}^{k_j}\beta_{j\nu}\exp(\eta(x_i,y_{j\nu}))}$$

     $$\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{\ell=1}^{k_j} y_{ij\ell}\,\log\left[\Phi(\tau_{j\ell}-\eta(x_i,y_{j\ell}))-\Phi(\tau_{j\ell-1}-\eta(x_i,y_{j\ell-1}))\right]$$

     $$\eta(x_i,y_j)=\begin{cases}x_i'y_j,\\ -\|x_i-y_j\|,\\ -\|x_i-y_j\|^2.\end{cases}$$

     This covers most IRT models, and then some. There are also versions for marginal maximum likelihood estimation, and for cross tables with frequencies in the form

     $$\sum_{i=1}^{n}\sum_{j=1}^{m}\left(y_{ij}\log\lambda_{ij}-\lambda_{ij}\right),\qquad \lambda_{ij}=\alpha_i\beta_j\exp(\eta(x_i,y_j)).$$

     This generalizes CA, the RC model, Quasi-Symmetry, and so on.
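
The first psychoR log-likelihood above, in which category ℓ of item j is chosen by person i with probability proportional to β_jℓ exp(η(x_i, y_jℓ)), can be sketched in a few lines. This is a minimal illustration in plain Python, not the psychoR code (which is written in R). The function names, the nested-list data layout, and the coding of each response as a category index (so that the indicator y_ijℓ is 1 for exactly one ℓ) are all assumptions made for this example.

```python
import math

# Three choices for eta(x, y), matching the three cases listed on the slide.
def eta_inner(x, y):
    """Inner product: eta(x, y) = x'y."""
    return sum(a * b for a, b in zip(x, y))

def eta_distance(x, y):
    """Negative Euclidean distance: eta(x, y) = -||x - y||."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def eta_squared_distance(x, y):
    """Negative squared Euclidean distance: eta(x, y) = -||x - y||^2."""
    return -sum((a - b) ** 2 for a, b in zip(x, y))

def category_probabilities(x_i, y_j, beta_j, eta):
    """Probabilities of the categories of one item for one person.

    x_i    : person point (list of floats)
    y_j    : list of category points for item j
    beta_j : list of positive category weights for item j
    """
    weights = [b * math.exp(eta(x_i, y)) for b, y in zip(beta_j, y_j)]
    total = sum(weights)
    return [w / total for w in weights]

def multinomial_loglik(X, Y, beta, data, eta):
    """Sum over persons i and items j of the log-probability of the observed category.

    data[i][j] is the index of the category person i chose on item j
    (hypothetical layout; equivalent to the indicator coding y_ijl).
    """
    loglik = 0.0
    for x_i, responses in zip(X, data):
        for y_j, beta_j, observed in zip(Y, beta, responses):
            probs = category_probabilities(x_i, y_j, beta_j, eta)
            loglik += math.log(probs[observed])
    return loglik

# Toy usage: two persons, one binary item, one dimension.
X = [[0.0], [1.0]]
Y = [[[0.0], [1.0]]]
beta = [[1.0, 1.0]]
data = [[0], [1]]
print(multinomial_loglik(X, Y, beta, data, eta_squared_distance))
```

With the inner-product η, a binary item whose category points are 0 and 1, and weights 1 and exp(-b), the probability of the second category reduces to exp(x - b) / (1 + exp(x - b)), which is the Rasch form. That is one way to see why this family is said to cover common IRT models.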
