

  1. Fast Item Response Theory (IRT) Analysis Using GPUs
  Lei Chen, lei.chen@liulishuo.com
  Liulishuo Silicon Valley AI Lab

  2. Outline
  • A brief introduction of Item Response Theory (IRT)
  • Edward, a new probabilistic programming (PP) toolkit
  • An experiment using Edward for IRT model estimation on both CPU and GPU computing platforms
  • Summary

  3. A concise introduction to adaptive learning
  • What's up with adaptive learning

  4. Adaptive learning is hot in the edtech market
  • Increasing demand: districts' spending on adaptive learning products grew threefold between 2013 and 2016, according to a new analysis (EdWeek Market Brief, 7/14/2017)
  • Increasing suppliers

  5. Precisely knowing students' ability levels is important
  • Adaptive learning needs correct inputs about students' ability levels, which are latent
  • Assessments are developed for inferring latent abilities
  • For a yes/no question, the probability that a student provides a correct answer, p(X=1), depends on
    • his/her latent ability (theta)
    • other related factors, e.g., the item's difficulty, making a lucky guess, carelessness, ...

  6. Item Response Theory (IRT)
  • IRT provides a principled statistical method to quantify these factors and has been widely used to build the modern assessment industry
  • A widely used model: the 2-parameter logistic model (2-PL), written out below
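For reference, the 2-PL model referred to above is standard in the IRT literature. With discrimination a_i and difficulty b_i for item i, the probability of a correct response is

    P(X_i = 1 \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}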

  7. IRT with fewer or more parameters
  • 1-PL: only has b; assumes all items share the same a
  • 3-PL: adds c for random guessing
  • 4-PL: adds d for inattention
  (The general 4-PL form is written out below.)
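All of these are special cases of the 4-PL form (standard notation; the formula itself is not shown on the slide):

    P(X_i = 1 \mid \theta) = c_i + \frac{d_i - c_i}{1 + e^{-a_i(\theta - b_i)}}

Setting c_i = 0 and d_i = 1 recovers the 2-PL; additionally sharing a across items gives the 1-PL; setting only d_i = 1 gives the 3-PL.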

  8. IRT's wide usages
  • More precise description of item performance
  • More precise scoring
  • More powerful test assembly
  • Supporting advanced linking & equating to make standardized tests possible
  • Supporting adaptive testing by placing examinees and items on the same scale

  9. Concrete examples
  • "Item response theory and computerized adaptive testing": a presentation made for a hands-on workshop by Rust, Cek, Sun, and Kosinski from The Psychometrics Centre, University of Cambridge
  • Very nice animations to explain IRT, how to use IRT to score, and CAT

  10. Item Response Function
  • Binary items
  [Figure: item characteristic curve; y-axis: probability of getting the item right (0 to 1); x-axis: measured concept (theta); difficulty sets the curve's location, discrimination its slope, guessing its lower asymptote, and inattention its upper asymptote]
  • Parameters: difficulty, discrimination (slope), guessing, inattention
  • Models: 1-parameter (difficulty only), 2-parameter, 3-parameter, 4-parameter, unfolding
  (A small code sketch of this function follows.)
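As a concrete illustration (my own minimal NumPy sketch, not code from the talk), the 4-PL item response function with the four parameters named above:

    import numpy as np

    def irf(theta, a=1.0, b=0.0, c=0.0, d=1.0):
        """4-PL item response function.
        a: discrimination (slope), b: difficulty (location),
        c: guessing (lower asymptote), d: inattention (upper asymptote).
        c=0, d=1 recovers the 2-PL; fixing a as well recovers the 1-PL."""
        return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

    # Evaluate a hypothetical 3-PL item (d = 1) on a small theta grid.
    theta = np.linspace(-3.0, 3.0, 7)
    print(irf(theta, a=1.5, b=0.5, c=0.2))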

  11. Scoring
  [Figure: probability curve over theta (x-axis -3.0 to 3.0, y-axis 0.0 to 1.0), re-drawn after each response, with the peak marked as the most likely score]
  • Test:
    1. Normal distribution (the starting belief about theta)
    2. q1 – Correct
    3. q2 – Correct
    4. q3 – Incorrect
  • After each response, the distribution over theta is updated and the most likely score shifts accordingly (a grid-based sketch follows)
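A minimal grid-based sketch of this scoring loop (my own illustration; the item parameters below are hypothetical): start from a standard normal prior over theta, multiply in each response's 2-PL likelihood, and read off the most likely score at each step.

    import numpy as np

    def p_correct(theta, a, b):
        """2-PL probability of a correct response."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    theta_grid = np.linspace(-3.0, 3.0, 601)
    belief = np.exp(-0.5 * theta_grid ** 2)      # step 1: standard normal prior (unnormalized)
    belief /= belief.sum()

    # Steps 2-4: (a, b, response) per item -- q1 and q2 correct, q3 incorrect.
    for a, b, x in [(1.2, -0.5, 1), (1.0, 0.0, 1), (1.5, 0.8, 0)]:
        p = p_correct(theta_grid, a, b)
        belief *= p if x == 1 else 1.0 - p       # multiply in the item's likelihood
        belief /= belief.sum()                   # renormalize on the grid
        print("most likely score:", theta_grid[np.argmax(belief)])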

  12. Computer Adaptive Testing
  • Standard tests
    • contain a fixed number of questions
    • some are too simple and some are too difficult for a specific test-taker
  • CAT
    • items can be tailored
    • saves time and money
    • measures the test-taker's ability more accurately

  13. Example of CAT
  [Figure: probability curve over theta (x-axis -3.0 to 3.0), updated after each correct or incorrect response; the next item's difficulty is placed near the current most likely score]
  Start the test:
  1. Ask the first question, e.g. of medium difficulty
  2. Correct!
  3. Score it (starting from a normal distribution over theta)
  4. Select the next item with a difficulty around the most likely score (or with the max information)
  5. And so on... until the stopping rule is reached
  (A sketch of the item-selection step follows.)
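The item-selection step can be sketched as follows (my own illustration with a hypothetical item bank; for a 2-PL item, the Fisher information at theta is a^2 * p * (1 - p), so "max information" reduces to the computation below):

    import numpy as np

    def p_correct(theta, a, b):
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def item_information(theta, a, b):
        """Fisher information of a 2-PL item at ability theta: a^2 * p * (1 - p)."""
        p = p_correct(theta, a, b)
        return a ** 2 * p * (1.0 - p)

    def select_next_item(theta_hat, bank, administered):
        """Return the index of the unadministered item most informative at theta_hat."""
        candidates = [i for i in range(len(bank)) if i not in administered]
        return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))

    bank = [(1.0, -1.0), (1.3, 0.0), (0.8, 0.7), (1.5, 1.5)]  # hypothetical (a, b) pairs
    print(select_next_item(theta_hat=0.4, bank=bank, administered={1}))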

  14. IRT model estimation
  • Most commonly used: Marginal Maximum Likelihood Estimation (MMLE)
    • find the marginal distribution of the item parameters by integrating over theta
    • estimate the item parameters by MLE
    • obtain theta by MLE based on the estimated item parameters
    • for a more efficient estimation, use EM
  • Other ways
    • Joint Maximum Likelihood (JML)
  (A sketch of the marginal likelihood computation follows.)
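For one examinee, the marginal likelihood that MMLE works with can be sketched with Gauss-Hermite quadrature (a minimal illustration assuming a standard normal theta prior and 2-PL items with hypothetical parameters):

    import numpy as np
    from numpy.polynomial.hermite import hermgauss

    def p_correct(theta, a, b):
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def marginal_likelihood(responses, items, n_quad=21):
        """P(responses | item parameters), with theta integrated out
        against a N(0, 1) prior via Gauss-Hermite quadrature."""
        x, w = hermgauss(n_quad)
        theta = np.sqrt(2.0) * x                 # change of variables for the N(0, 1) weight
        lik = np.ones_like(theta)
        for (a, b), r in zip(items, responses):
            p = p_correct(theta, a, b)
            lik *= p if r == 1 else 1.0 - p
        return np.sum(w * lik) / np.sqrt(np.pi)

    items = [(1.2, -0.5), (1.0, 0.0), (1.5, 0.8)]  # hypothetical (a, b) values
    print(marginal_likelihood([1, 1, 0], items))

MMLE then maximizes the product of this quantity over all examinees with respect to the item parameters, which the EM algorithm makes efficient.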

  15. Bayesian solution
  • Issues with MLE
    • depends on the distribution of the data
    • estimation is not accurate when samples are small
    • hard to handle an ability distribution that is not normal
  • Bayesian solutions consider theta priors

  16. MCMC
  • Markov chain Monte Carlo (MCMC) is used for Bayesian estimation
  • The ultimate goal is to approximate p(parameters | data) by sampling many points from the posterior distribution
  • Hamiltonian Monte Carlo (HMC) is good at dealing with high-dimensional parameter spaces: it exploits the geometry of the important regions of the posterior to make better proposals
  (A toy sketch follows.)
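As a toy illustration of MCMC posterior sampling (a plain random-walk Metropolis sketch, deliberately simpler than the HMC the slide mentions; the items and responses are hypothetical):

    import numpy as np

    def log_posterior(theta, responses, items):
        """Unnormalized log p(theta | responses): N(0, 1) prior plus 2-PL log-likelihood."""
        lp = -0.5 * theta ** 2
        for (a, b), r in zip(items, responses):
            p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
            lp += np.log(p if r == 1 else 1.0 - p)
        return lp

    rng = np.random.default_rng(0)
    items, responses = [(1.2, -0.5), (1.0, 0.0), (1.5, 0.8)], [1, 1, 0]
    theta, samples = 0.0, []
    for _ in range(5000):
        proposal = theta + 0.5 * rng.standard_normal()   # symmetric random-walk proposal
        # Metropolis accept/reject on the log scale.
        if np.log(rng.uniform()) < log_posterior(proposal, responses, items) - log_posterior(theta, responses, items):
            theta = proposal
        samples.append(theta)
    print("posterior mean of theta:", np.mean(samples[1000:]))   # discard burn-in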

  17. Variational Inference
  • Approximate an intractable distribution by choosing a family of distributions and finding the member of that family that minimizes the divergence to the true posterior
  • Approximating the posterior with a simpler function leads to faster estimation
  • Kullback–Leibler (KL) divergence is frequently used to measure how close two distributions are
  (The formal objective is written out below.)
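For reference, the standard formulation (a known fact, not shown on the slide): choose q* from the family Q that minimizes the KL divergence to the posterior; since the posterior's normalizer is intractable, one equivalently maximizes the evidence lower bound (ELBO):

    q^{*} = \arg\min_{q \in \mathcal{Q}} \; \mathrm{KL}\big(q(\theta) \,\|\, p(\theta \mid x)\big)

    \mathrm{ELBO}(q) = \mathbb{E}_{q}\big[\log p(x, \theta) - \log q(\theta)\big]
                     = \log p(x) - \mathrm{KL}\big(q \,\|\, p(\theta \mid x)\big)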
