
Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE



  1. Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE

  2. Motivation ◦ BKT parameters are inferred from data ◦ But the best solution for a given data set may not quite match the parameters that actually generated it (sampling error) ◦ Example data: 5 students, 5 problems each, 25 bits of data (response sequences 0,0,0,0,0; 0,0,0,0,0; 0,1,1,0,1; 0,1,0,0,0; 0,0,1,1,0) ◦ Parameters: prior = 0.205, learning = 0.010, guess = 0.142, slip = 0.031 (4 parameters, 3 decimal digits each, 39.9 bits of data) ◦ Not even possible for all parameter sets to be represented!
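
The bit counts on this slide can be checked with a short calculation. This is only a sketch of the arithmetic: it assumes each response is one binary observation and that each decimal digit carries log2(10) bits. The variable names are illustrative, not from the slides.

```python
import math

# 5 students x 5 problems, one correct/incorrect bit per response.
data_bits = 5 * 5

# 4 parameters reported to 3 decimal digits each; each decimal digit
# carries log2(10) bits of information.
param_bits = 4 * 3 * math.log2(10)

print(data_bits, round(param_bits, 1))
```

Since 25 bits can distinguish far fewer outcomes than 39.9 bits can, some parameter sets at that precision cannot correspond to any data set of this size.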

  3. Questions ◦ So how much data is needed for accurate estimates? ◦ And do the parameter values affect how much you need? ◦ Can we give confidence intervals for parameters?

  4. Normal distribution over samples ◦ Mean is almost always near true generating value ◦ Standard deviation can be used to describe variation of estimates ◦ Can use 68–95–99.7 rule for confidence intervals
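
A minimal sketch of how the 68–95–99.7 rule could turn a set of estimates into intervals, assuming the estimates of one parameter across repeated samples are roughly normal as the slide states. The function name `sigma_intervals` and the example estimates are hypothetical, not taken from the talk.

```python
import statistics

def sigma_intervals(estimates):
    """Return mean +/- 1, 2, 3 standard deviations for a list of estimates."""
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)  # sample standard deviation
    levels = ((1, "68%"), (2, "95%"), (3, "99.7%"))
    return {label: (mean - k * sd, mean + k * sd) for k, label in levels}

# Hypothetical estimates of one parameter from five independent samples.
example_estimates = [0.19, 0.21, 0.20, 0.22, 0.18]
intervals = sigma_intervals(example_estimates)
for label, (low, high) in intervals.items():
    print(label, round(low, 3), round(high, 3))
```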

  5. Variation does depend on parameter values ◦ Each parameter behaves differently ◦ Best estimates for parameters near zero/one, worst in 0.5-0.8 range

  6. There are interactions between parameter values ◦ Can’t just precompute a table of stddevs for each parameter ◦ Complex relationship, analytical approach probably infeasible ◦ But at least there is continuity with small rates of change

  7. Sample size recommendations ◦ Stddev proportional to 1/sqrt(n) ◦ Must increase sample size by factor of 4 to improve error by factor of 2 ◦ Small data sets (<1000 students) will not give even one sigfig in all parameters ◦ Question systems based on small classes!
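
The 1/sqrt(n) relationship can be inverted to estimate how many students a target error needs. The following is a sketch under that proportionality assumption only; `required_sample_size` and its arguments are hypothetical names.

```python
import math

def required_sample_size(n_current, stddev_current, stddev_target):
    """Students needed to reach stddev_target, assuming stddev ~ 1/sqrt(n)."""
    ratio = stddev_current / stddev_target
    return math.ceil(n_current * ratio ** 2)

# Halving the error from a 1000-student data set needs four times the students.
print(required_sample_size(1000, 0.02, 0.01))
```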

  8. No interaction between sample size and parameters ◦ Change sample size without changing parameters → predictable variation in error ◦ Gives an approach to estimate error on real-world data sets: ◦ Take samples with replacement, infer parameters for each, compute stddev ◦ Scale using 1/sqrt(n) to estimate stddevs at other sample sizes
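
The resampling procedure on this slide might be sketched as follows. The slides do not show the BKT fitting routine, so the `estimate` argument stands in for it: any function that maps a list of student response sequences to a single parameter estimate. The stand-in `fraction_correct` and the toy data are assumptions for illustration, not a BKT fit.

```python
import math
import random
import statistics

def bootstrap_stddev(students, estimate, n_resamples=200, rng=None):
    """Stddev of `estimate` over resamples of students drawn with replacement."""
    rng = rng or random.Random(0)
    n = len(students)
    estimates = []
    for _ in range(n_resamples):
        resample = [students[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimate(resample))
    return statistics.stdev(estimates)

def scale_stddev(stddev, n_from, n_to):
    """Rescale a stddev measured at n_from students to n_to students (1/sqrt(n))."""
    return stddev * math.sqrt(n_from / n_to)

def fraction_correct(students):
    """Stand-in estimator (NOT a BKT fit): overall fraction of correct responses."""
    total = sum(len(s) for s in students)
    return sum(sum(s) for s in students) / total

# Hypothetical toy data: one list of 0/1 responses per student.
toy_students = [[0, 0, 1, 1, 1], [0, 1, 1, 1, 1], [0, 0, 0, 1, 0], [1, 1, 1, 1, 1]]
sd_now = bootstrap_stddev(toy_students, fraction_correct)
sd_at_400 = scale_stddev(sd_now, len(toy_students), 400)
print(sd_now, sd_at_400)
```

In practice `estimate` would be replaced by a real parameter-fitting call that returns one parameter, and the bootstrap repeated per parameter.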

  9. Knowledge Tracing for Interacting Student Pairs DERRICK COETZEE

  10. Motivation ◦ Standard Bayesian knowledge tracing uses fixed learning rate parameter to capture all learning

  11. Motivation ◦ One way to improve: use information on course materials viewed

  12. Motivation ◦ What about peer interaction (e.g. forums/chat)? ◦ Not fixed/static like instructional materials ◦ The level of knowledge of the other student is important ◦ Use our BKT model of the other student’s knowledge!

  13. Pair interaction scenario ◦ Simple case of student interaction ◦ Two students are paired and always interact between each item (no interactions with others) ◦ Cycle: Learn → Do exercise independently → Interact with partner → Learn → Do exercise independently

  14. Pair interaction scenario ◦ Model independent learning and interaction stages

  15. Pair interaction scenario ◦ Model independent learning and interaction stages ◦ New parameters: teach, mislead ◦ Probability student knows after interaction, by (student knows, other student knows): (No, No) → 0; (Yes, Yes) → 1; (No, Yes) → teach; (Yes, No) → 1−mislead
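
The interaction table can be turned into an update on knowledge probabilities. This sketch adds one assumption not stated on the slide: the two students' knowledge states are treated as independent, so the four cases can be weighted by products of marginal probabilities. The function name `knows_after_interaction` is hypothetical.

```python
def knows_after_interaction(p_self, p_other, teach, mislead=0.0):
    """P(student knows after interacting), weighting the four table rows.

    Assumes the two students' knowledge states are independent.
    """
    both_know = p_self * p_other                  # (Yes, Yes) -> 1
    only_other_knows = (1.0 - p_self) * p_other   # (No, Yes)  -> teach
    only_self_knows = p_self * (1.0 - p_other)    # (Yes, No)  -> 1 - mislead
    # (No, No) -> 0 contributes nothing.
    return both_know + only_other_knows * teach + only_self_knows * (1.0 - mislead)

# With teach = 0 and mislead = 0 the interaction leaves the estimate unchanged.
print(knows_after_interaction(0.3, 0.8, teach=0.0))
print(knows_after_interaction(0.3, 0.8, teach=0.9))
```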

  16. Results: Preliminary simulations ◦ 5-parameter system (prior, learn, guess, slip, teach) ◦ forget, mislead parameters fixed at zero ◦ Generate synthetic data, run EM from generating values ◦ Same behavior as classic system when teach = 0 ◦ Unstable when teach > 0 ◦ Converges to trivial solution prior = learn = teach = 1, slip = proportion of incorrect responses ◦ Occurs for both small and large teach parameters
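
Synthetic data for the pair scenario could be generated along these lines. This is a sketch of one reading of the slides, not the author's generator: it assumes the stage order learn, answer, interact from the scenario diagram, no interaction after the final item, and forget fixed at zero. The function name `simulate_pair` is hypothetical.

```python
import random

def simulate_pair(n_items, prior, learn, guess, slip, teach, mislead=0.0, rng=None):
    """Simulate 0/1 responses for one pair of students; returns two lists."""
    rng = rng or random.Random()
    knows = [rng.random() < prior, rng.random() < prior]
    responses = ([], [])
    for item in range(n_items):
        # Independent learning stage (no forgetting).
        for s in (0, 1):
            if not knows[s] and rng.random() < learn:
                knows[s] = True
        # Each student answers the item independently.
        for s in (0, 1):
            p_correct = (1.0 - slip) if knows[s] else guess
            responses[s].append(int(rng.random() < p_correct))
        # Interaction stage between items, using the states before interaction.
        if item < n_items - 1:
            before = tuple(knows)
            for s in (0, 1):
                other_knows = before[1 - s]
                if not before[s] and other_knows:
                    knows[s] = rng.random() < teach
                elif before[s] and not other_knows:
                    knows[s] = rng.random() >= mislead  # keeps knowledge w.p. 1 - mislead
    return responses

# One pair using the generating values reported in the later results slides.
first, second = simulate_pair(5, prior=0.0, learn=0.09, guess=0.14, slip=0.09,
                              teach=0.9, rng=random.Random(1))
print(first, second)
```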

  17. Results: Preliminary simulations ◦ 4-parameter system (learn, guess, slip, teach) ◦ forget, mislead, prior fixed at zero ◦ For small teach values (e.g. 0.05), teach converges to zero ◦ Yields nontrivial solutions for large teach values, but other parameters absorb some of the teach: ◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 100 students → learn=0.1586, guess=0.1648, slip=0.0856, teach=0.6481 ◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 1000 students → learn=0.1643, guess=0.1940, slip=0.1102, teach=0.7225

  18. Results: Preliminary simulations ◦ 4-parameter system (learn, guess, slip, teach) with 10000 students and high teach ◦ prior=0.0000, learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000 → prior=0.2184, learn=0.0841, guess=0.1239, slip=0.2658, teach=0.8793 ◦ prior and slip have high error, but learning/guess/teach are good ◦ teach accuracy increases dramatically with sample size
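
The reported numbers can be compared directly by computing absolute errors. The values below are copied from the results slides; only the dictionary layout and variable names are introduced here for illustration.

```python
# Generating and estimated values from the 10000-student run.
generating = {"prior": 0.0, "learn": 0.09, "guess": 0.14, "slip": 0.09, "teach": 0.9}
estimated = {"prior": 0.2184, "learn": 0.0841, "guess": 0.1239, "slip": 0.2658, "teach": 0.8793}

abs_error = {name: round(abs(estimated[name] - generating[name]), 4)
             for name in generating}

# Estimated teach (generating value 0.9) at each reported sample size.
teach_by_students = {100: 0.6481, 1000: 0.7225, 10000: 0.8793}
teach_error = {n: round(abs(value - 0.9), 4) for n, value in teach_by_students.items()}

print(abs_error)
print(teach_error)
```

The prior and slip errors come out roughly an order of magnitude larger than those for learn, guess, and teach, and the teach error shrinks at each larger sample size, matching the slide's summary.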

  19. Possible solutions ◦ Answer items between independent learning and interaction (more observed data) ◦ Mentor/mentee model: knowledge flows in only one direction ◦ Eliminate different parameters, or combine parameters to create lower-dimensional space

  20. Future work ◦ Determine whether interaction model produces better predictions on synthetic data ◦ Gather real-world pair interaction data using MOOCchat tool ◦ Determine whether pair interaction produces better predictions ◦ Typical values, appropriate interpretations for teach and mislead parameters? ◦ Generalize to more complex interactions
