
32nd International Conference on Technology in Collegiate Mathematics VIRTUAL CONFERENCE ictcm.com | #ICTCM ENHANCING A PROBABILITY THEORY COURSE USING R RYAN RAHRIG, PH.D. ASSOCIATE PROFESSOR OF STATISTICS OHIO NORTHERN UNIVERSITY


  1. 32nd International Conference on Technology in Collegiate Mathematics VIRTUAL CONFERENCE ictcm.com | #ICTCM

  2. ENHANCING A PROBABILITY THEORY COURSE USING R RYAN RAHRIG, PH.D. ASSOCIATE PROFESSOR OF STATISTICS OHIO NORTHERN UNIVERSITY

  3. BACKGROUND • Increased effort to use R in the classroom for our STAT majors

  4. GETTING R INTO MATH 4651 • Is there any benefit to incorporating R into a Probability Theory course? • Reinforce basic concept of probability • Provide check for analytical solutions • Foster students’ curiosity • Explore probing, follow-up questions to standard problems

  5. BASIC DEFINITION OF PROBABILITY • Introduction to the Practice of Statistics (Moore, et al.): … the proportion of times the outcome would occur in a very long series of repetitions • Fundamentals of Statistics (Sullivan): … the long term proportion in which a certain outcome is observed • Mathematical Statistics (Wackerly, et al.): … the stable long-term relative frequency

  6. THE CLASSIC BIRTHDAY PROBLEM! • In a set of n randomly selected people, what is the probability that at least two people share the same birthday? • Surprising result: probability is 50% or more if n ≥ 23

  7. THE CLASSIC BIRTHDAY PROBLEM! • Standard solution: $P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \cdots \times \frac{343}{365} = 1 - \frac{{}_{365}P_{23}}{365^{23}} \approx 0.507297$
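As a quick numerical check of the standard solution, the product can be evaluated directly in R. This is a minimal sketch: the variable names are illustrative, and `pbirthday()` from the base `stats` package is used only as an independent cross-check.

```r
# Exact probability of at least one shared birthday among n people,
# assuming 365 equally likely birthdays (as in the standard solution).
n <- 23
p_no_match <- prod((365:(365 - n + 1)) / 365)  # 365/365 * 364/365 * ... * 343/365
p_match <- 1 - p_no_match
round(p_match, 6)  # 0.507297

# Cross-check with the built-in birthday function from the stats package.
pbirthday(n, classes = 365, coincident = 2)
```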

  8. LET R HAVE A TURN! • Simulations in R not only provide a check on the answer, but can help students “see” the concept of probability. • What happens in the long run when the birthday experiment is conducted many, many times?

  9. BUILDING THE R CODE • Perform the experiment once (get n birthdays): Bdays <- sample(1:365, size=n, replace=TRUE) • Put the birthdays in ascending order, then find the differences between consecutive elements: D <- diff(sort(Bdays)) • Save whether there’s a match (if any differences are 0): Results[i] <- any(D==0) • Repeat many times!

  10. R CODE FOR SIMULATION
      n <- 23
      Results <- numeric(1000000)
      for (i in 1:1000000) {
        Bdays <- sample(1:365, size = n, replace = TRUE)  # Get n birthdays
        D <- diff(sort(Bdays))
        Results[i] <- any(D == 0)
      }
      mean(Results)  # proportion in Results that are TRUE

  11. SIMULATION RESULTS • First execution: [1] 0.507737 • Second execution: [1] 0.507253 • Recall exact value: 0.507297
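The small differences between the two executions and the exact value are about what sampling variability predicts. As a rough sketch (names are illustrative), the standard error of a simulated proportion based on N independent repetitions is sqrt(p(1 - p)/N):

```r
p_exact <- 0.507297              # exact value quoted on the slide
N <- 1000000                     # number of simulated rooms per execution
runs <- c(0.507737, 0.507253)    # the two simulation results reported above

se <- sqrt(p_exact * (1 - p_exact) / N)  # standard error of the proportion
z <- (runs - p_exact) / se               # distance from the exact value, in SE units
se
z
```

The standard error is about 0.0005, and both runs fall within one standard error of the exact value.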

  12. FURTHER EXPLORATIONS • The real beauty of using this approach is that the problem doesn’t have to end there. • Slight variations of the birthday problem are difficult to solve analytically but simple to simulate using R

  13. FURTHER EXPLORATIONS • What about the probability of near birthdays (e.g. birthdays within a day of each other)? • Modify Results[i] <- any(D==0) to Results[i] <- any(D<=1) • Also need to handle the case of Dec. 31 and Jan. 1
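One way to handle the Dec. 31 / Jan. 1 case is to treat the calendar as a circle and add the wrap-around gap to the vector of differences. The helper `near_match()` below is a hypothetical name introduced for illustration, and the seed and number of repetitions are arbitrary; this is a sketch, not necessarily how the presenter handled it.

```r
# TRUE if any two birthdays are at most `gap` days apart, treating the
# calendar as circular so that day 365 (Dec. 31) and day 1 (Jan. 1) are adjacent.
near_match <- function(bdays, gap = 1, days = 365) {
  s <- sort(bdays)
  wrap <- s[1] + days - s[length(s)]  # gap from the latest birthday around to the earliest
  D <- c(diff(s), wrap)
  any(D <= gap)
}

set.seed(2020)   # arbitrary seed, only for reproducibility
n <- 23
nsim <- 10000    # kept small so the sketch runs quickly
Results <- logical(nsim)
for (i in 1:nsim) {
  Bdays <- sample(1:365, size = n, replace = TRUE)
  Results[i] <- near_match(Bdays)
}
mean(Results)    # estimated probability of a match or near match
```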

  14. FURTHER EXPLORATIONS • What is the probability that 3 (or more) in the room have a common birthday? • Modify
      D <- diff(sort(Bdays))
      Results[i] <- any(D==0)
      to
      D <- diff(sort(Bdays), lag = 2)
      Results[i] <- any(D==0)
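The lag trick works because, after sorting, three equal birthdays sit in three consecutive positions, so the value two places ahead is identical. A small sketch that generalizes this to k people (the helper name `has_k_match()` is hypothetical) and cross-checks it against a direct count:

```r
# TRUE if at least k people share one birthday.
# After sorting, k equal values occupy k consecutive positions, so the
# lag-(k - 1) difference is 0 exactly when such a run exists.
has_k_match <- function(bdays, k = 3) {
  any(diff(sort(bdays), lag = k - 1) == 0)
}

has_k_match(c(40, 12, 40, 7, 40))  # TRUE: three people share day 40
has_k_match(c(40, 12, 40, 7, 12))  # FALSE: only pairs match

# Cross-check against a direct count of each birthday.
bdays <- sample(1:365, size = 100, replace = TRUE)
has_k_match(bdays) == any(table(bdays) >= 3)
```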

  15. ASSIGNMENT • As capstone advisor, noticed a trend of students struggling to come up with ideas for their projects. • Need more opportunities for students to be the ones to come up with the questions. • Using R allows students to come up with additional variations and actually solve them.

  16. OTHER VARIATIONS • Average number in the room that share a birthday? • Average number that walk in before first match? • Non-uniform birthdays • Generalized problem (number of possible birthdays something other than 365) • And many more…
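Two of these variations can be sketched with one small function. The interpretation here is an assumption: people enter one at a time, and we record the arrival position of the first person whose birthday matches someone already in the room. The function name, seed, repetition count, and the non-uniform weights are all hypothetical, chosen only for illustration.

```r
# Arrival position of the first person whose birthday matches someone
# already in the room. Drawing days + 1 people guarantees a match.
first_match <- function(days = 365, prob = NULL) {
  arrivals <- sample(1:days, size = days + 1, replace = TRUE, prob = prob)
  which(duplicated(arrivals))[1]
}

set.seed(4651)   # arbitrary seed, only for reproducibility
waits <- replicate(10000, first_match())
mean(waits) - 1  # average number who walk in before the first match

# Non-uniform birthdays: made-up weights in which the last 165 days of
# the year are 20% more likely than the first 200.
w <- c(rep(1, 200), rep(1.2, 165))
mean(replicate(10000, first_match(prob = w))) - 1
```

Changing `days` covers the generalized problem with a number of categories other than 365.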

  17. MORE ADVANCED EXAMPLE Engineer at Marathon Petroleum wanted this problem solved: • Say you have a population set of 200 people. • Of those 200 people, 125 names are drawn as winners • 9 consecutive drawings in all • For x = 0, 1,…, 125, what is the probability that exactly x persons win in all 9 drawings?

  18. MORE ADVANCED EXAMPLE Context: There are certain pipeline systems that administer a ‘lottery’ system where 125 shippers out of 200 are randomly selected each month. If a shipper wins 9 months in a row, that is good for the shipper and bad for Marathon… For x = 1, 2, …, 125, by knowing the probability that x shippers could graduate (win 9 months in a row) out of 200 new shippers, we can use those probabilities to manage the risk.

  19. STUDENT APPROACH When given this problem, students tend to go with their first inclination: ENUMERATE. Suppose x = 5. There are $\binom{200}{125}^9 \approx (1.6885 \times 10^{56})^9$ total possible outcomes, and determining how many of these result in exactly 5 winning all 9 months gets out of hand very quickly. Before students give up, get them to think about whether the Marathon engineer needs a nice exact answer or if a very good approximation will do the job.
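The size of the sample space can be checked in R. This is a small sketch with illustrative variable names; the total is computed on the log scale because the ninth power is too large for double precision.

```r
per_draw <- choose(200, 125)  # ways to pick 125 winners from 200 in one drawing
per_draw

# Nine independent drawings multiply the counts. Work on the log10 scale,
# since per_draw^9 overflows to Inf in double precision.
log10_total <- 9 * lchoose(200, 125) / log(10)
log10_total
```

The total is on the order of 10^506 outcomes, which makes clear why enumeration is hopeless.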

  20. USING R Using R to approximate the probability again allows the student to more literally follow the definition of probability. Coding the simulation forces the student to articulate the experiment in detail since the idea is to repeat the experiment many, many times. • How to simulate doing the experiment over and over? • How to determine proportion for each possibility?

  21. USING R • Run the entire lottery one time (pick 125 numbers out of 200, nine different times):
      L <- matrix(, nrow = 9, ncol = 125)
      for (i in 1:9) {
        L[i,] <- sample(1:200, size = 125)
      }
      [Figure: snippet of a possible L]

  22. USING R • Count how many times each number appears: T <- table(L) • Count how many were picked all 9 times (and save that for later): numWon9[k] <- sum(T == 9)

  23. USING R Repeat the procedure many, many (nsim) times by wrapping it in a for loop:
      numWon9 <- numeric(nsim)  # one count per simulated lottery
      for (k in 1:nsim) {
        L <- matrix(, nrow = 9, ncol = 125)
        for (i in 1:9) {
          L[i,] <- sample(1:200, size = 125)
        }
        T <- table(L)
        numWon9[k] <- sum(T == 9)
      }

  24. USING R For x = 0, 1, …, 125, compute the proportion of runs in which x were selected all 9 times:
      Final <- data.frame(num = 0:125, Probs = 0)
      for (x in 0:125) {
        Final[x+1, 2] <- sum(numWon9 == x) / nsim
      }

  25. FULL R SCRIPT
      nsim <- 1000000
      num9 <- numeric(nsim)
      for (k in 1:nsim) {
        L <- matrix(, nrow = 9, ncol = 125)
        for (i in 1:9) {
          L[i,] <- sample(1:200, size = 125)
        }
        T <- table(L)
        num9[k] <- sum(T == 9)
      }
      Final <- data.frame(num = 0:125, Probs = 0)
      for (i in 0:125) {
        Final[i+1, 2] <- sum(num9 == i) / nsim
      }
      RESULTS [results table not recovered]
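One way to sanity-check the simulated distribution, sketched here under the assumption that each drawing is an independent simple random sample: any one person wins all 9 drawings with probability (125/200)^9, so the expected number of 9-time winners is 200 × (125/200)^9, about 2.91. The compact version below uses `replicate()` and `tabulate()` in place of the explicit loops, with a deliberately small `nsim` and an arbitrary seed.

```r
set.seed(9)    # arbitrary seed, only for reproducibility
nsim <- 2000   # far fewer runs than the slide's 1,000,000, to keep this quick

num9 <- replicate(nsim, {
  picks <- replicate(9, sample(1:200, size = 125))  # 125 x 9 matrix of winners
  sum(tabulate(picks, nbins = 200) == 9)            # people drawn in all 9 drawings
})

# Estimated P(exactly x people win all 9 drawings), x = 0, 1, ..., 125.
Final <- data.frame(num = 0:125,
                    Probs = tabulate(num9 + 1, nbins = 126) / nsim)

mean(num9)            # simulated average number of 9-time winners
200 * (125 / 200)^9   # theoretical expectation, about 2.91
```

If the two printed values disagree by much more than sampling noise would allow, the simulation code is the first place to look.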

  26. EXACT SOLUTION – A MARKOV CHAIN! Having successfully found an answer, students are motivated to find the exact answer. • The random process turns out to be a discrete-time homogeneous Markov chain, with the matrix M being the transition matrix. • If students have already learned Markov chains, can go over the solution. • If not, can point out how using R allowed them to solve a complicated probability problem simply using the basic concept of probability and a little bit of programming.
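The transition matrix M from the slide is not reproduced here. One formulation consistent with the description, offered as an assumption rather than the presenter's exact construction, takes the state after each drawing to be the number of people who have won every drawing so far. Given j such people, the number who also win the next drawing is hypergeometric, which gives the sketch below using `dhyper()` from the base `stats` package. All variable names are illustrative.

```r
n_people <- 200
n_win <- 125
n_draws <- 9
states <- 0:n_win   # possible numbers of people who have won every drawing so far

# M[from + 1, to + 1] = P(`to` people are still perfect after the next drawing
#                         | `from` people are perfect now)
M <- outer(states, states, function(from, to) {
  dhyper(to, m = from, n = n_people - from, k = n_win)
})

# After the first drawing exactly 125 people have won every drawing so far.
dist <- numeric(length(states))
dist[n_win + 1] <- 1

# Apply the remaining 8 drawings.
for (d in 2:n_draws) {
  dist <- as.vector(dist %*% M)
}

Exact <- data.frame(num = states, Probs = dist)
head(Exact, 10)
sum(Exact$num * Exact$Probs)   # mean; should equal 200 * (125/200)^9
```

Comparing `Exact$Probs` with the simulated proportions gives students a direct look at how close the approximation was.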

  27. 32nd International Conference on Technology in Collegiate Mathematics VIRTUAL CONFERENCE #ICTCM • Contact Information: Ryan Rahrig, Associate Professor of Statistics, Ohio Northern University, r-rahrig@onu.edu
