

  1. 32nd International Conference on Technology in Collegiate Mathematics, VIRTUAL CONFERENCE, ictcm.com | #ICTCM

  2. A Unified Introduction to Predictive Model Building for Undergraduate Researchers. Hasthika Rupasinghe * Lasanthi Watagoda * Alan Arnholt, Appalachian State University, ICTCM 2020. Hasthika Rupasinghe and Lasanthi Watagoda, A Unified Introduction to Model Building, ICTCM 2020

  3. Outline: 1 Problem; 2 Our approach; 3 Classroom trials; 4 Structure (Guided Lab I: Data Cleaning; Guided Lab II: Linear Model Fitting; Guided Lab III: Non-Linear Model Fitting)

  7. Problems: Detailed explanations of many algorithms used by researchers to create predictive models, along with directions on how to use software to implement the algorithms, are not commonly found in undergraduate textbooks. One of the challenges instructors face when using a standard text is providing activities that mimic a data scientist's experience, since data sets that accompany standard texts are generally clean and ready to be analyzed. A second challenge is the plethora of R packages, each with its own syntax, from which one may choose to implement the numerous statistical learning algorithms.

  11. Our approach: This work presents a unified introduction to building supervised prediction models using the caret package and provides guided labs where readers (1) question the integrity of a data set and correct data entries to create a "clean" data set; (2) build several models making minimal changes to the R syntax; and (3) practice reproducible research.
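
The idea of building several models with minimal syntax changes can be sketched as follows. This is an illustration under stated assumptions, not code from the guided labs: it assumes the caret, MASS, and rpart packages are installed, uses the Boston data set from MASS, and picks the response `medv`, the two method strings, the seed, and 5-fold cross-validation purely for illustration.

```r
# Sketch only: assumes caret, MASS, and rpart are installed.
library(caret)
library(MASS)   # provides the Boston data set

# One resampling scheme shared by both models: 5-fold cross-validation.
ctrl <- trainControl(method = "cv", number = 5)

# Linear regression of medv (median home value) on all other variables.
set.seed(2020)  # arbitrary seed so the folds can be reproduced
fit_lm <- train(medv ~ ., data = Boston, method = "lm", trControl = ctrl)

# Regression tree: the same call, only the method string changes.
set.seed(2020)  # same seed, so both models see the same folds
fit_tree <- train(medv ~ ., data = Boston, method = "rpart", trControl = ctrl)

# Cross-validated RMSE for each model (best tuning value for the tree).
print(fit_lm$results$RMSE)
print(min(fit_tree$results$RMSE))
```

Because the formula, data, and resampling control stay fixed, swapping in another learner is a matter of changing `method`, which is the sense in which caret unifies otherwise differing package syntax.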

  12. Note: Instructors: The material in this article is suitable for use in classes where the instructors have advanced degrees in statistics and experience using R in the classroom. Students: Must have some knowledge of linear regression models (for Lab II) and classification models (for Lab III).

  13. Classroom tested: The guided labs have been used with two undergraduate classes, in courses where the students were already using R and R Markdown and had been exposed to ggplot2. Data Science II (STT 3860), where the students used the guided project, also has as prerequisites a standard undergraduate (non-calculus-based) introductory statistics course and a data visualization and management course, Data Science I (STT 2860). Statistical Data Analysis II (STT 3851) has Statistical Data Analysis I (STT 3850) as a prerequisite.

  14. Structure: The guided labs are hosted on RStudio Cloud and on GitHub. Questioning and Cleaning the bodyfat data Lab: GitHub repository, rstudio.cloud project. Linear models with the bodyfat data Lab: GitHub repository, rstudio.cloud project. Non-linear models with the bodyfat data Lab: GitHub repository, rstudio.cloud project. Instructor manual: Instructors are welcome to email hasthika@appstate.edu, lasanthi@appstate.edu, or arnholtat@appstate.edu to get an instructor version of the labs.

  15. Data. Boston data: The Boston data set from the MASS package, written by Ripley (2019), is used to illustrate various steps in predictive model building. BodyFat data: We use the data set provided in the article "Fitting Percentage of Body Fat to Simple Body Measurements," Johnson (1996).

  17. Lab I: Questioning and Cleaning the Body Fat Data (Guided Lab I: Data Cleaning). https://rstudio.cloud/project/1164604 The purpose of this activity is to have the reader critically question, evaluate, and clean the original BodyFat data.
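
The slides do not show the checks used in the lab, so the following is only a sketch of one kind of consistency check a reader might run. It assumes a data set that records both body density and percent body fat, with the latter derived from the former by Siri's equation (percent body fat = 495 / density - 450). The data frame `toy`, its column names, its values, and the one-point tolerance are all invented for illustration and are not taken from the BodyFat data.

```r
# Toy rows with invented values for illustration; not from the BodyFat data.
toy <- data.frame(
  density  = c(1.05, 1.07, 1.09),  # body density (hypothetical column name)
  body_fat = c(21.4, 12.6, 14.1)   # recorded percent body fat (hypothetical)
)

# Percent body fat implied by density under Siri's equation.
toy$implied <- 495 / toy$density - 450

# Flag rows where recorded and implied values differ by more than one
# percentage point (the tolerance is an arbitrary choice).
toy$suspect <- abs(toy$body_fat - toy$implied) > 1

# Show only the rows that deserve a closer look.
print(toy[toy$suspect, ])
```

A flagged row is a prompt to question the entry, not proof of an error; deciding whether to correct, keep, or drop it is the judgment the lab asks readers to exercise.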

  19. Lab II: Fitting Linear Regression Models to Body Fat Data (Guided Lab II: Linear Model Fitting). https://rstudio.cloud/project/323646 The purpose of this activity is to have the reader create several regression models to predict body fat using some or all of the body measurements (explanatory variables) found in the Body Fat data.

  21. Lab III: Fitting Non-Linear Regression Models to Body Fat Data (Guided Lab III: Non-Linear Model Fitting). https://rstudio.cloud/project/1169242 The purpose of this activity is to have the reader create several non-linear regression models to predict body fat using some or all of the body measurements (explanatory variables) found in the Body Fat data.

  22. References: [1] Francis J. Anscombe, Graphs in statistical analysis, The American Statistician, 27 (1973), 17-21. [2] A. Azzalini and A.W. Bowman, A look at some data on the Old Faithful geyser, Journal of the Royal Statistical Society, Series C, 39 (1990), 357-366. [3] P. Bickel and J.W. O'Connell, Is there a sex bias in graduate admissions?, Science, 187 (1975), 398-404.

  23. Thank You!

  24. Contact Information. Titles: Dr., Dr., and Dr. Names: Hasthika Rupasinghe, Lasanthi Watagoda, and Alan Arnholt. Institution: Appalachian State University, Boone, NC. Emails: hasthika@appstate.edu, lasanthi@appstate.edu, and arnholtat@appstate.edu.
