Lecture 1: Introduction to Statistical Computing - Biostatistics 615/815

Biostatistics 615/815 - Statistical Computing. Lecture 1: Introduction to Statistical Computing. Hyun Min Kang, September 6th, 2011.


  1. Biostatistics 615/815 - Statistical Computing
     Lecture 1: Introduction to Statistical Computing
     Hyun Min Kang
     September 6th, 2011
     Outline: Overview, Syllabus, Algorithms, Sorting, Recursion, Implementation, Summary

  2. BIOSTAT615/815 - Objectives
     1 Gain the ability to IMPLEMENT computational/statistical IDEAS as working SOFTWARE PROGRAMS
       ✓ Understand the concept of an algorithm
       ✓ Understand basic data structures and algorithms
       ✓ Practice implementing algorithms in programming languages
       ✓ Develop the ability to make use of external libraries

  3. BIOSTAT615/815 - Objectives
     1 Gain the ability to IMPLEMENT computational/statistical IDEAS as working SOFTWARE PROGRAMS
     2 Learn COMPUTATIONAL COST management in developing statistical methods
       ✓ Understand the practical importance of computational cost in many statistical inference applications
       ✓ Develop the ability to estimate the computational time and memory required by an algorithm for a given data size
       ✓ Develop the ability to improve the computational efficiency of existing algorithms and to optimize the cost/accuracy trade-off
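
As a rough illustration of estimating time and memory from data size, the sketch below counts basic operations instead of measuring wall-clock time. It is not taken from the slides: the function names `comparisonsForMax`, `comparisonsForAllPairs`, and `bytesForDoubles` are hypothetical, and counting comparisons is only one simple proxy for running time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Comparisons used by one pass that finds the maximum of n values: n - 1.
std::size_t comparisonsForMax(const std::vector<double>& x) {
    assert(!x.empty());
    std::size_t comparisons = 0;
    double best = x[0];
    for (std::size_t i = 1; i < x.size(); ++i) {
        ++comparisons;
        if (x[i] > best) best = x[i];
    }
    return comparisons;
}

// Comparisons used when every pair (i, j) with i < j is examined: n(n-1)/2.
std::size_t comparisonsForAllPairs(const std::vector<double>& x) {
    std::size_t comparisons = 0;
    std::size_t outOfOrder = 0;  // pairs with x[i] > x[j]
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = i + 1; j < x.size(); ++j) {
            ++comparisons;
            if (x[i] > x[j]) ++outOfOrder;
        }
    }
    (void)outOfOrder;  // only the operation count matters in this sketch
    return comparisons;
}

// Memory needed to store n values of type double, in bytes.
std::size_t bytesForDoubles(std::size_t n) {
    return n * sizeof(double);
}
```

For n = 1,000 the single pass makes 999 comparisons while the all-pairs loop makes 499,500. Multiplying n by 10 multiplies the first count by about 10 and the second by about 100, which is the kind of estimate the objective refers to.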

  4. BIOSTAT615/815 - Objectives
     1 Gain the ability to IMPLEMENT computational/statistical IDEAS as working SOFTWARE PROGRAMS
     2 Learn COMPUTATIONAL COST management in developing statistical methods
     3 Understand NUMERICAL and RANDOMIZED ALGORITHMS for statistical inference
       ✓ Learn numerical optimization methods for computationally solving analytically intractable problems
       ✓ Understand a variety of randomized algorithms for robust and efficient estimation in problems where obtaining a deterministic solution is computationally intractable
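
To give a flavor of what a randomized algorithm for estimation looks like, here is a minimal Monte Carlo sketch that estimates pi from random points in the unit square. The example and the function name `estimatePi` are illustrative assumptions, not material from the slides, and pi is of course not an intractable quantity; the same sampling idea is what carries over to harder problems.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Estimate pi by drawing points uniformly in the unit square and counting
// the fraction that falls inside the quarter circle of radius 1.
// That fraction approximates pi / 4.
double estimatePi(std::size_t numSamples, unsigned int seed) {
    assert(numSamples > 0);
    std::mt19937 rng(seed);  // seeded so that a run can be repeated
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::size_t inside = 0;
    for (std::size_t i = 0; i < numSamples; ++i) {
        const double x = unif(rng);
        const double y = unif(rng);
        if (x * x + y * y <= 1.0) ++inside;
    }
    return 4.0 * static_cast<double>(inside) / static_cast<double>(numSamples);
}
```

The estimate is random, so it varies with the seed, and its standard error shrinks roughly in proportion to one over the square root of the number of samples. Accuracy is therefore bought with computation, which is the cost/accuracy trade-off mentioned under objective 2.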

  5. Why Is Statistical Computing Important?
     1 Wherever there is a statistical application, that is where you'll need computation.
       ✓ For example, typical regression or maximum likelihood estimation requires nontrivial computational procedures.
       ✓ R or SAS may do the computation for you, but you need to understand exactly what they are doing.
     2 Computational efficiency is critical for large-scale data analysis.
       ✓ In many analyses of high-throughput biological data, more accurate methods cannot be used in practice due to prohibitive computational cost.
       ✓ Many statistical methods should work in principle if you have infinite time, but are almost impossible to run on large-scale data because their running time grows exponentially with data size.
       ✓ An algorithmic understanding of statistical methods may turn an apparently impossible task into a possible one.
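
The gap between exponential and linear running time can be sketched with Fibonacci numbers. This is a standard textbook illustration chosen here as an assumption, not an example from the slides, and the names `fibNaive` and `fibIterative` are hypothetical. The naive recursion recomputes the same subproblems over and over, so its number of calls grows exponentially with n, while the loop needs only n additions.

```cpp
#include <cassert>
#include <cstdint>

// Naive recursion: the call counter shows how quickly the work grows,
// because each call with n >= 2 spawns two further calls.
std::uint64_t fibNaive(int n, std::uint64_t& calls) {
    assert(n >= 0);
    ++calls;
    if (n < 2) return static_cast<std::uint64_t>(n);
    return fibNaive(n - 1, calls) + fibNaive(n - 2, calls);
}

// Iterative version: keeps only the last two values, so it performs
// n additions and uses constant memory.
std::uint64_t fibIterative(int n) {
    assert(n >= 0);
    std::uint64_t prev = 0;  // F(0)
    std::uint64_t curr = 1;  // F(1)
    for (int i = 0; i < n; ++i) {
        const std::uint64_t next = prev + curr;
        prev = curr;
        curr = next;
    }
    return prev;
}
```

Both functions return 6765 for n = 20, but the naive version makes 21,891 calls to get there while the loop runs 20 iterations. The same result at a fraction of the cost is what an algorithmic understanding of a method can buy.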

  7. What Will Be Covered?
     1 C++ Basics and Introductory Algorithms
       • Computational Time Complexity
       • Sorting
       • Divide and Conquer Algorithms
       • Searching
       • Key Data Structures and the Standard Template Library
       • Dynamic Programming
       • Hidden Markov Models

  8. What Will Be Covered?
     1 C++ Basics and Introductory Algorithms
     2 Numerical Methods and Randomized Algorithms
       • Random Numbers
       • Matrix Operations and Least Squares Methods
       • Importance Sampling
       • Expectation-Maximization
       • Markov Chain Monte Carlo (MCMC) Methods
       • Simulated Annealing
       • Gibbs Sampling

  9. Textbooks
     • "Introduction to Algorithms" (Strongly Recommended)
       ✓ by Cormen, Leiserson, Rivest, and Stein (CLRS)
       ✓ Third Edition, MIT Press, 2009
     • "Numerical Recipes" (Recommended)
       ✓ by Press, Teukolsky, Vetterling, and Flannery
       ✓ Third Edition, Cambridge University Press, 2007
     • "C++ Primer Plus" (Optional)
       ✓ by Stephen Prata
       ✓ Fifth Edition, Sams, 2004

  10. Assignments and Grading
     BIOSTAT615
       • Biweekly Assignments - 50%
       • Midterm Exam - 25%
       • Final Exam - 25%
     BIOSTAT815
       • Biweekly Assignments - 33%
       • Expected to solve extra problems on top of 615 assignments
       • Midterm Exam - 17%
       • Final Exam - 17%
       • Projects, to be completed in pairs - 33%

  12. Target Audience for BIOSTAT615
     • The target audience for BIOSTAT615 includes those with limited programming experience and a strong motivation to learn to program statistical methods in C++.
     • Students should be familiar with the concepts of probability distributions, hypothesis testing, and linear regression (BIOSTAT601 is not strictly required).
     • Even if you have no experience in C/C++/Java, you can follow the class, but you should expect to spend additional hours on the homework, especially during the first few weeks of the course.
