STA 103: Probability and Statistical Inference
Paul Marriott
paul@stat.duke.edu
223C, Old Chemistry
January 7, 2004

Syllabus
• Basic laws of probability: random events, independence and dependence, expectations, Bayes' theorem.
• Discrete and continuous random variables, density, and distribution functions. Binomial and normal models for observational data.
• Introductions to maximum likelihood estimation and Bayesian inference.
• One- and two-sample mean problems, simple linear regression, multiple linear regression with two explanatory variables.
• Applications in economics, the quantitative social sciences, and the natural sciences are emphasized.

This course is recommended for students majoring in economics and the natural or computational sciences. Prerequisites: MTH 31 or equivalent.

Web-page
• Details of the module can be found at http://www.stat.duke.edu/~paul/STA103
• You can download PDF copies of these lecture notes, as well as details of the lab sessions.
• The readings will also be posted there.
Book, Readings and Labs
• The course book is Mathematical Statistics and Data Analysis by John A. Rice.
• All of the mathematical aspects of the course are taken from this book.
• There will also be readings posted on the course web-page covering applications of the mathematical ideas.
• There are weekly labs (in 01 Old Chem.); details are posted on the web-site.

Evaluation and structure
• The mathematical aspects of the module are evaluated through four quizzes and, in part, through the midterm and the final. There are weekly assignments (self-evaluated), which are closely related to the quiz questions and which you are strongly advised to do.
• There will also be shorter quizzes and assignments in class, for which short notice will be given.
• Computational issues in the course are covered in the weekly lab sessions. The assignments given each week are to be handed in and will be marked.
• Ideas and contextual issues. This stream looks at questions such as: What are randomness and probability? What is the relevance of probability to science, economics and social science? What are statistical models and what can they do? The material for this is from the readings and class discussion. It will be evaluated via essay-style questions in the midterm and final and short quizzes in the lecture times.
• The breakdown of marks is: total of quizzes (45%), midterm (15%), labs (15%), final (25%).

Expectation for students
• In learning the mathematical material it is extremely important that you work in detail through the recommended questions yourselves.
• “Statistics is not a spectator sport”
• I expect approximately 6-7 hours of student effort per week outside lectures/labs.

Contact
• Contact hours for Paul Marriott are 3:00-4:00 on Tuesdays, Wednesdays and Thursdays, or by email appointment, at 223C Old Chemistry Building.
• The Teaching Assistants for STA103 are Janine Wilcox and Tyler McCormick.
• There is a Statistical Education and Consulting Center in 211 AB Old Chem where the TAs for this course and other TAs can be contacted.
Timetable
The timetable for lectures and labs can be found on the course website http://www.stat.duke.edu/~paul/STA103/

Mathematical Resources
This is a calculus-based statistics course. One stumbling block students often have is with purely technical mathematical issues. The following web-sites might be of help:
• Rice Virtual Lab in Statistics: http://www.ruf.rice.edu/~lane/rvls.html
• Mathworld (a free service for the mathematical community provided by Wolfram Research, makers of Mathematica, with additional support from the National Science Foundation): http://mathworld.wolfram.com/

Self-diagnostic quiz
The following short quiz does not contribute to your final mark, but is a self-diagnostic to evaluate your mathematical background. Please just give the answer to each of the following.
1. Sketch the curve $\exp(2x)$ for $x \in (-1, 1)$.
2. Differentiate $x^4$ with respect to $x$.
3. Differentiate $x \exp(x)$ with respect to $x$.
4. Integrate the function $x^3$ over the interval $(0, 1)$.
5. What is the mean value of the set $\{1, 2, -4, -2, 0\}$?
6. What is the median value of the set $\{1, 2, -4, -2, 0\}$?
7. Solve for $x$ if $x^2 - 2x + 1 = 0$.
8. Sketch the curve, for $x \in (-1, 1)$,
$$f(x) = \begin{cases} 1/2 & \text{if } x < 0 \\ 0 & \text{if } x \ge 0 \end{cases}$$
Some examples
To start thinking about ideas of probability and statistics, consider the following three examples, which we will return to throughout the course.
1. A model for movements in the stock market.
2. Testing the effectiveness of US corporate governance.
3. Who wins the Olympic games: economic resources and medal totals.

Modelling the stock market
The following plots show the movement of the Dow Jones index in 1970 and the histogram of daily changes.

[Figure: left panel, Dow Jones Index against Trading Day (1970); right panel, histogram of changes in Dow Jones Index, Relative Frequency against Change in price.]
Modelling the stock market
The following plots show how you might think of modelling the Dow Jones index. They compare the index with a simple random game played many times: win +1 with probability 0.5, lose 1 with probability 0.5.

[Figure: left panel, "Log Dow Jones Index, 1970", Log Price against Day; right panel, "Plot of Coin Toss Game", Winnings in Coin Tossing against Index.]

Issues
The following issues can be discussed:
• What does it mean to say the Dow Jones index is random?
• In what sense are the Dow Jones index and the random game the same, and in what ways are they different?
• Can we use random, or statistical, models to predict? If so, how can we evaluate how well they predict?
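The coin-tossing game described above is easy to simulate. The following is a minimal sketch in Python, not the code used to produce the plot in these notes; the function name `coin_toss_game`, the number of plays (250, roughly one trading year) and the seed are assumptions chosen for illustration.

```python
import random


def coin_toss_game(n_plays, seed=None):
    """Return the running winnings after each play of a fair +1/-1 game.

    Hypothetical helper for illustration: each play wins +1 with
    probability 0.5 and loses 1 with probability 0.5.
    """
    rng = random.Random(seed)  # private generator so results are repeatable
    winnings = 0
    path = []
    for _ in range(n_plays):
        winnings += 1 if rng.random() < 0.5 else -1
        path.append(winnings)
    return path


# 250 plays is an assumed stand-in for the number of trading days in a year.
path = coin_toss_game(250, seed=1970)
print("winnings after the first 10 plays:", path[:10])
print("final winnings:", path[-1])
```

Running the game with different seeds gives very different-looking paths, which is one way to start thinking about what "random" means for a series such as the Dow Jones index.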
Testing the effectiveness of US corporate governance
• A paper by Hunter (1997), which is one of the readings, uses empirical data to test whether corporate boards are inefficiently large.
• He uses publicly available data on the firm costs of rural distributors of electricity.
• He uses a linear regression model and statistical tests to check an economically based hypothesis empirically.

Issues
As with the previous example, the following issues can be discussed:
• What is a regression model, and when is it appropriate for testing hypotheses in Economics and in the Social Sciences?
• What can be learned from a statistical test?
• What is the nature of the data that we have in Economics, and how does it differ from that in the natural or experimental sciences?

Predicting the winners
• Prediction is very commonly done in Economics and in the business world, and we look at an example of prediction done using models typically used in Economics.
• Here it is applied to the problem of predicting which countries will win medals in the Olympics.
• The study by Bernard and Busse tried to predict, before the recent Sydney 2000 Olympics, how many medals each country would win.
Model fit

[Figure: "Prediction against Reality: number of medals", Actual medals (Observed) against predicted medals (Fitted).]

Issues
1. A regression model has been used. What does that mean, and how should such a model be selected?
2. How should you assess the predictive power of such a model?
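One common way to start assessing predictive power is to summarise how far the predicted values fall from the observed ones. The sketch below computes the root mean squared error and the correlation between predicted and actual values. It is only an illustration: the function names are hypothetical, the numbers are made up and are not the Bernard and Busse data, and these two summaries are assumptions about what a reasonable first check might look like rather than the method used in the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def correlation(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up medal counts for four hypothetical countries (illustration only).
predicted = [10, 20, 30, 40]
actual = [12, 18, 33, 37]

print("RMSE:", rmse(actual, predicted))
print("correlation:", correlation(predicted, actual))
```

A small error and a correlation close to 1 would correspond to points lying near the diagonal in a plot of actual against predicted medals.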
A brief overview of Inference
In order to start to understand the readings, let us take an informal look at some of the fundamental statistical ideas in this course.
1. What is a statistical model?
2. What is inference?
3. What does statistically significant mean?
4. What is a p-value?

What is a statistical model?
1. It is a mathematical and probability-based tool.
2. By making certain simplifying assumptions, the variability in a set of data is described using probability theory.
3. A model might be used to:
(a) describe the data
(b) predict future values
(c) test whether there is evidence in the data to support or contradict an economic theory
4. In Economics the majority of models are either regression models or time series models. We concentrate on regression in this course.

Example of a model
• In the paper “Who wins the Olympic games: economic resources and medal totals”, Andrew Bernard and Meghan Busse use a regression model both to explain which economic factors are primarily associated with winning Olympic medals and to predict the number of medals won in the future.
• The study tried to predict, before the recent Sydney 2000 Olympics, how many medals each country would win at Sydney, based on a selection of economic indicators.
• The model takes explanatory variables for each country and gives out a response (i.e. the number of medals to be won).

explanatory variables → response
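To make the idea "explanatory variables → response" concrete, the sketch below fits a straight line to a single explanatory variable by least squares and uses it to predict a new response. This is the simplest kind of regression model and is not the model used by Bernard and Busse; the function name `fit_line` and the data values are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope). Hypothetical helper for illustration.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: x is an explanatory variable, y is the response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
print("fitted intercept:", intercept)
print("fitted slope:", slope)

# Use the fitted model to predict the response at a new x value.
new_x = 6.0
print("predicted response at x = 6:", intercept + slope * new_x)
```

The fitted line describes the data, the last line predicts a future value, and asking whether the slope could plausibly be zero is an example of testing a theory, matching the three uses of a model listed above.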