
STAT 113: TOPIC OUTLINE (FINAL EXAM)
COLIN REIMER DAWSON, FALL 2015



Date: December 11, 2015.

The final exam will cover the following six areas in roughly equal proportion. For example, there might be one multi-part question for each major heading (possibly with some crossover, where it makes sense).

1. Research Design / Describing Samples

Questions in this category will focus on issues of research design (sampling procedures, confounding, etc.) as well as the kinds of descriptive statistics and visualizations that we covered prior to Exam 1, and which are treated in Chapters 1-2 of the textbook. Key concepts are listed in the Exam 1 topic outline, and include ideas related to:

(1) sampling and study structure
    • experimental vs. observational studies and what we can glean from them
    • confounding
    • sampling and sources of sampling bias
(2) structure of data
    • identifying cases, variables, and types of variables
(3) descriptive measures
    • central tendency (mean, median)
    • variability (range, IQR, variance, standard deviation)
    • relationships (correlation, regression models)
(4) descriptive visualizations
    • bar plots
    • histograms
    • box-and-whisker plots
    • scatterplots
(5) prediction
    • interpretation and use of linear regression models
    • interpretation of residuals
    • checking residuals to diagnose problems with the model

2. Inference Foundations

Questions in this category will focus on “big picture” issues about statistical inference, and in particular, about confidence intervals and hypothesis tests. These topics are covered mainly in Ch. 3-6 of the textbook and on our last exam. They include:

(1) Distinguishing populations and samples
    • parameters vs. statistics
    • identifying appropriate populations
    • variability across different random samples that we could have gotten
    • What is a sampling distribution, and why do we care?
(2) Use and meaning of a confidence interval
    • Why do we have uncertainty in our estimates?
    • What does the margin of error tell us, and how is it affected by sample size, confidence level, population variability...
    • What does the confidence level mean?
(3) Use and interpretation of a hypothesis test
    • Why do we need to do tests, as opposed to just looking at our data?
    • What do null and alternative hypotheses make statements about?
    • What is the conceptual criterion to say that we have evidence for a research hypothesis?
    • Measuring consistency/inconsistency with a null hypothesis
    • Logic and interpretation of P-values
    • What does it mean for a result to be “statistically significant”?
    • What can we say if we reject H₀? What can we say if we fail to reject H₀?
    • Kinds of statistical errors (Type I / Type II), what they mean, and what kinds of things affect how likely they are to occur
(4) Logic of bootstrapping and randomization
    • What do we do when we construct a “bootstrap” sample?
    • What assumptions are randomization procedures based on?
    • Difference between sample size and number of samples
    • What are the individual points in bootstrap / randomization distributions?
    • How can we use bootstrap / randomization distributions to construct confidence intervals / compute P-values?
(5) Common structure of most test statistics / confidence intervals
    • Standardized test statistics such as z and t statistics measure the number of ________ away from ________ that the ________ is.
    • Relationship between the magnitude of a test statistic and the P-value

3. Inference for Correlation and Regression

Questions in this category will focus on making inferences (confidence intervals, hypothesis tests) about relationships between quantitative variables, in particular correlation and the slope of a regression line; and also on (a) making estimates with a margin of error about the expected/predicted/mean value of a y variable in a cross-section of a population sharing a particular x value, and (b) making predictions with a margin of error about the value of a y variable for a particular case with a particular x value. This is the material from Chapter 9 of the textbook, and from the class slides and handouts from 11/23 and 11/24. Particular topics include:

(1) The difference between a population correlation (ρ) and a sample correlation (r)
    • How sample correlations vary across samples around a population correlation
    • How to simulate random samples assuming no association/correlation
(2) The difference between a population regression line-of-best-fit and a sample regression line-of-best-fit
    • How sample lines vary around a population line
    • How to simulate random samples assuming that x has no power to predict y
(3) t-tests for the population correlation and the population slope of the line of best fit
    • How to compute the test statistic (using the appropriate standard error)
    • Conditions that must be satisfied in order to use a t-distribution
(4) The “coefficient of determination” (R²) for a regression model
    • Interpretation as a proportion
    • Relationship to correlation

4. Goodness of Fit and Association Tests for Categorical Variables

Questions in this category will focus on hypothesis tests when the response variable is categorical and may have more than two levels (so we can’t simply do a single-proportion test), and/or when there is also an explanatory variable that may have more than two levels (that is, there are more than two groups). This is the topic of Ch. 7 in the textbook, and the classes from 11/25-11/2. Specific topics include:

(1) The distinction between expected (“long run”) category counts/proportions of a categorical variable and the particular distribution of a single sample
(2) Schemes to construct simulated random samples of one categorical variable assuming particular long-run proportions
(3) Ways to measure “distance” between observed and expected outcomes, including the χ² statistic
    • How are the different parts computed, and what do they represent?
    • What is the role/purpose of the normalization (denominator)?
    • How can we use individual terms in the sum to investigate the degree of discrepancy for individual categories?
(4) Finding expected proportions for combinations of two categorical values, assuming the individual variable proportions are fixed
(5) Simulating random samples assuming fixed proportions for each variable separately, and assuming no relationship (constant “conditional distributions”)
(6) The concept of “degrees of freedom” in a set of random proportions
(7) What kinds of χ² values represent discrepancies from H₀
(8) How to use a χ² distribution to find a P-value / measure how “unexpected” a sample of counts is
(9) Relationship between the χ² goodness-of-fit test and the z-test of a single proportion when there are only two categories (i.e., there is a binary response variable)
(10) Relationship between a χ² test of association and a z-test of a difference of proportions when there are two binary variables

5. Comparing Multiple Means

Questions in this category will focus on hypothesis tests when the response variable is quantitative and the explanatory variable is categorical but may have more than two levels (i.e., there are more than two groups, and we want to compare the typical outcomes of a quantitative variable). This is the material from Chapter 8 of the textbook, and from the classes on 12/4, 12/7 and 12/8. Specific topics are:

(1) Distinction between different sample means and different population means
(2) Why is it a good idea to do one overall test first, instead of lots of separate tests of pairs?
(3) Scheme to simulate random samples assuming a set of means is equal
(4) Ways to measure “distance among” more than two means
    • Simple things, good for randomization
    • F-statistic, good for a theoretical distribution-based test
(5) Idea of dividing variance into within groups vs. between groups
    • Why does it make sense to normalize by the within-groups variance, when interested in variation across means?
(6) Meaning of the components of the ANOVA table
(7) After a significant F-test
    • Confidence intervals of individual means
    • Confidence intervals and tests of differences of pairs
    • Using the pooled MS Within as the common estimate of within-group variance
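Sections 4 and 5 both reduce a comparison across many categories or groups to a single “distance” number. The Python sketch below uses only the standard library. It computes three things:

- the χ² statistic, as the sum of (observed − expected)²/expected;
- the one-way ANOVA F-statistic, as MS Between / MS Within;
- a randomization P-value for F.

The P-value comes from shuffling all values across the groups while keeping the group sizes fixed. This is one common way to simulate samples under the null hypothesis that the means are equal. The function names, the data, and the number of shuffles are illustrative assumptions, not code from the course.

```python
import random
from statistics import mean


def chi_square_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def f_statistic(groups):
    """One-way ANOVA F = MS Between / MS Within for a list of samples."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of each value around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)  # df Between = k - 1
    ms_within = ss_within / (n - k)    # df Within = n - k (pooled variance)
    return ms_between / ms_within


def randomization_p_value(groups, reps=5000, seed=0):
    """Shuffle all values across groups (keeping group sizes) to mimic H0,
    and return the fraction of shuffles whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Made-up counts and measurements, for illustration only.
    print(chi_square_statistic([30, 20, 50], [25, 25, 50]))  # 2.0
    groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(f_statistic(groups))  # 27.0
    print(randomization_p_value(groups))
```

Dividing by MS Within is what makes F comparable across problems. The same spread of group means counts as strong evidence when the values inside each group are tightly clustered, and as weak evidence when they are widely scattered.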

6. Practical Integration

The last type of question will ask you to pull together themes from across the different sections of the course, focusing on some common principles (see also “Inference Foundations”), perhaps asking you to take a research question and describe from start to finish how you would approach it.

(1) Can you identify what parameter(s) make the most sense to focus on? Considerations:
    • Categorical vs. quantitative response variable?
      – Proportion vs. mean
    • Categorical vs. quantitative vs. no explanatory variable?
      – Differences between/among groups vs. correlation vs. inference about a single proportion/mean
    • One, two, or more groups?
      – Inference about a single proportion/mean vs. difference of proportions/means vs. more complex scenarios (chi-square/ANOVA)
(2) Can you interpret the conclusion of a hypothesis test / confidence interval in real-world terms? Can you distinguish between scenarios when causal conclusions are / are not justified?
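The checklist in item (1) of Practical Integration can be written as a small lookup. The Python sketch below is a hypothetical helper: the name `choose_procedure`, its parameters, and its string labels are illustrative and do not come from the course materials. It encodes only the cases named in this outline. For combinations the outline does not cover, such as a categorical response with a quantitative explanatory variable, it raises an error.

```python
def choose_procedure(response, explanatory=None, n_groups=2, response_levels=2):
    """Hypothetical helper: suggest a procedure from the variable types.

    response:        "categorical" or "quantitative"
    explanatory:     None, "categorical", or "quantitative"
    n_groups:        number of groups (levels of a categorical explanatory variable)
    response_levels: number of levels of a categorical response variable
    """
    if response not in ("categorical", "quantitative"):
        raise ValueError("response must be 'categorical' or 'quantitative'")
    if explanatory not in (None, "categorical", "quantitative"):
        raise ValueError("explanatory must be None, 'categorical', or 'quantitative'")
    if explanatory == "categorical" and n_groups < 2:
        raise ValueError("a categorical explanatory variable needs at least two groups")

    if explanatory == "quantitative":
        if response == "quantitative":
            return "correlation / regression slope (t-test)"
        raise ValueError("categorical response with quantitative explanatory: not covered")

    if response == "categorical":
        # More than two response levels: a single-proportion test no longer applies.
        if response_levels > 2:
            if explanatory is None:
                return "chi-square goodness-of-fit test"
            return "chi-square test of association"
        if explanatory is None:
            return "inference about a single proportion"
        if n_groups == 2:
            return "difference of proportions"
        return "chi-square test of association"

    # Quantitative response.
    if explanatory is None:
        return "inference about a single mean"
    if n_groups == 2:
        return "difference of means"
    return "ANOVA (F-test)"


if __name__ == "__main__":
    print(choose_procedure("quantitative", "categorical", n_groups=3))  # ANOVA (F-test)
    print(choose_procedure("categorical", response_levels=4))  # chi-square goodness-of-fit test
```

For a binary response compared across two groups, the sketch returns a difference of proportions. A χ² test of association would also work there, and item (10) of Section 4 concerns how the two relate.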
