
Lecture 18: Review Lecture
Ani Manichaikul (amanicha@jhsph.edu)
15 May 2007

Types of Biostatistics
1) Descriptive Statistics
- Exploratory Data Analysis
  - often not in the literature
- Summaries
  - "Table 1" in a paper
- Goal: visualize relationships, generate hypotheses

Types of Biostatistics
2) Inferential Statistics
- Confirmatory Data Analysis
  - Methods section of a paper
- Goal: quantify relationships, test hypotheses

Approach to Modeling
A general approach for most statistical modeling is to:
- Define the Population of Interest
- State the Scientific Questions & Underlying Theories
- Describe and Explore the Observed Data
- Define the Model
  - Probability part (models the randomness / noise)
  - Systematic part (models the expectation / signal)

Approach to Modeling
- Estimate the Parameters in the Model
  - Fit the Model to the Observed Data
- Make Inferences about Covariates
- Check the Validity of the Model
  - Verify the Model Assumptions
  - Re-define, Re-fit, and Re-check the Model if necessary
- Interpret the results of the Analysis in terms of the Scientific Questions of Interest

Stem-and-Leaf Plots
- Age in years (10 observations): 25, 26, 29, 32, 35, 36, 38, 44, 49, 51

  Age Interval   Observations
  20-29          5 6 9
  30-39          2 5 6 8
  40-49          4 9
  50-59          1
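The display can be rebuilt mechanically. A minimal Python sketch, assuming two-digit integer ages with the tens digit as the stem and the units digit as the leaf (the function name `stem_and_leaf` is illustrative, not from the lecture):

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group integer values by tens digit (stem); the units digit is the leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    # Label each stem by its decade interval, e.g. stem 2 -> "20-29"
    return {f"{s * 10}-{s * 10 + 9}": leaves for s, leaves in sorted(stems.items())}


ages = [25, 26, 29, 32, 35, 36, 38, 44, 49, 51]
for interval, leaves in stem_and_leaf(ages).items():
    print(interval, " ".join(str(leaf) for leaf in leaves))
```

Stems with no observations are skipped in this sketch; a hand-drawn display would usually keep an empty row for them.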

Grouping: Frequency Distribution Tables
- Shows the number of observations for each range of data
- Intervals can be chosen in ways similar to stem-and-leaf displays

  Age Interval   Frequency
  20-29          3
  30-39          4
  40-49          2
  50-59          1
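The same grouping can be produced as a frequency table, here also reporting relative frequencies (what a relative-frequency histogram plots). This sketch assumes fixed-width intervals starting at 20 and that every value is at least `start`; `frequency_table` and its parameters are illustrative names.

```python
from collections import Counter


def frequency_table(values, width=10, start=20):
    """Count observations in intervals start..start+width-1, start+width..start+2*width-1, ..."""
    counts = Counter((v - start) // width for v in values)
    n = len(values)
    rows = []
    for k in range(max(counts) + 1):
        lo = start + k * width
        hi = lo + width - 1
        freq = counts.get(k, 0)  # empty intervals get a frequency of 0
        rows.append((f"{lo}-{hi}", freq, freq / n))
    return rows


ages = [25, 26, 29, 32, 35, 36, 38, 44, 49, 51]
for interval, freq, rel in frequency_table(ages):
    print(f"{interval}  {freq}  {rel:.1f}")
```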

Histograms
- Pictures of the frequency or relative frequency distribution
[Figure: Histogram of Age; x-axis: Age Category, y-axis: Frequency]

Box-and-Whisker Plots
[Figure: Box Plot of Age; y-axis: Age in Years]
- IQR = Q3 − Q1 = 44 − 29 = 15
- Upper fence = Q3 + 1.5 × IQR = 44 + 1.5 × 15 = 66.5
- Lower fence = Q1 − 1.5 × IQR = 29 − 1.5 × 15 = 6.5
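The quartiles and fences above can be reproduced in a few lines. This sketch takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data (the middle value is excluded when n is odd), which gives the 29 and 44 on the slide; statistical packages use several other quartile definitions and may report slightly different values. `tukey_fences` is an illustrative name.

```python
from statistics import median


def tukey_fences(values, k=1.5):
    """Return Q1, Q3, IQR and the lower/upper fences Q1 - k*IQR, Q3 + k*IQR."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]                   # lower half of the data
    upper = xs[half + len(xs) % 2:]     # upper half (skips the median if n is odd)
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    return q1, q3, iqr, q1 - k * iqr, q3 + k * iqr


ages = [25, 26, 29, 32, 35, 36, 38, 44, 49, 51]
q1, q3, iqr, lower_fence, upper_fence = tukey_fences(ages)
outliers = [a for a in ages if a < lower_fence or a > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```

No age falls outside the interval [6.5, 66.5], so this box plot has no outliers.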

2 Continuous Variables
- Scatterplots visually display the relationship between two continuous variables
[Figure: Scatterplot, Age by Height in cm; x-axis: Age in Years, y-axis: Height in Centimeters]

Why is the power of a test important?
- Power indicates the chance of finding a "significant" difference when there really is one
- Low power: likely to obtain non-significant results even when true differences exist
- High power is desirable!
- Low power is usually caused by a small sample size
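To see how power grows with sample size, here is a sketch for a two-sided one-sample z test with known σ (an assumption; the slide does not name a particular test). The true difference `delta = 0.5` and `sigma = 2.0` are hypothetical values.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z test of H0: mu = mu0 when the true mean is mu0 + delta."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    # Probability that the test statistic falls in either tail of the rejection region
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


for n in (10, 25, 50, 100):
    print(n, round(z_test_power(delta=0.5, sigma=2.0, n=n), 3))
```

Setting `delta = 0` returns α itself: when H0 is true, the chance of rejecting is just the Type I error rate.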

We’re not always right

Errors in Hypothesis Testing: α
- α = P(Type I error) = P(reject H0 | H0 is true)
- Aim: keep the Type I error small by specifying a small rejection region
- α is set before performing a test, usually at 0.05

Errors in Hypothesis Testing: β
- β = P(Type II error) = P(fail to reject H0 | H0 is false); power = 1 − β
- Aim: keep the Type II error small and thus the power high

β: Probability of Type II Error
- The value of β is usually unknown, since it depends on a specified alternative value
- β depends on the sample size and on α
- Before data collection, scientists decide on
  - the test they will perform
  - α
  - the desired β
- They use this information to choose the sample size
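The sample-size decision can be sketched with the usual normal-approximation formula for comparing two means with equal group sizes and a common, known σ: $n = 2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2 / \delta^2$ per group. The choice of test and the values δ = 5 and σ = 10 below are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, beta=0.20):
    """Per-group n for a two-sided two-sample z test of equal means (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the chosen alpha
    z_beta = z(1 - beta)        # quantile matching the desired power 1 - beta
    n = 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return ceil(n)  # round up to a whole number of subjects


print(n_per_group(delta=5, sigma=10))              # 80% power
print(n_per_group(delta=5, sigma=10, beta=0.10))   # 90% power
```

Asking for more power (a smaller β) or a smaller detectable difference δ increases the required n.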

P-Values
- Definition: the p-value for a hypothesis test is the probability of obtaining, by chance alone when H0 is true, a value of the test statistic as extreme as or more extreme (in the appropriate direction) than the one actually observed.

Steps of Hypothesis Testing
- Define the null hypothesis, H0.
- Define the alternative hypothesis, Ha, where Ha is usually of the form "not H0".
- Choose the Type I error rate, α, usually 0.05.
- Calculate the test statistic.
- Calculate the p-value.
- If the p-value is less than α, reject H0. Otherwise, fail to reject H0.
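The steps map directly onto code. A minimal sketch of a two-sided one-sample z test, assuming σ is known (when σ is estimated from the data, a t test would be used instead); the data, μ0 = 5 and σ = 0.5 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(data, mu0, sigma, alpha=0.05):
    """Two-sided z test of H0: mu = mu0 vs Ha: mu != mu0, assuming sigma is known."""
    # Test statistic
    z = (mean(data) - mu0) / (sigma / sqrt(len(data)))
    # Two-sided p-value: P(|Z| >= |z|) when H0 is true
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision at level alpha
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision


data = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.3]
z, p, decision = one_sample_z_test(data, mu0=5.0, sigma=0.5)
print(round(z, 3), round(p, 4), decision)
```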

Why use linear regression?
- Linear regression is very powerful. It can be used for many things:
  - Binary X
  - Continuous X
  - Categorical X
  - Adjustment for confounding
  - Interaction
  - Curved relationships between X and Y

SLR: $Y = \beta_0 + \beta_1 X_1 + \varepsilon$
- Linear regression is used for continuous outcome variables
- $\beta_0$: mean outcome when $X = 0$ (Center!)
- Binary X = "dummy variable" for group
  - $\beta_1$: mean difference in outcome between groups
- Continuous X
  - $\beta_1$: mean difference in outcome corresponding to a 1-unit increase in X
  - Center X to give meaning to $\beta_0$
- Test $\beta_1 = 0$ in the population
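With a 0/1 dummy variable, the least-squares estimates reduce to group means: $\hat\beta_0$ is the mean outcome in group 0 and $\hat\beta_1$ is the difference in means. A small sketch with hypothetical data shows this, together with the effect of centering a continuous X. The helper `slr` is an illustrative name.

```python
from statistics import mean


def slr(x, y):
    """Least-squares intercept and slope for Y = b0 + b1*X."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical data: group is a 0/1 dummy variable, y a continuous outcome
group = [0, 0, 0, 1, 1, 1]
y = [10.0, 12.0, 14.0, 15.0, 17.0, 19.0]
b0, b1 = slr(group, y)  # b0 = mean of group 0, b1 = difference in group means
print(b0, b1)

# Centering a continuous X: the intercept becomes the mean outcome at the average X
age = [25, 30, 35, 40, 45, 50]
age_c = [a - mean(age) for a in age]
b0_c, b1_c = slr(age_c, y)
print(b0_c, b1_c)
```

Centering changes only the intercept; the slope is the same whether or not X is centered.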

Assumptions of Linear Regression
- L: Linear relationship
- I: Independent observations
- N: Normally distributed around the line
- E: Equal variance across X's

In Simple Linear Regression
- In simple linear regression (SLR):
  - One Predictor / Covariate / Explanatory Variable: X
- In multiple linear regression (MLR):
  - Same Assumptions as SLR (i.e., L.I.N.E.), but:
  - More than one Covariate: $X_1, X_2, X_3, \ldots, X_p$
- Model: $Y \sim N(\mu, \sigma^2)$
  - $\mu = E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_p X_p$
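For several covariates, the least-squares estimates solve the normal equations $(X^\top X)\hat\beta = X^\top y$. A self-contained sketch for illustration only; in practice a library routine such as `numpy.linalg.lstsq` is numerically safer. The function name `ols` and the noise-free test data are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp.

    X is a list of rows [x1, ..., xp]; a column of 1s for the intercept is added here.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the row with the largest entry in this column up
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical, noise-free data generated from y = 1 + 2*x1 - 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1, 3, -2, 0, -4]
print([round(b, 6) for b in ols(X, y)])
```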

Regression Methods

Nested models
- One model is nested within another if the parent model contains one set of variables and the extended model contains all of the original variables plus one or more additional variables.

Difference in assessing variables: "nested models"
- Other predictor(s)
  - assess with a t test if a single variable defines the predictor
  - assess with an F test (today) if two or more variables are needed to define the predictor
- Potential confounder(s)
  - compare the CI of the primary predictor across the two models to see whether its new estimate is significantly different

The F test
- $H_0$: all new $\beta$'s $= 0$ in the population
- $H_A$: at least one new $\beta$ is not 0 in the population

$$F_{obs} = \frac{(RSS_{parent} - RSS_{nested}) \,/\, (\text{\# of new variables added})}{RSS_{nested} \,/\, (\text{residual df})_{nested}} = \frac{(69.6 - 49.8)/2}{49.8/22} \approx 4.4$$

What is $F_{cr}$?
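The F statistic above, and the critical value the slide asks for, can be computed directly. With 2 new variables the numerator df is 2, and the F(2, d) distribution has a closed-form upper tail, $P(F > f) = (1 + 2f/d)^{-d/2}$, which this sketch relies on. The closed form does not apply to other numerator df; for those, `scipy.stats.f.sf` and `scipy.stats.f.ppf` give the p-value and critical value. Function names are illustrative.

```python
def nested_f_test(rss_parent, rss_nested, n_new, resid_df_nested):
    """F statistic comparing the parent model with the extended model that adds n_new variables."""
    return ((rss_parent - rss_nested) / n_new) / (rss_nested / resid_df_nested)


def f2_upper_tail(f, d2):
    """P(F > f) for F(2, d2). This closed form holds ONLY for numerator df = 2."""
    return (1 + 2 * f / d2) ** (-d2 / 2)


def f2_critical(alpha, d2):
    """F_cr with P(F(2, d2) > F_cr) = alpha, obtained by inverting f2_upper_tail."""
    return d2 / 2 * (alpha ** (-2 / d2) - 1)


f_obs = nested_f_test(rss_parent=69.6, rss_nested=49.8, n_new=2, resid_df_nested=22)
f_cr = f2_critical(alpha=0.05, d2=22)
p = f2_upper_tail(f_obs, d2=22)
print(round(f_obs, 2), round(f_cr, 2), round(p, 3))
```

This gives $F_{obs} \approx 4.37$ against $F_{cr} \approx 3.44$ for (2, 22) df at α = 0.05, so H0 is rejected (p ≈ 0.025).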

The F test: notes
- The F test can be used to compare any two nested models
- If only one variable is added, it's easier to compare the models using the t test for that variable
  - $t^2 = F$ if one variable is added
- For any regression, the estimated variance of the residuals is RSS/(residual df)

Nested Models
- Comparing nested models
  - 1 new variable: use the t test for that variable
  - 2+ new variables: use the F test
- Categorical predictor
  - set one group as the reference
  - create a dummy variable for each of the other groups
  - include/exclude all of the dummy variables together
  - evaluate the categorical predictor with an F test
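Dummy coding for a categorical predictor can be sketched as follows: a k-level predictor becomes k − 1 indicator columns, and the reference group is the one whose indicators are all 0. The function `dummy_code` and the group labels are hypothetical.

```python
def dummy_code(values, reference):
    """Return {level: [0/1 indicator per observation]} for every level except the reference."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical categorical predictor with three groups; "A" is the reference group
group = ["A", "B", "C", "B", "A", "C"]
dummies = dummy_code(group, reference="A")
print(dummies)
```

Because the categorical predictor is defined by all k − 1 dummies jointly, they are added to or dropped from the model together, and the comparison uses the F test.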

Effect Modification
- In linear regression, effect modification is a way of allowing the association between the primary predictor and the outcome to change with the level of another predictor.
- If this other predictor is binary, the result is a graph in which the two lines (one for each group) are no longer parallel.
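Effect modification by a binary variable G is commonly fit with an interaction term, $E(Y) = \beta_0 + \beta_1 X + \beta_2 G + \beta_3 XG$. The sketch below, with hypothetical coefficients, recovers each group's line: the slopes differ by $\beta_3$, so the lines are parallel only when $\beta_3 = 0$.

```python
def group_lines(b0, b1, b2, b3):
    """(intercept, slope) of X for each level of a binary modifier G in
    E(Y) = b0 + b1*X + b2*G + b3*X*G."""
    return {
        0: (b0, b1),            # G = 0: intercept b0, slope b1
        1: (b0 + b2, b1 + b3),  # G = 1: intercept shifted by b2, slope shifted by b3
    }


# Hypothetical coefficient values, for illustration only
lines = group_lines(b0=50.0, b1=2.0, b2=5.0, b3=-0.5)
print(lines)
```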

Splines and Quadratic Terms
- Splines are used to allow the regression line to bend
  - the breakpoint is arbitrary and decided graphically or by hypothesis
  - the actual slope above and below the breakpoint is usually of more interest than the coefficient for the spline (i.e., the change in slope)
- A quadratic term allows for curvature in the model
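A linear spline with one breakpoint k can be written $E(Y) = \beta_0 + \beta_1 X + \beta_2 (X - k)_+$, where $(X - k)_+$ is 0 below k and $X - k$ above it. The slope is $\beta_1$ below the breakpoint and $\beta_1 + \beta_2$ above it, so $\beta_2$ is the change in slope. The coefficients and breakpoint below are hypothetical.

```python
def spline_term(x, knot):
    """Linear spline basis (x - knot)_+ : zero below the breakpoint, x - knot above it."""
    return max(x - knot, 0.0)


def spline_mean(x, b0, b1, b2, knot):
    """E(Y) = b0 + b1*x + b2*(x - knot)_+ ; slope b1 below the knot, b1 + b2 above it."""
    return b0 + b1 * x + b2 * spline_term(x, knot)


# Hypothetical coefficients and breakpoint, for illustration only
b0, b1, b2, knot = 10.0, 1.0, -0.6, 40.0
slope_below = spline_mean(30.0, b0, b1, b2, knot) - spline_mean(29.0, b0, b1, b2, knot)
slope_above = spline_mean(51.0, b0, b1, b2, knot) - spline_mean(50.0, b0, b1, b2, knot)
print(round(slope_below, 6), round(slope_above, 6))
```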

Logistic regression
- For binary outcomes
- Models the log odds of the probability, which we also call the logit
- Baseline (intercept) term interpreted as a log odds
- Other coefficients are log odds ratios

Logistic regression model

$$\log\left[\frac{P(\text{relief} \mid Tx)}{P(\text{no relief} \mid Tx)}\right] = \log[\text{odds}(\text{Relief} \mid Tx)] = \beta_0 + \beta_1 Tx$$

where $Tx = 0$ if Placebo and $Tx = 1$ if Drug

Then…
- $\log[\text{odds}(\text{Relief} \mid \text{Drug})] = \beta_0 + \beta_1$
- $\log[\text{odds}(\text{Relief} \mid \text{Placebo})] = \beta_0$
- $\log[\text{odds}(R \mid D)] - \log[\text{odds}(R \mid P)] = \beta_1$

And…

$$\log\left[\frac{\text{odds}(R \mid D)}{\text{odds}(R \mid P)}\right] = \beta_1$$

- Thus: $OR = \exp(\beta_1) = e^{\beta_1}$!!
- So: $\exp(\beta_1)$ = odds ratio of relief for patients taking the Drug vs. patients taking the Placebo.

Logistic Regression

  Logit estimates                                 Number of obs =     70
                                                  LR chi2(1)    =   2.83
                                                  Prob > chi2   = 0.0926
  Log likelihood = -46.99169                      Pseudo R2     = 0.0292
  ------------------------------------------------------------------------------
             y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
          drug |   .8137752   .4889211     1.66   0.096    -.1444926    1.772043
         _cons |  -.2876821    .341565    -0.84   0.400    -.9571372    .3817731
  ------------------------------------------------------------------------------

Estimates: $\log[\widehat{\text{odds}}(\text{relief})] = \hat\beta_0 + \hat\beta_1\,\text{Drug} = -0.288 + 0.814\,(\text{Drug})$

Therefore: $\widehat{OR} = \exp(0.814) = 2.26$!
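Exponentiating the `drug` row of the output gives the odds ratio and, by exponentiating the interval endpoints, an approximate 95% CI for the odds ratio. A short sketch using the numbers printed above:

```python
from math import exp

# Coefficient and 95% CI for `drug`, copied from the logit output above
coef, ci_low, ci_high = 0.8137752, -0.1444926, 1.772043

odds_ratio = exp(coef)                  # OR = exp(beta_1)
or_ci = (exp(ci_low), exp(ci_high))     # CI on the log-odds scale, mapped to the OR scale
print(round(odds_ratio, 2), tuple(round(v, 2) for v in or_ci))
```

The interval for the odds ratio includes 1, consistent with the p-value of 0.096 for `drug`.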

Adding other variables
- What if Pr(relief) is a function of Drug or Placebo AND Age?
- We could easily include age in a model such as:
  $\log[\text{odds}(\text{relief})] = \beta_0 + \beta_1\,\text{Drug} + \beta_2\,\text{Age}$

Logistic Regression
- As in MLR, we can include many additional covariates.
- For a logistic regression model with p predictors:
  $\log[\text{odds}(Y = 1)] = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$
  where $\text{odds}(Y = 1) = \dfrac{\Pr(Y = 1)}{1 - \Pr(Y = 1)} = \dfrac{\Pr(Y = 1)}{\Pr(Y = 0)}$

Types of interpretation
- $\beta_0 + \beta_1 = \ln(\text{odds})$ (for $X = 1$)
- $\beta_1$ = difference in log odds
- $e^{\beta_0 + \beta_1}$ = odds (for $X = 1$)
- $e^{\beta_1}$ = odds ratio
- But we started with $P(Y = 1)$. Can we find that?

More useful math
- $\text{odds} = \dfrac{\text{probability}}{1 - \text{probability}}$
- $\text{probability} = \dfrac{\text{odds}}{1 + \text{odds}}$
- So the probability for $X = 1$ is $\dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
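Applying these formulas to the relief example turns the fitted log odds into estimated probabilities of relief in each arm. A sketch using the coefficients from the logit output for this example; `prob_from_logit` is an illustrative name.

```python
from math import exp


def prob_from_logit(log_odds):
    """probability = odds / (1 + odds), where odds = exp(log odds)."""
    odds = exp(log_odds)
    return odds / (1 + odds)


# Coefficients from the logit output for the relief example
b0, b1 = -0.2876821, 0.8137752
p_placebo = prob_from_logit(b0)       # X = 0 (Placebo)
p_drug = prob_from_logit(b0 + b1)     # X = 1 (Drug)
print(round(p_placebo, 3), round(p_drug, 3))
```

Converting both probabilities back to odds and taking their ratio returns $e^{\beta_1}$, the odds ratio from the output.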
