Linear Models II



  1. Linear Models II: Design of Experiments, Analysis of Variance and Multiple Regression
     http://bcf.isb-sib.ch/teaching/introStat/
     EMBnet Course – Introduction to Statistics for Biologists, Jan 2009

     The research process
     • Scientific question of interest
     • Decision on what data to collect (and how)
     • Collection and analysis of data
     • Conclusions, generalization
     • Communication and dissemination of results

  2. Generic question: Does a ‘treatment’ have an ‘effect’?
     Examples:
     • Does wine prevent cancer?
     • Does smoking cause lung cancer?
     • Does milk reduce osteoporosis?
     • Does physical exercise slow arteriosclerosis?
     • Does statin treatment lower blood lipids?

     Experimental design: why do we care?
     • Poor design costs:
       - time, money, ethical considerations
     • To ensure that relevant data are collected and can be analyzed to test the scientific hypothesis or question of interest:
       - Decide in advance how the data will be analyzed
       - ‘Designing the experiment’ = ‘Planning the analysis’
     • The design is about the science (biology)

  3. Planning an experiment
     • What measurements to make (response)
     • What conditions to study (treatments)
     • What experimental material to use (units)

     A “good” experiment
     • tests what you want to test / estimates the effects you are interested in
     • controls for everything else (exclusion, blocking, adjustment) to avoid bias and confounding

     Example: cancer diagnosis
     • Blood samples were taken from 25 cancer patients and from a control group of 25 healthy people.
     • The healthy people were a consecutive series who came to the hospital as blood donors.
     • The laboratory analyzed the “positive” samples in March and the “negative” samples in April.
     • What can go wrong in this study?

  4. Example: agricultural experiment
     • Response: crop yield
     • Treatments: two different sorts of potatoes are compared
     • Units: two pieces of land can be used
     [Figure: Field 1, Field 2]

     Example: two blocks
     [Figure: Type A in Block 1, Type B in Block 2]
     Is this a good design?

  5. Blocking and replication
     [Figure: Block 1, Block 2]
     5 replicates of each treatment in the first block and 8 in the second.
     • Replication is needed to estimate the scale of random effects and measurement errors.
     • Fields are subdivided into smaller areas; the choice of potato sort to be planted is randomized inside each of the two blocks.

     Addressing the question
     • A basic means of addressing this type of question is to compare two groups of study subjects:
       - Control group: provides a baseline for comparison
       - Treatment group: the group receiving the ‘treatment’
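
The randomization inside blocks described under "Blocking and replication" can be sketched in a few lines of Python. This is only an illustration, not the course's own code: the function name `randomize_within_blocks`, the treatment labels "A" and "B", and the seed are assumptions made here. Each block receives the stated number of replicates of every potato sort, and the order of plots is shuffled separately within each block.

```python
import random

def randomize_within_blocks(replicates_per_block, treatments, seed=None):
    """Return {block: list of treatments, one per plot}, shuffled within each block."""
    rng = random.Random(seed)  # a seed makes the layout reproducible
    layout = {}
    for block, n_reps in replicates_per_block.items():
        # n_reps plots for every treatment, so treatments are balanced per block
        plots = [t for t in treatments for _ in range(n_reps)]
        rng.shuffle(plots)  # randomization happens inside the block only
        layout[block] = plots
    return layout

# 5 replicates per potato sort in Block 1 and 8 in Block 2, as on the slide.
layout = randomize_within_blocks({"Block 1": 5, "Block 2": 8}, ["A", "B"], seed=2009)
for block, plots in layout.items():
    print(block, " ".join(plots))
```

Because both sorts appear equally often in each field, a difference between fields cannot be mistaken for a difference between potato sorts.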

  6. Types of variability
     • Planned systematic (difference between the conditions, wanted)
     • Chance variation (can be handled with statistical models)
     • Unplanned systematic differences (NOT wanted)
       - Can bias results
       - Can only be corrected for if they can be included in the model (adjusting)
       - e.g. time of measurements

     Confounding factors
     • Ideally, the treatment and control groups are exactly alike in all respects (except for group membership)
     • A confounding factor (or confounder) is associated with both the group membership and the response
     • Example: strong association of gender and lung cancer, confounded by smoking
     • Unbalanced factors that are not associated with the response are not confounding
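
The gender, smoking and lung cancer example can be made concrete with a small table. The counts below are invented purely for illustration and are not real epidemiological data; the function names are likewise assumptions. In this made-up population the cancer rate depends only on smoking, yet men smoke more often than women, so a crude comparison by gender shows a difference that disappears once smokers and non-smokers are looked at separately.

```python
# Hypothetical counts, invented for illustration only (NOT real data).
# Key: (gender, smoker) -> (number of people, number with lung cancer)
counts = {
    ("men", True): (60, 12),
    ("men", False): (40, 2),
    ("women", True): (20, 4),
    ("women", False): (80, 4),
}

def crude_rate(gender):
    """Cancer rate for one gender, ignoring smoking status."""
    people = sum(n for (g, _), (n, _) in counts.items() if g == gender)
    cases = sum(c for (g, _), (_, c) in counts.items() if g == gender)
    return cases / people

def stratum_rate(gender, smoker):
    """Cancer rate for one gender within one smoking stratum."""
    n, cases = counts[(gender, smoker)]
    return cases / n

for gender in ("men", "women"):
    print(gender,
          "crude:", crude_rate(gender),
          "smokers:", stratum_rate(gender, True),
          "non-smokers:", stratum_rate(gender, False))
```

Comparing within strata is one form of the adjustment mentioned under "Types of variability"; it is only possible because smoking status was recorded.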

  7. Replication, randomization, blocking
     • Replication: reduces random variation of the test statistic and increases generalizability
     • Randomization: removes bias
     • Blocking: reduces unwanted variation
     • The idea is that units within a block are similar to each other, but differ between blocks
     • ‘Block what you can, randomize what you cannot’

     Experimental vs. observational studies
     • Controlled experiment: subjects are assigned to groups by the investigator
       - randomization: protects against bias in assignment to groups
       - blind, double-blind: protects against bias in outcome assessment/measurement
       - placebo: fake ‘treatment’
     • Observational study: subjects ‘assign’ themselves to groups
       - confounder: associated with both group membership and the outcome of interest
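
The claim that replication reduces random variation can be checked with a small simulation. This is a sketch under simple assumptions that are not taken from the slides: measurements are independent, normally distributed with standard deviation 1, and the statistic of interest is a group mean. Under those assumptions theory gives a standard deviation of sigma / sqrt(n) for the mean of n replicates, so quadrupling n should roughly halve the spread.

```python
import random
import statistics

def sd_of_group_mean(n_replicates, n_experiments=2000, sigma=1.0, seed=0):
    """Spread of the mean of n_replicates noisy measurements over many simulated experiments."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_replicates))
        for _ in range(n_experiments)
    ]
    return statistics.stdev(means)

# Each fourfold increase in replicates should roughly halve the spread.
sd_by_n = {n: sd_of_group_mean(n) for n in (4, 16, 64)}
for n, sd in sd_by_n.items():
    print(f"n = {n:2d} replicates: sd of the mean is about {sd:.2f}")
```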

  8. Observational studies
     • Advantages
       - often easier to carry out
       - don’t ‘interfere’ with the system: what you see is ‘natural’ rather than ‘artificial’
       - variation is biologically relevant, as it has been unaltered
       - sometimes manipulation is not possible
     • Drawbacks
       - confounders

     Hibernation example
     • General question: How do changes in an animal’s environment cause the animal to start hibernating?
     • What changes should be studied?
       - temperature
       - photoperiod (day length: long or short)
     • What measurement(s) to take?
       - nerve activity enzyme (Na+/K+-ATPase)
     • What animal to study?
       - golden hamster, 2 organs (brain, heart)

  9. Specific question
     • General question: How do changes in an animal’s environment cause the animal to start hibernating?
     • => Specific question: What is the effect of changing day length on the concentration of the sodium pump enzyme in two golden hamster organs?

     Sources of variability
     • Variability due to conditions of interest (wanted)
       - Day length (long vs. short)
       - Organ (heart vs. brain)
     • Variability in the response (NOT wanted): measurement error
       - Preparation of enzyme suspension
       - Instrument calibration
     • Variability in experimental units (NOT wanted)
       - Biological differences among hamsters
       - Environmental differences

  10. Basic designs: Completely randomized
     • Focus on 1 organ (heart, say)
     • Random assignment: use chance to assign hamsters to long and short days
     • ‘Random’ is not the same as ‘haphazard’
     • For balance, assign the same number to short and long
     • Example (8 hamsters):
       Long: 4, 1, 7, 2
       Short: 3, 8, 5, 6

     Basic designs: Randomized block
     • Suppose that the hamsters came from 4 different litters, with 2 hamsters per litter
     • Expect hamsters from the same litter to be more similar than hamsters from different litters
     • Can take each pair of hamsters and randomly assign short or long to one member of each pair
     • Example (coin flip, say): S, L // L, S // S, L // S, L
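
Both assignments above can be produced with a random number generator instead of a coin. The following Python sketch is illustrative only: the seed, the variable names and, in particular, which hamsters belong to which litter are assumptions, since the slides do not give litter membership.

```python
import random

rng = random.Random(1)  # fixed seed so the assignment can be reproduced

# Completely randomized design: shuffle all 8 hamsters, first half gets long days.
hamsters = list(range(1, 9))
shuffled = hamsters[:]
rng.shuffle(shuffled)
crd = {"Long": sorted(shuffled[:4]), "Short": sorted(shuffled[4:])}

# Randomized block design: each litter is a block of two hamsters, and one
# "coin flip" per litter decides which member gets short days.
litters = [(1, 2), (3, 4), (5, 6), (7, 8)]  # hypothetical litter membership
rbd = {}
for first, second in litters:
    if rng.random() < 0.5:
        rbd[first], rbd[second] = "Short", "Long"
    else:
        rbd[first], rbd[second] = "Long", "Short"

print("Completely randomized:", crd)
print("Randomized block:", rbd)
```

In the completely randomized design only the group sizes are fixed; in the block design every litter is guaranteed to contribute one hamster to each day length, so litter differences cancel out of the comparison.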

  11. Basic designs: Factorial crossing
     • Compare 2 (or more) sets of conditions in the same experiment: Long vs. Short and Heart vs. Brain
     • In this example, there are 4 combinations of conditions:
       - Long/Heart, Long/Brain, Short/Heart, Short/Brain
     • Example (2 coin flips, say):
       L/H: 7, 2
       L/B: 4, 1
       S/H: 3, 5
       S/B: 8, 6

     Basic designs: Split plot / repeated measures
     • First, randomly assign Long days to 4 hamsters and Short days to the other 4
     • Then, use each hamster twice: once to get the Heart concentration, and once to get the Brain concentration
     • This design has units of different sizes for each factor
       - for day length, the unit is a hamster
       - for organ, the unit is a part of a hamster
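
The difference between the factorial and the split-plot layout is easiest to see by building both for the same 8 hamsters. This Python sketch is a hedged illustration; the seed and all variable names are assumptions introduced here.

```python
import itertools
import random

rng = random.Random(7)  # fixed seed for a reproducible layout
hamsters = list(range(1, 9))
day_lengths = ["Long", "Short"]
organs = ["Heart", "Brain"]

# Factorial crossing: every day length is combined with every organ,
# and 2 randomly chosen hamsters go to each of the 4 combinations.
combos = list(itertools.product(day_lengths, organs))
order = hamsters[:]
rng.shuffle(order)
factorial = {combo: sorted(order[2 * i:2 * i + 2]) for i, combo in enumerate(combos)}

# Split plot: day length is randomized over whole hamsters (4 Long, 4 Short),
# then both organs are measured within every hamster.
order = hamsters[:]
rng.shuffle(order)
day_of = {h: ("Long" if i < 4 else "Short") for i, h in enumerate(order)}
split_plot = [(h, day_of[h], organ) for h in hamsters for organ in organs]

print("Factorial:", factorial)
print("Split plot:", split_plot)
```

The factorial layout yields one measurement per hamster (8 in total), while the split plot yields two per hamster (16 in total), with the two organ measurements of a hamster sharing that animal's day length.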

  12. Summary
     • Optimize the precision of the estimates among the main comparisons of interest
     • Must satisfy the scientific and physical constraints of the experiment
     • You can save a lot of time, money and heartache by consulting an experienced analyst on design issues before any steps of the experiment have been carried out

     X categorical, Y continuous
     • We can visually inspect how the distribution of Y depends on X with a series of boxplots or stripcharts
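
A boxplot per level of X is a drawing of a five-number summary (minimum, lower quartile, median, upper quartile, maximum) of Y within that level. The Python sketch below computes those summaries with the standard library only. The values and group labels are invented for illustration and are not data from the course; the quartile method ("inclusive") is one common convention, and plotting software may use a slightly different one.

```python
import statistics

# Invented example values (NOT course data): Y grouped by a categorical X.
y_by_group = {
    "group A": [10.2, 11.5, 9.8, 12.1, 10.9],
    "group B": [13.4, 12.8, 14.9, 13.1, 15.2],
}

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum of values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

summaries = {group: five_number_summary(v) for group, v in y_by_group.items()}
for group, summary in summaries.items():
    print(group, " ".join(f"{x:.1f}" for x in summary))
```

Placing the summaries side by side is the numerical counterpart of inspecting the boxplots: a shift in the medians or a change in the spread between levels of X suggests that Y depends on X.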
