Statistical Analysis for Medical and Public Health Data, Qazvin University of Medical Sciences, 2017
Workshop Schedule
1- Types of variables
2- Types of studies
3- Types of data summaries
4- Types of statistical inference
5- Statistical graphs and data analysis with Stata
1. Types of variables
• Qualitative variables: responses are not numbers
– Nominal variable: divides people into groups; the groups cannot be ordered or compared. Examples: gender, health status (ill, healthy)
– Ordinal variable: divides people into groups that can be ordered (<, =, >). Examples: education, social class (I, II, III, IV)
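1. Types of variables
• Quantitative variables: responses are numbers
1. Interval variables: divide people into groups and allow comparison; the zero point (origin) is a convention chosen by scientists, so differences are meaningful but ratios are not.
Examples: temperature (0 °C = 32 °F = 273.15 K), poverty line (Toman, $, …)
In Celsius: 20 °C - 10 °C = 10 °C and 20 °C / 10 °C = 2
Conversion: F = 32 + 1.8 × C, so 20 °C = 32 + 1.8 × 20 = 68 °F and 10 °C = 32 + 1.8 × 10 = 50 °F
In Fahrenheit: 68 °F - 50 °F = 18 °F, which is the same difference as 10 °C (18 / 1.8 = 10), but 68 °F / 50 °F = 1.36, not 2
The difference survives the change of units; the ratio does not.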
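The interval-scale arithmetic on this slide can be checked with a few lines of Python. This is only an illustrative sketch; the function and variable names are assumptions introduced here, not part of the workshop material.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit with the slide's formula F = 32 + 1.8 * C."""
    return 32 + 1.8 * celsius

warm_c, cool_c = 20.0, 10.0
warm_f, cool_f = c_to_f(warm_c), c_to_f(cool_c)

# Differences: a difference converts with the slope (1.8) only, not the offset (32).
diff_c = warm_c - cool_c          # difference in Celsius degrees
diff_f = warm_f - cool_f          # difference in Fahrenheit degrees
diff_f_in_c = diff_f / 1.8        # the Fahrenheit difference expressed in Celsius degrees

# Ratios: these depend on where the arbitrary zero sits, so they disagree.
ratio_c = warm_c / cool_c
ratio_f = warm_f / cool_f

print(f"difference: {diff_c:.1f} C = {diff_f:.1f} F")
print(f"ratio in C: {ratio_c:.2f}, ratio in F: {ratio_f:.2f}")
```

The two differences describe the same physical gap, while the two ratios disagree, which is why ratios are not meaningful on an interval scale.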
1. Types of variables
2. Ratio variables: divide people into groups and allow comparison; the zero point (origin) is a true zero, so both differences and ratios are meaningful.
Examples: age, weight, height
180 cm - 170 cm = 10 cm; 180 cm / 170 cm = 1.06
180 kg - 170 kg = 10 kg; 180 kg / 170 kg = 1.06
Statistical methods for interval and ratio variables are the same.
1. Types of variables
• Dependent variable (Y), also called the outcome, response, or end point: a function of many factors
• Independent variables (X1, X2, …, Xk), also called predictors, factors, explanatory variables, or treatments: possible causes of Y
2. Types of studies
• Observational study
Definition: an observational study
1. draws inferences from a sample to a population
2. has independent variables that are not under the control of the researcher, because of ethical concerns or logistical constraints
3. cannot randomize the treatment
Types of observational studies
• Case-control study: a design originally developed in epidemiology, in which two existing groups that differ in outcome are identified and compared on some supposed causal attribute.
• Cross-sectional study: collects data from a population, or a representative subset, at one specific point in time.
• Longitudinal study: a correlational research study involving repeated observations of the same variables over long periods of time.
• Cohort study or panel study: a particular form of longitudinal study in which a group of patients is closely monitored over a span of time.
• Ecological study: an observational study in which at least one variable is measured at the group level.
Types of observational studies
Disadvantage: observational studies cannot be used as a reliable source for statements of fact about the "safety, efficacy, or effectiveness" of a practice.
Advantages:
1- provide information on "real world" use and practice
2- detect signals about the benefits and risks of practices in the general population
3- help formulate hypotheses to be tested in subsequent experiments
4- provide data needed to design more informative pragmatic clinical trials
5- inform clinical practice
Experimental Study
Definition: the investigator actively manipulates which groups receive the agent or exposure under study.
Randomized controlled trials (RCT)
The steps in an RCT are:
1. State the hypothesis.
2. Select the participants. This step includes sample size, inclusion and exclusion criteria, and informed consent.
3. Allocate participants randomly to either the treatment or the control group (randomization).
4. Administer the intervention in a blinded fashion (single blind or double blind).
5. Monitor the outcomes at a pre-determined time.
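Step 3 (random allocation) can be sketched in Python as below. This shows simple randomization into two arms of (nearly) equal size; the participant IDs, the seed, and the function name `randomize` are hypothetical, and a real trial would often use other schemes such as block or stratified randomization.

```python
import random

def randomize(participant_ids, seed=None):
    """Shuffle the participants and split them into a treatment and a control arm."""
    rng = random.Random(seed)          # a fixed seed makes the allocation reproducible
    shuffled = list(participant_ids)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs P01 ... P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
allocation = randomize(participants, seed=2017)

for arm, members in allocation.items():
    print(arm, sorted(members))
```

Because the allocation depends only on the random shuffle, neither the investigator nor the participant chooses the arm, which is the point of randomization.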
3- Types of data summaries
• Tables
• Graphs
• Descriptive statistics
3- Types of data summaries
One-way table: shows the distribution of one variable
Table 1. Distribution of blood group of [who], [where], [when]

Blood group   Freq.   Percent
A               25     18.52
B               40     29.63
AB              55     40.74
O               15     11.11
Total          135    100.00
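The percent column of a one-way table is each frequency divided by the total. A minimal Python sketch using the blood group frequencies from Table 1 (the variable names are assumptions made for illustration):

```python
# Frequencies taken from Table 1 on the slide.
blood_group_counts = {"A": 25, "B": 40, "AB": 55, "O": 15}

total = sum(blood_group_counts.values())

# Percent = 100 * frequency / total, rounded to two decimals as on the slide.
percentages = {
    group: round(100 * freq / total, 2)
    for group, freq in blood_group_counts.items()
}

for group, freq in blood_group_counts.items():
    print(f"{group:<6}{freq:>5}{percentages[group]:>8.2f}")
print(f"{'Total':<6}{total:>5}{100:>8.2f}")
```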
3- Types of data summaries
Two-way table: shows the distribution of one variable by a second one
Table 2. Distribution of … by … ([who], [when], [where])

               Disease yes     Disease no      Total
Blood group    Freq.   %       Freq.   %       Freq.   %
A                20              5               25
B                20             20               40
AB               40             15               55
O                10              5               15
Total            90             45              135
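The percentage columns of a two-way table can be filled in from the frequencies. The Python sketch below computes row percentages (the share of each blood group with and without the disease). Choosing row rather than column percentages is an assumption, since the slide does not say which was intended, and the names used are illustrative only.

```python
# Frequencies from Table 2: blood group -> (disease yes, disease no).
counts = {"A": (20, 5), "B": (20, 20), "AB": (40, 15), "O": (10, 5)}

def row_percentages(table):
    """For each row, express every cell as a percentage of that row's total."""
    result = {}
    for group, cells in table.items():
        row_total = sum(cells)
        result[group] = tuple(round(100 * cell / row_total, 2) for cell in cells)
    return result

row_pct = row_percentages(counts)

# Column totals (disease yes, disease no) and the grand total.
column_totals = tuple(sum(column) for column in zip(*counts.values()))
grand_total = sum(column_totals)

for group, (yes, no) in counts.items():
    pct_yes, pct_no = row_pct[group]
    print(f"{group:<6}{yes:>4}{pct_yes:>8.2f}{no:>6}{pct_no:>8.2f}{yes + no:>6}")
print(f"{'Total':<6}{column_totals[0]:>4}{column_totals[1]:>14}{grand_total:>14}")
```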
3- Types of data summaries
• Three-way table
Application: effect of an exposure on an outcome after controlling for a confounder

Age group   Exposure   Disease +   Disease -
25-30       Yes
            No
…           …          …           …
>= 75       Yes
            No
Statistical Graphs
• For qualitative variables:
1. Simple bar chart
2. Clustered bar chart
3. Pie chart
4. Clustered pie chart
[Figure: Bar chart for race. Y axis: count of id; bars for white, black, other.]
[Figure: Distribution of low birth weight by race. Y axis: count of id; bars for low birth weight 0 and 1 within white, black, other.]
[Figure: Distribution of race. Pie chart with slices for white, black, other.]
[Figure: Distribution of low birth weight by race. One pie per race (white, black, other), with slices for low birth weight 0 and 1.]
Statistical Graphs
• For quantitative variables (continuous or discrete):
– Histogram
– Box plot
– Scatter plot
– Line plot
– ROC (receiver operating characteristic) curve
[Figure: Distribution of volume as a continuous variable. X axis: Volume (thousands); y axis: Percent.]
[Figure: Distribution of mileage as a discrete variable. X axis: Mileage (mpg); y axis: Percent.]
[Figure: Distribution of blood pressure (bp) by sex, showing the effect of sex on bp. Y axis: Blood pressure; groups: Male, Female.]
[Figure: Distribution of blood pressure (bp) by age group and sex, showing the effects of age group and sex on bp. Y axis: Blood pressure; groups: Male and Female within age groups 30-45, 46-59, 60+.]
[Figure: Scatter plot of life expectancy by population growth. Variables: Life expectancy at birth; Avg. annual % growth.]
[Figure: Line chart for life expectancy over years. X axis: Year (1900 to 1940); y axis: life expectancy.]
[Figure: Line charts for life expectancy and inflation over years. X axis: Year (1900 to 1940); series: life expectancy, inflation.]
Receiver Operating Characteristic (ROC) curve
• Examines whether a clinical marker or a new clinical test is suitable for diagnosing a disease
• Helps find a cutoff point for the marker or test, together with its sensitivity and specificity
• Gives the area under the curve (AUC) and a p-value for assessing the diagnostic performance of the marker or test
• An AUC above 0.5, and the closer to 1.0 the better, indicates an acceptable marker or test for diagnosis
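To make sensitivity, specificity, and AUC concrete, here is a small Python sketch. The marker values are hypothetical, invented only for illustration, and the AUC is computed through its rank interpretation (the probability that a randomly chosen diseased subject has a higher marker value than a randomly chosen healthy one, with ties counted as one half), which is one standard way to obtain it and not necessarily the computation Stata performs.

```python
def sensitivity_specificity(diseased, healthy, cutoff):
    """Call a subject 'positive' when the marker value is >= cutoff."""
    sensitivity = sum(value >= cutoff for value in diseased) / len(diseased)
    specificity = sum(value < cutoff for value in healthy) / len(healthy)
    return sensitivity, specificity

def auc(diseased, healthy):
    """AUC as the share of (diseased, healthy) pairs ranked correctly; ties count 1/2."""
    score = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                score += 1.0
            elif d == h:
                score += 0.5
    return score / (len(diseased) * len(healthy))

# Hypothetical marker values for six diseased and six healthy subjects.
diseased_values = [6, 7, 8, 5, 9, 7]
healthy_values = [2, 3, 5, 4, 1, 6]

sens, spec = sensitivity_specificity(diseased_values, healthy_values, cutoff=6)
marker_auc = auc(diseased_values, healthy_values)

print(f"cutoff >= 6: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"AUC = {marker_auc:.3f}")
```

An AUC of 0.5 corresponds to a marker that ranks diseased and healthy subjects no better than chance. An AUC below 0.5 means the marker tends to rank them the wrong way round.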
An example of a bad marker
[Figure: ROC curve. Axes: Sensitivity and Specificity, each from 0.00 to 1.00. Area under ROC curve = 0.3870]

         ROC                 -Asymptotic Normal--
Obs      Area     Std. Err.  [95% Conf. Interval]
189      0.3870   0.0452     0.29841    0.47564
ROC curve for a good marker
[Figure: ROC curve. Axes: Sensitivity and Specificity, each from 0.00 to 1.00. Area under ROC curve = 0.9964]

         ROC                 -Asymptotic Normal--
Obs      Area     Std. Err.  [95% Conf. Interval]
2000     0.9964   0.0013     0.99390    0.99893
Choosing a cutoff point
Detailed report of sensitivity and specificity

Cutpoint    Sensitivity   Specificity   Correctly classified
( >= 1 )      100.00%        0.00%          50.00%
( >= 2 )       99.70%       94.20%          96.95%
( >= 3 )       99.50%       96.00%          97.75%
( >= 4 )       99.30%       97.60%          98.45%
( >= 5 )       98.80%       98.30%          98.55%
( >= 6 )       97.80%       98.50%          98.15%
( >= 7 )       97.30%       98.80%          98.05%
( >= 8 )       96.50%       99.70%          98.10%
( > 8 )         0.00%      100.00%          50.00%
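One common rule for picking a cutoff from such a report is Youden's index, J = sensitivity + specificity - 1 (here in percentage points, so minus 100). The slide does not name a selection rule, so using Youden's index is an assumption made for this sketch; the sensitivity and specificity values are those in the report.

```python
# (cutpoint, sensitivity %, specificity %) copied from the detailed report.
report = [
    (">= 1", 100.00, 0.00),
    (">= 2", 99.70, 94.20),
    (">= 3", 99.50, 96.00),
    (">= 4", 99.30, 97.60),
    (">= 5", 98.80, 98.30),
    (">= 6", 97.80, 98.50),
    (">= 7", 97.30, 98.80),
    (">= 8", 96.50, 99.70),
    ("> 8", 0.00, 100.00),
]

def youden(sensitivity, specificity):
    """Youden's index in percentage points: sensitivity + specificity - 100."""
    return sensitivity + specificity - 100

# Pick the row with the largest Youden's index.
best_cutpoint, best_sens, best_spec = max(
    report, key=lambda row: youden(row[1], row[2])
)
best_j = youden(best_sens, best_spec)

print(f"best cutpoint: {best_cutpoint} (J = {best_j:.1f} percentage points)")
```

With these numbers the rule selects the same cutpoint that has the highest "correctly classified" percentage in the report.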
Fundamentals of Statistical Testing and Confidence Intervals
Fundamentals of Statistical Testing
Research loop: population with unknown parameters → representative sample → statistics → inference about the population parameters
Fundamentals of Statistical Testing
Methods for statistical inference:
1- Estimation
1-1 Point estimation
1-2 Confidence interval estimation
2- Statistical testing (test of hypothesis)
What is a point estimate?
A point estimate is a statistical measure calculated from the data obtained in a sample. Examples: sample mean, sample proportion, etc.

Population parameter               Point estimate
Mean = µ                           Xbar
Proportion = P                     X/n (X = number of successes, n = sample size)
Standard deviation = σ             s
Standard error (Std. Err.)         s/√n
Coefficient of variation = σ/µ     s/Xbar
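The point estimates in this table can be computed with Python's standard library, as sketched below. The blood pressure sample is hypothetical, invented for illustration; the proportion uses the 90 diseased subjects out of 135 from the two-way table earlier in the slides.

```python
import math
import statistics

# Hypothetical sample of five blood pressure readings.
sample = [120, 130, 125, 140, 135]
n = len(sample)

xbar = statistics.mean(sample)      # point estimate of the mean (Xbar)
s = statistics.stdev(sample)        # sample standard deviation (divides by n - 1)
std_err = s / math.sqrt(n)          # standard error of the mean, s / sqrt(n)
cv = s / xbar                       # coefficient of variation, s / Xbar

# Proportion: X successes out of n subjects (90 diseased out of 135).
x_successes, n_subjects = 90, 135
p_hat = x_successes / n_subjects

print(f"Xbar = {xbar}, s = {s:.3f}, Std. Err. = {std_err:.3f}, CV = {cv:.3f}")
print(f"sample proportion = {p_hat:.3f}")
```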
Major problem with point estimates
• To what extent can we confidently generalize a point estimate to its parameter in the population?
• There is no specific answer!
• On its own, a point estimate may deserve anywhere from 0% to 100% confidence
• The question is answered by building an interval, centered on the point estimate, with the desired confidence level
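An interval centered on the point estimate can be sketched in Python as estimate ± z × standard error. The inputs below are the AUC estimates and standard errors from the Stata ROC output earlier in the slides. Treating those intervals as normal-approximation intervals is an assumption suggested by the "Asymptotic Normal" label, and because the printed inputs are rounded, the results agree with the Stata limits only to about four decimals.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_err, level=0.95):
    """Normal-approximation interval: estimate +/- z * std_err."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    margin = z * std_err
    return estimate - margin, estimate + margin

# AUC and standard error as printed in the two ROC outputs.
bad_marker_ci = confidence_interval(0.3870, 0.0452)
good_marker_ci = confidence_interval(0.9964, 0.0013)

print(f"bad marker:  {bad_marker_ci[0]:.4f} to {bad_marker_ci[1]:.4f}")
print(f"good marker: {good_marker_ci[0]:.4f} to {good_marker_ci[1]:.4f}")
```

Raising the confidence level widens the interval and lowering it narrows the interval, so the chosen level is part of how the estimate should be reported.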