Introduction to Inferential Statistics
Jaranit Kaewkungwal, Ph.D.
Faculty of Tropical Medicine, Mahidol University
Data & Variables
Types of Data
QUALITATIVE
• Data expressed by type
• Data that has been described
QUANTITATIVE
• Data classified by numeric value
• Data that has been measured or counted
QUALITATIVE and QUANTITATIVE data are not mutually exclusive.
Adapted from: Dr. Craig Jackson, University of Central England
Types of Data: Qualitative (Categorical) Data
NOMINAL DATA
• values that the data may take have no specific order
• values act as labels with no real meaning
• Binomial: two possible values (categories, states)
• Multinomial: more than two possible values (categories, states)
e.g. health status: healthy = 1, sick = 2
e.g. treatment: new regimen = 1, standard regimen = 2
e.g. hair colour: brown = 1, blond = 2, black = 100
ORDINAL DATA
• values with some kind of ordering
• data that has been measured or counted
e.g. social class: upper = 1, middle = 2, working = 3
e.g. glioblastoma tumor grade: 1, 2, 3, 4, 5
e.g. position in a race: 1st, 2nd, 3rd
Adapted from: Dr. Craig Jackson, University of Central England
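The point that nominal values "act as labels with no real meaning" can be illustrated with a small sketch. The sample of codes below is hypothetical; only the coding scheme (brown = 1, blond = 2, black = 100) comes from the slide. Counting categories gives the same answer under any coding, while an average of the codes changes as soon as a label is renumbered.

```python
from collections import Counter
from statistics import mean

# Coding scheme from the slide; the sample itself is hypothetical.
labels = {1: "brown", 2: "blond", 100: "black"}
codes = [1, 1, 2, 100, 2, 1]

# Frequencies are a meaningful summary of nominal data.
counts = Counter(labels[c] for c in codes)

# The mean of the codes is not meaningful: recoding black from 100 to 3
# changes it, although the data describe exactly the same people.
mean_original = mean(codes)
recoded = [3 if c == 100 else c for c in codes]
mean_recoded = mean(recoded)

print(counts)
print(mean_original, mean_recoded)
```

The frequencies are unaffected by the recoding, which is why counts and proportions are the usual summaries for nominal variables.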
Types of Data: Quantitative Data
DISCRETE
• distinct or separate parts, with no finer detail
• e.g. number of children in a family
CONTINUOUS
• between any two values, there would be a third
• e.g. between metres there are centimetres
INTERVAL
• equal intervals between values and an arbitrary zero on the scale
• e.g. temperature gradient
RATIO
• equal intervals between values and an absolute zero
• e.g. body mass index
Adapted from: Dr. Craig Jackson, University of Central England
Levels of Variables
[Figure: temperature expressed at several levels of measurement: a numeric scale (10, 20, 40, 60, 80 °C); ordered labels ("Cold", "Comfortable", "Tolerable", "Uncomfortable", "Unpleasant", "Dangerous"); broad categories (Cold, Red Hot, White Hot); and a two-way split (Safe, Unsafe).]
Adapted from: Dr. Craig Jackson, University of Central England
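Examples of Data Coding
[Figure: sample items coded 1 to 4 as nominal and ordinal categorical variables; special codes such as 7, 88 and 99 are flagged with the question "Exclude from Analysis?"]
Examples of Data Coding (cont.)
[Figure: further coded items labelled "Cont.", with codes 1, 2 and the special code 99.]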
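The question "Exclude from Analysis?" on the coding slides can be made concrete with a short sketch. It assumes that 99 is a special code standing for a missing value (the slides flag the code but do not define it), and the ages are hypothetical.

```python
from statistics import mean

# Assumption: 99 marks a missing value; the slide does not define it.
MISSING_CODES = {99}

def drop_special_codes(values, special=MISSING_CODES):
    """Return only the values that are not special (missing) codes."""
    return [v for v in values if v not in special]

# Hypothetical ages in years, with 99 entered where age was not recorded.
ages = [23, 35, 99, 41, 29, 99]

valid_ages = drop_special_codes(ages)
naive_mean = mean(ages)        # treats 99 as a real age
clean_mean = mean(valid_ages)  # uses recorded ages only

print(naive_mean, clean_mean)
```

Leaving the special codes in pulls the average far from the recorded ages, so such codes are normally set to missing before any summary is computed.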
Example of Descriptive Statistics
Constant vs. Variable
Variables are the specific properties that can take different values.
Constants are the specific properties that cannot vary or will not be made to vary.
Terminology - Variables
INDEPENDENT (syn: treatment, experimental, predictor, input, exposure, explanatory variable)
A stimulus or activity that is identified or manipulated to predict the dependent variable. Independent variables are considered the causal factors, or the factors that you may manipulate.
e.g. new drug, working hours, exposure, worker attitudes, policies
DEPENDENT (syn: effect, criterion, criterion measure, outcome, output variable)
A response that the researcher wants to predict. Dependent variables are considered the outcomes of the treatments or the responses to changes in the independent variables.
e.g. symptomatology, productivity, accident rates, attitudes, health status, performance on neuropsychological test
Adapted from: Dr. Craig Jackson, University of Central England
Terminology - Variables
CONTROLLED
An extraneous variable is a variable that has the potential to distort the relationship between the dependent and independent variables.
• Controlled extraneous variables are recognized before the study is initiated and are controlled in the design and selection criteria.
• Uncontrolled extraneous variables are not recognized before the study is initiated or, sometimes, even if recognized, cannot be controlled in the design and selection phase. Usually an attempt is made to assess and adjust for them through sophisticated statistical tools.
e.g. working hours, temperatures, extraneous exposure, diet, class, income, ambient noise and temperature in testing room
Adapted from: Dr. Craig Jackson, University of Central England
Study Variables
Independent Variables & Dependent Variables
[Diagrams: X (independent) and Y (dependent); X (independent) and Y (dependent) with an extraneous variable; X (independent) and X2 (independent) with Y (dependent).]
Study Variables
Confounding Variable - When the effects of two or more variables cannot be separated.
Study Variables
Confounding Variable - When the effects of two or more variables cannot be separated.

                      STD rate
Condom use    Yes     55/95  (58%)
              No      45/105 (43%)

"Condom use increases the risk of STD" BUT ...

                      STD rate
# Partners < 5
Condom use    Yes     5/15   (33%)
              No      30/82  (37%)
# Partners > 5
Condom use    Yes     50/80  (62%)
              No      15/23  (65%)

Explanation: Individuals with more partners are more likely to use condoms. But individuals with more partners are also more likely to get STD.
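The reversal in the condom example can be checked directly from the counts. The sketch below uses only the case and group counts shown in the stratified table; the variable and function names are illustrative. Pooling the two strata reproduces the crude comparison (55/95 against 45/105), in which condom users appear to have the higher STD rate, while within each stratum of partner number the users have the lower rate.

```python
# (STD cases, group size) for each stratum, taken from the slide's table.
strata = {
    "< 5 partners": {"condom yes": (5, 15), "condom no": (30, 82)},
    "> 5 partners": {"condom yes": (50, 80), "condom no": (15, 23)},
}

def rate(cases, total):
    """Proportion of the group with an STD."""
    return cases / total

def crude_rate(group):
    """Pool cases and totals over all strata, ignoring number of partners."""
    cases = sum(strata[s][group][0] for s in strata)
    total = sum(strata[s][group][1] for s in strata)
    return rate(cases, total)

crude_yes = crude_rate("condom yes")  # 55/95
crude_no = crude_rate("condom no")    # 45/105
print(f"crude: yes={crude_yes:.1%} no={crude_no:.1%}")

# Stratum-specific rates: condom users do better in both strata.
stratum_rates = {
    name: {g: rate(*counts) for g, counts in groups.items()}
    for name, groups in strata.items()
}
for name, rates in stratum_rates.items():
    print(f"{name}: yes={rates['condom yes']:.1%} no={rates['condom no']:.1%}")
```

The crude comparison is distorted because most condom users (80 of 95) fall in the high-partner stratum, where STD rates are high regardless of condom use.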
Example of Study Variables
Independent Var: Infant / Child Growth
Dependent Var: Adult Fatness
Extraneous Var:
- Confounding Var (Adj./Controlled Var): Child Age, Adult Age, Socio-economic, Smoking, Physical Activity, etc.; Sex (male & female)
- Uncontrolled Var:
Bias & Chance
Measuring Outcomes: Observed vs. Truth
Possible explanations of the outcome measured: Bias, Chance, Truth
Observed = Truth + Error
Error = Systematic error (Bias) + Random error (Chance)
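The decomposition Observed = Truth + Systematic error + Random error can be sketched as a small simulation. All numbers below (the true value, the size of the bias, the noise standard deviation) are hypothetical and chosen only for illustration. Averaging many observations shrinks the random error but leaves the systematic error untouched.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

TRUTH = 120.0    # hypothetical true value
BIAS = 5.0       # hypothetical systematic error
NOISE_SD = 10.0  # hypothetical SD of the random error

def observe(n, bias):
    """Simulate n observations: truth + systematic error + random error."""
    return [TRUTH + bias + random.gauss(0, NOISE_SD) for _ in range(n)]

unbiased = observe(10_000, 0.0)
biased = observe(10_000, BIAS)

# Chance averages out with a large sample; bias does not.
print(round(mean(unbiased), 2))  # close to TRUTH
print(round(mean(biased), 2))    # close to TRUTH + BIAS
```

A larger sample narrows the scatter around each mean, yet the biased mean stays about 5 units away from the truth however many observations are taken.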
Bias vs. Chance
Bias vs. Chance
Bias:
• A process at any stage of inference tending to produce results that depart systematically from the true values.
Chance:
• The divergence of an observation on a sample from the true population value in either direction.
• The divergence due to chance alone is called random variation.
Bias and chance are not mutually exclusive.
Bias vs. Chance
"A well designed, carefully executed study usually gives results that are obvious without a formal analysis and if there are substantial flaws in design or execution a formal analysis will not help."
Johnson AF. Beneath the technological fix. J Chron Dis 1985 (38), 957-961
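Chance
[Figure: "Free kick" illustration of the probability of being hit, marked 50% on each side of the centre, 68% and 95% around the centre, and 2.5% in each tail.]
Chance
[Figure: "Free kick" illustration of the probability of getting a goal, marked 50% on each side of the centre, 68% and 95% around the centre, and 2.5% in each tail.]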
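The percentages on the "Free kick" figures match the familiar areas under a normal curve. The sketch below assumes that this is what the figures depict (the slides do not say so explicitly) and recomputes the areas with Python's standard library.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Half of the area lies on each side of the centre.
below_centre = std_normal.cdf(0)

# About 68% of the area lies within 1 SD of the centre.
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)

# About 95% lies within 1.96 SD, leaving about 2.5% in each tail.
within_196_sd = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
upper_tail = 1 - std_normal.cdf(1.96)

print(f"{below_centre:.3f} {within_1_sd:.3f} {within_196_sd:.3f} {upper_tail:.3f}")
```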
Normal Distribution in Descriptive Statistics
[Figure: normal curve with a raw-score axis (20, 25, 30, 35, 40) and a matching standard-score axis; mean X = 30, SD = 5.]
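The link between the raw-score axis and the standard-score axis is the usual z-score transformation, z = (x - mean) / SD. A minimal sketch using the slide's values (mean 30, SD 5) and its five raw-score tick marks; the function name is illustrative:

```python
MEAN = 30  # mean from the slide
SD = 5     # standard deviation from the slide

def z_score(x, mean=MEAN, sd=SD):
    """Convert a raw score to a standard score."""
    return (x - mean) / sd

raw_scores = [20, 25, 30, 35, 40]
standard_scores = [z_score(x) for x in raw_scores]

print(standard_scores)  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

Each step of 5 raw-score units corresponds to one standard deviation, so the raw scores 20 to 40 span standard scores of -2 to +2.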
Types of Statistical Methods
Types of Statistics
• By Level of Generalization
 – Descriptive Statistics
 – Inferential Statistics
  • Parameter Estimation
  • Hypothesis Testing
   – Comparison
   – Association
   – Multivariable data analysis
• By Level of Underlying Distribution
 – Parametric Statistics
 – Non-parametric Statistics
[Diagram labels: Sampling Techniques; Generalization / Inferential Statistics]
Descriptive Statistics
Descriptive Statistics
• Measure of Location (Categorical Vars)
 – Frequency (f)
 [Figure: bar chart of Count by Gender (Female, Male); count axis from 210 to 270.]
• Measure of Location (Continuous Vars)
 – Mean (average): $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ or $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
 – Median (mid-point) of the ordered values X1 X2 X3 X4 X5 X6 X7 X8 X9
 – Mode (the most frequent) of the values X1 X2 X3 X4 X5 X6 X7 X8 X9, e.g. (1 2 2 2 2 3 3 4 5)
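As a worked example of the three measures of location, the sketch below applies Python's standard `statistics` module to the nine values shown on the slide (1 2 2 2 2 3 3 4 5). Using that data set for the mean and median as well as the mode is an assumption made here for illustration.

```python
from statistics import mean, median, mode

# The nine values shown on the slide.
data = [1, 2, 2, 2, 2, 3, 3, 4, 5]

average = mean(data)      # sum of the values (24) divided by n (9)
mid_point = median(data)  # 5th of the 9 ordered values
most_frequent = mode(data)

print(round(average, 2), mid_point, most_frequent)  # 2.67 2 2
```

Here the mean (about 2.67) is slightly above the median and mode (both 2), because the few larger values pull the average upward.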