Empirical problem solving Statistical method R.W. Oldford
Empirical problem solving - PPDAC The reasoning chain of any empirical study has five essential links or stages: Problem, Plan, Data, Analysis, Conclusion. ◮ Each stage has its own concerns to be dealt with. ◮ Each stage depends on the stages that went before. ◮ No stage can be overlooked. Reference: Scientific Method, Statistical Method, and the Speed of Light
PPDAC - Problem ◮ Target population/process (units and collection) ◮ Variates (explanatory and response) ◮ Population attribute(s) of interest ◮ Problem aspect(s) ◮ Causative, descriptive, predictive
PPDAC - Problem: Target Population Target population/process $P_{\text{Target}}$ ◮ the collection of units that we want to learn about ◮ sometimes a process which produces units over time is easier to define than a fixed population ◮ e.g. stock trading prices, production lines, streaming data, ... ◮ carefully define what constitutes an individual unit of $P_{\text{Target}}$ ◮ $P_{\text{Target}}$ often includes inaccessible units, for example ◮ future units (especially if $P_{\text{Target}}$ is a process), ◮ units that cannot be studied for ethical reasons, ... ◮ write all this information down, keep notes
PPDAC - Problem: Variates Variates $x(u), y(u), z(u), \ldots, \ \forall u \in P_{\text{Target}}$ ◮ brainstorm all characteristics that might be attached to any individual unit $u$ (i.e. variates) ◮ err on the side of too many variates ◮ critical review can come later ◮ for each variate, identify the kind of values it might take ◮ discrete, continuous, ◮ finite, practically infinite, ◮ categorical, ordinal, interval, ratio scale ◮ arrange variates on a fishbone diagram (or possibly several diagrams) ◮ distinguish response variates from explanatory variates ◮ use the fishbone to help elicit possible variates ◮ use the fishbone to help group and organize possible variates ◮ understanding the variates helps define $P_{\text{Target}}$ ◮ write all this information down, keep notes
PPDAC - Problem: Variates: Fishbone diagram ◮ place every variate on the diagram ◮ use branches (e.g. "6 Ms") to organize ◮ sub-branches organize further ◮ e.g. measurement: gauge, person, method ◮ might have more than one fishbone diagram
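A fishbone diagram is essentially a small tree: branches, sub-branches, then variates. The sketch below is one possible way to hold it in code. Only the "measurement" branch with its gauge, person, and method sub-branches comes from the slide; the "material" branch and every variate name are hypothetical placeholders.

```python
# Hypothetical fishbone: branch -> sub-branch -> variates.
# Only "measurement" (gauge, person, method) is taken from the slide;
# all other names are made-up placeholders for illustration.
fishbone = {
    "measurement": {
        "gauge": ["gauge_id"],
        "person": ["operator"],
        "method": ["protocol_version"],
    },
    "material": {
        "supplier": ["supplier_name"],
    },
}


def variates(tree):
    """Return every variate on the diagram, flattened in branch order."""
    return [v for subs in tree.values() for vs in subs.values() for v in vs]


def render(tree):
    """Return the diagram as indented text lines, one per branch or sub-branch."""
    lines = []
    for branch, subs in tree.items():
        lines.append(branch)
        for sub, vs in subs.items():
            lines.append(f"  {sub}: {', '.join(vs)}")
    return lines


print("\n".join(render(fishbone)))
```

Keeping the diagram as data makes it easy to check that every brainstormed variate has been placed somewhere, and to keep several diagrams side by side.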
PPDAC - Problem: Population attributes Population attributes $a_1(P_{\text{Target}}), a_2(P_{\text{Target}}), a_3(P_{\text{Target}}), \ldots$ ◮ numerical: counts, locations, scales, correlations, other coefficients, ... ◮ functions: regressions (parametric and nonparametric), density functions, prediction functions, dependence graphs, ... ◮ graphical: barplots, scatterplots, density estimates, contour plots, heatmaps, ...
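An attribute is anything computed from the whole population, so it can be sketched as a function that takes the population as its argument. The example below is a minimal illustration on a made-up four-unit population; the variate names `x` and `y`, the data values, and the helper names are all assumptions. It shows three numerical attributes (a location, a scale, a correlation) and one function-valued attribute (a least-squares line).

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical toy population: each unit u carries variates x(u) and y(u).
population = [
    {"x": 1.0, "y": 2.0},
    {"x": 2.0, "y": 4.1},
    {"x": 3.0, "y": 5.9},
    {"x": 4.0, "y": 8.0},
]


def location(pop, variate):
    """Numerical attribute: the population average of one variate."""
    return mean(u[variate] for u in pop)


def scale(pop, variate):
    """Numerical attribute: the population standard deviation of one variate."""
    return pstdev(u[variate] for u in pop)


def _cross_products(pop, a, b):
    """Sum over units of (a(u) - mean a) * (b(u) - mean b)."""
    ma, mb = location(pop, a), location(pop, b)
    return sum((u[a] - ma) * (u[b] - mb) for u in pop)


def correlation(pop, a, b):
    """Numerical attribute: the correlation coefficient of two variates."""
    return _cross_products(pop, a, b) / sqrt(
        _cross_products(pop, a, a) * _cross_products(pop, b, b)
    )


def least_squares_line(pop, x, y):
    """Function-valued attribute: the least-squares line of y on x."""
    slope = _cross_products(pop, x, y) / _cross_products(pop, x, x)
    intercept = location(pop, y) - slope * location(pop, x)
    return lambda value: intercept + slope * value


line = least_squares_line(population, "x", "y")
print(location(population, "y"), scale(population, "x"))
print(correlation(population, "x", "y"), line(2.5))
```

Because each attribute takes the population as an argument, the same functions can later be applied to a study population or to a sample.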
PPDAC - Problem: Aspect Problem aspect ◮ Descriptive: ◮ attributes are really population/process summaries ◮ interest lies in learning their values ◮ often interest lies in relating variate values ◮ Predictive: ◮ attributes relate variate values of units ◮ interest lies in predicting the values of some variates from those of others ◮ e.g. predicting $y(u)$ for some $u$ in the future, perhaps given $x(u)$ ◮ Causative: ◮ interest lies in discovering a causal relation ◮ interest lies in changes of attributes
PPDAC - Problem: Aspect: Causation It is useful to have a general working definition of causation. Suppose we have: ◮ an explanatory variate whose value $x(u)$ can be set for all $u \in P_{\text{Target}}$ ◮ an attribute of interest $a(P)$ Of interest: ◮ when $x(u)$ is changed to $x^{\star}(u)$ for every $u$, and ◮ no other changes are made to any other variate $z(u)$, ◮ does the attribute of interest $a(P)$ change in response? If so, then we say that the changes in $x$ caused the change in $a(P)$. Note that this defines the causal effect at the level of the population attribute $a(P)$, not at the level of an individual unit $u$.
PPDAC - Problem: Aspect: Causation Begin with a population $P$ and set the value of $x(u)$ for all $u \in P$ (denoted $x(u) \leftarrow \text{"value"}$). Now, if $a(P \mid x \leftarrow \text{"red"}) \neq a(P \mid x \leftarrow \text{"blue"})$, then we say that the change in $x$ caused the change in $a(P)$ and write $\Delta x \Longrightarrow \Delta a(P)$.
PPDAC - Problem: Aspect: Causation In contrast, if we only observe $x(u)$ for all $u \in P$ (i.e. that $x(u) = \text{"value"}$), then $a(P \mid x = \text{"red"})$ and $a(P \mid x = \text{"blue"})$ are simply different attributes. This says nothing about a causal relation between $x$ and $a(P)$. We simply observe whatever differences exist between the two attributes.
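The difference between setting x and merely observing x can be made concrete with a toy population. The sketch below is an illustration under invented assumptions: the background variate `z`, the rule tying the observed colour to `z`, and the response model `y` are all hypothetical. The attribute is the population average of y.

```python
# Hypothetical population: 100 units, each with a background variate z(u).
units = range(100)


def z(u):
    """Background variate: alternates 0, 1, 0, 1, ... across units."""
    return u % 2


def x_observed(u):
    """Observed colour; in this made-up population it tracks z exactly."""
    return "red" if z(u) == 1 else "blue"


def y(u, x):
    """Assumed response: z contributes 10 and the colour red contributes 2."""
    return 10 * z(u) + (2 if x == "red" else 0)


def a_set(value):
    """a(P | x <- value): set x to value for every unit, then average y."""
    return sum(y(u, value) for u in units) / len(units)


def a_observed(value):
    """a(P | x = value): average y over the units observed to have x = value."""
    ys = [y(u, x_observed(u)) for u in units if x_observed(u) == value]
    return sum(ys) / len(ys)


print(a_set("red") - a_set("blue"))            # 2.0
print(a_observed("red") - a_observed("blue"))  # 12.0
```

Setting x for every unit isolates the contribution of colour (a difference of 2), whereas comparing the observed groups mixes it with the contribution of z (a difference of 12). The two observed quantities are simply different attributes of different sub-populations.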
PPDAC - Problem: Aspect: Causation Note that ◮ if we cannot set the values of $x(u)$, we cannot assert a causal relation, ◮ the changes in $x(u)$ need not all be to the same value as in the above example; more likely are changes such as ◮ $x(u) \to x(u) + \delta$, or ◮ $x(u) \to x(u) + \delta_u$, or ◮ $x(u) \to (1 + \delta_u) \times x(u)$, and ◮ any $x(u)$ could be vector valued, or more complex. ◮ the causal effect is at the population level ◮ this causal definition is an idealization, ◮ but it shows the difference between a causal relation and an observational one, ◮ and suggests that establishing causation is likely to be a challenge. ◮ the challenge includes variations on ◮ study error ◮ sample error ◮ measurement error
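The three kinds of change listed above (a common shift, a unit-specific shift, a unit-specific scaling) can be sketched as functions applied to the whole population at once. Everything concrete here is an assumption made for illustration: the four x values, the deltas, and the response model y(u) = 3 x(u) used to define the attribute.

```python
# Hypothetical population of four units with explanatory values x(u).
x = [1.0, 2.0, 3.0, 4.0]


def common_shift(xs, delta):
    """x(u) -> x(u) + delta, the same delta for every unit."""
    return [xi + delta for xi in xs]


def unit_shift(xs, deltas):
    """x(u) -> x(u) + delta_u, a different delta for each unit."""
    return [xi + d for xi, d in zip(xs, deltas)]


def unit_scale(xs, deltas):
    """x(u) -> (1 + delta_u) * x(u), a relative change for each unit."""
    return [(1 + d) * xi for xi, d in zip(xs, deltas)]


def attribute(xs):
    """Population average of y under the assumed model y(u) = 3 * x(u)."""
    return sum(3 * xi for xi in xs) / len(xs)


print(attribute(x))                                       # 7.5
print(attribute(common_shift(x, 1.0)))                    # 10.5
print(attribute(unit_shift(x, [0.0, 0.0, 0.0, 4.0])))     # 10.5
print(attribute(unit_scale(x, [1.0, 0.0, 0.0, 0.0])))     # 8.25
```

In each case the comparison is between the attribute before and after the change, which is a population-level statement. The second and third examples change only one unit, yet still move the attribute.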
PPDAC - Plan ◮ Study population/process, Variates, Attributes ◮ experimental or observational ◮ Develop sampling protocol ◮ Dealing with variates ◮ fishbone diagram ◮ selecting response variate ◮ controlling explanatory variates ◮ experimental variates (causative aspect) ◮ measuring process(es) ◮ data collection protocol(s)
PPDAC - Plan: Study population Here we determine the study population/process $P_{\text{Study}}$, ◮ the collection of units that we can access ◮ should resemble $P_{\text{Target}}$ as much as possible, especially in the population attributes of interest ◮ could also be thought of as a process which produces units over time, but ◮ it will have a finite (possibly large) number of units, ◮ either because it is a population, or ◮ because the study must be done within a finite fixed time period. ◮ Again, carefully define what constitutes an individual unit of $P_{\text{Study}}$ ◮ $P_{\text{Study}}$ includes only units which are available and accessible for study during the time of the study. ◮ write all this information down, keep notes
PPDAC - Plan: Sampling protocol Here we determine a sampling protocol, ◮ how should units from $P_{\text{Study}}$ be selected to be part of the sample $S$, for example ◮ possibly rule out some samples as possibilities ◮ determine how samples are to be selected ◮ deterministically or by some probability mechanism? ◮ with what probabilities of selection? ◮ there are numerous sampling plans to choose from, depending on the problem ◮ determine how sample selection (including how to randomly select) will be implemented ◮ the sampling protocol might depend on how particular variates are dealt with ◮ write down all procedures and instructions, keep notes
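As one example of a probability mechanism, the sketch below implements simple random sampling without replacement, which is only one of the many sampling plans the slide alludes to. The unit labels, sample size, and seed are hypothetical. Recording the seed is one way to make the written-down protocol reproducible.

```python
import random


def simple_random_sample(study_units, n, seed):
    """Select n distinct units, every subset of size n being equally likely."""
    rng = random.Random(seed)  # a recorded seed makes the selection repeatable
    return rng.sample(list(study_units), n)


def inclusion_probability(population_size, n):
    """Probability that any given unit is selected under this plan."""
    return n / population_size


# Hypothetical study population of 20 labelled units.
study_population = [f"unit-{i}" for i in range(20)]
sample = simple_random_sample(study_population, 5, seed=2024)

print(sample)
print(inclusion_probability(len(study_population), 5))  # 0.25
```

A deterministic protocol, or one with unequal selection probabilities, would replace `simple_random_sample` while keeping the same overall shape: a documented rule mapping the study population to a sample.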
PPDAC - Plan: Variates Dealing with variates ◮ determine response and explanatory variates using another fishbone diagram, ◮ controlling explanatory variates: on the fishbone diagram, mark explanatory variates as ◮ (B) for "blocking" variates, if their values are fixed, or otherwise severely constrained by design, ◮ (R) for "randomized" if the values are assigned by deliberate randomization, ◮ (E) for "experimental" if the variate will be set purposefully and differently to evaluate its causal effect (if any), ◮ (O) for "observed" if the value will simply be measured and observed, ◮ values of remaining variates are not to be recorded ◮ write all this information down, keep notes If there is an experimental variate, i.e. one whose value is deliberately set, then this is an experimental study; otherwise it is an observational study.
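The marking scheme and the rule that follows from it can be sketched in a few lines. The role labels B, R, E, and O are those of the slide; the `Role` type, the `study_type` helper, and the variate names in the example plan are hypothetical.

```python
from enum import Enum


class Role(Enum):
    """How each variate on the fishbone diagram is dealt with in the plan."""
    RESPONSE = "response"
    BLOCKING = "B"      # fixed or severely constrained by design
    RANDOMIZED = "R"    # assigned by deliberate randomization
    EXPERIMENTAL = "E"  # set purposefully to evaluate its causal effect
    OBSERVED = "O"      # simply measured and recorded


def study_type(plan):
    """Experimental if any variate is marked (E), observational otherwise."""
    if any(role is Role.EXPERIMENTAL for role in plan.values()):
        return "experimental"
    return "observational"


# Hypothetical plans; the variate names are made up for illustration.
trial_plan = {
    "yield": Role.RESPONSE,
    "fertilizer": Role.EXPERIMENTAL,
    "field": Role.BLOCKING,
    "rainfall": Role.OBSERVED,
}
survey_plan = {"yield": Role.RESPONSE, "rainfall": Role.OBSERVED}

print(study_type(trial_plan))   # experimental
print(study_type(survey_plan))  # observational
```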
PPDAC - Plan: Population attributes Population attributes $a_1(P_{\text{Study}}), a_2(P_{\text{Study}}), a_3(P_{\text{Study}}), \ldots$ ◮ numerical: counts, locations, scales, correlations, other coefficients, ... ◮ functions: regressions (parametric and nonparametric), density functions, prediction functions, dependence graphs, ... ◮ graphical: barplots, scatterplots, density estimates, contour plots, heatmaps, ... These should, as much as possible, match those for the target population.
PPDAC - Plan: Measuring process(es) For every variate whose value is to be determined in the study, there will be an associated measuring system, or process, used to determine that value. For each variate, identify ◮ the gauge(s) or instrument(s) to be used to determine the value ◮ the person, or persons, involved in determining the value ◮ the method to be followed by the person, or persons, using the gauge(s) Some effort will be required to ensure that sources of measuring variability and bias are identified and made as small as practically possible and scientifically necessary. This could involve a separate study (and PPDAC) to make the measuring systems sufficiently reliable (e.g. a gauge repeatability and reproducibility study).
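A first look at measuring bias and variability can be sketched by measuring the same item repeatedly. The sketch assumes a reference standard whose true value is known, which the slide does not state, and the readings are invented. It is far simpler than a full gauge repeatability and reproducibility study, which would also separate the contributions of different persons and gauges.

```python
from statistics import mean, stdev


def measurement_summary(readings, reference):
    """Summarize repeated measurements of one item with a known true value.

    bias        : average reading minus the reference value
    variability : sample standard deviation of the readings
    """
    return {
        "bias": mean(readings) - reference,
        "variability": stdev(readings),
    }


# Hypothetical repeated readings of a reference standard known to be 10.0.
readings = [10.1, 9.9, 10.2, 10.0, 10.3]
summary = measurement_summary(readings, reference=10.0)

print(round(summary["bias"], 3), round(summary["variability"], 3))
```

Whether a given bias or variability is small enough depends on the attributes of interest, so the acceptable limits belong in the study plan rather than in the code.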