“It is better to observe than to criticise.” – Bobby Wellins (Jazz Line-up, 13/2/2011) Teesside University, Social Futures Institute, seminar, 18/11/2015 1
“Best of all is to convey the magnitude of the effect and the degree of certainty explicitly.” – Pinker (2014, p. 45)
“Usually what one wants to know is not whether the change makes any difference, but to know how likely it is that the change will be big enough.” – Landauer (1997, p. 222)
Magnitude-based inference in behavioural research
Paul van Schaik
p.van-schaik@tees.ac.uk
http://sss-studnet.tees.ac.uk/psychology/staff/Paul_vs/index.htm
Outline
• Problem and proposed solution
• Quantification in behavioural research
• Statistical inference in behavioural research
• Magnitude-based inference
• The application of magnitude-based inference in behavioural research
• Other approaches
• Limitations
• Recommendations
The problem
A researcher conducts a study comparing two software designs in terms of their usability.
She conducts usability tests with two groups, each using one of the designs, and collects various measures.
These include perceived usability, error rate and time-on-task.
She then compares the two groups in terms of their mean scores on the measures, using a t test.
She finds that, although differences in mean scores are apparent, the test results do not show statistical significance.
What should the researcher conclude about the difference in usability between the two designs?
A proposed solution
As an alternative to null-hypothesis significance testing (NHST), use information about
• uncertainty in the data,
• the observed value of the effect and
• smallest substantial values for the effect
to make two kinds of magnitude-based inference: mechanistic and practical.
Use the results of NHST as input.
Use spreadsheets available on the Internet to generate inferences.
Developed and influential in sport and exercise science.
Quantification in user research
• “The systematic study of the goals, needs, and capabilities of users so as to specify design, construction, or improvement of tools to benefit how users work and live” (Schumacher, 2009, p. 6)
• Usability and user-experience data
  • E.g. psychometric data, error rate and time-on-task
• Formative research
  • users’ interaction with an artefact is studied to generate data that, when analysed, provide information to inform system improvement
• Summative research
  • establishes the quality of interaction with an artefact in comparison with another artefact or a benchmark
Statistical inference in user research
Usually, null-hypothesis significance testing (NHST) is used; limitations:
1. the null hypothesis of no effect is (almost) always false
2. it ignores the smallest important effect, which has no bearing on the inference that is made in NHST
3. it does not address practical relevance; it does not clearly define or distinguish practical and mechanistic significance
4. a non-significant result is inconclusive, and a crude classification of inference is used (reject or retain H0)
5. sample size estimation is based on NHST
Merits of magnitude-based inference
1. Requires the researcher to define the smallest important effect, rather than the null effect
2. Uses the smallest important effect as an integral part of inference, so inferences are not an artefact of sample size
3. Provides a rigorous and principled approach to infer practical significance; provides a rigorous distinction between practical and mechanistic significance
More merits
4. Provides a more refined classification of inferences that can be made than merely rejecting or retaining the null hypothesis
5. Estimates of required sample size are based on practical significance or mechanistic significance and the researcher-defined smallest important effect
Inference of mechanistic significance (1)
For descriptive purposes, an effect can be
• classified in terms of its size
• in relation to the smallest important + and - effect size
• as positive, trivial or negative
For inference proper, the chances of an effect being positive, negative or trivial are used
• The chances of the effect being positive: effect falling above the threshold of the smallest important + effect
• The chances of the effect being negative: effect falling below the threshold of the smallest important - effect
• The chances of a trivial effect: 100% minus the sum of the chances of a + effect and those of a - effect
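
The three chances can be illustrated with a minimal Python sketch. It assumes (this is not stated on the slide) that uncertainty in the true effect is approximated by a normal distribution centred on the observed effect, with the standard error as its spread. The function name `chances` and the example numbers are hypothetical.

```python
from statistics import NormalDist


def chances(effect, std_error, smallest_important):
    """Return (p_negative, p_trivial, p_positive) for an observed effect.

    Assumption: the true effect is approximated by a normal distribution
    centred on the observed effect with the given standard error.
    """
    if std_error <= 0 or smallest_important <= 0:
        raise ValueError("std_error and smallest_important must be positive")
    dist = NormalDist(mu=effect, sigma=std_error)
    # Chance of falling below the smallest important negative effect.
    p_negative = dist.cdf(-smallest_important)
    # Chance of falling above the smallest important positive effect.
    p_positive = 1.0 - dist.cdf(smallest_important)
    # Trivial: 100% minus the chances of a positive and a negative effect.
    p_trivial = 1.0 - p_negative - p_positive
    return p_negative, p_trivial, p_positive


# Hypothetical numbers: observed standardised difference 0.30,
# standard error 0.20, smallest important effect 0.20 in either direction.
p_neg, p_triv, p_pos = chances(0.30, 0.20, 0.20)
print(f"negative {p_neg:.1%}, trivial {p_triv:.1%}, positive {p_pos:.1%}")
```

With small samples a t distribution may be more appropriate than the normal approximation used here.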
Inference of mechanistic significance (2)
An inference is then made from the chances of each of the three ranges of outcome (positivity, triviality and negativity) as follows
• Unclear effect: both the chances of the effect being + and the chances of the effect being - are too large (e.g., both greater than the default value of 0.05 or other appropriate cut-offs)
• Otherwise, clear effect: seen as substantially +, - or trivial and considered to have the size of the observed value, with a qualification of probability
Proposed interpretation of probability ranges:
The effect …                               Probability      Chances          Odds
(… = positive/trivial/negative or beneficial/negligible/harmful)
is almost certainly not …                  <0; 0.005]       <0; 0.5%]        <0; 1:199]
is very unlikely to be …                   <0.005; 0.05]    <0.5%; 5%]       <1:199; 1:19]
is unlikely to be …, is probably not …     <0.05; 0.25]     <5%; 25%]        <1:19; 1:3]
is possibly (not) …, may (not) be …        <0.25; 0.75]     <25%; 75%]       <1:3; 3:1]
is likely to be …, is probably …           <0.75; 0.95]     <75%; 95%]       <3:1; 19:1]
is very likely to be …                     <0.95; 0.995]    <95%; 99.5%]     <19:1; 199:1]
is almost certainly …                      <0.995; 1>       <99.5%; 100%>    <199:1; >
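
The decision rule for mechanistic inference and the probability ranges in the table can be combined in a short Python sketch. The function names `describe` and `mechanistic_inference` are hypothetical, and the sketch assumes that the chances of a positive and a negative effect have already been computed. It treats each range as open on the left and closed on the right, as in the table, and additionally accepts exactly 0 and 1 for convenience.

```python
from bisect import bisect_left

# Upper bounds of the probability ranges and the matching qualifiers.
UPPER_BOUNDS = [0.005, 0.05, 0.25, 0.75, 0.95, 0.995]
DESCRIPTORS = [
    "almost certainly not",
    "very unlikely",
    "unlikely",
    "possibly",
    "likely",
    "very likely",
    "almost certainly",
]


def describe(probability):
    """Map a probability to its qualitative descriptor."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # bisect_left places a value equal to a bound in the lower range,
    # which matches the closed upper ends of the ranges.
    return DESCRIPTORS[bisect_left(UPPER_BOUNDS, probability)]


def mechanistic_inference(effect, smallest_important,
                          p_negative, p_positive, cutoff=0.05):
    """Return 'unclear' or the qualified class of the observed effect."""
    # Unclear: chances of a positive AND a negative effect both too large.
    if p_negative > cutoff and p_positive > cutoff:
        return "unclear"
    # Clear: the effect takes the class of the observed value.
    if effect > smallest_important:
        label, p = "positive", p_positive
    elif effect < -smallest_important:
        label, p = "negative", p_negative
    else:
        label, p = "trivial", 1.0 - p_negative - p_positive
    return f"{describe(p)} {label}"


# Hypothetical chances for an observed effect of 0.30 (threshold 0.20).
print(mechanistic_inference(0.30, 0.20, p_negative=0.006, p_positive=0.691))
```

The cut-off of 0.05 is the default value mentioned above; other cut-offs can be passed through the `cutoff` parameter.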
Inference of practical significance (1)
For descriptive purposes, an effect can be
• classified in terms of its size
• in relation to the smallest important beneficial and harmful effect size
• as beneficial, negligible or harmful
For inference proper, the chances of an effect being beneficial, harmful or negligible are used
• The chances of the effect being beneficial: effect falling above the threshold of the smallest important beneficial effect
• The chances of the effect being harmful: effect falling below the threshold of the smallest important harmful effect
• The chances of a negligible effect: 100% minus the sum of the chances of a beneficial effect and those of a harmful effect
Inference of practical significance (2)
Type-1 practical error
• using a harmful effect
• analogous to Type-I error in NHST (rejecting the null hypothesis when it is true)
Type-2 practical error
• not using a beneficial effect
• analogous to Type-II error in NHST (retaining the null hypothesis when it is false)
In the practical (‘clinical’) application of effects
• the chance of using a harmful effect (a Type-1 practical error) needs to be far smaller than
• the chance of not using a beneficial effect (a Type-2 practical error)
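
The asymmetry between the two practical errors can be sketched in Python as follows. The function name `practical_inference`, the normal approximation, the three-way outcome and the default cut-offs (`max_harm=0.005`, `min_benefit=0.25`) are illustrative assumptions, not values taken from these slides. They show only that the tolerated chance of harm is set far lower than the tolerated chance of missing a benefit.

```python
from statistics import NormalDist


def practical_inference(effect, std_error, smallest_beneficial,
                        smallest_harmful, max_harm=0.005, min_benefit=0.25):
    """Return (decision, p_harmful, p_beneficial).

    smallest_harmful is the (negative) threshold below which the effect
    counts as harmful. max_harm and min_benefit are assumed cut-offs.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    # Assumption: normal approximation around the observed effect.
    dist = NormalDist(mu=effect, sigma=std_error)
    p_harmful = dist.cdf(smallest_harmful)
    p_beneficial = 1.0 - dist.cdf(smallest_beneficial)
    if p_beneficial > min_benefit and p_harmful < max_harm:
        decision = "use"          # benefit plausible, harm very unlikely
    elif p_beneficial > min_benefit:
        decision = "unclear"      # benefit plausible, but harm too likely
    else:
        decision = "do not use"   # benefit too unlikely
    return decision, p_harmful, p_beneficial


# Hypothetical example: observed effect 0.50, standard error 0.20,
# smallest beneficial effect +0.20, smallest harmful effect -0.20.
decision, p_harm, p_ben = practical_inference(0.50, 0.20, 0.20, -0.20)
print(decision, f"harm {p_harm:.2%}, benefit {p_ben:.1%}")
```

Separate parameters for the two cut-offs make the asymmetry explicit and allow it to be adjusted for a given application.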