  1. HFES Webinar Series How Do You Know That Your Metrics Work? Fundamental Psychometric Approaches to Evaluating Metrics Presented by Fred Oswald, Rice University Moderated by Rebecca A. Grier, Ford Motor Company Hosted by the Perception and Performance Technical Group

  2. HFES Webinar Series • Began in 2011 • Organized by the Education & Training Committee • This webinar is organized and hosted by the HFES Perception and Performance Technical Group, http://hfes-pptg.org/. • See upcoming and past webinars at http://bit.ly/HFES_Webinars

  3. HFES Webinar FAQs 1. There are no CEUs for this webinar. 2. This webinar is being recorded. HFES will post links to the recording and presentation slides on the HFES Web site within 3-5 business days. Watch your e-mail for a message containing the links. 3. Listen over your speakers or via the telephone. If you are listening over your speakers, make sure your speaker volume is turned on in your operating system and your speakers are turned on. 4. All attendees are muted. Only the presenters can be heard. 5. At any time during the webinar, you can submit questions using the Q&A panel. The moderator will read the questions following the last presentation. 6. Trouble navigating in Zoom? Type a question into Chat. HFES staff will attempt to help. 7. HFES cannot resolve technical issues related to the webinar service. If you have trouble connecting or hearing the audio, click the “Support” link at www.zoom.us.

  4. About the Presenters Presenter Fred Oswald, PhD, is a professor in the Department of Psychology at Rice University. An organizational psychologist, he addresses issues pertaining to personnel selection, college admission, military selection and classification, and school-to-work transition. Oswald publishes statistical and methodological research in the areas of big data, meta-analysis, measure development, and psychometrics. He is an Associate Editor of Journal of Management, Psychological Methods, Advances in Methods and Practices in Psychological Science, and Journal of Research in Personality. Fred received his MA and PhD in industrial-organizational psychology from the University of Minnesota in 1999. Moderator Rebecca A. Grier, PhD, is a human factors scientist at Ford Motor Company researching human interaction with highly automated vehicles. In addition, she is secretary/treasurer of the HFES Perception and Performance Technical Group and chair of the Society of Automotive Engineers Taskforce on Identifying Automated Driving Systems - Dedicated Vehicles (ADS-DV) User Issues for Persons with Disabilities. Rebecca received her MA and PhD in human factors/experimental psychology from the University of Cincinnati and a BS With Honors in psychology from Loyola University, Chicago.

  5. How do you know that your metrics work? Fundamental questions about metrics Fred Oswald Rice University HFES Webinar Series April 12, 2018 (image credit: animalmascots.com/01-00887/German-Shepard-Mascot-Costume)

  6. Outline: Questions Addressed Q0: Context of measurement? Q1: Develop a new measure? Q2: How to develop good items? Q3: Format of the measure? Q4: Evidence for reliability? Q5: Practical analysis tips?

  7. Q0: Context of measurement? Purposes • Evaluative (e.g., system comparison, individual differences) • Developmental (e.g., training evaluation) • Managerial decision-making (e.g., compensate, promote, transfer, terminate)

  8. Q0: Context of measurement? Content • General ↔ Specific/Contextualized • Multiple ↔ Single measures • Strong ↔ Weak or "subtle" indicators Form • Many items ↔ Few items • Self-report ↔ "Other" report • Traditional ↔ Innovative

  9. Q0: Context of measurement? Broad Contexts • Academic vs. organizational • IRB vs. organizational climate for surveying • Perceptions of fairness and relevance (from all stakeholders, including the test-taker) • Legal concerns

  10. Q1: Develop a new measure? No → Use an existing measure when • there is a strong theoretical basis • past empirical research demonstrates reliability and validity • you are not interested in a measure-development study (do not toss an ad hoc measure into a study) …

  11. Q1: Develop a new measure? Yes → Develop a new measure when • access to existing measures is limited (expensive, proprietary) • there is room for improvement (improved theory, aligning the measure with the intended purpose, increased sensitivity to the test-taker perspective, updating language) • test security is of concern ("freshening" item pools, previous test compromises) • there is limited testing time …

  12. A Common Context: Limited Testing Time Problem: Too many constructs and not enough time, resources … or test-taker patience Reasons: Theories get complex; organizations place high demand on measures/data to answer many practical organizational questions Solutions: Reduce constructs to "essential" ones? Abandon use of multiple scales for a construct? Shorten measures?

  13. Q2: How to develop good items? Good measure development – and therefore good results – requires sound investment. • Expertise (substantive researchers, SMEs, psychometricians, sensitivity review) • Development process (item generation, refinement, translation/backtranslation) • Research/evidence (reliability, validity, low adverse impact, generalizability)

  14. Q2: How to develop good items? Item content can be evaluated for relevancy, deficiency and contamination; however, these three characteristics can also be psychological phenomena (e.g., did the test-taker forget or get confused by the item content?).

  15. Q2: How to develop good items? Appropriate content sampling from a construct domain is a necessary condition for obtaining interpretable reliability evidence for a set of items. A high reliability coefficient does not ensure adequate content sampling: collections of items can covary due to shared contaminants or shared deficiencies.
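As a concrete illustration of the reliability coefficient discussed here, below is a minimal Python sketch of coefficient alpha computed on simulated item responses (the data, sample size, and item count are hypothetical). Note that alpha can be high simply because the items covary; it says nothing about whether they adequately cover the construct domain.

```python
# Minimal sketch: coefficient (Cronbach's) alpha for a set of items.
# Rows of the response matrix are respondents; columns are items. Data are simulated.
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = responses.shape[1]                          # number of items
    item_vars = responses.var(axis=0, ddof=1)       # variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
true_score = rng.normal(size=200)                        # hypothetical construct scores
items = true_score[:, None] + rng.normal(size=(200, 6))  # six noisy items sharing that score
print(round(cronbach_alpha(items), 2))  # high alpha, yet this alone says nothing about content coverage
```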

  16. Q2: How to develop good items? [Diagram: a job satisfaction construct measured by items 1 through k] • Items sample different aspects of the theoretical construct. • e.g., satisfaction with: autonomy, salary, job variety, management, coworkers … • Controlled heterogeneity entails varying these aspects across items to triangulate on the psychological construct. • Varying items allows for distinguishing item content that is item-specific vs. construct-relevant.

  17. Q2: How to develop good items? [Diagram: a workload (task load) construct measured by items 1 through k] Another construct: Workload • Identified facets (aspects) under its construct umbrella: NASA-TLX: Mental Demand, Physical Demand, Temporal Demand, Performance, Frustration, Effort • Controlled heterogeneity: create items reflecting each facet • Given enough items, a facet can become a reliable scale on its own (e.g., high alpha, strong single factor).
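One quick, informal way to probe the "strong single factor" idea for a facet's items is to inspect the eigenvalues of the inter-item correlation matrix; a dominant first eigenvalue is consistent with one underlying factor. The sketch below uses simulated responses (the facet, item count, and noise level are assumptions for illustration).

```python
# Illustrative check of single-factor structure for one facet's items (simulated data).
import numpy as np

rng = np.random.default_rng(2)
facet = rng.normal(size=300)                                   # hypothetical facet score (e.g., Mental Demand)
items = facet[:, None] + rng.normal(scale=0.8, size=(300, 5))  # five items written to reflect this facet

corr = np.corrcoef(items, rowvar=False)       # 5 x 5 inter-item correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]      # eigenvalues, largest first
print(np.round(eigvals, 2))
print(round(eigvals[0] / eigvals.sum(), 2))   # share of variance in the first component
```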

  18. Q2: How to develop good items? Who generates items? • SMEs: domain-specific experts/theorists; job analysts; job incumbents; researchers themselves relying on past theories and measures • Item categorization process How many people are needed to generate items? • Generate items (from SMEs, research literature) until themes (and possibly content) are redundant. Often need fewer SMEs than one might think.

  19. Q2: How to develop good items? What items are appropriate given the measurement goals? e.g., • Knowledge: easier items (minimal competence); difficult items (certify professionals); full range (accurately measure people's knowledge across the range) • Personality: screen out extremes (e.g., antisocial) vs. assess normal personality (e.g., agreeableness) • Adaptive items (regardless of domain): initial item is in the "middle" of the construct continuum; subsequent items are tailored to test-takers' past responses (reliable items consistent with one's true score improve estimation, reducing test time)
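The adaptive-item idea can be sketched very roughly in code: start near the middle of the difficulty continuum and pick each next item to match a running estimate of the test-taker's level. The item bank, response model, and scoring update below are simplified stand-ins (operational adaptive tests use IRT-based item selection and scoring).

```python
# Simplified sketch of adaptive item selection; difficulties and the update rule are hypothetical.
import math, random

random.seed(0)
item_bank = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]   # hypothetical item difficulties
true_level = 0.8                                      # hypothetical test-taker standing
estimate = 0.0                                        # begin in the "middle" of the continuum

for _ in range(4):
    item = min(item_bank, key=lambda d: abs(d - estimate))  # closest remaining item to the estimate
    item_bank.remove(item)
    p_correct = 1 / (1 + math.exp(-(true_level - item)))    # simple logistic response model
    correct = random.random() < p_correct                   # simulated response
    estimate += 0.5 if correct else -0.5                    # crude update; real CATs use IRT scoring
    print(f"item {item:+.1f} -> {'correct' if correct else 'incorrect'}, estimate {estimate:+.1f}")
```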

  20. Q2: How to develop good items? The items you develop will eventually be refined based on psychometric analysis (e.g., item-remainder correlations, CFA factor loadings).
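As one example of this refinement step, item-remainder (corrected item-total) correlations relate each item to the sum of the remaining items; items with near-zero values are candidates for revision or removal. The sketch below uses simulated responses, with one deliberately unrelated item.

```python
# Minimal sketch: item-remainder correlations on simulated responses.
import numpy as np

def item_remainder_correlations(responses: np.ndarray) -> np.ndarray:
    """Correlate each item with the sum of the other items."""
    total = responses.sum(axis=1)
    return np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])

rng = np.random.default_rng(3)
construct = rng.normal(size=250)
items = construct[:, None] + rng.normal(size=(250, 5))
items[:, 4] = rng.normal(size=250)                       # a weak item unrelated to the construct
print(np.round(item_remainder_correlations(items), 2))   # the last value sits near zero
```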

  21. Q3: Format of the measure? Instructions • Often, instructions are written at too high a grade level (5 grade levels too high for a patient discharge interview; Spandorfer et al., 1993). • Very few people read them (Novick & Ward, 2006), though novices who read them will improve (Catrambone, 1990). • Detailed instructions about providing "objective" information (BARS, time spans, frequencies) often do not change the subjective response process. • General suggestion: Assume that test-takers will ignore (or skim) instructions and proceed accordingly! Novel formats requiring instructions demand pilot testing to ensure comprehension and quality responding.
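For a rough, automated screen of instruction readability, a Flesch-Kincaid grade estimate can be computed directly; the syllable counter below is a crude approximation and the sample instruction text is made up, so treat the output as indicative only (dedicated readability tools are more accurate).

```python
# Rough sketch: Flesch-Kincaid grade level with a naive syllable counter.
import re

def naive_syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(naive_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

instructions = ("Indicate the extent to which you experienced mental demand "
                "during the preceding task by selecting one response option.")
print(round(fk_grade(instructions), 1))   # a high value suggests simplifying the wording
```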

  22. Q3: Format of the measure? Look and feel? • Grammar and syntax matter, not just for understandability and better data but for credibility. • Clear, readable font (this is HFES!). • Intuitive method of responding. • For web-based measures, check browser types and screen resolutions. • Minimize drudgery; maximize simplicity and readability. Keep all cognitive burdens to a necessary minimum (see Dillman's Tailored Design Method; Dillman, 2014).
