
Using Secondary Information to inform Evidence-Based Discovery



  1. Using Secondary Information to inform Evidence-Based Discovery
     Catherine (Cathy) Blake
     Associate Director, Center for Informatics Research in Science and Scholarship (CIRSS)
     Associate Professor, Graduate School of Library and Information Science, with courtesy appointments in Computer Science and Medical Information Science
     University of Illinois at Urbana-Champaign
     clblake@illinois.edu

  2. Motivation
     • Relentless increase in electronic text
       – Life Sciences
         • 22 million citations
         • 5,200 journals
         • 12,000 new articles each week
       – Chemistry
         • > 110,000 articles in 1 year
     • Consequences
       – Hundreds of thousands of relevant articles
       – Implicit connections between literature go unnoticed
     We need to shift from Retrieval to Synthesis
     [Chart: MEDLINE articles per year (thousands), 1950 to 2010]

  3. Scientists as a User Population
     [Diagram contrasting a Medical and a Public Health example. Recoverable labels:]
     • Hypothesis projection: reliability of palpatory procedures; smoking and impotence
     • Analysis design: qualitative; quantitative
     • Data collection methods: prospective; retrospective; interviews; artifacts; observations
     • My data

  4. Manual Synthesis
     "Guesswork guided by scientifically trained intuition" Rescher (1978)
     [Process diagram. Recoverable labels: Hypothesis Projection; Context; Information Retrieval (MEDLINE, Embase); Select; Corpus; Extraction; Extract; Facts; Verification; Verify; Analysis; Analyze; Collaboration; Iteration]

  5. Information Synthesis
     [Process diagram. Recoverable labels: Hypothesis Projection; Context; Information Retrieval (MEDLINE, Embase); Corpus; Extraction; Facts; Verification; Analysis; Synthetic Estimate; Collaboration; Iteration]

  6. Meta-Analysis vs. Information Synthesis
     • Traditional analysis
       – same study design
       – medicine = RCT
       – epidemiology = cohort
     • Information Synthesis
       – any study that includes required information
       – use a synthetic estimate for missing info.
     [Diagram: Systematic Review compared with Information Synthesis. Key: Primary information; Secondary information; Entire study; External database]

  7. Using a Synthetic Estimate
     1. What are people with Breast Cancer exposed to?
        Database of Studies with Breast Cancer patients. Facts for each study:
        • number of patients
        • age of patients
        • geographic location
        • risk-factor exposure …
     2. What are people in a similar population exposed to?
        BRFSS risk factors. Codebook:
        • question asked
        • age, gender
        • % responses
     3. Are these rates significantly different?
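Step 3 on this slide asks whether two exposure rates differ significantly. The slides do not name the statistical test, so the sketch below assumes a pooled two-proportion z-test as one common choice. The function name and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def two_proportion_z_test(exposed_a, n_a, exposed_b, n_b):
    """Pooled two-proportion z-test (an assumed choice, not named in the slides).

    Returns (z, two-sided p-value) for the null hypothesis that both
    groups share a single exposure rate.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a = exposed_a / n_a
    p_b = exposed_b / n_b
    pooled = (exposed_a + exposed_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_a - p_b) / math.sqrt(variance)
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only:
# 120 of 400 patients exposed vs. 900 of 4000 survey respondents exposed.
z, p = two_proportion_z_test(120, 400, 900, 4000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Under this sketch, the first group would come from facts extracted from the breast cancer studies, and the second from the synthetic estimate for a similar population.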

  8. METIS Information Extractor
     Semantic grammar based on words, numbers, and semantic types in the Unified Medical Language System (UMLS)
     Pattern: {term;’age’} {term:’of’} {number;10<n2<110} {term;’to’}{number;10<n2<110}
     Example: The age of breast cancer subjects ranged between 20 to 64 years old.
     {semantic type: neoplastic process, or disease}
     Information extracted:
     • risk factor exposure (tobacco and alcohol)
     • gender
     • age (min, max, mean)
     • start and end dates
     • number of subjects with medical condition
     • geographical location
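The age pattern on this slide pairs literal terms with numbers constrained to a plausible range. The sketch below approximates that idea with a plain regular expression. It is not the METIS grammar: it ignores UMLS semantic types entirely, and the names `AGE_RANGE` and `extract_age_range` are hypothetical.

```python
import re

# "age", then any text within the same sentence, then two numbers
# joined by "to", "-" or "and". A simplification of the slide's pattern.
AGE_RANGE = re.compile(
    r"\bage\b[^.]*?\b(\d{1,3})\s*(?:to|-|and)\s*(\d{1,3})\b",
    re.IGNORECASE,
)


def extract_age_range(sentence, low=10, high=110):
    """Return (min_age, max_age) if a plausible age range is found, else None.

    The bounds mirror the slide's constraint 10 < n < 110.
    """
    for match in AGE_RANGE.finditer(sentence):
        first, second = int(match.group(1)), int(match.group(2))
        if low < first < high and low < second < high and first <= second:
            return first, second
    return None


sentence = "The age of breast cancer subjects ranged between 20 to 64 years old."
print(extract_age_range(sentence))  # (20, 64)
```

The numeric bounds do the same job as in the slide's pattern: they reject number pairs that are unlikely to be ages, such as years or subject counts.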

  9. METIS Info Extractor
                                  Recall  Prec.   Recall  Prec.
     (1) Number of subjects       0.65    0.90    0.53    0.95
     (2) Tobacco Use
           Table Rows             0.92    0.88    0.98    0.87
           Table Column           0.82    0.82    0.47    0.47
     (3) Age
           Minimum                0.90    0.90    0.70    0.70
           Maximum                1.00    1.00    0.80    0.80
           Mean                   0.50    0.50    0.60    0.60
     (4) Location                 0.83    0.83    0.71    0.71
     (5) Timeframe
           Start Year             0.90    0.90    0.70    0.70
           End Year               1.00    1.00    0.60    0.60
     Average                      0.84    0.86    0.68    0.71
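The Average row in this table appears to be the unweighted mean of the nine rows above it. The short check below recomputes the means from the slide's own figures. The variable and function names are hypothetical, and the two Recall/Prec. column pairs are treated simply as columns 1 to 4 because the slide does not label them.

```python
# (field, recall_1, precision_1, recall_2, precision_2), copied from the slide.
ROWS = [
    ("Number of subjects", 0.65, 0.90, 0.53, 0.95),
    ("Tobacco: table rows", 0.92, 0.88, 0.98, 0.87),
    ("Tobacco: table column", 0.82, 0.82, 0.47, 0.47),
    ("Age: minimum", 0.90, 0.90, 0.70, 0.70),
    ("Age: maximum", 1.00, 1.00, 0.80, 0.80),
    ("Age: mean", 0.50, 0.50, 0.60, 0.60),
    ("Location", 0.83, 0.83, 0.71, 0.71),
    ("Timeframe: start year", 0.90, 0.90, 0.70, 0.70),
    ("Timeframe: end year", 1.00, 1.00, 0.60, 0.60),
]


def column_means(rows):
    """Unweighted mean of each numeric column, rounded to two decimals."""
    count = len(rows)
    return tuple(
        round(sum(row[i] for row in rows) / count, 2) for i in range(1, 5)
    )


print(column_means(ROWS))  # (0.84, 0.86, 0.68, 0.71)
```

Because the mean is unweighted, a field with few instances counts as much as a field with many.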

  10. Synthetic Estimate Evaluation
      [Two charts of actual vs. estimated control rate (0 to 1) by article identifier (1 to 4) and their average: one for tobacco consumption, one for alcohol consumption]

  11. Findings thus far …
      • To what extent can information synthesis tasks be automated?
        – METIS Info extractor: ~60-70% precision and recall
        – Synthetic estimate is close to values in the traditional studies
      • How do effect-sizes compare with a traditional meta-analysis?
        – Similar effect-sizes
        – More work required to explore publication bias
      • Could this be used to detect risk factors sooner?
        – risk factors are reported as secondary information before primary information
      • How much effort would this save?
        – Given full text: 31 years

  12. Acknowledgements
      • Using Scientific Text to Identify Breast Cancer Risk-Factors
        – California Breast Cancer Research program
      • Towards Evidence-Based Discovery (NSF)
        – This material is based upon work supported by the National Science Foundation under Grant No. (1115774). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
      • Sociotechnical Data Analytics (IMLS)
        – This project is made possible by a grant from the U.S. Institute of Museum and Library Services (IMLS), Laura Bush 21st Century Librarian Program Grant Number RE-05-12-0054-12
      • Thanks to user groups, annotators and academic mentors
        – Particularly to Dr. Adams, Dr. Tengs, Dr. Catherine Carpenter, Dr. Wanda Pratt, Nora Williams and Craig Evans

  13. Questions and comments most welcome
      Cathy Blake
      clblake@illinois.edu
