Tabular Data Extraction: Epidemiology Table Classification and Factor Alignment

Dec 03, 2022

Tabular Data Extraction: Epidemiology Table Classification and Factor Alignment
Garrick Sherman
Last semester...
● Worked with Dr. Andrew Leakey
  ○ Plant biologist
  ○ The effects of carbon dioxide on photosynthesis
● Data is “locked away”
● Goals
  ○ Extract data from articles
  ○ Keep data associated with articles
  ○ Add structure to data
Last semester...
● Look for a set of search terms
● Parse HTML tables into CSV files
  ○ Also extract table captions
● Identify columns, “subtables,” and captions that mention the search terms
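The parsing step above can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not the code used in the project: the class and function names (`TableExtractor`, `table_to_csv`, `find_matching_columns`) and the sample HTML are invented for the example, nested tables are not handled, and "identifying columns" is reduced to a case-insensitive substring match on the header row.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a caption plus a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables: {"caption": str, "rows": [[str]]}
        self._table = None  # table currently being read (nested tables unsupported)
        self._row = None    # row currently being read
        self._text = None   # text buffer for the open cell or caption

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = {"caption": "", "rows": []}
        elif self._table is None:
            return
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th", "caption"):
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._table is None:
            return
        if tag in ("td", "th") and self._row is not None and self._text is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._text).split()))
            self._text = None
        elif tag == "caption" and self._text is not None:
            self._table["caption"] = " ".join("".join(self._text).split())
            self._text = None
        elif tag == "tr" and self._row is not None:
            self._table["rows"].append(self._row)
            self._row = None
        elif tag == "table":
            self.tables.append(self._table)
            self._table = None


def table_to_csv(table):
    """Serialize the rows of one extracted table as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table["rows"])
    return buffer.getvalue()


def find_matching_columns(table, search_terms):
    """Return indexes of columns whose header cell mentions any search term."""
    if not table["rows"]:
        return []
    header = table["rows"][0]
    terms = [term.lower() for term in search_terms]
    return [i for i, cell in enumerate(header)
            if any(term in cell.lower() for term in terms)]


# Made-up input, for illustration only.
SAMPLE_HTML = (
    "<table><caption>Table 1. Example caption about photosynthesis</caption>"
    "<tr><th>Species</th><th>Photosynthesis rate</th></tr>"
    "<tr><td>Soybean</td><td>12.5</td></tr></table>"
)

parser = TableExtractor()
parser.feed(SAMPLE_HTML)
first_table = parser.tables[0]
print(first_table["caption"])
print(table_to_csv(first_table))
print(find_matching_columns(first_table, ["photosynthesis"]))
```

Keeping the caption next to the rows is what lets the extracted data stay associated with its source table once it is written out as CSV.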
Last semester...
● Column-based
  ○ 53 columns extracted
  ○ Recall: 0.1130 or 0.6279
  ○ Precision: 0.3774 or 0.5094
● Subtables
  ○ 23 extracted
  ○ Recall: 0.1356
  ○ Precision: 0.3158
This semester...
● Epidemiology journals
  ○ 11 high-impact breast cancer journals
    ■ e.g. British Medical Journal, Cancer, International Journal of Breast Cancer, etc.
● Classify table as containing summary sample characteristics
● Align factors
  ○ e.g. “Marital Status” and “Married”
{"Age at diagnosis (years):"=>
  {"<40"=>["9 (146)", "5 (12)"],
   "40-49"=>["26 (437)", "29 (71)"],
   "50-59"=>["37 (631)", "39 (94)"],
   "≥60"=>["29 (500)", "27 (66)"]},
 "Marital status:"=>
  {"Living with partner"=>["79 (180)"],
   "Living alone"=>["21 (48)"]},
 "Metropolitan classification:"=>
  {"Metropolitan area"=>["59 (1001)", "49 (119)"],
   "Non-metropolitan area"=>["41 (711)", "51 (124)"]},
 ….
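The factor/option structure above can be consumed programmatically. The sketch below transcribes two of the factors into a Python dict and splits each cell into two numbers. It assumes each cell reads "percentage (count)"; the sample values are consistent with that reading, but the slides do not state it. The names `parse_cell` and `column_total` are invented for the example.

```python
import re

# Two factors transcribed from the structure above: factor -> option -> cells.
TABLE = {
    "Age at diagnosis (years):": {
        "<40": ["9 (146)", "5 (12)"],
        "40-49": ["26 (437)", "29 (71)"],
        "50-59": ["37 (631)", "39 (94)"],
        "≥60": ["29 (500)", "27 (66)"],
    },
    "Marital status:": {
        "Living with partner": ["79 (180)"],
        "Living alone": ["21 (48)"],
    },
}

# Assumed cell layout: a number, then an integer in parentheses.
CELL_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\((\d+)\)\s*$")


def parse_cell(cell):
    """Split a cell such as "9 (146)" into (9.0, 146); None if it does not fit."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def column_total(options, column):
    """Sum the parenthesized counts of one column across a factor's options."""
    total = 0
    for cells in options.values():
        if column < len(cells):
            parsed = parse_cell(cells[column])
            if parsed is not None:
                total += parsed[1]
    return total


for factor, options in TABLE.items():
    print(factor, column_total(options, 0))
```

Returning `None` for cells that do not fit the pattern keeps the sketch from guessing at other layouts, such as means with standard deviations.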
This semester...
● Goal:
  ○ Automated metadata extraction
  ○ Faceted search
    ■ Find studies of related populations
Dataset
● First table
  ○ ~1,500 first tables
  ○ Train: 1,001
  ○ Test: 497
● NXML format
● Fresh codebase
  ○ But same table parsing approaches
Training
● Manual annotation
  ○ Classify based on first 10 lines (or more, if needed) and caption
  ○ Final tally:
    ■ 41.36% sample characteristics
    ■ 58.64% other
● Would certainly be improved with domain expertise
Classification
● Information gain
  ○ Tokens from factor and options
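Information gain for a token is the reduction in class entropy from knowing whether the token occurs in a table: IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]. The sketch below computes it over token sets drawn from factor and option names. The tokenizer, the function names, and the four toy tables are assumptions made for illustration; the slides do not describe how tokens were produced or how many were kept.

```python
import math
import re
from collections import Counter


def tokens_from_table(table):
    """Lowercased word tokens from factor names and option names (assumed tokenizer)."""
    tokens = set()
    for factor, options in table.items():
        tokens.update(re.findall(r"[a-z]+", factor.lower()))
        for option in options:
            tokens.update(re.findall(r"[a-z]+", option.lower()))
    return tokens


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(token_sets, labels, token):
    """Entropy of the labels minus the entropy left after splitting on the token."""
    present = [lab for toks, lab in zip(token_sets, labels) if token in toks]
    absent = [lab for toks, lab in zip(token_sets, labels) if token not in toks]
    total = len(labels)
    remaining = ((len(present) / total) * entropy(present)
                 + (len(absent) / total) * entropy(absent))
    return entropy(labels) - remaining


# Toy data, invented for the example: two "sample characteristics" tables, two others.
TOKEN_SETS = [
    tokens_from_table({"Age at diagnosis": ["<40", "40-49"], "Marital status": ["Married"]}),
    tokens_from_table({"Age": ["<50", "≥50"], "Stage": ["I", "II"]}),
    tokens_from_table({"Hazard ratio": ["Model 1", "Model 2"]}),
    tokens_from_table({"Odds ratio": ["Crude", "Adjusted"]}),
]
LABELS = ["characteristics", "characteristics", "other", "other"]

for token in ("age", "marital", "ratio"):
    print(token, round(information_gain(TOKEN_SETS, LABELS, token), 4))
```

In this toy set, "age" and "ratio" each separate the two classes perfectly, while "marital" appears in only one table and so scores lower. Ranking tokens this way is one common route to picking classifier features.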
Classification
● Test results:
  ○ 177 predicted positive
    ■ Random sample of 50
    ■ Precision: 85.71%
  ○ 300 predicted negative
    ■ Random sample of 50
    ■ Precision: 76.00%
Factor Alignment
● Alignment approaches
  ○ Literal
  ○ Percentage-based
  ○ Name-inclusive
● Evaluation
  ○ Choose 10 randomly, calculate precision
  ○ Report average precision
  ○ Has some drawbacks
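The slides name the three alignment approaches without defining them, so the sketch below is a guess at what each could mean, not the author's method. It assumes "literal" means identical option sets after normalization, "percentage-based" means the share of shared options reaches a threshold, and "name-inclusive" means the factor name's words are compared along with the options. All function names and the 0.5 threshold are hypothetical.

```python
def normalize(options):
    """Lowercase and trim option labels so comparisons ignore case and padding."""
    return {option.strip().lower() for option in options}


def overlap_share(set_a, set_b):
    """Shared items as a fraction of the smaller set (0.0 if either is empty)."""
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


def literal_match(options_a, options_b):
    """Assumed 'literal' rule: the normalized option sets are identical."""
    return normalize(options_a) == normalize(options_b)


def percentage_match(options_a, options_b, threshold=0.5):
    """Assumed 'percentage-based' rule: enough options are shared."""
    return overlap_share(normalize(options_a), normalize(options_b)) >= threshold


def name_inclusive_match(name_a, options_a, name_b, options_b, threshold=0.5):
    """Assumed 'name-inclusive' rule: factor-name words count as items too."""
    items_a = normalize(options_a) | set(name_a.lower().split())
    items_b = normalize(options_b) | set(name_b.lower().split())
    return overlap_share(items_a, items_b) >= threshold


histology = ["Ductal", "Lobular", "Other"]
morphological_type = ["Ductal", "Lobular", "Other", "Unknown"]

print(literal_match(histology, morphological_type))
print(percentage_match(histology, morphological_type))
print(name_inclusive_match("Histology", histology,
                           "Morphological type", morphological_type))
```

Under these assumed definitions the literal rule is the strictest, which fits a high-precision, lower-coverage role, while the two overlap-based rules tolerate extra options such as "Unknown" or "NA".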
Factor Alignment
● #1
  ○ Histology (N = 20)
    ■ Ductal, Lobular, Other
  ○ Morphological type
    ■ Ductal, Lobular, Other, Unknown
  ○ Histological type
    ■ Ductal, Lobular, Other, NA
● #2
  ○ Histological type
    ■ Ductal, Lobular, Ductulolobular, Medullary
  ○ Histology
    ■ Ductal, Lobular, Medullary
Factor Alignment
● Results
  ○ Literal: 0.9167
  ○ Percentage: 0.8624
  ○ Name-based: 0.9500
Conclusion
● Naive Bayes classifier works well because data is independent
● Simple methods of factor alignment are effective
● Automated approaches can help resolve table structure and contents
● Potential applications for faceted search
