Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations
S100
Martínez-Romero, M., O’Connor, M. J., Shankar, R., Panahiazar, M., Willrett, D., Egyedi, A. L., Gevaert, O., Graybeal, J., Musen, M. A.
Stanford University

What is metadata?
• Data that describe data
• Crucial for:
  • Finding experimental datasets online
  • Understanding how the experiments were performed
  • Reusing the data to perform new analyses
AMIA 2017 | amia.org 2

Poor metadata
Variants of the attribute name “age” found in existing metadata: age; age [y]; Age; age [year]; AGE; age [years]; `Age; age in years; age (after birth); age of patient; age (in years); Age of patient; age (y); age of subjects; age (year); age(years); age (years); Age(years); Age (years); Age(yrs.); Age (Years); Age, year; age (yr); age, years; age (yr-old); age, yrs; age (yrs); age.year; Age (yrs); age_years

Poor metadata
An analysis of metadata from NCBI’s BioSample found anomalous values in:
• 73% of “Boolean” values (e.g., nonsmoker, former-smoker)
• 26% of “integer” values (e.g., JM52, UVPgt59.4, pig)
• 68% of ontology-term values (e.g., presumed normal, wild_type)
Gonçalves, R. S. et al. (2017). Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies. SemSci 2017 Workshop, co-located with ISWC 2017. Vienna, Austria.
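To make “anomalous typed value” concrete, here is a minimal Python sketch that checks values against a declared type and reports the fraction that fail. It is not the procedure used by Gonçalves et al.; the accepted Boolean literals (only true/false) and the integer pattern are assumptions chosen for illustration, and the sample values are the slide’s examples plus two conforming ones for contrast.

```python
import re

# Assumption: a "Boolean" value must be literally true/false (any case),
# and an "integer" value must be an optionally signed run of digits.
BOOLEAN_LITERALS = {"true", "false"}
INTEGER_PATTERN = re.compile(r"[+-]?\d+")


def is_valid_boolean(value: str) -> bool:
    """Return True if the value is one of the accepted Boolean literals."""
    return value.strip().lower() in BOOLEAN_LITERALS


def is_valid_integer(value: str) -> bool:
    """Return True if the whole value is an integer literal."""
    return INTEGER_PATTERN.fullmatch(value.strip()) is not None


def anomaly_rate(values, validator) -> float:
    """Fraction of values that fail the validator (0.0 for an empty list)."""
    if not values:
        return 0.0
    failures = sum(1 for v in values if not validator(v))
    return failures / len(values)


# Slide examples plus one conforming value each (hypothetical sample).
boolean_values = ["nonsmoker", "former-smoker", "true", "false"]
integer_values = ["JM52", "UVPgt59.4", "pig", "42"]

print(anomaly_rate(boolean_values, is_valid_boolean))  # 0.5
print(anomaly_rate(integer_values, is_valid_integer))  # 0.75
```

A real check would also need rules for values such as yes/no or 1/0, which this sketch deliberately rejects.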

Metadata authoring is hard

CEDAR
• A computational platform for metadata management
• Goal: Overcome the impediments to creating high-quality metadata
[Figure: metadata template]

DESIGN TEMPLATE → FILL IN METADATA → SUBMIT METADATA
• Template Designer: used by template authors (e.g., standards committees such as LINCS)
• Metadata Editor: used by metadata authors (e.g., scientists)
• Metadata Repository: stores the template metadata, which are submitted to public databases
[Screenshots of the Template Designer and the Metadata Editor; the editor shows a sample study instance (Acute stress disorder, Stanford University, John Doe, Longitudinal)]

We developed a metadata recommendation system

Metadata recommendation system
1) Store: the Metadata Editor stores metadata in the Metadata Repository
2) Analyze: the Metadata Recommender analyzes the existing metadata
3) Generate suggestions: the Metadata Recommender generates suggestions for the Metadata Editor
[Screenshot: Metadata Editor showing a sample study instance]
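The store / analyze / generate loop can be illustrated with a deliberately simple Python sketch. This is NOT CEDAR’s recommendation algorithm: the class name `ToyRecommender`, the co-occurrence counting, and the sample instances are all hypothetical, chosen only to show how suggestions for one field can depend on the values already entered in other fields.

```python
from collections import Counter, defaultdict


class ToyRecommender:
    """Hypothetical context-sensitive recommender (not CEDAR's implementation).

    Mirrors the three steps: store instances, analyze them into co-occurrence
    counts, and generate ranked suggestions for one field given the values
    already entered in the other fields.
    """

    def __init__(self):
        self.instances = []  # step 1: stored metadata instances (field -> value)
        # step 2: (context_field, context_value, target_field) -> Counter of target values
        self.cooccurrence = defaultdict(Counter)

    def store(self, instance):
        """Step 1: store a completed metadata instance."""
        self.instances.append(dict(instance))

    def analyze(self):
        """Step 2: rebuild co-occurrence counts from all stored instances."""
        self.cooccurrence = defaultdict(Counter)
        for instance in self.instances:
            for target_field, target_value in instance.items():
                for field, value in instance.items():
                    if field != target_field:
                        self.cooccurrence[(field, value, target_field)][target_value] += 1

    def suggest(self, target_field, context, k=3):
        """Step 3: rank candidate values for target_field by summed co-occurrence."""
        scores = Counter()
        for field, value in context.items():
            scores.update(self.cooccurrence.get((field, value, target_field), Counter()))
        return [value for value, _ in scores.most_common(k)]


rec = ToyRecommender()
rec.store({"disease": "asthma", "tissue": "lung", "sex": "female"})
rec.store({"disease": "asthma", "tissue": "lung", "sex": "male"})
rec.store({"disease": "lymphoma", "tissue": "blood", "sex": "male"})
rec.analyze()
print(rec.suggest("tissue", {"disease": "asthma", "sex": "male"}))  # ['lung', 'blood']
```

In this sketch the ranking changes with the fields already filled in, so the same field can receive different suggestions in different instances.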

Filling in a CEDAR template

Evaluation workflow
(1) Preprocessing: BioSample gene expression metadata → CEDAR BioSample template instances (≈35K)
(2) Semantic annotation: → annotated BioSample template instances (≈35K)
Each set of instances is split into an 80% training dataset and a 20% test dataset.
(3) Training and ingestion: training dataset → CEDAR Metadata Repository → Metadata Recommender
(4) Testing & analysis: test dataset → evaluation results
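The 80/20 split can be sketched in Python as below. The ratio comes from the slide; whether the original split was random, and the use of shuffling with a fixed seed, are assumptions. The instance identifiers are placeholders standing in for the ≈35K template instances.

```python
import random


def train_test_split(instances, train_fraction=0.8, seed=0):
    """Shuffle a copy of the instances and split it into (train, test).

    Shuffling and the fixed seed are assumptions made for reproducibility.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers (hypothetical) for the ~35K template instances.
instances = [f"instance-{i}" for i in range(35000)]
train, test = train_test_split(instances)
print(len(train), len(test))  # 28000 7000
```

The same function would be applied separately to the plain and the annotated instance sets.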

Evaluation workflow (continued)
• Evaluated fields: “disease”, “sex”, and “tissue”
• Top 3 suggestions considered for each field
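A possible shape for the evaluation loop over the three fields is sketched below. The protocol is an assumption, not stated on the slides: for each test instance the evaluated field is hidden, the remaining fields form the context, the hidden value is the expected result, and only the top 3 suggestions are scored. `suggest` is any hypothetical callable returning a ranked list; `fixed_suggester` is a trivial stand-in that ignores the context.

```python
from typing import Callable, Dict, List

EVALUATED_FIELDS = ("disease", "sex", "tissue")


def evaluate(test_instances: List[Dict[str, str]],
             suggest: Callable[[str, Dict[str, str]], List[str]],
             fields=EVALUATED_FIELDS, k: int = 3) -> Dict[str, float]:
    """Return the mean reciprocal rank per field over the top-k suggestions.

    Assumed protocol: hide one field, use the other fields as context,
    and score the hidden value's position among the top-k suggestions.
    """
    results = {}
    for field in fields:
        rr_values = []
        for instance in test_instances:
            if field not in instance:
                continue
            expected = instance[field]
            context = {f: v for f, v in instance.items() if f != field}
            top_k = suggest(field, context)[:k]
            # RR = 1/K for a hit at 1-based position K; 0 if absent from the top k.
            rr = 1.0 / (top_k.index(expected) + 1) if expected in top_k else 0.0
            rr_values.append(rr)
        if rr_values:
            results[field] = sum(rr_values) / len(rr_values)
    return results


def fixed_suggester(field, context):
    """Hypothetical stand-in that always returns the same ranking per field."""
    return {"sex": ["female", "male"], "disease": ["asthma"], "tissue": ["lung"]}[field]


test_set = [
    {"disease": "asthma", "sex": "male", "tissue": "lung"},
    {"disease": "lymphoma", "sex": "female", "tissue": "blood"},
]
print(evaluate(test_set, fixed_suggester))  # {'disease': 0.5, 'sex': 0.75, 'tissue': 0.5}
```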

Testing & Analysis
• Compared suggested vs. expected metadata
• Measure: Reciprocal Rank (RR), appropriate for judging systems that return a ranking of suggestions when there is only one relevant result

$$\mathrm{RR} = \frac{1}{K}$$

where K is the position of the expected result in the ranking of suggestions

How is the RR calculated?
| Expected | Suggested | K | Reciprocal Rank (RR) |
|---|---|---|---|
| asthma | 1) asthma, 2) lung cancer, 3) respiratory disease | 1 | 1/1 |
| lymphoma | 1) myeloma, 2) lymphoma, 3) acute myeloid leukemia | 2 | 1/2 |
| lung cancer | 1) respiratory disease, 2) asthma, 3) lung cancer | 3 | 1/3 |
Mean Reciprocal Rank (MRR), the average RR over the three cases:
$$\mathrm{MRR} = \frac{1/1 + 1/2 + 1/3}{3} \approx 0.61$$
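The worked example can be reproduced with a short Python function. Scoring a case as 0 when the expected value is missing from the suggestions is a common convention assumed here; in the slide’s three examples the expected value always appears.

```python
def mean_reciprocal_rank(cases):
    """Average of 1/K over (expected, suggestions) pairs, with K 1-based.

    A case whose expected value is absent contributes 0 (assumed convention).
    """
    if not cases:
        raise ValueError("MRR is undefined for an empty list of cases")
    total = 0.0
    for expected, suggestions in cases:
        if expected in suggestions:
            total += 1.0 / (suggestions.index(expected) + 1)
    return total / len(cases)


# The three examples from the slide.
cases = [
    ("asthma", ["asthma", "lung cancer", "respiratory disease"]),
    ("lymphoma", ["myeloma", "lymphoma", "acute myeloid leukemia"]),
    ("lung cancer", ["respiratory disease", "asthma", "lung cancer"]),
]
print(round(mean_reciprocal_rank(cases), 2))  # 0.61
```

The exact value is 11/18 ≈ 0.611, which the slide rounds to 0.61.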

Results
[Figure: bar chart of Mean Reciprocal Rank (MRR, 0 to 1) for the disease, tissue, and sex fields, Baseline vs. Metadata Recommender]
On average (MRR):
• Metadata Recommender = 0.77
• Baseline (majority vote) = 0.31
Better performance with respect to the baseline for:
• Fields with many different values
• Templates with many correlated fields
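The slides name the baseline only as “majority vote”. One plausible reading, assumed in this Python sketch, is to rank a field’s values by their overall frequency in the training data, ignoring every other field. The sample tissue values are hypothetical.

```python
from collections import Counter


def majority_vote_suggestions(training_values, k=3):
    """Rank one field's values by overall frequency, ignoring other fields.

    An assumed interpretation of the "majority vote" baseline.
    """
    return [value for value, _ in Counter(training_values).most_common(k)]


# Hypothetical training values for the "tissue" field.
tissue_values = ["blood", "blood", "blood", "lung", "lung", "liver"]
print(majority_vote_suggestions(tissue_values))  # ['blood', 'lung', 'liver']
```

Because this ranking is identical for every test instance, it cannot use the values already entered in correlated fields, which is one intuition for why a context-sensitive recommender could gain most on templates with many correlated fields.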

Summary
• We developed a metadata recommendation system as part of an end-to-end system for metadata management called CEDAR
• Generates context-sensitive suggestions in real time
• Incorporates both ontology-based and free-text suggestions

Summary
Our approach makes it easier for scientists to generate high-quality metadata for experimental datasets
• So that the datasets can be found, interpreted, and reused
• Essential to ensure scientific reproducibility

facebook.com/metadatacenter
@metadatacenter
Channel: Metadata Center
github.com/metadatacenter
http://cedar.metadatacenter.org