A Linked Data Representation for Summary Statistics and Grouping Criteria

A Linked Data Representation for Summary Statistics and Grouping Criteria. RPI IDEA/Tetherless World Constellation. James P. McCusker, Michel Dumontier, Shruthi Chari, Joanne S. Luciano, and Deborah L. McGuinness.


  1. A Linked Data Representation for Summary Statistics and Grouping Criteria RPI IDEA/Tetherless World Constellation James P. McCusker, Michel Dumontier, Shruthi Chari, Joanne S. Luciano, and Deborah L. McGuinness

  2. Summary statistics across groups can be formalized as linked data using owl:Class-based sets, expressing aggregate values as attributes of those classes.

         Class: G(case:TCGA-BRCA)
           SubClassOf: sio:human and sio:'has role' some
             (sio:'subject role' and sio:'in relation to' value case:TCGA-BRCA)

     [Diagram: G(case:TCGA-BRCA) has attribute a count (has value 1098) and a mean age (has unit day), with a maximal value (has value 32872) and a minimal value (has value 2009).]

     A Linked Data Representation for Summary Statistics and Grouping Criteria, 10/28/19

  3. Example Data Schema – Genomic Data Commons Clinical Annotations

  4. Defining Grouping Criteria (starting with Calvanese et al. 2008)

     OWL:

         Class: GDC_Subject
           EquivalentTo: sio:human and sio:'has role' some
             (sio:'subject role' and sio:'in relation to' some sio:investigation)

     SPARQL:

         select ?GDC_Subject WHERE {
           ?GDC_Subject a sio:SIO_000485;    # human
             sio:SIO_000228 [                # has role
               a sio:SIO_000883;             # study subject
               sio:SIO_000668 [              # in relation to
                 a sio:SIO_000747            # investigation
               ]
             ].
         }

  5. Defining Grouping Criteria (starting with Calvanese et al. 2008)

     $q(\bar{x}, \alpha(\bar{y})) \leftarrow \phi$, where

         Class: $\bar{x}$
           SubClassOf: $\phi$

     We will reserve $\alpha(\bar{y})$ for later.

  6. Grouping Criteria as OWL Templates

         Class: $\bar{x}$
           SubClassOf: $\phi$

     With $\bar{x} = G(g_1, \ldots, g_n)$:

         Class: $G(g_1, \ldots, g_n)$
           SubClassOf: $\phi$

     For example:

         Class: G(?x)
           SubClassOf: sio:human and sio:'has role' some
             (sio:'subject role' and sio:'in relation to' value ?x)

  7. Grouping Criteria as a SPARQL query

         select ?GDC_Subject ?x where {
           ?GDC_Subject a sio:SIO_000485;    # human
             sio:SIO_000228 [                # has role
               a sio:SIO_000883;             # study subject
               sio:SIO_000668 ?x             # in relation to
             ].
           ?x a sio:SIO_000747               # investigation
         }

         Class: G(?x)
           SubClassOf: sio:human and sio:'has role' some
             (sio:'subject role' and sio:'in relation to' value ?x)

  8. Grouped Criteria as expanded classes

     The template

         Class: G(?x)
           SubClassOf: sio:human and sio:'has role' some
             (sio:'subject role' and sio:'in relation to' value ?x)

     expands to one class per binding of ?x:

         Class: G(case:FM-AD)
           SubClassOf: sio:human and sio:'has role' some
             (sio:'subject role' and sio:'in relation to' value case:FM-AD)

         Class: G(case:TARGET-NBL)
           SubClassOf: sio:human and sio:'has role' some
             (sio:'subject role' and sio:'in relation to' value case:TARGET-NBL)

         ...
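
The expansion from the template G(?x) to one class per binding can be sketched in Python as plain string substitution. This is only an illustration: `TEMPLATE` and `expand_group_class` are hypothetical names, and string formatting is an assumption rather than the authors' implementation. The two case identifiers are the ones shown on the slide.

```python
# Manchester-syntax template for the grouping class G(?x); {x} stands in for ?x.
TEMPLATE = (
    "Class: G({x})\n"
    "  SubClassOf: sio:human and sio:'has role' some\n"
    "    (sio:'subject role' and sio:'in relation to' value {x})"
)


def expand_group_class(x: str) -> str:
    """Return the class definition for one binding of ?x."""
    return TEMPLATE.format(x=x)


# Bindings of ?x, e.g. as returned by the grouping SPARQL query.
bindings = ["case:FM-AD", "case:TARGET-NBL"]
expanded = {x: expand_group_class(x) for x in bindings}

for definition in expanded.values():
    print(definition)
    print()
```

Each binding yields an independent class, so the set of group classes grows with the data rather than being fixed in the ontology.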

  9. owl:Classes with property restriction definitions can be assigned URIs automatically based on the graph digest of that property restriction, using RGDA1 or similar graph digest algorithms.

         from rdflib import OWL
         from rdflib.compare import to_isomorphic

         # Describe the restriction(s) that define my.Class.
         result = source_graph.query("""
             describe ?restr where {
               ?G owl:equivalentClass|rdfs:subClassOf ?restr.
             }""", initBindings={"G": my.Class})
         # Canonicalize the described subgraph and compute its digest.
         digest = to_isomorphic(result.graph).graph_digest()
         # Name the restriction after its digest.
         source_graph.add((my.Class, OWL.equivalentClass, digest_prefix[str(digest)]))
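
To illustrate the idea of naming a restriction by a digest of its content, here is a deliberately simplified sketch using only the standard library. It is not RGDA1: it hashes the sorted (predicate, object) pairs of a single restriction node, so the node's own label does not matter, but nested blank nodes would need a real canonicalization algorithm. `DIGEST_PREFIX` and `restriction_digest` are hypothetical names.

```python
import hashlib

# Hypothetical namespace under which digest-based URIs are minted.
DIGEST_PREFIX = "http://example.org/digest/"


def restriction_digest(pred_obj_pairs):
    """Order-independent SHA-256 digest of one node's (predicate, object) pairs.

    Simplification: assumes every object is a named resource or literal,
    i.e. there are no nested blank nodes to canonicalize.
    """
    lines = sorted(f"{p} {o}" for p, o in set(pred_obj_pairs))
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


# The value restriction from the grouping class, written as outgoing edges.
pairs = [
    ("rdf:type", "owl:Restriction"),
    ("owl:onProperty", "sio:'in relation to'"),
    ("owl:hasValue", "case:TCGA-BRCA"),
]

digest = restriction_digest(pairs)
restriction_uri = DIGEST_PREFIX + digest
print(restriction_uri)
```

Because the digest depends only on the restriction's content, two graphs that state the same restriction would mint the same URI under this scheme.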

  10. WARNING! We will be discussing the use of OWL 2 puns.

  11. TL;DR for OWL 2 Punning: Statements asserted about a resource as an OWL Class cannot be used to draw inferences about the resource as an OWL Individual or vice-versa.

  12. Expressing aggregate values relies on the Semanticscience Integrated Ontology, or an expressive equivalent.

      [Diagram: core SIO model. Classes: entity, object, process, attribute, quality, capability, role, information content entity, measurement value, literal, Time measurement, Space. Relations: has part, is part of, has attribute, is participant in, is realized in, is located in, is contained in, exists at, measured at, has value.]

  13. First, if needed, we reify non-SIO statements as attributes.

      [Diagram: a literal-valued statement "s p lit" becomes "s has attribute [a p; has value lit]"; a resource-valued statement "s p res" becomes "s has attribute res" with "res a p".]
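
One reading of the reification step can be sketched as a small Python function over triples represented as tuples. The names `reify`, `HAS_ATTRIBUTE`, and `HAS_VALUE`, the `ex:` identifiers, and the blank-node labels are all hypothetical; the property strings use the SIO labels from the slides rather than any specific identifiers.

```python
import itertools

# SIO properties, written with the labels used on the slides.
HAS_ATTRIBUTE = "sio:'has attribute'"
HAS_VALUE = "sio:'has value'"

_blank_ids = itertools.count()


def reify(s, p, o, is_literal):
    """Rewrite one non-SIO statement (s, p, o) as SIO-style attribute triples."""
    if is_literal:
        # s p lit  =>  s has attribute [ a p ; has value lit ]
        attr = f"_:attr{next(_blank_ids)}"
        return [(s, HAS_ATTRIBUTE, attr), (attr, "a", p), (attr, HAS_VALUE, o)]
    # s p res  =>  s has attribute res . res a p
    return [(s, HAS_ATTRIBUTE, o), (o, "a", p)]


# Hypothetical input statements.
literal_case = reify("ex:subject1", "ex:age_at_diagnosis", 12345, is_literal=True)
resource_case = reify("ex:subject1", "ex:diagnosis", "ex:diagnosis1", is_literal=False)

for triple in literal_case + resource_case:
    print(triple)
```

After this rewrite, every value of interest hangs off a typed attribute node, which is the shape the aggregation step expects.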

  14. Finally, here’s what we do with $\alpha(\bar{y})$:

      $\forall G, \alpha(\bar{y}) \; \exists A \in \alpha, Y \in \bar{y} : \mathrm{attr}(G, Y) \wedge \mathrm{attr}(Y, A) \wedge \mathrm{val}(A, \alpha(\bar{y}))$

      [Diagram: G has attribute Y (a $\in \bar{y}$); Y has attribute A (a $\in \alpha$); A has value $\alpha(\bar{y})$.]
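
The pattern attr(G, Y), attr(Y, A), val(A, α(ȳ)) can be sketched in Python as follows. This is a minimal illustration, not the authors' notebook code: `aggregate_triples`, `AGGREGATES`, the blank-node labels, and the group name `G(case:EXAMPLE)` are hypothetical, and the ages are toy values rather than GDC data.

```python
from statistics import mean

# SIO properties, written with the labels used on the slides.
HAS_ATTRIBUTE = "sio:'has attribute'"
HAS_VALUE = "sio:'has value'"

# Aggregate functions (the set alpha), keyed by the type of the node A.
AGGREGATES = {
    "count": len,
    "mean": mean,
    "minimal value": min,
    "maximal value": max,
}


def aggregate_triples(group, variable, values):
    """Emit attr(G, Y), attr(Y, A), val(A, alpha(y)) for each aggregate."""
    y = f"_:{variable}"
    triples = [(group, HAS_ATTRIBUTE, y), (y, "a", variable)]
    for name, func in AGGREGATES.items():
        a = f"_:{variable}_{name.replace(' ', '_')}"
        triples += [
            (y, HAS_ATTRIBUTE, a),           # attr(Y, A)
            (a, "a", name),                  # A is typed by the aggregate
            (a, HAS_VALUE, func(values)),    # val(A, alpha(y))
        ]
    return triples


toy_ages_in_days = [100, 200, 300]  # made-up values for illustration
triples = aggregate_triples("G(case:EXAMPLE)", "age", toy_ages_in_days)

for triple in triples:
    print(triple)
```

The aggregates attach to the group class itself rather than to any individual subject, which is where the OWL 2 punning discussed earlier comes in.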

  15. Here’s what it looks like in practice.

          Class: G(case:TCGA-BRCA)
            SubClassOf: sio:human and sio:'has role' some
              (sio:'subject role' and sio:'in relation to' value case:TCGA-BRCA)

      [Diagram: G(case:TCGA-BRCA) has attribute a count (has value 1098) and a mean age (has unit day), with a maximal value (has value 32872) and a minimal value (has value 2009).]

  16. Implementation in Jupyter Notebook

      § We can query summary statistics from an RDF graph and put the results into its own graph.
      § We query the statistics out and display them using Vega-Lite.

      [Figure: horizontal bar chart of # of cases (x-axis, 0 to 5,000) by Diagnosis (y-axis).]
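
A chart of this kind can be described by a small Vega-Lite specification. The sketch below builds one as a Python dictionary and serializes it to JSON; the field names `diagnosis` and `cases` and the counts are invented for illustration and are not the values from the notebook.

```python
import json

# Toy rows standing in for the queried summary statistics (made-up counts).
rows = [
    {"diagnosis": "Adenocarcinoma", "cases": 30},
    {"diagnosis": "Melanoma", "cases": 10},
    {"diagnosis": "Glioblastoma", "cases": 20},
]

# Vega-Lite spec for a horizontal bar chart: one bar per diagnosis,
# bar length given by the number of cases, sorted by descending count.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        "y": {"field": "diagnosis", "type": "nominal", "sort": "-x", "title": "Diagnosis"},
        "x": {"field": "cases", "type": "quantitative", "title": "# of cases"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The resulting JSON can be pasted into any Vega-Lite renderer, or passed to a notebook display helper that accepts Vega-Lite specs.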

  17. Many thanks to: Coauthors: Deborah, Michel, Joanne, and Shruthi Others whom I’ve bothered about this: John Erickson, Patrice Seyed, and James Michaelis.
