Interactive Knowledge Capture
Yolanda Gil
Director, Knowledge Technologies
Associate Division Director for Research, Intelligent Systems Division
Research Professor, Computer Science
Information Sciences Institute, University of Southern California
http://www.isi.edu/~gil
USC INFORMATION SCIENCES INSTITUTE   Yolanda Gil   1
Knowledge Technologies at USC/ISI: Major Threads
Semantic Workflows in Wings (http://wings.isi.edu)
[Gil et al JETAI '11; Gil et al IEEE-IS '11; Gil et al e-Science '09; Kim et al JWS '08]
Unique capability to reason about application tasks and data, with uses in science and intelligence
• Semantic descriptions of datasets (RDF, OWL)
– Metadata properties
• Semantic constraints
– How computations transform the data
• Automatic propagation of constraints
• Assistance
– Parameters
– Algorithms
• Generation
• Validation
• Execution in grids or clouds
Example: Workflow for Pixel Intensity Quantification of brain imagery [Kumar et al 10; Kurc et al 09]
• Compact workflow template (left)
• Automatically generated executable workflow for 2560x2400 pixels (right)
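Constraint propagation, validation and generation can be illustrated with a minimal Python sketch. It does not use the Wings API: the `Component` class, the metadata keys (`format`, `width`, `height`, `tile_count`) and the 512-pixel tile size are all hypothetical. Each step declares the input metadata it requires (validation) and a rule that derives output metadata from input metadata (propagation); the propagated tile count is then used to expand the compact template into one concrete job per tile (generation).

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, object]


@dataclass
class Component:
    """Hypothetical workflow step: input constraints plus a metadata rule."""
    name: str
    requires: Dict[str, object]               # metadata the input must carry
    propagate: Callable[[Metadata], Metadata]  # input metadata -> output metadata


def propagate_constraints(steps: List[Component], meta: Metadata) -> List[Metadata]:
    """Validate each step's input constraints, then derive its output metadata.
    Returns the metadata of every dataset in the (linear) workflow."""
    trace = [dict(meta)]
    for step in steps:
        for key, expected in step.requires.items():
            if meta.get(key) != expected:
                raise ValueError(f"{step.name}: needs {key}={expected!r}, got {meta.get(key)!r}")
        meta = step.propagate(meta)
        trace.append(meta)
    return trace


TILE = 512  # hypothetical tile edge length in pixels


def tile_rule(m: Metadata) -> Metadata:
    # Number of tiles needed to cover the image (partial tiles count as one).
    tiles = math.ceil(m["width"] / TILE) * math.ceil(m["height"] / TILE)
    return {"format": "tiles", "tile_count": tiles}


def quantify_rule(m: Metadata) -> Metadata:
    # One row of intensity measurements per tile.
    return {"format": "intensity_table", "rows": m["tile_count"]}


TEMPLATE = [
    Component("TileImage", {"format": "image"}, tile_rule),
    Component("QuantifyIntensity", {"format": "tiles"}, quantify_rule),
]


def generate_jobs(trace: List[Metadata]) -> List[str]:
    """Expand the compact template: one QuantifyIntensity job per tile.
    trace[1] is the output of TileImage, the first step in TEMPLATE."""
    return [f"QuantifyIntensity[tile={i}]" for i in range(trace[1]["tile_count"])]


if __name__ == "__main__":
    trace = propagate_constraints(TEMPLATE, {"format": "image", "width": 2560, "height": 2400})
    print(trace[-1])                  # {'format': 'intensity_table', 'rows': 25}
    print(len(generate_jobs(trace)))  # 25
```

Passing a dataset whose `format` is not `image` makes `propagate_constraints` raise before any job is generated, which is the validation step in miniature.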
LinkedDataLens: Extracting Networks of Interest from the 25B+ Linked Data Cloud [Groth and Gil '11]
A large and growing source of structured data that can be exploited in many application areas
• Web of Data: 25B RDF triples (statements) with 395M links from 203 data sets [Berners-Lee et al 09]
• Community-created through extraction from web sources
– News sources, events, geospatial information, bioinformatics, academic
LinkedDataLens: a system to extract networks of interest
• Framework accessible over the web, with no need to install any software
• Workflows extract RDF triples through queries, create a network, and use social network analysis algorithms to extract interesting statistics
– Size, centrality, connected components, etc.
• Extracted networks can be integrated with other existing networks and used by other applications
– Networks about people, places, events, etc.
– E.g., pharmaceutical firms doing clinical trials in California for the same drug
Example networks:
• US Senators who share an alma mater: Nodes=136, Edges=340, density=0.037, AvgClusterCoeff=0.6203, isConnected=False, NConnectedComp=17
• Pharmaceutical companies who make the same drug: Nodes=609, Edges=3032, density=0.016, AvgClusterCoeff=0.584, isConnected=False, NConnectedComp=17
Our group contributed BibBase (Honorary Mention, Triplification Challenge 2010)
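The network statistics above follow standard graph definitions, sketched here in plain Python with no graph library. This is not LinkedDataLens code: it assumes the triples have already been retrieved (for example by a query, not shown), and the `almaMater` predicate and the subjects are made up. Two subjects are linked when they share an object for the chosen predicate. Density is computed as 2E/(N(N-1)) for an undirected graph, which agrees with the reported figures: 2·340/(136·135) ≈ 0.037 and 2·3032/(609·608) ≈ 0.016.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def co_affiliation_network(triples: Iterable[Triple], predicate: str) -> Dict[str, Set[str]]:
    """Project (subject, predicate, object) triples onto an undirected
    subject-subject network: two subjects are linked when they share an
    object for `predicate`. Subjects with no partner stay as isolated nodes."""
    members_by_object: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            members_by_object[o].add(s)
    adj: Dict[str, Set[str]] = {}
    for members in members_by_object.values():
        for s in members:
            adj.setdefault(s, set())
        for a, b in combinations(sorted(members), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def network_stats(adj: Dict[str, Set[str]]) -> dict:
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # Local clustering: share of a node's neighbour pairs that are linked.
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2 * linked / (k * (k - 1)))
    avg_cc = sum(coeffs) / n if n else 0.0

    # Connected components via iterative depth-first search.
    seen: Set[str] = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)

    return {"Nodes": n, "Edges": m, "density": density,
            "AvgClusterCoeff": avg_cc, "isConnected": components == 1,
            "NConnectedComp": components}


if __name__ == "__main__":
    # Made-up triples: s1-s3 share u1, s4-s5 share u2, s6 shares with nobody.
    triples = [("s1", "almaMater", "u1"), ("s2", "almaMater", "u1"),
               ("s3", "almaMater", "u1"), ("s4", "almaMater", "u2"),
               ("s5", "almaMater", "u2"), ("s6", "almaMater", "u3")]
    print(network_stats(co_affiliation_network(triples, "almaMater")))
```

Nodes with fewer than two neighbours get a clustering coefficient of 0 here, which is one common convention; other tools exclude them from the average, which changes AvgClusterCoeff.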
Social Knowledge Collection for Communities of Interest (http://www.isi.edu/ikcap/shortipedia) [Vrandecic et al 2010; Vrandecic et al 2011]
Unique capability for social collection of structured content, with documented provenance
Why: Social content collection tools are either lightly structured or too rigid
• Wikis and collaborative tools provide community repositories, but their content is not structured or aggregated in a searchable manner
• Other approaches use a pre-defined schema/ontology that the community fills out with contributions
Semantic wiki: a framework that enables contributors to define organic characterizations that lead to emerging unified models
• Users define vocabulary/ontology
– Voluntarily adopt definitions by others
• Formal queries can retrieve structured content
• Can use proactive normalization techniques to encourage consensus where possible
• Alternative views can be accommodated
– E.g., Android is “jailbreakable” and “bricked” by RF software
Provenance-aware semantic wiki
• Structured provenance records
– Document sources and evidence
– Can filter query results according to provenance
– E.g., software that bricks phones according to NIST
Community-created structured content: emerging semantics lead to dynamically aggregated content
3rd Place, Semantic Web Challenge 2010
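A provenance-aware semantic wiki can be pictured as a store of structured statements that each carry a provenance record. The Python below is a hypothetical data model, not the Shortipedia implementation: property aliases stand in for users voluntarily adopting others' definitions, `views` keeps conflicting values side by side instead of overwriting them, and `query` can be restricted to trusted sources. All entity and source names are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Statement:
    """One structured statement plus its provenance (hypothetical model)."""
    subject: str
    prop: str
    value: str
    source: str         # who or what asserted the statement
    evidence: str = ""  # e.g. a URL or document reference


class ProvenanceWiki:
    def __init__(self) -> None:
        self.statements: List[Statement] = []
        # Community-adopted aliases: a contributor's term -> shared property.
        self.aliases: Dict[str, str] = {}

    def adopt(self, alias: str, canonical: str) -> None:
        """Record that `alias` is used to mean an existing shared property."""
        self.aliases[alias] = canonical

    def _norm(self, prop: str) -> str:
        return self.aliases.get(prop, prop)

    def add(self, subject: str, prop: str, value: str, source: str, evidence: str = "") -> None:
        # Normalize property names so equivalent contributions aggregate.
        self.statements.append(Statement(subject, self._norm(prop), value, source, evidence))

    def query(self, prop: str, value: str, trusted: Optional[Set[str]] = None) -> Set[str]:
        """Subjects with prop=value, optionally only from trusted sources."""
        prop = self._norm(prop)
        return {s.subject for s in self.statements
                if s.prop == prop and s.value == value
                and (trusted is None or s.source in trusted)}

    def views(self, subject: str, prop: str) -> Dict[str, Set[str]]:
        """Every value asserted for subject/prop, with the sources behind it."""
        prop = self._norm(prop)
        out: Dict[str, Set[str]] = defaultdict(set)
        for s in self.statements:
            if s.subject == subject and s.prop == prop:
                out[s.value].add(s.source)
        return dict(out)


if __name__ == "__main__":
    wiki = ProvenanceWiki()
    wiki.adopt("bricks", "bricksPhones")
    wiki.add("AppA", "bricksPhones", "true", "agency_report")
    wiki.add("AppB", "bricks", "true", "forum_user")  # normalized via the alias
    print(wiki.query("bricksPhones", "true"))
    print(wiki.query("bricksPhones", "true", trusted={"agency_report"}))
```

Filtering by provenance is a plain set-membership test here; a real system would more likely attach richer provenance records and let queries filter on any of their fields.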
Unifying Provenance Models for Trusted Systems [Groth et al '11; Sahoo et al '11; Moreau et al '10]
Unifying provenance models would enable the computation of trust metrics for content
Why: Provenance is rarely captured, hampering trust assessment
• When captured, it is diverse in its nature and underlying implementation
– Document-based: where information was found/extracted (e.g., NYT)
– Attribution-based: who created the information
– Process-based: how information was derived from documents or datasets
Unifying models of provenance
• Can assess trust in content (existing trust models are attribution-based only)
• Builds on open standards
– Semantic Web standards (OWL, RDF)
– Open Provenance Model
– Dublin Core
Unifying provenance infrastructure
• Access provenance records across heterogeneous systems to assess trust
• Integrate data taking provenance records into account
Chaired W3C Provenance Group (2009-2010):
• Mappings across 10 popular provenance vocabularies
• Use cases and requirements
• Charter for a Working Group with 17 core concepts
Emerging standards (Open Provenance Model); mappings across existing provenance vocabularies
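The three kinds of provenance listed above can be mapped into one model and then combined into a trust score. The sketch below is illustrative only: the record kinds and field names (`found_in`, `creator`, `inputs`) are hypothetical and are not terms from the Open Provenance Model or Dublin Core, and the trust metric (an entity is as trusted as its least-trusted creator or upstream source) is one simple choice among many. It shows why attribution alone is not enough: the report in the example would score 0.8 from its author, but one of its inputs comes from a less trusted analyst.

```python
from typing import Dict, List, Optional, Set


def to_unified(record: dict) -> dict:
    """Map one record from a hypothetical provenance style into a common form:
    the entity, the agent it is attributed to, and what it was derived from."""
    kind = record["kind"]
    if kind == "document":      # where the information was found/extracted
        return {"entity": record["item"], "attributed_to": None,
                "derived_from": [record["found_in"]]}
    if kind == "attribution":   # who created the information
        return {"entity": record["item"], "attributed_to": record["creator"],
                "derived_from": []}
    if kind == "process":       # how the information was derived
        return {"entity": record["output"], "attributed_to": None,
                "derived_from": list(record["inputs"])}
    raise ValueError(f"unknown provenance kind: {kind!r}")


def merge(records: List[dict]) -> Dict[str, dict]:
    """Combine records from heterogeneous systems into one provenance graph."""
    graph: Dict[str, dict] = {}
    for r in map(to_unified, records):
        node = graph.setdefault(r["entity"], {"attributed_to": None, "derived_from": set()})
        if r["attributed_to"] is not None:
            node["attributed_to"] = r["attributed_to"]
        node["derived_from"].update(r["derived_from"])
    return graph


def trust(entity: str, graph: Dict[str, dict], scores: Dict[str, float],
          default: float = 0.5, _seen: Optional[Set[str]] = None) -> float:
    """Hypothetical metric: the minimum trust over the entity's creator and,
    recursively, everything it was derived from. `scores` holds trust values
    for agents and for original sources such as documents."""
    seen = set() if _seen is None else _seen
    if entity in seen:           # guard against cycles in the provenance graph
        return default
    seen = seen | {entity}
    node = graph.get(entity)
    if node is None:             # an original source with no provenance of its own
        return scores.get(entity, default)
    parts = [trust(src, graph, scores, default, seen) for src in node["derived_from"]]
    if node["attributed_to"] is not None:
        parts.append(scores.get(node["attributed_to"], default))
    return min(parts) if parts else default


if __name__ == "__main__":
    records = [
        {"kind": "document", "item": "fact1", "found_in": "news_article"},
        {"kind": "attribution", "item": "fact2", "creator": "analyst_b"},
        {"kind": "process", "output": "report", "inputs": ["fact1", "fact2"]},
        {"kind": "attribution", "item": "report", "creator": "analyst_a"},
    ]
    scores = {"news_article": 0.9, "analyst_a": 0.8, "analyst_b": 0.6}
    graph = merge(records)
    print(trust("report", graph, scores))  # 0.6
```

Taking the minimum is deliberately conservative; averaging or weighting by derivation depth are alternatives with different trade-offs.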