
Interactive Knowledge Capture, by Yolanda Gil (PowerPoint presentation)



  1. Interactive Knowledge Capture
     Yolanda Gil
     Director, Knowledge Technologies
     Associate Division Director for Research
     Research Professor, Computer Science
     Intelligent Systems Division, Information Sciences Institute, University of Southern California
     http://www.isi.edu/~gil
     USC INFORMATION SCIENCES INSTITUTE

  2. Knowledge Technologies at USC/ISI: Major Threads

  3. Semantic Workflows in Wings (http://wings.isi.edu) [Gil et al JETAI '11; Gil et al IEEE-IS '11; Gil et al e-Science '09; Kim et al JWS '08]
     Unique capability to reason about application tasks and data; has uses in science and intel
     Semantic descriptions of datasets (RDF, OWL)
       • Metadata properties
     Semantic constraints
       • How computations transform the data
     Automatic propagation of constraints
       • Assistance
         - Parameters
         - Algorithms
       • Generation
       • Validation
     Execution in grids or clouds
     Example: Workflow for Pixel Intensity Quantification of brain imagery [Kumar et al 10; Kurc et al 09]
       • Compact workflow template (left)
       • Automatically generated executable workflow for 2560x2400 pixels (right)
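The idea on slide 3 of propagating semantic constraints through a workflow can be illustrated with a minimal sketch. It is not Wings code and does not use RDF or OWL: the step dictionaries, the `requires`/`produces` keys, the metadata property names, and the `propagate` function are all hypothetical, and only forward propagation over a linear chain of steps is shown.

```python
def propagate(dataset_metadata, steps):
    """Push dataset metadata forward through a linear list of steps.

    Each step is a dict with hypothetical keys:
      'name'     - label used in error messages
      'requires' - metadata properties the step's input must have
      'produces' - metadata properties the step adds to its output
    """
    current = dict(dataset_metadata)
    for step in steps:
        # Validation: every required property must match the current metadata.
        unmet = {k: v for k, v in step["requires"].items() if current.get(k) != v}
        if unmet:
            raise ValueError(f"{step['name']}: unmet constraints {unmet}")
        # Propagation: describe how the computation transforms the data.
        current.update(step["produces"])
    return current


# Hypothetical metadata for one image; only the 2560x2400 size comes from the slide.
image = {"format": "tiff", "width": 2560, "height": 2400}
steps = [
    {"name": "normalize", "requires": {"format": "tiff"},
     "produces": {"normalized": True}},
    {"name": "quantify", "requires": {"normalized": True},
     "produces": {"type": "intensity_table"}},
]
result = propagate(image, steps)
```

Running the steps in the wrong order raises a `ValueError`, which is the kind of validation the slide attributes to constraint reasoning.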

  4. LinkedDataLens: Extracting Networks of Interest from 25B+ Linked Data Cloud [Groth and Gil '11]
     A large, growing structured source of data that can be exploited in many application areas
     Web of Data: 25B RDF triples (statements) with 395M links from 203 data sets [Berners-Lee et al 09]
       • Community-created through extraction from web sources
         – News sources, events, geospatial information, bioinformatics, academic
     LinkedDataLens: a system to extract networks of interest
       • Framework accessible over the web, no need to install any software
       • Workflows extract RDF triples through queries, create network, and use social network analysis algorithms to extract interesting statistics
         – Size, centrality, connected components, etc.
       • Extracted networks can be integrated with other existing networks and used by other applications
         – Networks about people, places, events, etc.
     Eg, Pharmaceutical firms doing clinical trials in California for same drug
     [Figure: US Senators who share Alma Mater. Nodes=136, Edges=340, density=0.037, AvgClusterCoeff=0.6203, isConnected=False, NConnectedComp=17]
     [Figure: Pharmaceutical companies who make the same drug. Nodes=609, Edges=3032, density=0.016, AvgClusterCoeff=0.584, isConnected=False, NbConnectedComp=17]
     Our group contributed BibBase: Honorary Mention, Triplification Challenge 2010
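The network statistics reported on slide 4 (density, connected components, average clustering coefficient) can be computed along the following lines. This is a standard-library sketch, not LinkedDataLens code: the function names and the toy edge list are assumptions, and the graph is treated as undirected, which is consistent with the densities shown on the slide.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(n_nodes, n_edges):
    """Fraction of possible undirected edges that are present."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def connected_components(adj):
    """Count connected components with an iterative depth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return count


def avg_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbs in adj.values():
        k = len(nbs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


# Hypothetical toy network: one triangle plus one separate edge.
toy = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")])
```

As a consistency check, `density(136, 340)` rounds to 0.037 and `density(609, 3032)` rounds to 0.016, matching the node and edge counts on the slide.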

  5. Social Knowledge Collection for Communities of Interest (http://www.isi.edu/ikcap/shortipedia) [Vrandecic et al 2010; Vrandecic et al 2011]
     Unique capability for social collection of structured content, with documented provenance
     Why: Social content collection tools are either lightly structured or too rigid
       • Wikis and collaborative tools provide community repositories, but are not structured or aggregated in a searchable manner
       • Use of pre-defined schema/ontology that community fills out with contributions
     Semantic wiki is a framework that enables contributors to define organic characterizations that lead to emerging unified models
       • Users define vocabulary/ontology
         – Voluntarily adopt definitions by others
       • Formal queries can retrieve structured content
       • Can use proactive normalization techniques to encourage consensus where possible
     Provenance-aware semantic wiki
       • Alternative views can be accommodated
         Eg, Android is “jailbreakable” and “bricked” by RF software
       • Structured provenance records
         – Document sources and evidence
         – Can filter query results according to provenance
         Eg, Software that bricks phones according to NIST
     Callouts: Community-created structured content. Emerging semantics lead to dynamically aggregated content.
     3rd Place, Semantic Web Challenge 2010
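Filtering query results by provenance, as described on slide 5, might look like the sketch below. It is not the Shortipedia implementation: the `Statement` record, the `query` function, and every sample record (including the source labels attached to them) are hypothetical placeholders chosen to mirror the slide's example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    value: str
    source: str  # provenance: who or what asserted this statement


def query(statements, prop, value, source=None):
    """Return subjects with the given property value, optionally by source."""
    return sorted({
        s.subject
        for s in statements
        if s.prop == prop and s.value == value
        and (source is None or s.source == source)
    })


# Hypothetical records; alternative views about AppB coexist side by side.
WIKI = [
    Statement("AppA", "effect", "bricks phone", "NIST"),
    Statement("AppA", "effect", "bricks phone", "user forum"),
    Statement("AppB", "effect", "bricks phone", "user forum"),
    Statement("AppB", "effect", "harmless", "vendor"),
]

anyone = query(WIKI, "effect", "bricks phone")
per_nist = query(WIKI, "effect", "bricks phone", source="NIST")
```

Keeping the source on each statement, rather than merging statements into one agreed value, is what lets conflicting views remain queryable.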

  6. Unifying Provenance Models for Trusted Systems [Groth et al '11; Sahoo et al '11; Moreau et al '10]
     Unifying provenance models would enable the computation of trust metrics for content
     Why: Provenance is rarely captured, hampering trust assessment
       • When captured, it is diverse in its nature and underlying implementation
         - Document-based: where information was found/extracted (e.g., NYT)
         - Attribution-based: who created the information
         - Process-based: how information was derived from documents or datasets
     Unifying models of provenance
       • Can assess trust in content (existing trust models are attribution-based only)
     Builds on open standards
       • Semantic Web standards (OWL, RDF)
       • Open Provenance Model
       • Dublin Core
     Unifying provenance infrastructure
       • Mappings across existing provenance vocabularies
       • Access provenance records across heterogeneous systems to assess trust
       • Integrate data taking provenance records into account
     Chaired W3C Provenance Group (2009-2010):
       • Mappings across 10 popular provenance vocabularies
       • Use cases and requirements
       • Charter for a Working Group with 17 core concepts
     Emerging standards (Open Provenance Model)
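A mapping across provenance vocabularies, in the spirit of slide 6, can be sketched as a lookup from vocabulary terms to the three kinds of provenance the slide names. The term names are taken from Dublin Core and the Open Provenance Model, but the `opm:` prefix, the assignment of each term to a kind, the function names, and the sample records are illustrative assumptions, not the mappings produced by the W3C group.

```python
from collections import defaultdict

KINDS = ("document", "attribution", "process")

# Assumed, illustrative classification of a few vocabulary terms.
KIND_OF_TERM = {
    "dc:source": "document",         # where the information came from
    "dc:creator": "attribution",     # who created the information
    "opm:wasGeneratedBy": "process",  # how the information was produced
    "opm:wasDerivedFrom": "process",
}


def unify(records):
    """Group (item, term, value) records by item and provenance kind."""
    unified = defaultdict(lambda: defaultdict(list))
    for item, term, value in records:
        kind = KIND_OF_TERM.get(term)
        if kind is None:
            continue  # term not covered by this sketch's mapping
        unified[item][kind].append(value)
    return {item: dict(kinds) for item, kinds in unified.items()}


def missing_kinds(unified, item):
    """List the provenance kinds for which an item has no record."""
    return sorted(set(KINDS) - set(unified.get(item, {})))


# Hypothetical records from two systems using different vocabularies.
records = [
    ("report-1", "dc:creator", "alice"),
    ("report-1", "dc:source", "article-9"),
    ("table-2", "opm:wasGeneratedBy", "cleaning-run-3"),
]
unified = unify(records)
```

Once records share one shape, a trust model can consider all three kinds together instead of attribution alone, and `missing_kinds` shows where provenance was never captured.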
