The Semantic Web: Web of (integrated) Data
Frank van Harmelen
Vrije Universiteit Amsterdam

Take home message
• Semantic Web = Web of Data (no longer only web of text, web of pictures)
• Set of open, stable W3C standards
• Rapidly emerging tools & vendors
• Use cases:
  – data integration
  – web services
  – knowledge management
  – search (intranets)

Outline
• The vision
• What is required
• Machine representation
• XML, RDF, OWL
• Where are we now?
• Examples

Things we would like to do on the Web

“Intelligent” things we can’t do today
• Search engines
  – concepts, not keywords
  – semantic narrowing/widening of queries
• Shopbots
  – semantic interchange, not screenscraping
• E-commerce
  – Negotiation, catalogue mapping, personalisation
• Web Services
  – Need semantic characterisations to find them and to combine them
• Navigation
  – by semantic proximity, not hardwired links
• .....

Why can’t Google do this…
[Figure: example query "harmelen"]

Other use cases are
• personalisation
• semantic linking
• data integration
• web services
• ...

Sounds good, so... how is this tackled?

machine accessible meaning
(What it’s like to be a machine)
[Figure labels: name symptoms disease drug administration]
Meta-data!

What is meta-data?
[Figure labels: name symptoms disease drug administration]
• it's just data
• it's data describing other data
• it's meant for machine consumption

meta-data + ontologies
[Figure: tags <treatment>, <name>, <symptoms>, <disease>, <drug>, <drug administration>, connected by relations labelled "reduces" and "IS-A"]

What’s inside an ontology?
• terms + specialisation hierarchy
• classes + class-hierarchy
• instances
• slots/values
• inheritance (multiple? defaults?)
• restrictions on slots (type, cardinality)
• properties of slots (symm., trans., …)
• relations between classes (disjoint, covers)
• reasoning tasks: classification, subsumption
(Listed in order of increasing semantic “weight”.)

In short (for the duration of this tutorial)
• Ontologies are not definitive descriptions of what exists in the world (= philosophy)
• Ontologies are shared models of the world constructed to facilitate communication
• Yes, ontologies exist (because we build them)

Real life examples
• handcrafted (often by communities)
• music: CDnow (2410 classes / 5 levels), MusicMoz (1073 classes / 7 levels)
• biomedical: SNOMED (200k), GO (15k), Emtree (45k + 190k)
• ranging from lightweight (Yahoo, UNSPSC) to heavyweight (Cyc)
• ranging from small (METAR) to large (UNSPSC)

All right, but how do we represent all this in a computer?

Semantic Web “architecture”

What was XML again?

    <country name="Netherlands">
      <capital name="Amsterdam">
        <areacode>020</areacode>
      </capital>
    </country>

[Figure: the same document as a tree: country has name "Netherlands" and a capital; capital has name "Amsterdam" and areacode "020"]

So why not just use XML?

    <country name="Netherlands">
      <capital name="Amsterdam">
        <areacode>020</areacode>
      </capital>
    </country>

    <nation>
      <name>Netherlands</name>
      <capital>Amsterdam</capital>
      <capital_areacode>020</capital_areacode>
    </nation>

• Are the above XML documents the same?
• Do they convey the same information?
• Is the answer machine-derivable?

No agreement on:
• structure
  – is country an object? a class? an attribute? a relation? something else?
  – what does nesting mean?
• vocabulary
  – is country the same as nation?
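
The three questions above can be made concrete with a small sketch. It assumes Python's standard `xml.etree.ElementTree` parser; the helper names `shape` and `leaf_values` are invented for this illustration. A structural comparison says the two documents differ, while a comparison that ignores all tag names says they carry the same values. Neither answer tells a machine that country and nation mean the same thing.

```python
import xml.etree.ElementTree as ET

DOC_A = """<country name="Netherlands">
  <capital name="Amsterdam">
    <areacode>020</areacode>
  </capital>
</country>"""

DOC_B = """<nation>
  <name>Netherlands</name>
  <capital>Amsterdam</capital>
  <capital_areacode>020</capital_areacode>
</nation>"""


def shape(element):
    """Nested tuple of tag, attributes, text and children (hypothetical helper)."""
    text = (element.text or "").strip()
    return (
        element.tag,
        tuple(sorted(element.attrib.items())),
        text,
        tuple(shape(child) for child in element),
    )


def leaf_values(element):
    """All attribute values and non-empty texts, ignoring every tag name."""
    values = set(element.attrib.values())
    text = (element.text or "").strip()
    if text:
        values.add(text)
    for child in element:
        values |= leaf_values(child)
    return values


root_a = ET.fromstring(DOC_A)
root_b = ET.fromstring(DOC_B)

# Syntactically the documents are different trees ...
same_structure = shape(root_a) == shape(root_b)
# ... although they happen to contain the same three strings.
same_values = leaf_values(root_a) == leaf_values(root_b)

print(same_structure, same_values)  # prints: False True
```

Matching on bare values is of course not a real solution: it only works here because the example is tiny, which is the point the slide makes about needing agreed structure and vocabulary.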

So: XML ≠ machine accessible meaning

    As we read it     As a machine sees it
    <name>            <ναμε>
    <education>       <εδυχατιον>
    <CV>              <Χς>
    <work>            <ωορκ>
    <private>         <πριϖατε>

The semantic pyramid again

W3C Stack
• XML: surface syntax, no semantics
• XML Schema: describes structure of XML documents
• RDF: datamodel for “relations” between “things”
• RDF Schema: RDF Vocabulary Definition Language
• OWL: a more expressive Vocabulary Definition Language

RDF & RDF Schema
• RDF =
  – relations between things
  – all objects are URLs (both things and relations)
• RDF Schema =
  – hierarchical organisation of an RDF vocabulary
  – all things are URLs (classes of things, subclass relations)
• For more details: see slides later today
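
As a rough illustration of "relations between things" and "hierarchical organisation of a vocabulary", the sketch below stores RDF-style triples as plain Python tuples and follows subclass links to infer extra types. The `http://example.org/` namespace and the classes Capital, City and Place are invented for this example; only the `rdf:type` and `rdfs:subClassOf` identifiers are the standard W3C ones. A real application would use an RDF library rather than this hand-rolled set.

```python
# Hypothetical namespace, invented for this illustration.
EX = "http://example.org/"
# Standard W3C identifiers for rdf:type and rdfs:subClassOf.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Each triple is (subject, relation, object); things and relations are URLs.
triples = {
    (EX + "Amsterdam", RDF_TYPE, EX + "Capital"),
    (EX + "Amsterdam", EX + "capitalOf", EX + "Netherlands"),
    (EX + "Capital", RDFS_SUBCLASS, EX + "City"),
    (EX + "City", RDFS_SUBCLASS, EX + "Place"),
}


def superclasses(cls, graph):
    """All classes reachable from cls by following rdfs:subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == RDFS_SUBCLASS and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(thing, graph):
    """Direct types of a thing plus every type implied by the class hierarchy."""
    direct = {o for s, p, o in graph if s == thing and p == RDF_TYPE}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return direct | inferred


amsterdam_types = types_of(EX + "Amsterdam", triples)
print(sorted(amsterdam_types))
```

The data only states that Amsterdam is a Capital; that it is also a City and a Place follows from the vocabulary, which is the kind of machine-derivable answer plain XML could not give.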

The semantic pyramid again

OWL: things RDF Schema can’t do
• equality
• enumeration
• number restrictions
  – Single-valued/multi-valued
  – Optional/required values
• inverse, symmetric, transitive
• boolean algebra
  – Union, complement
Again, for more details: see slides later today
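
To give a feel for the "inverse, symmetric, transitive" item, here is a minimal sketch of what declaring such property characteristics lets a reasoner derive. It is not an OWL implementation: the function `infer`, the property names locatedIn, contains and borders, and the facts themselves are all assumptions made up for this illustration.

```python
def infer(facts, symmetric=(), transitive=(), inverse_of=None):
    """Repeatedly apply symmetric, transitive and inverse rules until nothing new appears."""
    inverses = dict(inverse_of or {})
    # If p is the inverse of q, then q is the inverse of p.
    inverses.update({q: p for p, q in list(inverses.items())})
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverses:
                new.add((o, inverses[p], s))
            if p in transitive:
                for s2, p2, o2 in known:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= known:
            return known
        known |= new


# Hypothetical facts and property declarations.
facts = {
    ("Amsterdam", "locatedIn", "Netherlands"),
    ("Netherlands", "locatedIn", "Europe"),
    ("Netherlands", "borders", "Belgium"),
}
closure = infer(
    facts,
    symmetric={"borders"},
    transitive={"locatedIn"},
    inverse_of={"locatedIn": "contains"},
)

print(("Amsterdam", "locatedIn", "Europe") in closure)   # transitive
print(("Belgium", "borders", "Netherlands") in closure)  # symmetric
print(("Europe", "contains", "Amsterdam") in closure)    # inverse
```

RDF Schema alone can say that locatedIn is a property, but it has no way to state that it is transitive or that contains is its inverse, so none of the three derived facts would be available.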

Sounds good in theory. How far are you with this in practice?

Where are we now: tools
• Languages are stable (W3C)
• Tooling is rapidly emerging
  – HP, IBM, Oracle, Adobe, …
• Parsers
• Editors
• Visualisers
• Large-scale storage and querying
• Portal generation
[Logos: Aduna, Intellidimension]

Three example use-cases
• Closed-world data integration: DOPE browser @ Elsevier
• Open-world data integration: streaming media @ Philips
• Semantic Web services
• Conclusions

Closed-world data integration: DOPE Browser @ Elsevier
This section is joint work with Aduna and Anita de Waard @ Elsevier

Background
• Vertical Information Provision
  – Buy a topic instead of a journal!
  – Web provides new opportunities
• Business driver: drug development
  – Rich, information-hungry market
  – Good thesaurus (EMTREE)

The Data
• Document repositories:
  – ScienceDirect: approx. 500,000 full-text articles
  – MEDLINE: approx. 10,000,000 abstracts
• Extracted metadata
  – The Collexis Metadata Server: concept extraction ("semantic fingerprinting")
• Thesauri and ontologies
  – EMTREE: 60,000 preferred terms, 200,000 synonyms

Architecture:
[Figure: architecture diagram with the labels: Query interface, RDF Schema, EMTREE, RDF (repeated), Datasource 1 … Datasource n]

Web-based data integration scenario:
• heterogeneous
• open
This section uses material from Zharko Aleksovski @ VU & Philips

Motivating scenario
[Figure: user devices (consumer.philips.com) reach music providers through the Semantic Web. Providers shown: LaunchCast, iTunes, Rhapsody, Buy.com, Napster, MusicNow, eMusic, Wal*Mart, MusicNet, Musicmatch]

Example
[Figure: “Hits” from the “60s” and “Evergreens”, connected through a Mediator with a Music Ontology]
Evergreens and Golden hits are related: Golden hits is mostly a subclass of Evergreens

Domain characteristics
• Many music providers
• Wide variety of music offered
• Constantly increasing in size and evolving
• Cumbersome to browse and retrieve music
• There is no agreement
  – Different terms are used
  – The same terms contain different sets of artists

Data sources

    Source                   Size (classes)   Depth (levels)
    CDNow (Amazon.com)       2410             5
    MusicMoz                 1073             7
    Artist Direct Network    465              2
    All Music Guide          403              2
    ArtistGigs               382              4
    CD baby                  222              2
    Yahoo                    96               3

Why approximate matching
• Genre is not precisely defined
• Pop and Rock have no common definition on the big portals AllMusic.com, Amazon.com and MP3.com
• Exact reasoning will not be useful
[Figure: class A against class X, with a 99 % / 1 % split]

Results
A = AllMusicGuide, B = ArtistDirectNetwork
[Figure: plot with three series, "B subClass of A", "A subClass of B" and "equivalences"; x-axis from 0.0 to 1.0, y-axis from 0 to 600000]
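
One way to read the 99 % / 1 % picture is as a degree of containment between two classes. The sketch below is an assumption, not the method used in the Philips work: it measures which fraction of one class's instances also appear in the other and compares that with a hypothetical threshold of 0.9. The function names, the threshold and the artist data are all invented for illustration.

```python
def subclass_degree(a, b):
    """Fraction of a's instances that also occur in b (0.0 for an empty class)."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


def classify(a, b, threshold=0.9):
    """Label the approximate relation between classes a and b."""
    a_in_b = subclass_degree(a, b) >= threshold
    b_in_a = subclass_degree(b, a) >= threshold
    if a_in_b and b_in_a:
        return "equivalent"
    if a_in_b:
        return "A subClass of B"
    if b_in_a:
        return "B subClass of A"
    return "unrelated"


# Hypothetical artist sets: 99 of the 100 "Golden hits" artists are also "Evergreens".
golden_hits = {f"artist{i}" for i in range(100)}
evergreens = {f"artist{i}" for i in range(1, 200)}

exact = golden_hits <= evergreens                 # strict subclass test fails
degree = subclass_degree(golden_hits, evergreens)
relation = classify(golden_hits, evergreens)

print(exact, degree, relation)
```

A single stray artist is enough to make the exact subset test fail, while the approximate test still reports that Golden hits is mostly a subclass of Evergreens. Sweeping the threshold from 0 to 1 would give the kind of curve sketched in the results plot.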

Semantic Web Services
This section uses material from Marta Sabou @ VU

What are web-services
• a software system designed to support interoperable machine-to-machine interaction over a network.
• has an interface described in a machine-processable format (specifically WSDL).
• Other systems interact with a web service in a manner specified by its description using SOAP messages

Web Service Tasks
• Web Service Discovery & Selection
  – Find an airline that can fly me to Marina del Rey
• Web Service Invocation
  – Book flight tickets from NWA to arrive 12th Oct.
• Web Service Composition & Interoperation
  – Arrange taxis, flights and hotel for travel from Southampton to Portland, OR, via Marina del Rey, CA.
• Web Service Execution Monitoring
  – Has the taxi to Gatwick Airport been reserved yet?

Limitations of WS Technology
• Manual Discovery
• Manual Invocation
• Manual (ad hoc) Mediation
• Manual (ad hoc) Composition