KDI: A Methodology for Data Integration
Fausto Giunchiglia and Mattia Fumagalli
University of Trento
• Overview of the Model
• Generalized Queries
• Etypes Model
• Evaluation
• Case Studies
Overview of the Model
• Components of the Model
• “Data wrangling”
Components of the Model
[Diagram labels: Datasets, Standards, Generalized Query, KDI Methodology, MODEL (Schema, Language), Application]
Components of the Model
[Diagram labels: Datasets (European Open Data Portal, Open Data Trentino, Open Street Map); Standards (SIRI, GTFS, INSPIRE); Generalized Query (GQ 1, 2, 3, ..., n); KDI Methodology; Ontological principles; MODEL (Schema; Language: Hi, E, It, n); Application]
“Data wrangling”
[Diagram labels: Pilot Application, Reference Application, Relevant Datasets (Dataset 1, Dataset 2), Relevant Standards (Standard 1, Standard 2; Technical, De facto)]
Generalized Queries
• Application Scenario
• Identify the Concepts
• Queries Collection Mechanism
Application Scenario
Choose the application scenario, for example:
• Transport
• Tourism
Generalized Queries
Start with a set of ground queries: given the application scenario, a set of queries will arise which place demands on the underlying ontology.
• Ground query: "Give me a list of all the hotels in city X which have facilities for disabled people."
• Identification of the general query pattern: Give me all X in Y AND WHERE.property.True
• Identification of concepts and properties:
  Entity: Hotel, City
  Property: Hotel.name, City.name, facilityForDisable (Boolean)
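The pattern "all X in Y where a Boolean property is True" can be sketched in code. This is only an illustration under stated assumptions: the `GeneralizedQuery` class, the record layout, and the hotel names are hypothetical and not part of the KDI methodology itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneralizedQuery:
    """Hypothetical encoding of: give me all X in Y where property is True."""
    entity: str            # X, e.g. "Hotel"
    container: str         # Y, e.g. "City"
    boolean_property: str  # e.g. "facilityForDisable"

    def run(self, records, container_name):
        # Keep the names of records of type X located in the given Y
        # whose Boolean property is explicitly True.
        key = self.container.lower()
        return [
            r["name"]
            for r in records
            if r.get("etype") == self.entity
            and r.get(key) == container_name
            and r.get(self.boolean_property) is True
        ]


# Made-up sample data, for illustration only.
hotels = [
    {"etype": "Hotel", "name": "Hotel A", "city": "Trento", "facilityForDisable": True},
    {"etype": "Hotel", "name": "Hotel B", "city": "Trento", "facilityForDisable": False},
    {"etype": "Hotel", "name": "Hotel C", "city": "Rome", "facilityForDisable": True},
]

query = GeneralizedQuery("Hotel", "City", "facilityForDisable")
result = query.run(hotels, "Trento")
```

Binding X, Y and the property once and reusing the query for different cities is what separates the generalized query from the ground query it was derived from.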
Identify the Concepts
Identify all the core concepts which are needed to answer the generalized queries.
[Word cloud: Date, Location, Wi-Fi, Speed, Driver, Elevator, Statue, WheelchairAccessibility, Hotel, Train, Road, Bus, Movie, Address, Party, Cold, Building, Agency, Restaurant, Ticket, Dinner, Mountain, Trip, Price, Recipe, Weather, House, Country, No]
Queries Collecting Mechanism
Query generation methodology:
1. via a user study, for instance via questionnaires or focus groups
2. via a benchmarking analysis of existing sites and data
3. heuristically, based on the understanding of the domain developer
4. from datasets (see rapidminer tree example… see also http://quepy.machinalis.com/)
5. a combination of the above
EER Model
• Schema Level
• Language Level
Schema Level
[Diagram label: Schema]
Schema Level: Schema Example
[Diagram: Hotel IS_A Building; The Plaza is the example entity. Attributes and values: Status, ValueOf Functioning; Date of construction, 1907; AddressCountry, Country]
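The schema example can be sketched as Python dataclasses to make the IS_A link and the attributes concrete. This is a minimal sketch, not the modelling language used by the methodology: the class and field names are hypothetical, and the country value "USA" is taken from the relation example later in the deck.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str


@dataclass
class Building:
    name: str
    status: str                # attribute with value, e.g. "Functioning"
    date_of_construction: int  # attribute with value, e.g. 1907
    address_country: Country   # points to another entity, not a literal


@dataclass
class Hotel(Building):
    """Hotel IS_A Building: it inherits every Building attribute."""


the_plaza = Hotel(
    name="The Plaza",
    status="Functioning",
    date_of_construction=1907,
    address_country=Country("USA"),
)
```

Because `Hotel` subclasses `Building`, any code written against `Building` also accepts `the_plaza`, which is the practical meaning of the IS_A link at the schema level.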
ER Model (example)
ER Model and Relational Database (example)
[Diagram labels: Hotel, Country]
EER Model (example)
[Diagram labels: Hotel, Country]
Alignment with Upper Ontology and Classification
[Table: concepts classified under upper-ontology categories. Category labels: Physical, Social, Mental, Event, Property, Artifact, Place, Object. Concepts: Trip, Movie, Agency, Address, Building, Location, House, Statue, No, Hotel, Country, Party, Recipe, Weather, Restaurant, Mountain, Cold, Road, Train, Dinner, Wi-Fi, Bus, Ticket, Elevator, Price, WheelchairAccessibility]
Formal Modelling
Ontology Design
Issue_1: Attributes and DataProperties
• Complex: Address, AddressCountry
• Simple: Wi-Fi (Yes / No)
Issue_2: Relation and ObjectProperties
[Diagram: Hotel (The Plaza) linked by AddressCountry to Country (USA); PartOf relation with City (New York)]
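One way to read the two issues above is that a property whose value is a literal behaves like a DataProperty, while a property whose value is another entity behaves like an ObjectProperty. The sketch below illustrates that distinction only. The triple layout and the `property_kind` helper are hypothetical, the `True` value for Wi-Fi stands in for "Yes", and the direction of the PartOf link (New York PartOf USA) is an assumption, since the flattened diagram does not show it.

```python
# Known entities and their entity types (taken from the slide examples).
entities = {"The Plaza": "Hotel", "USA": "Country", "New York": "City"}

# (subject, property, value) triples; see the assumptions stated above.
triples = [
    ("The Plaza", "Wi-Fi", True),            # value is a literal
    ("The Plaza", "AddressCountry", "USA"),  # value is an entity
    ("New York", "PartOf", "USA"),           # assumed direction
]


def property_kind(value):
    """Classify a property by what its value refers to."""
    if isinstance(value, str) and value in entities:
        return "ObjectProperty"
    return "DataProperty"


kinds = {prop: property_kind(value) for _, prop, value in triples}
```

Under this reading, AddressCountry stops being a plain attribute as soon as Country is modelled as an entity of its own.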
Language Level
[Diagram labels: Language, Language]
Language Level
• Freeway IS_A Highway (hyponym)
• Gloss (En): a broad highway designed for high-speed traffic
• Synonyms: expressway, freeway, motorway, pike, state highway, superhighway
[Other diagram labels: Language, 17]
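The language-level example can be sketched as a small lexicon in which each concept carries per-language glosses and synonyms and points to its parent concept. This is only an illustrative data structure: the `Concept` class, the `lookup` helper, and the "en" language key are hypothetical names, not part of the methodology.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    concept_id: str
    gloss: dict = field(default_factory=dict)     # language -> gloss
    synonyms: dict = field(default_factory=dict)  # language -> list of words
    parent: Optional["Concept"] = None            # IS_A (hypernym) link


highway = Concept("Highway")
freeway = Concept(
    "Freeway",
    gloss={"en": "a broad highway designed for high-speed traffic"},
    synonyms={"en": ["expressway", "freeway", "motorway", "pike",
                     "state highway", "superhighway"]},
    parent=highway,
)


def lookup(word, lang, concepts):
    """Return the first concept that lists `word` as a synonym in `lang`."""
    for concept in concepts:
        if word in concept.synonyms.get(lang, []):
            return concept
    return None


found = lookup("motorway", "en", [highway, freeway])
```

Keeping words per language on the concept, rather than the other way round, lets the same schema-level concept be labelled in further languages without changing the IS_A hierarchy.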
Evaluation
• Inconsistency check
• Incompleteness check
Evaluation of Ontological Model
• Inconsistency
  • circularity errors [e.g. Traveler subClassOf Person; Person subClassOf Traveler]
  • semantic inconsistency errors [e.g. Airbus or Waterbus subClassOf Bus]
  • partition errors [e.g. NonStopFlight subClassOf InternationalFlight and DomesticFlight, where InternationalFlight and DomesticFlight are disjoint]
• Incompleteness: in the traveling domain, for example, classifying only beach and mountain locations and not considering cultural heritage sites
• Redundancy
  • identical formal definition of some classes
  • identical formal definition of instances
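Two of the inconsistency checks above, circularity and partition errors, can be sketched over a plain subclass graph. This is a minimal sketch assuming the ontology is available as a dictionary from each class to its direct superclasses. The function names are hypothetical, and a real ontology would normally be checked with a reasoner rather than hand-written code.

```python
# Direct subClassOf links, taken from the examples above.
subclass_of = {
    "Traveler": {"Person"},
    "Person": {"Traveler"},
    "NonStopFlight": {"InternationalFlight", "DomesticFlight"},
}
# Pairs of classes declared disjoint.
disjoint = [("InternationalFlight", "DomesticFlight")]


def ancestors(cls, graph):
    """All classes reachable from `cls` by following subClassOf links."""
    seen = set()
    stack = list(graph.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


def circularity_errors(graph):
    """Classes that are (indirectly) subclasses of themselves."""
    return sorted(c for c in graph if c in ancestors(c, graph))


def partition_errors(graph, disjoint_pairs):
    """Classes that fall under both members of a disjoint pair."""
    return sorted(
        c for c in graph
        for a, b in disjoint_pairs
        if {a, b} <= ancestors(c, graph)
    )


cycles = circularity_errors(subclass_of)
partitions = partition_errors(subclass_of, disjoint)
```

Semantic inconsistency errors such as Airbus subClassOf Bus are not detectable this way, because they depend on the intended meaning of the classes rather than on the graph structure.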
Case Studies
• Evaluation of Methodology
• Result
Case Studies (example)
Topics:
• Tourism in Trento
• Emergency Response in London
• Transport in London
• Where to eat in Trento
• Real Estate
• Event in Trento
• Geospatial
Evaluation of Methodology
• Technique: standard Human Computer Interaction (HCI) technique
  • Open-ended questions mixed with Likert-scale closed questions
• How: balanced questionnaires
• Number of participants: 18
• Participant information
  • Nationality: Italian, Indian, German, Brazilian, Ukrainian, Ethiopian, Mexican, Ugandan, Cameroonian
  • Gender: Male 13, Female 5
  • Age range: 18-25 (14), 26-30 (4)
  • Level of education: Undergraduate (3), Postgraduate (15)
Evaluation of Methodology
• Perspicuity: how easy is it to get familiar with the methodology?
• Efficiency: how effectively can the user perform the process?
• Dependability: can the user control the process?
• Stimulation: is it exciting and motivating?
• Novelty: is it innovative and creative?
Results
Pros
• Well structured
• Programmatically durable
• It practically allows one to describe the world
• Provides methods to minimize the distance between the real world and the abstraction
• Helps find possible defects of the ontology and helps correct them: taxonomic errors, inconsistencies, reliability
Cons
• You need a lot of practice to build something very well
• Needs more time to master
• Difficult to identify the class to align with the top level
• Necessary to write documentation to clarify choices and terms
• Formalizing DERA to DL
References
• Data on the Web Best Practices. W3C Recommendation, 31 January 2017. https://www.w3.org/TR/dwbp/
• Das, S., & Giunchiglia, F. (2016, October). GeoEtypes: Harmonizing Diversity in Geospatial Data (Short Paper). In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" (pp. 643-653). Springer International Publishing.
• Hlomani, H., & Stacey, D. (2014). Approaches, methods, metrics, measures, and subjectivity in ontology evaluation: A survey. Semantic Web Journal, 1-5.
• Giunchiglia, F., & Dutta, B. (2011). DERA: A Faceted Knowledge Organization Framework.
• Guarino, N., & Welty, C. A. (2009). An overview of OntoClean. In Handbook on ontologies (pp. 201-220). Springer Berlin Heidelberg.
• Gomez-Perez, A., Fernández-López, M., & Corcho, O. (2006). Ontological Engineering: with examples from the areas of Knowledge Management, e-Commerce and the Semantic Web. Springer Science & Business Media.