KDI: A Methodology for Data Integration



  1. KDI: A Methodology for Data Integration. Fausto Giunchiglia and Mattia Fumagalli, University of Trento

  2. Outline: Overview of the Model; Generalized Queries; Etypes Model; Evaluation; Case Studies

  3. Overview of the Model: Components of the Model; “Data wrangling”

  4. Components of the Model. [Diagram; labels: Datasets, Standards, Generalized Query, KDI Methodology, MODEL, Application, Schema, Language]

  5. Components of the Model (with examples). [Diagram; labels: Datasets (European Open Data Portal, Open Data Trentino, Open Street Map), Standards (SIRI, GTFS, INSPIRE), Generalized Query (GQ 1, 2, 3 - n), KDI Methodology, MODEL, Ontological principles, Application, Schema, Language (Hi, En, It)]

  6. “Data wrangling”. [Diagram; labels: Pilot Application, Relevant Datasets (Dataset 1, Dataset 2), Reference Standard (Technical Standard 1, De facto Standard 2), Application]

  7. Generalized Queries: Application Scenario; Identify the Concepts; Queries Collection Mechanism

  8. Application Scenario. Choose the application scenario, for example Transport or Tourism.

  9. Generalized Queries. Start with a set of ground queries: given the application scenario, a set of queries will arise which place demands on an underlying ontology.
     • Ground query (example): Give a list of all the hotels in city X that have facilities for disabled guests.
     • Identification of the general query pattern: Give me all X in Y AND WHERE.property.True
     • Identification of concepts and properties. Entity: Hotel, City. Property: Hotel.name, City.name, facilityForDisable (Boolean).
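The general query pattern on the slide above ("Give me all X in Y AND WHERE.property.True") can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `all_x_in_y_where`, the attribute names, and the sample hotels and cities are assumptions invented for the example; only the entity types (Hotel, City) and the Boolean property come from the slide.

```python
from dataclasses import dataclass


@dataclass
class City:
    name: str  # City.name from the slide


@dataclass
class Hotel:
    name: str                   # Hotel.name from the slide
    city: City                  # the "in Y" part of the pattern
    facility_for_disable: bool  # Boolean property from the slide


def all_x_in_y_where(entities, container_attr, container_name, prop):
    """Generalized query: give me all X in Y where X.prop is True.

    `container_attr` names the attribute that links X to Y (hypothetical
    helper parameter), `prop` names the Boolean property to test.
    """
    return [
        e for e in entities
        if getattr(e, container_attr).name == container_name
        and getattr(e, prop) is True
    ]


# Sample data, invented for illustration only.
trento = City("Trento")
rome = City("Rome")
hotels = [
    Hotel("Hotel A", trento, True),
    Hotel("Hotel B", trento, False),
    Hotel("Hotel C", rome, True),
]

result = all_x_in_y_where(hotels, "city", "Trento", "facility_for_disable")
print([h.name for h in result])
```

The ground query is recovered by binding X to Hotel, Y to a specific city, and the property to the accessibility flag; other ground queries of the same shape reuse the function with different bindings.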

  10. Identify the Concepts. Identify all the core concepts which are needed to answer the generalized queries. [Word cloud: Date, Location, Wi-Fi, Speed, Driver, Elevator, Statue, Wheelchair Accessibility, Hotel, Train, Road, Bus, Movie, Address, Party, Cold, Building, Agency, Restaurant, Ticket, Dinner, Mountain, Trip, Price, Recipe, Weather, House, Country, No]

  11. Queries Collecting Mechanism. Query generation methodology:
      1. via a user study, for instance via questionnaires or focus groups
      2. via a benchmarking analysis of existing sites and data
      3. heuristically, based on the understanding of the domain developer
      4. from datasets (see rapidminer tree example… see also http://quepy.machinalis.com/)
      5. a combination of the above

  12. EER Model: Schema Level; Language Level

  13. Schema Level: Schema

  14. Schema Level: Schema Example. [Diagram: Hotel IS_A Building; The Plaza is the example Hotel; attribute Building Status has the value (ValueOf) Functioning; attribute Date of construction has the value 1907; AddressCountry points to Country]
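As a rough illustration of the schema example above, the IS_A link and the attributes can be written as Python dataclasses. This is a sketch under assumptions: the class and field names are transliterations of the diagram labels, and the value "USA" for the country is taken from the later "Issue_2" slide rather than from this one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Country:
    name: str


@dataclass
class Building:
    name: str
    status: Optional[str] = None                # "Building Status" on the slide
    date_of_construction: Optional[int] = None  # "Date of construction"
    address_country: Optional[Country] = None   # "AddressCountry" -> Country


@dataclass
class Hotel(Building):
    """Hotel IS_A Building: it inherits every Building attribute."""


# The instance shown on the slide; "USA" is borrowed from the Issue_2 slide.
plaza = Hotel(
    name="The Plaza",
    status="Functioning",
    date_of_construction=1907,
    address_country=Country("USA"),
)

print(isinstance(plaza, Building), plaza.status, plaza.date_of_construction)
```

Modelling IS_A as inheritance means any query phrased over Building also covers Hotel instances, which is the practical effect of the IS_A link in the diagram.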

  15. ER Model (example)

  16. ER Model and Relational Database (example). [Diagram; entities: Hotel, Country]

  17. EER Model (example). [Diagram; entities: Hotel, Country]

  18. Alignment with Upper Ontology and Classification (Schema Level). [Diagram: the core concepts are classified under upper-ontology categories. Category labels as they appear: Physical, Social, Mental, Event, Property, Artifact, Place, Object, Object. Concepts: Trip, Movie, Agency, Address, Building, Location, House, Statue, No, Hotel, Country, Party, Recipe, Weather, Restaurant, Mountain, Cold, Road, Train, Dinner, Wi-Fi, Bus, Ticket, Elevator, Price, Wheelchair Accessibility]

  19. Formal Modelling: Ontology Design (Schema Level)

  20. Issue_1: Attributes and DataProperties (Schema Level). Complex: Address (AddressCountry). Simple: Wi-Fi (Yes / No).

  21. Issue_2: Relation and ObjectProperties (Schema Level). [Diagram: Hotel (The Plaza) AddressCountry Country (USA); PartOf; City (New York)]
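The distinction raised by Issue_1 and Issue_2 can be illustrated with a tiny in-memory triple list: a property whose value is another entity behaves like an ObjectProperty, while a property whose value is a literal behaves like a DataProperty. The sketch below is hypothetical. In particular, the direction of PartOf (New York PartOf USA) and the Wi-Fi value for The Plaza are assumptions, since the slides do not state them.

```python
# Entities and their types, taken from the Issue_2 slide.
entities = {
    "The Plaza": "Hotel",
    "USA": "Country",
    "New York": "City",
}

# (subject, property, value) triples.
triples = [
    ("The Plaza", "AddressCountry", "USA"),
    ("New York", "PartOf", "USA"),   # assumed direction of PartOf
    ("The Plaza", "Wi-Fi", True),    # assumed value; simple Yes/No attribute
]


def is_object_property(value):
    """A value that names a known entity makes the property an object property."""
    return isinstance(value, str) and value in entities


object_properties = sorted({p for _, p, v in triples if is_object_property(v)})
data_properties = sorted({p for _, p, v in triples if not is_object_property(v)})

print("ObjectProperties:", object_properties)
print("DataProperties:", data_properties)
```

Under this reading, AddressCountry is a relation between two entities rather than a plain string attribute, which is why the slides list it under both issues.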

  22. Language Level: Language

  23. Language Level (example, En). [Diagram: Freeway IS_A Highway; gloss: “a broad highway designed for high-speed traffic”; synonyms: expressway, freeway, motorway, pike, state highway, superhighway; label: “17 hyponym”]
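The language-level example above pairs a concept with a gloss, a set of synonyms, and an IS_A link to a broader concept. A minimal sketch of that structure follows; the `Synset` class and the helper names are assumptions made for illustration, and the gloss of Highway is left empty because the slide does not give it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Synset:
    concept: str
    language: str
    gloss: str = ""
    synonyms: List[str] = field(default_factory=list)
    hypernym: Optional["Synset"] = None  # the IS_A link to the broader concept


# Highway's gloss is not on the slide, so it is left empty here.
highway = Synset(concept="Highway", language="En", synonyms=["highway"])

freeway = Synset(
    concept="Freeway",
    language="En",
    gloss="a broad highway designed for high-speed traffic",
    synonyms=["expressway", "freeway", "motorway", "pike",
              "state highway", "superhighway"],
    hypernym=highway,
)


def lookup(word, synsets):
    """Return the concepts that have `word` among their synonyms."""
    w = word.lower()
    return [s.concept for s in synsets if w in (x.lower() for x in s.synonyms)]


def ancestors(synset):
    """Follow IS_A links upwards and return the broader concepts in order."""
    chain = []
    while synset.hypernym is not None:
        synset = synset.hypernym
        chain.append(synset.concept)
    return chain


print(lookup("Motorway", [highway, freeway]))
print(ancestors(freeway))
```

Keeping words (language level) separate from concepts (schema level) lets several surface terms, possibly in different languages, resolve to the same concept.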

  24. Evaluation: Inconsistency check; Incompleteness check

  25. Evaluation of Ontological Model
      • Inconsistency
        • circularity errors (e.g. Traveler subClassOf Person; Person subClassOf Traveler)
        • semantic inconsistency errors (e.g. Airbus or Waterbus subClassOf Bus)
        • partition errors (e.g. NonStopFlight subClassOf InternationalFlight and DomesticFlight, where InternationalFlight and DomesticFlight are disjoint)
      • Incompleteness: e.g. in the travelling domain, classifying only beach and mountain locations and not considering cultural heritage sites
      • Redundancy
        • identical formal definition of some classes
        • identical formal definition of instances
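Two of the inconsistency checks listed above, circularity errors and partition errors, are mechanical enough to sketch in code. The following is a minimal illustration over a plain dictionary of subClassOf links, not the evaluation tooling used by the authors. The function names are assumptions, and the class names are taken from the slide's examples.

```python
# Direct subClassOf links, using the slide's own examples.
subclass_of = {
    "Traveler": {"Person"},
    "Person": {"Traveler"},  # circularity error
    "NonStopFlight": {"InternationalFlight", "DomesticFlight"},  # partition error
}

# Pairs of classes declared disjoint.
disjoint = [("InternationalFlight", "DomesticFlight")]


def superclasses(cls, links):
    """All direct and indirect superclasses of `cls` (transitive closure)."""
    seen = set()
    stack = list(links.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(links.get(current, ()))
    return seen


def circularity_errors(links):
    """Classes that end up being their own (indirect) superclass."""
    return sorted(c for c in links if c in superclasses(c, links))


def partition_errors(links, disjoint_pairs):
    """Classes that are subclasses of two classes declared disjoint."""
    errors = []
    for cls in sorted(links):
        supers = superclasses(cls, links)
        for a, b in disjoint_pairs:
            if a in supers and b in supers:
                errors.append((cls, a, b))
    return errors


print(circularity_errors(subclass_of))
print(partition_errors(subclass_of, disjoint))
```

Semantic inconsistency (Airbus subClassOf Bus) and incompleteness cannot be detected this way, because they depend on what the terms mean in the domain rather than on the shape of the hierarchy.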

  26. Case Studies: Evaluation of Methodology; Results

  27. Case Studies (example). Topics: Tourism in Trento; Emergency Response in London; Transport in London; Where to eat in Trento; Real Estate; Event in Trento; Geospatial

  28. Evaluation of Methodology (Case Studies)
      • Technique: a standard Human Computer Interaction (HCI) technique was used
        • Open-ended questions mixed with Likert-scale closed questions
      • How: balanced questionnaires
      • Number of participants: 18
      • Participant information
        • Nationality: Italian, Indian, German, Brazilian, Ukrainian, Ethiopian, Mexican, Ugandan, Cameroonian
        • Gender: Male 13, Female 5
        • Age range: 18-25 (14), 26-30 (4)
        • Level of education: Undergraduate (3), Postgraduate (15)

  29. Evaluation of Methodology (Case Studies)
      • Perspicuity: How easy is it to get familiar with the methodology?
      • Efficiency: How effectively can the user perform the process?
      • Dependability: Can the user control the process?
      • Stimulation: Is it exciting and motivating?
      • Novelty: Is it innovative and creative?

  30. Results (Case Studies)
      Pros
      • Well structured
      • Programmatically durable
      • It practically allows one to describe the world
      • Provides methods to minimize the distance between the real world and the abstraction
      • Helps find possible defects of the ontology and helps correct them: taxonomic errors, inconsistencies, reliability
      Cons
      • You need a lot of practice to build something very well
      • Needs more time to master
      • Difficult to identify the class to align with the top level
      • Necessary to write documentation to clarify choices and terms
      • Formalizing DERA to DL

  31. References
      • Data on the Web Best Practices. W3C Recommendation, 31 January 2017. https://www.w3.org/TR/dwbp/
      • Das, S., & Giunchiglia, F. (2016, October). GeoEtypes: Harmonizing Diversity in Geospatial Data (Short Paper). In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" (pp. 643-653). Springer International Publishing.
      • Hlomani, H., & Stacey, D. (2014). Approaches, methods, metrics, measures, and subjectivity in ontology evaluation: A survey. Semantic Web Journal, 1-5.
      • Giunchiglia, F., & Dutta, B. (2011). DERA: A Faceted Knowledge Organization Framework.
      • Guarino, N., & Welty, C. A. (2009). An overview of OntoClean. In Handbook on Ontologies (pp. 201-220). Springer Berlin Heidelberg.
      • Gomez-Perez, A., Fernández-López, M., & Corcho, O. (2006). Ontological Engineering: with examples from the areas of Knowledge Management, e-Commerce and the Semantic Web. Springer Science & Business Media.
