
Schema Matching in a Large Scale

Schema Matching in a Large Scale: Personal Schema Based Querying. Marko Smiljanić, Maurice van Keulen, Willem Jonker. Dutch-Belgian Database Day


  1. Schema Matching in a Large Scale: Personal Schema Based Querying • Marko Smiljanić, Maurice van Keulen, Willem Jonker • Dutch-Belgian Database Day - December 3, 2004 - Antwerp, Belgium

  2. in this talk • motivation: personal schema based querying • understanding: formalizing the schema matching problem • solving: clustering in schema matching • validating: semantic validation without semantics

  3. motivation

  4. mediated schema [figure labels: //account[number=1234]/owner, mediator, data ×3]

  5. personal schema [figure labels: //account[number=1234]/owner, PSQ, data ×3] • PSQ: Personal Schema Based Query Answering System

  6. architecture [figure labels: schemas, schema loader, schema repository, select, data; remaining labels unreadable]

  7. Déjà Vu [figure; labels unreadable]

  8. goals and issues: goals • efficiency of schema matching (time-to-last, time-to-first) • effectiveness of schema matching (precision/recall); issues • trees vs. graphs • the objective function

  9. understanding

  10. schema matching [figure label: hints]

  11. formalism: constraint optimization problem • well known framework, offering a range of approaches for efficient problem solving

  12. formalism • correctness • ranking
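
The constraint optimization framing can be illustrated with a small brute-force sketch. It reads the slide's "correctness" and "ranking" as constraints and an objective function respectively, which is an interpretation rather than something the slides spell out. The two toy schemas, the ancestor-descendant constraint, and the name-similarity objective are all hypothetical choices made for illustration; the exhaustive enumeration is only workable at toy size and is not the efficient solving the talk is after.

```python
from difflib import SequenceMatcher
from itertools import permutations

# Hypothetical personal schema: node -> parent (None marks the root).
personal = {"account": None, "owner": "account"}

# Hypothetical repository schema tree: node -> parent.
repository = {
    "bank": None,
    "customerAccount": "bank",
    "accountOwner": "customerAccount",
    "branch": "bank",
}


def is_ancestor(tree, anc, node):
    """True if `anc` lies on the path from `node` up to the root."""
    parent = tree[node]
    while parent is not None:
        if parent == anc:
            return True
        parent = tree[parent]
    return False


def satisfies_constraints(mapping):
    """Assumed constraint: a parent-child edge in the personal schema must
    map to an ancestor-descendant pair in the repository schema."""
    return all(
        is_ancestor(repository, mapping[parent], mapping[child])
        for child, parent in personal.items()
        if parent is not None
    )


def objective(mapping):
    """Assumed objective: mean name similarity of the mapped node pairs."""
    scores = [
        SequenceMatcher(None, p.lower(), r.lower()).ratio()
        for p, r in mapping.items()
    ]
    return sum(scores) / len(scores)


def solve():
    """Enumerate every injective assignment, keep the feasible ones,
    and rank them by the objective (best first)."""
    nodes = list(personal)
    feasible = []
    for targets in permutations(repository, len(nodes)):
        mapping = dict(zip(nodes, targets))
        if satisfies_constraints(mapping):
            feasible.append((objective(mapping), mapping))
    return sorted(feasible, key=lambda pair: pair[0], reverse=True)


for score, mapping in solve():
    print(f"{score:.3f}  {mapping}")
```

The constraints decide which mappings count as solutions at all, while the objective only orders the solutions that remain. Swapping in a different similarity measure changes the ranking but not the set of feasible mappings.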

  13. finding a solution

  14. the idea of clustering • distance based clustering

  15. why clustering? • clusters can be ranked • search space is reduced

  16. clustering approaches (and issues) • clustering method has to be scalable • k-medoid: how to initialize; pre-computation of distance • hand made linear-time clustering: make it intelligent, yet keep it close to linear-time
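
The two k-medoid issues named on this slide, initialization and pre-computed distances, show up directly in a minimal sketch. The version below is a simple alternating (k-means-style) k-medoids variant, not necessarily the one used in the talk. It takes a full distance matrix and the initial medoids as inputs, so both costs are left visibly to the caller. The function name and the one-dimensional toy data are assumptions for illustration.

```python
def k_medoids(dist, medoids, max_iter=100):
    """Cluster points 0..n-1 given a pre-computed distance matrix.

    dist    -- n x n list of lists, dist[i][j] = distance between i and j
    medoids -- initial medoid indices (initialization is left to the caller)
    Returns (medoids, labels), where labels[i] is the medoid of point i.
    """
    n = len(dist)
    medoids = list(medoids)

    def assign(current):
        # Each point joins its nearest medoid.
        return [min(current, key=lambda m: dist[i][m]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for m in medoids:
            members = [i for i in range(n) if labels[i] == m]
            if not members:
                # Degenerate case (e.g. duplicate points): keep the medoid.
                new_medoids.append(m)
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            best = min(members, key=lambda c: sum(dist[c][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical data: numbers standing in for schema elements.
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]  # O(n^2) pre-computation

medoids, labels = k_medoids(dist, medoids=[0, 1])
print(medoids, labels)
```

The quadratic distance matrix is exactly the pre-computation cost the slide flags, which is one motivation for looking at a clustering that stays close to linear time instead.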

  17. validation

  18. validation paradox • semantic validation does not like large search spaces! vs. • clustering is only useful in large search spaces! [figure: search space with regions A, H and T; P = T / A, R = T / H]
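
The formulas P = T / A and R = T / H can be worked through on a toy example. The sketch below assumes the usual reading of the three regions, which the slide text does not define: A is the set of mappings the matcher returns, H is the set a human judges correct, and T is their overlap. The mapping identifiers are made up.

```python
def precision_recall(returned, relevant):
    """Return (P, R) with P = |T| / |A| and R = |T| / |H|.

    returned -- A: mappings produced by the matcher (assumed meaning)
    relevant -- H: mappings a human judges correct (assumed meaning)
    """
    a = set(returned)
    h = set(relevant)
    t = a & h  # T: returned mappings that are also judged correct
    precision = len(t) / len(a) if a else 0.0
    recall = len(t) / len(h) if h else 0.0
    return precision, recall


# Hypothetical mappings, identified by made-up names.
A = {"m1", "m2", "m3", "m4"}
H = {"m2", "m3", "m5"}
P, R = precision_recall(A, H)
print(f"P = {P:.2f}, R = {R:.2f}")
```

Computing R requires the full set H, so a human has to judge the whole search space. That requirement becomes impractical at exactly the scale where clustering starts to pay off.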

  19. estimating the precision and recall • size based • order based

  20. size based quality estimation [figure: without clustering, regions A, H, T with P = T / A and R = T / H; with clustering, regions B, H, T and the label "R 12 = B / A"]

  21. size based quality estimation [figure panels: NO CLUSTERING; CLUST. BEST CASE; CLUST. WORST CASE; labels: B, H, B/A = 93%]

  22. order based quality estimation [figure: "no clustering" vs. "yes clustering"; remaining labels unreadable]

  23. order based quality estimation [figure panels: NO CLUSTERING; CLUST. ALG 1; CLUST. ALG 2]

  24. what comes next • add intelligence to clustering • impact of other hints on clustering • using graphs

  25. And that was it! Questions?
