From Web 2.0 to Semantic Web: A Semi-Automated Approach (Andreas Heß, Christian Maaß and Francis Dierick)

  1. From Web 2.0 to Semantic Web: A Semi-Automated Approach. Andreas Heß, Christian Maaß and Francis Dierick, Lycos Europe, 01/06/2008 | 1

  2. Outline » Motivation » Proposals for better tagging » Tag suggestion / semi-automated tagging » Tag merging » Conclusion | 2

  3. Motivation » Ontologies: high entrance barriers for ordinary users » Folksonomies: widely used, low entrance barriers » Goals » Draw benefits from complementary nature » Improve quality of folksonomies » Eventually merge folksonomies and ontologies | 3

  4. Semantic Web / Web 2.0 [Diagram, Semantic Web side: experts develop the ontology; Thing, Party, Occupation and Person are linked by is_a; Merkel and Chancellor are linked by "has a"]

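  5. Semantic Web / Web 2.0 [Diagram. Semantic Web side: experts develop the ontology; Thing, Party, Occupation and Person are linked by is_a; Merkel and Chancellor are linked by "has a". Web 2.0 side: community provides content and meta-data (tags); 123.jpg carries the tags CDU, Angela Merkel, Berlin. A "Refers to" link connects the two sides.] » Search request: Angela Merkel » Search result: 123.jpg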

  6. Semantic Web / Web 2.0 [Diagram as on slide 5: expert-built ontology on the Semantic Web side; community content 123.jpg tagged CDU, Angela Merkel, Berlin on the Web 2.0 side] » Semantic Web, advantages: + ontology controlled by experts; + reasoning, inference » Semantic Web, disadvantages: - language of experts != language of users; - low proliferation & expensive » Web 2.0, advantages: + user's vocabulary; + high proliferation & cheap » Web 2.0, disadvantages: - lack of quality control; - error-prone & unstructured » Search request: Angela Merkel » Search result: 123.jpg

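  7. Semantic Web / Web 2.0 » Mutual assistance [Diagram. Semantic Web side: experts develop the ontology; Thing, Party, Occupation and Person are linked by is_a; Merkel and Chancellor are linked by "has a". Web 2.0 side: community provides content and meta-data (tags); 123.jpg carries the tags CDU, Angela Merkel, Berlin. A "Refers to" link connects the two sides.] » Search request: Angela Merkel » Search result: 123.jpg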

  8. Semantic Web / Web 2.0 [Diagram. Semantic Web side: experts develop the ontology; Thing, Party, Occupation and Person are linked by is_a; Merkel and Chancellor are linked by "has a". Web 2.0 side: community provides content and meta-data (tags); 123.jpg carries the tags CDU, Angela Merkel, Berlin. A "Refers to" link connects the two sides.] » Search request: Angela Merkel » Search result: 123.jpg » Background information: Merkel → Chancellor

  9. Moving from Folksonomies to Ontologies: Tag Quality » Tag Merging: Eliminate duplicates / synonyms / misspellings / nonsense » Tag cloud: hard drives, Tiger, Berlin, computer, CPU, Lycos, watch, hard drive, Germany, politics, clock, history, Alt, computers, Angela Merkel, hardware, software, histryo, Europe, screen, ugagua, MP, Techno, member of parliament | 9

  11. Moving from Folksonomies to Ontologies: Tag Quality » Tag Merging: Eliminate duplicates / synonyms / misspellings / nonsense » Tag cloud after merging: hard drives, Tiger, Berlin, CPU, Lycos, watch, Germany, politics, clock, history, Alt, computers, Angela Merkel, hardware, software, Europe, screen, MP, Techno, member of parliament | 11

  12. Moving from Folksonomies to Ontologies: Tag Quality » Topic Detection » Tag cloud: hard drives, Tiger, Berlin, CPU, Lycos, watch, Germany, politics, clock, history, Alt, computers, Angela Merkel, hardware, software, Europe, screen, MP, Techno, member of parliament | 12

  13. Moving from Folksonomies to Ontologies: Tag Quality » Topic Detection » Tags of the detected topic: hard drives, CPU, computers, hardware, software, screen | 13

  14. Moving from Folksonomies to Ontologies: Tag Quality » Relation Extraction » Tags: hard drives, CPU, computers, hardware, software, screen | 14

  15. Moving from Folksonomies to Ontologies: Tag Quality » Relation Qualification [Diagram: the tags hard drives, CPU, computers, hardware, software and screen connected by edges labelled part_of and is_a] | 15

  16. Proposed Measures » Semi-Automated Tagging » Lower the threshold towards creating meta-data » Tag Merging » Improving tag quality » Extract Relations » First step on the move from folksonomies to more structured form » User Rating » Involve user in refining quality » Information Extraction » Automatically fill blanks | 16

  18. Semi-Automated Tagging » Text classification, training data needed » Semi-automated annotation of very short texts | 18

  19. Choice of Classification Algorithm » Speed is important » Interactive: user does not want to wait » Use the well-known Rocchio text classification algorithm » Simple, fast, incremental, suitable for a high number of classes » Works well only if texts are short and of similar length » ... but this is the case here » Use a part-of-speech tagger for dimensionality reduction » Only nouns and proper nouns | 19
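
A minimal sketch of a Rocchio-style tag suggester, to make the idea on this slide concrete. It is not the authors' implementation: it uses only positive examples (one summed centroid per tag), naive whitespace tokenisation instead of a part-of-speech filter, and cosine similarity for ranking. The class and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """L2-normalised bag-of-words vector (naive whitespace tokenisation)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


class RocchioTagger:
    """One prototype vector per tag; training is a simple vector addition."""

    def __init__(self):
        self.sums = defaultdict(Counter)  # tag -> summed term weights

    def add(self, text, tags):
        """Incremental update: fold one tagged posting into its tag prototypes."""
        vec = vectorize(text)
        for tag in tags:
            for term, weight in vec.items():
                self.sums[tag][term] += weight

    def suggest(self, text, k=5):
        """Return up to k tags ranked by cosine similarity to the query text."""
        query = vectorize(text)
        scores = {}
        for tag, centroid in self.sums.items():
            norm = math.sqrt(sum(w * w for w in centroid.values()))
            dot = sum(w * centroid.get(term, 0.0) for term, w in query.items())
            if norm and dot > 0:
                scores[tag] = dot / norm
        return sorted(scores, key=scores.get, reverse=True)[:k]


tagger = RocchioTagger()
tagger.add("cpu hard drive upgrade", ["hardware"])
tagger.add("new cpu and screen", ["hardware"])
tagger.add("merkel speech berlin", ["politics"])
tagger.add("chancellor merkel election", ["politics"])
print(tagger.suggest("merkel in berlin"))
```

Because training only adds vectors, new postings can be folded in one at a time, which matches the "incremental" property named on the slide. Restricting `vectorize` to nouns and proper nouns would shrink the vectors further.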

  20. Evaluation (I): Precision » Tested precision with 4 test users » Original tagging far from perfect » Suggestion quality not great » But good enough for interactive use » In 87% at least one correct prediction within top 5 [Chart: precision in percent (0 to 100) of Original Tags vs. Suggested Tags for Person 1, Person 2, Person 3, Person 4 and Average] | 20
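
The two measures on this slide can be computed as in the following sketch. The function names and the toy data are illustrative assumptions; the slides do not say how the authors computed their figures.

```python
def precision(suggestions, gold):
    """Share of all suggested tags that the judges accepted as correct."""
    total = sum(len(set(s)) for s in suggestions)
    correct = sum(len(set(s) & set(g)) for s, g in zip(suggestions, gold))
    return correct / total if total else 0.0


def hit_rate_at_k(suggestions, gold, k=5):
    """Share of postings with at least one correct tag among the top k."""
    hits = sum(1 for s, g in zip(suggestions, gold) if set(s[:k]) & set(g))
    return hits / len(gold) if gold else 0.0


# Toy data: ranked suggestions per posting and the tags judged correct.
suggested = [["a", "b"], ["c"], ["d", "e"]]
correct = [{"a"}, {"x"}, {"e", "d"}]
print(precision(suggested, correct), hit_rate_at_k(suggested, correct))
```

The hit rate is the more forgiving measure: a posting counts as a success as soon as one of its top suggestions is usable, which is what matters when a user picks from a short list.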

  21. Evaluation (II): absolute numbers » More correct suggestions than original tags in total » Assumption: People will tag more [Chart: number of Correct and Incorrect tags (axis 0 to 4000) for Original Tags vs. Suggested Tags] | 21

  22. Tag Merging » Goals » Elimination and merging of incorrectly spelled tags » Merging of different spelling variations » Example » „computer“ vs. „computers“ (singular/plural) | 22

  23. Tag Merging - Algorithm [Diagram: Input Tag → Spell Checker (backed by a Dictionary) → Candidates → Inspect using different similarity measures → Similar Tags with score] | 23

  24. Tag Merging - Algorithm » Why this extra step? [Diagram: Input Tag → Spell Checker (backed by a Dictionary) → Candidates → Inspect using different similarity measures → Similar Tags with score] | 24

  25. Tag Merging - Algorithm » Computing similarities is slow » Pairwise checking is Θ(n²) [Diagram: Input Tag → Spell Checker (backed by a Dictionary) → Candidates → Inspect using different similarity measures → Similar Tags with score] | 25
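
The point of the candidate step is to avoid scoring all n(n-1)/2 tag pairs. The slides use a spell checker with a dictionary for this; the sketch below substitutes a simple deletion-key index as a stand-in, which is an assumption and not the authors' method. Two tags become a candidate pair when they share a key, where a tag's keys are the tag itself plus every string obtained by deleting one character.

```python
from collections import defaultdict
from itertools import combinations


def deletion_keys(tag):
    """The tag itself plus every string obtained by deleting one character."""
    return {tag} | {tag[:i] + tag[i + 1:] for i in range(len(tag))}


def candidate_pairs(tags):
    """Pairs of tags sharing at least one deletion key (cheap pre-filter)."""
    buckets = defaultdict(set)
    for tag in set(tags):
        for key in deletion_keys(tag.lower()):
            buckets[key].add(tag)
    pairs = set()
    for bucket in buckets.values():
        pairs.update(combinations(sorted(bucket), 2))
    return pairs


tags = ["computer", "computers", "history", "histryo",
        "hard drive", "hard drives", "Berlin", "politics"]
pairs = candidate_pairs(tags)
print(sorted(pairs))
print(len(pairs), "candidate pairs instead of", len(tags) * (len(tags) - 1) // 2)
```

Only the surviving pairs need to go through the slower similarity measures. The filter catches single insertions, deletions, substitutions and adjacent swaps; larger differences would need a wider key set or a real spell checker.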

  26. Tag Merging - Algorithm » Similarity measures: Jaro-Winkler, Levenshtein [Diagram: Spell Checker → Candidates → Inspect using different similarity measures → tags with score] | 26
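
For reference, the two measures named on this slide can be written in a few lines. This is a plain textbook sketch, not the authors' code; the prefix scale `p=0.1` and the four-character prefix limit are the commonly used Jaro-Winkler defaults and are assumed here.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def jaro(a, b):
    """Jaro similarity in [0, 1], based on matching characters and transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a, b, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


print(levenshtein("history", "histryo"), round(jaro_winkler("history", "histryo"), 3))
```

Levenshtein returns a distance (lower is more similar) while Jaro-Winkler returns a similarity (higher is more similar), so combining them requires normalising one of the two, for example by dividing the distance by the longer tag's length.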

  27. Tag Relations [Diagram: Spell Checker → Candidates → Inspect using different similarity measures → Related Tags with score] | 27

  28. Tag Merging - Algorithm » Fine-tuning with Machine Learning! [Diagram: Input Tag → Spell Checker (backed by a Dictionary) → Candidates → Inspect using different similarity measures → Similar Tags with score] | 28

  29. Tag Merging - Evaluation » Can reach high precision by fine tuning with machine learning » Trade-off between precision and recall tunable » Precision in sample (100 tags): 95% » Fully automated batch processing possible » With this setting 12% smaller tag cloud | 29
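
The tunable trade-off can be illustrated with a single merge threshold on the similarity score. The scored pairs below are made-up illustration data, not the authors' results, and the slides do not specify which learner or features were used for the fine-tuning.

```python
def precision_recall(scored_pairs, threshold):
    """Precision and recall of merging every pair scored at or above threshold.

    scored_pairs: list of (similarity score, is_true_duplicate) tuples.
    """
    merged = [label for score, label in scored_pairs if score >= threshold]
    true_total = sum(1 for _, label in scored_pairs if label)
    true_positives = sum(merged)
    precision = true_positives / len(merged) if merged else 1.0
    recall = true_positives / true_total if true_total else 1.0
    return precision, recall


# Hypothetical hand-labelled candidate pairs.
sample = [(0.98, True), (0.95, True), (0.90, False),
          (0.85, True), (0.70, False), (0.60, True)]

for threshold in (0.93, 0.65):
    print(threshold, precision_recall(sample, threshold))
```

A high threshold merges only near-certain duplicates (high precision, lower recall), which is the regime that makes unattended batch processing plausible; lowering it merges more tags at the cost of more wrong merges.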

  30. Conclusion » Proposed ways to combine strengths of folksonomies and ontologies » Semi-automated Tagging and ... » Tag Merging to increase folksonomy quality » Outlined plan for future work | 30

  31. Thank You for Your Attention! » Questions? | 31

  32. Tag Suggestions - Algorithm » Rocchio with dimensionality reduction [Diagram: tagged postings → extract terms / dimensionality reduction → bag of words for each tag → index; query → index → tags] | 32
