From Web 2.0 to Semantic Web: A Semi-Automated Approach
Andreas Heß, Christian Maaß and Francis Dierick
Lycos Europe, 01/06/2008
Outline
» Motivation
» Proposals for better tagging
  » Tag suggestion / semi-automated tagging
  » Tag merging
» Conclusion
Motivation
» Ontologies: high entrance barriers
» Folksonomies: widely used, low entrance barriers for ordinary users
» Goals
  » Draw benefits from their complementary nature
  » Improve the quality of folksonomies
  » Eventually merge folksonomies and ontologies
Semantic Web / Web 2.0
[Figure: On the Semantic Web side, experts develop an ontology: Party, Occupation and Person are linked to Thing by is_a, and Merkel is linked to Chancellor by has_a. On the Web 2.0 side, the community provides content (123.jpg) and meta-data (the tags CDU, Angela Merkel and Berlin), and the tags refer to concepts in the ontology. Search request: Angela Merkel. Search result: 123.jpg.]
Semantic Web / Web 2.0: advantages and disadvantages
» Semantic Web (experts develop an ontology)
  » Advantages: + ontology controlled by experts; + reasoning, inference
  » Disadvantages: - language of experts differs from language of users; - low proliferation & expensive
» Web 2.0 (community provides content and tags)
  » Advantages: + user's own vocabulary; + high proliferation & cheap
  » Disadvantages: - lack of quality control; - error-prone & unstructured
Semantic Web / Web 2.0: mutual assistance
[Figure: the same ontology and tagging diagram, with an arrow labelled "Mutual assistance" between the Semantic Web and Web 2.0 sides.]
Semantic Web / Web 2.0: background information
[Figure: the same diagram; the search request "Angela Merkel" returns 123.jpg together with background information from the ontology: Merkel → Chancellor.]
Moving from Folksonomies to Ontologies: Tag Quality
» Tag merging: eliminate duplicates / synonyms / misspellings / nonsense
[Figure: example tag cloud: hard drives, hard drive, computer, computers, CPU, hardware, software, screen, Tiger, Berlin, Lycos, watch, clock, Germany, Europe, politics, history, histryo, Alt, Angela Merkel, MP, member of parliament, Techno, ugagua]
» Tag merging, result
[Figure: the same tag cloud after merging; "hard drive", "computer", "histryo" and "ugagua" have been removed.]
» Topic detection
[Figure: the merged tag cloud.]
» Topic detection, result
[Figure: one topic isolated from the cloud: hard drives, CPU, computers, hardware, software, screen.]
» Relation extraction
[Figure: relations drawn between the tags hard drives, CPU, computers, hardware, software and screen.]
» Relation qualification
[Figure: the relations between hard drives, CPU, computers, hardware, software and screen, now labelled part_of and is_a.]
Proposed Measures
» Semi-automated tagging
  » Lower the threshold for creating meta-data
» Tag merging
  » Improve tag quality
» Extract relations
  » First step from folksonomies towards a more structured form
» User rating
  » Involve users in refining quality
» Information extraction
  » Automatically fill in blanks
Semi-Automated Tagging
» Text classification; training data needed
» Semi-automated annotation of very short texts
Choice of Classification Algorithm
» Speed is important
  » Interactive: the user does not want to wait
» Use the well-known Rocchio text classification algorithm
  » Simple, fast, incremental, suitable for a large number of classes
  » Works well only if texts are short and of similar length
  » ... but this is the case here
» Use a part-of-speech tagger for dimensionality reduction
  » Keep only nouns and proper nouns
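To make the choice concrete, here is a minimal Python sketch of the positive-only Rocchio (nearest-centroid) variant with noun filtering. It is not the authors' code, and it rests on three assumptions. First, the input is already POS-tagged with Penn Treebank tags (NN, NNS, NNP, NNPS). Second, terms are weighted by plain term frequency without IDF. Third, `keep_nouns`, `RocchioTagger`, `add_posting` and `suggest` are hypothetical names. The sketch also shows why the method is incremental: training on a new posting only adds its vector to the prototypes of its tags.

```python
import math
from collections import Counter, defaultdict

# Assumption: Penn Treebank tags for nouns and proper nouns.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def keep_nouns(tagged_tokens):
    """Dimensionality reduction: keep only nouns and proper nouns.

    Expects (word, pos) pairs from any Penn-Treebank-style POS tagger.
    """
    return [word.lower() for word, pos in tagged_tokens if pos in NOUN_TAGS]


def unit_vector(terms):
    """L2-normalised term-frequency vector as a dict (no IDF weighting)."""
    tf = Counter(terms)
    norm = math.sqrt(sum(v * v for v in tf.values()))
    return {t: v / norm for t, v in tf.items()} if norm else {}


class RocchioTagger:
    """Hypothetical positive-only Rocchio tag suggester (a sketch)."""

    def __init__(self):
        self.prototypes = defaultdict(Counter)  # tag -> sum of posting vectors

    def add_posting(self, terms, tags):
        # Incremental training: one posting only updates its own tags' sums.
        vec = unit_vector(terms)
        for tag in tags:
            self.prototypes[tag].update(vec)

    def suggest(self, terms, k=5):
        # Rank tags by cosine between the query and each prototype. Dividing
        # a sum by its posting count would not change the cosine, so skip it.
        query = unit_vector(terms)
        scored = []
        for tag, proto in self.prototypes.items():
            norm = math.sqrt(sum(v * v for v in proto.values()))
            dot = sum(w * proto.get(t, 0.0) for t, w in query.items())
            if norm and dot > 0:
                scored.append((dot / norm, tag))
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [tag for _, tag in scored[:k]]
```

Usage: call `add_posting(keep_nouns(tagged), tags)` for each already-tagged posting, then `suggest(keep_nouns(tagged_query))` to get up to five tag suggestions. Scoring costs one sparse dot product per tag, which keeps it cheap even with many classes.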
Evaluation (I): Precision
» Tested precision with 4 test users
» Original tagging far from perfect
» Suggestion quality not great
  » But good enough for interactive use
» In 87% of cases, at least one correct suggestion within the top 5
[Figure: bar chart of precision (%, 0 to 100) for Person 1 to Person 4 and their average, comparing original tags with suggested tags.]
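The two figures on this slide, precision of the suggestions and the share of items with at least one correct tag in the top 5, can be computed as sketched below. The slide does not say exactly how precision was computed. This sketch assumes that each suggested tag was judged correct or incorrect against a per-item set of accepted tags (`gold`). The function name `evaluate_suggestions` is hypothetical.

```python
def evaluate_suggestions(suggestions, gold, k=5):
    """Return (precision of top-k suggestions, share of items with >= 1 hit).

    suggestions: item id -> ranked list of suggested tags
    gold:        item id -> set of tags judged correct for that item
    """
    correct = total = hits = 0
    for item, suggested in suggestions.items():
        top = suggested[:k]
        good = sum(1 for tag in top if tag in gold[item])
        correct += good
        total += len(top)
        hits += good > 0  # at least one correct tag within the top k
    precision = correct / total if total else 0.0
    hit_rate = hits / len(suggestions) if suggestions else 0.0
    return precision, hit_rate
```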
Evaluation (II): Absolute Numbers
» More correct suggestions than original tags in total
» Assumption: people will tag more
[Figure: stacked bar chart of the total number of tags (0 to 4000), split into correct and incorrect, for original tags vs. suggested tags.]
Tag Merging
» Goals
  » Elimination and merging of incorrectly spelled tags
  » Merging of different spelling variations
» Example
  » "computer" vs. "computers" (singular/plural)
Tag Merging - Algorithm
[Figure: pipeline. An input tag is passed to a spell checker with a dictionary; the resulting candidates are inspected using different similarity measures; the output is a list of similar tags with scores.]
» Why this extra step (spell-checker candidates before the similarity measures)?
» Computing similarities is slow
» Pairwise checking is Θ(n²)
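One way to avoid the Θ(n²) pairwise comparison is to generate spelling variants of the input tag and look them up in the tag dictionary. Each tag then costs O(L·|alphabet|) set lookups, where L is its length, independent of the number of tags n. The slides do not say which spell checker was used. This sketch substitutes the edit-distance-1 candidate generation popularised by Peter Norvig's spelling corrector, and it assumes lowercased tags. The names `edits1` and `candidates` are hypothetical.

```python
import string

# Assumption: tags are lowercased; space is included for multi-word tags.
ALPHABET = string.ascii_lowercase + " "


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def candidates(tag, dictionary):
    """Existing tags within one edit of `tag`, found by lookup, not a scan."""
    return {c for c in edits1(tag) if c in dictionary and c != tag}
```

Only the returned candidates then go on to the more expensive similarity measures.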
[Figure: the same pipeline, with Jaro-Winkler and Levenshtein as the similarity measures used to inspect the candidates.]
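Both similarity measures named here are standard. Below is a textbook sketch of each. Two choices are assumptions, not taken from the slides. Levenshtein distance is normalised to a [0, 1] similarity by the longer tag's length. Jaro-Winkler uses the common prefix scale p = 0.1 over at most 4 characters, with no boost threshold.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def jaro(s, t):
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_match, t_match = [False] * len(s), [False] * len(t)
    matches = 0
    for i, c in enumerate(s):
        # A character matches if it occurs unmatched within the window in t.
        for j in range(max(0, i - window), min(i + window + 1, len(t))):
            if not t_match[j] and t[j] == c:
                s_match[i] = t_match[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_chars = [c for c, m in zip(s, s_match) if m]
    t_chars = [c for c, m in zip(t, t_match) if m]
    transpositions = sum(a != b for a, b in zip(s_chars, t_chars)) / 2
    m = matches
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3


def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    j = jaro(s, t)
    prefix = 0
    for cs, ct in zip(s[:4], t[:4]):
        if cs != ct:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

For example, `jaro_winkler("MARTHA", "MARHTA")` is about 0.961, and `levenshtein("computer", "computers")` is 1. Jaro-Winkler favours pairs that share a prefix, which suits singular/plural variants.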
Tag Relations
[Figure: the same pipeline, with the output relabelled as related tags with scores.]
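The slides do not say how related tags are scored. As a hedged illustration only, this sketch ranks tags by the cosine similarity of the sets of postings they occur in. This co-occurrence measure is substituted here and is not taken from the source. `related_tags` is a hypothetical name.

```python
import math
from collections import defaultdict


def related_tags(postings, tag, k=5):
    """Rank other tags by cosine similarity of their posting sets.

    postings: list of tag sets, one per posting (assumed input format)
    Returns up to k (score, tag) pairs, best first.
    """
    occurs = defaultdict(set)  # tag -> ids of postings carrying it
    for pid, tags in enumerate(postings):
        for t in tags:
            occurs[t].add(pid)
    base = occurs.get(tag, set())
    scores = []
    for other, ids in occurs.items():
        if other == tag:
            continue
        overlap = len(base & ids)
        if overlap:  # overlap > 0 implies both sets are non-empty
            scores.append((overlap / math.sqrt(len(base) * len(ids)), other))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return scores[:k]
```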
» Fine-tuning with machine learning!
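The slide does not name the learner. Suppose merge decisions are learned from labelled tag pairs. Then one simple option is a logistic regression over the similarity scores of each pair, for example normalised Levenshtein and Jaro-Winkler. This sketch trains one with plain batch gradient descent. The functions `train_logreg` and `merge_probability`, the learning rate and the epoch count are all assumptions.

```python
import math


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent for logistic regression; returns (weights, bias).

    X: list of feature vectors (e.g. [levenshtein_sim, jaro_winkler])
    y: list of labels, 1 = should merge, 0 = keep separate
    """
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1 / (1 + math.exp(-z)) - yi  # gradient of the log loss wrt z
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def merge_probability(features, w, b):
    """Predicted probability that a tag pair should be merged."""
    z = sum(wj * xj for wj, xj in zip(w, features)) + b
    return 1 / (1 + math.exp(-z))
```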
Tag Merging - Evaluation
» High precision reachable by fine-tuning with machine learning
» Trade-off between precision and recall is tunable
» Precision in a sample of 100 tags: 95%
» Fully automated batch processing possible
  » With this setting, the tag cloud becomes 12% smaller
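A tunable precision/recall trade-off usually means choosing a score threshold. This sketch picks the lowest threshold whose precision still meets a target, such as the 95% reported above, which maximises recall under that constraint. It assumes a set of merge candidates with scores and manual correct/incorrect labels. `pick_threshold` is a hypothetical name, and the slides do not state how the threshold was actually set.

```python
from itertools import groupby


def pick_threshold(scored, target_precision=0.95):
    """Lowest score threshold with precision >= target.

    scored: list of (score, is_correct_merge) pairs
    Returns (threshold, precision, recall) or None if no threshold qualifies.
    Merges with score >= threshold are accepted.
    """
    positives = sum(1 for _, ok in scored if ok)
    tp = fp = 0
    best = None
    ordered = sorted(scored, key=lambda s: s[0], reverse=True)
    # Lower the threshold one distinct score at a time; ties move together.
    for score, group in groupby(ordered, key=lambda s: s[0]):
        for _, ok in group:
            if ok:
                tp += 1
            else:
                fp += 1
        precision = tp / (tp + fp)
        if precision >= target_precision:
            recall = tp / positives if positives else 0.0
            best = (score, precision, recall)  # recall only grows as we go
    return best
```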
Conclusion
» Proposed ways to combine the strengths of folksonomies and ontologies
  » Semi-automated tagging and ...
  » Tag merging to increase folksonomy quality
» Outlined plan for future work
Thank You for Your Attention!
» Questions?
Tag Suggestions - Algorithm
» Rocchio with dimensionality reduction
[Figure: tagged postings → bag of words for each tag → index; query → extract terms / dimensionality reduction → suggested tags.]