Harnessing Folksonomies for Resource Classification
PhD Thesis
Arkaitz Zubiaga, UNED
July 12th, 2011
Advisors: Raquel Martínez Unanue, Víctor Fresno Fernández
Table of Contents
1. Motivation
2. Selection of a Classifier
3. STS & Datasets
4. Representing the Aggregation of Tags
5. Tag Distributions on STS
6. User Behavior on STS
7. Conclusions & Outlook
8. Publications
Motivation: Resource Classification
- Classifying resources is a common task: web pages, books, movies, files, ...
- Large collections of resources → expensive & effortful to classify manually.
  - The Library of Congress (LoC) reported an average cost of $94.58 for cataloging each book in 2002.
- Enormous costs and efforts → automatic classification.
Motivation: Resource Classification
- Representation of resources → self-content.
- Use of self-content of resources presents some issues:
  - Not always representative enough.
  - Not always accessible (e.g., books).
- Social tags provided by users → alternative to solve the problem.
Motivation: Tagging
[Figure] T1, T2, T3 = sets of tags.
Motivation: Social Tagging
- Aggregation of user annotations → folksonomy.
- Folksonomy: Folk (People) + Taxis (Classification) + Nomos (Management).
Motivation: Organization of Resources
- User annotations → own organization of resources.

A user's tags:
Tag         # Resources
research    82
twitter     28
web2.0      35
language    42
english     64
...         ...
Motivation: Example of Bookmarks

#   User    Resource       Tags
1   user1   flickr.com     photo, web2.0, social
2   user2   flickr.com     photography, images
3   user1   google.com     searchengine
4   user3   twitter.com    microblogging, twitter

Bookmark:
(1) user u_i ∈ U who annotates
(2) resource r_j ∈ R being annotated
(3) tags T_ij = {t_1, ..., t_n} ∈ T utilized.

b_ij : u_i × r_j × T_ij
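The bookmark triple defined on this slide can be sketched as a small data structure. This is only an illustration in Python, not code from the thesis: the names `Bookmark`, `BOOKMARKS`, and `tags_by_resource` are hypothetical, and the sample data is the four example bookmarks listed on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """One bookmark b_ij: user u_i annotates resource r_j with tag set T_ij."""
    user: str
    resource: str
    tags: frozenset


# The four example bookmarks from the slide (names are illustrative).
BOOKMARKS = [
    Bookmark("user1", "flickr.com", frozenset({"photo", "web2.0", "social"})),
    Bookmark("user2", "flickr.com", frozenset({"photography", "images"})),
    Bookmark("user1", "google.com", frozenset({"searchengine"})),
    Bookmark("user3", "twitter.com", frozenset({"microblogging", "twitter"})),
]


def tags_by_resource(bookmarks):
    """Collect, per resource, the union of tags assigned by all users."""
    result = defaultdict(set)
    for b in bookmarks:
        result[b.resource] |= b.tags
    return dict(result)


if __name__ == "__main__":
    for resource, tags in sorted(tags_by_resource(BOOKMARKS).items()):
        print(resource, sorted(tags))
```

Keeping the user in every bookmark, rather than storing only resource and tags, is what later makes it possible to count how many distinct users applied a tag.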
Motivation: Sum of Annotations

Top tags (79,681 users):
Tag Rank   Tag           User Count
1          photos        22,712
2          flickr        19,046
3          photography   15,968
4          photo         15,225
5          sharing       10,648
6          images        9,637
7          web2.0        9,528
8          community     4,571
9          social        3,798
10         pictures      3,115
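A ranking like the one on this slide can be obtained by counting, for one resource, how many distinct users applied each tag. The following is a minimal sketch under that assumption. The function name `top_tags` and the sample bookmarks are hypothetical and are not the data behind the slide's figures.

```python
from collections import Counter


def top_tags(bookmarks, resource, k=10):
    """Rank the tags of one resource by the number of distinct users who used them."""
    users_per_tag = {}
    for user, res, tags in bookmarks:
        if res != resource:
            continue
        for tag in tags:
            # A set ensures each user is counted once per tag.
            users_per_tag.setdefault(tag, set()).add(user)
    counts = Counter({tag: len(users) for tag, users in users_per_tag.items()})
    return counts.most_common(k)


# Hypothetical bookmarks as (user, resource, tags) triples.
BOOKMARKS = [
    ("user1", "flickr.com", {"photo", "web2.0", "social"}),
    ("user2", "flickr.com", {"photography", "images", "photo"}),
    ("user3", "flickr.com", {"photo", "images"}),
    ("user1", "google.com", {"searchengine"}),
]

if __name__ == "__main__":
    for rank, (tag, user_count) in enumerate(top_tags(BOOKMARKS, "flickr.com"), start=1):
        print(rank, tag, user_count)
```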
Motivation: Tag-based Resource Classification
[Figure]
Motivation: Problem Statement
How can the annotations provided by users on social tagging systems be exploited to improve the accuracy of a resource classification task?
Motivation: Related Work
Social tags for information management:
- Search: Bao et al. (2007) & Heymann et al. (2008).
- Recommender Systems: Shepitsen et al. (2008) & Li et al. (2008).
- Enhanced Browsing: Smith (2008).
Motivation: Related Work
- Classification: Noll and Meinel (2008) → statistical analysis of matches between tags & taxonomies.
  - Tags are useful for broad categorization.
  - Not for narrower categorization.
- Lack of further research with:
  - Actual classification experiments.
  - Other types of resources.
  - Different representations of social tags.
Selection of a Classifier: Characteristics of the Task
We have:
- Large set of resources: some labeled + many unlabeled.
- Multiclass taxonomy.

Automated classifiers learn a model from labeled resources. This model is used to classify unlabeled resources afterward.

2 learning settings:
- Supervised: only labeled resources considered for learning.
- Semi-supervised: unlabeled resources are also taken into account.
Selection of a Classifier: Support Vector Machines (SVM)
- Hyperplane that separates the classes with the largest margin.
- Use of kernels → maps the resources into a different (typically higher-dimensional) space.
- Resource/hyperplane margin → classifier's reliability.
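The idea of reading the resource/hyperplane margin as a reliability score can be sketched as follows. This is an illustration for the linear case only (no kernel), and it assumes a one-vs-rest multiclass scheme, which the slide does not specify. The hyperplanes are hand-picked, hypothetical values rather than trained models, and the function names are not from the thesis.

```python
import math


def signed_margin(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def classify(hyperplanes, x):
    """One-vs-rest decision: return the class with the largest margin and that margin."""
    margins = {label: signed_margin(w, b, x) for label, (w, b) in hyperplanes.items()}
    best = max(margins, key=margins.get)
    return best, margins[best]


# Hypothetical per-class hyperplanes (weights w, bias b); not learned from data.
HYPERPLANES = {
    "photography": ([3.0, 4.0], -5.0),
    "search": ([4.0, -3.0], 0.0),
}

if __name__ == "__main__":
    label, margin = classify(HYPERPLANES, [3.0, 4.0])
    print(label, margin)
```

A larger margin means the resource lies farther from the decision boundary, so the prediction is treated as more reliable. A margin near zero flags a resource the classifier is unsure about.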