What to Read Next? The Value of Social Metadata for Book Search
Toine Bogers, Aalborg University Copenhagen
Research seminar talk, January 21, 2014
Outline
• Introduction
• Types of book discovery
• Problem statement & talk focus
• Methodology
• Results & analysis
• Discussion & conclusions
Books are not dead (they aren’t even sick!)
• Books remain very popular!
- No. of books sold: 2.57 billion books in the US in 2010 (up by 4.1% from 2008)
- Sales revenue: $13.9 billion in the US in 2010 (up by 5.8% from 2008)
- Sales revenue: up by 11.8% in the US from Q1 2011 to Q1 2012
‣ E-books were the top-selling category for the first time, at the expense of paperback sales
- > 3 million new books published in the US in 2011
• So there is definitely a need for discovering (new) interesting books!
Types of book discovery
• Search (“Show me all books about X”)
[Screenshot: Bibliotek.dk]
Types of book discovery
• Search (“Show me all books about X”)
• Recommendation (“Show me interesting books!”)
[Screenshot: Amazon.com]
Types of book discovery
• Search (“Show me all books about X”)
• Recommendation (“Show me interesting books!”)
- 64% of library patrons are interested in personalized recommendations!
Types of book discovery
• Search (“Show me all books about X”)
• Focused recommendation (“Show me interesting books about X!”)
• Recommendation (“Show me interesting books!”)
[Screenshot: LibraryThing forum topic]
Problem statement & talk focus
• Problem statement
- How can we provide the best possible focused book recommendations?
‣ (So we are not looking at full text!)
• Research questions
1. How can we ensure recommendations are topically relevant? Which book metadata is most instrumental in finding relevant books?
2. How can we ensure recommendations are of high quality? How do we incorporate taste/opinions into the recommendation process?
3. How can we best combine quality and topicality?
Methodology
• Topically relevant recommendations → right up the alley of a text search engine!
• What do we need to evaluate a book search engine?
- Large collection of book records
- Realistic book requests & information needs (= topics)
- Relevance judgments (“Which books are relevant for which topics?”)
‣ Need to alleviate some of the problems of system-based evaluation!
- Realistic evaluation metric
(A sketch of how these ingredients fit together is shown below.)
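To make these ingredients concrete, here is a minimal sketch, in Python, of how a system-based evaluation loop over these four components might look. The function names, dictionary layouts, and the metric hook are my own assumptions for illustration, not the track's actual code.

```python
def evaluate(search_fn, topics, qrels, metric):
    """Mean effectiveness over all topics.

    search_fn: maps a request string to a ranked list of ISBNs (the engine)
    topics:    {topic_id: request text}
    qrels:     {topic_id: {isbn: graded relevance}}
    metric:    e.g. an NDCG@10 function (sketched later in this talk)
    """
    per_topic = []
    for tid, request in topics.items():
        ranking = search_fn(request)                      # ranked ISBNs
        gains = [qrels[tid].get(isbn, 0) for isbn in ranking]
        per_topic.append(metric(gains, list(qrels[tid].values())))
    return sum(per_topic) / len(per_topic)                # mean over topics
```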
Methodology: Collection of book records
• Amazon/LibraryThing collection
- Part of the 2011-2013 INEX Social Book Search track
- 2.8 million book metadata records
‣ Mix of metadata from Amazon and LibraryThing
‣ Controlled metadata from the Library of Congress (LoC) and the British Library (BL)
‣ ISBNs are used as document IDs (similar editions linked to the same work)
‣ Balanced mix of fiction and non-fiction
- Provides a natural test-bed for focused recommendation!
Methodology: Collection of book records
• Different groups of metadata fields:
- Metadata: Title, Publisher, Editorial, Creator, Series, Award, Character, Place
- Content: Blurb, Epigraph, First words, Last words, Quotation
- Controlled metadata (Amazon + LoC + BL): Dewey, Thesaurus, Index terms
- Tags (LibraryThing): Tags
- Reviews (LibraryThing): User reviews
(A sketch of these groups as data is shown below.)
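As a purely illustrative rendering of these groups, the sketch below encodes them as a Python mapping that could be used to build one index per field group for the runs reported later; the lowercase field names are my own, not the collection's actual element names.

```python
# Field groups of the Amazon/LibraryThing records (names are illustrative).
FIELD_GROUPS = {
    "metadata": ["title", "publisher", "editorial", "creator",
                 "series", "award", "character", "place"],
    "content": ["blurb", "epigraph", "first_words", "last_words", "quotation"],
    "controlled_metadata": ["dewey", "thesaurus", "index_terms"],  # Amazon + LoC + BL
    "tags": ["tags"],                 # LibraryThing
    "reviews": ["user_reviews"],      # LibraryThing
}

def project(record, group):
    """Keep only one group's fields, e.g. to build a tags-only index."""
    return {f: record[f] for f in FIELD_GROUPS[group] if f in record}
```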
Methodology: Topics & relevance judgments
• Realistic book requests & information needs
- Focused book recommendations can touch upon many different aspects
‣ Users search for topics, genres, authors, plots, etc.
‣ Users want books that are engaging, funny, well-written, educational, etc.
‣ Users have different preferences, knowledge, reading level, etc.
- LibraryThing fora contain many such focused requests!
[Screenshot: annotated LT topic, showing the topic title, group name, and narrative]
Methodology: Topics & relevance judgments
• Collected 211 different topics from the LibraryThing fora, annotated with:
‣ Type (fiction vs. non-fiction)
‣ Subject (same author, subject, series, genre, known item, edition)
(A hypothetical rendering of one annotated topic is shown below.)
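To make the topic format concrete, here is a hypothetical rendering of one annotated topic as a Python record; every field name and value below is invented for illustration and not taken from the actual collection.

```python
# One annotated LT topic (all values are made up for illustration).
topic = {
    "topic_id": "12345",                      # hypothetical thread ID
    "title": "Novels set in medieval Japan",  # topic title from the forum post
    "group": "Historical Fiction",            # LT group the topic was posted in
    "narrative": "I loved Shogun and would like more like it ...",
    "type": "fiction",                        # annotation: fiction / non-fiction
    "subject": "subject",                     # annotation: author / subject /
                                              # series / genre / known item / edition
}
```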
Methodology: Topics
[Pie charts: the topics split nearly evenly by type — fiction 48%, non-fiction 52%; by subject, author (43%) and subject (46%) requests dominate, followed by series (3%) and genre, known-item, edition, and other (2% each)]
Methodology: Relevance judgments
• Problem: relevance often judged by students or retired CIA analysts
• Solution: take recommendations from LT members
[Screenshot: annotated LT topic, showing the topic title, group name, narrative, and the books recommended in the thread]
Methodology: Relevance judgments
• Problem: relevance often judged by students or retired CIA analysts
• Solution: take recommendations from LT members
- Provided by people interested in the topic,
- Free of charge,
- Judged both on topical relevance and quality!
• Graded relevance scoring
- Relevance score of 1 if suggested by other LT members
[Screenshot: catalog additions — forum suggestions that the topic creator added to their catalog after the topic was posted]
Methodology: Relevance judgments
• Graded relevance scoring
- Relevance score of 1 if suggested by other LT members
- Relevance score of 4 if added by the topic creator after posting the request
(A minimal sketch of this scoring as code is shown below.)
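A minimal sketch (my own code, not the authors') of how these two signals could be combined into graded judgments, with the topic creator's catalog additions overriding plain forum suggestions:

```python
def build_qrels(forum_suggestions, catalog_additions):
    """Derive graded relevance judgments per topic.

    forum_suggestions: {topic_id: set of ISBNs suggested by other LT members}
    catalog_additions: {topic_id: set of ISBNs the topic creator added
                        to their catalog after posting the request}
    """
    qrels = {}
    for tid, isbns in forum_suggestions.items():
        qrels[tid] = {isbn: 1 for isbn in isbns}      # suggested: gain 1
    for tid, isbns in catalog_additions.items():
        qrels.setdefault(tid, {})
        for isbn in isbns:
            qrels[tid][isbn] = 4                      # catalog addition: gain 4
    return qrels
```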
Methodology: Evaluation
• Main metric: Normalized Discounted Cumulated Gain (NDCG)
- Measures the usefulness (gain) of a book in the ranked results list
‣ Scores range between 0.0 and 1.0: the system output's DCG is compared to that of the ideal ranking (the closer, the better!)
- Book ranking matters (as opposed to regular Precision)
‣ Relevant books before non-relevant books
- Takes graded relevance judgments into account
‣ Highly relevant books before slightly relevant books, etc.
- Evaluated as NDCG@10 (only over the first 10 results)
(An illustrative implementation is shown below.)
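For concreteness, here is a small Python implementation of NDCG@k using the common rel / log2(rank + 1) gain formulation; whether the track used this exact gain/discount variant is an assumption on my part.

```python
import math

def dcg(gains):
    # Discount each graded gain by log2 of its 1-based rank position.
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_gains, all_gains, k=10):
    # Normalize the system's DCG@k by the DCG@k of the ideal ranking,
    # i.e. all judged gains sorted in decreasing order.
    ideal_dcg = dcg(sorted(all_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Gains of a system's top 10 for one topic: 4 = catalog addition by the
# topic creator, 1 = suggestion by another LT member, 0 = not judged.
system = [4, 0, 1, 1, 0, 0, 1, 0, 0, 0]
judged = [4, 4, 1, 1, 1, 1, 1]   # all judged gains for this topic
print(round(ndcg_at_k(system, judged), 4))  # ≈ 0.6171
```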
Results

Set of metadata fields             NDCG@10
Metadata                            0.2015
Content                             0.0115
Controlled metadata                 0.0496
Controlled metadata (+LoC, +BL)     0.0691
Tags                                0.2056
Reviews                             0.2832
All fields                          0.3058
All fields (+LoC, +BL)              0.3029
Results: Does controlled metadata help?
• (Same results table as above, highlighting the controlled metadata rows)
• Controlled metadata alone scores low: 0.0496, rising to only 0.0691 with LoC and BL metadata added
• Adding LoC and BL metadata to all fields does not help either: 0.3058 → 0.3029
Results: Tags vs. controlled metadata
• (Same results table as above, highlighting the tags and controlled metadata rows)
• Tags (0.2056) clearly outperform controlled metadata (0.0496; 0.0691 with LoC and BL)
Results: Fiction vs. non-fiction

Metadata fields          Fiction    Non-fiction
Metadata                 0.2297     0.1798
Controlled metadata      0.0998     0.0461
Tags                     0.1804     0.1576
Reviews                  0.2975     0.2671
All fields               0.3228     0.2806

Note: ‘Content’ left out; ‘Controlled metadata’ and ‘All fields’ are with LoC and BL metadata
Results: Author vs. subject

Metadata fields          Author     Subject
Metadata                 0.2600     0.1795
Controlled metadata      0.1628     0.0529
Tags                     0.1738     0.1629
Reviews                  0.4170     0.2499
All fields               0.4095     0.2697

Note: ‘Content’ left out; ‘Controlled metadata’ and ‘All fields’ are with LoC and BL metadata