Enriching the Web by Modeling Reading Difficulty
Kevyn Collins-Thompson, Associate Professor, University of Michigan
ESAIR 2013: Exploiting Semantic Annotations in Information Retrieval, October 28, 2013
Acknowledgements
Joint work with my collaborators: Paul Bennett, Ryen White, Sue Dumais (MSR); Jin Young Kim (Microsoft); Sebastian de la Chica (Microsoft); Paul Kidwell (LLNL); Guy Lebanon (Amazon); David Sontag (NYU)
Enriching the Web with Readability Metadata
Bringing together readability and the Web … sometimes in unexpected ways
[Figure: Text readability modeling and prediction. Cues taken from a sample page: vocabulary, topic interest, syntax, coherence, visual cues. Outputs: reading level prediction, topic prediction.]
Bringing together readability and the Web … sometimes in unexpected ways
[Figure: The Web and search engines linked through the readability of content; cues: vocabulary, topic interest, syntax, coherence, visual.]
How modeling reading difficulty enriches the Web: Adding reading level metadata to pages leads to novel applications and unexpected insights
[Figure: Reading level metadata connecting Web pages, queries, Web sites, and a user model. Applications shown: estimating query topic difficulty; educational augmentation; resolving ambiguity by reading level; predicting site expertise; user ability and expertise; personalized search by reading difficulty; page global reading level; snippet reading level prediction; user annotation; computing better snippets (page-snippet match); trustworthiness?; in-page variation; personalized difficulty measures; assessing user motivation.]
Web pages occur at a wide range of reading difficulty levels Query [insect diet]: Lower difficulty
Medium difficulty [insect diet]
Higher difficulty [insect diet]
Users also exhibit a wide range of proficiency and expertise
• Students at different grade levels
• Non-native speakers
• General population
  – Large variation in language proficiency
  – Special needs, language deficits
  – Familiarity or expertise in specific topic areas
• Even for a single user, intent can vary broadly across search queries
Default results for [insect diet]
Relevance as seen by an elementary school student (e.g. age 10)
[Figure: default results for [insect diet], each problem result marked X and labeled "Technical" or "Relevance"]
Blending in lower-difficulty results would improve relevance for this user
[Figure: blended result list with "Technical" and "Relevance" annotations]
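One simple way to "blend in" lower-difficulty results can be sketched as a re-ranker, assuming each result already carries a relevance score from a base ranker and a predicted reading level, and that the user's level is known. The `Result` type, `rerank_by_reading_level`, the linear distance penalty, and the weight `alpha` are all hypothetical choices for illustration, not the method presented in the talk.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str              # hypothetical identifier for the result
    relevance: float      # base ranker score, higher is better (assumed given)
    reading_level: float  # predicted grade level of the page (assumed given)


def rerank_by_reading_level(results: list[Result], user_level: float,
                            alpha: float = 0.1) -> list[Result]:
    """Hypothetical re-ranker: subtract a penalty proportional to the gap
    between a page's predicted reading level and the user's level."""
    def score(r: Result) -> float:
        return r.relevance - alpha * abs(r.reading_level - user_level)

    # sorted() is stable, so equal scores keep their original order.
    return sorted(results, key=score, reverse=True)
```

With `alpha=0` the original ranking is returned unchanged; raising `alpha` lets an easier page with slightly lower base relevance move above a highly relevant but very technical one for a user at, say, grade 5.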
Reading difficulty has many factors
• Factors include:
  – Semantics, e.g. vocabulary
  – Syntax, e.g. sentence structure, complexity
  – Discourse-level structure
  – Reader background and interest in topic
  – Text legibility
  – Supporting illustrations and layout
• Reading difficulty is distinct from parental-control and UI issues
Traditional readability measures don’t work for Web content
• Flesch-Kincaid (Microsoft Word):
  $RG_{FK} = 0.39\left(\frac{\text{words}}{\text{sentence}}\right) + 11.8\left(\frac{\text{syllables}}{\text{word}}\right) - 15.59$
• Problems include:
  – They assume the content has well-formed sentences
  – They are sensitive to noise
  – Input must be at least 100 words long
• Web content is often short, noisy, less structured
  – Page body, titles, snippets, queries, captions, …
• Billions of pages → computational constraints on metadata types
• We focus on vocabulary-based prediction models that learn fine-grained models of word usage from labeled texts
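The contrast between the two approaches can be sketched as follows. `flesch_kincaid_grade` applies the Flesch-Kincaid formula above, but its sentence splitter and vowel-group syllable counter are rough heuristics assumed here. `UnigramLevelModel` is a hypothetical, simplified stand-in for a vocabulary-based predictor: one add-one-smoothed unigram language model per reading level, trained on labeled texts, that picks the level whose word-usage model gives the text the highest likelihood. It is not claimed to be the exact model used in this work.

```python
import math
import re
from collections import Counter

# --- Flesch-Kincaid grade level (traditional formula) ---

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups; not a dictionary-based syllabifier."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # A silent final 'e' ("make") usually adds no syllable; "-le" ("table") does.
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(1, n)


def flesch_kincaid_grade(text: str) -> float:
    """RG_FK = 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    # Naive split on terminal punctuation: the well-formed-sentence assumption.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# --- Vocabulary-based alternative (hypothetical sketch) ---

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class UnigramLevelModel:
    """One add-one-smoothed unigram language model per reading level (assumed design)."""

    def __init__(self, labeled_texts):
        # labeled_texts: iterable of (level, text) pairs, e.g. graded passages.
        self.counts: dict = {}
        vocab = set()
        for level, text in labeled_texts:
            tokens = tokenize(text)
            self.counts.setdefault(level, Counter()).update(tokens)
            vocab.update(tokens)
        # +1 reserves probability mass for a single "unseen word" bucket.
        self.vocab_size = len(vocab) + 1
        self.totals = {lvl: sum(c.values()) for lvl, c in self.counts.items()}

    def log_likelihood(self, text: str, level) -> float:
        counts, total = self.counts[level], self.totals[level]
        denom = total + self.vocab_size
        return sum(math.log((counts[w] + 1) / denom) for w in tokenize(text))

    def predict(self, text: str):
        """Return the level whose word-usage model best explains the text."""
        return max(self.counts, key=lambda lvl: self.log_likelihood(text, lvl))
```

The vocabulary model can score a two-word query such as [insect diet] directly. The Flesch-Kincaid function instead treats it as a single unpunctuated "sentence", and its syllable heuristic miscounts words like "diet", which illustrates the noise and structure problems listed above.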