
Exploring Measures of Readability for Spoken Language - PowerPoint PPT Presentation



  1. Exploring Measures of "Readability" for Spoken Language
     Sowmya Vajjala and Detmar Meurers
     University of Tübingen, Germany
     The 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations @ EACL, Gothenburg, April 27, 2014

     Introduction: Analyzing linguistic features of subtitles to identify age-specific TV programs

     Outline: Introduction · Our Approach (The Corpus, Features, Tools and Resources) · Experiments (Setup and General Results, Feature Selection, Ablation Test Results, Confusion Matrix, Effect of Text Size) · Conclusions

  2. The talk in a nutshell
     ◮ Idea: investigate whether features from readability assessment can be used to characterize age-specific TV programs,
       ◮ based on a corpus of BBC subtitles,
       ◮ using a text classification approach.
     ◮ We show that authentic materials targeting specific age groups exhibit
       ◮ a broad range of linguistic and psycholinguistic characteristics
       ◮ indicative of the complexity of the language used.
     ◮ Our approach reaches an accuracy of 95.9%.

  3. Motivation
     ◮ Reading, listening, and watching TV are all ways to obtain information.
     ◮ Some TV programs are also created for particular age groups (similar to graded readers).
     ◮ Audio-visual presentation and language are important factors in creating age-specific TV programs.
     ◮ How characteristic of the targeted age group is the language by itself?
     ◮ We hypothesize that the linguistic complexity of the subtitles is a good predictor.
     ◮ We explore this hypothesis using features from automatic readability assessment.

  4. Our Approach: Overview
     ◮ Corpus: BBC subtitles (Van Heuven et al. 2014), TV programs targeting different age groups
     ◮ Features: a range of properties, mostly from Second Language Acquisition and psycholinguistic research
     ◮ Modeling: three-class text classification
     ◮ Evaluation: accuracy, with 10-fold cross-validation

  5. The BBC Subtitles Corpus
     ◮ The BBC started subtitling all scheduled programs on its main channels in 2008.
     ◮ Van Heuven et al. (2014) compiled a subtitle corpus from nine BBC TV channels.
     ◮ Subtitles of four channels are annotated: CBeebies, CBBC, News and Parliament.
     ◮ Corpus in numbers:

       Program Category            Age group    # texts   avg. tokens per text   avg. sentence length (in words)
       CBeebies                    < 6 years    4846      1144                   4.9
       CBBC                        6–12 years   4840      2710                   6.7
       Adults (News + Parliament)  > 12 years   3776      4182                   12.9

     ◮ We use a balanced set consisting of 3776 texts per class.
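The balancing step above (downsampling CBeebies and CBBC to match the 3776 texts of the smallest class) can be sketched as follows; only the class labels and counts come from the slide, the rest is illustrative:

```python
import random

def balance_classes(texts_by_class, seed=0):
    """Downsample every class to the size of the smallest one."""
    rng = random.Random(seed)
    n_min = min(len(texts) for texts in texts_by_class.values())
    return {label: rng.sample(texts, n_min)
            for label, texts in texts_by_class.items()}

# Placeholder strings standing in for the real subtitle files.
corpus = {"CBeebies": ["text"] * 4846,
          "CBBC": ["text"] * 4840,
          "Adults": ["text"] * 3776}
balanced = balance_classes(corpus)  # 3776 texts per class
```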

  6. Features 1
     ◮ Lexical Features
       ◮ lexical richness features from Second Language Acquisition (SLA) research
         e.g., type-token ratio, noun variation, . . .
       ◮ POS density features
         e.g., # nouns/# words, # adverbs/# words, . . .
       ◮ traditional features and formulae
         e.g., # characters per word, Flesch-Kincaid score, . . .
     ◮ Syntactic Features
       ◮ syntactic complexity features from SLA research
         e.g., # dependent clauses per clause, average clause length, . . .
       ◮ other parse tree features
         e.g., # NPs per sentence, avg. parse tree height, . . .
     = Features from Vajjala & Meurers (2012)
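A few of the traditional lexical measures named above (type-token ratio, average sentence length, Flesch-Kincaid grade) are easy to sketch. This is not the authors' implementation, and the vowel-group syllable counter is only a rough stand-in for a real syllabifier:

```python
import re

def syllable_count(word):
    # Crude heuristic: count vowel groups; real syllabification is harder.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def lexical_features(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    ttr = len({w.lower() for w in words}) / len(words)   # type-token ratio
    asl = len(words) / len(sentences)                    # avg. sentence length
    spw = sum(syllable_count(w) for w in words) / len(words)
    # Flesch-Kincaid grade level formula
    fk_grade = 0.39 * asl + 11.8 * spw - 15.59
    return {"ttr": ttr, "avg_sent_len": asl, "fk_grade": fk_grade}
```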

  7. Features 2
     ◮ Morphological properties of words
       ◮ e.g., does the word contain a stem along with an affix? abundant = abound + -ant
     ◮ Age of Acquisition (AoA)
       ◮ average age of acquisition of the words in a text
     ◮ Other psycholinguistic features
       ◮ e.g., word abstractness
       ◮ avg. number of senses per word (obtained from WordNet)
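The AoA feature reduces to a dictionary lookup averaged over the text. The ratings below are invented placeholders, not values from Kuperman et al. (2012), whose resource covers tens of thousands of English words:

```python
def avg_aoa(words, aoa_ratings):
    """Mean age-of-acquisition over the words that have a rating."""
    rated = [aoa_ratings[w] for w in words if w in aoa_ratings]
    return sum(rated) / len(rated) if rated else None

# Invented example ratings, for illustration only.
toy_ratings = {"cat": 3.5, "dog": 3.2, "parliament": 10.9}
```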

  8. Implementation Details: Tools, Resources and Algorithms Used
     ◮ Tools:
       ◮ for lexical features: Stanford Tagger (Toutanova et al. 2003)
       ◮ for syntactic features: Berkeley Parser (Petrov & Klein 2007) and the Tregex Pattern Matcher (Levy & Andrew 2006)
       ◮ for classification: algorithms implemented in WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
     ◮ Resources:
       ◮ CELEX Lexical Database (http://celex.mpi.nl)
       ◮ Kuperman et al. (2012)'s AoA ratings
       ◮ MRC Psycholinguistic Database (http://ota.oucs.ox.ac.uk/headers/1054.xml)
       ◮ WordNet (http://wordnet.princeton.edu)

  9. Classification Experiments
     ◮ We explored several classification algorithms (SMO, J48 decision tree, logistic regression).
     ◮ SMO marginally outperformed the others (by 1–1.5%), so all further experiments were performed with SMO.
     ◮ Random baseline: 33%
     ◮ Sentence-length baseline: 71.4%
     ◮ Accuracy using the full set of 152 features: 95.9%
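This setup (SMO is WEKA's SVM trainer) can be approximated in scikit-learn with a linear SVM evaluated by 10-fold cross-validation. The data here is synthetic, so only the pipeline, not the accuracy numbers, mirrors the talk:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Three well-separated synthetic "age groups", two features each.
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 50)

scores = cross_val_score(LinearSVC(), X, y, cv=10)  # 10-fold CV accuracy
```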

  10. Feature Selection
      We explored two feature selection approaches to understand which features contribute the most to classification accuracy:
      1. Select features individually based on Information Gain (IG)
         ◮ implemented as InfoGainAttributeEval in WEKA
      2. Select a subset of features that do not correlate with each other but are highly predictive
         ◮ implemented as CfsSubsetEval (Hall 1999) in WEKA
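The first criterion is the mutual information between a (discretized) feature and the class label. A minimal information-gain computation, not WEKA's code, looks like this:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """IG(class; feature) = H(class) - H(class | feature), discrete feature."""
    by_value = {}
    for v, label in zip(feature_values, labels):
        by_value.setdefault(v, []).append(label)
    n = len(labels)
    conditional = sum((len(group) / n) * entropy(group)
                      for group in by_value.values())
    return entropy(labels) - conditional
```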

  11. Results for Top 10 IG Features

      Rank  Feature                                              Accuracy
      1     avg. AoA (Kuperman et al. 2012)                      82.4%
      2     avg. # PPs in a sentence                             74.0%
      3     avg. # instances where the lemma has stem and affix  77.7%
      4     avg. parse tree height                               73.4%
      5     avg. # NPs in a sentence                             73.0%
      6     avg. # instances of affix substitution               74.3%
      7     avg. # prepositions in a sentence                    72.0%
      8     avg. # instances where a lemma is not a count noun   68.3%
      9     avg. # clauses per sentence                          72.5%
      10    sentence length                                      71.4%

      Accuracy with all 10 features together: 84.5%

  12. Feature Selection with CfsSubsetEval
      ◮ 41 of the total 152 features are selected.
      ◮ The full list of selected features is provided in the paper.
      ◮ Classification accuracy with the 41 features: 93.9%
        → only 2% less than classification with all the features
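CFS scores a candidate subset with Hall (1999)'s merit heuristic, rewarding features that correlate with the class but not with each other. A sketch of the merit formula, assuming the average correlations are already computed:

```python
import math

def cfs_merit(k, avg_feature_class_corr, avg_feature_feature_corr):
    """Merit of a k-feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    return (k * avg_feature_class_corr) / math.sqrt(
        k + k * (k - 1) * avg_feature_feature_corr)
```

A search over candidate subsets (typically a best-first search in WEKA) keeps the subset with the highest merit.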

  13. Feature Selection: Result Summary

      Feature Subset (#)          Accuracy  SD on 10-fold CV
      All features (152)          95.9%     0.37
      CFS on all features (41)    93.9%     0.59
      Top-10 IG features (10)     84.5%     0.70

      The avg. SD over the test sets in all CV folds is given to make comparisons in terms of statistical significance possible.
