  1. Classifying the Terms of Service Capstone Presentation | Sam Beardsworth

  2. Goal Build a model to make Terms of Service easier to read How? • Identify the content • Extract the meaning • Highlight important terms

  3. Approach No shortage of data: it's literally on every website But how to make sense of it? Answer: Use a pre-classified dataset (courtesy of ToS;DR)

  4. ToS;DR • started in June 2012 • aims to review and score Terms of Service policies of major web services • users can look up terms through website / browser extension • public, transparent, community-driven • volunteer project

  5. Data Gathering API: broken, but the same info was obtainable via public repos Additional challenges - ToS;DR has had 3 incarnations - the API only has good data for incarnation #2 - scraped all 3 and merged by ID Some manual cleaning was needed
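The merge-by-ID step could be sketched with pandas; the frames, column names, and quote strings below are invented stand-ins for the three scraped incarnations, not the repos' real schema:

```python
import pandas as pd

# Hypothetical frames standing in for the three scraped ToS;DR incarnations;
# real column names and contents in the public repos will differ.
v1 = pd.DataFrame({"id": [1720, 1311], "quote": ["We use cookies...", None]})
v2 = pd.DataFrame({"id": [1311], "quote": ["Except as set forth..."]})
v3 = pd.DataFrame({"id": [2261], "quote": ["When you delete your account..."]})

# Stack the incarnations in order, then keep the last non-null value seen
# for each ID, so later scrapes fill gaps left by earlier ones.
merged = pd.concat([v1, v2, v3]).groupby("id", as_index=False).last()
print(merged)
```

`GroupBy.last()` skips nulls per column, which is what makes the later incarnations fill the earlier gaps.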

  6. Dataset • 1688 observations (extracts) • mean length: 65 words • max length: 1410 words! • 107,340 words total / 6,469 unique • 17 columns - discarded 9 as purely administrative

  7. Dataset
  ID   | Status   | Service  | Source        | Quote                                          | Topic          | Case                               | Point
  1720 | pending  | facebook | Cookie Policy | 'We use cookies to help us show ads...'        | Tracking       | Personal data used for advertising | bad
  1311 | approved | nokia    | T&C           | 'Except as set forth in the Privacy Policy...' | Content        | Service retains deleted content    | bad
  2261 | approved | whatsapp | NA            | 'When you delete your WhatsApp account...'     | Right to leave | Data deleted after account closure | good
  unique values: Service 179 | Topic 22 | Case 143 | Point 4

  8. Dataset: Filling the gaps

  9. EDA

  10. Lemmatization
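Lemmatization reduces inflected word forms to a shared base ("cookies" → "cookie") so counts aggregate across variants. A dictionary-based sketch; a real pipeline would use NLTK's WordNetLemmatizer or spaCy, and the lemma map here is purely illustrative:

```python
# Illustrative lemma map, not a real lexicon.
LEMMAS = {"cookies": "cookie", "terms": "term", "deleted": "delete",
          "retains": "retain", "services": "service"}

def lemmatize(tokens):
    """Lowercase each token and map it to its base form if known."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]

print(lemmatize("We use cookies across our Services".split()))
```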

  11. Topic Exploration • 22 topics, imbalanced • dropped topics with <25 observations • remember to balance classes during classification

  12. Modelling 19 topics Baseline accuracy: 0.117 70-30 train-test split, stratified by topic Basic, untuned logistic regression Test accuracy: 0.615
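The baseline-plus-logistic-regression step can be sketched with scikit-learn. The corpus, labels, and seed below are toy stand-ins (the real data is the ToS;DR extracts); only the 70-30 stratified split mirrors the slide:

```python
from collections import Counter

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus with three topics; invented for illustration.
texts = ["we use cookies to show ads", "cookies help us advertise",
         "you may delete your account", "accounts are removed on request",
         "we retain your deleted content", "your content stays on our servers"] * 5
topics = ["Tracking", "Tracking", "Right to leave", "Right to leave",
          "Content", "Content"] * 5

# Baseline: always predict the most common topic.
baseline = max(Counter(topics).values()) / len(topics)

# 70-30 train-test split, stratified by topic, as on the slide.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, topics, test_size=0.3, stratify=topics, random_state=42)

# Basic, untuned logistic regression over raw word counts.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print(f"baseline {baseline:.3f}  test accuracy {model.score(X_te, y_te):.3f}")
```

On the toy data the model scores far above baseline; the slide's 0.117 / 0.615 figures come from the real 19-topic dataset.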

  13. Improving the score • TF-IDF to reduce feature importance of common words • imblearn's RandomOverSampler to reduce class imbalance in the training set • GridSearchCV for optimal Logistic Regression hyperparameters Improved test accuracy: 0.641
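The tuning step might look like the sketch below. The slide pairs TF-IDF with imblearn's RandomOverSampler; to keep this sketch sklearn-only, `class_weight="balanced"` stands in for oversampling, and the grid values are illustrative rather than the project's actual search space:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-in corpus; invented for illustration.
texts = ["we use cookies to show ads", "cookies help us advertise",
         "you may delete your account", "accounts are removed on request",
         "we retain your deleted content", "your content stays on our servers"] * 5
topics = ["Tracking", "Tracking", "Right to leave", "Right to leave",
          "Content", "Content"] * 5

# TF-IDF downweights common words; class_weight="balanced" stands in
# here for imblearn's RandomOverSampler on the training set.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

# Illustrative hyperparameter grid, searched with cross-validation.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(texts, topics)
print(grid.best_params_, round(grid.best_score_, 3))
```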

  14. Beyond Logistic Regression the sklearn 'try everything' approach... ...optimised with GridSearch
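The "try everything" loop might look like this; the candidate list and toy corpus are assumptions, not the project's actual line-up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus; invented for illustration.
texts = ["we use cookies to show ads", "cookies help us advertise",
         "you may delete your account", "accounts are removed on request",
         "we retain your deleted content", "your content stays on our servers"] * 5
topics = ["Tracking", "Tracking", "Right to leave", "Right to leave",
          "Content", "Content"] * 5

# An assumed candidate set; each would then get its own GridSearch.
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    "knn": KNeighborsClassifier(n_neighbors=3),
    "linear_svc": LinearSVC(),
}

scores = {}
for name, clf in candidates.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores[name] = cross_val_score(pipe, texts, topics, cv=3).mean()
    print(f"{name}: {scores[name]:.3f}")
```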

  15. Model Comparison

  16. Alternative Models word2vec - 3.5 GB dictionary pre-trained on news articles - applied to the pre-lemmatized tokens (corpus) - performed differently, but more poorly overall Accuracy score: 0.613 Principal Component Analysis / SVD - explanatory value relatively low - 19% across PC1-2, 37% across PC1-10
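Applying a pre-trained word2vec model to a token corpus typically means averaging the per-token vectors into one document vector, which then feeds a classifier. A minimal sketch with made-up 4-d embeddings (the real news-pretrained vectors are 300-d and would be loaded via gensim):

```python
import numpy as np

# Toy 4-d "embeddings" standing in for the ~3.5 GB pre-trained word2vec
# dictionary; keys and values are invented for illustration.
vecs = {
    "cookie":  np.array([1.0, 0.0, 0.0, 0.0]),
    "ad":      np.array([0.8, 0.2, 0.0, 0.0]),
    "delete":  np.array([0.0, 0.0, 1.0, 0.0]),
    "account": np.array([0.0, 0.0, 0.8, 0.2]),
}

def doc_vector(tokens, dim=4):
    """Average the embeddings of in-vocabulary tokens; zeros if none hit."""
    hits = [vecs[t] for t in tokens if t in vecs]
    return np.mean(hits, axis=0) if hits else np.zeros(dim)

print(doc_vector(["delete", "account"]))  # -> [0.  0.  0.9 0.1]
```

Out-of-vocabulary tokens are simply dropped, which is one reason this representation can underperform TF-IDF on domain-specific legal wording.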

  17. Alternative Models Latent Dirichlet Allocation (LDA) "a technique to extract the hidden topics from large volumes of text... The challenge is how to extract good quality of topics that are clear, segregated and meaningful " Some themes: - Consistently identified 'virtual currency' as a topic - Change and modification - Damage and waiver

  18. LDA Heatmap comparing unsupervised sorting into 19 topics versus human-classified topics

  19. Where from here?

  20. Quiz You agree to provide Grammarly with accurate and complete registration information and to promptly notify Grammarly in the event of any changes to any such information. Anonymity & Tracking Personal Data ??? ???

  21. Quiz You agree to provide Grammarly with accurate and complete registration information and to promptly notify Grammarly in the event of any changes to any such information. Anonymity & Tracking Personal Data Human Model

  22. Quiz Nothing here should be considered legal advice. We express our opinion with no guarantee and we do not endorse any service in any way. Please refer to a qualified attorney for legal advice. Governance Guarantee ??? ???

  23. Quiz Nothing here should be considered legal advice. We express our opinion with no guarantee and we do not endorse any service in any way. Please refer to a qualified attorney for legal advice. Governance Guarantee Model Human

  24. Quiz For revisions to this Privacy Policy that may be materially less restrictive on our use or disclosure of personal information you have provided to us, we will make reasonable efforts to notify you and obtain your consent before implementing revisions with respect to such information. Personal Data Changes to Terms ??? ???

  25. Quiz For revisions to this Privacy Policy that may be materially less restrictive on our use or disclosure of personal information you have provided to us, we will make reasonable efforts to notify you and obtain your consent before implementing revisions with respect to such information. Personal Data Changes to Terms Model Human

  26. Practical Application Unfavourable Terms or: classifying into good and bad

  27. Extract Review

  28. Model Performance Same approach as before Best performer: - K-Nearest Neighbours: 0.71 What if we focus solely on unfavourable terms?

  29. Predicting Unfavourable Terms • Do people really care about good or neutral statements? • Real value is in being able to highlight potential unfavourable terms Reclassify: - Good + Neutral = Neutral - Bad = Warning

  30. Binary Classification Improved performance Best performers: - K-Nearest Neighbours: 0.75 - LinearSVC: 0.76 Additional benefit: the model can be tuned to correctly predict more warning statements at the expense of more 'false' warnings.
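The tuning trade-off the slide describes can be sketched by moving the decision threshold: lowering it below 0.5 flags more warnings (higher recall) at the cost of more false alarms. The feature and labels below are synthetic, purely to show the mechanism:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)

# Synthetic 1-d feature: larger values loosely indicate a warning clause.
X = rng.normal(0.0, 1.0, size=(200, 1))
y = (X[:, 0] + rng.normal(0.0, 1.0, size=200) > 0.5).astype(int)  # 1 = warning

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]

# Compare warning recall at the default threshold and a lowered one.
recalls = {}
for threshold in (0.5, 0.3):
    pred = (proba >= threshold).astype(int)
    recalls[threshold] = recall_score(y, pred)
    print(f"threshold {threshold}: warning recall {recalls[threshold]:.2f}")
```

Because every clause flagged at 0.5 is also flagged at 0.3, warning recall can only stay equal or rise as the threshold drops; precision pays the price.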

  31. Evaluation / Next Steps Three areas for next steps: 1. Build a proof of concept for an end-user classification tool 2. Improve the model 3. Bring in subject-matter expertise

  32. Questions?
