GLAD: Groningen Lightweight Authorship Detection

GLAD: Groningen Lightweight Authorship Detection. PAN, Authorship verification, 2015. Manuela Hürlimann, Benno Weck, Esther van den Berg, Simon Šuster, Malvina Nissim.


  1. GLAD: Groningen Lightweight Authorship Detection. PAN, Authorship verification, 2015. Manuela Hürlimann, Benno Weck, Esther van den Berg, Simon Šuster, Malvina Nissim

  2. The challenge • given: a set of Known documents written by the same Author A_K • given: one Unknown document written by an unknown Author A_U • task: determine whether A_U = A_K

  3. How can we recognise different authors?

  4. How can we recognise different authors? Unusual word choice? Shorter sentences? More complex grammar?

  5. How can we recognise different authors? individual_vector(feat1, feat2…) individual_vector(feat1, feat2…) individual_vector(feat1, feat2…)

  6. How can we then differentiate between authors?

  7. How can we then differentiate between authors? Different word choice? Different sentence length? Different grammar?

  8. How can we then differentiate between authors? similarity_vector(feat1, feat2, …)

  9. Our approach • machine learning approach trained on PAN (2015) data • an SVM performs a two-class classification task • a set of features • feature ablation studies tune the system to each language

  10. The core aim - A lightweight system!

  11. The aim • Input in any language (training instances)

  12. The aim • Input in any language (training instances) • Features should be easy to extract (model)

  13. The aim • Input in any language (training instances) • Features should be easy to extract (model) • Training & testing time should be fast (prediction)

  14. Our features

  15. Our features similarity_vector(entropy_of_known, visual_features, …)

  16. Our features To determine relevance: grouping

  17. Our features: Individual Vector_K(feat1, feat2) - Individual Vector_U(feat1, feat2) = Joint Vector_Joint(feat1, feat2)

  18. Comparing features

  19. Comparing features Results of ablation & single-feature experiments: Helpful features

  20. Side note: Visual features • Punctuation • Line ending • Letter case • Line length • Block size

  21. Side note: Visual features • Punctuation • Line ending • Letter case • Line length • Block size. Con: • Not a characteristic of the author • Not a linguistic feature

  22. Side note: Visual features • Punctuation • Line ending • Letter case • Line length • Block size. Pro: • Can be author-specific for some genres • If it works… Con: • Not a characteristic of the author • Not a linguistic feature. “Pa-pa, pa-pa, pa-pa! / Here, stop her. She’ll fall down. / Here, turn around. Walk this way. / Ma-ma, ma-ma, ma-ma; / Oh, I think you are a darling. / Mer-ry Christ-mas! Mer-ry Christmas.”

  23. Comparing features Results of ablation & single-feature experiments: Harmful features

  24. Comparing features Results of ablation & single-feature experiments: Features that are harmful, helpful, or helpful-depending-on-the-language

  25. Comparing features Results of ablation & single-feature experiments: Features that are harmful, helpful, or helpful-depending-on-the-language

  26. Comparing features Results of ablation & single-feature experiments: Differences are subtle

  27. Comparing features Results of ablation & single-feature experiments: Differences are subtle

  28. Resulting groups

  29. Results

  30. Results • Simple similarity features work

  31. Results • Simple similarity features work in unison

  32. Results • Simple similarity features work in unison independent of language (except Greek)

  33. Results • Simple similarity features work in unison independent of language (except Greek) • System works fast (average runtime: 1 minute)

  34. Final conclusion: GLAD … is a light and fast language-independent system … allows language adaptation via feature selection … involves innovative visual features which appear useful (especially for English data) and could be investigated further
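
Slides 5, 8, 15 and 17 describe the core idea: compute one feature vector for the known documents and one for the unknown document, then combine them into a joint vector. A minimal Python sketch of that idea follows. The two features (character entropy, mean word length), the averaging over known documents, and the absolute element-wise difference are assumptions made for illustration; the slides do not specify them, and all function names are hypothetical.

```python
import math
from collections import Counter


def char_entropy(text):
    """Shannon entropy (in bits) of the character distribution of `text`."""
    if not text:
        return 0.0
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())


def mean_word_length(text):
    """Mean length of whitespace-separated tokens; 0.0 for empty text."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def individual_vector(text):
    """Hypothetical per-document vector: [entropy, mean word length]."""
    return [char_entropy(text), mean_word_length(text)]


def joint_vector(known_docs, unknown_doc):
    """Assumed combination: |mean(known vectors) - unknown vector|, element-wise."""
    known = [individual_vector(doc) for doc in known_docs]
    mean_known = [sum(col) / len(known) for col in zip(*known)]
    unknown = individual_vector(unknown_doc)
    return [abs(k - u) for k, u in zip(mean_known, unknown)]
```

Under these assumptions, each verification problem yields one joint vector, which is the kind of input a two-class classifier such as an SVM could be trained on.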
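
The visual features listed on slides 20 to 22 (punctuation, line ending, letter case, line length, block size) can be measured directly on the raw text. The sketch below shows one possible way to do so; the exact definitions used in GLAD are not given in the slides, so every ratio here, the dictionary keys, and the function name `visual_features` are assumptions.

```python
import string


def visual_features(text):
    """Hypothetical layout-based features computed on raw text."""
    lines = text.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]

    # A "block" is taken to be a run of consecutive non-empty lines.
    blocks, current = [], 0
    for ln in lines:
        if ln.strip():
            current += 1
        elif current:
            blocks.append(current)
            current = 0
    if current:
        blocks.append(current)

    letters = [c for c in text if c.isalpha()]
    n_lines = len(non_empty)
    # Note: string.punctuation covers ASCII punctuation only.
    return {
        "punctuation_ratio": sum(c in string.punctuation for c in text) / len(text) if text else 0.0,
        "line_end_punct_ratio": sum(ln.rstrip()[-1] in string.punctuation for ln in non_empty) / n_lines if n_lines else 0.0,
        "uppercase_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "mean_line_length": sum(len(ln) for ln in non_empty) / n_lines if n_lines else 0.0,
        "mean_block_size": sum(blocks) / len(blocks) if blocks else 0.0,
    }


sample = "Hi!\nOk\n\nYes."
features = visual_features(sample)
```

Features like these depend on how a text was typeset as much as on who wrote it, which matches the pro and con points raised on slide 22.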
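
Slides 9 and 19 to 27 refer to feature ablation: removing one feature at a time and checking whether performance drops (helpful feature) or rises (harmful feature). The loop below is a generic leave-one-out sketch of that procedure, not the authors' code. The `evaluate` callable stands in for training and scoring a classifier on a given feature subset, and the toy scores in `TOY_GAINS` are invented purely so the example runs.

```python
def ablation(feature_names, evaluate):
    """Return the baseline score and, per feature, the drop caused by removing it."""
    baseline = evaluate(list(feature_names))
    effects = {}
    for name in feature_names:
        reduced = [f for f in feature_names if f != name]
        # Positive effect: score falls without the feature, so it helps.
        effects[name] = baseline - evaluate(reduced)
    return baseline, effects


# Invented contributions in percentage points, for illustration only.
TOY_GAINS = {"entropy": 5, "punctuation": 2, "noise": -3}


def toy_evaluate(features):
    """Stand-in for cross-validated accuracy of a model using `features`."""
    return 50 + sum(TOY_GAINS[f] for f in features)


baseline, effects = ablation(list(TOY_GAINS), toy_evaluate)
helpful = [name for name, delta in effects.items() if delta > 0]
harmful = [name for name, delta in effects.items() if delta < 0]
```

In a real setup, `evaluate` would retrain the classifier for each subset, and repeating the loop per language is one way to obtain the language-specific feature groups the slides mention.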
