Bootstrapped Authorship Attribution in Compression Space


Ramon de Graaff (Leiden Institute of Advanced Computer Science) and Cor Veenman (Digital Technology and Biometrics Department)


  1. Bootstrapped Authorship Attribution in Compression Space
     Ramon de Graaff, Leiden Institute of Advanced Computer Science
     Cor Veenman, Digital Technology and Biometrics Department
     de Graaff & Veenman - PAN 2012 Poster Preview

  2. PAN Authorship Attribution Problem
     • Multi-class statistical pattern recognition problem
       – Proper feature representation
     • Dataset properties
       – Very few training document samples
       – Low number of authors
       – Large documents
     • Performance measure
       – Average precision, recall, and F1 score over all authors
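The performance measure on this slide can be illustrated with a minimal sketch. It assumes that "average over all authors" means an unweighted (macro) average of per-author scores, that the F1 score is averaged per author rather than computed from the averaged precision and recall, and that an undefined ratio counts as 0. The function name `macro_scores` is hypothetical and not taken from the slides.

```python
from typing import List, Tuple


def macro_scores(true_authors: List[str],
                 predicted: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), each averaged over all true authors."""
    authors = sorted(set(true_authors))
    pairs = list(zip(true_authors, predicted))
    precisions, recalls, f1s = [], [], []
    for author in authors:
        tp = sum(1 for t, p in pairs if t == author and p == author)
        fp = sum(1 for t, p in pairs if t != author and p == author)
        fn = sum(1 for t, p in pairs if t == author and p != author)
        # Assumption: a ratio with a zero denominator is scored as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(authors)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

With few authors and few test documents, a macro average gives every author the same weight, so one poorly recognised author lowers the score noticeably.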

  3. Approach
     • Low-dimensional feature representation
       – Compression Distances to Prototypes (CDP)
         > Compression distance measure (CDM)
         > Compressor: Prediction by Partial Matching (PPM)
     • Prototypes are required to compute the distances to
       – Draw one from each training document without replacement
     • Learning a statistical model requires more samples
       – Bootstrapping from the large training document
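The approach on this slide can be sketched roughly as follows, under several assumptions that are not stated in the slides. The distance is taken to be the commonly used compression-based form C(xy) / (C(x) + C(y)), where C is the compressed size. `zlib` stands in for PPM, because the Python standard library has no PPM compressor. "Without replacement" is read as cutting the prototype out of the training document so that later samples cannot overlap it. All function names and fragment lengths are hypothetical.

```python
import random
import zlib
from typing import List, Tuple


def compressed_size(data: bytes) -> int:
    """Compressed size in bytes (zlib is a stand-in for PPM)."""
    return len(zlib.compress(data, 9))


def cdm(x: bytes, y: bytes) -> float:
    """Assumed CDM form: C(xy) / (C(x) + C(y)); lower means more similar."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def split_prototype(document: bytes, length: int,
                    rng: random.Random) -> Tuple[bytes, bytes]:
    """Cut one prototype fragment out of a document and return the rest."""
    if length > len(document):
        raise ValueError("prototype is longer than the document")
    start = rng.randrange(len(document) - length + 1)
    prototype = document[start:start + length]
    remainder = document[:start] + document[start + length:]
    return prototype, remainder


def bootstrap_samples(document: bytes, n_samples: int, length: int,
                      rng: random.Random) -> List[bytes]:
    """Draw fragments at random offsets; fragments may overlap each other."""
    if length > len(document):
        raise ValueError("sample is longer than the document")
    starts = [rng.randrange(len(document) - length + 1)
              for _ in range(n_samples)]
    return [document[s:s + length] for s in starts]


def cdp_features(sample: bytes, prototypes: List[bytes]) -> List[float]:
    """One feature per prototype: the distance from the sample to it."""
    return [cdm(sample, prototype) for prototype in prototypes]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for two authors' training documents.
    docs = {
        "author_a": b"It was a bright cold day in April. " * 300,
        "author_b": b"Call me Ishmael. Some years ago, never mind. " * 300,
    }
    prototypes, remainders = [], {}
    for name, text in docs.items():
        prototype, remainders[name] = split_prototype(text, 2000, rng)
        prototypes.append(prototype)
    for name, rest in remainders.items():
        for sample in bootstrap_samples(rest, 3, 1000, rng):
            print(name, cdp_features(sample, prototypes))
```

Each bootstrapped sample becomes a vector with one entry per prototype, so the dimensionality equals the number of training documents. Any standard classifier can then be trained on these vectors; the slides shown here do not say which one is used.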
