Authorship Identification with Modality Specific Meta Features

Thamar Solorio, Sangita Pillay, Manuel Montes
Natural Language Processing Lab, University of Alabama at Birmingham
Thamar Solorio (UAB), PAN 2011

Introduction

Authorship attribution assumes that each author leaves a unique, identifiable writeprint in text. However, authors also share similarities along specific linguistic dimensions. We want to exploit these similarities to improve prediction accuracy.

Proposed approach

Idea: cluster each linguistic modality independently and use the resulting clusters to generate meaningful meta features.
Assumption: processing each linguistic modality separately will expose relations in the authors' writeprints, and these relations will be unique to each author.

Proposed approach: Document representation

1. Document representation. A document $x$ is represented as $\{x_1, x_2, \ldots, x_m\}$, where $m$ is the number of modalities and each $x_i$ is a vector of the $|x_i|$ features that belong to modality $i$. The modalities partition the feature set: $\bigcup_{i=1}^{m} x_i = x$ and $x_i \cap x_j = \emptyset$ for $i \neq j$.
2. Generating meta features. Each of the $m$ modality vectors is given to a clustering algorithm separately. The output is $m$ clustering solutions for the training data, each with $k$ clusters. This step is unsupervised: no class (author) information is used.
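The representation and clustering steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: features are identified by name, the hypothetical `modality_of` mapping assigns each feature to exactly one modality, and plain k-means stands in for CLUTO, the clustering tool used in the experiments. All function names are illustrative, not taken from the paper.

```python
import random


def modality_indices(feature_names, modality_of):
    """Group feature column indices by modality (a partition of all columns)."""
    groups = {}
    for idx, name in enumerate(feature_names):
        # Each feature maps to exactly one modality, so the groups are disjoint.
        groups.setdefault(modality_of[name], []).append(idx)
    return groups


def split_vector(x, groups):
    """Split one flat feature vector x into its per-modality sub-vectors x_i."""
    return {mod: [x[i] for i in idxs] for mod, idxs in groups.items()}


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (a stand-in for CLUTO); returns one cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_each_modality(X, groups, k, seed=0):
    """Cluster the training documents once per modality; no author labels are used."""
    solutions = {}
    for mod, idxs in groups.items():
        vectors = [[x[i] for i in idxs] for x in X]
        solutions[mod] = kmeans(vectors, k, seed=seed)
    return solutions
```

For example, with hypothetical features `["freq_the", "freq_a", "avg_words_per_sentence", "rate_digits"]` mapped to `lexical`, `lexical`, `stylistic`, `stylistic`, `split_vector([1, 2, 3, 4], groups)` yields `{'lexical': [1, 2], 'stylistic': [3, 4]}`, and `cluster_each_modality` returns one label list per modality, i.e. the $m$ clustering solutions.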

Proposed approach: Generating meta features

2. Generating meta features (continued). From each cluster $c_{mj}$ in each of the $m$ clustering solutions, we compute a centroid by averaging all the feature vectors in that cluster:

$$\text{centroid}_{mj} = \frac{1}{|c_{mj}|} \sum_{x_i \in c_{mj}} x_i \qquad (1)$$

where the subscript $m$ identifies the clustering solution (i.e., the modality) and $j$ ranges from 1 to $k$, the number of clusters. The meta features are the cosine similarities of each instance to these centroids. Each instance $x$ is then represented by its original first-level features $\langle x_{i1}, \ldots, x_{i|x_i|} \rangle$ together with the $k$ meta features generated for each modality $i$, giving $\sum_{i=1}^{m} |x_i| + mk$ features in total.
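Equation (1) and the cosine meta features can be sketched as below with plain Python lists. This is an illustrative sketch, not the authors' code: it assumes cluster labels are integers in `0..k-1` (as produced by any clustering tool), returns a zero centroid for an empty cluster, and defines the cosine of an all-zero vector as 0; the helper names `centroids`, `cosine`, and `augment` are hypothetical.

```python
import math


def centroids(vectors, labels, k):
    """Equation (1): average the vectors assigned to each of the k clusters."""
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for v, lab in zip(vectors, labels):
        counts[lab] += 1
        for d, value in enumerate(v):
            sums[lab][d] += value
    # Assumption: an empty cluster gets a zero centroid (cosine to it is then 0).
    return [[s / counts[j] for s in sums[j]] if counts[j] else [0.0] * dim
            for j in range(k)]


def cosine(u, v):
    """Cosine similarity; assumed to be 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def augment(instance_parts, modality_centroids):
    """First-level features of every modality, followed by one cosine
    similarity per centroid of that modality's clustering solution."""
    first_level = [f for part in instance_parts for f in part]
    meta = [cosine(part, c)
            for part, cents in zip(instance_parts, modality_centroids)
            for c in cents]
    return first_level + meta
```

Since only the training data is clustered, a test document would presumably be augmented with its similarities to the same training centroids, so training and test instances share one feature space.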

The PAN competition: Features

First-level features come from four linguistic modalities:
1. Lexical features
2. Stylistic features
3. Perplexities from language models
4. Syntactic features

These features were originally selected for authorship attribution (AA) of web forum posts [1]; no customization was performed for the PAN data.

[1] Solorio et al. (to appear in IJCNLP'11)
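One plausible way to turn character 3-grams into perplexity features is sketched below: train one character trigram model per candidate author and use the perplexity of a document under each model as a feature. The slides do not specify the smoothing or how the models are organized, so the one-model-per-author setup, the add-one smoothing, the padding symbol, and the names `CharTrigramLM` and `perplexity_features` are all assumptions.

```python
import math
from collections import Counter

PAD = "\x02"  # hypothetical start-of-text padding symbol


class CharTrigramLM:
    """Character trigram model with add-one smoothing (an assumed choice)."""

    def __init__(self, texts):
        self.trigrams = Counter()
        self.histories = Counter()
        vocab = set()
        for text in texts:
            padded = PAD * 2 + text
            vocab.update(padded)
            for i in range(2, len(padded)):
                self.trigrams[padded[i - 2:i + 1]] += 1
                self.histories[padded[i - 2:i]] += 1
        # +1 reserves probability mass for characters unseen in training.
        self.vocab_size = len(vocab) + 1

    def prob(self, history, char):
        return ((self.trigrams[history + char] + 1)
                / (self.histories[history] + self.vocab_size))

    def perplexity(self, text):
        if not text:
            raise ValueError("perplexity is undefined for empty text")
        padded = PAD * 2 + text
        log_prob = sum(math.log(self.prob(padded[i - 2:i], padded[i]))
                       for i in range(2, len(padded)))
        return math.exp(-log_prob / len(text))


def perplexity_features(document, author_models):
    """One feature per author model: the document's perplexity under that model."""
    return [model.perplexity(document) for model in author_models]
```

A lower perplexity means the document's character sequences are more typical of that author's training text, which is what makes these values usable as features.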

The PAN competition: First-level features

| Modality | Features |
|---|---|
| Stylistic | Total number of words; average number of words per sentence; binary feature indicating use of quotations; binary feature indicating use of a signature; rate of all-caps words; rate of non-alphanumeric characters; rate of sentence-initial words with the first letter capitalized; rate of digits; number of new lines in the text; average number of punctuation marks (!?.;:,) per sentence; rate of contractions (won’t, can’t); rate of two or more consecutive non-alphanumeric characters |
| Lexical | Bag of words (unigram frequencies) |
| Perplexity | Perplexity values from character 3-grams |
| Syntactic | Part-of-speech (POS) tags; dependency relations; chunks (unigram frequencies) |

Table: Feature breakdown by modality
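Most of the stylistic features in the table can be computed with the Python standard library, as sketched below. The slides do not define tokenization or the denominators of the rates, so whitespace tokenization, the regex sentence splitter, and the per-word versus per-character rates are assumptions; the signature feature is omitted because its detection rule is not given.

```python
import re
import string

PUNCTUATION = set("!?.;:,")
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")            # assumed sentence splitter
CONTRACTION = re.compile(r"\b[A-Za-z]+['’][A-Za-z]+\b")  # e.g. won't, can’t
NON_ALNUM_RUN = re.compile(r"[^\w\s]{2,}")               # e.g. "!!", "?!", "..."


def stylistic_features(text):
    """A subset of the stylistic features, under assumed definitions."""
    words = text.split()  # assumption: whitespace tokenization
    sentences = [s for s in SENTENCE_BREAK.split(text.strip()) if s]
    n_words = len(words) or 1   # guard against division by zero
    n_sents = len(sentences) or 1
    n_chars = len(text) or 1
    # Strip surrounding punctuation so "WHY?" still counts as an all-caps word.
    cores = [w.strip(string.punctuation) for w in words]
    all_caps = sum(1 for c in cores if len(c) > 1 and c.isalpha() and c.isupper())
    return {
        "total_words": len(words),
        "avg_words_per_sentence": len(words) / n_sents,
        "uses_quotations": int('"' in text),
        "rate_all_caps_words": all_caps / n_words,
        "rate_non_alnum_chars":
            sum(1 for c in text if not c.isalnum() and not c.isspace()) / n_chars,
        "rate_capitalized_sentence_starts":
            sum(1 for s in sentences if s[0].isupper()) / n_sents,
        "rate_digits": sum(1 for c in text if c.isdigit()) / n_chars,
        "num_newlines": text.count("\n"),
        "avg_punct_per_sentence": sum(1 for c in text if c in PUNCTUATION) / n_sents,
        "rate_contractions": len(CONTRACTION.findall(text)) / n_words,
        "rate_non_alnum_runs": len(NON_ALNUM_RUN.findall(text)) / n_chars,
    }
```

For the short post "I can't stop. WHY not?\nIt is 42!", this sketch finds 8 words in 3 sentences, one all-caps word, one contraction, and one newline.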

The PAN competition: Experimental settings

- Classifier: WEKA’s implementation of SVMs.
- Clustering: CLUTO.
- Number of clusters: k = number of authors × 15.
- Baseline system: the model trained and tested with only the first-level features (FLF).
- We did not run experiments with out-of-training authors.

The PAN competition: Results

| System | Test set | Macro P | Macro R | Macro F1 | Micro P | Micro R | Micro F1 |
|---|---|---|---|---|---|---|---|
| Baseline | Large | 0.119 | 0.054 | 0.041 | 0.155 | 0.155 | 0.155 |
| MSMF | Large | 0.171 | 0.084 | 0.066 | 0.148 | 0.148 | 0.148 |
| Change | Large | 43.6% | 55% | 60.9% | -4.5% | -4.5% | -4.5% |
| Baseline | Small | 0.440 | 0.152 | 0.148 | 0.384 | 0.384 | 0.384 |
| MSMF | Small | 0.415 | 0.205 | 0.185 | 0.440 | 0.440 | 0.440 |
| Change | Small | -5.6% | 34.8% | 25% | 14.5% | 14.5% | 14.5% |

Table: Comparison of micro- and macro-averaged precision (P), recall (R), and F1 on the two PAN'11 test sets. MSMF stands for our modality-specific meta features approach; Change is the relative change of MSMF over the baseline.
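The Change rows and the identical micro-averaged columns can be checked with a little arithmetic, sketched below. Recomputing the relative change from the rounded three-decimal scores gives values within one percentage point of the table (e.g. about 43.7% rather than 43.6% for macro precision on the Large set), presumably because the slide used unrounded scores. The micro-averaged precision, recall, and F1 coincide because each test document receives exactly one predicted author: every error counts once as a false positive and once as a false negative, so all three equal accuracy. The function names are illustrative.

```python
def relative_change(baseline, system):
    """Percentage change of a system score over the baseline score."""
    return 100.0 * (system - baseline) / baseline


def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 for single-label predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # Each wrong prediction is a false positive for the predicted author and a
    # false negative for the true author, so FP = FN = N - TP.
    fp = fn = len(y_true) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

For instance, `relative_change(0.148, 0.185)` gives 25.0 (up to floating-point rounding), matching the macro F1 Change on the Small set.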

Concluding remarks

Lessons learned:
- Meta features improved most scores, though not all: on the Large test set the micro-averaged scores dropped by 4.5%, and on the Small test set macro-averaged precision dropped by 5.6%.
- Feature selection is a must.

Current work:
- Better understand the role of the meta features.
- Handle out-of-training authors.
- Evaluate the influence of modality-specific features.
- Develop new approaches to exploit the linguistic modalities.

Thank you for your attention, and many thanks to the PAN organizers!
