PAN 2010 Results
Uncovering Plagiarism, Authorship, and Social Software Misuse
Bauhaus-Universität Weimar: Martin Potthast, Benno Stein, Andreas Eiselt, Teresa Holfeld
Universidad Politécnica de Valencia: Alberto Barrón-Cedeño, Paolo Rosso
University of the Aegean: Efstathios Stamatatos
Bar-Ilan University: Moshe Koppel
http://pan.webis.de
The PAN Competition
Information is nothing without Retrieval
© www.webis.de
2nd International Competition on Plagiarism Detection, PAN 2010
These days, plagiarism and text reuse are rife on the Web.
Task: Given a set of suspicious documents and a set of source documents, find all plagiarized sections in the suspicious documents and, if available, the corresponding source sections.
Corpus: PAN-PC-10
❑ 27 073 documents (obtained from 22 874 Project Gutenberg books)
❑ 68 558 plagiarism cases (about 0-10 cases per document)
❑ 6 plagiarism-relevant parameters (length, language, task, obfuscation, topic, fraction)
[Potthast et al., COLING 2010]
Plagiarism Detection Results
Plagdet  Participant
0.80     Kasprzak
0.71     Zou
0.69     Muhr
0.62     Grozea
0.61     Oberreuter
0.59     Torrejón
0.52     Pereira
0.51     Palkovskii
0.44     Sobha
0.26     Gottron
0.22     Micol
0.21     Costa-jussà
0.21     Nawab
0.20     Gupta
0.14     Vania
0.06     Suàrez
0.02     Alzahrani
0.00     Iftene
Plagiarism Detection Results
❑ Plagdet combines precision, recall, and granularity:
$$ \mathrm{plagdet}(S, R) = \frac{F_1}{\log_2(1 + \mathrm{gran}(S, R))} $$
$$ \mathrm{prec}(S, R) = \frac{1}{|R|} \sum_{r \in R} \frac{\left| \bigcup_{s \in S} (s \sqcap r) \right|}{|r|} $$
$$ \mathrm{rec}(S, R) = \frac{1}{|S|} \sum_{s \in S} \frac{\left| \bigcup_{r \in R} (s \sqcap r) \right|}{|s|} $$
❑ The granularity gran measures the average number of times a plagiarism case is detected.
[Potthast et al., COLING 2010]
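As a rough illustration of how the measures on this slide fit together, the following Python sketch computes them for a single suspicious document. It rests on assumptions the slide does not state: S is taken to be the set of true plagiarism cases and R the set of detections, each one a non-empty set of character offsets, and s ⊓ r is simplified to plain set intersection. The function names are hypothetical, and the competition's reference evaluation may differ in detail.

```python
from math import log2

def precision(cases, detections):
    """Mean fraction of each detection r that overlaps some true case."""
    if not detections:
        return 0.0
    covered = set().union(*cases)  # union of all true cases
    return sum(len(r & covered) / len(r) for r in detections) / len(detections)

def recall(cases, detections):
    """Mean fraction of each true case s that is covered by some detection."""
    if not cases:
        return 0.0
    detected = set().union(*detections)  # union of all detections
    return sum(len(s & detected) / len(s) for s in cases) / len(cases)

def granularity(cases, detections):
    """Average number of detections per detected case.

    Returning 1.0 when no case is detected is a convention chosen for this sketch.
    """
    counts = [sum(1 for r in detections if s & r) for s in cases]
    counts = [c for c in counts if c > 0]
    return sum(counts) / len(counts) if counts else 1.0

def plagdet(cases, detections):
    """F1 of precision and recall, discounted by log2(1 + granularity)."""
    p, r = precision(cases, detections), recall(cases, detections)
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return f1 / log2(1 + granularity(cases, detections))

# Hypothetical example: one 100-character case, reported as two separate fragments.
cases = [set(range(0, 100))]
detections = [set(range(0, 40)), set(range(60, 100))]
score = plagdet(cases, detections)
```

In this example both fragments lie entirely inside the case, so precision is 1, recall is 0.8, and granularity is 2. The score is therefore the F1 value divided by log2(3), which shows how a fragmented report of one case is penalized.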
Plagiarism Detection Results
Recall  Precision  Granularity  Participant
0.69    0.94       1.00         Kasprzak
0.63    0.91       1.07         Zou
0.71    0.84       1.15         Muhr
0.48    0.91       1.02         Grozea
0.48    0.85       1.01         Oberreuter
0.45    0.85       1.00         Torrejón
0.41    0.73       1.00         Pereira
0.39    0.78       1.02         Palkovskii
0.29    0.96       1.01         Sobha
0.32    0.51       1.87         Gottron
0.24    0.93       2.23         Micol
0.30    0.18       1.07         Costa-jussà
0.17    0.40       1.21         Nawab
0.14    0.50       1.15         Gupta
0.26    0.91       6.78         Vania
0.07    0.13       2.24         Suàrez
0.05    0.35       17.31        Alzahrani
0.00    0.60       8.68         Iftene
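To see how the three columns combine into a single Plagdet score, the short Python sketch below recomputes the score from the recall, precision, and granularity values listed for three participants. The helper name `plagdet_from_scores` is hypothetical. Because the inputs are rounded to two decimals, recomputed values can differ from the reported ones in the last digit for some rows.

```python
from math import log2

def plagdet_from_scores(recall, precision, granularity):
    """Combine the three measures: F1 divided by log2(1 + granularity)."""
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / log2(1 + granularity)

# (recall, precision, granularity) as listed on the slide
rows = {
    "Kasprzak": (0.69, 0.94, 1.00),
    "Zou": (0.63, 0.91, 1.07),
    "Vania": (0.26, 0.91, 6.78),
}

recomputed = {name: round(plagdet_from_scores(*vals), 2) for name, vals in rows.items()}
for name, value in recomputed.items():
    print(f"{name}: {value:.2f}")
```

For these three rows the recomputed values round to the reported Plagdet scores. The Vania row shows the effect of granularity: precision is 0.91, yet reporting each case almost seven times on average pulls the combined score down to 0.14.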
1st International Competition on Wikipedia Vandalism Detection, PAN 2010
Every edit on Wikipedia has to be double-checked for integrity, even if it affects just one character.
Task: Given a set of edits on Wikipedia articles, distinguish ill-intentioned edits from well-intentioned edits.
Corpus: PAN-WVC-10
❑ 32 452 edits (sampled from a week's worth of Wikipedia edit logs)
❑ 28 468 different edited articles (edit frequency resembles article importance)
❑ 2 391 edits are vandalism (the resulting ratio of about 7% agrees with the literature)
[Potthast, SIGIR 2010]
Vandalism Detection Results
[Figure: ROC curves, TP rate versus FP rate, both axes from 0 to 1. Legend: PAN '10 Meta Detector, Mola Velasco, Adler, Javanmardi, Chichkov, Seaward, Hegedüs, Harpalani, White, Iftene, Random Detector.]
Vandalism Detection Results
[Figure: precision-recall curves, precision versus recall, both axes from 0 to 1. Legend: PAN '10 Meta Detector, Mola Velasco, Adler, Javanmardi, Chichkov, Seaward, Hegedüs, Harpalani, White, Iftene, Random Detector.]
Vandalism Detection Results
ROC-AUC  ROC rank  PR-AUC   PR rank  Rank change  Detector
0.95690  –         0.77609  –        –            PAN ’10 Meta Detector
0.92236  1         0.66522  1        –            Mola Velasco
0.90351  2         0.49263  3        ↓            Adler
0.89856  3         0.44756  4        ↓            Javanmardi
0.89377  4         0.56213  2        ⇈            Chichkov
0.87990  5         0.41365  7        ⇊            Seaward
0.87669  6         0.42203  5        ↑            Hegedüs
0.85875  7         0.41498  6        ↑            Harpalani
0.84340  8         0.39341  8        –            White
0.65404  9         0.12235  9        –            Iftene
0.50000  10        0.08490  10       –            Random Detector
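For readers unfamiliar with the two ranking measures in this table, the following Python sketch shows one common way to compute them from a detector's per-edit scores. It is an illustration under assumptions, not the competition's evaluation code. ROC-AUC is computed as the probability that a vandalism edit is scored above a regular edit, with ties counted half. PR-AUC is approximated by average precision, which is only one of several estimators and may not be the one used for the slide. The function names and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (a PR-AUC estimate)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision among the top-k ranked edits
    return total / hits if hits else 0.0

# Toy data: 1 = vandalism, 0 = regular edit; scores are detector confidences.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)

# A detector that gives every edit the same score cannot rank at all,
# which yields a ROC-AUC of 0.5, the value listed for the Random Detector.
constant_auc = roc_auc(labels, [0.5] * len(labels))
```

The pairwise loop in `roc_auc` is quadratic in the number of edits. That is acceptable for a sketch, but a rank-based implementation would be preferable at the scale of the corpus.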
Information is nothing without Retrieval
Retrieval is nothing without Evaluation