Improving performance of a plagiarism detection system, Andrzej Sobecki, Marcin Kępa, IKC 2017

Mar 23, 2024


Improving performance of a plagiarism detection system Andrzej Sobecki, Marcin Kępa IKC 2017
Plagiarism detection problem
• Text documents – unstructured form
• Finding potential source documents for a suspected document
• Searching in many repositories
• The process must be fast and accurate
How we do that
• Pipeline: Parsing → Hashing → Filtering → Calculating similarities
• The diagram labels one stage "the important stage for accuracy" and another "the crucial stage for performance"
Filtering – current solution
• A hash function h(x) turns each document into a document profile: one hash per sentence
• Profiles are built for the documents available in the repositories and for the suspected document
• Filtering counts the identical hash values between the suspected document's profile and each repository document's profile
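The filtering step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the naive sentence splitter, the normalization, the choice of SHA-1 as h(x), and the names `profile` and `count_identical` are all assumptions made for the example. Profiles are stored as sets, so a sentence repeated inside one document counts once.

```python
import hashlib
import re


def profile(text: str) -> set[str]:
    """Build a document profile: one hash value per sentence."""
    # Naive split on ., ! and ? (an assumption; the slides do not say
    # how sentences are delimited).
    sentences = re.split(r"[.!?]+", text)
    hashes = set()
    for sentence in sentences:
        # Normalize case and whitespace so trivially different copies match.
        normalized = " ".join(sentence.lower().split())
        if normalized:
            hashes.add(hashlib.sha1(normalized.encode("utf-8")).hexdigest())
    return hashes


def count_identical(suspected: set[str], candidate: set[str]) -> int:
    """Count the hash values shared by two document profiles."""
    return len(suspected & candidate)


# Hypothetical repository and suspected document.
repository = {
    "doc_a": "Plagiarism is a problem. Filtering must be fast.",
    "doc_b": "Completely unrelated text. Nothing in common here.",
}
suspected_profile = profile("Filtering must be fast! Some new sentence.")

# Score each repository document by its number of identical hash values.
scores = {
    name: count_identical(suspected_profile, profile(text))
    for name, text in repository.items()
}
```

Documents with a score above some threshold would then be passed on to the similarity calculation stage. The threshold itself is not given in the slides.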
Filtering – possible solutions
• Algorithms dedicated to digital libraries
• Available search engines, e.g., Elasticsearch
• Components of the Hadoop ecosystem
• How do precision and recall values affect the performance and accuracy of the plagiarism detection process?
Class of problem: detecting similarities
• Unstructured text documents
• Most of the documents available in the repositories must be analyzed
• New documents are continuously added to the repositories
• Effective filtering requires high recall and precision
• Finding similar sentences is more important than finding keywords
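To make the recall and precision requirement concrete, the sketch below computes both values for the output of a filtering step against a known set of true source documents. The document identifiers and the function name `precision_recall` are hypothetical, and returning 0.0 for an empty set is just a convention chosen for the example.

```python
def precision_recall(returned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of a filtering step.

    returned: documents the filter passed on to similarity calculation
    relevant: documents that really are sources of the suspected document
    """
    true_positives = len(returned & relevant)
    # Convention for this sketch: an empty set yields 0.0 instead of an error.
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical case: four candidates returned, two of them real sources,
# and one real source ("d5") missed by the filter.
returned = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
precision, recall = precision_recall(returned, relevant)
```

Here precision is 2/4 and recall is 2/3. In general, lower precision means more documents reach the more expensive similarity stage, while lower recall means real sources are never examined at all.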
Models described in the article
• KASKADA HashMap
• HDFS
• HDFS HashMap
• HBase
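The slides only name the models and do not describe how they are organized. As one possible illustration of a hash-map-based filter (an assumption, not a description of the KASKADA, HDFS, or HBase variants), an in-memory index can map each sentence hash to the documents that contain it. The suspected document is then compared only against documents that share at least one hash with it. The names `build_index` and `candidates` are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(profiles: dict[str, set[int]]) -> dict[int, set[str]]:
    """Map each hash value to the ids of the documents whose profile contains it."""
    index: dict[int, set[str]] = defaultdict(set)
    for doc_id, hashes in profiles.items():
        for h in hashes:
            index[h].add(doc_id)
    return index


def candidates(index: dict[int, set[str]], suspected: set[int]) -> Counter:
    """Count identical hash values per repository document via index lookups."""
    counts: Counter = Counter()
    for h in suspected:
        # .get avoids inserting empty entries into the defaultdict.
        for doc_id in index.get(h, ()):
            counts[doc_id] += 1
    return counts


# Hypothetical profiles; small integers stand in for real sentence hashes.
profiles = {"doc_a": {1, 2, 3}, "doc_b": {3, 4}, "doc_c": {5}}
index = build_index(profiles)
counts = candidates(index, {2, 3, 9})
```

With this layout the cost of filtering depends on the number of hashes in the suspected document and the documents that share them, not on the total number of documents in the repository. The price is the up-front cost of building the index.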
Results – documents with fixed length
Results – documents with different lengths
Results – parallel tasks
Results – scalability
Results – cost of preparing structures
Summary • Do you have any questions?
