Unsupervised Ranking for Plagiarism Source Retrieval

Jun 26, 2023

Unsupervised Ranking for Plagiarism Source Retrieval
Kyle Williams, Hung-Hsuan Chen, Sagnik Ray Choudhury and C. Lee Giles
Information Sciences and Technology, Computer Science and Engineering
The Pennsylvania State University

Core Ideas
● The union of the results of multiple queries has a higher probability of containing a true positive than each query individually
○ So submit multiple queries and combine results
● The ranking of the search results does not necessarily reflect the probability of a true positive
○ So re-rank results
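As a rough illustration of the first idea, assume (purely for the sake of the example; the slides do not state this) that each of k queries independently returns at least one true source with probability p. Under that assumption, the combined result set contains a true positive with probability:

```latex
P(\text{union contains a true positive}) \;=\; 1 - (1 - p)^{k} \;\ge\; p
```

For instance, p = 0.3 and k = 3 give 1 - 0.7^3 = 0.657. Queries drawn from the same paragraph are correlated in practice, so the real gain is likely smaller than this independence model suggests.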
Approach to Source Retrieval
● Query generation
○ Partition document into 5-sentence paragraphs
○ Queries constructed from non-overlapping sequences of 10 POS-tagged words
■ Verbs, nouns, adjectives
○ Multiple queries per paragraph
○ This approach performed better overall than TF-IDF and BM25
● Query Submission
○ Submit the first 3 queries for each paragraph
○ Return 3 results for each query and combine to form a single result set
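The query generation and submission steps above can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes the document has already been sentence-split and POS-tagged with Penn Treebank style tags, that the POS filter keeps only verbs, nouns and adjectives, that a trailing group of fewer than 10 words still forms a query, and that results are pooled per paragraph. None of these details are specified in the slides. The names `paragraphs`, `queries_for`, `pooled_results` and the `search` callable are hypothetical.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

TaggedWord = Tuple[str, str]        # (word, POS tag); Penn Treebank tags assumed
KEEP_PREFIXES = ("NN", "VB", "JJ")  # nouns, verbs, adjectives


def paragraphs(sentences: Sequence[List[TaggedWord]],
               size: int = 5) -> Iterator[Sequence[List[TaggedWord]]]:
    """Split a tagged document into consecutive chunks of `size` sentences."""
    for i in range(0, len(sentences), size):
        yield sentences[i:i + size]


def queries_for(paragraph: Sequence[List[TaggedWord]],
                words_per_query: int = 10) -> List[str]:
    """Build non-overlapping queries from the paragraph's content words."""
    kept = [word for sentence in paragraph for (word, tag) in sentence
            if tag.startswith(KEEP_PREFIXES)]
    return [" ".join(kept[i:i + words_per_query])
            for i in range(0, len(kept), words_per_query)]


def pooled_results(paragraph: Sequence[List[TaggedWord]],
                   search: Callable[[str], List[str]],
                   queries_per_paragraph: int = 3,
                   results_per_query: int = 3) -> List[str]:
    """Submit the first few queries and merge their top results, dropping duplicates."""
    pooled: List[str] = []
    for query in queries_for(paragraph)[:queries_per_paragraph]:
        for result in search(query)[:results_per_query]:
            if result not in pooled:
                pooled.append(result)
    return pooled
```

Here `search` stands in for whatever search engine API is used; it takes a query string and returns a ranked list of result identifiers.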
● Result Ranking
○ Re-rank results returned by the queries
○ For each result:
■ Get snippet
■ Calculate similarity between snippet and suspicious document based on 5-word overlaps
For a suspicious document d and result snippet s, the similarity Sim between the snippet and the suspicious document is given by:
[equation not preserved in the extracted slide text]
where S(·) is the set of 5-word sequences
● Re-rank results by similarity
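The re-ranking step can be sketched as below. The exact similarity formula from the slide is not available here, so this sketch assumes one plausible form: the number of 5-word sequences shared by snippet and document, divided by the number of 5-word sequences in the snippet. The tokenization (lowercased `\w+` runs) is likewise an assumption, and the function names are hypothetical.

```python
import re
from typing import List, Set, Tuple


def shingles(text: str, n: int = 5) -> Set[Tuple[str, ...]]:
    """Return the set S(text) of n-word sequences (tokenization is an assumption)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(snippet: str, document: str, n: int = 5) -> float:
    """Assumed form: |S(s) & S(d)| / |S(s)|; 0.0 if the snippet has no n-word sequence."""
    s, d = shingles(snippet, n), shingles(document, n)
    return len(s & d) / len(s) if s else 0.0


def rerank(snippets: List[str], document: str) -> List[Tuple[float, str]]:
    """Score every snippet against the suspicious document, most similar first."""
    scored = [(similarity(snippet, document), snippet) for snippet in snippets]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because only snippets are compared, no result has to be downloaded before it is ranked.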
● Document Downloading
○ Download results in re-ranked order
■ Only consider results that have a similarity above some threshold
■ We required that snippets and the suspicious document must have at least five 5-word sequences in common
○ Check with Oracle for match
■ Stop if match found
○ Don’t re-download documents that have previously been downloaded for a given suspicious document
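The downloading policy above can be sketched as a simple loop. This is an illustrative sketch only: `download` and `is_source` are hypothetical callables standing in for the document fetcher and the oracle check, and the threshold is expressed as a count of shared 5-word sequences, following the "at least five in common" rule.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def download_until_match(
    ranked: Iterable[Tuple[int, str]],  # (shared 5-word sequences, URL), best first
    already_downloaded: Set[str],       # URLs fetched earlier for this suspicious document
    download: Callable[[str], str],     # hypothetical fetcher: URL -> document text
    is_source: Callable[[str], bool],   # hypothetical oracle check on the fetched text
    min_shared: int = 5,
) -> Optional[str]:
    """Download results in ranked order and stop at the first confirmed source."""
    for shared, url in ranked:
        if shared < min_shared:
            continue  # below the similarity threshold: not worth downloading
        if url in already_downloaded:
            continue  # never fetch the same document twice
        already_downloaded.add(url)
        if is_source(download(url)):
            return url  # match found, stop downloading
    return None
```

Passing the same `already_downloaded` set across calls for one suspicious document is what prevents repeat downloads.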
Results
● Competitive precision and recall with highest F1
● We submitted a relatively large number of queries
○ But queries are cheap! (at least from a bandwidth perspective)
● Relatively few documents downloaded
○ Similarity threshold controlled this
Future Ideas
● Better query construction and query selection
● Supervised ranking of search results?
