Introduction to Historical Text Reuse Detection
Marco Büchler, Emily Franzini, Greta Franzini, Maria Moritz
eTRAP Research Group, Göttingen Centre for Digital Humanities, Institute of Computer Science, Georg August University Göttingen, Germany
KITAB DH Hackathon 2015, 20 October 2015
Overview
• What is text reuse?
• Aspects of text reuse
• ACID for the Digital Humanities
• Big (Humanities) Data
• Language Model
2015 DH Estonia – Text Reuse Hackathon, 20 October 2015
My interests :)
What do you associate with text reuse/intertextuality?
Typical expectation of a computer scientist: oversimplification
Expectations of a humanist: oversimplification
Text Reuse for Humanities and Computer Science
• Question: Why is text reuse so relevant for the humanities and computer science?
• Premise: The amount of digitally available data is growing exponentially (Big Data)
• Humanities:
  – Lines of transmission and textual criticism
  – Transmission of ideas/thoughts under different circumstances and conditions
• Computer Science:
  – Text decontamination for stylometry and authorship attribution, dating of texts
  – General text mining, corpus linguistics
Temperature Map
Respect to the topic
• ACID for the Digital Humanities:
  – Acceptance
  – Complexity
  – Interoperability
  – Diversity
ACID for the Digital Humanities – Acceptance I
ACID for the Digital Humanities – Acceptance II
How can text mining be accepted by humanists if it is a black box we can't look into?
ACID for the Digital Humanities – Acceptance III
Transparency: How to provide user-friendly insights into complex mining techniques and machine learning?
Current approach
ACID for the Digital Humanities – Acceptance IV
ACID for the Digital Humanities – Acceptance V
ACID for the Digital Humanities – Acceptance VI
ACID for the Digital Humanities – Acceptance VII
ACID for the Digital Humanities – Acceptance VII
ACID for the Digital Humanities – Complexity
ACID for the Digital Humanities – Interoperability
ACID for the Digital Humanities – Diversity (Reuse Types)
• Stability (yellow)
• Purpose (green)
• Size of text reuse (blue)
• Classification (light blue)
• Degree of distribution (purple)
• Written and oral transmission
ACID for the Digital Humanities – Diversity (Reuse Styles)
Key problem
Basic question: The distributions of reuse types and reuse styles are often unknown. Which model(s) should be chosen?
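To make the model-choice question concrete, here is a minimal sketch, not the method used by the authors. It assumes two simple candidate models (word 3-grams and character 4-grams, compared by Jaccard overlap) and two toy sentences invented for illustration. The function names `word_ngrams`, `char_ngrams` and `jaccard` are hypothetical helpers, not part of any tool mentioned in these slides.

```python
def word_ngrams(text, n=3):
    """Set of word n-grams (as tuples) from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def char_ngrams(text, n=4):
    """Set of character n-grams from lowercased, whitespace-normalised text."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data (illustrative assumption): one verbatim copy, one lightly reworded variant.
source = "in the beginning god created the heaven and the earth"
verbatim = "in the beginning god created the heaven and the earth"
variant = "in the beginning god made the heavens and the earth"

for label, candidate in [("verbatim", verbatim), ("variant", variant)]:
    w = jaccard(word_ngrams(source), word_ngrams(candidate))
    c = jaccard(char_ngrams(source), char_ngrams(candidate))
    print(f"{label}: word 3-grams {w:.2f}, char 4-grams {c:.2f}")
```

On the verbatim copy both models agree, but on the reworded variant the word-based model scores noticeably lower than the character-based one. Which of the two is the better fit depends on the reuse style present in a corpus, and that is exactly what is often unknown in advance.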
Outline
Thank you!
"Stealing from one is plagiarism, stealing from many is research" (Wilson Mizner, 1876-1933)
Visit us at http://etrap.gcdh.de
DH Hackathon 2015: "Don't leave your data problems at home!"