
TRACER TUTORIAL: TEXT REUSE DETECTION SELECTION
Marco Büchler, Emily Franzini and Greta Franzini



  1. TRACER TUTORIAL: TEXT REUSE DETECTION SELECTION Marco Büchler, Emily Franzini and Greta Franzini

  2. TABLE OF CONTENTS 1. What is Selection? 2. Selection techniques 3. Hacking 4. Conclusion and revision

  3. REMINDER: CURRENT APPROACH

  4. WHAT IS SELECTION?

  5. QUESTION What do you associate with Selection?

  6. A VISUALISATION OF FEATURING From biometry:

  7. SOME VOCABULARY

  8. SOME DEFINITIONS
     • Global "knowledge": information derived from the entire corpus;
     • Local "knowledge": information derived from the reuse unit (e.g. a sentence);
     • Global "usage": Selection is applied to e.g. the entire word list;
     • Local "usage": Selection is applied to the reuse unit (e.g. a sentence).
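The four terms defined on this slide can be made concrete with a small sketch. Everything in it is a hypothetical illustration rather than TRACER code: the toy corpus, the frequency threshold of 2, and the "two rarest features per unit" rule are assumptions chosen only to show where the information comes from ("knowledge") and where the filter is applied ("usage").

```python
from collections import Counter

# Hypothetical toy corpus: three reuse units (sentences) as token lists.
units = [
    ["in", "the", "beginning", "god", "created", "the", "heaven"],
    ["god", "created", "the", "earth"],
    ["the", "earth", "was", "without", "form"],
]

# Global "knowledge": feature frequencies derived from the entire corpus.
global_freq = Counter(token for unit in units for token in unit)

# Local "knowledge": feature frequencies derived from a single reuse unit.
local_freq = [Counter(unit) for unit in units]

# Global "usage": prune the corpus-wide word list once (assumed threshold: 2),
# then filter every unit with the same pruned list.
vocabulary = {token for token, count in global_freq.items() if count >= 2}
global_usage = [[t for t in unit if t in vocabulary] for unit in units]

def rarest(unit, k=2):
    """Local "usage": pick features separately for each unit, here the k
    features of that unit that are rarest in the corpus (assumed rule)."""
    distinct = sorted(set(unit), key=lambda t: (global_freq[t], t))
    return distinct[:k]

local_usage = [rarest(unit) for unit in units]

print(global_usage)
print(local_usage)
```

Both filters above rely on global "knowledge" (`global_freq`). Swapping in `local_freq` as the ranking source would give the local "knowledge" variants.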

  9. SELECTION TECHNIQUES

  10. SELECTION: GLOBAL "KNOWLEDGE" & GLOBAL "USAGE"

  11. SELECTION: GLOBAL "KNOWLEDGE" & LOCAL "USAGE"

  12. SELECTION: LOCAL "KNOWLEDGE" & GLOBAL "USAGE"

  13. SELECTION: LOCAL "KNOWLEDGE" & LOCAL "USAGE"

  14. SELECTION: MATRIX STYLE

      F =
               A  C  D  F  G  E  B  H  I  J  K
          s1   1  1  1  0  0  1  1  0  0  0  0
          s2   1  1  0  1  1  1  0  0  0  0  0
          s3   1  1  1  1  1  0  0  0  0  0  0
          s4   1  1  0  1  1  1  0  0  0  0  0
          s5   0  0  1  0  0  0  0  1  1  1  1

      S =
               A  C  D  F  G  E
          s1   1  1  1  0  0  1
          s2   1  1  0  1  1  1
          s3   1  1  1  1  1  0
          s4   1  1  0  1  1  1
          s5   0  0  1  0  0  0
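The step from F to S on this slide can be sketched in a few lines. The slide does not state which rule produced S. The rule used here, keeping every feature that occurs in at least two reuse units, is an assumption that happens to reproduce the slide's S exactly. The function name `select_columns` and the parameter `min_units` are hypothetical.

```python
# Feature matrix F from the slide: one row per reuse unit, one column per feature.
features = ["A", "C", "D", "F", "G", "E", "B", "H", "I", "J", "K"]
F = [
    [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0],  # s1
    [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0],  # s2
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # s3
    [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0],  # s4
    [0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1],  # s5
]

def select_columns(matrix, labels, min_units=2):
    """Keep features occurring in at least `min_units` reuse units (assumed rule)."""
    # zip(*matrix) iterates over columns, so each sum is a per-feature unit count.
    unit_counts = [sum(column) for column in zip(*matrix)]
    keep = [j for j, count in enumerate(unit_counts) if count >= min_units]
    selected_labels = [labels[j] for j in keep]
    selected_matrix = [[row[j] for j in keep] for row in matrix]
    return selected_labels, selected_matrix

selected_features, S = select_columns(F, features)
print(selected_features)
for row in S:
    print(row)
```

Under this rule the features B, H, I, J and K are dropped because each occurs in only one reuse unit, which leaves s5 with a single remaining feature.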

  15. HOW TO MAKE THE DIFFERENT SELECTION STRATEGIES COMPARABLE?
      • Different Selection strategies would require different parameters.
      • This makes comparisons between Selection strategies difficult.
      • For this reason, we introduce the Feature Density:

        \[ F = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m'} s_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{m} f_{ij}} \]
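As a worked example, the Feature Density can be computed for the two matrices from the matrix-style slide. This is a minimal sketch under one assumption: that s_ij and f_ij are the entries of the selected matrix S and the feature matrix F, with n reuse units, m features before Selection and m' features after it. The function name `feature_density` is hypothetical.

```python
# Matrices from the matrix-style slide (rows s1..s5).
F = [
    [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1],
]
S = [
    [1, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]

def feature_density(selected, featured):
    """Ratio of feature occurrences kept by Selection to all feature occurrences."""
    kept = sum(sum(row) for row in selected)
    total = sum(sum(row) for row in featured)
    if total == 0:
        raise ValueError("feature matrix contains no feature occurrences")
    return kept / total

density = feature_density(S, F)
print(density)
```

Here S keeps 20 of the 25 feature occurrences in F, so the density is 0.8. Because the measure only counts what survives Selection, the same target value can be requested from any Selection strategy regardless of its internal parameters.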

  16. HACKING

  17. HACKING: CONFIGURATION

  18. HACKING Tasks:
      • Run on your own texts with different Preprocessing and Featuring techniques ...
      • eu.etrap.tracer.selection.localglobal.LocalMaxFeatureFrequencySelectorImpl
      • With different feature densities of 0.4, 0.6, 0.75

  19. HACKING Questions:
      • Run the aforementioned tasks. Compare the resulting "tail distributions" (all the information is in the Selection folder, e.g. in the *.meta files).
      • Compare the tail distributions of Featuring and Selection. What influence does the Selection strategy have?
      • Compare the .sel files for the different Selection strategies (open the Selection file in Microsoft Excel or OpenOffice and sort by columns B and C).

  20. CONFIGURING THE SELECTION IMPL PARAMETER Hint: The configuration file can be found in: $TRACER_HOME/conf/tracer_conf.xml

  21. FEATURE CORRELATION

      Stimulus   Response prob.     Number of prob's   Co-occurrence   Significance
      Butter     Bread              60                 Bread           51
                 Soft               40                 Cheese          49
                 Milk               32                 Sugar           29
                 Margarine          27                 Milk            23
                 Cheese             20                 Margarine       22
                 Fat                16                 Farina          18
                 Yellow             14                 Eggs            16
                 Bread and butter   8                  Pound           14
                 Box/can            6                  Meat            13
                 Eat                6

  22. CONTRASTIVE SEMANTICS

  23. GAP BETWEEN KNOWLEDGE AND EXPERIENCE

  24. CONCLUSION AND REVISION

  25. CHECK How do Preprocessing and Featuring influence Selection?

  26. IMPORTANCE OF SELECTION
      • Quality of the digital representation of a reuse unit;
      • Speed for linking reuse units.

  27. FINITO!

  28. CONTACT Team: Marco Büchler, Greta Franzini and Emily Franzini. Visit us: http://www.etrap.eu, contact@etrap.eu

  29. LICENCE The theme this presentation is based on is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Changes to the theme are the work of eTRAP.
