Your words betray you! The role of language in cyber crime - PowerPoint PPT Presentation


  1. Your words betray you! The role of language in cyber crime investigations (Awais Rashid)

  2. Digital World Online World Physical World

  3. dual use

  4. P2P Study • 1.6% of searches and 2.4% of responses on the Gnutella network alone (study by Hughes et al. 2006) • Hundreds or thousands of searches per second – approx. 600,000 searches per day on Gnutella alone • Specialist vocabulary: 53% of searches and 88% of responses used such keywords – vocabulary changes over time.

  5. Top 100 Frequent Searches [chart; labels: search frequency, popularity, topic]

  6. Core of Distributors

  7. Chat and Social Networking

  8. Digital Personas

  9. Do You Know Who You Are Talking To? [figure; annotated value: 18.3%]

  10. Experience from • Isis: Protecting Children in Online Social Networks (EPSRC/ESRC) • iCOP: Identifying and Catching Originators in P2P Networks (EC Safer Internet Programme)

  11. Detecting Deceptive Digital Personas

  12. Stylistic Language “Fingerprint” [diagram: Individual_1, Individual_2, Individual_3 and Individual_4, each paired with a new text]

  13. Age and Gender Analysis [diagram: Stylistic Classifier; Features (word level, syntactic level, semantic level); Reference Data Sets (female, male); Distance Measure]
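
The slide names word-level features, reference data sets and a distance measure but does not say which measure is used. The following is only a minimal sketch of that general idea, assuming relative word frequencies as the features and Manhattan distance as the measure; the function names, the reference labels and the toy texts are all hypothetical.

```python
from collections import Counter


def word_profile(text):
    """Relative frequency of each lowercase, whitespace-separated word."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def manhattan_distance(p, q):
    """Sum of absolute frequency differences over the union of both vocabularies."""
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))


def closest_reference(new_text, references):
    """Return the label of the reference profile nearest to the new text."""
    profile = word_profile(new_text)
    return min(references, key=lambda label: manhattan_distance(profile, references[label]))


# Hypothetical reference data: one toy text per group (real sets would be large corpora).
references = {
    "reference_a": word_profile("well i think the game was really really good"),
    "reference_b": word_profile("the report shows that the results are consistent"),
}

label = closest_reference("i think it was really good", references)
print(label)
```

A real classifier would combine many more features (the slide also lists syntactic and semantic levels) and far larger reference sets; the sketch only shows how a new text can be assigned to the nearest profile.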

  14. No Deception – Age (Precision) [plot: Precision (%) against Threshold (%), Levels 1 to 5; annotated values 77.35% and 72.24%]

  15. No Deception – Age (Recall) [plot: Recall (%) against Threshold (%), Levels 1 to 5]

  16. No Deception – Gender [plot: Recall / Precision (%) against Threshold (%); annotated values 66.86% and 71.07%]

  17. Deception Detection [figure; annotated value: 18.3%]

  18. Deception Detection [figure; annotated value: 84.29%]

  19. Deception Detection [figure; annotated value: 93.18%]

  20. Is it being used? • Being used by law enforcement following trials and commercialisation via a spin-out company (Relative Insight) • UK case study for Internet Governance Forum in 2009, 2010 • Featured in international TV and print news media • Part of evidence to UK Select Committee on Child Protection and EU Policy frameworks • Chosen as one of the 100 Big Ideas for the future by Universities UK and Research Councils UK (2011) • Mobile App built on the digital persona analysis demonstrated to the Prime Minister at WeProtect, Dec. 2014 • An Impact Case Study for REF2014

  21. Detecting Specialist Tactics, e.g., Vocabulary

  22. Detecting new/unknown CSA media in P2P Networks • Using query analysis to automatically triage and identify potential candidates for new CSA media • New text analysis techniques to automatically flag potential CSA media based on their filename • (Semi-)automatic video and image analysis techniques to assess CSA content

  23. Filename Classification: Key challenges • Compiling a CSA dataset • Filenames = short text samples • Presence of non-standard forms & “specialised” vocabulary

  24. Filename Classification (2): Dataset • Manual collection through LE → 268 CSA filenames • Legal pornography sites → 10K non-CSA filenames • Simulate real-life data distribution in P2P

  25. Filename Classification (3): Feature Selection • Semantic features – Known CSA keywords – Explicit language use – References to children, young age – Family relations • Example: original filename ptl0lita12yo.jpeg → semantic features [paedo_keyword] [child_ref]

  26. Filename Classification (4): Feature Selection • Character n-grams – slices of 2, 3 and 4 consecutive characters • Example: original filename ptl0lita12yo.jpeg – Char. 2-grams: pt tl l0 0l li it ta a1 12 2y yo – Char. 3-grams: ptl tl0 l0l 0li lit ita ta1 a12 12y 2yo – Char. 4-grams: ptl0 tl0l l0li 0lit lita ita1 ta12 a12y 12yo
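
Character n-gram extraction as described on this slide can be sketched in a few lines. This is an illustration only, not the iCOP implementation: the function names are hypothetical, the filename is a neutral made-up example, and dropping the extension before slicing is an assumption based on the slide's example, which lists no n-grams covering ".jpeg".

```python
def char_ngrams(text, n):
    """Return every slice of n consecutive characters in text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def filename_features(filename, sizes=(2, 3, 4)):
    """Map each n-gram size to the character n-grams of the filename stem."""
    # Assumption: the extension is removed first (see the lead-in above).
    stem = filename.rsplit(".", 1)[0]
    return {n: char_ngrams(stem, n) for n in sizes}


# Hypothetical, neutral filename used purely for illustration.
features = filename_features("img_0042.jpeg")
for n, grams in features.items():
    print(f"Char. {n}-grams:", " ".join(grams))
```

Because the slices overlap, a stem of length m yields m - n + 1 n-grams, so even very short filenames produce a usable number of features, and obfuscated spellings still share many slices with the forms they imitate.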

  27. Filename Classification (5): Experimental Setup • Support Vector Machines (LibShortText) • 5-fold cross-validation • Evaluation: – Overall system accuracy – Precision, Recall and F-score per class label
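
The evaluation protocol above can be sketched without any classifier. The code below is a plain-Python illustration, not LibShortText: it shows how 5-fold cross-validation partitions the data and why per-class precision and recall are reported alongside overall accuracy. The function names and the never-flag baseline are hypothetical, and "10K" from the dataset slide is taken as exactly 10,000.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k parts of near-equal size
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class label (0.0 where undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Class sizes from the dataset slide (assumption: 10K means exactly 10,000).
y_true = ["csa"] * 268 + ["non_csa"] * 10_000
# Hypothetical baseline that never flags a filename.
y_pred = ["non_csa"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
fold_sizes = [len(test) for _, test in k_fold_indices(len(y_true))]

print(f"accuracy: {accuracy:.3f}")
print("CSA precision, recall:", precision_recall(y_true, y_pred, "csa"))
print("test fold sizes:", fold_sizes)
```

With this class imbalance the never-flag baseline reaches roughly 97.4% accuracy while its recall on the CSA class is zero, which is why overall accuracy alone says little about the class of interest.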

  28. Filename Classification (6): Results • Scores of the SVM classifier (%), given as Precision / Recall / F-score • Semantic feats. – CSA: 5.7 / 21.3 / 9.0 – Non-CSA: 97.7 / 90.6 / 94.0 • Char. n-grams – CSA: 89.8 / 62.3 / 73.6 – Non-CSA: 99.0 / 99.8 / 99.4 • Combined – CSA: 89.9 / 66.1 / 76.1 – Non-CSA: 99.1 / 99.8 / 99.5
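
The slide does not define its F-score. Assuming it is the balanced F-score (the harmonic mean of precision and recall), the reported figures can be checked for internal consistency as follows; the dictionary keys are hypothetical labels and the numbers are copied from the results on this slide.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) in %, copied from the results slide.
reported = {
    ("semantic", "CSA"): (5.7, 21.3, 9.0),
    ("semantic", "Non-CSA"): (97.7, 90.6, 94.0),
    ("char n-grams", "CSA"): (89.8, 62.3, 73.6),
    ("char n-grams", "Non-CSA"): (99.0, 99.8, 99.4),
    ("combined", "CSA"): (89.9, 66.1, 76.1),
    ("combined", "Non-CSA"): (99.1, 99.8, 99.5),
}

# Largest gap between the recomputed and the reported F-score.
max_gap = max(abs(f_score(p, r) - f) for p, r, f in reported.values())

for (features, label), (p, r, f) in reported.items():
    print(f"{features:12} {label:8} recomputed {f_score(p, r):5.1f}  reported {f:5.1f}")
```

Under that assumption every recomputed value lies within 0.1 of the reported one; the small gaps are what one would expect from recomputing with precision and recall that were themselves rounded to one decimal.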

  29. The iCOP Toolkit

  30. Is it being used? • Training days for European Law Enforcement personnel – Participants from 8 European countries and Interpol – Hands-on sessions on live P2P data • Live demonstration at Interpol at end of project • Being utilised by several law enforcement agencies in Europe

  31. Further Information: Isis • A. Rashid, A. Baron, P. Rayson, C. May-Chahal, P. Greenwood, J. Walkerdine (2013). “Who Am I? Analysing Digital Personas in Cyber Crime Investigations”, IEEE Computer, 46(4). • C. May-Chahal, C. Mason, A. Rashid, P. Greenwood, J. Walkerdine, P. Rayson (2014). “Safeguarding Cyborg Childhoods: Incorporating the On/Offline Behaviour of Children into Everyday Social Work Practices”, British Journal of Social Work.

  32. Further Information: iCOP • C. Peersman, C. Schulze, A. Rashid, M. Brennan, C. Fischer (2014). “iCOP: Automatically Identifying New Child Abuse Media in P2P Networks”, IEEE Symposium on Security and Privacy Workshops 2014, pp. 124-131. • C. Peersman, C. Schulze, A. Rashid, M. Brennan, C. Fischer (2016). “iCOP: live forensics to reveal previously unknown criminal media on P2P networks”, Digital Investigation, 18, pp. 50-64.

  33. Further Information: General • M. Edwards, A. Rashid, P. Rayson (2015). “A Systematic Survey of Online Data Mining Technology Intended for Law Enforcement”, ACM Computing Surveys, 48(1). • A. Rashid, J. Weckert, R. Lucas (2009). “Software Engineering Ethics in a Digital World”, IEEE Computer, 42(6), pp. 34-41. • A. Rashid, K. Moore, C. May-Chahal, R. Chitchyan (2015). “Managing emergent ethical concerns for software engineering in society”, Proc. ICSE 2015, Software Engineering in Society, pp. 523-526. IEEE.
