authorship obfuscation

Authorship Obfuscation Using Heuristic Search: Master's Thesis Defence

Authorship Obfuscation Using Heuristic Search. Master's Thesis Defence by Janek Bevendorff on 20 June 2018. Supervisors: Prof. Dr. Benno Stein, PD Dr. Andreas Jakoby.


  1. Authorship Obfuscation Using Heuristic Search Master’s Thesis Defence by Janek Bevendorff on 20 June 2018 Supervisors: Prof. Dr. Benno Stein, PD Dr. Andreas Jakoby

  2. • Unmasking for short texts
     • Obfuscation against unmasking
     • Obfuscation against compression models
     • Authorship verification quality measure proposal
     • Obfuscation safety analysis and definitions
     • Side effect analysis
     • JS∆ as authorship metric
     • Adaptive obfuscation
     • Design of an admissible obfuscation heuristic
     • Analysis of consistency and monotonicity properties
     • Design and implementation of an efficient obfuscation framework
     • Development of obfuscation operators
     • Inspection of search space challenges and solutions

  4. Authorship

  9. Koppel and Schler, Authorship verification as a one-class problem, 2004

  16. [Figure with labels: Same author, Different authors]

  18. [Figure: words shown on the slide: my, having, adventures, of, rest, morning, Island, Livesey, begin, the, gentlemen, will, the, Dr, certain, Treasure, story, I]

  22. [Plot: accuracy (0.5 to 1.0) over rounds (0 to 21) for same-author and different-author pairs]

  24. Confidence Level   Threshold   Precision   % Classified
      Very High          0.9         1.00          6.2
      Very High          0.8         1.00         12.5
      Very High          0.7         1.00         13.8
      High               0.6         1.00         18.8
      High               0.5         1.00         30.0
      Moderate           0.4         0.93         43.8
      Moderate           0.3         0.83         55.0
      Moderate           0.2         0.68         70.0
      Low                0.1         0.82         87.5
      Low                0.0         0.76        100.0
      Additional labels on the slide: Training, Test

  25. Obfuscation

  28. [Plot: accuracy (0.5 to 1.0) over rounds (0 to 21) for same-author and different-author pairs]

  31. Kullback-Leibler divergence (KLD) and Jensen-Shannon divergence (JSD):
      $\mathrm{KLD}(P \parallel Q) = \sum_i P[i] \log_2 \frac{P[i]}{Q[i]}$
      $\mathrm{JSD}(P \parallel Q) = \frac{\mathrm{KLD}(P \parallel M) + \mathrm{KLD}(Q \parallel M)}{2}, \quad M = \frac{P + Q}{2}$
      Objective: maximize $\mathrm{JSD}(P \parallel Q)$.
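
The two divergences on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the helper names (`ngram_distribution`, `kld`, `jsd`) and the choice of character trigrams as the underlying distribution are assumptions made for the example.

```python
import math
from collections import Counter


def ngram_distribution(text, n=3):
    """Relative frequencies of the character n-grams in text (n=3 is an assumption)."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def kld(p, q):
    """KLD(P || Q) in bits. Every n-gram with P[i] > 0 must also have Q[i] > 0."""
    return sum(p_i * math.log2(p_i / q[gram]) for gram, p_i in p.items() if p_i > 0)


def jsd(p, q):
    """JSD(P || Q) = (KLD(P || M) + KLD(Q || M)) / 2 with M = (P + Q) / 2."""
    grams = set(p) | set(q)
    m = {g: (p.get(g, 0.0) + q.get(g, 0.0)) / 2 for g in grams}
    return (kld(p, m) + kld(q, m)) / 2


# Toy usage with two short strings (hypothetical inputs).
p = ngram_distribution("the quick brown fox")
q = ngram_distribution("the quiet brown dog")
divergence = jsd(p, q)
```

Because M is non-zero wherever P or Q is, the JSD stays finite even when the two texts share no n-grams, which plain KLD does not guarantee.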

  34. $\frac{\partial}{\partial Q[i]} \left( P[i] \log_2 \frac{P[i]}{Q[i]} \right) = -\frac{P[i]}{Q[i] \ln 2}$
      $R_{\mathrm{KL}}(i) = \frac{P[i]}{Q[i]}$ → maximize
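
The derivative on this slide says that the KLD term of an n-gram changes fastest with its frequency in Q when the ratio P[i]/Q[i] is large, which motivates ranking n-grams by that ratio. A small sketch of such a ranking follows. The function name `rank_by_kl_ratio` and the toy distributions are hypothetical, and restricting the ranking to n-grams present in both texts is an assumption of this example.

```python
def rank_by_kl_ratio(p, q):
    """Sort the n-grams that occur in both distributions by P[i] / Q[i], largest first."""
    shared = [gram for gram in q if gram in p and q[gram] > 0]
    return sorted(shared, key=lambda gram: p[gram] / q[gram], reverse=True)


# Toy distributions (hypothetical values): p for the reference text,
# q for the text that is being modified.
p = {"the": 0.5, "ing": 0.3, "and": 0.2}
q = {"the": 0.2, "ing": 0.3, "and": 0.5}
ranking = rank_by_kl_ratio(p, q)
```

In this toy case "the" has the ratio 2.5 and comes first, while "and" with the ratio 0.4 comes last.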

  37. [Figure: n-gram frequencies of Text 1 and Text 2 (to be obfuscated), n-grams ranked left to right: ny_, ly_, par, bor, y_h, hel, eme, gro, dis, gre]

  44. [Plot: JS distance (JS∆, 0.4 to 1.4) over text length in characters (2^7 to 2^14) for same-author and different-author pairs, with marked levels ε_0 and ε_0.5]
      $\mathrm{JS}_\Delta = \sqrt{2 \cdot \mathrm{JSD}(P \parallel Q)}$

  48. Two tables side by side (left and right):
      Confidence Level   Threshold   Precision (left)   % Classified (left)   Precision (right)   % Classified (right)
      Very High          0.9         1.00                 6.2                 0.00                  2.5
      Very High          0.8         1.00                12.5                 0.00                  5.0
      Very High          0.7         1.00                13.8                 0.00                  8.7
      High               0.6         1.00                18.8                 0.00                 17.5
      High               0.5         1.00                30.0                 0.00                 27.5
      Moderate           0.4         0.93                43.8                 0.00                 42.5
      Moderate           0.3         0.83                55.0                 0.67                 66.7
      Moderate           0.2         0.68                70.0                 0.50                 70.0
      Low                0.1         0.82                87.5                 0.42                 85.0
      Low                0.0         0.76               100.0                 0.53                100.0

  49. Heuristic Search

  50. [Diagram: search graph with start node s, node sets CLOSED and OPEN, and node evaluation $f(n)$]

  58. $f(n) = g(n) + h(n)$
      Admissibility: $h(n) \leq h^*(n)$
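
The evaluation function f(n) = g(n) + h(n) and the OPEN and CLOSED sets from the preceding slides are the ingredients of A* search. The following is a generic textbook-style sketch, not the obfuscation framework from the thesis. The function name `a_star`, the callback signatures, and the toy graph are assumptions. Because nodes are never reopened once closed, this variant returns an optimal path only if h is consistent, which is stronger than the admissibility condition h(n) ≤ h*(n).

```python
import heapq


def a_star(start, is_goal, successors, h):
    """Return (cost, path) of a cheapest path from start to a goal node, or None."""
    open_heap = [(h(start), 0, start, [start])]  # OPEN entries: (f, g, node, path)
    closed = set()                               # CLOSED: nodes already expanded
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if is_goal(node):
            return g, path
        if node in closed:
            continue  # stale entry for a node that was expanded earlier
        closed.add(node)
        for succ, step_cost in successors(node):
            if succ not in closed:
                g_succ = g + step_cost
                heapq.heappush(open_heap, (g_succ + h(succ), g_succ, succ, path + [succ]))
    return None


# Toy graph (hypothetical): edges with costs and a consistent heuristic table.
graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 1), ("t", 5)], "b": [("t", 1)], "t": []}
h_table = {"s": 3, "a": 2, "b": 1, "t": 0}
result = a_star("s", lambda n: n == "t", lambda n: graph[n], lambda n: h_table[n])
```

In the toy graph the cheapest path is s, a, b, t with a total cost of 3, even though the direct edge from a to t reaches the goal in fewer steps.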

  60. $h_{\mathrm{prior}}(n) = \varepsilon - \mathrm{JS}_\Delta(n)$
      $g_{\mathrm{norm}}(n) = \frac{g(n)}{\mathrm{JS}_\Delta(n) - \mathrm{JS}_\Delta(0)}$
      $h(n) = h_{\mathrm{prior}}(n) \cdot g_{\mathrm{norm}}(n)$
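
One way to read this heuristic: the first factor is the JS distance still missing to reach the target ε, the second factor is the average path cost paid so far per unit of JS distance gained, and their product extrapolates the remaining cost. A minimal sketch under that reading follows. The function name `heuristic` and its parameter names are hypothetical, and the two guard clauses are assumptions added to avoid a division by zero at the start node and negative estimates beyond the target; the slides do not say how these cases are handled.

```python
def heuristic(g_n, js_n, js_0, epsilon):
    """Estimate the remaining cost as h_prior(n) * g_norm(n).

    g_n:     path cost accumulated so far, g(n)
    js_n:    JS distance at the current node
    js_0:    JS distance at the start node
    epsilon: target JS distance
    """
    h_prior = epsilon - js_n   # JS distance that still has to be gained
    if h_prior <= 0:
        return 0.0             # assumption: target reached, no remaining cost
    gain = js_n - js_0         # JS distance gained since the start node
    if gain <= 0:
        return 0.0             # assumption: no cost rate known yet, fall back to 0
    g_norm = g_n / gain        # average cost per unit of JS distance gained
    return h_prior * g_norm


# Toy values (hypothetical): cost 10 so far for a gain of 0.1, with 0.3 still missing.
estimate = heuristic(g_n=10, js_n=0.6, js_0=0.5, epsilon=0.9)
```

With these toy values the estimate is about 30: three times the gain achieved so far is still missing, so three times the cost paid so far is expected.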

  63. [Plot, linear gain: ε, JS∆, g(n), and h(n)]

  68. [Plot, sublinear gain: ε, JS∆, g(n), and h(n)]

  71. [Plot over operations (0 to 400): h(n), stepwise JS∆, ε, JS∆, and g(n)]

  74. Obfuscation operators:
      • n-gram removal: abcdefg → abfg
      • character flip: wizard → wiazrd
      • character map: The End. → The End!
      • synonym: house
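
The character-level operators named on this slide can be sketched as small string functions. This is an illustration only: the function names, the position-based signatures, and the mapping table are assumptions, and the flip example assumes the slide shows the word "wizard" with two adjacent letters swapped. The synonym operator is left out because it would need a lexical resource.

```python
def remove_ngram(text, pos, n=3):
    """Delete the n characters starting at index pos."""
    return text[:pos] + text[pos + n:]


def flip_chars(text, pos):
    """Swap the characters at index pos and pos + 1; return text unchanged if out of range."""
    if pos < 0 or pos + 1 >= len(text):
        return text
    return text[:pos] + text[pos + 1] + text[pos] + text[pos + 2:]


def map_char(text, pos, mapping):
    """Replace the character at index pos according to mapping; keep it if unmapped."""
    ch = text[pos]
    return text[:pos] + mapping.get(ch, ch) + text[pos + 1:]


# Usage on the examples from the slide.
removed = remove_ngram("abcdefg", 2)            # drops "cde"
flipped = flip_chars("wizard", 2)               # swaps "z" and "a"
mapped = map_char("The End.", 7, {".": "!"})    # maps "." to "!"
```

Each operator changes only a few characters, so applying one yields a successor state whose n-gram distribution differs only slightly from that of its parent.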
