Authorship Obfuscation Using Heuristic Search
Master’s Thesis Defence by Janek Bevendorff on 20 June 2018
Supervisors: Prof. Dr. Benno Stein, PD Dr. Andreas Jakoby
• Unmasking for short texts
• Obfuscation against unmasking
• Obfuscation against compression models
• Authorship verification quality measure proposal
• Obfuscation safety analysis and definitions
• Side effect analysis
• JS∆ as authorship metric
• Adaptive obfuscation
• Design of an admissible obfuscation heuristic
• Analysis of consistency and monotonicity properties
• Design and implementation of an efficient obfuscation framework
• Development of obfuscation operators
• Inspection of search space challenges and solutions
Authorship
Koppel and Schler, Authorship verification as a one-class problem, 2004
[Figure labels: "Different authors", "Same author"]
[Figure: words shown on the slide: my, having, adventures, of, rest, morning, Island, Livesey, begin, the, gentlemen, will, the, Dr, certain, Treasure, story, I]
[Figure: accuracy (0.5 to 1.0) against rounds (0 to 21), with curves for "Same author" and "Different authors"]
Confidence Level   Threshold   Precision   % Classified
Very High          0.9         1.00          6.2
Very High          0.8         1.00         12.5
Very High          0.7         1.00         13.8
High               0.6         1.00         18.8
High               0.5         1.00         30.0
Moderate           0.4         0.93         43.8
Moderate           0.3         0.83         55.0
Moderate           0.2         0.68         70.0
Low                0.1         0.82         87.5
Low                0.0         0.76        100.0
Additional slide labels: Training, Test
Obfuscation
[Figure: accuracy (0.5 to 1.0) against rounds (0 to 21), with curves for "Same author" and "Different authors"]
\[ \mathrm{KLD}(P \,\|\, Q) = \sum_i P[i] \log_2 \frac{P[i]}{Q[i]} \]
\[ \mathrm{JSD}(P \,\|\, Q) = \frac{\mathrm{KLD}(P \,\|\, M) + \mathrm{KLD}(Q \,\|\, M)}{2} \;\to\; \text{maximize}, \qquad M = \frac{P + Q}{2} \]
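The two divergences can be tried out on small character n-gram distributions. The following is a minimal sketch rather than the thesis implementation: it writes the two distributions as P and Q and their mixture as M, and the helper names `ngram_distribution`, `kld`, and `jsd`, the trigram order, and the sample strings are assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_distribution(text, n=3):
    """Relative frequencies of the character n-grams in text."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def kld(p, q):
    """KLD(P || Q) in bits; entries with P[i] = 0 contribute nothing.

    Assumes Q[i] > 0 wherever P[i] > 0, which always holds for the
    mixture M used inside jsd().
    """
    return sum(p_i * math.log2(p_i / q[gram])
               for gram, p_i in p.items() if p_i > 0)


def jsd(p, q):
    """JSD(P || Q) = (KLD(P || M) + KLD(Q || M)) / 2 with M = (P + Q) / 2."""
    grams = set(p) | set(q)
    m = {g: (p.get(g, 0.0) + q.get(g, 0.0)) / 2 for g in grams}
    return (kld(p, m) + kld(q, m)) / 2


# Illustrative inputs only (not taken from the slides).
p = ngram_distribution("the quick brown fox jumps over the lazy dog")
q = ngram_distribution("a quick brown dog jumps over the lazy fox")
print(jsd(p, p))  # identical distributions
print(jsd(p, q))  # differing distributions
```

With base-2 logarithms the JSD stays between 0 (identical distributions) and 1 (no shared n-grams), so an obfuscator that maximizes it has a bounded target.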
\[ \frac{\partial}{\partial Q[i]} \left( P[i] \log_2 \frac{P[i]}{Q[i]} \right) = -\frac{P[i]}{Q[i] \ln 2} \]
\[ R_{\mathrm{KL}}(i) = \frac{P[i]}{Q[i]} \;\to\; \text{maximize} \]
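The ranking criterion can be sketched in a few lines: for two n-gram distributions, here called P and Q, each n-gram is scored by the ratio of its relative frequency in P to that in Q, and the largest ratios come first. This is an illustrative sketch only. The function names are hypothetical, and restricting the ranking to n-grams that occur in both texts (so the ratio is always defined) is an assumption, not something the slides state.

```python
from collections import Counter


def ngram_counts(text, n=3):
    """Absolute counts of the character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def rank_ngrams(text_p, text_q, n=3):
    """Rank n-grams found in both texts by P[i] / Q[i], largest first."""
    counts_p, counts_q = ngram_counts(text_p, n), ngram_counts(text_q, n)
    total_p, total_q = sum(counts_p.values()), sum(counts_q.values())
    ratios = {
        gram: (counts_p[gram] / total_p) / (counts_q[gram] / total_q)
        # Assumption: skip n-grams missing from either text.
        for gram in counts_p.keys() & counts_q.keys()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Illustrative toy input with bigrams.
for gram, ratio in rank_ngrams("aaab", "aabb", n=2):
    print(gram, ratio)
```

The n-grams at the top of this list are the ones where a change to the second distribution moves the KLD term the most, which is why they are natural first targets for an obfuscation step.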
[Figure: n-gram frequencies of Text 1 and Text 2 (to be obfuscated), with n-grams ranked left to right: ny_, ly_, par, bor, y_h, hel, eme, gro, dis, gre]
[Figure: JS distance (JS∆, 0.4 to 1.4) against text length in characters (2^7 to 2^14) for "Same author" and "Different authors", with lines marked ε_0 and ε_0.5]
\[ \mathrm{JS}_\Delta = \sqrt{2 \cdot \mathrm{JSD}(P \,\|\, Q)} \]
Left table:
Confidence Level   Threshold   Precision   % Classified
Very High          0.9         1.00          6.2
Very High          0.8         1.00         12.5
Very High          0.7         1.00         13.8
High               0.6         1.00         18.8
High               0.5         1.00         30.0
Moderate           0.4         0.93         43.8
Moderate           0.3         0.83         55.0
Moderate           0.2         0.68         70.0
Low                0.1         0.82         87.5
Low                0.0         0.76        100.0

Right table:
Confidence Level   Threshold   Precision   % Classified
Very High          0.9         0.00          2.5
Very High          0.8         0.00          5.0
Very High          0.7         0.00          8.7
High               0.6         0.00         17.5
High               0.5         0.00         27.5
Moderate           0.4         0.00         42.5
Moderate           0.3         0.67         66.7
Moderate           0.2         0.50         70.0
Low                0.1         0.42         85.0
Low                0.0         0.53        100.0
Heuristic Search
[Figure: search graph growing from the start node s, with node sets CLOSED and OPEN and nodes labelled f(n)]
\[ f(n) = g(n) + h(n) \]
\[ h(n) \le h^*(n) \]
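The evaluation function f(n) = g(n) + h(n) with an admissible h(n) ≤ h*(n) is the core of A* search. A generic sketch with an OPEN priority queue and a CLOSED set follows. It is not the thesis framework: the function name `a_star`, its callback parameters, and the toy graph are assumptions for illustration.

```python
import heapq
import itertools


def a_star(start, is_goal, successors, h):
    """Return (cost, path) of a cheapest path to a goal, or None.

    successors(node) yields (next_node, step_cost) pairs. h(node) must not
    overestimate the remaining cost. Because closed nodes are never
    reopened here, h should also be consistent for the result to be optimal.
    """
    tie = itertools.count()  # tie-breaker so heap entries never compare nodes
    open_heap = [(h(start), next(tie), 0, start, [start])]  # OPEN, ordered by f
    closed = set()                                          # CLOSED
    while open_heap:
        _, _, g, node, path = heapq.heappop(open_heap)
        if is_goal(node):
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost in successors(node):
            if nxt not in closed:
                g_next = g + cost
                f_next = g_next + h(nxt)  # f(n) = g(n) + h(n)
                heapq.heappush(open_heap, (f_next, next(tie), g_next, nxt, path + [nxt]))
    return None


# Toy graph (illustrative): edges with costs and a consistent heuristic.
graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 5)], "b": [("t", 1)], "t": []}
estimate = {"s": 3, "a": 2, "b": 1, "t": 0}
result = a_star("s", lambda n: n == "t", lambda n: graph[n], lambda n: estimate[n])
print(result)  # (4, ['s', 'a', 'b', 't'])
```

In the obfuscation setting a node would be a text variant and an edge one applied operator, but that mapping is not part of this sketch.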
\[ h_{\mathrm{prior}}(n) = \varepsilon - \mathrm{JS}_\Delta(n) \]
\[ \mathrm{norm}(n) = \frac{g(n)}{\mathrm{JS}_\Delta(n) - \mathrm{JS}_\Delta(0)} \]
\[ h(n) = h_{\mathrm{prior}}(n) \cdot \mathrm{norm}(n) \]
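Reading the slide's formulas as h_prior(n) = ε − JS∆(n), norm(n) = g(n) / (JS∆(n) − JS∆(0)), and h(n) = h_prior(n) · norm(n), the heuristic multiplies the distance still missing by the cost paid so far per unit of distance gained. The sketch below illustrates that reading. The function name, the parameter names, and the handling of the two edge cases are assumptions, not taken from the thesis.

```python
def heuristic(g_n, js_n, js_0, epsilon):
    """h(n) = h_prior(n) * norm(n) for a single search node.

    g_n:     path cost accumulated so far, g(n)
    js_n:    JS distance at the current node
    js_0:    JS distance at the start node
    epsilon: target JS distance
    """
    if js_n >= epsilon:
        return 0.0          # assumption: target reached, nothing left to pay
    gain = js_n - js_0
    if gain <= 0:
        return 0.0          # assumption: no progress yet, no rate to extrapolate
    h_prior = epsilon - js_n    # distance still missing
    norm = g_n / gain           # cost per unit of distance gained so far
    return h_prior * norm


# Illustrative numbers: cost 10 bought a gain of 0.1, and 0.3 is still missing.
print(heuristic(g_n=10, js_n=0.9, js_0=0.8, epsilon=1.2))  # roughly 30
```

Returning 0 in the edge cases keeps the estimate from overshooting, at the price of giving the search no guidance at those nodes.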
[Figure: Linear Gain. Shown: g(n), h(n), JS∆(n), and ε]
[Figure: Sublinear Gain. Shown: h(n), JS∆(n), and ε]
[Figure: h(n), JS∆(n), stepwise JS∆, and ε against the number of operations (0 to 400)]
• n-gram removal: abcdefg → abfg
• character flip: e.g. swapping the "z" and the "a" in "wizard"
• character map: "The End." → "The End!"
• synonym: e.g. for "house"
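The operators named on this slide can be sketched as small string transformations. The functions below are illustrative only: their names and signatures are hypothetical, and the synonym "home" for "house" is a made-up example, since the slide does not say which replacement is used.

```python
def remove_ngram(text, start, n):
    """n-gram removal: drop n characters beginning at index start."""
    return text[:start] + text[start + n:]


def flip_characters(text, i):
    """Character flip: swap the characters at positions i and i + 1."""
    if not 0 <= i < len(text) - 1:
        raise IndexError("i must address a character that has a right neighbour")
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def map_character(text, i, mapping):
    """Character map: replace the character at i if the mapping covers it."""
    return text[:i] + mapping.get(text[i], text[i]) + text[i + 1:]


def replace_synonym(text, word, synonym):
    """Synonym: replace the first occurrence of word with the given synonym."""
    return text.replace(word, synonym, 1)


print(remove_ngram("abcdefg", 2, 3))               # abfg
print(flip_characters("wizard", 2))                # wiazrd
print(map_character("The End.", 7, {".": "!"}))    # The End!
print(replace_synonym("a house", "house", "home")) # a home (hypothetical synonym)
```

Each function returns a new string and leaves its input untouched, so every application can serve as one edge from a text variant to its successor in a search graph.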