1. Towards dependable steganalysis
Tomáš Pevný (a, c), Andrew D. Ker (b)
(a) Cisco Systems, Inc., Cognitive Research Team in Prague, CZ
(b) Department of Computer Science, University of Oxford, UK
(c) Department of Computers, CVUT in Prague, CZ
10th February 2015, SPIE/IS&T Electronic Imaging

2. Motivation
[ROC plot: detection accuracy vs. false positive rate, linear false-positive axis from 0 to 1]

3. Motivation
[ROC plot: detection accuracy vs. false positive rate, logarithmic false-positive axis from 10⁻⁶ to 10⁰]

4. Millions of images
◮ In 2014, Yahoo! released 100 million CC-licensed Flickr images.
◮ We selected images with quality factor 80 and a known camera, split into two sets:

    Training & validation:  449 395 cover + 449 395 stego images, from 4 781 users
    Testing:                4 062 128 cover + 407 417 stego images, from 43 026 users

◮ Stego images: nsF5 at 0.5 bits per nonzero coefficient.
◮ JRM features computed from every image.

5. Motivation
What is a good benchmark?
◮ Equal prior error rate?
◮ Emphasizing false positives?
Our error measure (FP-50): the false positive rate at 50% detection accuracy.
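
The FP-50 measure can be evaluated empirically once a detector produces real-valued scores. Below is a minimal sketch (assuming NumPy, and hypothetical score arrays scores_cover and scores_stego where a larger score means "more likely stego"); it illustrates the definition and is not the authors' evaluation code.

```python
import numpy as np

def fp50(scores_cover, scores_stego):
    """False positive rate at 50% detection accuracy (FP-50).

    Thresholding at the median stego score detects 50% of the stego
    images by construction; FP-50 is then the fraction of cover
    images whose score still exceeds that threshold.
    """
    threshold = np.median(scores_stego)
    return np.mean(scores_cover > threshold)

# Hypothetical usage with a linear classifier f(x) = w.T @ x:
# print(fp50(X_cover @ w, X_stego @ w))
```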

7. Mathematical formulation
Exact optimization criterion:
$$\arg\min_{f \in \mathcal{F}} \; \mathbb{E}_{x \sim \mathrm{cover}} \Big[ I\big( f(x) > \operatorname{median}\{ f(y) \mid y \sim \mathrm{stego} \} \big) \Big]$$
◮ $I(\cdot)$ is the indicator function
◮ $\mathcal{F}$ is the set of classifiers
Simplifications:
◮ Restrict $\mathcal{F}$ to linear classifiers.
◮ Replace the median by the mean:
$$\arg\min_{f \in \mathcal{F}} \; \mathbb{E}_{x \sim \mathrm{cover}} \Big[ I\big( f(x) > \mathbb{E}_{y \sim \mathrm{stego}}[ f(y) ] \big) \Big]$$
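
For a linear classifier f(x) = wᵀx, both the exact criterion and its mean-based simplification can be written down directly. A minimal NumPy sketch (the feature matrices X_cover and X_stego are hypothetical names) also makes visible that the indicator yields a piecewise-constant objective in w, which is what motivates the convex surrogates on the following slides.

```python
import numpy as np

def exact_objective(w, X_cover, X_stego):
    """E_{x~cover}[ I( f(x) > median{ f(y) : y~stego } ) ] for f(x) = w.T x."""
    threshold = np.median(X_stego @ w)
    return np.mean(X_cover @ w > threshold)

def simplified_objective(w, X_cover, X_stego):
    """Median replaced by the mean: I( w.T x > E[w.T y] ) = I( w.T (x - ybar) > 0 )."""
    ybar = X_stego.mean(axis=0)
    return np.mean((X_cover - ybar) @ w > 0)
```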

9. Approximation by square loss
[Plot: square loss and the indicator I as functions of the distance from the hyperplane]
Optimization criterion:
$$\arg\min_{w} \; \sum_{x \in \mathrm{cover}} \big( w^{T}(x - \bar{y}) + 1 \big)^{2} + \lambda \lVert w \rVert^{2}$$
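
The square-loss criterion has a closed-form minimizer: setting the gradient to zero gives a ridge-regression-style linear system, which is where the connection to the FLD comes from. The following is a sketch under that reading (NumPy assumed; function and argument names are illustrative, not the authors' implementation).

```python
import numpy as np

def square_loss_classifier(X_cover, X_stego, lam=1.0):
    """Closed-form minimizer of  sum_cover (w.T(x - ybar) + 1)^2 + lam * ||w||^2.

    Setting the gradient to zero gives the ridge-like system
    (Z.T Z + lam * I) w = -Z.T 1, where Z holds the cover features
    centred on the stego mean; this is what ties the square loss to
    an FLD-style solution.
    """
    ybar = X_stego.mean(axis=0)
    Z = X_cover - ybar                               # covers centred on the stego mean
    A = Z.T @ Z + lam * np.eye(Z.shape[1])
    b = -Z.T @ np.ones(Z.shape[0])
    return np.linalg.solve(A, b)
```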

10. Approximation by hinge loss
[Plot: hinge loss and the indicator I as functions of the distance from the hyperplane]
Optimization criterion:
$$\arg\min_{w} \; \sum_{x \in \mathrm{cover}} \max\big\{ 0,\; w^{T}(x - \bar{y}) + 1 \big\} + \lambda \lVert w \rVert^{2}$$
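
The hinge surrogate is convex but not differentiable at the kink, so a simple way to minimize it is subgradient descent. A sketch (NumPy assumed; step size, iteration count, and names are illustrative):

```python
import numpy as np

def hinge_loss_classifier(X_cover, X_stego, lam=1.0, lr=1e-3, iters=1000):
    """Subgradient descent on  sum_cover max{0, w.T(x - ybar) + 1} + lam * ||w||^2."""
    ybar = X_stego.mean(axis=0)
    Z = X_cover - ybar
    w = np.zeros(Z.shape[1])
    for _ in range(iters):
        active = (Z @ w + 1.0) > 0                   # covers currently inside the margin
        grad = Z[active].sum(axis=0) + 2.0 * lam * w
        w -= lr * grad
    return w
```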

11. Approximation by exponential loss
[Plot: exponential loss and the indicator I as functions of the distance from the hyperplane]
Optimization criterion:
$$\arg\min_{w} \; \sum_{x \in \mathrm{cover}} e^{\,w^{T}(x - \bar{y})} + \lambda \lVert w \rVert^{2}$$
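
The exponential surrogate is smooth, so plain gradient descent applies; note how covers far on the wrong side of the hyperplane receive exponentially large weight, which is consistent with the over-fitting noted in the summary. A sketch with illustrative hyper-parameters (not the values used in the experiments):

```python
import numpy as np

def exp_loss_classifier(X_cover, X_stego, lam=1.0, lr=1e-4, iters=1000):
    """Gradient descent on  sum_cover exp(w.T(x - ybar)) + lam * ||w||^2."""
    ybar = X_stego.mean(axis=0)
    Z = X_cover - ybar
    w = np.zeros(Z.shape[1])
    for _ in range(iters):
        weights = np.exp(Z @ w)                      # per-cover exponential losses
        grad = Z.T @ weights + 2.0 * lam * w
        w -= lr * grad
    return w
```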

12. Toy example
[Two scatter plots of the Banana Set, Feature 1 vs. Feature 2: left, Fisher linear discriminant; right, optimizing exponential loss]

13. Linear classifiers on JRM features
◮ 22 510 features
◮ 2 x 40 000 training images
◮ 2 x 250 000 validation images

    FP-50             FLD           Square loss   Exponential loss   weighted SVM*
    training set      1.11 · 10⁻⁴   2.18 · 10⁻⁵   1.45 · 10⁻⁵        0
    validation set    2.52 · 10⁻⁴   1.99 · 10⁻⁴   5.61 · 10⁻⁴        9.87 · 10⁻⁴

* $\arg\min_{w} \; \eta\, \mathbb{E}_{x \sim \mathrm{cover}} \max\{0,\, w^{T}x\} + (1 - \eta)\, \mathbb{E}_{y \sim \mathrm{stego}} \max\{0,\, -w^{T}y\} + \lambda \lVert w \rVert^{2}$

14. Optimizing an ensemble
Ensembles based on random subspaces à la Kodovský:
◮ L base learners,
◮ each trained on a random subset of d_sub features, using all of the data.
Two thresholds (see the sketch below):
◮ base learner threshold: optimize equal prior accuracy, or use the Neyman-Pearson criterion (identical FP rate)
◮ voting threshold: majority vote, or an arbitrary threshold
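
A minimal sketch of this ensemble construction (NumPy assumed; fld_direction is an illustrative stand-in for whichever base learner is used, and base_fp is a hypothetical target false positive rate): each base learner is trained on a random d_sub-dimensional subspace of the features and thresholded Neyman-Pearson style on the training covers, while the voting threshold on the number of positive votes is left free.

```python
import numpy as np

def fld_direction(Xc, Xs, reg=1e-3):
    """Regularised Fisher-style direction on a feature subset (illustrative base learner)."""
    S = np.cov(np.vstack([Xc, Xs]).T) + reg * np.eye(Xc.shape[1])
    return np.linalg.solve(S, Xs.mean(axis=0) - Xc.mean(axis=0))

def train_ensemble(X_cover, X_stego, L=300, d_sub=100, base_fp=0.01, seed=None):
    """Random-subspace ensemble with Neyman-Pearson base-learner thresholds.

    Each of the L base learners sees a random subset of d_sub features and
    all of the training images; its threshold is set so that it yields the
    same target false positive rate (base_fp) on the training covers.
    """
    rng = np.random.default_rng(seed)
    d = X_cover.shape[1]
    learners = []
    for _ in range(L):
        idx = rng.choice(d, size=d_sub, replace=False)
        w = fld_direction(X_cover[:, idx], X_stego[:, idx])
        t = np.quantile(X_cover[:, idx] @ w, 1.0 - base_fp)   # Neyman-Pearson threshold
        learners.append((idx, w, t))
    return learners

def ensemble_votes(learners, X):
    """Number of base learners voting 'stego' for each image; the final voting
    threshold on this count is left free (majority vote is just one choice)."""
    return sum((X[:, idx] @ w > t).astype(int) for idx, w, t in learners)
```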

17. ROC of ensembles
◮ 2 x 40 000 training images
◮ 2 x 250 000 validation images
[ROC plot: detection accuracy vs. false positive rate (log scale, 10⁻⁶ to 10⁰) for the FLD, square loss, and exponential loss ensembles; L = 300, d_sub = 1000]

18. ROC of ensembles
◮ 2 x 40 000 training images
◮ 2 x 250 000 validation images
[ROC plot: detection accuracy vs. false positive rate (log scale, 10⁻⁶ to 10⁰) for the FLD, square loss, and exponential loss ensembles; L = 300, d_sub = 500]

19. ROC of ensembles
◮ 2 x 40 000 training images
◮ 2 x 250 000 validation images
[ROC plot: detection accuracy vs. false positive rate (log scale, 10⁻⁶ to 10⁰) for the FLD, square loss, and exponential loss ensembles; L = 300, d_sub = 250]

20. ROC of ensembles
◮ 2 x 40 000 training images
◮ 2 x 250 000 validation images
[ROC plot: detection accuracy vs. false positive rate (log scale, 10⁻⁶ to 10⁰) for the FLD, square loss, and exponential loss ensembles; L = 300, d_sub = 100]

21. ROC of ensembles
◮ 4.5M image testing set:
  ◮ false negative rate 51.2%
  ◮ false positive rate 5.56 · 10⁻⁵
[ROC plot: detection accuracy vs. false positive rate (log scale, 10⁻⁶ to 10⁰) for the FLD, square loss, and exponential loss ensembles; L = 300, d_sub = 100]

22. Errors on testing set

    Base learner       Thresholds    False negative rate   False positive rate
    FLD                Traditional   1.33 · 10⁻³           9.07 · 10⁻³
    FLD                Proposed      4.58 · 10⁻¹           3.26 · 10⁻⁴
    Exponential loss   Proposed      5.12 · 10⁻¹           5.56 · 10⁻⁵

23. Summary
◮ Classifiers derived from the FP-50 measure.
◮ The same classifiers can be derived in two different ways.
◮ Various convex surrogates for the step function:
  ◮ The non-smooth loss is difficult to optimize.
  ◮ Exponential loss encourages over-fitting.
  ◮ Square loss (FLD) has a hidden weakness.
◮ The ensemble subdimension is an indirect regularizer.
◮ Ensemble thresholds need to be optimized differently.

24. Summary
[Scatter plot of the Banana Set, Feature 1 vs. Feature 2]

25. Summary
[Plot: square loss and the indicator I as functions of the distance from the hyperplane]

26. Summary
◮ We detected lousy (very high bit-rate) steganography with a 1-in-18 000 false positive rate.
