Towards dependable steganalysis
Tomáš Pevný [a,c], Andrew D. Ker [b]
[a] Cisco Systems, Inc., Cognitive Research Team in Prague, CZ
[b] Department of Computer Science, University of Oxford, UK
[c] Department of Computers, CVUT in Prague, CZ
10th February 2015, SPIE/IS&T Electronic Imaging
Motivation
[ROC curve: detection accuracy vs. false positive rate, linear false-positive axis]
Motivation
[ROC curve: detection accuracy vs. false positive rate, logarithmic false-positive axis from 10⁻⁶ to 10⁰]
Millions of images
◮ In 2014, Yahoo! released 100 million CC-licensed Flickr images.
◮ We selected images with quality factor 80 and a known camera, and split them into two sets:

  Training & validation:   449 395 cover +  449 395 stego images, from  4 781 users
  Testing:               4 062 128 cover +  407 417 stego images, from 43 026 users

◮ Stego images: nsF5 at 0.5 bits per nonzero coefficient.
◮ JRM features computed from every image.
Motivation
What is a good benchmark?
◮ Equal prior error rate?
◮ Emphasizing false positives?

Our error measure (FP-50): the false positive rate at 50% detection accuracy.
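As a concrete illustration (not part of the original slides), here is a minimal sketch of how the FP-50 measure could be computed from raw detector scores; the score arrays and the convention that larger scores mean "stego" are assumptions.

```python
import numpy as np

def fp_50(cover_scores, stego_scores):
    """False positive rate at 50% detection accuracy.

    The threshold is placed at the median of the stego scores, so exactly
    half of the stego images are detected; FP-50 is then the fraction of
    cover images whose score exceeds that threshold.  Assumes larger
    scores indicate "stego".
    """
    threshold = np.median(stego_scores)
    return np.mean(cover_scores > threshold)

# Hypothetical usage with Gaussian toy scores:
rng = np.random.default_rng(0)
cover = rng.normal(0.0, 1.0, size=1_000_000)
stego = rng.normal(3.0, 1.0, size=1_000_000)
print(fp_50(cover, stego))   # roughly the Gaussian tail P(N(0,1) > 3)
```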
Mathematical formulation

Exact optimization criterion:
\[
\arg\min_{f \in \mathcal{F}} \; \mathbb{E}_{x \sim \mathrm{cover}} \Big[ I\big( f(x) > \operatorname{median}\{ f(y) \mid y \sim \mathrm{stego} \} \big) \Big]
\]
◮ I(·) is the indicator function
◮ F is the set of classifiers

Simplifications:
◮ Restrict F to linear classifiers.
◮ Replace the stego median by the stego mean:
\[
\arg\min_{f \in \mathcal{F}} \; \mathbb{E}_{x \sim \mathrm{cover}} \Big[ I\big( f(x) > \mathbb{E}_{y \sim \mathrm{stego}}[\, f(y) \,] \big) \Big]
\]
Approximation by square loss
[Plot: square loss vs. distance from the hyperplane, compared with the indicator I]

Optimization criterion (ȳ denotes the mean stego feature vector):
\[
\arg\min_{w} \sum_{x \in \mathrm{cover}} \big( w^{T}(x - \bar{y}) \big)^{2} + \lambda \lVert w \rVert^{2}
\]
Approximation by hinge loss
[Plot: hinge loss vs. distance from the hyperplane, compared with the indicator I]

Optimization criterion:
\[
\arg\min_{w} \sum_{x \in \mathrm{cover}} \max\!\big\{ 0,\; w^{T}(x - \bar{y}) + 1 \big\} + \lambda \lVert w \rVert^{2}
\]
Approximation by exponential loss
[Plot: exponential loss vs. distance from the hyperplane, compared with the indicator I]

Optimization criterion:
\[
\arg\min_{w} \sum_{x \in \mathrm{cover}} e^{\, w^{T}(x - \bar{y})} + \lambda \lVert w \rVert^{2}
\]
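To make the three surrogates concrete, here is a minimal sketch (not the authors' implementation; plain gradient descent, the clipping of the exponent, and all parameter values are my own assumptions) of minimizing each convex surrogate of the indicator over the cover features, with the stego mean ȳ held fixed.

```python
import numpy as np

def fit_linear(cover, stego_mean, loss="exp", lam=1e-2, lr=1e-3, iters=2000):
    """Minimize sum_{x in cover} surrogate(w^T (x - stego_mean)) + lam*||w||^2
    by plain gradient descent, where surrogate() is a convex stand-in for the
    indicator I(w^T (x - stego_mean) > 0)."""
    X = cover - stego_mean              # shift every cover by the stego mean y-bar
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        z = X @ w                       # signed distances of the covers
        if loss == "square":            # z^2            -> derivative 2z
            g = 2.0 * z
        elif loss == "hinge":           # max(0, z + 1)  -> subgradient 1 where z > -1
            g = (z > -1.0).astype(float)
        elif loss == "exp":             # exp(z)         -> derivative exp(z)
            g = np.exp(np.clip(z, -30.0, 30.0))   # clipped to avoid overflow
        else:
            raise ValueError(f"unknown loss {loss!r}")
        grad = X.T @ g / len(X) + 2.0 * lam * w
        w -= lr * grad
    return w

# Hypothetical usage:
# w = fit_linear(cover_features, stego_features.mean(axis=0), loss="exp")
```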
Toy example
[Two scatter plots of the Banana Set (Feature 1 vs. Feature 2): left panel "Fisher linear discriminant", right panel "Optimizing exponential loss"]
Linear classifiers on JRM features
◮ 22 510 features
◮ 2 × 40 000 training images
◮ 2 × 250 000 validation images

  FP-50            FLD         Square loss   Exponential loss   weighted SVM*
  training set     1.11·10⁻⁴   2.18·10⁻⁵     1.45·10⁻⁵          0
  validation set   2.52·10⁻⁴   1.99·10⁻⁴     5.61·10⁻⁴          9.87·10⁻⁴

* weighted SVM: \( \arg\min_{w} \; \eta\, \mathbb{E}_{x \sim \mathrm{cover}} \max\{0, w^{T}x\} + (1-\eta)\, \mathbb{E}_{y \sim \mathrm{stego}} \max\{0, -w^{T}y\} + \lambda \lVert w \rVert^{2} \)
Optimizing an ensemble

Ensembles based on random subspaces à la Kodovský:
◮ L base learners,
◮ each trained on a random d_sub-dimensional feature subset, using all of the data.

Two thresholds to set (a code sketch follows below):
◮ base learner threshold:
  ◮ traditional: optimize equal-prior accuracy
  ◮ proposed: Neyman-Pearson criterion (identical false positive rate for every learner)
◮ voting threshold:
  ◮ traditional: majority vote
  ◮ proposed: an arbitrary (tuned) threshold
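The following is a minimal sketch of such a random-subspace ensemble with the proposed thresholds; the class name, the `train_base` callback, and the default parameter values are hypothetical — only the structure (L random subspaces, a per-learner Neyman-Pearson threshold, a tunable voting threshold) follows the bullet points above.

```python
import numpy as np

class RandomSubspaceEnsemble:
    """Kodovsky-style ensemble: L linear base learners, each trained on a random
    d_sub-dimensional feature subset, combined by thresholded voting."""

    def __init__(self, L=300, d_sub=100, base_fp_rate=0.3, votes_needed=None):
        self.L, self.d_sub = L, d_sub
        self.base_fp_rate = base_fp_rate    # per-learner FP rate (Neyman-Pearson threshold)
        self.votes_needed = votes_needed    # voting threshold; None falls back to majority vote
        self.models = []                    # list of (feature_subset, w, threshold)

    def fit(self, cover, stego, train_base, seed=0):
        """train_base(cover_sub, stego_sub) must return a weight vector w; a wrapper
        around the fit_linear sketch shown earlier could be plugged in here."""
        rng = np.random.default_rng(seed)
        d = cover.shape[1]
        for _ in range(self.L):
            idx = rng.choice(d, size=self.d_sub, replace=False)
            w = train_base(cover[:, idx], stego[:, idx])
            # Neyman-Pearson: give every base learner the same false positive rate
            # by thresholding at the corresponding quantile of the cover projections.
            t = np.quantile(cover[:, idx] @ w, 1.0 - self.base_fp_rate)
            self.models.append((idx, w, t))
        return self

    def votes(self, X):
        """Number of base learners that flag each row of X as stego."""
        return sum((X[:, idx] @ w > t).astype(int) for idx, w, t in self.models)

    def predict(self, X):
        k = self.votes_needed if self.votes_needed is not None else self.L // 2 + 1
        return self.votes(X) >= k           # True = "stego"
```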
ROC of ensembles
◮ 2 × 40 000 training images
◮ 2 × 250 000 validation images
[Four slides of log-scale ROC curves (detection accuracy vs. false positive rate) comparing ensembles of FLD, square loss, and exponential loss base learners, with L = 300 and d_sub = 1000, 500, 250, 100]
ROC of ensembles
◮ On the 4.5M-image testing set:
  ◮ false negative rate: 51.2%
  ◮ false positive rate: 5.56·10⁻⁵
[Log-scale ROC curves for FLD, square loss, and exponential loss ensembles; L = 300, d_sub = 100]
Errors on testing set

  Base learner       Thresholds    False negative rate   False positive rate
  FLD                Traditional   1.33·10⁻³             9.07·10⁻³
  FLD                Proposed      4.58·10⁻¹             3.26·10⁻⁴
  Exponential loss   Proposed      5.12·10⁻¹             5.56·10⁻⁵
Summary
◮ Classifiers derived from the FP-50 measure.
◮ The same classifiers can be derived in two different ways.
◮ Various convex surrogates for the step function:
  ◮ a non-smooth loss is difficult to optimize,
  ◮ exponential loss encourages over-fitting,
  ◮ square loss (FLD) has a hidden weakness.
◮ The ensemble subdimension is an indirect regularizer.
◮ Ensemble thresholds need to be optimized differently.
Summary
[Scatter plot: a Banana Set example (Feature 1 vs. Feature 2)]
Summary
[Plot: square loss vs. distance from the hyperplane, compared with the indicator I]
Summary
◮ We detected lousy, very high bit-rate steganography with a 1 in 18 000 false positive rate.