Matryoshka: Hiding Secret Communication in Plain Sight
Iris Safaka, Christina Fragouli, Katerina Argyraki
"Wanna play tennis today?"
"Rackets today only $50.00, click here!"
● Free communication systems → Give away some privacy
● Users are mostly aware of this trade-off
"I don't like the government"
"We want your user data"
"Alice likes tennis but not the government..."
● Governments and courts request user data from tech companies
– E.g. Google handed over data for 100K user accounts (2014)
Alice, Bob, Messaging Provider (Eve)
- Alice and Bob wish to communicate privately
- Eve always wants to know what they talk about
Encryption? → Interruption of free service
Alice, Bob, Messaging Provider (Eve)
Eve: "Let's check them out"
- Alice and Bob wish to communicate privately
- Eve always wants to know what they talk about
Encryption? → Looking suspicious
How about hiding the secret communication?
Steganography
● Hide secret data within other "innocent" data
Alice: "I love you" → Stego → Messaging Provider (Eve) → Stego → Bob: "I love you"
Eve: "Alice likes cats"
Linguistic steganography
● Traditional approaches apply automated modifications
– Embed secret message into a given text
– E.g. synonym substitution, sentence manipulation, etc.
● Drawbacks
– Introduce unnaturalness to the text
– Require off-line access to resources
– Modest covert rates
Our goal: human-like text, implementable, high rate
Matryoshka
Alice: "I love you" → Compression → 01100111 → Bits to words → "nice weather" → User enhancement → "Such a nice weather today!"
Messaging Provider (Eve)
Bob: "Such a nice weather today!" → Text cleaning → "nice weather" → Words to bits → 01100111 → Decompression → "I love you"
Challenge: minimize user intervention
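The text cleaning step on Bob's side can be pictured with a minimal sketch. It assumes that cleaning simply keeps the words found in the shared dictionary and drops everything the user added; the slides do not spell out the actual rule, so `clean_text` and `DICTIONARY_WORDS` are hypothetical names used for illustration only.

```python
import re

# Hypothetical vocabulary, taken from the dictionary entries shown on the slides.
DICTIONARY_WORDS = {"cat", "cook", "nice", "play", "cool", "weather", "run"}

def clean_text(sentence, vocabulary=DICTIONARY_WORDS):
    """Keep only the words that belong to the shared dictionary, in order.

    Assumption: the user enhancement step never inserts dictionary words,
    otherwise this filter could not tell payload words from filler.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t in vocabulary]

print(clean_text("Such a nice weather today!"))  # ['nice', 'weather']
```

Under this assumption the filler words "Such", "a" and "today" carry no bits, which is why user-inserted words lower the end-to-end covert rate.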
"I love you" → Mixed Huffman Coding → 00011111
Dictionary
0000: cat, cook, nice
0001: nice, play, cool
...
1111: cool, weather, run
Candidate words: 0001 → nice, play, cool; 1111 → cool, weather, run
Text Corpus → N-gram Language Model: nice → weather: 0.8, nice → run: 0.1
Selected words: nice weather
User Enhancement Interface → "Such a nice weather today!"
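The word selection in this diagram can be sketched as follows. This is one plausible reading, not the authors' implementation: it assumes fixed 4-bit blocks, a bigram model, and a Viterbi-style search for the most probable word sequence. Only the two probabilities 0.8 and 0.1 come from the slide; `DEFAULT_PROB` and the function name `bits_to_words` are assumptions.

```python
# Toy dictionary from the slide: each 4-bit block maps to a set of words.
DICTIONARY = {
    "0000": ["cat", "cook", "nice"],
    "0001": ["nice", "play", "cool"],
    "1111": ["cool", "weather", "run"],
}

# Bigram scores for (previous word, next word). The two entries are the
# values on the slide; DEFAULT_PROB for unseen pairs is an assumption.
BIGRAM = {("nice", "weather"): 0.8, ("nice", "run"): 0.1}
DEFAULT_PROB = 0.01

def bits_to_words(bits, block=4):
    """Pick one word per block so the product of bigram scores is maximal."""
    blocks = [bits[i:i + block] for i in range(0, len(bits), block)]
    # best[word] = (score, path) for the best path that ends in `word`
    best = {w: (1.0, [w]) for w in DICTIONARY[blocks[0]]}
    for b in blocks[1:]:
        nxt = {}
        for w in DICTIONARY[b]:
            score, path = max(
                ((s * BIGRAM.get((prev, w), DEFAULT_PROB), p)
                 for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[w] = (score, path + [w])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]

print(bits_to_words("00011111"))  # ['nice', 'weather']
```

The sketch raises `KeyError` for a block that is missing from the toy dictionary; a full dictionary would cover all 16 four-bit values.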
Encoder design
● Mixed Huffman Compression
– Character Huffman → names, unusual words, etc.
– Word Huffman → frequent English words
● Dictionary
– Maps bit sequences to sets of words
– Contains more frequent than infrequent words, with repetitions
● N-gram Language Model
– Models how dictionary words appear in natural language
● User Enhancement Interface
– Assists the user in completing the sentences
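A mixed word/character Huffman coder could look like the sketch below. The slides only state that frequent words use a word-level code and unusual words a character-level code; the escape symbol `ESC`, the end marker `END`, the toy frequency tables and the restriction to lowercase a-z are all assumptions made for illustration.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Return {symbol: bitstring} for a frequency table with at least two symbols."""
    tie = count()  # unique tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

# Hypothetical markers and toy frequencies (not taken from the slides).
ESC, END = "<esc>", "<end>"
WORD_CODE = huffman_code({"i": 5, "love": 3, "you": 4, ESC: 1})
CHAR_FREQS = {c: 1 for c in "abcdefghijklmnopqrstuvwxyz"}
CHAR_FREQS[END] = 3
CHAR_CODE = huffman_code(CHAR_FREQS)

def compress(message):
    """Word code for known words; ESC + character codes + END otherwise."""
    out = []
    for word in message.lower().split():
        if word in WORD_CODE and word != ESC:
            out.append(WORD_CODE[word])
        else:
            out.append(WORD_CODE[ESC])
            out.extend(CHAR_CODE[c] for c in word)  # assumes a-z only
            out.append(CHAR_CODE[END])
    return "".join(out)

def decompress(bits):
    """Invert compress() by walking the two prefix-free codes bit by bit."""
    word_dec = {b: s for s, b in WORD_CODE.items()}
    char_dec = {b: s for s, b in CHAR_CODE.items()}
    words, chars, buf, in_escape = [], [], "", False
    for bit in bits:
        buf += bit
        table = char_dec if in_escape else word_dec
        if buf not in table:
            continue
        sym, buf = table[buf], ""
        if in_escape and sym == END:
            words.append("".join(chars))
            chars, in_escape = [], False
        elif in_escape:
            chars.append(sym)
        elif sym == ESC:
            in_escape = True
        else:
            words.append(sym)
    return " ".join(words)

print(decompress(compress("I love you Alice")))  # i love you alice
```

Frequent words cost only a few bits each, while a name such as "Alice" falls back to the longer character-level code.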
Decoder design
Received: "Such a nice weather today!"
Dictionary
0000: cat, cook, nice
0001: nice, play, cool
...
1111: cool, weather, run
Candidate bit sequences: 00011111, 00001111
● Repeating words in the dictionary creates ambiguity
● Probabilistic decoder
– K-order Markov model of English characters
– Drops improbable sequences early
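The ambiguity on this slide can be reproduced with a short sketch that inverts the toy dictionary and enumerates every bit sequence consistent with the cleaned words. The function name `candidate_bitstrings` is hypothetical, and the ranking step (the K-order Markov model over characters) is deliberately left out because the slides do not give its details.

```python
from itertools import product

# Toy dictionary from the slide; "nice" and "cool" each appear in two entries.
DICTIONARY = {
    "0000": ["cat", "cook", "nice"],
    "0001": ["nice", "play", "cool"],
    "1111": ["cool", "weather", "run"],
}

def candidate_bitstrings(words, dictionary=DICTIONARY):
    """List every bit string that could have produced the given words."""
    # Invert the dictionary: word -> all bit blocks whose set contains it.
    inverse = {}
    for bits, entries in dictionary.items():
        for w in entries:
            inverse.setdefault(w, []).append(bits)
    options = [sorted(inverse[w]) for w in words]
    return ["".join(choice) for choice in product(*options)]

print(candidate_bitstrings(["nice", "weather"]))  # ['00001111', '00011111']
```

The number of candidates grows multiplicatively with each repeated word, which is presumably why the decoder prunes improbable sequences early instead of enumerating them all.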
Evaluation
● Experimentation with human users on Amazon's Mechanical Turk
"I have become tired of facebook's many years of existence. The change over the years by the engineers sucks. It seems facebook's wacky algorithm will never make sense. The posts make the code on facebook obsolete."
"Does facebook's CEO feed people feed dogs. Can't yet use data base set book. Two posts are uses people facebook apps. Mary Cox able humans into keeping up."
Evaluation
● Experimentation with human users on Amazon's Mechanical Turk
● User effort
– Average task completion time approx. 5 mins
– Average of 5 extra words inserted per sentence
● End-to-end covert rate
– Average 3 bits per word
– E.g. to hide 5 words we need to send 73 words
● Decoder error rate
– Zero error rate (~95%)
– Partially corrupted messages (~15% chars.)
Evaluation
● Automatic test: Is a sentence natural language (NL) or not?
Summary
● Linguistic steganography for reclaiming some privacy
● Human-like text, implementable, high covert rate
● Prototype implementation
● Experimentation on Mechanical Turk
● Automated steganalysis test
Next steps
● Investigate alternative automated steganalysis tests
– E.g. using Word Embeddings
● Identify further vulnerabilities and test
● Finalize system implementation
Questions?