Determine the true author of anonymous documents 1 /31
Team Worden • Marc Barrowclift • Travis Dutko • Corey Everitt • Jiakang Jin • Eric Nordstrom • Ivan "Frankie" Orrego • Stakeholder - Rachel Greenstadt • Advisor - Jeff Salvage 2 /31
What Is Worden? • Protect & identify authors of anonymous works • Real-world example - J.K. Rowling’s The Cuckoo’s Calling 3 /31
What Is Worden? • Our project - Dramatically restructure JStylo • Why care? 4 /31
Live Demo IDENTIFY ANONYMOUS DOCUMENT 5 /31
UI Design & User Studies • Average • Intermediate • Domain Expert 6 /31
Original Client / Server Architecture • IKVM.NET • Convert .jar dependencies into .NET assemblies callable from C# • Rapid prototyping due to familiarity 7 /31
Final Client / Server Architecture • Data prep and packaging done on client side to meet deadlines • Angular.js MVC Client • Spring MVC Server 8 /31
Client Architecture • Two-way data binding • Allows proper HTTP abstraction • Handles DOM manipulation • Control over information flow • Highly modularized 9 /31
Server Architecture • Only utilizing the “C” in MVC 1. Picks up HTTP traffic 2. Repackages it 3. Pipes to JStylo 4. Returns a JSON response to controller • SaaS (Stylometry as a Service) - Independent of web browser 10 /31
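The four controller steps on the slide above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the project's Spring MVC code: `handle_request`, `analyze_with_backend`, and the `documents`/`title`/`text` field names are all hypothetical, and the backend call is a stub standing in for JStylo.

```python
import json


def analyze_with_backend(documents):
    """Hypothetical stand-in for the call into the stylometry backend."""
    return {"documents_received": len(documents)}


def handle_request(raw_body):
    """Controller-only flow: receive, repackage, delegate, respond with JSON."""
    # 1. Pick up the HTTP payload (here: the raw request body as a string).
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return json.dumps({"status": "error", "message": "body is not valid JSON"})
    if not isinstance(payload, dict):
        return json.dumps({"status": "error", "message": "expected a JSON object"})

    # 2. Repackage it into the shape the backend expects (assumed field names).
    documents = [
        {"title": doc.get("title", ""), "text": doc.get("text", "")}
        for doc in payload.get("documents", [])
    ]

    # 3. Pipe the repackaged data to the backend.
    result = analyze_with_backend(documents)

    # 4. Return a JSON response to the caller.
    return json.dumps({"status": "ok", "result": result})
```

Because the handler only consumes and produces JSON, any client that can send an HTTP request could use it, which is the point of the "independent of web browser" bullet.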
Backend System Architecture • Feature Extraction Engine - Reduces documents to data • Machine Learning Engine - Interprets data 11 /31
Backend System Architecture • Feature Extraction Engine - Convert raw words into numeric data - Tools: JGAAP, Stanford POS tagger. 12 /31
Backend System Architecture • Feature Extraction Engine - Convert raw words into numeric data - Tools: JGAAP, Stanford POS tagger. Example: “Sell as a great software engineering projct” • Word Bigrams - Sell as - as a - a great - great software - software engineering - engineering projct • Misspellings - project → projct: count = 1 • Letter Bigrams - in: 2 - ro: 1 - ng: 2 - ll: 1 - … - se: 1 - ea: 1 13 /31
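The counts in the example above can be reproduced with a few lines of code. The sketch below is an illustration only, not how JGAAP or JStylo implement these features: the function names are hypothetical, letter bigrams are assumed to be counted within words and case-insensitively, and misspellings are detected with a hand-supplied lookup table rather than a real dictionary.

```python
import re
from collections import Counter


def word_bigrams(text):
    """Return adjacent word pairs in order of appearance."""
    words = re.findall(r"[A-Za-z']+", text)
    return list(zip(words, words[1:]))


def letter_bigrams(text):
    """Count adjacent letter pairs inside each word, ignoring case."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def misspellings(text, known_misspellings):
    """Count words found in a (hypothetical) misspelling -> correction table."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w in known_misspellings)


sentence = "Sell as a great software engineering projct"
pairs = word_bigrams(sentence)
letters = letter_bigrams(sentence)
typos = misspellings(sentence, {"projct": "project"})
```

Each function turns raw text into counts, which is the "convert raw words into numeric data" step; the counts can then be handed to a classifier as a feature vector.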
Backend System Architecture • Machine Learning Engine - Interprets/Classifies data - Tools: Weka, Apache Spark 14 /31
Backend System Architecture • Machine Learning Engine - Interprets/Classifies data - Tools: Weka, Apache Spark (Image Source: Wikipedia) 15 /31
Design & Construction • Open source development - Builds upon work from dozens of research students • Apache Spark machine learning library added • Refactoring - Separate each component into its own independent module - Decouple the third-party library Weka 16 /31
Before 17 /31
After 18 /31
Refactoring Progress 19 /31
Design & Construction Cont. • Design Patterns - Creational: Builder (API), Singleton - Structural: Adapter (Machine Learning integration), Decorator (Feature Extraction Engine), Facade (API) • Testing - 76% of code touched was covered 20 /31
Machine Learning Adapter 21 /31
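As a rough illustration of the Adapter pattern named on this slide, the sketch below wraps a library with its own method names behind one common interface. Everything here is hypothetical: `Classifier`, `ThirdPartyLearner`, and `ThirdPartyAdapter` are invented names, and the toy nearest-centroid learner merely stands in for a real library such as Weka or Apache Spark; it does not reflect their APIs.

```python
from abc import ABC, abstractmethod


class Classifier(ABC):
    """Common interface the rest of the pipeline codes against (hypothetical)."""

    @abstractmethod
    def train(self, rows, labels):
        ...

    @abstractmethod
    def classify(self, row):
        ...


class ThirdPartyLearner:
    """Stand-in for an external library with its own method names and data shape."""

    def __init__(self):
        self._centroids = {}

    def fit_model(self, labelled_points):
        # Expects a list of (label, vector) tuples; stores one centroid per label.
        grouped = {}
        for label, vector in labelled_points:
            grouped.setdefault(label, []).append(vector)
        self._centroids = {
            label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in grouped.items()
        }

    def predict_label(self, point):
        # Returns the label whose centroid is closest to the point.
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(point, centroid))

        return min(self._centroids, key=lambda lbl: squared_distance(self._centroids[lbl]))


class ThirdPartyAdapter(Classifier):
    """Translates the common interface into the wrapped library's calls."""

    def __init__(self, learner):
        self._learner = learner

    def train(self, rows, labels):
        self._learner.fit_model(list(zip(labels, rows)))

    def classify(self, row):
        return self._learner.predict_label(row)


adapter = ThirdPartyAdapter(ThirdPartyLearner())
adapter.train([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], ["alice", "alice", "bob"])
```

Supporting another library then means writing one more adapter class; callers that only know `Classifier` do not change.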
Feature Extraction Decorator 22 /31
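A minimal sketch of how a Decorator could layer feature extractors, assuming each extractor returns a dictionary of named numeric features. The class names and the two example features are hypothetical and are not taken from the JStylo code base.

```python
class FeatureExtractor:
    """Base component: extracts no features on its own."""

    def extract(self, text):
        return {}


class ExtractorDecorator(FeatureExtractor):
    """Wraps another extractor and passes its features through."""

    def __init__(self, inner):
        self._inner = inner

    def extract(self, text):
        # Copy so each layer adds to its own dictionary.
        return dict(self._inner.extract(text))


class WordCount(ExtractorDecorator):
    """Adds the number of whitespace-separated words."""

    def extract(self, text):
        features = super().extract(text)
        features["word_count"] = len(text.split())
        return features


class AverageWordLength(ExtractorDecorator):
    """Adds the mean word length (0.0 for empty text)."""

    def extract(self, text):
        features = super().extract(text)
        words = text.split()
        features["avg_word_length"] = (
            sum(len(w) for w in words) / len(words) if words else 0.0
        )
        return features


# Features are composed by nesting decorators around the base extractor.
extractor = AverageWordLength(WordCount(FeatureExtractor()))
features = extractor.extract("a great project")
```

The feature set is chosen at composition time, so adding a new feature means adding one class rather than editing an existing extractor.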
Annotated Demo TEST YOUR OWN DOCUMENT 23 /31
24 /31
25 /31
26 /31
27 /31
28 /31
Impact • Enhance understanding of privacy vulnerabilities in a surveillance world • Education is enough to combat naïve attacks [1] • Evolving JStylo into the next stage of its lifecycle [1] https://www.princeton.edu/~aylinc/papers/Aylin_PETS12_anonymouth.pdf 29 /31
Software Evolution • Increased Ease of Extension - Decoupling / Modularization - Industry Standard Design Pattern - New Machine Learning libraries - Better dependency management / updatability • New methods of processing - Web front end - Cluster-computing - JSON API 30 /31
Software Evolution Cont. • Future Work - More machine learning libraries - Feature Extraction overhaul - Verification – solving new problems 31 /31
A Special Thanks To Our Sponsor For sponsoring Worden’s server
Questions?