
Neural-Augmented Static Analysis of Android Communication



  1. Neural-Augmented Static Analysis of Android Communication. Jinman Zhao, Aws Albarghouthi, Vaibhav Rastogi, Somesh Jha, Damien Octeau. University of Wisconsin-Madison, Google.

  2. Key Idea: Use machine learning to refine results from static analysis.

  3. Static Analysis: False Positives. [Diagram: Program & Property → Static Analyzer → Must True / Unsure... / Must False. Labels: False Positives; Ranking problem.]

  4. Machine Learning to Augment. [Diagram: Program & Property → Static Analyzer → Must True / Unsure... / Must False. Train a model on the Must True and Must False results; predict a likelihood ∈ [0, 1] for the Unsure results.]

  5. Link Inference for Android Communication. [Diagram: Program & Property → Static Analyzer → Must True Links / May Links / Must False Links, over Inter-Component Communication links. Train a model on the Must True and Must False links; predict a likelihood ∈ [0, 1] for the May links.]

  6. Task: Link Inference in Android Communication

  7. Android ICC: A User’s Experience. [Figure: a Restaurant app showing a phone number "(xxx) xxx-xxxx" and an address "1234 Alice St. Orlando, FL"; a "Send a message" action with the text "I’d like to make a reservation ..."; a Malicious APP. Labels: Inter-Component Communication; Intent; Component w/ Filter.]

  8. Android ICC: An Example. Code view and (part of) the resolution logic. [Diagram: Intent + Filter → ICC link? Yes!]

  9. (Bigger part of) the resolution logic (Octeau et al., POPL’16)

  10. Previous Work: PRIMO ● PRIMO (Octeau et al., POPL’16) uses a hand-crafted probabilistic model that assigns probabilities to ICC links inferred by static analysis. ○ Laborious, error-prone, and requiring expert domain knowledge. ○ Difficulty catching up with the constantly evolving Android system.

  11. Questions

  12. #1 How can we triage may links with minimal expert domain knowledge? Neural networks.

  13. #2 How can we process inputs of complex data types in a systematic way? Type-directed encoder.

  14. #3 How do our models perform? Very good!

  15. #4 Are the models learning the right things? It seems so.

  16. We are not trying to… ● Propose new NN module ● Eliminate use of domain knowledge ● Rule out manual effort. We are trying to… ● Propose systematic way to construct NN ● Provide decent performance without expert knowledge ● Use less labour with more automation

  17. Approach, Part 1: How can we triage may links with minimal expert domain knowledge?

  18. Link-Inference Neural Network. LINN: An end-to-end encoder-and-classifier architecture. [Diagram: Intent → Encoder and Filter → Encoder; both encodings feed a Classifier that outputs a value in [0,1]. Must links are used to train the model; the model predicts on may links.]

  19. Approach, Part 2: How can we process inputs of complex data types in a systematic way?

  20. Model. [Diagram: Intent → Encoder and Filter → Encoder; both encodings feed a Classifier that outputs a value in [0,1].]
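
The encoder-and-classifier shape of slide 20 can be sketched in plain Python. This is a minimal illustration rather than the authors' model: the name `make_link_scorer`, the logistic classifier, and the toy length-based encoders are all assumptions made for the example. The slides only state that two encoders feed a classifier whose output lies in [0, 1].

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    """Squash a real-valued logit into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_link_scorer(
    encode_intent: Callable[[object], Vector],
    encode_filter: Callable[[object], Vector],
    weights: Sequence[float],
    bias: float,
) -> Callable[[object, object], float]:
    """Build a scorer: encode both sides, concatenate, apply a logistic layer.

    The logistic layer is a hypothetical stand-in for the classifier.
    """

    def score(intent: object, flt: object) -> float:
        features = encode_intent(intent) + encode_filter(flt)
        if len(features) != len(weights):
            raise ValueError("weights must match the concatenated encoding size")
        logit = sum(w * f for w, f in zip(weights, features)) + bias
        return sigmoid(logit)

    return score


# Toy usage: each encoder maps a string to a one-dimensional vector.
score = make_link_scorer(
    encode_intent=lambda i: [float(len(i))],
    encode_filter=lambda f: [float(len(f))],
    weights=[0.0, 0.0],
    bias=0.0,
)
print(score("VIEW", "VIEW"))  # zero weights give a logit of 0, i.e. 0.5
```

In a trained model the weights and both encoders would be learned jointly, which is what "end-to-end" refers to in slide 18.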

  21. Type-Directed Encoder. TDE: mapping type signature to neural network architecture. [Diagram: Type signature → (Rules) → Neural network template → (Instantiation) → Neural network.]

  22. An example: Encoding Product Types. Type: T := tuple(A, B). Instance: t := (a, b), with a : A and b : B. Given encoders encA (producing a-en ∈ R^n) and encB (producing b-en ∈ R^m), a function comb : R^n × R^m → R^l yields the encoder encT, producing t-en ∈ R^l.
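
The product-type rule of slide 22 is function composition: encode each component, then combine. A minimal sketch follows, assuming plain Python lists as vectors. The helper `make_product_encoder` and the toy encoders are hypothetical, and concatenation is just one possible choice of comb.

```python
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
Vector = List[float]


def make_product_encoder(
    enc_a: Callable[[A], Vector],
    enc_b: Callable[[B], Vector],
    comb: Callable[[Vector, Vector], Vector],
) -> Callable[[Tuple[A, B]], Vector]:
    """Build encT for T := tuple(A, B) from encA, encB and comb."""

    def enc_t(value: Tuple[A, B]) -> Vector:
        a, b = value
        return comb(enc_a(a), enc_b(b))

    return enc_t


# Toy component encoders (assumptions for illustration only).
def enc_float(x: float) -> Vector:
    return [x, x * x]            # n = 2


def enc_text(s: str) -> Vector:
    return [float(len(s))]       # m = 1


def concat(u: Vector, v: Vector) -> Vector:
    return u + v                 # l = n + m = 3


enc_t = make_product_encoder(enc_float, enc_text, concat)
print(enc_t((2.0, "abc")))  # [2.0, 4.0, 3.0]
```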

  23. Rules for type-directed encoding

  24. Android ICC: Our Abstraction. Type signatures. Intent: intent := tuple(act, cats); Action: act := optional(string); Categories: cats := set(string). Filter: filter := tuple(acts, cats); Actions: acts := set(string); Categories: cats := set(string). [Type tree: intent is a tuple of act (an optional string) and cats (a set of strings); each string is a list of char.]
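
To make the abstraction of slide 24 concrete, here is a simplified sketch of matching an intent against a filter using only these two fields. It follows the action and category tests described in Android's intent-filter documentation, and it is an assumption-laden simplification: it ignores the data (URI and MIME type) test, explicit intents, and permissions, and it is not the resolution logic of Octeau et al. shown in slides 8 and 9. The class and function names are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional


class Intent(NamedTuple):
    act: Optional[str]        # act := optional(string)
    cats: FrozenSet[str]      # cats := set(string)


class Filter(NamedTuple):
    acts: FrozenSet[str]      # acts := set(string)
    cats: FrozenSet[str]      # cats := set(string)


def resolves(intent: Intent, flt: Filter) -> bool:
    """Simplified action + category test for an implicit intent."""
    if intent.act is None:
        # An intent without an action passes if the filter lists any action.
        action_ok = len(flt.acts) > 0
    else:
        action_ok = intent.act in flt.acts
    # Every category in the intent must also be declared by the filter.
    category_ok = intent.cats <= flt.cats
    return action_ok and category_ok


view = Intent("android.intent.action.VIEW",
              frozenset({"android.intent.category.DEFAULT"}))
flt = Filter(
    frozenset({"android.intent.action.VIEW"}),
    frozenset({"android.intent.category.DEFAULT",
               "android.intent.category.BROWSABLE"}),
)
print(resolves(view, flt))  # True
```

Static analysis often recovers these fields only partially, which is why some links can be classified only as may links.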

  25. Type-Directed Encoder. [Diagram: the rules map the intent type signature to a neural network template. tuple (intent) → comb, producing intent-en; optional (act) → union, producing act-en; set (cats) → aggr, producing cats-en; list (string) → flat, producing str-en; char → enum, producing char-en.]
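
The mapping from type signature to template in slide 25 can be read as a recursive walk over the type tree. The sketch below uses the rule names from the slide (comb, union, aggr, flat, enum); the `Type` class, the `RULES` table, and the textual rendering are hypothetical scaffolding for illustration and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Type:
    kind: str                          # "tuple", "optional", "set", "list" or "char"
    args: Tuple["Type", ...] = ()      # component types, empty for "char"


# Type signatures from the Intent abstraction.
CHAR = Type("char")
STRING = Type("list", (CHAR,))
ACT = Type("optional", (STRING,))
CATS = Type("set", (STRING,))
INTENT = Type("tuple", (ACT, CATS))

# One template function per type constructor, as named on the slide.
RULES = {
    "tuple": "comb",
    "optional": "union",
    "set": "aggr",
    "list": "flat",
    "char": "enum",
}


def template(t: Type):
    """Return the template as a nested (function_name, children) tree."""
    return (RULES[t.kind], tuple(template(arg) for arg in t.args))


def render(tpl) -> str:
    """Pretty-print a template tree as nested function applications."""
    name, children = tpl
    if not children:
        return name
    return f"{name}({', '.join(render(c) for c in children)})"


print(render(template(INTENT)))  # comb(union(flat(enum)), aggr(flat(enum)))
```

The template fixes only the shape of the network; which concrete module implements each function is decided at instantiation time.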

  26. Type-Directed Encoder: Instantiation (typed-tree). The neural network template is instantiated into a neural network as follows: comb → TreeLSTM; union → switch; aggr → TreeLSTM; flat → CNN; enum → lookup.

  27. Type-Directed Encoder: Instantiation (str-rnn). The neural network template is instantiated into a neural network as follows: comb → concat; union → switch; aggr → max; flat → RNN; enum → lookup.
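
As a rough illustration of the str-rnn choices in slide 27 (concat, switch, max, RNN, lookup), the toy encoder below wires them together for an intent. Everything numeric here is an assumption: the hash-like `lookup` stands in for a trainable embedding table, the fixed-weight recurrence stands in for a trained RNN, and `NONE_VECTOR` stands in for a learned encoding of a missing action. The reading of "switch" as a dispatch on whether the optional value is present is also an assumption.

```python
import math
from typing import FrozenSet, List, Optional

DIM = 2
Vector = List[float]
NONE_VECTOR: Vector = [-1.0] * DIM   # placeholder for a learned "no action" vector


def lookup(ch: str) -> Vector:
    """enum -> lookup: stand-in for an embedding table indexed by character."""
    code = ord(ch)
    return [(code % 7) / 7.0, (code % 11) / 11.0]


def rnn(vectors: List[Vector]) -> Vector:
    """flat -> RNN: toy recurrence h = tanh(0.5 * h + x) with fixed weights."""
    h = [0.0] * DIM
    for x in vectors:
        h = [math.tanh(0.5 * hi + xi) for hi, xi in zip(h, x)]
    return h


def encode_string(s: str) -> Vector:
    return rnn([lookup(c) for c in s])


def switch(value: Optional[str]) -> Vector:
    """union -> switch: pick an encoding depending on which variant is present."""
    return list(NONE_VECTOR) if value is None else encode_string(value)


def max_pool(values: FrozenSet[str]) -> Vector:
    """aggr -> max: element-wise maximum, independent of element order."""
    encoded = [encode_string(v) for v in values]
    if not encoded:
        return [0.0] * DIM
    return [max(column) for column in zip(*encoded)]


def encode_intent(act: Optional[str], cats: FrozenSet[str]) -> Vector:
    """comb -> concat: join the action and category encodings."""
    return switch(act) + max_pool(cats)


print(len(encode_intent("android.intent.action.VIEW", frozenset({"a", "b"}))))  # 4
```

Swapping `max_pool` or `rnn` for another module changes the instantiation while leaving the template untouched, which is the kind of exploration slide 28 refers to.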

  28. A systematic way to build and explore structured NN.

  29. Experiments: Are our models correctly predicting links?

  30. Setup ● Dataset of 10,500 Android apps from Google Play. ● IC3 (Octeau et al., ICSE’15) for static analysis. ● PRIMO’s abstract matching for may/must partition. ● Simulated ground truth for may links. ● 4 instantiations of the TDE architecture.

      Set            # pairs   # positive   # negative
      training set   105,108   63,168       41,940
      testing set    43,680    29,260       14,420

  31. All instantiated models perform as well as PRIMO.

  32. Correlation: Our best model (typed-tree) fills the correlation gap by 72% compared to PRIMO, despite the harder setting.

  33. More Results for Our Best Model. ROC (left) and the distribution of predicted likelihood (right) from the typed-tree model. [Figure labels: Distribution; Correlation.]

  34. Interpretability: How do we know the model is learning the right thing?

  35. Sensitivity to Masking ● Picking distinctive values ● Ignoring less useful parts

  36. Learned Encodings: Semantically closer values receive more similar encodings. Visualized by t-SNE. [Figure labels: default; (.*); None.]

  37. Conclusion ● Neural-augmented static analysis ● Type-directed encoder ● Increased accuracy with less domain knowledge ● Interpretability study

  38. Future Work ● Apply to other analysis tasks ● Push machine learning into static analysis procedure

  39. Thanks for listening! Q & A
