

  1. A Method to Evaluate CFG Comparison Algorithms. Patrick P.F. Chan and Christian Collberg

  2. Research problem • Which CFG similarity algorithm is better? • If I come up with a new algorithm, how does it compare to the existing ones? • Is there a systematic way to compare CFG similarity algorithms?

  3. Research outcomes • A methodology to evaluate and compare CFG similarity algorithms • Comparison results of four CFG similarity algorithms • A survey of existing CFG similarity algorithms • A publicly available evaluation framework

  4. What is a CFG? • CFG stands for control-flow graph • A CFG represents all possible execution paths of a function • And thus, it encodes the function's behavior

  5. [Example CFG: Entry → a = input() → if a % 2 == 0 → True branch prints "even", False branch prints "odd" → Exit]
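The example CFG on slide 5 appears to come from a small program along the following lines; this is a reconstruction from the figure text, not code from the presentation.

```python
# Reconstruction of the program behind the example CFG on slide 5.
a = int(input())        # Entry block: read input
if a % 2 == 0:          # branch node: "if a % 2 == 0"
    print("even")       # True branch
else:
    print("odd")        # False branch
# Exit block
```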

  6. Why do we compare CFGs?

  7. Why do we compare CFGs? • Malware detection / classification: match a program's CFGs against the CFGs of known malware

  8. Why do we compare CFGs? • Software theft detection: how similar is a suspected pirated program to the original software?

  9. Why do we compare CFGs? • Grading programming assignments: how similar is a submission to the solution?

  10. Why do we compare CFGs? • Code clone detection: how similar are two code fragments?

  11. Why do we compare CFGs? • Detection of changes between different versions of a program

  12. Why do we compare CFGs? • Detection of changes between different versions of a program: match the nodes of the enhanced CFGs

  13. This leads to many algorithms to compare CFGs…

  14. Let’s use two existing algorithms to compare these two CFGs [Figure: CFG A with nodes 1-4 and CFG B with nodes 1-5]

  15. Algorithm 1 from Kruegel et al. • Extract subgraphs that have k nodes (k-subgraphs) from CFGs and match them

  16. [Figure: the k-node subgraphs extracted from CFG A and CFG B are compared pairwise] No match!
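To make the k-subgraph idea concrete, here is a minimal sketch; it is not code from the presentation nor Kruegel et al.'s actual fingerprinting scheme (which enumerates subgraphs via spanning trees and normalizes them). It enumerates connected k-node subgraphs of each CFG, reduces each to a crude canonical form, and checks whether the two CFGs share any of them. The edge lists at the bottom are purely hypothetical.

```python
from itertools import combinations, permutations

def connected_k_subgraphs(edges, k):
    """Enumerate node sets of size k that induce a (weakly) connected subgraph."""
    nodes = {u for e in edges for u in e}
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)              # treat edges as undirected for connectivity only
    subs = []
    for cand in combinations(sorted(nodes), k):
        cand_set = set(cand)
        seen, stack = {cand[0]}, [cand[0]]     # BFS inside the candidate set
        while stack:
            n = stack.pop()
            for m in (adj[n] & cand_set) - seen:
                seen.add(m)
                stack.append(m)
        if seen == cand_set:
            subs.append(cand)
    return subs

def canonical(edges, nodes):
    """Canonical form of the induced subgraph: lexicographically smallest
    relabeled edge set over all node orderings (fine for small k)."""
    edge_set = {(u, v) for u, v in edges if u in nodes and v in nodes}
    best = None
    for perm in permutations(nodes):
        relabel = {n: i for i, n in enumerate(perm)}
        form = tuple(sorted((relabel[u], relabel[v]) for u, v in edge_set))
        if best is None or form < best:
            best = form
    return best

def k_subgraph_match(edges_a, edges_b, k=3):
    """True if the two CFGs share at least one isomorphic k-node subgraph."""
    fps_a = {canonical(edges_a, s) for s in connected_k_subgraphs(edges_a, k)}
    fps_b = {canonical(edges_b, s) for s in connected_k_subgraphs(edges_b, k)}
    return bool(fps_a & fps_b)

# Hypothetical CFGs (not the ones from the slides):
cfg_a = [(1, 2), (2, 3), (2, 4)]
cfg_b = [(1, 2), (1, 3), (3, 4)]
print(k_subgraph_match(cfg_a, cfg_b, k=3))   # True: both contain a 3-node directed path
```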

  17. Algorithm 2 from Hu et al. • Approximates the minimum number of edit operations needed to transform one graph into another graph

  18. [Figure: cost matrix for CFG A vs. CFG B, with entries for the cost of matching nodes (e.g. node 1 of CFG A to node 1 of CFG B), the cost of deleting nodes of CFG A, the cost of deleting nodes of CFG B (e.g. node 4), and the cost of matching dummy nodes]

  19. [Figure: the resulting assignment between the nodes of CFG A and CFG B] Total cost = 5
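The standard way to compute such a matching is to build the (n+m) x (n+m) cost matrix and solve the assignment problem. The sketch below is not the presentation's code and not Hu et al.'s exact cost model; it uses SciPy's Hungarian-algorithm solver, and the node labels and unit costs at the bottom are made-up examples.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def approx_edit_distance(nodes_a, nodes_b, sub_cost, del_cost=1.0, ins_cost=1.0):
    """Bipartite-assignment approximation of graph edit distance.

    Real-node substitutions fill the top-left block, deletions of CFG A nodes
    sit on the top-right diagonal, insertions of CFG B nodes on the bottom-left
    diagonal, and dummy-to-dummy matches in the bottom-right block are free.
    """
    n, m = len(nodes_a), len(nodes_b)
    INF = 1e9
    cost = np.full((n + m, n + m), INF)
    for i, a in enumerate(nodes_a):
        for j, b in enumerate(nodes_b):
            cost[i, j] = sub_cost(a, b)          # match node a to node b
    for i in range(n):
        cost[i, m + i] = del_cost                # delete node i of CFG A
    for j in range(m):
        cost[n + j, j] = ins_cost                # insert node j of CFG B
    cost[n:, m:] = 0.0                           # dummy nodes match for free
    rows, cols = linear_sum_assignment(cost)     # Hungarian algorithm
    return cost[rows, cols].sum()

# Hypothetical example: label mismatch costs 1, perfect match costs 0.
nodes_a = ["entry", "branch", "call", "exit"]
nodes_b = ["entry", "branch", "call", "exit", "loop"]
dist = approx_edit_distance(nodes_a, nodes_b,
                            sub_cost=lambda a, b: 0.0 if a == b else 1.0)
print(dist)   # 1.0: the extra "loop" node of CFG B must be inserted
```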

  20. And there are many other algorithms… • Algorithm from Vujošević-Janičić et al. iteratively builds a similarity matrix between the nodes of the two CFGs, based on the similarity of their neighbors • Algorithm from Sokolsky et al. models the control-flow graphs using Labeled Transition Systems (LTS)
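For flavor only, here is a heavily simplified sketch of the neighbor-similarity idea: node pairs start fully similar, and each iteration re-scores a pair by how well their successor sets can be matched. This is a generic illustration under assumed adjacency-dict inputs, not the actual algorithm of Vujošević-Janičić et al.

```python
def neighbor_similarity(adj_a, adj_b, iters=10):
    """Rough stand-in for an iterative node-similarity matrix (not the actual
    algorithm of Vujosevic-Janicic et al.): two nodes are similar if their
    neighborhoods are similar. adj_a / adj_b must map every node (including
    exit nodes with no successors) to the set of its successors."""
    nodes_a, nodes_b = list(adj_a), list(adj_b)
    sim = {(u, v): 1.0 for u in nodes_a for v in nodes_b}   # start fully similar
    for _ in range(iters):
        new = {}
        for u in nodes_a:
            for v in nodes_b:
                na, nb = adj_a[u], adj_b[v]
                if not na and not nb:
                    new[(u, v)] = 1.0            # two exit nodes: identical
                    continue
                # credit each successor of u with its best match among v's successors
                best = sum(max((sim[(x, y)] for y in nb), default=0.0) for x in na)
                new[(u, v)] = best / max(len(na), len(nb))
        sim = new
    return sim
```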

  21. But which one is the best?

  22. Evaluation of CFG similarity algorithms • Start by generating CFGs G1, G2, ..., Gi with increasing edit distances with respect to a seed CFG G0, i.e. ED(G0, Gi) = i • Use the algorithm under evaluation to rank the CFGs such that the higher the similarity score between Gi and G0 given by that algorithm, the higher Gi is ranked • Get a “goodness score” for the algorithm by comparing the ranking it produces to the ground truth ⟨G1, G2, G3, ...⟩, using rank-correlation measures such as sortedness or Pearson correlation
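A minimal sketch of this scoring step, assuming the algorithm under test is exposed as a `similarity(g, h)` function and using Pearson correlation as the goodness score (the sortedness measure mentioned on the slide is not shown); this is an illustration of the methodology, not the framework's own code.

```python
import numpy as np

def goodness_score(similarity, seed, variants):
    """Score one similarity algorithm on one test case.

    `variants` are G1, G2, ... ordered by ground-truth edit distance from
    `seed` (ED(G0, Gi) = i). The algorithm ranks them by similarity to the
    seed; that ranking is then correlated with the ground truth.
    """
    scores = [similarity(seed, g) for g in variants]
    # rank 1 = most similar according to the algorithm
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    algo_rank = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        algo_rank[idx] = rank
    truth_rank = list(range(1, len(variants) + 1))   # G1 first, G2 second, ...
    return np.corrcoef(truth_rank, algo_rank)[0, 1]  # Pearson correlation
```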

  23. Example: the seed CFG G0

  24. Example: G1, G2, and G3 are generated from G0 with ED = 1, ED = 2, and ED = 3 respectively

  25. Example: the ground-truth ranking is ⟨G1, G2, G3⟩

  26. Example: algorithm A scores the generated CFGs against G0 as SimA(G1) = 0.4, SimA(G2) = 0.1, SimA(G3) = 0.8; ground-truth ranking: ⟨G1, G2, G3⟩

  27. Example: algorithm A's ranking is therefore ⟨G3, G1, G2⟩, versus the ground-truth ranking ⟨G1, G2, G3⟩

  28. Example: Pearson correlation between the ground-truth ranking ⟨G1, G2, G3⟩ and algorithm A's ranking ⟨G3, G1, G2⟩ = -0.5
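As a quick check (not part of the slides), the -0.5 can be reproduced directly from the two rank vectors:

```python
import numpy as np

# Ground truth <G1, G2, G3>: ranks of G1, G2, G3 are 1, 2, 3.
# Algorithm A  <G3, G1, G2>: ranks of G1, G2, G3 are 2, 3, 1.
truth_rank = [1, 2, 3]
algo_rank = [2, 3, 1]
print(np.corrcoef(truth_rank, algo_rank)[0, 1])   # -0.5, as on slide 28
```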

  29. Two questions remain… 1. What is the definition of the edit distance between two CFGs? 2. How do we generate CFGs such that they have increasing edit distances from the seed CFG G0?

  30. What is the definition of the edit distance between two CFGs? • The Graph Edit Distance is a function ED : (Gi, Gj) → ℕ that computes the smallest number of edit operations needed to transform Gi into Gj • There are four possible edit operations

  31. What is the definition of the edit distance between two CFGs? • Add a zero-degree node

  32. What is the definition of the edit distance between two CFGs? • Add an edge between two existing nodes

  33. What is the definition of the edit distance between two CFGs? • Delete an edge between two existing nodes

  34. What is the definition of the edit distance between two CFGs? • Delete a zero-degree node
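A minimal sketch of these four edit operations on a toy CFG representation (a set of nodes plus a set of directed edges); illustrative code, not the framework's implementation:

```python
class CFG:
    """Toy CFG: a node set plus a directed edge set."""
    def __init__(self, nodes=(), edges=()):
        self.nodes = set(nodes)
        self.edges = set(edges)

    def degree(self, n):
        return sum(1 for u, v in self.edges if u == n or v == n)

    def add_node(self, n):                 # add a zero-degree node
        assert n not in self.nodes
        self.nodes.add(n)

    def add_edge(self, u, v):              # add an edge between existing nodes
        assert u in self.nodes and v in self.nodes
        self.edges.add((u, v))

    def delete_edge(self, u, v):           # delete an edge between existing nodes
        self.edges.remove((u, v))

    def delete_node(self, n):              # delete a zero-degree node
        assert self.degree(n) == 0
        self.nodes.remove(n)
```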

  35. How do we generate CFGs such that they have increasing edit distances from the seed CFG G0? [Figure: a seed CFG G0 with nodes a, b, c, d]

  36. How do we generate CFGs such that they have increasing edit distances from the seed CFG G0? [Figure: applying Add Node, Delete Edge, and Add Edge to G0] For every possible edit operation that can be applied to G0, apply it and generate a new graph

  37. How do we generate CFGs such that they have increasing edit distances from the seed CFG G0? Do the same for the newly generated graphs to obtain the Edit Distance Graph (EDG)

  38. How do we generate CFGs such that they have increasing edit distances from the seed CFG G0? [Figure: the EDG expanded one more level] Randomly pick a CFG on each level; they become our G1, G2, G3, …
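A sketch of the EDG construction described on slides 36-38, assuming integer node labels and representing each graph as a pair (frozenset of nodes, frozenset of edges). It is deliberately naive: the real framework would also need to recognize isomorphic duplicates, which this simple set comparison does not.

```python
import random

def single_edits(nodes, edges):
    """All graphs one edit operation away from (nodes, edges)."""
    out = set()
    fresh = max(nodes, default=0) + 1                      # assumes integer labels
    out.add((nodes | {fresh}, edges))                      # add a zero-degree node
    for u in nodes:
        for v in nodes:
            if u != v and (u, v) not in edges:
                out.add((nodes, edges | {(u, v)}))         # add an edge
    for e in edges:
        out.add((nodes, edges - {e}))                      # delete an edge
    for n in nodes:
        if not any(n in e for e in edges):
            out.add((nodes - {n}, edges))                  # delete a zero-degree node
    return out

def edit_distance_graph(seed_nodes, seed_edges, levels):
    """Level i holds graphs reachable from the seed in i edit operations."""
    level = {(frozenset(seed_nodes), frozenset(seed_edges))}
    seen = set(level)
    edg = [level]
    for _ in range(levels):
        nxt = set()
        for nodes, edges in edg[-1]:
            for g in single_edits(nodes, edges):
                if g not in seen:
                    seen.add(g)
                    nxt.add(g)
        edg.append(nxt)
    return edg

def pick_test_case(edg):
    """Randomly pick one CFG per level: these become G1, G2, G3, ..."""
    return [random.choice(list(level)) for level in edg[1:]]

# Hypothetical seed graph with four nodes and a diamond of edges.
edg = edit_distance_graph({1, 2, 3, 4}, {(1, 2), (1, 3), (2, 4), (3, 4)}, levels=3)
g1, g2, g3 = pick_test_case(edg)
```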

  39. Implementation • Re-coded four CFG similarity algorithms in Python • Implemented the evaluation framework • Generated an EDG with five levels • Picked 100 test cases (each test case comprises five CFGs)

  40. Evaluation results

  41. Evaluation results

  42. Evaluation results

  43. Evaluation results

  44. Evaluation results “Goodness score” statistics of the four algorithms

  45. Evaluation results Time used by the four algorithms to finish 100 test cases

  46. Related work • An evaluation framework for text plagiarism detection • Generate artificial plagiarism cases • Shuffling, removing, inserting, or replacing words or short phrases at random

  47. Related work • An evaluation framework for code clone detection tools • Inject mutated code fragments into the code base

  48. Future work • Generate CFGs with instructions in the nodes • Editing instructions would lead to a huge EDG

  49. Try our framework • http://cfgsim.cs.arizona.edu/ • Evaluate existing algorithms • Compare your own algorithm with the others • Fine tune your algorithm

  50. Summary • A methodology to evaluate CFG similarity algorithms • Publicly available evaluation framework • Serves as a benchmark for users and researchers of CFG similarity algorithms

  51. Thank you!
