Graph Matching Networks for Learning the Similarity of Graph Structured Objects
Yujia Li, Chenjie Gu, Thomas Dullien*, Oriol Vinyals, Pushmeet Kohli
Graph structured data appear in many applications
- Molecules
- Scene graphs*
- Programs**
- Binaries
Image credit: *Johnson et al., Image Retrieval using Scene Graphs. **Brockschmidt et al., Generative Code Modeling with Graphs.
Graph Matching Networks — Yujia Li
Graph structured data appear in many applications
- Molecules: drug discovery
- Scene graphs*: semantic image retrieval
- Programs**: code search
- Binaries: software vulnerabilities
Image credit: *Johnson et al., Image Retrieval using Scene Graphs. **Brockschmidt et al., Generative Code Modeling with Graphs.
Finding similar graphs
- Graph structures vary a lot
- Nodes and edges can have attributes
- Reasoning about both the graph structure and the semantics
- The notion of "similarity" varies across problems
[Figure: a query graph and a set of candidate graphs]
The binary function similarity search problem
- Given a binary executable (EXE): does it contain a vulnerability?
[Figure: hex dump of the ELF header of the executable]
The binary function similarity search problem
- Binary analysis turns the executable (EXE) into disassembled functions and their control-flow graphs; the question remains whether a function contains a vulnerability.
- Example disassembly:
    000000000000064a <f>:
     64a: 55            push %rbp
     64b: 48 89 e5      mov %rsp,%rbp
     64e: 89 7d fc      mov %edi,-0x4(%rbp)
     651: 83 7d fc 00   cmpl $0x0,-0x4(%rbp)
     655: 7e 09         jle 660 <f+0x16>
     657: 8b 45 fc      mov -0x4(%rbp),%eax
     65a: 0f af 45 fc   imul -0x4(%rbp),%eax
     65e: eb 06         jmp 666 <f+0x1c>
     660: 8b 45 fc      mov -0x4(%rbp),%eax
     663: 83 c0 01      add $0x1,%eax
     666: 5d            pop %rbp
     667: c3            retq
- Basic blocks of the resulting graph:
    block 1: push %rbp; mov %rsp,%rbp; mov %edi,-0x4(%rbp); cmpl $0x0,-0x4(%rbp); jle 660 <f+0x16>
    block 2: mov -0x4(%rbp),%eax; imul -0x4(%rbp),%eax; jmp 666 <f+0x1c>
    block 3: mov -0x4(%rbp),%eax; add $0x1,%eax
    block 4: pop %rbp; retq
- Graph sizes in our dataset: from 10 to 10^3
The binary function similarity search problem
- After binary analysis, search in a library of binaries with known vulnerabilities.
- Each library entry is either similar or not similar to the query; a similar entry indicates that the query may contain the vulnerability.
[Figure: query executable compared against library entries labeled "similar" and "not similar"]
Most existing approaches
Mostly hand-engineered algorithms / heuristics with limited learning:
Graph hashes (graph → descriptor): widely used in security applications
- human-designed hash functions that encode graph structure
- good at exact matches, not so good at estimating similarity
Graph kernels (pair of graphs → similarity): popular in various graph-level prediction tasks
- human-designed kernels as a measure of similarity between graphs
- the design of kernels is important for performance
Different graph similarity estimation paradigms
Graph embedding
- Graph → descriptor
- Measure distance on descriptors
- Fast hashing based retrieval
Graph matching
- Compute distance jointly on the pair of graphs
- More computation for better accuracy
Graph similarity learning
Learn a similarity (or distance) function d:
- d(similar pair of graphs) → small
- d(dissimilar pair of graphs) → large
Graph similarity learning
Learn a similarity (or distance) function d, small for similar graphs and large for dissimilar ones, by supervised learning on labeled pairs or triplets:
- Pairs (G_1, G_2, t): t = +1 ⇒ G_1, G_2 similar ⇒ decrease d(G_1, G_2); t = -1 ⇒ G_1, G_2 not similar ⇒ increase d(G_1, G_2)
- Triplets (G_1, G_2, G_3): G_1, G_2 similar and G_1, G_3 not similar ⇒ decrease d(G_1, G_2) and increase d(G_1, G_3)
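The slide states only the direction in which d should move for each kind of training example. A minimal sketch of how that could be turned into a training objective, assuming margin-based hinge losses of the kind commonly used in metric learning (the `margin` value and the names `pair_loss` and `triplet_loss` are illustrative choices, not taken from the slide):

```python
import numpy as np

def pair_loss(d, t, margin=1.0):
    """Hinge loss on labeled pairs (assumed form, not from the slide).

    d: distances d(G_1, G_2); t: labels, +1 (similar) or -1 (not similar).
    Similar pairs are pushed below 1 - margin, dissimilar pairs above 1 + margin.
    """
    d = np.asarray(d, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(np.maximum(0.0, margin - t * (1.0 - d)).mean())

def triplet_loss(d_pos, d_neg, margin=1.0):
    """Hinge loss on triplets: d(G_1, G_2) should be smaller than d(G_1, G_3) by `margin`."""
    d_pos = np.asarray(d_pos, dtype=float)
    d_neg = np.asarray(d_neg, dtype=float)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())
```

Both losses are zero once the distances are ordered correctly with enough slack, so only violating pairs or triplets contribute gradients.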
Learning graph embeddings with Graph Neural Nets
- d(G_1, G_2) = Euclidean/Hamming distance(embed(G_1), embed(G_2))
Learning graph embeddings with Graph Neural Nets
- d(G_1, G_2) = Euclidean/Hamming distance(embed(G_1), embed(G_2))
- embed(G): input graph → message passing → aggregate over graph
Graph embedding model details
- Messages
- Node updates
- Aggregation: sum pooling, attention pooling, etc.
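The three steps named on this slide can be sketched as a single forward pass. This is a minimal NumPy sketch, assuming single-layer tanh networks for the message and node-update functions, sum pooling for the aggregation, and random untrained weights; `init_params`, `embed`, `euclidean_distance`, and all dimensions are hypothetical names and values chosen for illustration.

```python
import numpy as np

def init_params(node_dim, edge_dim, graph_dim, seed=0):
    """Random, untrained weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W_msg": rng.normal(scale=0.1, size=(2 * node_dim + edge_dim, node_dim)),
        "W_node": rng.normal(scale=0.1, size=(2 * node_dim, node_dim)),
        "W_graph": rng.normal(scale=0.1, size=(node_dim, graph_dim)),
    }

def embed(node_feats, edges, edge_feats, params, num_steps=3):
    """node_feats: (N, node_dim); edges: list of (src, dst); edge_feats: (E, edge_dim)."""
    h = np.asarray(node_feats, dtype=float)
    edge_feats = np.asarray(edge_feats, dtype=float)
    src = np.array([s for s, _ in edges], dtype=int)
    dst = np.array([d for _, d in edges], dtype=int)
    for _ in range(num_steps):
        # Messages: one per edge, from both endpoint states and the edge feature.
        msg_in = np.concatenate([h[src], h[dst], edge_feats], axis=1)
        messages = np.tanh(msg_in @ params["W_msg"])
        # Sum the messages arriving at each destination node.
        incoming = np.zeros_like(h)
        np.add.at(incoming, dst, messages)
        # Node updates: combine the old state with the summed messages.
        h = np.tanh(np.concatenate([h, incoming], axis=1) @ params["W_node"])
    # Aggregation: sum pooling over nodes, then a linear map to the graph vector.
    return h.sum(axis=0) @ params["W_graph"]

def euclidean_distance(e1, e2):
    return float(np.linalg.norm(e1 - e2))
```

Because the aggregation is a sum over nodes, relabeling the nodes of a graph leaves its embedding unchanged, which is the property a graph-level descriptor needs.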
Graph Matching Networks
- h_1, h_2 = embed-and-match(G_1, G_2)
- d(G_1, G_2) = Euclidean/Hamming distance(h_1, h_2)
Graph Matching Networks
- h_1, h_2 = embed-and-match(G_1, G_2)
- d(G_1, G_2) = Euclidean/Hamming distance(h_1, h_2)
- Attention
- Weighted difference
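The formulas for the two labeled quantities are not present in the text of this slide. One form consistent with the labels, assuming a softmax over a vector-space similarity $s_h$ (the symbols $a_{j \to i}$, $\mu_{j \to i}$, and $s_h$ are notation introduced here for illustration), where $i$ is a node in one graph and $j$ ranges over the nodes of the other graph:

```latex
a_{j \to i} = \frac{\exp\big(s_h(h_i, h_j)\big)}{\sum_{j'} \exp\big(s_h(h_i, h_{j'})\big)},
\qquad
\mu_{j \to i} = a_{j \to i}\,(h_i - h_j),
\qquad
\sum_j \mu_{j \to i} = h_i - \sum_j a_{j \to i}\, h_j
```

The last identity holds because the attention weights sum to one over $j$.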
Graph Matching Networks
- h_1, h_2 = embed-and-match(G_1, G_2)
- d(G_1, G_2) = Euclidean/Hamming distance(h_1, h_2)
- Total cross-graph message
- Effectively: match node i to the closest node in the other graph and take the difference.
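The "closest node" reading can be made concrete with a soft attention over the other graph's nodes. A minimal NumPy sketch, assuming dot-product similarity and a softmax (both are assumptions; `attention` and `cross_graph_message` are hypothetical names):

```python
import numpy as np

def attention(h_a, h_b):
    """Row i holds the softmax weights of node i in graph A over all nodes of graph B."""
    scores = h_a @ h_b.T                                  # dot-product similarity (assumed)
    scores = scores - scores.max(axis=1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def cross_graph_message(h_a, h_b):
    """Total cross-graph message for each node of graph A.

    Equals sum_j a[i, j] * (h_a[i] - h_b[j]), i.e. the node state minus its
    attention-weighted soft match in the other graph.
    """
    a = attention(h_a, h_b)
    return h_a - a @ h_b
```

When the two graphs have identical, well-separated node states of equal norm, the attention concentrates on the matching node and the message is close to zero; a node with no good counterpart in the other graph produces a large message.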
Other variants
Other variants of GNNs for embedding:
- e.g. Graph Convolutional Networks (GCNs), a simpler variant that does not model edge features
Siamese networks:
- instead of using Euclidean or Hamming distance, learn a distance score through a neural net
- d(G_1, G_2) = MLP(concat(embed(G_1), embed(G_2)))
- learn the embedding model and the scoring MLP jointly
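The Siamese scoring step can be sketched on its own. A minimal sketch, assuming a one-hidden-layer ReLU MLP with random untrained weights; `init_mlp` and `siamese_score` are hypothetical names, and the two input vectors stand in for embed(G_1) and embed(G_2):

```python
import numpy as np

def init_mlp(embed_dim, hidden_dim=16, seed=0):
    """Random, untrained scoring MLP (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(2 * embed_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(scale=0.1, size=hidden_dim),
        "b2": 0.0,
    }

def siamese_score(e1, e2, mlp):
    """d(G_1, G_2) = MLP(concat(embed(G_1), embed(G_2))), given the two embeddings."""
    x = np.concatenate([e1, e2])
    hidden = np.maximum(0.0, x @ mlp["W1"] + mlp["b1"])  # ReLU hidden layer
    return float(hidden @ mlp["w2"] + mlp["b2"])
```

Unlike a Euclidean or Hamming distance, a score of this form is not guaranteed to be symmetric in its two arguments or non-negative; training has to shape it.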
[Figure: three architectures side by side (graph embedding, Siamese network, graph matching), each producing a similarity score]
Experiments
Graph edit distance learning
- Data: synthetic graphs
- Similarity: small edit distance → similar
Control-flow graph based binary function similarity search
- Data: compile ffmpeg with different compilers and optimization levels
- Similarity: binary functions associated with the same original function → similar
Mesh graph retrieval
- Data: mesh graphs for 100 object classes (COIL-DEL dataset)
- Similarity: mesh for the same object class → similar
Synthetic task: graph edit distance learning
- Training and evaluating on graphs of size n and edge density (probability) p.
- Measuring pair classification AUC / triplet prediction accuracy.
- Learned models do better than the WL (Weisfeiler-Lehman) kernel.
- The matching model does better than the embedding model.
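The two metrics named here can be computed directly from model distances. A minimal sketch, assuming labels of +1 (similar) and -1 (not similar) and treating a smaller distance as a higher similarity score; `pair_auc` and `triplet_accuracy` are hypothetical helper names:

```python
import numpy as np

def pair_auc(distances, labels):
    """AUC for pair classification: probability that a similar pair is scored
    as closer than a dissimilar pair (ties count as half)."""
    d = np.asarray(distances, dtype=float)
    t = np.asarray(labels)
    pos = -d[t == 1]    # similarity scores of similar pairs
    neg = -d[t == -1]   # similarity scores of dissimilar pairs
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def triplet_accuracy(d_pos, d_neg):
    """Fraction of triplets with d(G_1, G_2) < d(G_1, G_3)."""
    d_pos = np.asarray(d_pos, dtype=float)
    d_neg = np.asarray(d_neg, dtype=float)
    return float((d_pos < d_neg).mean())
```

The pairwise comparison in `pair_auc` is quadratic in the number of pairs, which is fine for a sketch; a rank-based implementation would be preferable on large evaluation sets.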
Results on binary function similarity search
- Models compared: hand-engineered baseline (graph hashing + locality sensitive hashing) vs. GNN embedding vs. GMN.
- Inputs compared: graph topology only vs. joint modeling of structure and features.
Results on binary function similarity search
1) learned approaches are better than the hand-engineered solution
2) matching is better than embedding alone
3) joint modeling of structure and features is better than structure alone
4) performance improves with more graph propagation steps
More ablation studies
- GMNs are consistently better than the alternatives.
- Siamese vs. matching: fusing the two graphs early is better than fusing them only at the end.
Learned attention patterns
- We never supervise the cross-graph attention, but the model still learns some interesting attention patterns.
Learned attention patterns
- When the two graphs are identical, the learned attention pattern may (not always) correspond to node matching.
- Model trained on the edit distance learning task.
[Figure: cross-graph attention after 10 message passing steps]