IEEE S&P 2017
VUDDY: A Scalable Approach for Vulnerable Code Clone Detection
Seulbae Kim, Seunghoon Woo, Heejo Lee, and Hakjoo Oh
Korea University
May 23, 2017
Question
• Number of unpatched vulnerabilities in a smartphone firmware’s source code?
• 200+ unpatched vulnerable code clones detected!
Computer & Communication Security Lab., Korea University
Motivation
• The number of open source software projects is increasing
Motivation
• Code clones – reused code fragments
• Major cause of vulnerability propagation
• Example: CVE-2016-5195
Problem: Scalable & Accurate Vulnerable Code Clone Discovery
Scalable & Accurate Vulnerable Code Clone Discovery
• Scalability: software systems are getting bigger
  • Linux kernel – 25.4 MLoC
  • “L” Smart TV – 35 MLoC
[Figure: accuracy vs. scalability axes]
Scalable & Accurate Vulnerable Code Clone Discovery
• Accuracy: false positives (FP) mean increased time and effort
[Figure: accuracy vs. scalability axes]
Scalable & Accurate Vulnerable Code Clone Discovery
• Previous approaches (plotted on accuracy vs. scalability axes)
  • Line-level matching: Jang et al., ReDeBug (S&P’12)
  • Token-level matching: Kamiya et al., CCFinder (TSE’02)
  • Graph/tree matching: Jiang et al. (ICSE’07)
  • Bag-of-tokens matching: Sajnani et al., SourcererCC (ICSE’16)
  • File-level matching: Sasaki et al., FCFinder (MSR’10)
Scalable & Accurate Vulnerable Code Clone Discovery
• Goal: an approach that is both accurate and scalable, marked “?” on the accuracy vs. scalability plot of previous approaches
Proposed Method: VUDDY
Demonstration of VUDDY
Proposed method: VUDDY
• VUDDY: VUlnerable coDe clone DiscoverY
• Searches for vulnerable code clones
• Scales to targets beyond 1 BLoC
• Detects both known & unknown vulnerabilities
• Low false positive rate
Proposed method: VUDDY
• Overview
  • Vulnerable functions → fingerprinting → fingerprint dictionary of vulnerable functions
  • A target program → fingerprinting → fingerprint dictionary of target functions
  • Dictionary comparison of the two → vulnerable code clones
Collecting vulnerable code
• Vulnerability patching: old code (vulnerable) → CVE patch → new code (fixed)
Collecting vulnerable code
• Reconstructing the vulnerability from a security patch: software repository + CVE patch → old code (vulnerable)
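One way to picture the reconstruction step: a unified-diff hunk carries both versions of the patched region, so the old (vulnerable) lines can be read off by keeping context and removed lines and dropping added lines. The sketch below is a simplification under that assumption. The function name `old_and_new_from_hunk` and the sample hunk are hypothetical, and the slides do not say how VUDDY itself performs this step.

```python
def old_and_new_from_hunk(hunk_lines):
    """Split the body of one unified-diff hunk into its old and new sides.

    Lines starting with '-' exist only in the old (vulnerable) code,
    lines starting with '+' only in the new (fixed) code, and all other
    lines are unchanged context shared by both sides.
    """
    old, new = [], []
    for line in hunk_lines:
        if line.startswith("@@") or line.startswith("\\"):
            continue  # skip hunk headers and "\ No newline at end of file"
        tag, text = line[:1], line[1:]
        if tag == "-":
            old.append(text)
        elif tag == "+":
            new.append(text)
        else:
            old.append(text)
            new.append(text)
    return old, new


# Hypothetical patch hunk, for illustration only.
hunk = [
    "@@ -1,3 +1,4 @@",
    " void copy(char *dst, char *src, int n) {",
    "-    memcpy(dst, src, n);",
    "+    if (n > 0 && n <= MAX_LEN)",
    "+        memcpy(dst, src, n);",
    " }",
]
old_code, new_code = old_and_new_from_hunk(hunk)
```

Here `old_code` holds the three pre-patch lines and `new_code` the four post-patch lines. A real pipeline would still need the full old function from the repository, since a hunk shows only the region around the change.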
Fingerprinting a program
Fingerprinting a program
1. Retrieve all functions from a program

    int sum (int a, int b)
    {
        return a + b;
    }

    void increment()
    {
        int num = 80;
        num++; // no return
    }

    void printer (char* src)
    {
        printf("%s", src);
    }
Fingerprinting a program
2. Apply abstraction and normalization to functions

    sum:        return a + b;                        →  returnfparam+fparam;
    increment:  int num = 80; num++; // no return    →  dtypelvar=80;lvar++;
    printer:    printf("%s", src);                   →  funccall("%s", fparam);
Fingerprinting a program
3. Compute length and hash value

    sum:        returnfparam+fparam;       length: 20, hash val: C94D9910…
    increment:  dtypelvar=80;lvar++;       length: 20, hash val: D6E77882…
    printer:    funccall("%s", fparam);    length: 23, hash val: 9A45E4A1…
Fingerprinting a program
4. Store in a dictionary, keyed by length

    “Fingerprint dictionary”
    20: [C94D9910, D6E77882]
    23: [9A45E4A1]
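Steps 3 and 4 can be sketched as follows, taking already abstracted and normalized function bodies as input. This is a minimal illustration, not the authors' implementation: the slides do not name the hash function, so MD5 is an assumption here, and the names `fingerprint` and `build_dictionary` are hypothetical. The digests it produces are not expected to match the hash values shown on the slides.

```python
import hashlib
from collections import defaultdict


def fingerprint(normalized_body):
    """Return (length, hash) for an abstracted and normalized function body."""
    # MD5 is an assumption; any fixed hash function would serve the sketch.
    digest = hashlib.md5(normalized_body.encode("utf-8")).hexdigest()
    return len(normalized_body), digest


def build_dictionary(normalized_bodies):
    """Group hash values by body length: {length: [hash, ...]}."""
    dictionary = defaultdict(list)
    for body in normalized_bodies:
        length, digest = fingerprint(body)
        if digest not in dictionary[length]:
            dictionary[length].append(digest)
    return dict(dictionary)


# The first two normalized bodies from the slides; both are 20 characters long.
bodies = ["returnfparam+fparam;", "dtypelvar=80;lvar++;"]
fingerprints = build_dictionary(bodies)
```

Both bodies land under the key 20, mirroring the `20: [...]` entry on the slide. Keying by length means a later lookup only has to compare hashes of functions with the same length.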
Abstraction
• Transform function by replacing
  • Formal parameters
  • Data types
  • Local variables
  • Function names
Level 0: No abstraction

    void avg (float arr[], int len) {
        static float sum = 0;
        unsigned int i;

        for (i = 0; i < len; i++) {
            sum += arr[i];
        }

        printf("%f %d\n", sum/len, validate(sum));
    }
Abstraction
Level 1: Formal parameter abstraction

    void avg (float FPARAM[], int FPARAM) {
        static float sum = 0;
        unsigned int i;

        for (i = 0; i < FPARAM; i++) {
            sum += FPARAM[i];
        }

        printf("%f %d\n", sum/FPARAM, validate(sum));
    }
Abstraction
Level 2: Local variable name abstraction

    void avg (float FPARAM[], int FPARAM) {
        static float LVAR = 0;
        unsigned int LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        printf("%f %d\n", LVAR/FPARAM, validate(LVAR));
    }
Abstraction
Level 3: Data type abstraction

    DTYPE avg (DTYPE FPARAM[], DTYPE FPARAM) {
        DTYPE LVAR = 0;
        unsigned DTYPE LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        printf("%f %d\n", LVAR/FPARAM, validate(LVAR));
    }
Abstraction
Level 4: Function call abstraction

    DTYPE avg (DTYPE FPARAM[], DTYPE FPARAM) {
        DTYPE LVAR = 0;
        unsigned DTYPE LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        FUNCCALL("%f %d\n", LVAR/FPARAM, FUNCCALL(LVAR));
    }
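The four abstraction levels amount to replacing identifiers with fixed placeholders. The sketch below illustrates that idea with whole-word regular expressions. It assumes the caller already knows which names are parameters, local variables, data types, and called functions, whereas a real tool would obtain them from a parser. The function names `replace_names` and `abstract` are hypothetical.

```python
import re


def replace_names(code, names, placeholder):
    """Replace each whole-word occurrence of the given identifiers."""
    for name in names:
        code = re.sub(r"\b" + re.escape(name) + r"\b", placeholder, code)
    return code


def abstract(code, params, local_vars, dtypes, called_funcs):
    """Apply abstraction levels 1 to 4 using caller-supplied identifier lists."""
    code = replace_names(code, params, "FPARAM")      # level 1
    code = replace_names(code, local_vars, "LVAR")    # level 2
    code = replace_names(code, dtypes, "DTYPE")       # level 3
    for name in called_funcs:                         # level 4
        # Only rewrite a listed name when it is followed by '(' (a call).
        code = re.sub(r"\b" + re.escape(name) + r"\b(?=\s*\()", "FUNCCALL", code)
    return code


# Part of the avg() body from the slides, written on one line.
body = (
    "unsigned int i; "
    "for (i = 0; i < len; i++) { sum += arr[i]; } "
    'printf("%f %d\\n", sum/len, validate(sum));'
)
abstracted = abstract(
    body,
    params=["arr", "len"],
    local_vars=["sum", "i"],
    dtypes=["float", "int"],
    called_funcs=["printf", "validate"],
)
```

Whole-word matching keeps `int` inside `printf` untouched. A text-based replacement like this would also rewrite matching words inside string literals, which is one reason the identifier lists should come from a parser in practice.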
Normalization
• Remove
  • comments
  • tabs
  • white spaces
  • CRLF
• Convert into lowercase

Before:

    DTYPE avg (DTYPE FPARAM[], DTYPE FPARAM) {
        DTYPE LVAR = 0;
        unsigned DTYPE LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        FUNCCALL("%f %d\n", LVAR/FPARAM, FUNCCALL(LVAR));
    }

After:

    dtypelvar=0;unsigneddtypelvar;for(lvar=0;lvar<fparam;lvar++){lvar+=fparam[lvar];}
    funccall("%f %d\n", lvar/fparam, funccall(lvar));
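The normalization rules listed above map directly onto a few string operations. The following is a minimal sketch, assuming C-style comments and a hypothetical function name `normalize`; it is not taken from the VUDDY source.

```python
import re


def normalize(code):
    """Strip comments and all whitespace, then lowercase the result."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", "", code)                     # line comments
    code = re.sub(r"\s+", "", code)                          # spaces, tabs, CR, LF
    return code.lower()


# Abstracted body of increment() from the fingerprinting slides, with CRLF and a tab.
abstracted = "DTYPE LVAR = 80;\r\n\tLVAR++; // no return\r\n"
normalized = normalize(abstracted)
```

For this input, `normalized` is `dtypelvar=80;lvar++;`, the 20-character string shown in the fingerprinting example. The comment patterns are purely textual, so a `//` or `/*` inside a string literal would be stripped as well; a parser-aware version would avoid that.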
Vulnerable code clone detection
• By comparing two fingerprint dictionaries
  • Fingerprint dictionary of vulnerable functions (from the repository):
      20: [ABCDEF01, C94D9910]
      21: [D155F630]
      22: [C67F45FD, DDBF3838]
  • Fingerprint dictionary of target functions (from the target program):
      20: [C94D9910, D6E77882]
      23: [9A45E4A1]
• The hash C94D9910 appears under length 20 in both dictionaries, so the corresponding target function is a vulnerable code clone
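The comparison can be sketched as a lookup by length followed by a hash membership test. The code below uses the two dictionaries from the slide; the function name `find_clones` is hypothetical and the sketch is an illustration of the idea rather than the authors' code.

```python
def find_clones(vulnerable, target):
    """Return {length: [hash, ...]} for hashes present in both dictionaries."""
    matches = {}
    for length, target_hashes in target.items():
        # Only functions of the same length can have the same fingerprint.
        known = set(vulnerable.get(length, []))
        common = [h for h in target_hashes if h in known]
        if common:
            matches[length] = common
    return matches


vulnerable = {
    20: ["ABCDEF01", "C94D9910"],
    21: ["D155F630"],
    22: ["C67F45FD", "DDBF3838"],
}
target = {
    20: ["C94D9910", "D6E77882"],
    23: ["9A45E4A1"],
}
clones = find_clones(vulnerable, target)
# clones == {20: ["C94D9910"]}
```

Lengths 21, 22, and 23 occur in only one of the two dictionaries, so no hashes are compared for them. Only the length-20 bucket is inspected, and a single hash matches.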