Our contributions
2. Self-supervised learning
  ○ Collect unlabeled programs
  ○ Corrupt and get diagnostic feedback (e.g. run compiler)
  ⇒ Extra training data: <broken code, feedback, fixed code>

Working code
   5 int i, n;
   6 string A;
   7 cin >> n;
   8 A.resize(n);
   9 for (i=0;i<n;i++){
  10   cin >> A[i];
  11   cout << i; }

Corrupted (working code → corrupt)
   5 int i, n;
   6 char A;
   7 cin >> n .
   8 A.resize(n);
   9 for (i=0;i<n;i++){
  10   cin >> A[j];
  11   cout << i; }

Compiler: Error! line 7: expected ‘;’
Our results
Improved performance on two applications
● DeepFix: correct intro programming assignments in C
● SPoC: correct output of C++ program synthesis
[Charts: DeepFix Test; SPoC TestP]
Outline
● Innovations
  1. Reasoning via program-feedback graph
  2. Self-supervised learning
● Evaluations
  1. DeepFix
  2. SPoC
● Analysis & Examples
● Takeaways
1. Reasoning via program-feedback graph
Challenges
● How to connect two modalities: program and feedback?
● How to model the reasoning of repair (e.g. tracking symbols)?

Source code
   4 int main() {
   5   char tmp, a, b;
   6   map<string,int> mp;
   7   cin >> a >> b;
   8   int i, j;
   9   for (i = 0; i < a.size() ...
  10     tmp.push_back(a[i]);
  11   string tmp1 = tmp;

Compiler message
  request for member ‘size’ in ‘a’, which is of non-class type ‘char’
Our solution: program-feedback graph
● Join program & feedback through symbols relevant to program repair
  → shared/abstracted semantic space
● Reason over this space using graph attention

Source code
   4 int main() {
   5   char tmp, a, b;
   6   map<string,int> mp;
   7   cin >> a >> b;
   8   int i, j;
   9   for (i = 0; i < a.size() ...
  10     tmp.push_back(a[i]);
  11   string tmp1 = tmp;

Compiler message
  request for member ‘size’ in ‘a’, which is of non-class type ‘char’

[Figure: program-feedback graph over the tokens char, a, b and size in the source code and ‘size’, ‘a’, ‘char’ in the compiler message]
How to construct graph?
● Recognize token types: <data type>, <identifier>, <operator>, <diagnostic argument>

Source code
   4 int main() {
   5   char tmp, a, b;
   6   map<string,int> mp;
   7   cin >> a >> b;
   8   int i, j;
   9   for (i = 0; i < a.size() ...
  10     tmp.push_back(a[i]);
  11   string tmp1 = tmp;

Compiler message
  request for member ‘size’ in ‘a’, which is of non-class type ‘char’
  (diagnostic arguments: ‘size’, ‘a’, ‘char’)
How to construct graph?
● Recognize token types
● Nodes: diagnostic arguments, their occurrences, and all identifiers
● Edges: connect identical tokens to capture semantic correspondence

Source code
   4 int main() {
   5   char tmp, a, b;
   6   map<string,int> mp;
   7   cin >> a >> b;
   8   int i, j;
   9   for (i = 0; i < a.size() ...
  10     tmp.push_back(a[i]);
  11   string tmp1 = tmp;

Compiler message
  request for member ‘size’ in ‘a’, which is of non-class type ‘char’

[Figure: nodes for ‘size’, ‘a’ and ‘char’ in the message, their occurrences in the code (char, a, size), and the other identifiers (e.g. b); edges join identical tokens]
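The construction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regex tokenizer, the small `DATA_TYPES` and `KEYWORDS` sets, and the function names are hypothetical and are not the tokenizer or code used in the talk. Diagnostic arguments are taken to be the quoted spans of the compiler message.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small vocabularies; a real tokenizer would be richer.
DATA_TYPES = {"int", "char", "string", "map", "void", "bool", "double"}
KEYWORDS = {"for", "while", "if", "else", "return"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_quotes(text):
    """Map typographic quotes to ASCII so quoted arguments are easy to find."""
    return text.replace("‘", "'").replace("’", "'")


def diagnostic_arguments(message):
    """Return the quoted arguments of a compiler message, in order."""
    return [arg.strip() for arg in re.findall(r"'([^']+)'", normalize_quotes(message))]


def build_graph(code_lines, message):
    """Return (nodes, edges); a node is (where, position, token)."""
    args = set(diagnostic_arguments(message))
    nodes = []
    # Nodes from the message: the diagnostic arguments themselves.
    for pos, tok in enumerate(TOKEN_RE.findall(normalize_quotes(message))):
        if tok in args:
            nodes.append(("msg", pos, tok))
    # Nodes from the code: occurrences of the arguments, plus all identifiers.
    for line_no, line in enumerate(code_lines, start=1):
        for pos, tok in enumerate(TOKEN_RE.findall(line)):
            is_word = tok[0].isalpha() or tok[0] == "_"
            is_identifier = is_word and tok not in DATA_TYPES and tok not in KEYWORDS
            if tok in args or is_identifier:
                nodes.append((line_no, pos, tok))
    # Edges connect every pair of nodes that carry the identical token.
    by_token = defaultdict(list)
    for idx, (_, _, tok) in enumerate(nodes):
        by_token[tok].append(idx)
    edges = {(i, j) for ids in by_token.values() for i in ids for j in ids if i < j}
    return nodes, edges


code = ["char tmp, a, b;", "cin >> a >> b;", "for (i = 0; i < a.size(); i++)"]
msg = "request for member ‘size’ in ‘a’, which is of non-class type ‘char’"
nodes, edges = build_graph(code, msg)
```

In this toy version library names such as `cin` also count as identifiers, which a real system might treat differently.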
Model
● Initial encoding
● Graph attention
● Recontextualization
● Decoding
Model (Initial encoding)

Source code
  1 int main() {
  2   char tmp, a, b;
  3   map<string,int> mp;
  ...

Compiler message
  9: request for member ‘size’ …

Inputs
● Source code: tokens x_11 x_12 x_13 (Line 1), x_21 x_22 x_23 (Line 2), x_31 x_32 x_33 (Line 3)
● Feedback (compiler message): line idx i_err and msg content m_1 m_2 m_3

Encoders
● Position embedding
● LSTM_code^(1) on each source line
● LSTM_msg^(1) on the message content
Model (Graph attention)
● A graph attention layer sits on top of the initial encoders (position embedding, LSTM_code^(1) per line, LSTM_msg^(1))
● Message passing across tokens with long-range dependencies
● Multi-head attention aggregates the states of connected nodes, e.g. hm_1 → hm_1'

[Figure: Program-Feedback Graph over code-token states hx_11 hx_12 hx_13 ... (Line 1), hx_21 hx_22 hx_23 ... (Line 2), hx_31 hx_32 hx_33 ... (Line 3) and compiler-message states hm_1 hm_2 hm_3 ...; Multi-Head Attention → Aggregate]
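A rough sketch of one message-passing step over the program-feedback graph follows. It is a simplification and not the model from the talk: it uses a single head, no learned parameters, and plain scaled dot-product scoring restricted to graph neighbors (plus a self-loop), whereas the slides describe learned multi-head attention. The function name and the toy vectors are hypothetical.

```python
import math


def graph_attention(h, edges):
    """One attention step: each node averages its neighbors' states,
    weighted by a softmax over scaled dot-product scores.

    h: list of state vectors (lists of floats), one per graph node.
    edges: set of undirected (i, j) index pairs.
    """
    neighbors = {i: {i} for i in range(len(h))}  # self-loop keeps the node's own state
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    updated = []
    for i, h_i in enumerate(h):
        nbrs = sorted(neighbors[i])
        scale = math.sqrt(len(h_i))
        scores = [sum(a * b for a, b in zip(h_i, h[j])) / scale for j in nbrs]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        updated.append([
            sum(w / total * h[j][d] for w, j in zip(weights, nbrs))
            for d in range(len(h_i))
        ])
    return updated


# Nodes 0 and 1 are connected (e.g. two occurrences of the same token); node 2 is isolated.
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
new_states = graph_attention(states, {(0, 1)})
```

Node 2 has no neighbors other than itself, so its state is unchanged; nodes 0 and 1 each move toward the other's state.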
Model (Recontextualization)

Source code
  1 int main() {
  2   char tmp, a, b;
  3   map<string,int> mp;
  ...

Compiler message
  9: request for member ‘size’ …

Layers, bottom to top
● Inputs: Line 1, Line 2, Line 3, line idx, msg content
● Graph attention
● LSTM_code^(2) on each source line; LSTM_msg^(2) on the message
● LSTM_code^(3)
Model (Decoding)
● Localize = 2 (MLP + softmax)
● Repair = "string tmp,a,b;" (Pointer-Generator Decoder)

Source code
  1 int main() {
  2   char tmp, a, b;
  3   map<string,int> mp;
  ...

Compiler message
  9: request for member ‘size’ …

Layers below the decoders, bottom to top
● Inputs: Line 1, Line 2, Line 3, line idx, msg content
● Graph attention
● LSTM_code^(2) on each source line; LSTM_msg^(2) on the message
● LSTM_code^(3)
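To make the output format of the decoding step concrete, here is a minimal sketch of how a localization distribution and a repaired line could be turned into an edited program. The per-line scores stand in for the output of the MLP and are made up for illustration; `localize` and `apply_repair` are hypothetical helper names, and the pointer-generator decoder itself is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def localize(line_scores):
    """Return (1-based line index, probability) of the most likely erroneous line."""
    probs = softmax(line_scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, probs[best]


def apply_repair(lines, line_no, repaired_line):
    """Return a copy of the program with line `line_no` (1-based) replaced."""
    fixed = list(lines)
    fixed[line_no - 1] = repaired_line
    return fixed


program = ["int main() {", "char tmp, a, b;", "map<string,int> mp;"]
made_up_scores = [0.1, 2.0, -1.0]      # assumption: one score per line from some scorer
k, prob = localize(made_up_scores)     # k is 2 for these scores
repaired = apply_repair(program, k, "string tmp,a,b;")
```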
Model overview
[Figure: model overview]
2. Self-supervised learning
Why?
● Labeled datasets of program repair are small (10-100K examples)
● Vast amount of unlabeled programs available online (>> 1M submissions, > 30M repos)
● Can we leverage them to improve learning?
Our idea (outline)
Step 1. Collect unlabeled, working programs y
Step 2. Design (randomized) program corruption procedure P
Step 3. Corrupt and get diagnostic feedback (e.g. run compiler)
        ⇒ Extra training data: <broken code x, feedback f, fixed code y>
Step 4. Use them for pre-training
1. Collect unlabeled programs
● Our target tasks (DeepFix & SPoC) are in C/C++
● Collect 300K working C++ programs from codeforces.com
2. How to design corruption procedure P?
● Look at common errors (know your enemy)

Statistics
Error type                 Avg.   Experienced   Beginner   SPoC   Common compiler messages
Expected ...               30%    9%            48%        35%    expected @@@ (e.g. expected ‘;’ before...); missing @@@ (e.g. missing ")
  ● operator/punctuator    23%                  37%        29%
  ● primary expression     7%                   11%        6%
Identifier type            11%    9%            5%         18%    redeclaration/conflicting declaration; invalid conversion from <type> to <type>
Identifier undeclared      42%    62%           33%        31%    @@@ was not declared
Others                     17%    20%           14%        16%    ‘else’ without a previous ‘if’; no matching function for call to...

[Gupta et al. 17] [Mesbah et al. 19] [Kulal et al. 19]
2. How to design corruption procedure P?
● Look at common errors
● Design perturbation modules M to cause those errors

Perturbation module                                          Example
Syntax (delete/insert/replace operators .;(){}'"+, etc.)     return 0; }      →  return 0; } }
ID-type (delete/insert/replace type)                         string tmp;      →  char tmp;
ID-typo (delete/insert/replace IDentifier)                   int a, b=0, m;   →  int a, m;
Keyword (delete/insert/replace keyword/call)                 if (n >= 0)      →  while (n >= 0)
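The four perturbation modules can be approximated with a few regex-based edits. The sketch below is an assumption-laden illustration, not the talk's implementation: the `TYPES`, `KEYWORDS` and `PUNCT` lists are made up, only the Syntax module implements all of delete/insert/replace (the other three only replace), and library names such as `cin` are treated as ordinary identifiers.

```python
import random
import re

# Hypothetical vocabularies for illustration only.
TYPES = ["int", "char", "string", "double", "bool"]
KEYWORDS = ["if", "while", "for", "return"]
PUNCT = ".;(){}'\"+,"


def _edit_random_match(lines, pattern, make_replacement, rng):
    """Rewrite one randomly chosen regex match; return a new list of lines."""
    matches = [(i, m) for i, line in enumerate(lines) for m in re.finditer(pattern, line)]
    if not matches:
        return list(lines)  # nothing to perturb
    i, m = rng.choice(matches)
    perturbed = list(lines)
    perturbed[i] = lines[i][:m.start()] + make_replacement(m.group(0)) + lines[i][m.end():]
    return perturbed


def _other(options, current, rng):
    """Pick a random option that differs from `current`."""
    return rng.choice([o for o in options if o != current])


def perturb_syntax(lines, rng):
    """Delete, duplicate, or replace one operator/punctuator character."""
    def replacement(ch):
        op = rng.choice(["delete", "insert", "replace"])
        if op == "delete":
            return ""
        if op == "insert":
            return ch + ch
        return _other(PUNCT, ch, rng)
    return _edit_random_match(lines, "[" + re.escape(PUNCT) + "]", replacement, rng)


def perturb_id_type(lines, rng):
    """Replace one type name with a different type (e.g. string -> char)."""
    pattern = r"\b(?:" + "|".join(TYPES) + r")\b"
    return _edit_random_match(lines, pattern, lambda t: _other(TYPES, t, rng), rng)


def perturb_keyword(lines, rng):
    """Replace one keyword with a different keyword (e.g. if -> while)."""
    pattern = r"\b(?:" + "|".join(KEYWORDS) + r")\b"
    return _edit_random_match(lines, pattern, lambda k: _other(KEYWORDS, k, rng), rng)


def perturb_id_typo(lines, rng):
    """Replace one identifier occurrence with another identifier from the program."""
    reserved = set(TYPES) | set(KEYWORDS)
    words = {w for line in lines for w in re.findall(r"\b[A-Za-z_]\w*\b", line)}
    idents = sorted(words - reserved)
    if len(idents) < 2:
        return list(lines)
    pattern = r"\b(?:" + "|".join(map(re.escape, idents)) + r")\b"
    return _edit_random_match(lines, pattern, lambda t: _other(idents, t, rng), rng)


MODULES = [perturb_syntax, perturb_id_type, perturb_id_typo, perturb_keyword]

program = ["int i, n;", "string A;", "cin >> n;", "A.resize(n);",
           "for (i = 0; i < n; i++) {", "cin >> A[i];", "cout << i; }"]
example = perturb_id_type(program, random.Random(0))
```

Each module changes exactly one spot, so the corrupted program stays close to the original and the fix is a small, local edit.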
2. Our corruption procedure P
● Look at common errors
● Design perturbation modules M to cause those errors
● P: Sample 1-5 modules from M, and apply to program sequentially
  e.g. ID-type, ID-typo, Syntax.

Working code
   5 int i, n;
   6 string A;
   7 cin >> n;
   8 A.resize(n);
   9 for (i = 0; i < n; i++){
  10   cin >> A[i];
  11   cout << i; }

Perturbed 1 (ID-type): line 6 becomes
   6 char A;
Perturbed 2 (ID-typo): line 10 becomes
  10   cin >> A[j];
Perturbed 3 (Syntax): line 7 becomes
   7 cin >> n .
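The procedure P described above (sample 1-5 modules, apply them in sequence) might look like the following sketch. Two points are assumptions, since the slides do not specify them: modules are sampled with replacement, and the number of steps is drawn uniformly. The three toy modules are hypothetical, hard-coded stand-ins that reproduce the slide's example rather than randomized perturbations.

```python
import random


def apply_modules(lines, modules, rng):
    """Apply the given perturbation modules one after another."""
    for module in modules:
        lines = module(lines, rng)
    return lines


def corrupt(lines, module_pool, rng, min_steps=1, max_steps=5):
    """P: sample between min_steps and max_steps modules (with replacement,
    an assumption) and apply them sequentially."""
    steps = rng.randint(min_steps, max_steps)
    chosen = [rng.choice(module_pool) for _ in range(steps)]
    return apply_modules(lines, chosen, rng), [m.__name__ for m in chosen]


# Toy, deterministic stand-ins for the ID-type, ID-typo and Syntax modules.
def id_type(lines, rng):
    return [line.replace("string A;", "char A;") for line in lines]


def id_typo(lines, rng):
    return [line.replace("A[i]", "A[j]") for line in lines]


def syntax(lines, rng):
    return [line.replace("cin >> n;", "cin >> n .") for line in lines]


working = ["int i, n;", "string A;", "cin >> n;", "A.resize(n);",
           "for (i = 0; i < n; i++){", "cin >> A[i];", "cout << i; }"]

rng = random.Random(0)
perturbed_3 = apply_modules(working, [id_type, id_typo, syntax], rng)
sampled, applied = corrupt(working, [id_type, id_typo, syntax], rng)
```

`perturbed_3` matches the "Perturbed 3" program on the slide; `sampled` is one random draw from P together with the names of the modules that were applied.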
3. Prepare pre-training data
● 300K working programs from codeforces.com
● For each program, create corrupted versions by applying P
⇒ New program repair examples: <broken code, feedback, fixed code>

Working code
   5 int i, n;
   6 string A;
   7 cin >> n;
   8 A.resize(n);
   9 for (i=0;i<n;i++){
  10   cin >> A[i];
  11   cout << i; }

Corrupted (after applying P)
   5 int i, n;
   6 char A;
   7 cin >> n .
   8 A.resize(n);
   9 for (i=0;i<n;i++){
  10   cin >> A[j];
  11   cout << i; }

Compiler: Error! line 7: expected ‘;’
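The data preparation step can be sketched as a small pipeline that corrupts each working program, compiles the result, and keeps a <broken code, feedback, fixed code> triple. Everything here is illustrative: `run_gpp` assumes a `g++` binary on PATH, the regex assumes GCC-style `file:line:col: error: message` diagnostics, only the first error is kept, and discarding corruptions that still compile is an assumption rather than something stated in the talk. The compiler call is injectable so the pipeline can be exercised without a compiler installed.

```python
import re
import subprocess

# Matches GCC-style diagnostics such as "<stdin>:7:13: error: expected ';' before ...".
ERROR_RE = re.compile(r"^[^:\n]*:(\d+):(?:\d+:)?\s*(?:fatal )?error:\s*(.*)$", re.MULTILINE)


def run_gpp(source):
    """Syntax-check C++ source read from stdin and return the compiler's stderr.
    Assumes `g++` is installed; not called unless passed as compile_fn."""
    proc = subprocess.run(
        ["g++", "-x", "c++", "-fsyntax-only", "-"],
        input=source, capture_output=True, text=True,
    )
    return proc.stderr


def first_error(stderr):
    """Return (line number, message) of the first reported error, or None."""
    match = ERROR_RE.search(stderr)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


def make_examples(working_programs, corrupt_fn, compile_fn=run_gpp):
    """Build <broken code, feedback, fixed code> triples from working programs."""
    examples = []
    for fixed in working_programs:
        broken = corrupt_fn(fixed)
        feedback = first_error(compile_fn(broken))
        if feedback is not None:  # assumption: drop corruptions that still compile
            examples.append({"broken": broken, "feedback": feedback, "fixed": fixed})
    return examples


# Usage with a fake compiler and a fixed corruption, so no g++ is needed here.
def fake_compile(source):
    return "<stdin>:1:9: error: expected ';' before '.' token\n" if "n ." in source else ""


examples = make_examples(["cin >> n;"], lambda p: p.replace("n;", "n ."), fake_compile)
```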
What’s interesting?
● Typically, pre-training task ≠ target task (e.g. masked LM vs. QA)
● Here, targeted pre-training (pre-training task = target task = program repair)
  ○ More direct pre-training structure
  ○ Data distributions can be different between pre-training & target
Evaluation 1: DeepFix
Task
● Repair C programs
● May have multiple error lines
● Apply repair model iteratively (up to 5 times)
[Gupta et al., 17]
Our model outputs

Input code
   4 int main() {
   5   int n;
   6   int * m[2];
   7   m[0] = malloc(n*sizeof(int));
   8   m[1] = malloc(n*sizeof(int));
   9   for (i = 0; i < n; i++) {
  10     m[0][i] = -1;
  11     m[1][i] = -1; }
  12   return 0 }
Error message: line 9: ‘i’ undeclared

DrRepair, Attempt 1: line 5 becomes
   5   int n, i;
Error message: line 12: expected ‘;’ before ‘}’

DrRepair, Attempt 2 (applied to the output of Attempt 1): line 12 becomes
  12   return 0 ; }
Compiled!!
Results
[Chart: Test (full repair accuracy)]
● Prior works do not use compiler messages
● Use of compiler messages is important