
PHOG: Probabilistic Model for Code. Pavol Bielik, Veselin Raychev, Martin Vechev (PowerPoint PPT presentation)



  1. PHOG: Probabilistic Model for Code. Pavol Bielik, Veselin Raychev, Martin Vechev. Software Reliability Lab, Department of Computer Science, ETH Zurich.

  2. Vision: a Statistical Programming Tool built on a Probabilistic Model, learned from a large number of repositories: 15 million repositories, billions of lines of code, containing high-quality, tested, maintained programs from the last 8 years.

  3. Statistical Programming Tools
     - Write new code [PLDI'14]: Code Completion, e.g. Camera camera = Camera.open(); camera.setDisplayOrientation(90); ?
     - Port code [ONWARD'14]: Programming Language Translation
     - Understand code/security [POPL'15]: JavaScript Deobfuscation, Type Prediction (www.jsnice.org)
     - Debug code: Statistical Bug Detection, e.g. for x in range(a): print a[x] (likely error)
     All of these benefit from a probabilistic model for code.

  4. Statistical Programming Tools (same applications as the previous slide), viewed as the combination of Programming Languages + Machine Learning. All of these benefit from a probabilistic model for code.

  5. Model Requirements. Pipeline: Existing Programs → Learning → Probabilistic Model → Predictions. Desired properties: Efficient Learning, Widely Applicable, High Precision, Explainable Predictions.

  6. Model Requirements (continued). Pipeline: Existing Programs → Learning → Probabilistic Model → Predictions. The proposed model: PHOG, a Probabilistic Higher Order Grammar, aims at Efficient Learning, Wide Applicability, High Precision, and Explainable Predictions.

  7. Example Query

     awaitReset = function(){
       ...
       return defer.promise;
     }
     awaitRemoved = function(){
       fail(function(error){
         if (error.status === 401){
           ...
           defer.reject(error);
         }
       });
       ...
       return defer. ?
     }

     PHOG predictions for the query (P):
       promise  0.67   (correct prediction)
       notify   0.12
       resolve  0.11
       reject   0.03

  8. Challenges (same example as slide 7): Long-distance dependencies. The correct completion, promise, appears far from the query, in the earlier awaitReset function.

  9. Challenges (continued, same example): Long-distance dependencies; Program semantics.

  10. Challenges (continued, same example): Long-distance dependencies; Program semantics; Explainable predictions.

  11. Existing Approaches for Code. Syntactic [Hindle et al., 2012; Allamanis et al., 2015]: predict arg max_x P(x | conditioning context), where the conditioning context is a fixed set of syntactic features around the label. A bad fit for programs.

  12. Existing Approaches for Code.
      - Syntactic [Hindle et al., 2012; Allamanis et al., 2015]: arg max_x P(x | features); a bad fit for programs.
      - Semantic [Nguyen et al., 2013; Allamanis et al., 2014; Raychev et al., 2014]: arg max_x P(x | defer, reject, promise); relies on hard-coded heuristics and is task- and language-specific.

  13. PHOG: Concepts. Use program synthesis to learn a function that explains the data; the function returns a conditioning context for a given query. Use this function to build a probabilistic model that generalizes PCFGs by allowing conditioning on richer context.

  14. Generalizing PCFG. A Context-Free Grammar has rules of the form α → β1 … βn, with probabilities P such as:
      Property → x        0.05
      Property → y        0.03
      Property → promise  0.001

  15. PHOG: Generalizes PCFG. A Higher Order Grammar conditions each rule on a context γ: α[γ] → β1 … βn. Compare the probabilities P:
      Context-Free Grammar:                      Higher Order Grammar:
      Property → x        0.05                   Property[reject, promise] → promise  0.67
      Property → y        0.03                   Property[reject, promise] → notify   0.12
      Property → promise  0.001                  Property[reject, promise] → resolve  0.11
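The conditioning on slide 15 can be sketched numerically: a PCFG estimates P(Property → v) from global counts, while a higher-order grammar estimates P(Property[γ] → v) from counts restricted to the context γ. A minimal sketch; the observations list below is invented for illustration, not data from the paper:

```python
from collections import Counter

# Hypothetical training observations: (context, produced property).
# A PCFG ignores the context; a higher-order grammar conditions on it.
observations = [
    (("reject", "promise"), "promise"),
    (("reject", "promise"), "promise"),
    (("reject", "promise"), "notify"),
    (("open", "close"), "x"),
    (("open", "close"), "y"),
]

def pcfg_probs(obs):
    """P(Property -> v): relative frequency, context ignored."""
    counts = Counter(v for _, v in obs)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def phog_probs(obs, context):
    """P(Property[context] -> v): relative frequency within the context."""
    counts = Counter(v for ctx, v in obs if ctx == context)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

print(pcfg_probs(observations))                         # promise is only 0.4 overall
print(phog_probs(observations, ("reject", "promise")))  # promise is 2/3 in this context
```

Restricting the counts to a context sharpens the distribution, which is exactly why the conditioned rule probabilities on the slide are so much more peaked than the context-free ones.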

  16. Conditioning on Richer Context. α[γ] → β1 … βn. What is the best conditioning context?

  17. Conditioning on Richer Context. α[γ] → β1 … βn. What is the best conditioning context? Candidates: APIs, Identifiers, Control Structures, Fields, Constants, …

  18. Conditioning on Richer Context. α[γ] → β1 … βn. What is the best conditioning context? Candidates: APIs, Identifiers, Control Structures, Fields, Constants, … Pipeline: Source Code → ? → Conditioning Context γ.

  19. Higher Order Grammar. Production rules R: α[γ] → β1 … βn. Function f: ? → γ. Parametrize the grammar by a function used to dynamically obtain the context.

  20. Higher Order Grammar. Production rules R: α[γ] → β1 … βn. Function f: AST → γ. Parametrize the grammar by a function used to dynamically obtain the context.

  21. Higher Order Grammar. Pipeline: Source Code → Abstract Syntax Tree → Function Application f(AST) → Conditioning Context γ.

  22. Function Representation. In general: unrestricted programs (Turing-complete). Our work: TCond, a language for navigating over trees and accumulating context:
      TCond   ::= ε | WriteOp TCond | MoveOp TCond
      MoveOp  ::= Up | Left | Right | DownFirst | DownLast | NextDFS | PrevDFS | NextLeaf | PrevLeaf | PrevNodeType | PrevNodeValue | PrevNodeContext
      WriteOp ::= WriteValue | WriteType | WritePos

  23. Expressing functions in the TCond language. For example, the program Up Left WriteValue moves up, then left, then appends the current node's value to the context (γ ← γ · value). Grammar as on the previous slide:
      TCond   ::= ε | WriteOp TCond | MoveOp TCond
      MoveOp  ::= Up | Left | Right | DownFirst | DownLast | NextDFS | PrevDFS | NextLeaf | PrevLeaf | PrevNodeType | PrevNodeValue | PrevNodeContext
      WriteOp ::= WriteValue | WriteType | WritePos

  24. Example. Query on which a TCond program is executed (trace on the following slides):

      elem.notify(
        ... ,
        ... ,
        {
          position: 'top',
          hide: false,
          ?
        }
      );

  25. Example (continued). TCond trace, starting at the query node of elem.notify(..., ..., { position: 'top', hide: false, ? }):
      Left        γ = {}
      WriteValue  γ = {hide}

  26. Example (continued). TCond trace on the same query:
      Left        γ = {}
      WriteValue  γ = {hide}
      Up          γ = {hide}
      WritePos    γ = {hide, 3}

  27. Example (continued). Full TCond trace on the same query:
      Left        γ = {}
      WriteValue  γ = {hide}
      Up          γ = {hide}
      WritePos    γ = {hide, 3}
      Up          γ = {hide, 3}
      DownFirst   γ = {hide, 3}
      DownLast    γ = {hide, 3}
      WriteValue  γ = {hide, 3, notify}

  28. Example (continued). The full trace yields γ = {hide, 3, notify}, i.e. the accumulated context captures { Previous Property, Parameter Position, API name }.
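The trace above can be reproduced with a toy interpreter. The Node class, the tree layout, and the convention that WritePos writes the node's child index under its parent are assumptions made for illustration; only the ops this example uses are implemented:

```python
class Node:
    """Minimal AST node with parent links (toy model, not the paper's AST)."""
    def __init__(self, type_, value=None, children=()):
        self.type, self.value, self.parent = type_, value, None
        self.children = list(children)
        for c in self.children:
            c.parent = self

def run_tcond(program, start):
    """Execute a TCond program from `start`, accumulating the context γ."""
    ctx, node = [], start
    for op in program:
        if op == "WriteValue":
            ctx.append(node.value)
        elif op == "WritePos":  # child index under parent (assumption:
            # the callee is child 0, so this equals the argument position)
            ctx.append(node.parent.children.index(node))
        elif op == "Left":
            i = node.parent.children.index(node)
            node = node.parent.children[i - 1]
        elif op == "Up":
            node = node.parent
        elif op == "DownFirst":
            node = node.children[0]
        elif op == "DownLast":
            node = node.children[-1]
    return ctx

# Toy AST for: elem.notify(..., ..., { position: 'top', hide: false, ? })
query = Node("Property", "?")
obj = Node("ObjectExpression", children=[
    Node("Property", "position"), Node("Property", "hide"), query])
call = Node("CallExpression", children=[
    Node("MemberExpression", children=[
        Node("Identifier", "elem"), Node("Property", "notify")]),
    Node("Argument", "..."), Node("Argument", "..."), obj])

steps = ["Left", "WriteValue", "Up", "WritePos",
         "Up", "DownFirst", "DownLast", "WriteValue"]
print(run_tcond(steps, query))   # ['hide', 3, 'notify']
```

Running it reproduces the slide's final context {hide, 3, notify}: the previous property, the parameter position, and the API name.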

  29. Learning PHOG. Given an existing dataset D, use program synthesis (enumerative search, genetic programming) over the TCond language to find
          f_best = arg min_{f ∈ TCond} cost(D, f)
      To scale, use representative sampling: pick a subset d of D with |d| << |D| such that |cost(d, f) − cost(D, f)| < ε.
      TCond language:
          TCond   ::= ε | WriteOp TCond | MoveOp TCond
          MoveOp  ::= Up | Left | Right | ...
          WriteOp ::= WriteValue | WriteType | ...
      See: Learning Programs from Noisy Data. POPL'16, ACM.
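The search for f_best can be sketched on a toy stand-in. Instead of ASTs and the full TCond language, the "programs" below navigate flat token sequences with two ops (L: move left, W: write the current token), and cost(D, f) is the error rate of predicting each query's label from the context f extracts. The dataset and op set are invented for illustration; the real system enumerates TCond programs over ASTs:

```python
import itertools
from collections import Counter, defaultdict

# Toy dataset: (tokens, query position, label). The "?" marks the hole;
# the label depends on the receiver two tokens to the left.
DATA = [
    (["defer", ".", "?"], 2, "reject"),
    (["defer", ".", "?"], 2, "reject"),
    (["q", ".", "?"], 2, "promise"),
    (["q", ".", "?"], 2, "promise"),
]

def run(prog, tokens, pos):
    """Execute a move/write program; return the accumulated context."""
    ctx, i = [], pos
    for op in prog:
        if op == "L" and i > 0:
            i -= 1
        elif op == "W":
            ctx.append(tokens[i])
    return tuple(ctx)

def cost(data, prog):
    """Error rate when predicting the majority label per extracted context."""
    by_ctx = defaultdict(Counter)
    for tokens, pos, label in data:
        by_ctx[run(prog, tokens, pos)][label] += 1
    errors = sum(sum(c.values()) - max(c.values()) for c in by_ctx.values())
    return errors / len(data)

# Enumerative search over all programs up to length 4.
ops = ["L", "W"]
candidates = [p for n in range(1, 5) for p in itertools.product(ops, repeat=n)]
f_best = min(candidates, key=lambda p: cost(DATA, p))
print(f_best, cost(DATA, f_best))   # ('L', 'L', 'W') 0.0
```

The search discovers that moving left twice and writing the receiver token ("defer" vs "q") separates the labels perfectly, which mirrors how the learned TCond program picks out the conditioning context that best explains the data.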

  30. Evaluation. Probabilistic model of the JavaScript language. Data split: 20k (TCond learning), 100k (PHOG training), 50k (blind set).

  31. Evaluation: Code Completion.

      Model        Error Rate
      PCFG         49.9%
      n-gram       28.7%
      Naive Bayes  45.8%
      SVM          29.5%
      PHOG         18.5%

  32. Evaluation: Code Completion error rate by node type.

      Node type    Error Rate  Example
      Identifier   38%         contains = jQuery …
      Property     35%         start = list.length;
      String       48%         '[' + attrs + ']'
      Number       36%         canvas(xy[0], xy[1], …)
      RegExp       34%         line.replace(/(&nbsp;| )+/, …)
      UnaryExpr    3%          if (!events || ! …)
      BinaryExpr   26%         while (++index < …)
      LogicalExpr  8%          frame = frame || …

  33. Evaluation: efficiency.

      Model        Training Time  Queries per Second
      PCFG         1 min          71,000
      n-gram       4 min          15,000
      Naive Bayes  3 min          10,000
      SVM          36 hours       12,500
      PHOG         162 + 3 min    50,000

  34. PHOG: Probabilistic Higher Order Grammar. Properties: Efficient Learning, Widely Applicable, High Precision, Explainable Predictions.
      Key ideas:
      - Learn a function that explains the dataset; the function dynamically obtains the best conditioning context for a given query:
            f_best = arg min_{f ∈ TCond} cost(D, f)
        with the TCond language
            TCond   ::= ε | WriteOp TCond | MoveOp TCond
            MoveOp  ::= Up | Left | Right | ...
            WriteOp ::= WriteValue | WriteType | ...
      - Define a new generative model parametrized by the learned function: PHOG(f_best).
