
Diagnosing Type Errors with Class



  1. Diagnosing Type Errors with Class. Danfeng Zhang, Dimitrios Vytiniotis, Simon Peyton-Jones, Andrew C. Myers. Cornell University and MSR Cambridge. PLDI 2015 Distinguished Paper Award.

  2. Error localization is difficult for ML type systems. “It is a truism that most bugs are detected only at a great distance from their source.” (Mitchell Wand, Finding the source of type errors, POPL’86). It is even worse in sophisticated type systems such as that of the Glasgow Haskell Compiler (GHC), which adds:
     • Type classes
     • Type families
     • GADTs
     • Type signatures

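  3. Error messages are sometimes confusing. For the program fact n = if n == 0 then 1 else n * fact (n == 1), GHC's inference engine reports that Bool is not a numerical type, but the actual mistake is that ‘==’ should be ‘−’ in the recursive call.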
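
For comparison, the example with the mistake fixed (n - 1 in place of n == 1) type-checks and runs. The Integer signature is added here for clarity and is not on the slide.

```haskell
-- The slide's example with the mistake fixed: the recursive call
-- uses n - 1 rather than n == 1. The signature is added for clarity.
fact :: Integer -> Integer
fact n = if n == 0 then 1 else n * fact (n - 1)
```

In GHCi, fact 5 evaluates to 120.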

  4. SHErrLoc: Static Holistic Error Locator. SHErrLoc reports the most likely error cause. It is a general, expressive, and accurate error-localization method that handles the highly expressive type system of GHC.

  5. General Error Localization [Zhang&Myers’14]. Programs in OCaml, Jif, and other languages are translated into constraints; a constraint analysis followed by general diagnosis heuristics, based on a Bayesian interpretation, identifies the cause. However, this approach cannot diagnose Haskell errors. The general diagnosis heuristics say that the error cause is likely to be:
     • Simple
     • Able to explain all errors
     • Not used often on correct paths

  6. Key Contributions. Pipeline: Haskell program (e.g., fact n = if n == 0 then 1 else n * fact (n == 1)) → constraints → constraint analysis → Bayesian reasoning → cause.
     • A highly expressive constraint language
     • A decidable and efficient constraint analysis algorithm
     • A Bayesian model that accounts for the richer graph representation

  7. Roadmap. Haskell program (fact n = if n == 0 then 1 else n * fact (n == 1)) → constraints: a highly expressive constraint language.

  8. Type Checking as Constraint Solving. In the ML type system:
     • Constraint elements are types, built from constructors (Int, Bool, List) and variables (α, β, γ)
     • Constraints are type equalities
     Syntax of constraints:
     $E ::= \alpha \mid \mathsf{con}(E_1, \ldots, E_n)$ (elements)
     $I ::= E_1 = E_2$ (atomic constraints)
     $C ::= \bigwedge_i I_i$ (constraints)
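
As an illustration of type checking as constraint solving, the sketch below solves a conjunction of ML-style equalities by first-order unification. It is a minimal sketch, not GHC's or SHErrLoc's implementation; the names Elem, Subst, apply, occurs, and solve, and the encoding of constructors as strings, are hypothetical.

```haskell
-- Constraint elements, following E ::= α | con(E1, ..., En).
data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

-- A substitution maps type variables to elements.
type Subst = [(String, Elem)]

-- Apply a substitution, following chains of bound variables.
apply :: Subst -> Elem -> Elem
apply s (Var a)    = maybe (Var a) (apply s) (lookup a s)
apply s (Con c es) = Con c (map (apply s) es)

-- Does variable a occur in an element?
occurs :: String -> Elem -> Bool
occurs a (Var b)    = a == b
occurs a (Con _ es) = any (occurs a) es

-- Solve a conjunction of equalities E1 = E2 by first-order unification.
-- Nothing means the constraint is unsatisfiable.
solve :: [(Elem, Elem)] -> Maybe Subst
solve = go []
  where
    go s [] = Just s
    go s ((l, r) : rest) = case (apply s l, apply s r) of
      (Var a, Var b) | a == b -> go s rest
      (Var a, t)
        | occurs a t -> Nothing
        | otherwise  -> go ((a, t) : s) rest
      (t, Var a) -> go s ((Var a, t) : rest)
      (Con c xs, Con d ys)
        | c == d && length xs == length ys -> go s (zip xs ys ++ rest)
        | otherwise -> Nothing
```

For example, solve [(Var "a", Con "Int" []), (Var "a", Con "Bool" [])] returns Nothing because α cannot be both Int and Bool, while solve [(Con "List" [Var "a"], Con "List" [Con "Int" []])] returns Just [("a", Con "Int" [])]. Unification only reports that a constraint is unsatisfiable; it does not say which equality is to blame, which is the localization problem this talk addresses.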

  9. Type Classes. A type class such as Num has instances (e.g., Int); intuitively, a type class is a set of types.

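  10. Modeling Type Class Constraints. Extending the ML syntax with class constraints would give:
     $E ::= \alpha \mid \mathsf{con}(E_1, \ldots, E_n)$
     $I ::= E_1 = E_2 \mid \mathsf{class}(E_1, \ldots, E_n)$
     $C ::= \bigwedge_i I_i$
     Since a type class is the set of its instances, our constraint language instead uses a single ordering:
     $E ::= \alpha \mid \mathsf{con}(E_1, \ldots, E_n)$
     $I ::= E_1 \le E_2$
     $C ::= \bigwedge_i I_i$
     with the encodings $\mathsf{Num}\ \alpha := \alpha \le \mathsf{Num}$ and $E_1 = E_2 := E_1 \le E_2 \wedge E_2 \le E_1$.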
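
A minimal sketch of the two encodings on this slide, assuming a hypothetical Haskell representation in which a class such as Num is a nullary constructor standing for the set of its instances. The names Elem, Ineq, member, equal, and example are illustrative only.

```haskell
-- Elements: E ::= α | con(E1, ..., En). A class such as Num is modeled
-- as a nullary constructor denoting the set of its instances.
data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

-- The only kind of atomic constraint: an inequality E1 ≤ E2.
data Ineq = Elem :<=: Elem
  deriving (Eq, Show)

-- Class membership:  Num α  :=  α ≤ Num
member :: String -> Elem -> [Ineq]
member cls e = [e :<=: Con cls []]

-- Equality:  E1 = E2  :=  E1 ≤ E2 ∧ E2 ≤ E1
equal :: Elem -> Elem -> [Ineq]
equal e1 e2 = [e1 :<=: e2, e2 :<=: e1]

-- Num α ∧ α = Bool, written purely with inequalities.
example :: [Ineq]
example = member "Num" (Var "a") ++ equal (Var "a") (Con "Bool" [])
```

Here example holds three inequalities: α ≤ Num, α ≤ Bool, and Bool ≤ α. Because every constraint is now an inequality, each one can become a directed edge in a graph.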

  11. Modeling Type Class Constraints (continued). Our constraint language ($E ::= \alpha \mid \mathsf{con}(E_1, \ldots, E_n)$, $I ::= E_1 \le E_2$, $C ::= \bigwedge_i I_i$) is concise, and its inequalities map directly to edges in a graph.

  12. Types are Checked Under Hypotheses. Type signatures and GADTs introduce hypotheses. For the program
     double :: Num a => a -> a
     double n = n * 2
     the generated constraint is $a \le \mathsf{Num} \vdash a \le \mathsf{Num}$. The hypothesis $a \le \mathsf{Num}$ (a is an instance of Num) comes from the signature, and the constraint to the right of $\vdash$ is checked under it.
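
The slide's double as a runnable Haskell definition. The Num a => context supplies the hypothesis a ≤ Num; without it, GHC rejects the definition, because nothing justifies using * and the literal 2 at type a.

```haskell
-- The context "Num a =>" is the hypothesis a ≤ Num: inside the body,
-- the checker may assume a is an instance of Num, so n * 2 is well typed.
double :: Num a => a -> a
double n = n * 2
```

For example, double (21 :: Int) evaluates to 42 and double (1.5 :: Double) to 3.0.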

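  13. Types are Checked Under Axioms. Instance declarations may introduce (global) axioms. The declaration
     instance Eq a => Eq [a] where {...}
     states that for all a, if a is an instance of Eq, then the list of a is an instance of Eq. A constraint example with a hypothesis and an axiom:
     $\mathsf{Int} \le \mathsf{Eq} \wedge \forall a.\ a \le \mathsf{Eq} \Rightarrow [a] \le \mathsf{Eq} \;\vdash\; [\mathsf{Int}] \le \mathsf{Eq}$
     Here $\mathsf{Int} \le \mathsf{Eq}$ is the hypothesis, and the axiom instantiated at a = Int gives $\mathsf{Int} \le \mathsf{Eq} \Rightarrow [\mathsf{Int}] \le \mathsf{Eq}$.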
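
A minimal sketch of checking an inequality under hypotheses and quantified axioms by backward chaining. It assumes a hypothetical representation in which a class is a nullary constructor and lists use a "List" constructor; it is an illustration, not SHErrLoc's algorithm (SHErrLoc analyzes a constraint graph), and the depth bound is an assumption added to keep the sketch total.

```haskell
-- Elements: type variables and constructor applications; a class such
-- as Eq is a nullary constructor, and lists use the constructor "List".
data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

-- An inequality E1 ≤ E2; "Int ≤ Eq" reads "Int is an instance of Eq".
data Ineq = Le Elem Elem
  deriving (Eq, Show)

-- ∀vars. premises ⇒ conclusion. A hypothesis is the degenerate case
-- with no quantified variables and no premises.
data Axiom = Axiom [String] [Ineq] Ineq

type Subst = [(String, Elem)]

-- One-way matching: only the quantified variables vs may be bound;
-- any other variable (e.g. a rigid a from a signature) must match itself.
match :: [String] -> Subst -> Elem -> Elem -> Maybe Subst
match vs s (Var a) t
  | a `elem` vs = case lookup a s of
      Nothing -> Just ((a, t) : s)
      Just t' -> if t' == t then Just s else Nothing
match vs s (Con c ps) (Con d ts)
  | c == d && length ps == length ts = matchAll vs s (zip ps ts)
match _ s p t = if p == t then Just s else Nothing

matchAll :: [String] -> Subst -> [(Elem, Elem)] -> Maybe Subst
matchAll _ s [] = Just s
matchAll vs s ((p, t) : rest) = match vs s p t >>= \s' -> matchAll vs s' rest

substitute :: Subst -> Elem -> Elem
substitute s (Var a)    = maybe (Var a) id (lookup a s)
substitute s (Con c es) = Con c (map (substitute s) es)

-- Backward chaining: a goal holds if some axiom's conclusion matches it
-- and every instantiated premise holds. The depth bound keeps it total.
entails :: Int -> [Axiom] -> Ineq -> Bool
entails 0 _ _ = False
entails depth axioms (Le gl gr) = any tryAxiom axioms
  where
    tryAxiom (Axiom vs prems (Le pl pr)) =
      case matchAll vs [] [(pl, gl), (pr, gr)] of
        Nothing -> False
        Just s  -> all (entails (depth - 1) axioms)
                       [Le (substitute s a) (substitute s b) | Le a b <- prems]

-- The slide's example: hypothesis Int ≤ Eq and axiom ∀a. a ≤ Eq ⇒ [a] ≤ Eq.
slide13 :: [Axiom]
slide13 =
  [ Axiom [] [] (Le (Con "Int" []) eq)
  , Axiom ["a"] [Le (Var "a") eq] (Le (Con "List" [Var "a"]) eq)
  ]
  where eq = Con "Eq" []
```

Here entails 10 slide13 (Le (Con "List" [Con "Int" []]) (Con "Eq" [])) returns True: the axiom matches with a = Int, leaving the premise Int ≤ Eq, which is the hypothesis. The same query for [Bool] returns False. A slide-12-style hypothesis such as Axiom [] [] (Le (Var "a") (Con "Num" [])) proves a ≤ Num but not b ≤ Num, since a is not quantified.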

  14. Modeling Hypotheses and Axioms. Syntax of SHErrLoc constraints:
     $E ::= \alpha \mid \mathsf{con}(E_1, \ldots, E_n)$
     $Q ::= \forall a.\ \bigwedge_i I_i \Rightarrow I$
     $I ::= E_1 \le E_2$
     $C ::= \bigwedge_j \big(\bigwedge_i Q_i \vdash I_j\big)$
     • Constraints ($C$): inequalities under quantified axioms
     • Quantified axioms ($Q$): implication rules
     • Hypotheses (e.g., Int ≤ Num): degenerate axioms
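
The grammar can be transcribed into Haskell data types. This is a sketch: the type and function names are hypothetical, and the pretty-printer is included only to show that the slide-13 constraint fits the grammar.

```haskell
import Data.List (intercalate)

-- E ::= α | con(E1, ..., En)
data Elem = Var String | Con String [Elem]

-- I ::= E1 ≤ E2
data Ineq = Le Elem Elem

-- Q ::= ∀a. ⋀ I_i ⇒ I  (a hypothesis has no variables and no premises)
data Axiom = Forall [String] [Ineq] Ineq

-- One conjunct of C ::= ⋀_j (⋀_i Q_i ⊢ I_j): an inequality together
-- with the axioms and hypotheses it is checked under.
data Entailment = Entails [Axiom] Ineq

type Constraints = [Entailment]

showElem :: Elem -> String
showElem (Var a)          = a
showElem (Con "List" [e]) = "[" ++ showElem e ++ "]"
showElem (Con c [])       = c
showElem (Con c es)       = c ++ "(" ++ intercalate ", " (map showElem es) ++ ")"

showIneq :: Ineq -> String
showIneq (Le a b) = showElem a ++ " ≤ " ++ showElem b

showAxiom :: Axiom -> String
showAxiom (Forall vs ps i) = quant ++ prems ++ showIneq i
  where
    quant = if null vs then "" else "∀" ++ unwords vs ++ ". "
    prems = if null ps then "" else intercalate " ∧ " (map showIneq ps) ++ " ⇒ "

showEntailment :: Entailment -> String
showEntailment (Entails qs i) =
  intercalate " ∧ " (map showAxiom qs) ++ " ⊢ " ++ showIneq i

-- Slide 13's constraint as a one-conjunct C.
slide13 :: Constraints
slide13 = [Entails [hyp, inst] (Le (list int) eq)]
  where
    int    = Con "Int" []
    eq     = Con "Eq" []
    list e = Con "List" [e]
    hyp    = Forall [] [] (Le int eq)
    inst   = Forall ["a"] [Le (Var "a") eq] (Le (list (Var "a")) eq)
```

Printing map showEntailment slide13 yields the single line Int ≤ Eq ∧ ∀a. a ≤ Eq ⇒ [a] ≤ Eq ⊢ [Int] ≤ Eq, with the hypothesis rendered as a degenerate axiom.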

  15. The Full Constraint Language.
     • Also supports:
       – Functions on constraint elements
       – Nested universally and existentially quantified variables
     • Is expressive enough to model type classes, type families, GADTs, and type signatures (see the paper for details)

  16. Roadmap. Haskell program (fact n = if n == 0 then 1 else n * fact (n == 1)) → constraints (a highly expressive constraint language) → constraint analysis (a decidable and efficient constraint analysis algorithm) → cause.

  17. Constraint Graph in a Nutshell. Graph construction (simple case):
     – Node: a constraint element
     – Directed edge: the partial ordering (≤) between elements
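
A minimal sketch of the simple-case construction: each element becomes a node and each inequality E1 ≤ E2 a directed edge. The representation and the names buildGraph, reachable, and factGraph are hypothetical; the full construction in the paper handles more cases than this.

```haskell
import Data.List (nub)

data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

-- An inequality E1 ≤ E2.
data Ineq = Le Elem Elem
  deriving (Eq, Show)

-- Nodes are constraint elements; each inequality E1 ≤ E2 becomes a
-- directed edge E1 -> E2.
data Graph = Graph { nodes :: [Elem], edges :: [(Elem, Elem)] }
  deriving Show

buildGraph :: [Ineq] -> Graph
buildGraph is = Graph
  { nodes = nub (concat [[a, b] | Le a b <- is])
  , edges = nub [(a, b) | Le a b <- is]
  }

-- Every node reachable from start along directed edges (start included).
reachable :: Graph -> Elem -> [Elem]
reachable g start = go [start] []
  where
    go [] seen = reverse seen
    go (x : xs) seen
      | x `elem` seen = go xs seen
      | otherwise     = go (xs ++ [b | (a, b) <- edges g, a == x]) (x : seen)

-- A simplified rendering of the fact example: α (the type of n) must be
-- an instance of Num (from n * ...) and equal to Bool (from fact (n == 1)).
factGraph :: Graph
factGraph = buildGraph [Le alpha num, Le bool alpha, Le alpha bool]
  where
    alpha = Var "a"
    num   = Con "Num" []
    bool  = Con "Bool" []
```

The graph has three nodes, and reachable factGraph (Con "Bool" []) contains Con "Num" [], i.e. there is a path Bool → α → Num.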

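  18. Constraint Analysis in a Nutshell. [Figure: constraint graph for fact n = if n == 0 then 1 else n * fact (n == 1), with nodes including Num, n, 0, and n == 1.] A path connects n == 1, whose type is Bool, through n to Num, revealing the inconsistency: Bool is not an instance of Num.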
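
The analysis idea can be sketched by closing the edge relation transitively and flagging every path between two ground elements whose implied inequality does not hold. This is a simplified sketch, not SHErrLoc's algorithm; the edges encode the fact example only approximately (α for the type of n), and the instance table numInstances is a hypothetical stand-in.

```haskell
import Data.List (nub)

data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

-- Transitive closure of the edge relation, iterated to a fixpoint.
closure :: [(Elem, Elem)] -> [(Elem, Elem)]
closure es
  | es' == es = es
  | otherwise = closure es'
  where
    es' = nub (es ++ [(a, d) | (a, b) <- es, (c, d) <- es, b == c])

-- Ground elements contain no type variables.
isGround :: Elem -> Bool
isGround (Var _)      = False
isGround (Con _ args) = all isGround args

-- A path between two distinct ground elements implies E1 ≤ E2; it is
-- unsatisfiable unless that pair is a known instance.
unsatPaths :: [(Elem, Elem)] -> [(Elem, Elem)] -> [(Elem, Elem)]
unsatPaths instances es =
  [ (a, b)
  | (a, b) <- closure es
  , isGround a, isGround b, a /= b
  , (a, b) `notElem` instances ]

-- Edges for a simplified fact example, with α standing for the type of n.
factEdges :: [(Elem, Elem)]
factEdges = [(alpha, num), (bool, alpha), (alpha, bool)]
  where
    alpha = Var "a"
    num   = Con "Num" []
    bool  = Con "Bool" []

-- A hypothetical instance table: Int is an instance of Num.
numInstances :: [(Elem, Elem)]
numInstances = [(Con "Int" [], Con "Num" [])]
```

Here unsatPaths numInstances factEdges returns [(Con "Bool" [], Con "Num" [])]: the only failing ground path is Bool ≤ Num, matching the slide's diagnosis. Deciding which edge on that path is the actual mistake is left to the Bayesian ranking later in the talk.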

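  19. Limitations of Previous Algorithms [Barrett et al.’00, Melski&Reps’00, Zhang&Myers’10]. Consider $[\mathsf{Int}] \le C \vdash \alpha = \mathsf{Bool} \wedge [\alpha] \le C$, where C is a type class. Each conjunct is satisfiable on its own ($[\alpha] \le C$ with $\alpha = \mathsf{Int}$; $\alpha = \mathsf{Bool}$ with $\alpha = \mathsf{Bool}$), but together they require $[\mathsf{Bool}] \le C$, and $[\mathsf{Bool}] \not\le C$. Previous algorithms under-saturate the graph: they only add edges, and [Bool] is not a node in the graph, so the error goes undetected.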
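
To see why each conjunct is satisfiable alone but not together, the sketch below brute-forces α over a small finite universe of types. It reads the hypothesis as [Int] ≤ C, the reading under which each conjunct is satisfiable on its own; the universe and the names Ty, inC, conj1, conj2, and solutions are assumptions of this illustration, not part of SHErrLoc.

```haskell
-- Types in a small finite universe (an assumption of this sketch).
data Ty = TInt | TBool | TList Ty
  deriving (Eq, Show)

-- Membership in class C under the hypothesis [Int] ≤ C, and nothing else.
inC :: Ty -> Bool
inC t = t == TList TInt

universe :: [Ty]
universe = [TInt, TBool, TList TInt, TList TBool]

-- The two conjuncts of α = Bool ∧ [α] ≤ C, as predicates on α.
conj1, conj2 :: Ty -> Bool
conj1 a = a == TBool
conj2 a = inC (TList a)

-- All values of α in the universe that satisfy a predicate.
solutions :: (Ty -> Bool) -> [Ty]
solutions p = filter p universe
```

Here solutions conj1 is [TBool], solutions conj2 is [TInt], and solutions (\a -> conj1 a && conj2 a) is empty. A graph that never contains a node for [Bool] has no place to record the conflict between these two answers.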

  20. New Algorithm. For the same constraint $[\mathsf{Int}] \le C \vdash \alpha = \mathsf{Bool} \wedge [\alpha] \le C$, the key idea is to add new edges and new nodes during saturation: besides C, [α], α, and Bool, the new algorithm adds the node [Bool]. Key challenge: naive algorithms either fail to terminate or under-saturate the graph.

  21. New Algorithm in a Nutshell. Black nodes exist before saturation; white nodes (such as [Bool]) are added during saturation. Nodes are added based on patterns consisting of:
     1. one edge between two black nodes, and
     2. a black or white node.
     Recursion check: for a white node, no new node is added based on the edge in the pattern. Lemma: the algorithm always terminates.
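
A simplified sketch of the key idea: when an equality edge x = y connects two black nodes and another black node mentions x, add a white node with x replaced by y. Because only black nodes trigger additions here, one round adds finitely many nodes. This is deliberately weaker than the algorithm on the slide, which also uses white nodes in patterns under the recursion check; the names mentions, replace, saturate, and example are hypothetical.

```haskell
import Data.List (nub)

data Elem = Var String | Con String [Elem]
  deriving (Eq, Show)

-- Does element x occur inside e (or equal it)?
mentions :: Elem -> Elem -> Bool
mentions x e = e == x || case e of
  Con _ args -> any (mentions x) args
  _          -> False

-- Replace every occurrence of x by y inside e.
replace :: Elem -> Elem -> Elem -> Elem
replace x y e
  | e == x    = y
  | otherwise = case e of
      Con c args -> Con c (map (replace x y) args)
      _          -> e

-- One simplified saturation round. For each equality edge x = y between
-- two black nodes and each other black node n mentioning x, add the white
-- node n[y/x] and an equality edge n = n[y/x]. Only black nodes trigger
-- additions here, so the round adds finitely many nodes.
saturate :: [Elem] -> [(Elem, Elem)] -> ([Elem], [(Elem, Elem)])
saturate black eqs = (whites, newEqs)
  where
    candidates = [ (n, replace x y n)
                 | (x, y) <- eqs, x `elem` black, y `elem` black
                 , n <- black, n /= x, mentions x n ]
    newEqs = nub [ (n, n') | (n, n') <- candidates, n' `notElem` black ]
    whites = nub (map snd newEqs)

-- The slide's example: black nodes C, [α], α, Bool, [Int] and α = Bool.
example :: ([Elem], [(Elem, Elem)])
example = saturate [c, Con "List" [alpha], alpha, bool, Con "List" [Con "Int" []]]
                   [(alpha, bool)]
  where
    c     = Con "C" []
    alpha = Var "a"
    bool  = Con "Bool" []
```

The round adds exactly one white node, [Bool], with the edge [α] = [Bool]. Reading the hypothesis as [Int] ≤ C, the analysis can then relate [α] ≤ C to [Bool] ≤ C and expose the conflict.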

  22. Constraint Analysis.
     • The analysis also handles (see the paper for details):
       – Functions on constraint elements
       – Hypotheses
       – Quantified axioms
     • Performance: empirically quadratic in graph size

  23. Roadmap. Haskell program (fact n = if n == 0 then 1 else n * fact (n == 1)) → constraints (a highly expressive constraint language) → constraint analysis (a decidable and efficient constraint analysis algorithm) → Bayesian reasoning (a Bayesian model that accounts for the richer graph representation) → cause.

  24. Likelihood Estimation [Zhang&Myers’14]. An explanation E is a set of locations. A ranking metric based on Bayesian reasoning scores it as
     $$\left(\frac{P_1}{1-P_1}\right)^{\#\,\text{locations in } E} \cdot P_2^{\#\,\text{sat paths using } E}$$
     where P1 and P2 are tunable parameters.
     • Simplifying assumption: the satisfiability of paths is independent. White nodes break this assumption.
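
Reading the metric as (P1/(1 − P1))^|E| · P2^k, where |E| is the number of locations in E and k the number of satisfiable paths using E, the score can be computed directly. This reading of the slide's formula, the names likelihood and logLikelihood, and the parameter values in the usage note are assumptions for illustration.

```haskell
-- Score of an explanation with `size` locations that is used by
-- `satPaths` satisfiable paths; higher means more likely. Assumes the
-- form (P1 / (1 - P1))^size * P2^satPaths with 0 < P1 < 1 and 0 < P2 < 1.
likelihood :: Double -> Double -> Int -> Int -> Double
likelihood p1 p2 size satPaths = (p1 / (1 - p1)) ^ size * p2 ^ satPaths

-- The same ranking in log space, which avoids underflow for large
-- explanations; it orders explanations identically.
logLikelihood :: Double -> Double -> Int -> Int -> Double
logLikelihood p1 p2 size satPaths =
  fromIntegral size * log (p1 / (1 - p1)) + fromIntegral satPaths * log p2
```

With P1 = 0.1 and P2 = 0.5, likelihood 0.1 0.5 1 3 (one location, three satisfiable paths) is about 0.0139, which beats likelihood 0.1 0.5 2 0 (two locations, no satisfiable paths), about 0.0123. Small explanations are preferred unless many correct paths use them.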

  25. Redundant Paths (definition in the paper). Observation: some paths using white nodes provide neither positive nor negative evidence. [Figure: nodes C, [α], [Bool], α, Bool.] For example, the satisfiability of a path through [α] and [Bool] depends on the edges between α and Bool. Lemma: the satisfiability of any redundant path depends on non-redundant paths.

  26. New Ranking Metric. Only non-redundant paths are counted:
     $$\left(\frac{P_1}{1-P_1}\right)^{\#\,\text{constraints in } E} \cdot P_2^{\#\,\text{non-redundant sat paths using } E}$$
     • Intuitively, the general diagnosis heuristics become: the error cause is likely to be simple, able to explain all errors, and not used often on correct non-redundant paths
     • Top candidates are returned by an efficient A* algorithm [Zhang&Myers’14]
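
A brute-force sketch of ranking explanations with this metric: every non-empty set of constraints that intersects all unsatisfiable paths is scored, and the best comes first. It stands in for the A* search the slide mentions and is exponential; the path sets are supplied by hand (in SHErrLoc they would come from the constraint graph, counting only non-redundant satisfiable paths), and the names ConstraintId, rank, score, and example are hypothetical.

```haskell
import Data.List (sortBy, subsequences)
import Data.Ord (comparing)

-- Hypothetical IDs for constraints; a path is the list of constraints it uses.
type ConstraintId = Int
type Path = [ConstraintId]

-- An explanation must intersect every unsatisfiable path ("explain all errors").
explainsAll :: [Path] -> [ConstraintId] -> Bool
explainsAll unsat e = all (any (`elem` e)) unsat

-- Number of satisfiable (non-redundant) paths that use the explanation.
satUses :: [Path] -> [ConstraintId] -> Int
satUses sat e = length (filter (any (`elem` e)) sat)

-- Assumed metric: (P1 / (1 - P1))^|E| * P2^(sat paths using E).
score :: Double -> Double -> [Path] -> [ConstraintId] -> Double
score p1 p2 sat e = (p1 / (1 - p1)) ^ length e * p2 ^ satUses sat e

-- Brute-force ranking over all non-empty subsets, best first.
rank :: Double -> Double -> [ConstraintId] -> [Path] -> [Path]
     -> [([ConstraintId], Double)]
rank p1 p2 cs unsat sat =
  sortBy (flip (comparing snd))
    [ (e, score p1 p2 sat e)
    | e <- subsequences cs, not (null e), explainsAll unsat e ]

-- Two unsatisfiable paths share constraint 2; constraints 1 and 3 each
-- lie on a satisfiable path.
example :: [([ConstraintId], Double)]
example = rank 0.1 0.5 [1 .. 4] [[1, 2], [2, 3]] [[1], [3]]
```

In example, the top explanation is [2]: it explains both errors with one constraint that no satisfiable path uses. The runner-up [2, 4] is penalized only for its size, while [1, 3] is penalized both for its size and for lying on two satisfiable paths.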

  27. Evaluation: Implementation.
     • From Haskell programs to constraints, with little effort: 50 LOC modified atop GHC (20K+ LOC) to emit GHC constraints, plus a ~400 LOC constraint translator that produces SHErrLoc constraints
     • SHErrLoc (~7500 LOC): constraints → constraint graph → diagnosis → error reports

  28. Evaluation Setup.
     • Benchmarks
       – CE benchmark: 77 Haskell programs collected from papers about type-error diagnosis, used in [Chen&Erwig’14]
       – Helium benchmark: 228 programs with type-checking errors, logged by the Helium tool [Hage’14]
     • Ground truth
       – CE benchmark: errors already well marked
       – Helium benchmark: the user's actual fix
     • Correctness: a tool's report counts as correct only when it returns the programmer's actual mistake

  29. Accuracy on the CE Benchmark. [Chart: share of programs (0% to 100%) in each outcome, comparing SHErrLoc with GHC and with the Helium tool [Heeren et al.’03]. Outcomes: the other tool misses the error but SHErrLoc finds it; both find the correct error; both miss the correct error; SHErrLoc misses the error but the other tool finds it.] SHErrLoc uses no Haskell-specific heuristics!

  30. Accuracy on the Helium Benchmark. [Chart: share of programs (0% to 100%) in each outcome, comparing SHErrLoc with the Helium tool and with GHC. Outcomes: the other tool misses the error but SHErrLoc finds it; both find the correct error; both miss the correct error; SHErrLoc misses the error but the other tool finds it.]

  31. Related Work.
     • General error localization [Zhang&Myers’14]
       – Cannot handle the type system of GHC
       – Simpler constraints and constraint analysis algorithm
     • Program analyses as constraint solving [e.g., Aiken’99, Foster et al.’06]
       – No support for hypotheses and axioms
     • Diagnosing Haskell errors [e.g., Heeren et al.’03, Hage&Heeren’07, Chen&Erwig’14]
       – Haskell-specific heuristics
       – Unable to handle all of the sophisticated features of GHC
