Specialised vs Declarative Data Mining: Software Testing Applications

Nadjib Lazaar, CNRS, University of Montpellier. Joint work with M. Maamar, Y. Lebbah, S. Loudni, C. Bessiere, et al. SIMULA, Oslo, 11 Oct. 2018


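  1. EXAMPLE: $\theta = 3$, dataset $\mathcal{D}$, pattern lattice $(2^{\mathcal{I}}, \subseteq)$. Closedness. $M_\theta = \{ P \in \mathcal{I} \mid freq(P) \ge \theta \wedge \forall P' \supset P : freq(P') < \theta \}$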
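
The set $M_\theta$ as written keeps the frequent patterns that have no frequent strict superset. A minimal brute-force sketch of that condition follows. The dataset on the slide is not recoverable from the transcript, so `DATASET` is a hypothetical toy example, and the function names `freq`, `frequent_patterns` and `m_theta` are illustrative only.

```python
from itertools import combinations


def freq(pattern, dataset):
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in dataset if pattern <= t)


def frequent_patterns(dataset, theta):
    """All non-empty patterns P with freq(P) >= theta (brute force)."""
    items = sorted(set().union(*dataset))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            p = frozenset(combo)
            if freq(p, dataset) >= theta:
                result.append(p)
    return result


def m_theta(dataset, theta):
    """Frequent patterns whose strict supersets are all infrequent."""
    frequent = frequent_patterns(dataset, theta)
    # Any superset with freq >= theta is itself in `frequent`,
    # so checking against that list is enough.
    return [p for p in frequent if not any(p < q for q in frequent)]


# Hypothetical toy dataset (not the one shown on the slide).
DATASET = [
    frozenset("ABC"),
    frozenset("ABC"),
    frozenset("AB"),
    frozenset("AC"),
    frozenset("BD"),
]

if __name__ == "__main__":
    for p in m_theta(DATASET, theta=3):
        print(sorted(p))
```

On this toy dataset with $\theta = 3$, five patterns are frequent (A, B, C, AB, AC) and only AB and AC remain in $M_\theta$.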

  6. CONDENSED REPRESENTATION

     | Dataset | #Frequent | #Closed | #Maximal |
     |---|---|---|---|
     | Zoo-1 | 151 807 | 3 292 | 230 |
     | Mushroom | 155 734 | 3 287 | 453 |
     | Lymph | 9 967 402 | 46 802 | 5 191 |
     | Hepatitis | $27 \cdot 10^{7}$ | 1 827 264 | 189 205 |

  17. SPECIALISED VS DECLARATIVE DATA MINING
      ➤ Specialised workflow: a query (basic user’s constraints) and a dataset are given to a specialised miner, which returns patterns.
      ➤ Limitations: dealing with sophisticated user’s constraints [Wojciechowski and Zakrzewicz, 02] requires (1) preprocessing the dataset, (2) post-processing the patterns, or (3) a new algorithm.
      ➤ Need: a declarative way to deal with more complex queries ➤ Declarative data mining

  20. SPECIALISED VS DECLARATIVE DATA MINING
      ➤ Declarative workflow: the query (basic and sophisticated user’s constraints) and the dataset are given to a CP model + CP solver, which returns the patterns.
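
The declarative idea can be illustrated with a small sketch in which the query is a list of constraints and a generic procedure returns every pattern that satisfies all of them. This is not a CP model or a CP solver: the `solve` function below is a plain enumerate-and-check loop standing in for one, and the constraint helpers (`min_frequency`, `max_size`, `must_contain`) and the toy `DATASET` are hypothetical names introduced only for illustration.

```python
from itertools import combinations


def solve(items, constraints):
    """Stand-in for a solver: every non-empty pattern satisfying all constraints."""
    solutions = []
    for size in range(1, len(items) + 1):
        for combo in combinations(sorted(items), size):
            pattern = frozenset(combo)
            if all(constraint(pattern) for constraint in constraints):
                solutions.append(pattern)
    return solutions


def min_frequency(dataset, theta):
    """Basic constraint: the pattern occurs in at least theta transactions."""
    return lambda p: sum(1 for t in dataset if p <= t) >= theta


def max_size(k):
    """User constraint: the pattern has at most k items."""
    return lambda p: len(p) <= k


def must_contain(item):
    """User constraint: the pattern includes a given item."""
    return lambda p: item in p


# Hypothetical toy dataset.
DATASET = [
    frozenset("ABC"),
    frozenset("ABC"),
    frozenset("AB"),
    frozenset("AC"),
    frozenset("BD"),
]
ITEMS = set().union(*DATASET)

# A basic query, then the same query refined with extra user constraints.
basic_query = [min_frequency(DATASET, 3)]
refined_query = basic_query + [must_contain("A"), max_size(2)]

if __name__ == "__main__":
    print([sorted(p) for p in solve(ITEMS, basic_query)])
    print([sorted(p) for p in solve(ITEMS, refined_query)])
```

Refining the query only appends constraints to the list; the solving procedure itself is unchanged, which is the property the slide attributes to the declarative approach.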

  25. SPECIALISED VS DECLARATIVE DATA MINING: Specialised is the winner! Declarative is the winner!

  27. SPECIALISED VS DECLARATIVE DATA MINING: Preprocessing + Specialised step vs Declarative

  29. SPECIALISED VS DECLARATIVE DATA MINING: Specialised + postprocessing vs Declarative

  33. CONCLUSIONS (PART I)
      ➤ Specialised methods are suitable for:
        ➤ Enumerating patterns
        ➤ Taking into account classic constraints (simple queries)
      ➤ Declarative methods are suitable for:
        ➤ Taking into account user’s constraints (complex queries)
        ➤ Iterative data mining process
      Time left?

  36. FAULT LOCALISATION
      ➤ The need: identify a subset of statements that are likely to explain a fault in a program
      ➤ Precision <=> Efficiency
      ➤ Spectrum-based approaches (ranking metrics, suspiciousness score):
        ➤ Tarantula [Jones and Harrold 05]
        ➤ Ochiai [Abreu et al. 07]
        ➤ Jaccard [Abreu et al. 07]
        ➤ …

  39. FAULT LOCALISATION (MOTIVATIONS)
      ➤ Pros: Quick localisation
      ➤ Cons: independent evaluation of each statement at the expense of accuracy

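  41. FAULT LOCALISATION (MOTIVATIONS). Program: character counter. Coverage of statements e1 to e10 by test cases tc1 to tc8 (1 = executed), with the verdict of each test case (F = failing, P = passing).

      | Statement | Code | tc1 | tc2 | tc3 | tc4 | tc5 | tc6 | tc7 | tc8 |
      |---|---|---|---|---|---|---|---|---|---|
      | | function count (char *s) { int let, dig, other, i = 0; char c; | | | | | | | | |
      | e1 | while (c = s[i++]) { | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
      | e2 | if ('A'<=c && 'Z'>=c) | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
      | e3 | let += 2; //- fault - | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
      | e4 | else if ('a'<=c && 'z'>=c) | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
      | e5 | let += 1; | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
      | e6 | else if ('0'<=c && '9'>=c) | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
      | e7 | dig += 1; | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
      | e8 | else if (isprint (c)) | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
      | e9 | other += 1; | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
      | e10 | printf("%d %d %d\n", let, dig, other); } | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
      | | Passing/Failing | F | F | F | F | F | F | P | P |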
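
As an illustration of spectrum-based ranking, the sketch below scores each statement of the character counter with the usual textbook forms of Tarantula, Ochiai and Jaccard, using the coverage rows and verdicts from the slide. The names `ROWS`, `VERDICTS`, `counts` and `rank` are illustrative, and the formulas are the standard definitions rather than anything spelled out in the talk. Here `ef` and `ep` are the numbers of failing and passing test cases that execute a statement.

```python
import math

# Coverage rows e1..e10 over tc1..tc8, copied from the slide.
ROWS = {
    "e1": "11111111", "e2": "11111101", "e3": "11111100", "e4": "11111001",
    "e5": "11001000", "e6": "11110001", "e7": "01010000", "e8": "10100001",
    "e9": "10100001", "e10": "11111111",
}
VERDICTS = "FFFFFFPP"  # tc1..tc6 fail, tc7 and tc8 pass


def counts(row):
    """Return (ef, ep): failing and passing tests that execute the statement."""
    ef = sum(1 for c, v in zip(row, VERDICTS) if c == "1" and v == "F")
    ep = sum(1 for c, v in zip(row, VERDICTS) if c == "1" and v == "P")
    return ef, ep


def tarantula(ef, ep, n_fail, n_pass):
    fail_ratio = ef / n_fail if n_fail else 0.0
    pass_ratio = ep / n_pass if n_pass else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0


def ochiai(ef, ep, n_fail, n_pass):
    denom = math.sqrt(n_fail * (ef + ep))
    return ef / denom if denom else 0.0


def jaccard(ef, ep, n_fail, n_pass):
    denom = n_fail + ep
    return ef / denom if denom else 0.0


def rank(metric):
    """Statements sorted by decreasing suspiciousness score."""
    n_fail, n_pass = VERDICTS.count("F"), VERDICTS.count("P")
    scores = {s: metric(*counts(row), n_fail, n_pass) for s, row in ROWS.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])


if __name__ == "__main__":
    for metric in (tarantula, ochiai, jaccard):
        print(metric.__name__, rank(metric)[:3])
```

On this matrix Ochiai and Jaccard both put the faulty statement e3 alone at the top with score 1.0, while Tarantula gives the same score 1.0 to e3, e5 and e7, since none of the three is executed by a passing test.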

  48. FAULT LOCALISATION (MOTIVATIONS)
      ➤ Pros: Quick localisation
      ➤ Cons: independent evaluation of each statement at the expense of accuracy
      ➤ Need: finer-grained localisation, taking into account user’s constraints
      ➤ How: Use of Declarative Data Mining

  50. FAULT LOCALISATION (MOTIVATIONS). Same program (character counter) and coverage matrix as above, annotated with: Fault localisation = Mining task

  54. PATTERN SUSPICIOUSNESS DEGREE (PSD)
      ➤ PSD function. Given a pattern $P$ of a program: $PSD(P) = freq^{-}(P) + \frac{|FAIL| - freq^{+}(P)}{|PASS| + 1}$
      ➤ PSD-dominance relation. Given two patterns $P_i$ and $P_j$: $P_i \rhd_{PSD} P_j \Leftrightarrow PSD(P_i) > PSD(P_j)$
      ➤ Top-k suspicious patterns. $top\text{-}k = \{ P \mid \not\exists P_1, \ldots, P_k : \forall\, 1 \le j \le k,\ P_j \rhd_{PSD} P \}$
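
The three definitions can be tried on the character counter example with the sketch below. Several points are assumptions rather than statements from the slides: each test case is treated as a transaction (the set of statements it executes), $freq^{-}(P)$ and $freq^{+}(P)$ are read as the numbers of failing and passing test cases that execute every statement of $P$, the fraction is read with numerator $|FAIL| - freq^{+}(P)$ and denominator $|PASS| + 1$ as it appears in the extracted slide text, and the candidate set (all patterns of one or two statements) is an arbitrary choice to keep the example small.

```python
from itertools import combinations

# Coverage rows e1..e10 over tc1..tc8, copied from the character counter slide.
ROWS = {
    "e1": "11111111", "e2": "11111101", "e3": "11111100", "e4": "11111001",
    "e5": "11001000", "e6": "11110001", "e7": "01010000", "e8": "10100001",
    "e9": "10100001", "e10": "11111111",
}
VERDICTS = "FFFFFFPP"  # tc1..tc6 fail, tc7 and tc8 pass


def transactions(verdict):
    """One transaction per test case with the given verdict: the statements it executes."""
    return [
        frozenset(s for s, row in ROWS.items() if row[i] == "1")
        for i, v in enumerate(VERDICTS)
        if v == verdict
    ]


FAIL = transactions("F")
PASS = transactions("P")


def freq(pattern, tests):
    """Number of test cases that execute every statement of the pattern."""
    return sum(1 for t in tests if pattern <= t)


def psd(pattern):
    """PSD as read from the slide (assumed reading of the fraction)."""
    return freq(pattern, FAIL) + (len(FAIL) - freq(pattern, PASS)) / (len(PASS) + 1)


def dominates(p, q):
    """PSD-dominance: p dominates q when its PSD is strictly higher."""
    return psd(p) > psd(q)


def top_k(candidates, k):
    """Patterns dominated by fewer than k candidates."""
    return [p for p in candidates
            if sum(1 for q in candidates if dominates(q, p)) < k]


# Assumed candidate set: every pattern of one or two statements.
CANDIDATES = [frozenset(c) for size in (1, 2) for c in combinations(ROWS, size)]

if __name__ == "__main__":
    for p in top_k(CANDIDATES, 1):
        print(sorted(p), psd(p))
```

Under these assumptions the top-1 set contains four patterns, all with PSD 8.0 and all including the faulty statement e3: {e3}, {e1, e3}, {e2, e3} and {e3, e10}.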

  55. FCP-MINER TOOL (SOME RESULTS)

  56. CONCLUSIONS (PART II)
