EXAMPLE (θ = 3)
[Figure: dataset D and the pattern lattice $(2^{\mathcal{I}}, \subseteq)$, annotated "Closedness"]
Maximal frequent patterns:
$$M_\theta = \{\, P \subseteq \mathcal{I} \mid \mathit{freq}(P) \ge \theta \;\wedge\; \forall P' \supset P : \mathit{freq}(P') < \theta \,\}$$
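The definition of $M_\theta$ can be checked by brute force on a small dataset. The sketch below is only an illustration: the transactions and the helper names (`freq`, `frequent_patterns`, `maximal_patterns`) are hypothetical and do not come from the slides, and exhaustive enumeration is viable only for a handful of items.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of items.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "D"},
    {"B", "D"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def freq(pattern):
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in TRANSACTIONS if pattern <= t)


def frequent_patterns(theta):
    """All non-empty itemsets whose frequency is at least theta."""
    return [
        frozenset(c)
        for size in range(1, len(ITEMS) + 1)
        for c in combinations(ITEMS, size)
        if freq(frozenset(c)) >= theta
    ]


def maximal_patterns(theta):
    """Frequent itemsets that have no frequent proper superset (M_theta)."""
    frequent = frequent_patterns(theta)
    return [p for p in frequent if not any(p < q for q in frequent)]


if __name__ == "__main__":
    for p in maximal_patterns(3):
        print(sorted(p), freq(p))
```

With θ = 3 this toy dataset has eight frequent patterns but only two maximal ones, {A, B, C} and {D}, which is the kind of reduction a condensed representation aims for.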
CONDENSED REPRESENTATION
Dataset      #Frequent    #Closed      #Maximal
Zoo-1        151 807      3 292        230
Mushroom     155 734      3 287        453
Lymph        9 967 402    46 802       5 191
Hepatitis    27·10^7      1 827 264    189 205
SPECIALISED VS DECLARATIVE DATA MINING
Specialised pipeline: Query (basic user's constraints) + Specialised Miner, applied to the dataset ➤ Patterns
Sophisticated user's constraints require a workaround:
  1. preprocessing of the dataset
  2. post-processing of the patterns
  3. a new algorithm
➤ Limitations: Dealing with sophisticated user's constraints [Wojciechowski and Zakrzewicz, 02]
➤ Need: Declarative way to deal with more complex queries ➤ Declarative data mining
SPECIALISED VS DECLARATIVE DATA MINING
Declarative pipeline: Query (basic + sophisticated user's constraints) + CP model + CP solver, applied to the dataset ➤ Patterns
The preprocessing, post-processing and new-algorithm workarounds drop out of the pipeline.
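The declarative idea can be sketched as a query built from independent constraints that a generic engine evaluates. The code below is a hedged illustration and not the CP model from the talk: the dataset, the prices and the constraint names (`min_frequency`, `min_size`, `max_total_price`) are hypothetical, and `solve` is a naive generate-and-test stand-in for a CP solver, which would instead propagate constraints to prune the search.

```python
from itertools import combinations

# Hypothetical dataset and item prices, used only for illustration.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "D"},
    {"B", "C", "D"},
]
PRICE = {"A": 5, "B": 1, "C": 3, "D": 2}
ITEMS = sorted(PRICE)


def freq(pattern):
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in TRANSACTIONS if pattern <= t)


# Each constraint is a predicate over a candidate pattern.
def min_frequency(theta):
    return lambda p: freq(p) >= theta


def min_size(n):
    return lambda p: len(p) >= n


def max_total_price(budget):
    return lambda p: sum(PRICE[i] for i in p) <= budget


def solve(constraints):
    """Generate-and-test stand-in for a CP solver: keep patterns meeting all constraints."""
    candidates = (
        frozenset(c)
        for n in range(1, len(ITEMS) + 1)
        for c in combinations(ITEMS, n)
    )
    return [p for p in candidates if all(check(p) for check in constraints)]


# A basic query, then a refined one: only the constraint list changes.
basic_query = [min_frequency(2)]
refined_query = basic_query + [min_size(2), max_total_price(5)]

if __name__ == "__main__":
    print(len(solve(basic_query)), "patterns for the basic query")
    print([sorted(p) for p in solve(refined_query)])
```

Refining the query means appending constraints to a list. No preprocessing step, post-processing step or new mining algorithm is written.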
SPECIALISED VS DECLARATIVE DATA MINING
[Figures not recovered; captions: "Specialised is the winner!" and "Declarative is the winner!"]
SPECIALISED VS DECLARATIVE DATA MINING
[Figure not recovered; caption: Preprocessing + Specialised step vs Declarative]
SPECIALISED VS DECLARATIVE DATA MINING
[Figure not recovered; caption: Specialised + postprocessing vs Declarative]
CONCLUSIONS (PART I)
➤ Specialised methods are suitable for:
  ➤ Enumerating patterns
  ➤ Taking into account classic constraints (simple queries)
➤ Declarative methods are suitable for:
  ➤ Taking into account user's constraints (complex queries)
  ➤ Iterative data mining process
Time left?
FAULT LOCALISATION
➤ The need: identify a subset of statements that are likely to explain a fault in a program
➤ Precision <=> Efficiency
➤ Spectrum-based approaches (ranking metrics, suspiciousness score):
  ➤ Tarantula [Jones and Harrold 05]
  ➤ Ochiai [Abreu et al. 07]
  ➤ Jaccard [Abreu et al. 07]
  ➤ …
FAULT LOCALISATION (MOTIVATIONS)
➤ Pros: Quick localisation
➤ Cons: Independent evaluation of each statement, at the expense of accuracy
FAULT LOCALISATION (MOTIVATIONS)
Program: Character counter                          Test cases
                                                    tc1 tc2 tc3 tc4 tc5 tc6 tc7 tc8
        function count (char *s) {
          int let, dig, other, i = 0;
          char c;
e1:     while (c = s[i++]) {                        1   1   1   1   1   1   1   1
e2:       if ('A'<=c && 'Z'>=c)                     1   1   1   1   1   1   0   1
e3:         let += 2;  //- fault -                  1   1   1   1   1   1   0   0
e4:       else if ('a'<=c && 'z'>=c)                1   1   1   1   1   0   0   1
e5:         let += 1;                               1   1   0   0   1   0   0   0
e6:       else if ('0'<=c && '9'>=c)                1   1   1   1   0   0   0   1
e7:         dig += 1;                               0   1   0   1   0   0   0   0
e8:       else if (isprint (c))                     1   0   1   0   0   0   0   1
e9:         other += 1;                             1   0   1   0   0   0   0   1
e10:    printf("%d %d %d\n", let, dig, other); }    1   1   1   1   1   1   1   1
Passing/Failing                                     F   F   F   F   F   F   P   P
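To see what "independent evaluation of each statement" means in practice, the sketch below scores every statement of the character counter with the three spectrum-based metrics named earlier. The coverage data is copied from the table. The formulas are the usual textbook forms of Tarantula, Ochiai and Jaccard, and the function names are illustrative assumptions, not code from the talk.

```python
from math import sqrt

# Verdict of tc1..tc8 and coverage of e1..e10, copied from the table.
VERDICTS = "FFFFFFPP"
COVERAGE = {
    "e1": "11111111",
    "e2": "11111101",
    "e3": "11111100",
    "e4": "11111001",
    "e5": "11001000",
    "e6": "11110001",
    "e7": "01010000",
    "e8": "10100001",
    "e9": "10100001",
    "e10": "11111111",
}
TOTAL_F = VERDICTS.count("F")
TOTAL_P = VERDICTS.count("P")


def counts(stmt):
    """Return (ef, ep): failing and passing tests that execute the statement."""
    row = COVERAGE[stmt]
    ef = sum(1 for bit, v in zip(row, VERDICTS) if bit == "1" and v == "F")
    ep = sum(1 for bit, v in zip(row, VERDICTS) if bit == "1" and v == "P")
    return ef, ep


def tarantula(stmt):
    ef, ep = counts(stmt)
    fail_ratio, pass_ratio = ef / TOTAL_F, ep / TOTAL_P
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0


def ochiai(stmt):
    ef, ep = counts(stmt)
    denom = sqrt(TOTAL_F * (ef + ep))
    return ef / denom if denom else 0.0


def jaccard(stmt):
    ef, ep = counts(stmt)
    return ef / (TOTAL_F + ep)


def ranking(metric):
    """Statements ordered from most to least suspicious (ties keep table order)."""
    return sorted(COVERAGE, key=metric, reverse=True)


if __name__ == "__main__":
    for metric in (tarantula, ochiai, jaccard):
        print(metric.__name__, ranking(metric)[:3])
```

On this data Tarantula gives the faulty statement e3 the same top score as e5 and e7, because each statement is scored in isolation from the others.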
FAULT LOCALISATION (MOTIVATIONS)
➤ Pros: Quick localisation
➤ Cons: Independent evaluation of each statement, at the expense of accuracy
➤ Need: Finer-grained localisation, taking into account user's constraints
➤ How: Use of Declarative Data Mining
FAULT LOCALISATION (MOTIVATIONS)
[Same character counter program and test-coverage table (e1 to e10 against tc1 to tc8), annotated:]
Fault localisation = Mining Task
PATTERN SUSPICIOUSNESS DEGREE (PSD)
➤ PSD function. Given a pattern $P$ of a program:
  $$PSD(P) = freq^{-}(P) + \frac{|FAIL| - freq^{+}(P)}{|PASS| + 1}$$
➤ PSD-dominance relation. Given two patterns $P_i$ and $P_j$:
  $$P_i \rhd_{PSD} P_j \iff PSD(P_i) > PSD(P_j)$$
➤ Top-k suspicious patterns:
  $$\text{top-}k = \{\, P \mid \nexists\, P_1, \ldots, P_k : \forall\, 1 \le j \le k,\; P_j \rhd_{PSD} P \,\}$$
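The three definitions can be tried out on the character counter example. The sketch below makes several assumptions that the slides do not state: $freq^{-}(P)$ and $freq^{+}(P)$ are read as the numbers of failing and passing test cases that execute every statement of $P$, the formula is applied as it reads on the slide, and the candidate set is restricted to single statements to keep the output short. All function names are hypothetical.

```python
from fractions import Fraction

# Verdict of tc1..tc8 and coverage of e1..e10, from the character counter example.
VERDICTS = "FFFFFFPP"
COVERAGE = {
    "e1": "11111111",
    "e2": "11111101",
    "e3": "11111100",
    "e4": "11111001",
    "e5": "11001000",
    "e6": "11110001",
    "e7": "01010000",
    "e8": "10100001",
    "e9": "10100001",
    "e10": "11111111",
}
TOTAL_FAIL = VERDICTS.count("F")
TOTAL_PASS = VERDICTS.count("P")


def covering_tests(pattern, verdict):
    """Count tests with the given verdict that execute every statement of the pattern."""
    return sum(
        1
        for tc, v in enumerate(VERDICTS)
        if v == verdict and all(COVERAGE[e][tc] == "1" for e in pattern)
    )


def psd(pattern):
    """PSD as written on the slide, kept exact with Fraction."""
    freq_neg = covering_tests(pattern, "F")  # assumed meaning of freq^-
    freq_pos = covering_tests(pattern, "P")  # assumed meaning of freq^+
    return freq_neg + Fraction(TOTAL_FAIL - freq_pos, TOTAL_PASS + 1)


def dominates(p_i, p_j):
    """PSD-dominance: p_i dominates p_j when its PSD is strictly greater."""
    return psd(p_i) > psd(p_j)


def top_k(candidates, k):
    """Patterns dominated by fewer than k other candidates."""
    return [p for p in candidates if sum(dominates(q, p) for q in candidates) < k]


# Candidate patterns restricted to single statements (an illustrative choice).
SINGLETONS = [frozenset({e}) for e in COVERAGE]

if __name__ == "__main__":
    for p in top_k(SINGLETONS, 2):
        print(sorted(p), float(psd(p)))
```

Because dominance is strict, patterns with equal PSD never dominate each other, so `top_k` can return more than k patterns when there are ties.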
FCP-MINER TOOL (SOME RESULTS)
CONCLUSIONS (PART II)