
Application of Information Theory to Fault Localisation, or "How I learnt to stop worrying and love the entropy" - PowerPoint PPT Presentation



  1. CREST Open Workshop #41: "Application of Information Theory to Fault Localisation, or How I learnt to stop worrying and love the entropy", by Shin Yoo. (Title slide, backed by the entropy formula $H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$.)
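The entropy formula on the title slide recurs throughout the talk, so here is a minimal sketch of it in Python; the coin examples are mine, purely to show the scale of the numbers.

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum_x p(x) * log2 p(x); zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries one full bit of uncertainty:
print(shannon_entropy([0.5, 0.5]))    # 1.0
# A heavily biased coin carries much less:
print(shannon_entropy([0.99, 0.01]))  # ~0.08
```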

  2. This talk is… ❖ Not a theoretical masterclass on the application of Shannon entropy to software engineering, unfortunately ❖ Rather a story of a clueless software engineer who learnt to appreciate the power of information theory

  3. The Problem Domain ❖ Fault Localisation: given observations from test execution (which includes both passing and failing test cases), identify where the faulty statement lies.

  4. Spectra-Based Fault Localisation: Tests → Program Spectrum → Formula (Suspiciousness), e.g. $e_f - \frac{e_p}{e_p + n_p + 1}$ → Ranking. Higher ranking = fewer statements to check.

  5. Spectra-Based Fault Localisation: a worked example with three tests (results: t1 = P, t2 = F, t3 = F) and Tarantula $= \frac{e_f/(e_f+n_f)}{e_f/(e_f+n_f) + e_p/(e_p+n_p)}$:

     Structural Element | e_p | e_f | n_p | n_f | Tarantula | Rank
     s1                 |  1  |  0  |  0  |  2  |   0.00    |  9
     s2                 |  1  |  0  |  0  |  2  |   0.00    |  9
     s3                 |  1  |  0  |  0  |  2  |   0.00    |  9
     s4                 |  1  |  0  |  0  |  2  |   0.00    |  9
     s5                 |  1  |  0  |  0  |  2  |   0.00    |  9
     s6                 |  1  |  1  |  0  |  1  |   0.33    |  4
     s7 (faulty)        |  0  |  2  |  1  |  0  |   1.00    |  1
     s8                 |  1  |  1  |  0  |  1  |   0.33    |  4
     s9                 |  1  |  2  |  0  |  0  |   0.50    |  2
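A minimal sketch of how this table can be reproduced in Python. The coverage matrix below is my own reconstruction, chosen to be consistent with the spectra in the slide (which failing test covers s6 and s8 is a guess); the spectrum counts and Tarantula scores match the table.

```python
coverage = {                     # statement -> set of tests that execute it (assumed)
    "s1": {"t1"}, "s2": {"t1"}, "s3": {"t1"}, "s4": {"t1"}, "s5": {"t1"},
    "s6": {"t1", "t2"}, "s7": {"t2", "t3"}, "s8": {"t1", "t3"},
    "s9": {"t1", "t2", "t3"},
}
results = {"t1": "P", "t2": "F", "t3": "F"}
total_pass = sum(1 for r in results.values() if r == "P")
total_fail = sum(1 for r in results.values() if r == "F")

def spectrum(stmt):
    ep = sum(1 for t in coverage[stmt] if results[t] == "P")
    ef = sum(1 for t in coverage[stmt] if results[t] == "F")
    return ep, ef, total_pass - ep, total_fail - ef   # e_p, e_f, n_p, n_f

def tarantula(ep, ef, np_, nf):
    fail_ratio = ef / (ef + nf)
    pass_ratio = ep / (ep + np_)
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom > 0 else 0.0

scores = {s: tarantula(*spectrum(s)) for s in coverage}
for s in sorted(scores, key=scores.get, reverse=True):
    print(s, round(scores[s], 2))   # s7 (the faulty statement) comes out on top with 1.00
```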

  6. How do we evaluate these? A sample of the many risk evaluation formulas:
     Tarantula $= \frac{e_f/(e_f+n_f)}{e_f/(e_f+n_f) + e_p/(e_p+n_p)}$
     Jaccard $= \frac{e_f}{e_f + n_f + e_p}$
     Ochiai $= \frac{e_f}{\sqrt{(e_f+n_f) \cdot (e_f+e_p)}}$
     AMPLE $= \left| \frac{e_f}{e_f+n_f} - \frac{e_p}{e_p+n_p} \right|$
     Wong1 $= e_f$
     Wong2 $= e_f - e_p$
     Wong3 $= e_f - h$, where $h = e_p$ if $e_p \le 2$; $h = 2 + 0.1(e_p - 2)$ if $2 < e_p \le 10$; $h = 2.8 + 0.001(e_p - 10)$ if $e_p > 10$
     Op1 $= -1$ if $n_f > 0$, $n_p$ otherwise
     Op2 $= e_f - \frac{e_p}{e_p + n_p + 1}$
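A sketch of a few of the named formulas as plain Python functions over the spectrum counts (e_p, e_f, n_p, n_f); the example call reuses the faulty statement s7 from the previous slide.

```python
import math

def op2(ep, ef, np_, nf):
    return ef - ep / (ep + np_ + 1)

def jaccard(ep, ef, np_, nf):
    return ef / (ef + nf + ep)

def ochiai(ep, ef, np_, nf):
    return ef / math.sqrt((ef + nf) * (ef + ep))

def wong3(ep, ef, np_, nf):
    if ep <= 2:
        h = ep
    elif ep <= 10:
        h = 2 + 0.1 * (ep - 2)
    else:
        h = 2.8 + 0.001 * (ep - 10)
    return ef - h

# s7 from the previous slide: e_p=0, e_f=2, n_p=1, n_f=0
print(op2(0, 2, 1, 0), jaccard(0, 2, 1, 0), ochiai(0, 2, 1, 0), wong3(0, 2, 1, 0))
```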

  7. Expense Metric: $E(\tau, p, b) = \frac{\text{ranking of } b \text{ according to } \tau}{\text{number of statements in } p} \times 100$ ❖ Assumes that the developer checks the ranking from top to bottom ❖ The higher the faulty statement is ranked, the earlier the fault is found. [Illustration: rankings $S_X$ and $S_Y$ produced by Formula X and Formula Y place the fault F at different positions.]
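A minimal sketch of the Expense metric, assuming the ranking is a list of statements ordered from most to least suspicious (ties between equally suspicious statements are ignored here).

```python
def expense(ranking, faulty):
    position = ranking.index(faulty) + 1      # 1-based rank of the faulty statement
    return position / len(ranking) * 100      # percentage of statements inspected

# Ranking produced by Tarantula on the earlier example:
ranking = ["s7", "s9", "s6", "s8", "s1", "s2", "s3", "s4", "s5"]
print(expense(ranking, "s7"))  # fault ranked first: ~11% of statements checked
```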

  8. Does every test execution help you? ❖ When a statement is executed by a failing test, we suspect it more; by a passing test, we suspect it less. ❖ Ideally, we want the failing test to only execute the faulty statement, which is not possible of course. ❖ Practically, we want the subset of test runs that gives us the most distinguishing power, and we want this as early as possible.

  9. What is the information gain of executing one more test?

  10. ❖ Convert suspiciousness into probability: $P_{T_i}(B(s_j)) = \frac{\tau(s_j \mid T_i)}{\sum_{j=1}^{m} \tau(s_j \mid T_i)}$ ❖ Compute the Shannon entropy of fault locality: $H_{T_i}(S) = -\sum_{j=1}^{m} P_{T_i}(B(s_j)) \cdot \log P_{T_i}(B(s_j))$ ❖ Assuming the failure rate $\alpha$ observed so far, compute the lookahead: $P_{T_{i+1}}(B(s_j)) = P_{T_{i+1}}(B(s_j) \mid F(t_{i+1})) \cdot \alpha + P_{T_{i+1}}(B(s_j) \mid \neg F(t_{i+1})) \cdot (1 - \alpha)$ ❖ We can predict the information gain of a test case! (See the sketch below.)
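A minimal sketch of this lookahead in Python, under these assumptions: `tau_now` maps statements to suspiciousness given the tests executed so far, `tau_if_fail` and `tau_if_pass` are the scores recomputed as if the next test failed or passed, and `alpha` is the observed failure rate used as the probability that the next test fails. All names are mine.

```python
import math

def normalise(tau):
    """Turn suspiciousness scores into a probability distribution over statements."""
    total = sum(tau.values())
    return {s: v / total for s, v in tau.items()}

def entropy(dist):
    """Shannon entropy of the fault locality distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def lookahead_entropy(tau_if_fail, tau_if_pass, alpha):
    # Mixture of the two conditional locality distributions, weighted by alpha,
    # following the lookahead formula on the slide.
    p_fail, p_pass = normalise(tau_if_fail), normalise(tau_if_pass)
    mixture = {s: alpha * p_fail[s] + (1 - alpha) * p_pass[s] for s in p_fail}
    return entropy(mixture)

def information_gain(tau_now, tau_if_fail, tau_if_pass, alpha):
    # Positive gain: executing the candidate test is expected to reduce our
    # uncertainty about where the fault lies.
    return entropy(normalise(tau_now)) - lookahead_entropy(tau_if_fail, tau_if_pass, alpha)
```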

  11. [Figure: four panels for sed v2 (F_AG_19), grep v3 (F_KP_3), flex v5 (F_AA_4), and flex v5 (F_JR_2), plotting Suspiciousness and Expense Reduction against the Percentage of Executed Tests for FLINT, TCP (Greedy), and Random prioritisation.]

  12. Lessons Learned #1 ❖ The probabilistic view works! Even when there are some wrinkles in your formulations. ❖ Software artefacts tend to exhibit continuity (e.g. the coverage of a test case does not change dramatically between versions), which helps point 1.

  13. Problem Solved…? ❖ At first, various empirical studies established partial rankings between formulas. ❖ Then a theoretical study proved dominance relations between formulas with respect to their performance in the Expense metric.

  14. But then machines arrived. Aside: we also automatically evolved formulas using GP, which we then proved cannot be bettered by humans. So technically machines arrived twice.

  15. Machine-Based Evaluation ❖ Qi et al. took a backward approach ❖ Use suspiciousness scores as weights when mutating program states until Genetic Programming can repair the fault. ❖ The better the localisation, the quicker the repair will be found.

  16. Strange Results ❖ Theory says Jaccard formula is worse than Op2. ❖ But machines found it much easier to repair programs when using the localisation from Jaccard. ❖ Why?

  17. Abstraction destroys Information ❖ The Expense metric assumes linear consumption of the result (i.e. the developer checks statements following the ranking). ❖ GP consumes the raw suspiciousness scores: same ranking, completely different suspiciousness numbers, which is a much richer source of information.

  18. New Evaluation Metric ❖ Following the way we predicted information yield, we should be able to describe the true fault locality as a probability distribution: take an idealised technique $L$ that can always pinpoint $s_f$, with $L(s_i) = 1$ if $s_i = s_f$ and $L(s_i) = \epsilon$ ($0 < \epsilon \ll 1$) for $s_i \in S, s_i \ne s_f$, and convert the suspiciousness scores given by any technique $\tau$ into $P_\tau(s_i) = \frac{\tau(s_i)}{\sum_{i=1}^{n} \tau(s_i)}$, $(1 \le i \le n)$. ❖ Subsequently, measure the cross-entropy between the true distribution and the one generated by any technique: $D_{KL}(P_L \| P_\tau) = \sum_i P_L(s_i) \ln \frac{P_L(s_i)}{P_\tau(s_i)}$. This is Locality Information Loss (LIL), defined with Kullback-Leibler divergence. (A sketch follows below.)
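A minimal sketch of LIL in Python, assuming suspiciousness scores keyed by statement and a known faulty statement; scores of exactly zero would need smoothing before taking the ratio. The example scores are the Tarantula values from the earlier table.

```python
import math

def normalise(scores):
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}

def lil(scores, faulty, epsilon=1e-6):
    """Locality Information Loss: D_KL(P_L || P_tau), lower is better."""
    p_l = normalise({s: (1.0 if s == faulty else epsilon) for s in scores})
    p_tau = normalise(scores)
    return sum(p_l[s] * math.log(p_l[s] / p_tau[s]) for s in scores if p_l[s] > 0)

# The closer a technique's distribution is to the ideal spike at the fault,
# the lower the LIL it receives.
print(lil({"s7": 1.0, "s9": 0.5, "s6": 0.33, "s8": 0.33}, "s7"))
```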

  19. Worth a thousand words. [Figure: suspiciousness over executed statements, with the faulty statement marked, for Op2 (LIL=7.34), Ochiai (LIL=5.96), Jaccard (LIL=4.92), and MUSE (LIL=0.40).]

  20. Lessons Learned #2 ❖ Entropy measures are much richer than simply counting something: they give you a holistic view. ❖ Cross-entropy is a vastly underused tool in software engineering in general.

  21. Spectra-Based Fault Localisation: Tests → Program Spectrum → Formula (Suspiciousness), e.g. $e_f - \frac{e_p}{e_p + n_p + 1}$ → Ranking. Higher ranking = fewer statements to check.

  22. [Figure: grep v3 (F_KP_3), plotting Suspiciousness and Expense Reduction against the Percentage of Executed Tests for FLINT, TCP (Greedy), and Random.]
