Model Quality & Metamorphic Testing
Seminar SE4AI | 18.06.2020 | Johannes Wehrstein & Anjali Tewari

Outline
1. Evaluating Model Quality (Anjali Tewari)
   ▪ Properties and Factors
   ▪ Metrics and Measures
   ▪ Improving MQ
2. Metamorphic Testing (Johannes Wehrstein)
   ▪ Oracle Problem
   ▪ Deriving Relations
   ▪ Proving Sufficiency of MT
3. Questions & Discussion

MODEL QUALITY

Artificial Intelligence Life Cycle
Source: Talagala, Nisha. “7 Artificial Intelligence Trends and How They Work With Operational Machine Learning.” Oracle Data Science, blogs.oracle.com/datascience/7-artificial-intelligence-trends-and-how-they-work-with-operational-machine-learning-v2.

ML Testing Properties
• Correctness
• Model relevance
• Robustness
• Security
• Data Privacy
• Efficiency
• Fairness
• Interpretability
Source: Zhang, Jie M., et al. “Machine Learning Testing: Survey, Landscapes and Horizons.” IEEE Transactions on Software Engineering, 2020, pp. 1–1.

Factors that affect Model Quality
Bias:
• Due to misrepresentation in training sets
• Not enough variance in the testing sets
Outdated models: Model quality is ever-changing because data is ever-changing
Overfitting/Underfitting: striking the balance between generalization and optimization
Figure panels: Underfitted, Good Fit/Robust, Overfitted

Metrics for Model Quality
Bayes Error Rate: Human Performance Rate
Depending on the type of problem, there can be:
Regression Errors
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• Mean Absolute Error (MAE)
• R² or Coefficient of Determination
• Adjusted R²
Classification Errors
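The regression error measures listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the original slides: the function name `regression_metrics` and the parameter `n_features` (number of predictors, needed for adjusted R²) are assumptions, and the textbook definitions are used.

```python
import math

def regression_metrics(y_true, y_pred, n_features=1):
    """Return MSE, RMSE, MAE, R^2 and adjusted R^2 for two equal-length lists.

    Assumes len(y_true) > n_features + 1 and that y_true is not constant
    (otherwise the R^2 denominators would be zero).
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(r * r for r in residuals) / n       # mean squared error
    rmse = math.sqrt(mse)                         # root mean squared error
    mae = sum(abs(r) for r in residuals) / n      # mean absolute error

    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1 - ss_res / ss_tot                              # coefficient of determination

    # Adjusted R^2 penalizes additional predictors.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2, "adj_r2": adj_r2}


metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(metrics)
```

MSE and RMSE penalize large errors more strongly than MAE, which is why the two can rank models differently on data with outliers.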

Classification Error Measures

                    Actually A             Actually not A
AI predicts A       True Positive (TP)     False Positive (FP)
AI predicts not A   False Negative (FN)    True Negative (TN)

• True positives and true negatives are the correct predictions
• False negatives are wrong predictions, or misses
• False positives are wrong predictions, or false alarms
This matrix represents 2-class problems; matrices for multi-class problems have an additional row and column for each class.

Measures for Model Quality
Successful Classifications:
• Recall = TP / (TP + FN)
• False negative rate = FN / (TP + FN) = 1 − Recall
False Classifications (Noise):
• Precision = TP / (TP + FP)
• False positive rate = FP / (FP + TN)
Combined measure (harmonic mean):
• F1-Score = 2 ∗ recall ∗ precision / (recall + precision)
Source: https://en.wikipedia.org/wiki/F1_score
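As a small worked sketch of these measures, the snippet below counts the four confusion-matrix cells for a 2-class problem and derives recall, precision, the two error rates and the F1-score from them. The function names and the toy labels are illustrative assumptions, not taken from the slides.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a 2-class problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1      # correct prediction of class A
        elif predicted == positive:
            fp += 1      # false alarm
        elif actual == positive:
            fn += 1      # miss
        else:
            tn += 1      # correct prediction of "not A"
    return tp, fp, fn, tn


def classification_measures(tp, fp, fn, tn):
    """Derive the measures; assumes none of the denominators is zero."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "false_negative_rate": fn / (tp + fn),   # equals 1 - recall
        "precision": precision,
        "false_positive_rate": fp / (fp + tn),
        "f1": 2 * recall * precision / (recall + precision),
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)                              # (3, 1, 1, 3)
print(classification_measures(*counts))
```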

Validation through Experts
Domain expert evaluates the plausibility of a learned model
• Subjective
• Time-intensive
• Costly
But sometimes the only option (e.g. Clustering)
A better solution: Compare generated clusters with manually created clusters
Figure labels: Run Clustering Algo, Interpret Result, Visually Explore, Manually Refine

Validation on Data
• Using Test Set / Validation Set
• Using K-Fold Validation
• Using Iterative K-Fold Validation with Shuffling
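K-fold validation and its iterated, shuffled variant can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, and a trivial "predict the training mean" model scored by MSE stands in for a real learner.

```python
import random

def k_fold_indices(n_samples, k, shuffle=True, seed=None):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # Distribute the remainder so that fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def mean_predictor_mse(y, train_idx, val_idx):
    """Stand-in model: predict the training mean, score with MSE."""
    mean = sum(y[i] for i in train_idx) / len(train_idx)
    return sum((y[i] - mean) ** 2 for i in val_idx) / len(val_idx)


def iterated_k_fold(y, k, iterations, seed=0):
    """Repeat k-fold with a fresh shuffle per iteration and average all scores."""
    scores = []
    for it in range(iterations):
        for train_idx, val_idx in k_fold_indices(len(y), k, seed=seed + it):
            scores.append(mean_predictor_mse(y, train_idx, val_idx))
    return sum(scores) / len(scores)


y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
print(iterated_k_fold(y, k=5, iterations=3))
```

Repeating the split with different shuffles reduces the dependence of the estimate on one particular partition, at the cost of `iterations * k` training runs.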

On-line Validation
On-line validation: test learned model in a fielded application
Pro: Best estimate for overall utility
Con: Bad model may be costly
Methods:
• Telemetry
• A/B Testing

Improving Model Quality
Avoidable bias
• Training a bigger model
• Training longer optimization models
Variance in data
• Getting more data
• Different regularization techniques
• Enlarging hyper-parameter search space
Overfitting to Validation set
Data Mismatch

METAMORPHIC TESTING

Scenario
Assume we have the following scenario:
1. ML-based service
2. Data scarcity / no test oracle
Aim: Make sure that the learning algorithm works well

Solving The Oracle Problem
• Assertion Checking
• N-Version Programming
• Metamorphic Testing

Metamorphic Testing
Approach for both:
• test case generation
• test result verification
Originally proposed for generating new test cases based on successful ones (Chen et al., 1998)
Central element: Metamorphic Relations (MRs)
Source: Metamorphic Testing: A New Approach for Generating Next Test Cases (Chen et al., 1998)

Metamorphic Relations (MRs)
f: function / algorithm
X: input space
Y: output space
ℛ ⊆ Xⁿ × Yⁿ, n ≥ 2
ℛ(x₁, x₂, …, xₙ, f(x₁), f(x₂), …, f(xₙ))
Figure: Source Dataset → Apply MRs → Follow-up Dataset
Caveat:
• MRs = relations between test cases (n ≥ 2), not between inputs & outputs (→ Assertion Testing)
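To make the definition concrete, here is a minimal sketch with n = 2 for a function whose exact outputs are hard to check by hand. The choice of the sine function and the relation sin(x) = sin(π − x) is an illustrative assumption, not taken from the slides; `buggy_sin` is a hypothetical faulty implementation.

```python
import math
import random

def sine_relation_holds(f, x, tol=1e-9):
    """MR over two test cases: source input x, follow-up input pi - x.

    The relation requires f(x) == f(pi - x); no expected value of f(x)
    itself (i.e. no test oracle) is needed.
    """
    follow_up = math.pi - x
    return math.isclose(f(x), f(follow_up), abs_tol=tol)


def run_metamorphic_test(f, trials=1000, seed=0):
    """Return the source inputs for which the relation is violated."""
    rng = random.Random(seed)
    violations = []
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        if not sine_relation_holds(f, x):
            violations.append(x)
    return violations


def buggy_sin(x):
    # Hypothetical faulty implementation: truncated Taylor series.
    return x - x ** 3 / 6


print(len(run_metamorphic_test(math.sin)))       # 0: no violation found
print(len(run_metamorphic_test(buggy_sin)) > 0)  # True: the fault is revealed
```

A violated relation proves a fault exists; a relation that always holds does not prove correctness, which is why the choice of MRs matters.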

Metamorphic Testing Process
1. Develop MRs
2. Generate follow-up dataset
3. Run (learning) algorithm on follow-up dataset
4. Evaluation

Deriving Metamorphic Relations
• Derive from problem
• Derive from learning algorithm

Deriving MRs from learning algorithm
1. Consistency with affine transformation
2. Permutation of class labels / attributes
3. Addition of uninformative attributes
4. Consistency with re-prediction
5. Removal of classes
…
→ MRs are independent of the underlying problem
Source: Testing and Validating Machine Learning Classifiers by Metamorphic Testing: Xie et al (2009)
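A few of these relations can be sketched for a simple 1-nearest-neighbour classifier. The classifier, the integer toy data and all function names are assumptions made for illustration; integer features keep the distance arithmetic exact, so ties are broken identically in source and follow-up runs.

```python
import random

def predict_1nn(train_x, train_y, query):
    """Label of the training point with the smallest squared Euclidean distance."""
    def sq_dist(row):
        return sum((a - q) ** 2 for a, q in zip(row, query))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))
    return train_y[best]


def predict_all(train_x, train_y, queries):
    return [predict_1nn(train_x, train_y, q) for q in queries]


def check_relations(train_x, train_y, queries):
    """Run the source test case once, then one follow-up per MR."""
    source = predict_all(train_x, train_y, queries)

    def follow_up(transform):
        return predict_all([transform(r) for r in train_x], train_y,
                           [transform(q) for q in queries])

    swapped = {0: 1, 1: 0}  # permutation of the two class labels
    relabelled = predict_all(train_x, [swapped[y] for y in train_y], queries)

    return {
        # 1. same affine map (scale 2, shift 3) on every attribute value
        "affine": follow_up(lambda r: [2 * v + 3 for v in r]) == source,
        # 2a. permutation of attributes (here: reversed order)
        "attribute_permutation": follow_up(lambda r: list(reversed(r))) == source,
        # 2b. permutation of class labels: predictions must be permuted too
        "label_permutation": relabelled == [swapped[y] for y in source],
        # 3. addition of an uninformative (constant) attribute
        "uninformative_attribute": follow_up(lambda r: list(r) + [7]) == source,
    }


rng = random.Random(42)
train_x = [[rng.randint(0, 20) for _ in range(3)] for _ in range(30)]
train_y = [rng.randint(0, 1) for _ in range(30)]
queries = [[rng.randint(0, 20) for _ in range(3)] for _ in range(10)]
print(check_relations(train_x, train_y, queries))
```

None of the checks needs the true label of a query point, which is what makes these relations usable when no test oracle is available.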

Metamorphic Testing
Cycle:
• Execute MT (create follow-up dataset, run algorithm)
• Evaluation (check effectiveness of MT)
• Refinement of MRs

Proving Sufficiency of MT
• Evaluate testing with test coverage (→ mostly impossible for ML)
• Mutant Testing
• Mutated Tests
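One way to read the mutant-based approach: inject small faults into the algorithm and measure how many of the resulting mutants are detected ("killed") by the metamorphic relations. The sketch below is a hedged illustration only; the 1-nearest-neighbour classifier, the two hand-written mutants, the single relation and the tiny dataset are all hypothetical choices.

```python
def nearest(train_x, train_y, query):
    """Original program: 1-nearest-neighbour classification."""
    dist = lambda r: sum((a - q) ** 2 for a, q in zip(r, query))
    return train_y[min(range(len(train_x)), key=lambda i: dist(train_x[i]))]


def mutant_drop_last_attribute(train_x, train_y, query):
    """Mutant 1: off-by-one fault, the last attribute is ignored."""
    dist = lambda r: sum((a - q) ** 2 for a, q in zip(r[:-1], query[:-1]))
    return train_y[min(range(len(train_x)), key=lambda i: dist(train_x[i]))]


def mutant_farthest(train_x, train_y, query):
    """Mutant 2: max instead of min, i.e. the farthest neighbour is used."""
    dist = lambda r: sum((a - q) ** 2 for a, q in zip(r, query))
    return train_y[max(range(len(train_x)), key=lambda i: dist(train_x[i]))]


def attribute_permutation_holds(classify, train_x, train_y, queries):
    """MR: reversing the attribute order must not change any prediction."""
    rev = lambda row: list(reversed(row))
    source = [classify(train_x, train_y, q) for q in queries]
    follow_up = [classify([rev(r) for r in train_x], train_y, rev(q))
                 for q in queries]
    return source == follow_up


def mutation_score(mutants, train_x, train_y, queries):
    """Fraction of mutants for which the MR is violated."""
    killed = sum(not attribute_permutation_holds(m, train_x, train_y, queries)
                 for m in mutants)
    return killed / len(mutants)


train_x, train_y = [[0, 10], [10, 0]], [0, 1]
queries = [[2, 1]]
print(attribute_permutation_holds(nearest, train_x, train_y, queries))  # True
print(mutation_score([mutant_drop_last_attribute, mutant_farthest],
                     train_x, train_y, queries))                        # 0.5
```

The surviving mutant (farthest neighbour) is clearly wrong yet satisfies the relation, which illustrates why a low mutation score is a signal to add or refine MRs.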

MT: Advantages / Disadvantages
Advantages
• Simplicity in concept
• Straightforward implementation
• Ease of automation
• Low costs
Disadvantages
• Difficult generation of MRs
• Requires "fast" learning algorithms
• Difficulty dealing with indeterminism
