Model Quality & Metamorphic Testing Seminar SE4AI 18.06.2020 | Seminar SE4AI | Johannes Wehrstein & Anjali Tewari | 1
Outline
1. Evaluating Model Quality (Anjali Tewari)
▪ Properties and Factors
▪ Metrics and Measures
▪ Improving MQ
2. Metamorphic Testing (Johannes Wehrstein)
▪ Oracle Problem
▪ Deriving Relations
▪ Proving Sufficiency of MT
3. Questions & Discussion
MODEL QUALITY
Artificial Intelligence Life Cycle
Talagala, Nisha. “7 Artificial Intelligence Trends and How They Work With Operational Machine Learning.” Oracle Data Science, blogs.oracle.com/datascience/7-artificial-intelligence-trends-and-how-they-work-with-operational-machine-learning-v2.
ML Testing Properties
• Correctness
• Model relevance
• Robustness
• Security
• Data Privacy
• Efficiency
• Fairness
• Interpretability
Zhang, Jie M., et al. “Machine Learning Testing: Survey, Landscapes and Horizons.” IEEE Transactions on Software Engineering, 2020, pp. 1–1.
Factors that affect Model Quality
Bias:
• Due to misrepresentation in training sets
• Not enough variance in the testing sets
Outdated models: Model quality is ever-changing because data is ever-changing
Overfitting/Underfitting: striking the balance between generalization and optimization
[Figure panels: Underfitted, Good Fit/Robust, Overfitted]
Metrics for Model Quality
Bayes Error Rate: the lowest achievable error, commonly approximated by the human performance rate
Depending on the type of problem, there can be:
Regression Errors
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• Mean Absolute Error (MAE)
• R² or Coefficient of Determination
• Adjusted R²
Classification Errors
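The regression errors listed above can be computed directly from true and predicted values. The following is a minimal sketch in plain Python; the function name `regression_metrics`, the parameter `n_features` (number of predictors, needed for Adjusted R²), and the sample values are illustrative assumptions, not part of the slides.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """Compute MSE, RMSE, MAE, R^2 and adjusted R^2 (hypothetical helper).

    Assumes len(y_true) > n_features + 1 and that y_true is not constant.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    mse = ss_res / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2, "adj_r2": adj_r2}

# Made-up example data: four observations, one predictor.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0], n_features=1)
print(metrics)
```

Adjusted R² penalizes additional predictors, which is why it is never larger than R² for the same fit.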
Classification Error Measures
                    Actually A             Actually not A
AI predicts A       True Positive (TP)     False Positive (FP)
AI predicts not A   False Negative (FN)    True Negative (TN)
• True positives and true negatives are the correct predictions
• False negatives are wrong predictions or misses
• False positives are wrong predictions or false alarms
This matrix represents 2-class problems; matrices for multi-class problems have additional rows and columns for each class.
Measures for Model Quality
Successful Classifications:
$\text{Recall} = \frac{TP}{TP + FN}$
$\text{False negative rate} = \frac{FN}{TP + FN} = 1 - \text{Recall}$
False Classifications (Noise):
$\text{Precision} = \frac{TP}{TP + FP}$
$\text{False positive rate} = \frac{FP}{FP + TN}$
Combined measure (harmonic mean):
$F_1\text{-Score} = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}$
Source: https://en.wikipedia.org/wiki/F1_score
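As a worked illustration of these measures, the sketch below derives them from the four confusion-matrix counts. The function name `classification_measures` and the counts are made up for the example; the sketch assumes no denominator is zero.

```python
def classification_measures(tp, fp, fn, tn):
    """Derive the slide's measures from confusion-matrix counts (hypothetical helper).

    No guard against empty classes: every denominator is assumed non-zero.
    """
    recall = tp / (tp + fn)
    false_negative_rate = fn / (tp + fn)   # equals 1 - recall
    precision = tp / (tp + fp)
    false_positive_rate = fp / (fp + tn)
    f1 = 2 * recall * precision / (recall + precision)
    return {
        "recall": recall,
        "false_negative_rate": false_negative_rate,
        "precision": precision,
        "false_positive_rate": false_positive_rate,
        "f1": f1,
    }

# Made-up counts for a 2-class problem with 100 samples.
measures = classification_measures(tp=40, fp=10, fn=20, tn=30)
print(measures)
```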
Validation through Experts
Domain expert evaluates the plausibility of a learned model
• Subjective
• Time-intensive
• Costly
But sometimes the only option (e.g. Clustering)
[Figure steps: Run Clustering Algo, Interpret Result, Visually Explore, Manually Refine]
A better solution: Compare generated clusters with manually created clusters
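One possible way to compare generated clusters with manually created ones is the Rand index, which counts the fraction of sample pairs on which both clusterings agree (same cluster in both, or different clusters in both). The slides do not name a specific measure, so the choice of the Rand index, the function name, and the label lists below are assumptions for illustration.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of sample pairs treated the same way by both clusterings.

    Cluster ids only need to be consistent within each labelling;
    the ids themselves do not have to match across labellings.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)

# Made-up example: six samples, manual clustering vs. algorithm output.
manual = [0, 0, 0, 1, 1, 1]
generated = [1, 1, 0, 0, 0, 0]
score = rand_index(manual, generated)
print(score)
```

A value of 1.0 means both clusterings group the samples identically, regardless of how the clusters are named.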
Validation on Data
• Using Test Set / Validation Set
• Using K-Fold Validation
• Using Iterative K-Fold Validation with Shuffling
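The splitting behind K-fold validation can be sketched as index generation, independent of any particular model. The helper names `k_fold_indices` and `iterated_k_fold`, and the reading of "iterative K-fold with shuffling" as repeating K-fold with a fresh shuffle each round, are assumptions made for this example.

```python
import random

def k_fold_indices(n_samples, k, shuffle=False, seed=None):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # Deal indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

def iterated_k_fold(n_samples, k, repeats, seed=0):
    """Repeat K-fold several times, reshuffling the data before each round."""
    for r in range(repeats):
        yield from k_fold_indices(n_samples, k, shuffle=True, seed=seed + r)

splits = list(k_fold_indices(10, k=5))
repeated_splits = list(iterated_k_fold(10, k=5, repeats=3))
print(len(splits), len(repeated_splits))
```

In each round every sample is used for testing exactly once; the model would be trained on `train_idx` and scored on `test_idx`, and the scores averaged.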
On-line Validation
On-line validation: test learned model in a fielded application
Pro: Best estimate for overall utility
Con: Bad model may be costly
Methods:
• Telemetry
• A/B Testing
Improving Model Quality
Avoidable bias
• Training a bigger model
• Training longer optimization models
Variance in data
• Getting more data
• Different regularization techniques
• Enlarging hyper-parameter search space
Overfitting to Validation set
Data Mismatch
METAMORPHIC TESTING
Scenario
Assume we have the following scenario:
1. ML-based service
2. Data scarcity / no test oracle
Aim: Make sure that the learning algorithm works well
Solving The Oracle Problem
ASSERTION CHECKING | N-VERSION PROGRAMMING | METAMORPHIC TESTING
Metamorphic Testing
Approach for both:
• test case generation
• test result verification
Originally proposed for generating new test cases based on successful ones (Chen et al., 1998)
Central element: Metamorphic Relations (MRs)
Metamorphic Testing: A New Approach for Generating Next Test Cases (Chen et al., 1998)
Example Relations for Shortest Path in Graph
Program: $P(G, a, b)$ (computes shortest path between vertices $a$ and $b$ in undirected graph $G$)
Proving that result is really the shortest path: difficult
Metamorphic Relations
$|P(G, b, a)| = |P(G, a, b)|$
$|P(G, a, b)| + |P(G, b, c)| \geq |P(G, a, c)|$
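The two relations can be checked mechanically without knowing the correct shortest path for any input. The sketch below assumes an unweighted, connected graph and uses breadth-first search as a stand-in for the program $P$; the graph, the function names, and the choice of BFS are all illustrative assumptions.

```python
from collections import deque
from itertools import permutations

def shortest_path(graph, start, goal):
    """Stand-in for P(G, a, b): BFS path in an unweighted, undirected graph."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            path = []
            while vertex is not None:   # walk back to the start vertex
                path.append(vertex)
                vertex = parents[vertex]
            return path[::-1]
        for neighbour in graph[vertex]:
            if neighbour not in parents:
                parents[neighbour] = vertex
                queue.append(neighbour)
    return None  # unreachable; does not occur in the connected example graph

def length(path):
    """|P(G, a, b)|: number of edges on the path."""
    return len(path) - 1

def mr_symmetry(graph, a, b):
    return length(shortest_path(graph, b, a)) == length(shortest_path(graph, a, b))

def mr_triangle(graph, a, b, c):
    via_b = length(shortest_path(graph, a, b)) + length(shortest_path(graph, b, c))
    return via_b >= length(shortest_path(graph, a, c))

# Made-up undirected graph as adjacency lists.
graph = {
    "a": ["b", "d"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["a", "c"],
    "e": ["c"],
}

symmetry_ok = all(mr_symmetry(graph, a, b) for a, b in permutations(graph, 2))
triangle_ok = all(mr_triangle(graph, a, b, c) for a, b, c in permutations(graph, 3))
print(symmetry_ok, triangle_ok)
```

A violated relation proves a defect; satisfied relations do not prove correctness, which is why the sufficiency of the chosen MRs has to be assessed separately.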
Metamorphic Relations (MRs)
$f$: function / algorithm
$X$: Input space
$Y$: Output space
$\mathcal{R} \subseteq X^n \times Y^n$, $n \geq 2$
$R(x_1, x_2, \dots, x_n, f(x_1), f(x_2), \dots, f(x_n))$
Source Dataset → Apply MRs → Follow-up Dataset
Caveat:
• MRs = Relations between test cases ($n \geq 2$), not between inputs & outputs (→ Assertion Testing)
Metamorphic Testing Process
Develop MRs → Generate follow-up dataset → Run (learning) algorithm on follow-up dataset → Evaluation
Deriving Metamorphic Relations
• Derive from problem
• Derive from learning algorithm
Deriving MRs from learning algorithm
1. Consistency with affine transformation
2. Permutation of class labels / attributes
3. Addition of uninformative attributes
4. Consistency with re-prediction
5. Removal of classes
…
→ MRs are independent of the underlying problem
Testing and Validating Machine Learning Classifiers by Metamorphic Testing: Xie et al. (2009)
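To make relations 1 to 3 concrete, the sketch below applies them to a tiny 1-nearest-neighbour classifier. The classifier, the data, and the exact reading of each relation (same affine map on every attribute, swapping the two attributes, cyclic renaming of labels, appending a constant attribute) are assumptions chosen for illustration; integer data keeps all distance comparisons exact.

```python
def predict_1nn(train_x, train_y, query):
    """Label of the training point closest to query (squared Euclidean distance)."""
    nearest = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[nearest]

def predict_all(train_x, train_y, queries):
    return [predict_1nn(train_x, train_y, q) for q in queries]

# Transformations that turn the source dataset into a follow-up dataset.
def affine(points, k=2, b=3):
    return [tuple(k * v + b for v in p) for p in points]

def swap_attributes(points):
    return [tuple(reversed(p)) for p in points]

def add_constant_attribute(points, value=0):
    return [p + (value,) for p in points]

# Made-up source dataset.
train_x = [(1, 1), (2, 1), (8, 9), (9, 8), (5, 1)]
train_y = ["A", "A", "B", "B", "C"]
queries = [(0, 0), (7, 8), (5, 3), (3, 2)]
relabel = {"A": "B", "B": "C", "C": "A"}

source = predict_all(train_x, train_y, queries)

mr_results = {
    # MR 1: the same affine map on all attributes must not change predictions.
    "affine transformation": predict_all(affine(train_x), train_y, affine(queries)) == source,
    # MR 2: reordering attributes must not change predictions.
    "attribute permutation": predict_all(
        swap_attributes(train_x), train_y, swap_attributes(queries)) == source,
    # MR 2: renaming class labels must rename the predictions accordingly.
    "label permutation": predict_all(
        train_x, [relabel[y] for y in train_y], queries) == [relabel[y] for y in source],
    # MR 3: an attribute with the same value everywhere must not change predictions.
    "uninformative attribute": predict_all(
        add_constant_attribute(train_x), train_y, add_constant_attribute(queries)) == source,
}
print(source, mr_results)
```

None of these checks needs the true label of any query, which is what makes the approach usable without a test oracle.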
Metamorphic Testing
• Execute MT (create follow-up dataset, run algorithm)
• Evaluation (check effectiveness of MT)
• Refinement of MRs
Proving Sufficiency of MT
• Evaluate testing with test coverage (→ mostly impossible for ML)
• Mutant Testing
• Mutated Tests
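The idea behind mutant-based evaluation can be sketched as follows: inject known faults (mutants) into the program and measure which fraction of them the metamorphic relations detect. The example deliberately uses a simple `mean` function instead of a learning algorithm; the mutants, relations, and input are all made up for illustration.

```python
from fractions import Fraction

def mean(xs):
    """Program under test (exact arithmetic avoids rounding issues)."""
    return Fraction(sum(xs), len(xs))

# Hand-written mutants: copies of mean() with one injected fault each.
def mutant_off_by_one(xs):
    return Fraction(sum(xs), len(xs) - 1)

def mutant_first_only(xs):
    return Fraction(xs[0])

def mutant_skip_first(xs):
    return Fraction(sum(xs[1:]), len(xs))

# Metamorphic relations: each returns True if the relation holds for f on xs.
def mr_permutation(f, xs):
    return f(list(reversed(xs))) == f(xs)

def mr_scaling(f, xs, k=3):
    return f([k * x for x in xs]) == k * f(xs)

def mr_shift(f, xs, c=5):
    return f([x + c for x in xs]) == f(xs) + c

def mutation_score(mutants, relations, xs):
    """Fraction of mutants that violate at least one relation ("killed")."""
    killed = sum(any(not mr(m, xs) for mr in relations) for m in mutants)
    return Fraction(killed, len(mutants))

SOURCE_INPUT = [1, 2, 4, 7]
MUTANTS = [mutant_off_by_one, mutant_first_only, mutant_skip_first]

weak_score = mutation_score(MUTANTS, [mr_permutation, mr_scaling], SOURCE_INPUT)
strong_score = mutation_score(MUTANTS, [mr_permutation, mr_scaling, mr_shift], SOURCE_INPUT)
print(weak_score, strong_score)
```

With only the permutation and scaling relations, the off-by-one mutant survives; adding the shift relation kills it. A surviving mutant is a hint that the set of MRs should be refined.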
MT: Advantages / Disadvantages
Advantages:
• Simplicity in concept
• Straightforward implementation
• Ease of automation
• Low costs
Disadvantages:
• Difficult generation of MR
• Requires "fast" learning algorithms
• Difficulty dealing with indeterminism