REFSQ’17, Essen, Germany, March 2nd, 2017 ON THE EQUIVALENCE BETWEEN GRAPHICAL AND TABULAR REPRESENTATIONS FOR SECURITY RISK ASSESSMENT Katsiaryna Labunets 1 , Fabio Massacci 1 , Federica Paci 2 1 University of Trento, Italy (<name.surname@unitn.it>) 2 University of Southampton, UK (F.M.Paci@soton.ac.uk)
2 The Problem [1/2] • Several security risk assessment (SRA) methods and standards are available to identify threats and possible security requirements • Academia relies on graphical methods (e.g. Anti-Goals, Secure Tropos, CORAS) • Industry opts for tabular methods (e.g. OCTAVE, ISO 27005, NIST 800-30) • REFSQ’17 representation statistics: • 5 papers discuss graphical notations (i*, Use Cases, BPMN diagrams), • 3 papers on mixed methods, • 1 paper studies requirements in natural language.
3 The Problem [2/2] • Are graphical methods actually better? • No clear winner in past experiments • [ESEM 2013] (both methods have a clear process): • Graph > Table w.r.t. # of threats (p < 5%) • Table > Graph w.r.t. # of security controls (p < 5%) • Graph =? Table w.r.t. perceived efficacy (not statistically significant) • [EmpiRE at RE 2014] (tabular method has a less clear process): • Graph =? Table w.r.t. # of threats and controls (not statistically significant) • Graph > Table w.r.t. perceived efficacy (p < 5%) • Are they really different?
4 Research Questions • RQ1: Are tabular and graphical SRA methods equivalent w.r.t. actual efficacy? • RQ2: Are tabular and graphical SRA methods equivalent w.r.t. perceived efficacy? How to answer?
5 Difference tests • Problem • H0: μA = μB • Ha: μA ≠ μB • Tests: t-test, Wilcoxon, Mann-Whitney, etc. • We can only reject the null hypothesis H0 • We cannot accept the alternative hypothesis Ha • Lack of evidence for a difference ≠ evidence for equivalence • How different should two methods be in order to be considered different?
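The asymmetry on this slide (we can reject H0, but never "accept" equivalence) can be illustrated with a minimal sketch. The function below is an illustrative two-sided test of H0: μA = μB using a large-sample z approximation (not the exact t procedure, and not any specific tool used in the study); the data are made up for demonstration.

```python
import math

def z_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def diff_test_z(a, b):
    """Two-sided large-sample z test of H0: mean(a) == mean(b).
    Returns a p-value. A large p only means 'no evidence of a
    difference'; it is NOT evidence that the two methods are
    equivalent."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Unbiased sample variances
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2.0 * (1.0 - z_cdf(abs(z)))

# Illustrative scores on a 1-5 scale: the underlying means differ by 0.4,
# but with only 4 observations per group the test cannot detect it.
a = [3.0, 3.5, 2.8, 3.6]
b = [3.4, 3.9, 3.2, 4.0]
print(diff_test_z(a, b))            # small sample: p > 0.05, inconclusive
print(diff_test_z(a * 25, b * 25))  # same effect, n = 100: p < 0.05
```

The same observed effect is "not significant" with 4 subjects per group and highly significant with 100, which is exactly why a non-significant difference test says nothing about equivalence.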
6 Equivalence test • Two One-Sided Tests (TOST) [Schuirmann, 1981] • Problem • ẟ defines the range within which two methods are considered to be equivalent • Ratio data: a percentage range ([80%; 125%] by FDA or [70%; 143%] by EU) • Ordinal data: a fixed value (e.g. 0.6 for ordinal values on a 1-5 Likert scale with 3 as the midpoint) • Each one-sided test can be a t-test, Wilcoxon, Mann-Whitney, etc.
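The TOST logic can be sketched in a few lines: declare non-equivalence (H0: |μA − μB| ≥ ẟ) and reject it only if *both* one-sided tests reject. This is a minimal sketch using a large-sample z approximation for each one-sided test (the study itself could equally use t, Wilcoxon, or Mann-Whitney variants); function names and data are illustrative.

```python
import math

def z_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tost_z(a, b, delta):
    """Two One-Sided Tests (TOST) for equivalence of means,
    large-sample z approximation.
    H0: |mean(a) - mean(b)| >= delta   (not equivalent)
    Ha: |mean(a) - mean(b)| <  delta   (equivalent)
    Returns the TOST p-value = max of the two one-sided p-values."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)
    diff = ma - mb
    # One-sided test 1: H0: diff <= -delta  vs  Ha: diff > -delta
    p_lower = 1.0 - z_cdf((diff + delta) / se)
    # One-sided test 2: H0: diff >= +delta  vs  Ha: diff < +delta
    p_upper = z_cdf((diff - delta) / se)
    return max(p_lower, p_upper)

# Illustrative 1-5 Likert-style scores with equal means:
a = [3.0, 3.2, 3.1, 3.3, 2.9] * 10
b = [3.05, 3.15, 3.1, 3.25, 2.95] * 10
print(tost_z(a, b, 0.2))    # generous margin: small p, equivalence shown
print(tost_z(a, b, 0.001))  # tiny margin: large p, equivalence not shown
```

Note the role of ẟ: the same data support equivalence under a margin of 0.2 but not under 0.001, so the margin must be fixed (and justified) before looking at the data.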
7 Experimental Design • Goal • Compare graphical and tabular representations w.r.t. the actual and perceived efficacy of an SRA method when applied by novices. • Treatments • Method: Graphical and tabular SRA methods used in industry • Task: Conduct SRA for each of four security tasks 1. Identity Management security (IM), 2. Access Management security (AM), 3. Web Application and Database security (WebApp/DB), 4. Network and Infrastructural security (Network/Infr). • Experiment: we conducted two controlled experiments, in 2015 and 2016.
8 Experimental Execution • ATM Domain • Remotely Operated Tower (ROT) Scenario by Eurocontrol • Unmanned Air Traffic Management (UTM) Scenario by NASA • Methods: • Graphical CORAS by SINTEF • Tabular SecRAM by SESAR • Participants were provided with catalogues of security threats and controls* • Participants: 35 and 48 MSc students in Computer Science took part in the ROT2015 and UTM2016 controlled experiments, respectively * M. de Gramatica, K. Labunets, F. Massacci, F. Paci and A. Tedeschi. “The Role of Catalogues of Threats and Security Controls in Security Risk Assessment: An Empirical Study with ATM Professionals”. In Proc. of REFSQ’15.
9 Experimental Protocol
• Phases: method training (by method designers) → application (participants, observed by researchers) → evaluation of report quality (by domain experts) → focus group interviews (Q1 background, Q3.1 initial method impression, Q3.2 final method impression)
• After training on security topics for the ROT/UTM scenario, groups deliver SRA reports for the four tasks (IM, AM, WebApp/DB, Network/Infr)
• Crossover design over the four tasks:

Method            | IM     | AM     | WebApp/DB | Network/Infr
GRAPHICAL METHOD  | Type A | Type B | Type A    | Type B
TABULAR METHOD    | Type B | Type A | Type B    | Type A

• Groups: ROT2015 — 9 groups of Type A, 9 of Type B; UTM2016 — 13 groups of Type A, 11 of Type B
10 Results: Actual Efficacy
• Actual efficacy: whether the treatment improves performance of the task

Exp     | Act. Efficacy | Tabular Mean | Graphical Mean | ẟ mean (Tab − Graph) | TOST p-value
ROT2015 | Threats       | 3.17         | 2.95           | +0.22                | 0.0009
ROT2015 | SC            | 3.28         | 2.97           | +0.31                | 0.001
UTM2016 | Threats       | 3.28         | 3.24           | +0.04                | 6.3×10⁻⁶
UTM2016 | SC            | 3.31         | 3.29           | +0.02                | 2.4×10⁻⁷

• Table ≈ Graph (both experiments) w.r.t. quality of threats and security controls (SC)
11 Results: Perceived Efficacy

Exp     | Perc. Efficacy | Tabular Mean | Graphical Mean | ẟ mean (Tab − Graph) | TOST p-value
ROT2015 | PEOU           | 3.63         | 3.20           | +0.43                | 0.08
ROT2015 | PU             | 3.54         | 3.05           | +0.37                | 0.18
UTM2016 | PEOU           | 3.74         | 3.60           | +0.14                | 2.6×10⁻⁵
UTM2016 | PU             | 3.67         | 3.29           | +0.38                | 0.03

• ROT2015 — PEOU & PU: Tabular ? Graphical (inconclusive)
• UTM2016 — PEOU & PU: Tabular ≈ Graphical
12 Threats to Validity • Differences between the two experiments (internal validity) • Low statistical power (conclusion validity) • Use of students instead of practitioners (external validity) • Simple scenario (external validity)
13 Conclusions • No difference? Check with an equivalence test • How to measure actual efficacy: quantity vs. quality? • Both graphical and tabular methods offer similar support for SRA • A clear process matters! • What is next? • Comprehensibility of risk modeling notations • Labunets et al. “Model Comprehension for Security Risk Assessment: An Empirical Comparison of Tabular vs. Graphical Representations”. Empirical Software Engineering, 2017.