Virtual Mutation Analysis of Relational Database Schemas University of Sheffield Phil McMinn Allegheny College Gregory M. Kapfhammer University of Sheffield Chris J. Wright
Relational Databases – Why Should We (Still) Care? A vital component of many software systems Despite the wave of interest in “NoSQL” technologies, Relational Databases are still popular (and faster) For developers: schemas provide self-documentation
Relational Databases – Why Should We (Still) Care? A vital component of many software systems Despite the wave of interest in “NoSQL” technologies, Relational Databases are still popular (and faster) For developers: schemas provide self-documentation Relational Databases are still important, popular and relevant
A Relational Database Schema
A Relational Database Schema Table
A Relational Database Schema Table Column and data type
Integrity Constraints Prevent invalid data being entered into the database Encode domain logic
Integrity Constraints Prevent invalid data being entered into the database Encode domain logic
Testing the Schema Correctly accepted by the schema Correctly rejected by the schema Correctly rejected by the schema
Testing the Schema Correctly accepted by the schema Correctly rejected by the schema Correctly rejected by the schema
Testing the Schema Correctly accepted by the schema Correctly rejected by the schema Correctly rejected by the schema
Testing the Schema Correctly accepted by the schema Correctly rejected by the schema Correctly rejected by the schema
Why Do We Need to Do This?
Why Do We Need to Do This? To trap common errors when designing a schema For example: lack of uniqueness property on usernames, out of range values
Why Do We Need to Do This? To trap common errors when designing a schema For example: lack of uniqueness property on usernames, out of range values To test development behaviour vs deployment DBMSs have subtly different behaviors
Why Do We Need to Do This? To trap common errors when designing a schema For example: lack of uniqueness property on usernames, out of range values To test development behaviour vs deployment DBMSs have subtly different behaviors Nobody throws away a database of data To test the success of database migrations
Why Do We Need to Do This? To trap common errors when designing a schema For example: lack of uniqueness property on usernames, out of range values To test development behaviour vs deployment DBMSs have subtly different behaviors Nobody throws away a database of data To test the success of database migrations Industry advice Destroying database consistency can have huge cost implications
Mutation Analysis Once a test suite has been created, its fault finding capability can be estimated with mutation analysis. For relational database schema testing, mutants are created by making small changes to the schema
Mutation Analysis Once a test suite has been created, its fault finding capability can be estimated with mutation analysis. For relational database schema testing, mutants are created by making small changes to the schema
Mutation Analysis Once a test suite has been created, its fault finding capability can be estimated with mutation analysis. For relational database schema testing, mutants are created by making small changes to the schema
Mutation Analysis Once a test suite has been created, its fault finding capability can be estimated with mutation analysis. For relational database schema testing, mutants are created by making small changes to the schema
Mutation Analysis is Costly schema mutants
Mutation Analysis is Costly mutant schema mutants + test suite database Mutant killed / alive
Mutation Analysis is Costly mutant schema mutants + test suite database Mutant killed / alive SchemaAnalyst
Mutation Analysis is Costly
Mutation Analysis is Costly DO FEWER
Mutation Analysis is Costly DO FEWER DO SMARTER
Mutation Analysis is Costly DO FEWER DO SMARTER DO FASTER
Mutation Analysis is Costly mutant schema mutants + test suite database Mutant killed / alive SchemaAnalyst
Mutation Analysis is Costly mutant schema mutants + test suite database Mutant killed / alive SchemaAnalyst
Mutation Analysis is Costly mutant schema mutants + e h t h t i w g n i t a c i t n i u n m o m s e o i c r e f u o q t s L o Q test suite c S h g g n i H i t u c e x database e d n a S M B D Mutant killed / alive SchemaAnalyst
Reducing the Cost mutant schema mutants + test suite database Mutant killed / alive SchemaAnalyst
Reducing the Cost mutant schema mutants + d a e h r e v o n o i t a c test suite i n u m m o c o database n , l a c o L Mutant killed / alive SchemaAnalyst
Reducing the Cost mutant schema mutants + test suite model of database Mutant killed / alive SchemaAnalyst
Reducing the Cost Virtual mutant Mutation Analysis schema mutants + test suite model of database Mutant killed / alive SchemaAnalyst
Reducing the Cost Virtual mutant Mutation Analysis schema mutants + d a e h r e v o n test suite model of o i t u c e x e r e database w o L Mutant killed / alive SchemaAnalyst
The Model
The Model Integrity constraint predicate icp1
The Model Integrity constraint predicate icp1 icp2 icp3
The Model Integrity constraint predicate icp1 icp2 icp3 icp4 icp5
The Model Integrity constraint predicate icp1 icp2 icp3 icp4 icp5 Form an acceptance predicate for the table: ap = icp1 /\ icp2 /\ icp3 /\ icp4 /\ icp5
The Model Integrity constraint predicate icp1 icp2 icp3 icp4 icp5 Form an acceptance predicate for the table: ap = icp1 /\ icp2 /\ icp3 /\ icp4 /\ icp5 True when DBMS would accept the data False otherwise
Virtual DBMS Models
Empirical Study RQ1. What is the relative efficiency of the virtual approach? RQ2. What are the time savings? RQ3. How do mutation scores compare when the standard approach is run for as long as the virtual one?
Subject Schemas
Mutation Analysis Time (Log Transformed) 1 1 1 0 0 0 1 0 1 0 1 0 C 0 0 0 0 0 0 o 0 0 0 0 0 0 f f 0 0 0 0 0 0 e e O r d e r s E m p l o y e e I n v e n t RQ1: Efficiency o r y I s o 3 J 1 W 6 6 h o M i s o S Standard z e i l r l a v P e r e r m i s s i o n N s i s t W e a t h e r P e r s o n P r o Database Schema d u c t s C o f f e e O r d e r s E m p l o y e e I n v e n t o r y I s o 3 J 1 W 6 6 h o M i s o S z e i Virtual l r l a v P e r e r m i s s i o n N s i s t W e a t h e r P e r s o n P r o d u c t s SQLite PostgreSQL HyperSQL
Mutation Analysis Time (Log Transformed) Virtual Mutation Analysis is significantly more efficient for 1 1 1 0 0 0 1 0 1 0 1 0 C 0 0 0 0 0 0 o 0 0 0 0 0 0 f f 0 0 0 0 0 0 e e O r d e r s E m p l o y e e I n v Postgres and HyperSQL, but not SQLite e n t RQ1: Efficiency o r y I s o 3 J 1 W 6 6 h o M i s o S Standard z e i l r l a v P e r e r m i s s i o n N s i s t W e a t h e r P e r s o n P r o Database Schema d u c t s C o f f e e O r d e r s E m p l o y e e I n v e n t o r y I s o 3 J 1 W 6 6 h o M i s o S z e i Virtual l r l a v P e r e r m i s s i o n N s i s t W e a t h e r P e r s o n P r o d u c t s SQLite PostgreSQL HyperSQL
RQ2: Time Savings HyperSQL PostgreSQL SQLite 100 Percentage of Mean Time Saved 50 0 − 50 − 100 50 100 150 Number of Mutants
RQ2: Time Savings HyperSQL PostgreSQL SQLite 100 Virtual Mutation Analysis yields large time savings for Percentage of Mean Time Saved Postgres and HyperSQL but not always with SQLite, 50 leading to an average time saving of 51% overall 0 − 50 − 100 50 100 150 Number of Mutants
Total Number of Mutants C o 1 1 f f e 5 0 5 e 0 0 0 0 O r d e E r m s p l o y e I n e v e n t o r y I s J RQ3: Comparison o W 3 M h 1 o o 6 z 6 i i s l HyperSQL l S a P e r e v r e m r i s N s i i o s n t W s e a t h e r P e r s o P n r o d u C c t o s f f e e O r d e E r m s p l o y e I n e v e n t o r y I s J o Database Schema W 3 M h 1 Selective o o 6 z PostgreSQL 6 i i s l l S a P e r e v r e m r i s N s i i o s n t W s e a t h e Virtual r P e r s o P n r o d u C c t o s f f e e O r d e E r m s p l o y e I n e v e n t o r y I s J o W M 3 h 1 o o 6 z 6 i i s l l S a P e SQLite r e v r e m r i s N s i i o s n t W s e a t h e r P e r s o P n r o d u c t s
Recommend
More recommend