Uniform Sampling of SAT Solutions for Configurable Systems: Are We There Yet?
Gilles Perrouin, gilles.perrouin@unamur.be, @GPerrouin
Journée GLE / LOUISE / RIMEL - GDR GPL CNRS - Talence, 12 April 2019
Uniform Sampling of SAT Solutions for Configurable Systems: Are We There Yet?
Quentin Plazar, Mathieu Acher, Xavier Devroey, Maxime Cordy
Configurable Systems
A Universe of Options
[Figure: systems placed on a scale of option counts (0, 33, 48, 320, 10,000+); # variants with independent Boolean options: 2^33 for 33 options, 2^320 for 320 options]
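The variant counts on this slide follow from the fact that n independent Boolean options yield 2^n variants. A minimal Python sketch of that arithmetic (the helper name `variant_count` is purely illustrative and not from the talk):

```python
def variant_count(n_options: int) -> int:
    """Number of variants for n independent Boolean options, i.e. 2**n."""
    return 2 ** n_options


if __name__ == "__main__":
    for n in (33, 320):
        count = variant_count(n)
        # Report the magnitude and the number of decimal digits of 2**n.
        print(f"{n} options -> {count:.3e} variants ({len(str(count))} digits)")
```

Real systems have constraints between options, so the actual number of valid variants is usually lower than this upper bound.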
Modelling the Universe: Feature Models
[Figure: feature model of a multifunction printer with features Print, Scan, Fax, Connection, Laser, Inkjet, USB and Ethernet; legend: AND, XOR, OR, optional feature]
  K. Kang, S. Cohen, J. Hess, W. Novak, and S. Peterson, Feature-Oriented Domain Analysis (FODA) Feasibility Study, Technical Report CMU/SEI-90-TR-21, 1990.
Modelling the Universe: Feature Models
[Figure: the multifunction printer feature model (Print, Scan, Fax, Connection, Laser, Inkjet, USB, Ethernet) translated to Conjunctive Normal Form (CNF)]
  M. Mendonca, A. Wasowski, and K. Czarnecki, SAT-based analysis of feature models is easy, SPLC 2009.
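To illustrate how a feature model becomes a CNF formula, here is a minimal Python sketch. The exact parent/child relations of the printer diagram are not recoverable from the slide text, so the structure below is an assumption: Print and Connection are mandatory, Scan and Fax are optional, Laser/Inkjet form an XOR group under Print, and USB/Ethernet form an OR group under Connection. Solutions are enumerated by brute force, which only works for toy models.

```python
from itertools import product

FEATURES = ["Printer", "Print", "Scan", "Fax", "Connection",
            "Laser", "Inkjet", "USB", "Ethernet"]
VAR = {name: i + 1 for i, name in enumerate(FEATURES)}  # DIMACS-style ids


def lit(name, positive=True):
    """Signed literal for a feature (negative id means 'not selected')."""
    return VAR[name] if positive else -VAR[name]


def implies(a, b):
    """Clause for a -> b, i.e. (not a or b)."""
    return [lit(a, False), lit(b)]


# ASSUMED structure of the printer model (hypothetical, see lead-in).
CNF = [
    [lit("Printer")],                                      # root is always selected
    implies("Printer", "Print"), implies("Print", "Printer"),            # mandatory
    implies("Printer", "Connection"), implies("Connection", "Printer"),  # mandatory
    implies("Scan", "Printer"), implies("Fax", "Printer"),               # optional
    implies("Laser", "Print"), implies("Inkjet", "Print"),               # XOR group
    [lit("Print", False), lit("Laser"), lit("Inkjet")],
    [lit("Laser", False), lit("Inkjet", False)],
    implies("USB", "Connection"), implies("Ethernet", "Connection"),     # OR group
    [lit("Connection", False), lit("USB"), lit("Ethernet")],
]


def satisfies(assignment, cnf):
    """assignment maps variable id -> bool; every clause needs one true literal."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause) for clause in cnf)


def all_solutions(cnf, n_vars):
    """Brute-force enumeration of all satisfying assignments (toy sizes only)."""
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if satisfies(assignment, cnf):
            yield assignment


if __name__ == "__main__":
    print(len(list(all_solutions(CNF, len(FEATURES)))), "valid configurations")
```

Under these assumed constraints each satisfying assignment corresponds to one valid printer configuration.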
Configurable Systems Sampling
  Combinatorial Interaction Testing
    M. F. Johansen, Ø. Haugen, and F. Fleurey, An algorithm for generating t-wise covering arrays from large feature models, SPLC 2012.
    B. J. Garvin, M. B. Cohen, and M. B. Dwyer, Evaluating improvements to a meta-heuristic search for constrained interaction testing, EMSE, 2011.
  Dissimilarity Testing
    C. Henard, M. Papadakis, G. Perrouin, J. Klein, P. Heymans, and Y. L. Traon, Bypassing the combinatorial explosion: Using similarity to generate and prioritize t-wise test configurations for software product lines, TSE, 2014.
  Dedicated Heuristics
    F. Medeiros, C. Kästner, M. Ribeiro, R. Gheyi, and S. Apel, A comparison of 10 sampling algorithms for configurable systems, ICSE 2016.
Configurable Systems Sampling
  Combinatorial Interaction Testing
  Dissimilarity Testing
  Dedicated Heuristics
Randomness influence in Sampling
Let's take the (first) configurations returned by the solver…
  Most-enabled-disabled (sample of 2 variants): 0 faults covered (but covers 33% of 6 faults on average)
    A. Halin, A. Nuttinck, M. Acher, X. Devroey, G. Perrouin, and B. Baudry, Test them all, is it worth it? Assessing configuration sampling on the JHipster web development stack, EMSE, 2018.
  Hampers dissimilarity sampling in a search-based exploration: "local" focus due to the solver's internal order
    C. Henard, M. Papadakis, G. Perrouin, J. Klein, P. Heymans, and Y. L. Traon, Bypassing the combinatorial explosion: Using similarity to generate and prioritize t-wise test configurations for software product lines, TSE, 2014.
Uniform Random Sampling
  Uniform Random Sampling helps with population initialisation in Evolutionary Algorithms
    H. Maaranen, K. Miettinen, and M. M. Mäkelä, Quasi-random initial population for genetic algorithms, Comput. Math. Appl., 2004.
    A. de Perthuis de Laillevault, B. Doerr, and C. Doerr, Money for nothing: Speeding up evolutionary algorithms through better initialization, GECCO 2015.
  Uniform Random Sampling may outperform t-wise sampling
    A. Arcuri and L. Briand, Formal analysis of the probability of interaction fault detection using random testing, TSE, 2012.
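The difference between taking the first solutions in a fixed enumeration order and sampling uniformly can be sketched on a toy formula. This is an illustrative baseline under stated assumptions: the formula `CNF` is hypothetical, and uniform sampling is done by exhaustive enumeration, which is not how UniGen or QuickSampler work and does not scale beyond tiny formulas.

```python
import random
from itertools import product

# Hypothetical toy formula over x1..x4: (x1 or x2) and (not x3 or x4)
CNF = [[1, 2], [-3, 4]]
N_VARS = 4


def satisfies(bits, cnf):
    """bits is a tuple of booleans; bits[i] is the value of variable i+1."""
    return all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in cnf)


def enumerate_solutions(cnf, n_vars):
    """All satisfying assignments, in a fixed (False-first) enumeration order."""
    return [bits for bits in product([False, True], repeat=n_vars)
            if satisfies(bits, cnf)]


def first_k(cnf, n_vars, k):
    """Mimics taking the first k returned solutions: biased by enumeration order."""
    return enumerate_solutions(cnf, n_vars)[:k]


def uniform_sample(cnf, n_vars, k, rng):
    """Draws k solutions, each with probability 1/#SAT (with replacement)."""
    solutions = enumerate_solutions(cnf, n_vars)
    return [rng.choice(solutions) for _ in range(k)]


if __name__ == "__main__":
    print("first 3:", first_k(CNF, N_VARS, 3))
    print("uniform:", uniform_sample(CNF, N_VARS, 3, random.Random(0)))
```

In this sketch the first three enumerated solutions all leave x1 deselected, whereas uniform draws are spread over the whole solution space.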
Uniform Random SAT Sampling
  UniGen
    S. Chakraborty, K. S. Meel, and M. Y. Vardi, A scalable and nearly uniform generator of SAT witnesses, CAV 2013.
    S. Chakraborty, D. J. Fremont, K. S. Meel, S. A. Seshia, and M. Y. Vardi, On parallel scalable uniform SAT witness generation, TACAS 2015.
  QuickSampler
    R. Dutra, K. Laeufer, J. Bachrach, and K. Sen, Efficient sampling of SAT solutions for testing, ICSE 2018.
Motivation
  Since feature models are also SAT formulas, we can use UniGen/QuickSampler to generate uniform samples for configurable systems.
  Are we there yet?
Research Questions
  RQ1 (scalability and execution time): Are UniGen and QuickSampler able to generate samples out of feature models?
  RQ2 (uniformity): Do UniGen and QuickSampler generate uniform configurations out of feature models?
  RQ3 (relevance for testing): How do QuickSampler's sacrifices on uniformity impact its bug-finding ability in JHipster?
Empirical Evaluation
  128 feature models
  [Figure: scale from 0 to 10,000+; previous benchmarks: 7.7 × 10^417 SAT solutions; 10^48 SAT solutions]
Scalability (RQ1)
  UniGen: does not produce any (valid by construction) sample within 2 hours for all 128 feature models but […]
  QuickSampler: does produce 1 million samples (1 sample/ms) within 2 hours for all 128 feature models but 4. 75% of its samples are valid (invalid ones can be removed by a SAT check afterwards)
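Removing invalid samples after the fact can be sketched as follows. This is a minimal illustration, assuming each sample is a complete assignment given as a list of signed DIMACS-style literals; under that assumption, validity can be checked by evaluating every clause directly rather than calling a SAT solver. The formula and the samples are hypothetical.

```python
def is_valid(sample, cnf):
    """True if the complete assignment `sample` satisfies every clause of `cnf`.

    `sample` lists one signed literal per variable (e.g. [1, -2] means
    x1 selected, x2 deselected). Assumes the assignment is complete.
    """
    chosen = set(sample)
    return all(any(l in chosen for l in clause) for clause in cnf)


def keep_valid(samples, cnf):
    """Filters out the samples that violate the formula."""
    return [s for s in samples if is_valid(s, cnf)]


# Hypothetical formula: exactly one of x1, x2 must be selected.
CNF = [[1, 2], [-1, -2]]
samples = [[1, -2], [1, 2], [-1, 2], [-1, -2]]
valid = keep_valid(samples, CNF)

if __name__ == "__main__":
    print(f"{len(valid)} of {len(samples)} samples are valid: {valid}")
```

The check is linear in the size of the formula per sample, so filtering adds only a modest cost on top of sample generation.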
Uniformity (RQ2)
  Counting the number of times each possible solution is generated does not scale, as it requires 4 times as many samples as the feature model has solutions ($10^{50}$).
  Subsampling biases towards uniformity.
  Rather, we focus on the frequency of appearance of individual features, comparing the theoretical frequency $f_{th}$ with the observed frequency $f_{obs}$, and measure deviations (dev):
  $f_{th}(v) = \frac{\#SAT(\phi \land v)}{\#SAT(\phi)}$
  $dev(v) = 100 \cdot \frac{|f_{th}(v) - f_{obs}(v)|}{f_{th}(v)}$
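The deviation metric can be illustrated on a toy formula. The sketch below is an assumption-laden example, not the study's tooling: #SAT is obtained by brute-force enumeration (a model counter would be needed for real feature models), and the formula and the biased sample set are hypothetical. Exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction
from itertools import product


def solutions(cnf, n_vars):
    """All satisfying assignments as tuples of booleans (toy sizes only)."""
    return [bits for bits in product([False, True], repeat=n_vars)
            if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
                   for clause in cnf)]


def theoretical_frequency(cnf, n_vars, v):
    """f_th(v) = #SAT(phi and v) / #SAT(phi)."""
    sols = solutions(cnf, n_vars)
    return Fraction(sum(1 for s in sols if s[v - 1]), len(sols))


def observed_frequency(samples, v):
    """f_obs(v): share of the generated samples in which feature v is selected."""
    return Fraction(sum(1 for s in samples if s[v - 1]), len(samples))


def deviation(f_th, f_obs):
    """dev(v) = 100 * |f_th(v) - f_obs(v)| / f_th(v); undefined when f_th is 0."""
    return 100 * abs(f_th - f_obs) / f_th


# Hypothetical formula (x1 or x2) and a biased sample set that always picks x1.
CNF = [[1, 2]]
biased_samples = [(True, False), (True, True), (True, True), (True, False)]

if __name__ == "__main__":
    f_th = theoretical_frequency(CNF, 2, 1)
    f_obs = observed_frequency(biased_samples, 1)
    print(f"f_th={f_th}, f_obs={f_obs}, dev={deviation(f_th, f_obs)}%")
```

Here x1 appears in 2 of the 3 solutions but in all samples, so its deviation is 50%. A feature that appears in no solution has a theoretical frequency of 0, for which the metric is undefined.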
QuickSampler Deviation: AIM711 [Figure: individual feature deviations shown in ascending order; marked levels: 50% and 10%]
QuickSampler Deviation: toybox [Figure: individual feature deviations shown in ascending order; marked levels: 50% and 10%]
QuickSampler Deviation: ucLinux [Figure: individual feature deviations shown in ascending order; marked levels: 50% and 10%]
Uniformity (RQ2)
  QuickSampler is not close to uniformity for feature models, with deviations up to 800% (in contrast with the non-feature-model benchmarks previously used)
  Negligible observed deviations on non-feature models and […]
Relevance for testing (RQ3)
  JHipster: OSS full stack configurator, 26,000+ configurations exhaustively assessed in previous work (Halin et al.) => ground truth
  dev(MongoDB) = 116%, dev(Cassandra) = 107%, …, dev(Uaa) = 9%
  QuickSampler slightly over-represents Uaa, which is involved in 3 of the 6 interaction bugs identified in JHipster.
  Under-representation of some features does not impact bug finding in JHipster.
Future Work
  Fully assess the role of Minimal Independent Support in the scalability of UniGen (no improvement seen yet)
  Experiment with a new uniform sampler (developed while the study was being carried out):
    S. Sharma, R. Gupta, S. Roy, and K. S. Meel, Knowledge Compilation meets Uniform Sampling, EPiC Series in Computing, 57, 620-636, 2018.
  Investigate whether uniform sampling is a cost-effective bug-finding strategy
[Figure: multifunction printer feature model (Print, Scan, Fax, Connection, Laser, Inkjet, USB, Ethernet)]
128 feature models
  UniGen: uniform but not scalable
  QuickSampler: (mostly) scalable but not uniform
Uniform Sampling of SAT Solutions for Configurable Systems: Are We There Yet?
Quentin Plazar, Mathieu Acher, Gilles Perrouin, Xavier Devroey, Maxime Cordy