Applying Computational Learning Theory to Software Testing
Neil Walkinshaw
Computational Learning Theory (CLT)
• Theoretical analysis of questions of learnability.
• Is it feasible to learn a given type of concept from a set of examples?
• What are the performance bounds? Time complexity?
• Several theoretical learning frameworks to characterise a learning problem.
• Identification In the Limit (IIL)
• Version Spaces
• Probably Approximately Correct (PAC) learning
• Vapnik-Chervonenkis Theory
• …
References:
Gold, E. Mark. "Language identification in the limit." Information and Control 10.5 (1967).
Mitchell, Tom M. "Generalization as search." Artificial Intelligence (1982).
Valiant, Leslie G. "A theory of the learnable." Communications of the ACM (1984).
Vapnik, Vladimir N. Statistical learning theory. Wiley, 1998.
Machine Learning Insights from CLT
• Numerous positive / negative learnability results.
• Finite languages, supra-finite languages from positive data, regular grammars with a helpful teacher, conjunctive concepts (in PAC), disjunctions of conjunctive concepts (in PAC), …
• The number of required examples can be bounded explicitly …
• … in PAC if we can bound the size of the Version Space or …
• … in PAC if we can bound the size of the Vapnik-Chervonenkis dimension …
• Derived by close examination of the relationship between learning algorithm and subject system.
References:
Gold, E. Mark. "Language identification in the limit." Information and Control 10.5 (1967): 447-474.
Angluin, Dana. "Queries and concept learning." Machine Learning 2.4 (1988): 319-342.
Valiant, Leslie G. "A theory of the learnable." Communications of the ACM 27.11 (1984): 1134-1142.
Haussler, David. "Quantifying inductive bias: AI learning algorithms and Valiant's learning framework." Artificial Intelligence 36.2 (1988): 177-221.
Blumer, Anselm, et al. "Learnability and the Vapnik-Chervonenkis dimension." Journal of the ACM (JACM) 36.4 (1989): 929-965.
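As a rough illustration of how the number of required examples can be bounded explicitly in PAC, the sketch below evaluates two standard textbook forms of the sufficient sample size: one in terms of the size of a finite hypothesis space (in the style of Haussler 1988) and one in terms of the VC dimension (in the style of Blumer et al. 1989). The function names and the example values of epsilon, delta and the hypothesis space are assumptions chosen for illustration; they do not come from the slides.

```python
import math


def pac_bound_finite(hypothesis_count, epsilon, delta):
    """Sufficient sample size for a consistent learner over a finite
    hypothesis space H: m >= (ln|H| + ln(1/delta)) / epsilon."""
    if hypothesis_count < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need |H| >= 1 and 0 < epsilon, delta < 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


def pac_bound_vc(vc_dim, epsilon, delta):
    """Sufficient sample size in terms of the VC dimension d:
    m >= max((4/eps) * log2(2/delta), (8d/eps) * log2(13/eps))."""
    if vc_dim < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need d >= 1 and 0 < epsilon, delta < 1")
    confidence_term = (4.0 / epsilon) * math.log2(2.0 / delta)
    capacity_term = (8.0 * vc_dim / epsilon) * math.log2(13.0 / epsilon)
    return math.ceil(max(confidence_term, capacity_term))


if __name__ == "__main__":
    # Assumed example: conjunctions over 10 Boolean variables, |H| = 3**10
    # (each variable appears positively, negatively, or not at all).
    print(pac_bound_finite(3 ** 10, epsilon=0.1, delta=0.05))
    # Assumed example: a hypothesis class with VC dimension 3.
    print(pac_bound_vc(3, epsilon=0.1, delta=0.05))
```

Both bounds are sufficient rather than tight, and both grow only logarithmically in 1/delta but roughly linearly in 1/epsilon, which is why the accuracy target tends to dominate the sample size.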
Applications to Testing?
• Several theoretical frameworks that link testing and ML
• Learnability ➔ Testability
• Bounds on training set size ➔ Bounds on test set size.
• Current results can rarely simply be carried over.
• Software representations are necessarily complex.
• Rarely valid to represent software as (e.g.) an FSM.
• CLT results assume that learning is 'distribution free', i.e. they apply to any distribution over the training sample.
• Reasoning with respect to distributions of test inputs is notoriously problematic.
Goal: Improved theoretical frameworks to reason about learnability (and testability) of generic software systems
References:
Weyuker. "Assessing test data adequacy through program inference." ACM TOPLAS (1983).
Budd and Angluin. "Two notions of correctness and their relation to testing." Acta Informatica 18.1 (1982): 31-45.
Walkinshaw. "Assessing test adequacy for black-box systems without specifications." ICTSS (2011).
Romanik. "Approximate testing and its relationship to learning." Theoretical Computer Science 188.1 (1997): 79-99.
Romanik and Vitter. "Using Vapnik–Chervonenkis Dimension to Analyze the Testing Complexity of Program Segments." Information and Computation 128.2 (1996): 87-108.
Hamlet, R. "Random testing." Encyclopedia of Software Engineering (1994).