  1. The Automatic Statistician: an AI for Data Science
Zoubin Ghahramani, Department of Engineering, University of Cambridge
zoubin@eng.cam.ac.uk, http://learning.eng.cam.ac.uk/zoubin/
Intelligent Machines, Nijmegen, 2015
With: James Robert Lloyd (Cambridge), David Duvenaud (Cambridge → Harvard), Roger Grosse (MIT → Toronto), Josh Tenenbaum (MIT)

  2. THERE IS A GROWING NEED FOR DATA ANALYSIS
◮ We live in an era of abundant data
◮ The McKinsey Global Institute claims: "The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data."
◮ Diverse fields increasingly rely on expert statisticians, machine learning researchers and data scientists, e.g.:
  ◮ Computational sciences (e.g. biology, astronomy, ...)
  ◮ Online advertising
  ◮ Quantitative finance
  ◮ ...
James Robert Lloyd and Zoubin Ghahramani 2 / 43

  3. WHAT WOULD AN AUTOMATIC STATISTICIAN DO?
[Pipeline diagram: Data → Search over a language of models → Model, with Evaluation feeding back into the search, and outputs Prediction, Checking, and Translation into a Report]

  4. GOALS OF THE AUTOMATIC STATISTICIAN PROJECT
◮ Provide a set of tools for understanding data that require minimal expert input
◮ Uncover challenging research problems in e.g.:
  ◮ Automated inference
  ◮ Model construction and comparison
  ◮ Data visualisation and interpretation
◮ Advance the field of machine learning in general

  5. INGREDIENTS OF AN AUTOMATIC STATISTICIAN
◮ An open-ended language of models
  ◮ Expressive enough to capture real-world phenomena...
  ◮ ...and the techniques used by human statisticians
◮ A search procedure
  ◮ To efficiently explore the language of models
◮ A principled method of evaluating models
  ◮ Trading off complexity and fit to data
◮ A procedure to automatically explain the models
  ◮ Making the assumptions of the models explicit...
  ◮ ...in a way that is intelligible to non-experts

  6. PREVIEW: AN ENTIRELY AUTOMATIC ANALYSIS
[Two panels, 1950-1962: raw data, and the full model posterior with extrapolations]
Four additive components have been identified in the data:
◮ A linearly increasing function.
◮ An approximately periodic function with a period of 1.0 years and with linearly increasing amplitude.
◮ A smooth function.
◮ Uncorrelated noise with linearly increasing standard deviation.

  7. DEFINING A LANGUAGE OF MODELS
[Pipeline diagram repeated, highlighting the language of models]

  8. DEFINING A LANGUAGE OF REGRESSION MODELS
◮ Regression consists of learning a function f : X → Y from example input/output pairs
◮ The language should include simple parametric forms...
  ◮ e.g. linear functions, polynomials, exponential functions
◮ ...as well as functions specified by high-level properties
  ◮ e.g. smoothness, periodicity
◮ Inference should be tractable for all models in the language

  9. WE CAN BUILD REGRESSION MODELS WITH GAUSSIAN PROCESSES
◮ GPs are distributions over functions such that any finite set of function evaluations, (f(x_1), f(x_2), ..., f(x_N)), has a joint Gaussian distribution
◮ A GP is completely specified by:
  ◮ a mean function, μ(x) = E(f(x))
  ◮ a covariance (kernel) function, k(x, x′) = Cov(f(x), f(x′))
◮ Denoted f ∼ GP(μ, k)
[Four panels: GP posterior mean and posterior uncertainty as data points are added]
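As a minimal sketch of these definitions, the zero-mean GP posterior at a test input can be computed in a few lines of numpy. The kernel choice, lengthscale, and noise level here are illustrative, not settings from the deck:

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, kernel, noise=1e-2):
    """Posterior mean and pointwise variance of a zero-mean GP at x_test."""
    K = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = kernel(x_train, x_test)
    Kss = kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.diag(cov)

# Toy data: three noisy observations of a function
x = np.array([0.0, 1.0, 2.0])
y = np.sin(x)
mu, var = gp_posterior(x, y, np.array([1.0]), se_kernel)
```

At a training input the posterior mean stays close to the observed value and the posterior variance shrinks towards the noise level, which is exactly the behaviour sketched in the slide's panels.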

  10. THE ATOMS OF OUR LANGUAGE
Five base kernels, encoding the following types of functions:
◮ Squared exponential (SE): smooth functions
◮ Periodic (PER): periodic functions
◮ Linear (LIN): linear functions
◮ Constant (C): constant functions
◮ White noise (WN): Gaussian noise
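The five atoms can be sketched as plain covariance functions in numpy. The hyperparameters and exact parameterisations below are simplified illustrations (e.g. this LIN is centred at the origin), not the project's definitions:

```python
import numpy as np

def WN(x1, x2):
    """White noise: covariance is nonzero only where inputs coincide."""
    return np.isclose(x1[:, None], x2[None, :]).astype(float)

def C(x1, x2):
    """Constant functions: all pairs of evaluations fully correlated."""
    return np.ones((len(x1), len(x2)))

def LIN(x1, x2):
    """Linear functions (here passing through the origin)."""
    return x1[:, None] * x2[None, :]

def SE(x1, x2, l=1.0):
    """Squared exponential: smooth functions with lengthscale l."""
    return np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / l) ** 2)

def PER(x1, x2, period=1.0, l=1.0):
    """Periodic functions with the given period."""
    d = np.pi * np.abs(x1[:, None] - x2[None, :]) / period
    return np.exp(-2.0 * np.sin(d) ** 2 / l ** 2)
```

Each function returns the Gram matrix k(x, x′) for two vectors of inputs, which is all a GP needs.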

  11. THE COMPOSITION RULES OF OUR LANGUAGE
◮ Two main operations: addition and multiplication
◮ LIN × LIN: quadratic functions
◮ SE × PER: locally periodic functions
◮ LIN + PER: periodic plus linear trend
◮ SE + PER: periodic plus smooth trend
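Because sums and products of kernels are again valid kernels, the two composition rules can be written as higher-order functions. A hypothetical sketch (the `se` and `per` forms mirror the base-kernel sketches above, with illustrative hyperparameters):

```python
import numpy as np

def se(x1, x2, l=1.0):
    return np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / l) ** 2)

def per(x1, x2, period=1.0, l=1.0):
    d = np.pi * np.abs(x1[:, None] - x2[None, :]) / period
    return np.exp(-2.0 * np.sin(d) ** 2 / l ** 2)

def add(k1, k2):
    """Sum of kernels: f = f1 + f2 with f1 ~ GP(0, k1), f2 ~ GP(0, k2)."""
    return lambda x1, x2: k1(x1, x2) + k2(x1, x2)

def mul(k1, k2):
    """Product of kernels, e.g. SE x PER gives locally periodic functions."""
    return lambda x1, x2: k1(x1, x2) * k2(x1, x2)

# SE x PER from the slide: periodic structure whose shape drifts smoothly
locally_periodic = mul(se, per)
```

Composite kernels built this way remain positive semidefinite, so any expression in the language is itself a usable GP covariance.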

  12. AN EXPRESSIVE LANGUAGE OF MODELS
Kernel → regression model:
◮ SE + WN: GP smoothing
◮ C + LIN + WN: linear regression
◮ Σ SE + WN: multiple kernel learning
◮ Σ SE + Σ PER + WN: trend, cyclical, irregular
◮ C + Σ cos + WN: Fourier decomposition
◮ Σ cos + WN: sparse spectrum GPs
◮ Σ SE × cos + WN: spectral mixture
◮ e.g. CP(SE, SE) + WN: changepoints
◮ e.g. SE + LIN × WN: heteroscedasticity
Note: cos is a special case of our version of PER

  13. DISCOVERING A GOOD MODEL VIA SEARCH
[Pipeline diagram repeated, highlighting the search stage]

  14. DISCOVERING A GOOD MODEL VIA SEARCH
◮ The language is defined as the arbitrary composition of five base kernels (WN, C, LIN, SE, PER) via three operators (+, ×, CP)
◮ The space spanned by this language is open-ended and can have a high branching factor, requiring a judicious search
◮ We propose a greedy search for its simplicity and similarity to human model-building
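The greedy search can be sketched structurally: at each step, expand the current expression with every base kernel under + and ×, keep the best-scoring candidate, and stop when no expansion improves. This is an illustrative skeleton, not the project's implementation; `score` is a stand-in for the BIC-penalised evidence of slide 20, and the CP operator is omitted for brevity:

```python
BASE = ["WN", "C", "LIN", "SE", "PER"]

def expand(expr):
    """All single-operator expansions of expr (CP omitted for brevity)."""
    cands = []
    for b in BASE:
        cands.append(f"({expr}) + {b}")
        cands.append(f"({expr}) * {b}")
    return cands

def greedy_search(score, depth=3, start="WN"):
    """Greedily grow a kernel expression; stop when no expansion helps."""
    best = start
    for _ in range(depth):
        cand = max(expand(best), key=score)
        if score(cand) <= score(best):
            break
        best = cand
    return best
```

With five base kernels and two operators, each node has ten children, which illustrates why the branching factor forces a greedy rather than exhaustive search.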

  15. EXAMPLE: MAUNA LOA KEELING CURVE
Current model: RQ [plot of its fit and extrapolation, 2000-2010]
Search tree: Start → RQ, SE, LIN, PER, ... → PER + RQ, PER × RQ, SE + RQ, ... → SE + PER + RQ, SE × (PER + RQ), ...

  16. EXAMPLE: MAUNA LOA KEELING CURVE
Current model: PER + RQ [plot of its fit, 2000-2010; same search tree as the previous slide]

  17. EXAMPLE: MAUNA LOA KEELING CURVE
Current model: SE × (PER + RQ) [plot of its fit, 2000-2010; same search tree as the previous slides]

  18. EXAMPLE: MAUNA LOA KEELING CURVE
Current model: SE + SE × (PER + RQ) [plot of its fit, 2000-2010; same search tree as the previous slides]

  19. MODEL EVALUATION
[Pipeline diagram repeated, highlighting the evaluation stage]

  20. MODEL EVALUATION
◮ After proposing a new model, its kernel parameters are optimised by conjugate gradients
◮ We evaluate each optimised model, M, using the model evidence (marginal likelihood), which can be computed analytically for GPs
◮ We penalise the marginal likelihood for the optimised kernel parameters using the Bayesian Information Criterion (BIC):
  −0.5 × BIC(M) = log p(D | M) − (p/2) log n
where p is the number of kernel parameters, D represents the data, and n is the number of data points.
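Both quantities on this slide can be written directly in numpy. The marginal-likelihood routine below uses the standard Cholesky form for a zero-mean GP and is a sketch, not the project's code:

```python
import numpy as np

def bic_score(log_marginal_likelihood, n_params, n_data):
    """The slide's criterion: -0.5 * BIC(M) = log p(D|M) - (p/2) log n.
    Higher is better; used to compare optimised models during the search."""
    return log_marginal_likelihood - 0.5 * n_params * np.log(n_data)

def gp_log_marginal_likelihood(K, y):
    """log p(D|M) for a zero-mean GP with Gram matrix K:
    -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)."""
    n = len(y)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^-1 y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))            # 1/2 log|K|
            - 0.5 * n * np.log(2.0 * np.pi))
```

The penalty term grows with the number of kernel parameters p, which is how the search trades off fit against complexity when comparing candidate kernels.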

  21. AUTOMATIC TRANSLATION OF MODELS
[Pipeline diagram repeated, highlighting translation into a report]

  22. AUTOMATIC TRANSLATION OF MODELS
◮ The search can produce arbitrarily complicated models from the open-ended language, but two main properties allow the description to be automated:
◮ Kernels can be decomposed into a sum of products
  ◮ A sum of kernels corresponds to a sum of functions
  ◮ Therefore, we can describe each product of kernels separately
◮ Each kernel in a product modifies a model in a consistent way
  ◮ Each kernel roughly corresponds to an adjective
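The kernel-to-adjective idea can be illustrated with a small lookup table. The wording below is a paraphrase for illustration, not the project's actual lexicon:

```python
# Hypothetical mapping from base kernels to description fragments.
KERNEL_ADJECTIVE = {
    "SE": "smooth",
    "PER": "periodic",
    "LIN": "linearly varying",
    "C": "constant",
    "WN": "uncorrelated noise",
}

def describe_product(kernels):
    """Describe one product of kernels as a noun phrase by chaining
    the adjective associated with each factor."""
    adjectives = [KERNEL_ADJECTIVE[k] for k in kernels]
    return "a " + " ".join(adjectives) + " function"
```

For example, the product SE × PER would be rendered as "a smooth periodic function", and each summand in the normal form gets its own such sentence.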

  23. SUM OF PRODUCTS NORMAL FORM
Suppose the search finds the following kernel:
  SE × (WN × LIN + CP(C, PER))

  24. SUM OF PRODUCTS NORMAL FORM
Suppose the search finds the following kernel:
  SE × (WN × LIN + CP(C, PER))
The changepoint can be converted into a sum of products:
  SE × (WN × LIN + C × σ + PER × σ̄)

  25. SUM OF PRODUCTS NORMAL FORM
Suppose the search finds the following kernel:
  SE × (WN × LIN + CP(C, PER))
The changepoint can be converted into a sum of products:
  SE × (WN × LIN + C × σ + PER × σ̄)
Multiplication can be distributed over addition:
  SE × WN × LIN + SE × C × σ + SE × PER × σ̄
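The distribution step can be mechanised by representing each summand as a tuple of base-kernel names. A minimal sketch, with the changepoint sigmoids σ and σ̄ kept as the opaque symbols `sigma` and `sigmabar`:

```python
def to_sum_of_products(factor, summands):
    """Distribute a product over a sum:
    factor x (s1 + s2 + ...) -> factor x s1 + factor x s2 + ...
    Each summand is a tuple of base-kernel names; factors are sorted
    so that equivalent products compare equal."""
    return [tuple(sorted(factor + s)) for s in summands]

# SE x (WN x LIN + C x sigma + PER x sigmabar), i.e. the slide's kernel
# after the changepoint has been rewritten as C x sigma + PER x sigmabar:
products = to_sum_of_products(
    ("SE",),
    [("WN", "LIN"), ("C", "sigma"), ("PER", "sigmabar")],
)
# One product per summand: SE x WN x LIN, SE x C x sigma, SE x PER x sigmabar
```

Each resulting product is then described independently, which is what makes the report generation of the previous slide tractable.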
