Overview of Machine Learning: “introducing the field and some of its key concepts”
Thomas Schön
Division of Systems and Control, Department of Information Technology, Uppsala University.
Email: thomas.schon@it.uu.se, www: user.it.uu.se/~thosc112
Overview of Machine Learning, Autonomous systems, WASP PhD course. Thomas Schön, 2016.
What is machine learning all about?
“Machine learning is about learning, reasoning and acting based on data.”
“It is one of today’s most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science.”
Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature, 521:452-459, 2015.
Jordan, M. I. and Mitchell, T. M. Machine learning: Trends, perspectives and prospects. Science, 349(6245):255-260, 2015.
A probabilistic approach
Machine learning is about methods that allow computers/machines to automatically make use of data to solve tasks. Data on its own is typically useless; it only becomes useful once we can extract knowledge from it.
Representation of the data: a model with unknown (a.k.a. latent or missing) variables related to the knowledge we are looking for.
Key concept: Uncertainty. Key ingredient: Data.
Probability theory and statistics provide the theory and practice needed for representing and manipulating uncertainty about data, models and predictions.
Learn the unknown variables from the data.
The data – model relationship
The first step in extracting knowledge from data often amounts to finding the unknown parameters in the model using the data we have available. To do this, the learning system needs links between the latent variables and the observed data. These links are made via assumptions, and taken together these assumptions constitute the model.
A mathematical model is a compact representation (set of assumptions) of the data that captures, in precise mathematical form, the key properties of the underlying system.
Mathematical models in machine learning
To enable reasoning about uncertainty, we make extensive use of statistics and probability theory when building our models.
We often work with very flexible models and methods, such as Gaussian processes and neural networks (deep learning).
Simpler models, such as linear regression, remain of key importance. Typically these simpler models are used as components within more complex models.
“All models are wrong but some are useful.”
Uncertainty plays a fundamental role, since any reasonable model will be uncertain when making predictions of unobserved data.
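As a minimal illustration of learning the unknown parameters of one of these simple models from data, the sketch below fits a linear regression model y = θ0 + θ1·x + noise by least squares using NumPy. The synthetic data, the "true" parameter values (1.0 and 2.0) and the noise level (0.1) are assumptions made up for this example, not taken from the slides.

```python
import numpy as np

# Synthetic data from an assumed "true" model y = 1.0 + 2.0 * x + noise.
rng = np.random.default_rng(seed=0)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(n)

# Regressor matrix: a column of ones for the intercept theta0, and x for theta1.
X = np.column_stack([np.ones(n), x])

# Least-squares estimate of theta = [theta0, theta1].
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual standard deviation: a simple summary of the remaining uncertainty.
residuals = y - X @ theta_hat
sigma_hat = np.sqrt(residuals @ residuals / (n - X.shape[1]))

print("theta_hat:", theta_hat)
print("sigma_hat:", sigma_hat)
```

With this much data the estimates land close to the assumed values; the same model is also a common building block inside larger models.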
Mathematical models – representations
The performance of an algorithm typically depends on which representation is used for the data.
Learned representations often provide better solutions than hand-designed representations.
When solving a problem, start by thinking about which model/representation to use!
Representation learning
Problem: How can we learn good representations of data?
Ex. Deep learning (DL) solves the problem by introducing representations that are expressed in terms of other, simpler representations.
International Conference on Learning Representations: http://www.iclr.cc/
[Figure 1.5, from http://www.deeplearningbook.org/: rule-based systems, classic machine learning and representation learning (including deep learning), going from input via a hand-designed program, hand-designed features, learned features, or additional layers of more abstract features, to output.]
The two basic rules from probability theory
Let x and y be continuous random variables. Let p(·) denote a general probability density function.
1. Marginalization (integrate out a variable): $p(x) = \int p(x, y)\,\mathrm{d}y$.
2. Conditional probability: $p(x, y) = p(x \mid y)\,p(y)$.
Combine them into Bayes’ rule:
$$p(y \mid x) = \frac{p(x \mid y)\,p(y)}{p(x)} = \frac{p(x \mid y)\,p(y)}{\int p(x \mid y)\,p(y)\,\mathrm{d}y}.$$
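The two rules are easiest to check numerically for discrete variables, where the integrals become sums. The sketch below uses a made-up joint probability table p(x, y) for two binary variables (the numbers are assumptions for illustration) and verifies marginalization, the product rule and Bayes’ rule with NumPy.

```python
import numpy as np

# Hypothetical joint distribution p(x, y); rows index x, columns index y.
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])

# 1. Marginalization: sum out the other variable (discrete analogue of integrating).
p_x = p_xy.sum(axis=1)  # p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)  # p(y) = sum_x p(x, y)

# 2. Conditional probability: p(x | y) = p(x, y) / p(y); each column sums to one.
p_x_given_y = p_xy / p_y

# Bayes' rule: p(y | x) = p(x | y) p(y) / sum_y p(x | y) p(y).
numerator = p_x_given_y * p_y  # equals p(x, y) by the product rule
p_y_given_x = numerator / numerator.sum(axis=1, keepdims=True)

# Sanity check: the same conditional obtained directly from the joint table.
assert np.allclose(p_y_given_x, p_xy / p_x[:, None])
print(p_y_given_x)
```

The denominator `numerator.sum(axis=1)` is exactly the marginal p(x), which is why Bayes’ rule can be written with either p(x) or the sum (integral) over y.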
Key objects – Learning a model
D – measured data.
z – unknown model variables.
The full probabilistic model is given by
$$p(D, z) = \underbrace{p(D \mid z)}_{\text{data distribution}}\,\underbrace{p(z)}_{\text{prior}}.$$
Inference amounts to computing the posterior distribution
$$p(z \mid D) = \frac{\overbrace{p(D \mid z)}^{\text{data distribution}}\,\overbrace{p(z)}^{\text{prior}}}{\underbrace{p(D)}_{\text{model evidence}}}.$$
Soon we will make this much more concrete.
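To make p(D | z), p(z), p(z | D) and p(D) concrete, here is a sketch for an assumed toy model that is not part of the slides: z ∈ (0, 1) is the unknown probability that a coin lands heads, the prior p(z) is a Beta(a, b) density, and D consists of n independent flips. This conjugate choice has a closed-form Beta posterior, which the sketch compares with a brute-force evaluation of Bayes’ rule on a grid. The data and the hyperparameters a = b = 2 are made up for illustration.

```python
import math
import numpy as np

def log_beta_fn(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(z, a, b):
    """Beta(a, b) density evaluated at points z in (0, 1)."""
    return np.exp((a - 1) * np.log(z) + (b - 1) * np.log1p(-z) - log_beta_fn(a, b))

# Hypothetical data D: n coin flips (1 = heads), and an assumed Beta(a, b) prior.
D = np.array([1, 0, 1, 1, 1, 0, 1, 1, 0, 1])
k, n = int(D.sum()), D.size
a, b = 2.0, 2.0

# Grid over z for a brute-force evaluation of Bayes' rule.
z = np.linspace(1e-6, 1.0 - 1e-6, 20001)
dz = z[1] - z[0]
likelihood = z**k * (1.0 - z) ** (n - k)    # data distribution p(D | z)
prior = beta_pdf(z, a, b)                   # prior p(z)
evidence = np.sum(likelihood * prior) * dz  # model evidence p(D), Riemann sum
posterior = likelihood * prior / evidence   # posterior p(z | D)

# Closed-form results for this conjugate model: p(z | D) = Beta(a + k, b + n - k).
posterior_exact = beta_pdf(z, a + k, b + n - k)
evidence_exact = math.exp(log_beta_fn(a + k, b + n - k) - log_beta_fn(a, b))

print("evidence (grid, exact):", evidence, evidence_exact)
print("max |posterior error|:", np.max(np.abs(posterior - posterior_exact)))
```

In one dimension the grid works fine; the next slide is about why this brute-force approach breaks down when z is high-dimensional.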
The model – inference relationship
The problem of inferring (estimating) a model based on data leads to computational challenges, both
• Integration: e.g. the high-dimensional integrals arising during marginalization (averaging over all possible parameter values z):
$$p(D) = \int p(D \mid z)\,p(z)\,\mathrm{d}z.$$
• Optimization: e.g. when extracting point estimates, for example by maximizing the posterior or the likelihood:
$$\hat{z} = \arg\max_{z}\, p(D \mid z).$$
These are typically impossible to compute exactly, so we use approximate methods:
• Monte Carlo (MC), Markov chain MC (MCMC), and sequential MC (SMC).
• Variational inference (VI).
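Both computational challenges can be illustrated on an assumed coin-flip toy model (z is the probability of heads, the prior on z is uniform on (0, 1), and the made-up data D contain k = 14 heads in n = 20 flips). The evidence integral is approximated by plain Monte Carlo, averaging p(D | z) over samples drawn from the prior, and compared with its closed-form value. The likelihood is then maximized on a grid and compared with the analytical maximum likelihood estimate k/n. Plain MC from the prior is the simplest of the methods listed above; it is used here only because the example is one-dimensional.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical data: k heads in n flips; prior p(z) = Uniform(0, 1).
n, k = 20, 14

def likelihood(z):
    """Data distribution p(D | z) for one particular sequence with k heads in n flips."""
    return z**k * (1.0 - z) ** (n - k)

# Integration: p(D) = int p(D | z) p(z) dz ~= (1/M) sum_m p(D | z_m), z_m ~ p(z).
M = 200_000
z_samples = rng.uniform(0.0, 1.0, size=M)
evidence_mc = likelihood(z_samples).mean()

# Exact value for the uniform prior: int z^k (1 - z)^(n - k) dz = k! (n - k)! / (n + 1)!.
evidence_exact = math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)

# Optimization: maximum likelihood point estimate z_hat = argmax_z p(D | z), on a grid.
z_grid = np.linspace(0.0, 1.0, 10001)
z_hat = z_grid[np.argmax(likelihood(z_grid))]

print("evidence MC vs exact:", evidence_mc, evidence_exact)
print("z_hat:", z_hat, "(analytical k/n =", k / n, ")")
```

In higher dimensions most prior samples fall where p(D | z) is negligible and grid search becomes infeasible, which is what motivates the more sophisticated approximations listed above.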
Example of 3 of the 4 cornerstones
Three of the cornerstones: 1. Data, 2. Model, and 3. Inference.
Aim: Compute the position and orientation of the different body segments of a person moving around indoors (motion capture).
Sensors (data) used:
• 3D accelerometer
• 3D gyroscope
• 3D magnetometer
A situation where we need to find latent variables based on observed data. We need a model to extract knowledge from the observed data.
Example data-model-inference
We illustrate the use of three different models:
1. Integration of the observations from the sensors.
2. Add a biomechanical model.
3. Add a world model.
Add ultrawideband (UWB) measurements for absolute position.
Manon Kok, Jeroen D. Hol and Thomas B. Schön. An optimization-based approach to human body motion capture using inertial sensors. In Proceedings of the 19th World Congress of the International Federation of Automatic Control (IFAC), Cape Town, South Africa, August 2014.
Manon Kok, Jeroen D. Hol and Thomas B. Schön. Indoor positioning using ultrawideband and inertial measurements. IEEE Transactions on Vehicular Technology, 64(4):1293-1303, April 2015.
Example – ambient magnetic field map
The Earth’s magnetic field sets a background for the ambient magnetic field. Deviations make the field vary from point to point.
Aim: Build a map (i.e., a model) of the magnetic environment based on measurements from magnetometers.
Solution: A customized Gaussian process that obeys Maxwell’s equations.
www.youtube.com/watch?v=enlMiUqPVJo
Arno Solin, Manon Kok, Niklas Wahlström, Thomas B. Schön and Simo Särkkä. Modeling and interpolation of the ambient magnetic field by Gaussian processes. arXiv:1509.04634, 2015.
Example – WaveNet
A generative model capable of reading a written text aloud using an artificial voice; in the authors’ listening tests it was rated as more natural sounding than the best existing text-to-speech systems.
Given enough samples of a person’s voice, it can be used to synthesize speech from new written text in that particular voice.
Application example: Audiobooks?
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. and Kavukcuoglu, K. WaveNet: a generative model for raw audio. arXiv:1609.03499v2, September 2016.
The nature of Machine Learning
It is sometimes (often...) easier to solve a problem by starting from examples of input-output data than by trying to program the solution manually.
In ML we start from the data. We need models that are flexible enough to capture the properties of the data that are necessary to achieve a certain task.
There are basically two ways of building flexible models:
1. Models that use a large (but fixed) number of parameters compared with the size of the data set (parametric, e.g. deep learning).
2. Models that use more parameters as we get access to more data (non-parametric, e.g. Gaussian processes).
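The difference between the two model classes can be sketched in a few lines. Below, a parametric model (a straight line, always two parameters) and a non-parametric Gaussian process (GP) regression model are fitted to the same synthetic data. The GP posterior mean is a weighted sum of one kernel function per training point, so its number of weights grows with the data set. The squared-exponential kernel, its fixed hyperparameters, the noise level and the sine-shaped "true" function are all assumptions chosen for illustration; a real application would also learn the hyperparameters.

```python
import numpy as np

def sq_exp_kernel(x1, x2, length_scale=0.3, signal_var=1.0):
    """Squared-exponential covariance k(x, x') between two 1D input arrays (assumed kernel)."""
    d = x1[:, None] - x2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def fit_linear(x, y):
    """Parametric: least-squares line, always 2 parameters regardless of len(x)."""
    X = np.column_stack([np.ones_like(x), x])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def fit_gp(x, y, noise_var=0.01):
    """Non-parametric: GP mean weights alpha = (K + noise_var * I)^-1 y, one per data point."""
    K = sq_exp_kernel(x, x) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)  # K = L L^T
    return np.linalg.solve(L.T, np.linalg.solve(L, y))

def gp_mean(x_train, alpha, x_test):
    """GP posterior mean m(x*) = k(x*, X) alpha."""
    return sq_exp_kernel(x_test, x_train) @ alpha

def f_true(x):
    """Assumed underlying function that generates the synthetic data."""
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(seed=2)
x_test = np.linspace(0.0, 1.0, 50)

for n in (10, 100):
    x = rng.uniform(0.0, 1.0, size=n)
    y = f_true(x) + 0.1 * rng.standard_normal(n)
    theta = fit_linear(x, y)
    alpha = fit_gp(x, y)
    lin_rmse = np.sqrt(np.mean((theta[0] + theta[1] * x_test - f_true(x_test)) ** 2))
    gp_rmse = np.sqrt(np.mean((gp_mean(x, alpha, x_test) - f_true(x_test)) ** 2))
    print(f"n={n}: linear params={theta.size}, GP weights={alpha.size}, "
          f"RMSE linear={lin_rmse:.3f}, RMSE GP={gp_rmse:.3f}")
```

The line keeps two parameters no matter how much data arrives, while the GP gets one weight per observation and can therefore follow the non-linear shape more closely as n grows.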