

  1. Machine Learning and Visualisation. Ian T. Nabney, Aston University, Birmingham, UK. March 2015.

  2. Outline: The challenge of hidden knowledge. Data visualisation: latent variable models. Data visualisation: topographic mappings. Non-linear modelling and feature selection.

  3. Acknowledgements. Collaborators: Chris Bishop, Mike Tipping, David Lowe, Markus Svensén, Chris Williams; Peter Tiňo, Yi Sun, Dharmesh Maniyar, John Owen; Phil Laflin, Bruce Williams, Paola Gaolini, Jens Lösel; Martin Schroeder, Ain Abdul Karim, Dan Cornford, Cliff Bailey, Naomi Hubber, Shahzad Mumtaz, Michel Randrianandrasana; Richard Barnes, Colin Smith, Dan Wells.

  4. Hidden Knowledge. Understanding the vast quantities of data that surround us is a real challenge, particularly when there are many variables, but we can understand more of the data with help. Machine learning is the computer-based generation of models from data. A model is a parameterised function from input attributes to an output prediction. The parameters of the model express the hidden connection between inputs and predictions, and they are learned from data.

  5. Data Visualisation: What is Visualisation? The goal of visualisation is to present data in a human-readable way. Visualisation is an important tool for developing a better understanding of large, complex datasets. It is particularly helpful for users, such as research scientists or clinicians, who are not specialists in data modelling. Uses include the detection of outliers, clustering and segmentation, aiding feature selection, and feedback on the results of analysis. There are two aspects: data projection and information visualisation.

  6. Data Visualisation: Data Projection. The goal is to project data to a lower-dimensional space (usually 2-D) while preserving as much information or structure as possible. Once the projection is done, standard information visualisation approaches can be used to support user interaction. The quantity and complexity of many datasets mean that simple visualisation methods, such as Principal Component Analysis, are not very effective.

  7. Data Visualisation: Information Visualisation. Shneiderman's mantra: overview first; zoom and filter; details on demand. The overview is provided by the projection. Zooming is possible in MATLAB plots. Filtering is done by user interaction, e.g. by specifying a pattern of values that is of interest. Details are provided as local information. We will see more of this later in practical examples.

  8. Data Visualisation: Information Visualisation Examples. Word cloud (www.wordle.net).

  9. Data Visualisation: Uncertainty. "Doubt is not a pleasant condition, but certainty is absurd." (Voltaire) Real data is noisy. We are forced to deal with uncertainty, yet we need to be quantitative. The optimal formalism for inference in the presence of uncertainty is probability theory. We assume the presence of an underlying regularity in order to make predictions. Bayesian inference allows us to reason probabilistically about the model as well as the data.

  10. Data Visualisation: Data Projection. [Diagram: a mapping f(y; W) from data space D to visualisation space V.] Define f to optimise some criterion: PCA is minimal variance; the Sammon mapping is minimal stress.

  11. Data Visualisation: What can we learn from this? [Scatter plot: a 2-D projection of the data, with classes labelled Sinus, VEL and VER.]

  12. Data Visualisation: Projection. What is the simplest way to project data? A linear map. What is the best way to linearly project data? We want to preserve as much information as possible. If we assume that information is measured by variance, this implies choosing new coordinate axes along directions of maximal variance; these can be found by analysing the covariance matrix of the data. This gives Principal Component Analysis (PCA). For large datasets, the end result is usually a circular blob in the middle of the screen.

  13. Data Visualisation: PCA. Let S be the covariance matrix of the data, so that S_{ij} = \frac{1}{N} \sum_n (x_i^n - \bar{x}_i)(x_j^n - \bar{x}_j). The first q principal components are the first q eigenvectors w_j of S, ordered by the size of the eigenvalues \lambda_j. The percentage of the variance explained by the first q PCs is \sum_{j=1}^{q} \lambda_j / \sum_{j=1}^{d} \lambda_j, where d is the data dimension. These eigenvectors are orthonormal (perpendicular and of unit length), and the variance of the data projected onto them is maximal. Plot the sorted principal values: plot(-sort(-eig(cov(data))));
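The computation above can be sketched in a few lines. The slides use MATLAB, but here is a minimal NumPy equivalent; the toy dataset and variable names are illustrative, not from the talk:

```python
import numpy as np

# Toy data: 200 points in 3-D, with most variance along the first axis
# (the data and scale factors are made up for illustration).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

# Covariance matrix S_ij = (1/N) sum_n (x_i^n - mean_i)(x_j^n - mean_j)
S = np.cov(data, rowvar=False)

# eigh returns eigenvalues in ascending order, so reverse to get
# eigenvectors ordered by decreasing eigenvalue (variance).
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of variance explained by the first q principal components
q = 2
explained = eigvals[:q].sum() / eigvals.sum()

# Project the centred data onto the first q principal components
projected = (data - data.mean(axis=0)) @ eigvecs[:, :q]
print(explained, projected.shape)
```

Plotting the sorted eigenvalues, as the MATLAB one-liner in the slide does, is the usual way to choose q.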

  14. Data Visualisation: Topographic Mappings. The basic aim is that distances in the visualisation space are as close as possible to those in the original data space. Given a dissimilarity matrix d_{ij}, we want to map data points x_i to points y_i in a feature space such that their dissimilarities in feature space, \tilde{d}_{ij}, are as close as possible to the d_{ij}. We say that the map preserves similarities. The stress measure is used as the objective function: E = \frac{1}{\sum_{i<j} d_{ij}} \sum_{i<j} \frac{(d_{ij} - \tilde{d}_{ij})^2}{d_{ij}}.
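The stress measure can be evaluated directly from the two distance matrices. A minimal NumPy sketch; the random data and the crude "projection" (simply dropping coordinates) are illustrative assumptions, not a real topographic mapping:

```python
import numpy as np

def pairwise_dists(X):
    # Euclidean distance matrix: d_ij = ||x_i - x_j||
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def stress(d, d_tilde):
    # E = (1 / sum_{i<j} d_ij) * sum_{i<j} (d_ij - d~_ij)^2 / d_ij,
    # summed over the strict upper triangle (each pair once).
    i, j = np.triu_indices(len(d), k=1)
    return ((d[i, j] - d_tilde[i, j]) ** 2 / d[i, j]).sum() / d[i, j].sum()

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))   # original 5-D data (made up)
Y = X[:, :2]                   # a crude 2-D "projection": drop coordinates
E = stress(pairwise_dists(X), pairwise_dists(Y))
print(E)
```

E is zero exactly when all pairwise distances are preserved; the Sammon mapping and Neuroscale adjust the y_i (or a network producing them) to minimise it.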

  15. Data Visualisation: Topographic Mappings: Multi-Dimensional Scaling. Given distances or dissimilarities d_{rs} between every pair of observations, we try to preserve these as far as possible in a lower-dimensional space. In classical scaling, the distance between the objects is assumed to be Euclidean; a linear projection then corresponds to PCA. The Sammon mapping is a non-linear multidimensional scaling technique that is more general (and more widely used) than classical scaling. Neuroscale is a neural-network-based scaling technique that has the advantage of actually giving a map that generalises!

  16. Data Visualisation: Topographic Mappings: Neuroscale. [Figure slide.]

  17. Data Visualisation: Topographic Mappings: Biological Application: Streptomyces Gene Expression. Data supplied by Colin Smith (Surrey University). Streptomyces coelicolor is a bacterium which undergoes developmental changes correlated with sporulation and the production of antibiotics. Its 7825 genes include more than 20 clusters coding for secondary metabolites, including a large proportion of regulatory genes. The dataset consists of ten time points from 16 to 67 hours after inoculation of the growth medium. The analysis is based on the 3067 genes that were significantly expressed. SCO6283, SCO6284, SCO6277 and SCO6278 are co-regulated genes involved in the synthesis of a type I polyketide; SCO3245 is involved in the synthesis of a lipid.

  18. Data Visualisation: Topographic Mappings: Streptomycin. The life of streptomycin. Bioinformatics: measuring the expression levels of thousands of genes over multiple time points.

  19. Data Visualisation: Topographic Mappings. SCO6283, SCO6284, SCO6277 and SCO6278 lie in cluster 11; SCO3245 lies in cluster 12.

  20. Data Visualisation: Topographic Mappings. Genes involved in the synthesis of two distinct secondary metabolites may be co-regulated by a common network.

  21. Data Visualisation: Latent Variable Models. The projection approach is one way of reducing data complexity. An alternative view is to hypothesise how the data might have been generated. "A hidden connection is stronger than an obvious one." (Heraclitus)

  22. Data Visualisation: Latent Variable Models. How is the idea of hidden connections applied to statistical pattern recognition? Separate the observed variables and the latent variables: latent variables generate observations, and we use (probabilistic) inference to deduce what is happening in latent variable space, often via Bayes' theorem: P(L | O) = P(O | L) P(L) / P(O). Static case: GTM, with two latent variables and a non-linear transformation to observation space. Dynamic cases: hidden Markov models (discrete state space; speech recognition) and state space models (continuous state space; tracking).

  23. Data Visualisation: Latent Variable Models: Visualisation with Density Models. Construct a generative model for the data, mapping from a low-dimensional latent space H to the data space D. The model maps latent variables r to observed variables x, giving a probability density p(x | r). To visualise the data we want to map from observed variables to latent variables, so we use Bayes' theorem to compute p(r | x) = p(x | r) p(r) / p(x). We then plot a summary statistic of p(r_i | x_i) for each data point x_i: usually the mean. If the mapping is linear and there is a single Gaussian noise model, we recover PCA.
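A minimal sketch of the posterior-mean computation for a GTM-like model, assuming a fixed 1-D latent grid, a hand-chosen non-linear mapping, and an isotropic Gaussian noise model (all illustrative assumptions, not the model in the slides):

```python
import numpy as np

# Latent space: a regular 1-D grid of points r_k with a uniform prior p(r_k)
r = np.linspace(-1, 1, 21)

def f(r):
    # A fixed non-linear mapping from latent space to 2-D data space
    # (chosen arbitrarily for illustration).
    return np.stack([r, np.sin(np.pi * r)], axis=-1)

centres = f(r)          # f(r_k) in data space
sigma2 = 0.05           # isotropic Gaussian noise variance

x = np.array([0.5, 1.0])  # one observed data point

# log p(x | r_k) up to a constant; the uniform prior cancels in
# Bayes' theorem, so normalising the likelihoods gives p(r_k | x).
log_lik = -((x - centres) ** 2).sum(axis=1) / (2 * sigma2)
resp = np.exp(log_lik - log_lik.max())
resp /= resp.sum()      # posterior p(r_k | x)

posterior_mean = (resp * r).sum()  # the summary statistic plotted for x
print(posterior_mean)
```

Each data point is plotted at its posterior mean in latent space, producing the visualisation map.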
