Statistics and learning: Multivariate statistics 1
Emmanuel Rachelson and Matthieu Vignes, ISAE SupAero
Wednesday 25th September 2013
E. Rachelson & M. Vignes (ISAE) SAD 2013 1 / 15

Motivating examples (1)
Cider: different measures are gathered in [table not recovered].
I claim that [figure not recovered] represents 75% of the variance in the data!

Motivating examples (2)
A nice representation of [data set not recovered]?
The information can be summarised, in a sense to be made precise, in [figure not recovered].

Take-home message
‘Simple’, descriptive data analysis. And interpretation!
◮ Input: an array of data (can have more than 2 dimensions).
◮ Identify the statistical units of the population/sample and the variables under study.
◮ Describe the variables → type, univariate description, before you move on to...
◮ ...bivariate (e.g. simple regression) and multivariate data analysis.
◮ The goals are to describe the data and to summarise its informational content: highlight patterns in the data, and represent most of its variation in a low number of dimensions.
◮ Important point: do not forget to interpret the analysis you produce!
◮ Output: a nice (set of) representation(s) of the data, with key points explaining what is in it!

First: univariate statistics
◮ Any data set to be ‘analysed’ needs to be explored first!
◮ The tools might look simplistic, but their interpretations are robust.
◮ A way to get familiar with the data set at hand: missing observations, erroneous or atypical points (outliers), (exp.) bias, rare modalities, variable distributions...
◮ Allows the analyst to pre-process the data: transformation(s), class recoding...

Quantitative variables
◮ From collected data to a statistical table (frequency table).
◮ A prelude to graphical representation: ‘stem-and-leaf’ presentation.
◮ Bar and cumulative diagrams; histograms and (kernel) density estimates.
◮ Quantiles and box(-and-whisker) plots.
◮ Numerical features (centrality, dispersion...).
◮ Minor differences between continuous and discrete quantitative variables.
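As an illustration of the numerical features listed above, here is a minimal sketch using only the Python standard library. The sample `values`, the helper name `summarise` and the 1.5 × IQR outlier rule (Tukey's rule of thumb) are assumptions made for the example; none of them come from the slides.

```python
import statistics
from collections import Counter

# Hypothetical sample of a quantitative variable (not taken from the slides).
values = [12, 15, 11, 19, 15, 14, 48, 13, 15, 16]


def summarise(xs):
    """Return basic centrality, dispersion and quartile summaries of xs."""
    # Three cut points splitting the data into four groups (quartiles).
    q1, q2, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    # Assumed rule of thumb: flag points further than 1.5 * IQR from the box.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(xs),
        "mean": statistics.fmean(xs),
        "median": statistics.median(xs),
        "sd": statistics.pstdev(xs),  # population sd (divides by n)
        "quartiles": (q1, q2, q3),
        "outliers": [x for x in xs if x < low or x > high],
    }


summary = summarise(values)
frequency_table = Counter(values)  # statistical table: value -> count

for key, value in summary.items():
    print(f"{key}: {value}")
```

The single large value pulls the mean well above the median here, which is the kind of feature a box plot would make visible before any further analysis.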

Univariate statistics (cont'd)
Qualitative variables
◮ Nominal vs. ordinal variables.
◮ No numerical summary from the data itself → tables (frequencies or percentages) and graphics (bar or pie charts).
Genomic data [illustration not recovered]

Descriptive bivariate statistics: before it gets difficult to represent
We now consider the simultaneous study of two variables X and Y. The main objective is to highlight a relationship between these variables. Sometimes this relationship can be interpreted as causal.

Two quantitative variables
◮ Scatter plot (may need to scale the variables).
◮ Give a relationship index, e.g. covariance and correlation: $\mathrm{cov}(X, Y) = \frac{1}{n} \sum_i (x_i - \bar{x})(y_i - \bar{y})$ and $\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$. And interpret.
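The two indices can be sketched directly from their definitions. The function names are illustrative; the data are the Math. and Phys. columns of the toy example given later in these slides, and the 1/n convention of the formula is used throughout.

```python
import math


def covariance(x, y):
    """cov(X, Y) = (1/n) * sum_i (x_i - mean_x) * (y_i - mean_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def correlation(x, y):
    """corr(X, Y) = cov(X, Y) / (sigma_X * sigma_Y)."""
    sigma_x = math.sqrt(covariance(x, x))  # variance is cov(X, X)
    sigma_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sigma_x * sigma_y)


# Math. and Phys. marks from the toy example of the PCA slide.
maths = [32, 41, 30, 74, 71, 54, 26, 65, 46]
physics = [31, 38, 36, 73, 71, 51, 34, 62, 48]

r = correlation(maths, physics)
print(f"cov = {covariance(maths, physics):.2f}, corr = {r:.3f}")
```

Unlike the covariance, the correlation does not change when a variable is rescaled by a positive factor or shifted, which is why it is easier to interpret as a relationship index.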

Descriptive bivariate statistics (cont'd)
A quantitative variable X and a qualitative variable Y
◮ Parallel boxplots.
◮ Partial mean and standard deviation on the subpopulation for each level of Y. → decomposition $\sigma_X^2 = \sigma_E^2 + \sigma_R^2$, where $\sigma_E^2$ is the variance explained by the partition induced by Y (between-group variance) and $\sigma_R^2$ is the residual (within-group) variance. The ratio $\sigma_E^2 / \sigma_X^2$ is a link index between X and Y.

Two qualitative variables
◮ Contingency table.
◮ Mosaic plots with areas ∝ frequencies.
◮ Relationship index: $\chi^2 = \sum_k \sum_l \frac{(n_{kl} - s_{kl})^2}{s_{kl}}$.
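Both link indices can be computed in a few lines. This is a sketch under stated assumptions: the slide does not define $s_{kl}$, so it is taken here to be the count expected under independence (row total × column total / n), which is the usual convention; the data and function names are hypothetical.

```python
from collections import defaultdict


def variance_decomposition(x, labels):
    """Return (explained, residual, total) variances of x given group labels."""
    n = len(x)
    mean = sum(x) / n
    total = sum((v - mean) ** 2 for v in x) / n
    groups = defaultdict(list)
    for value, label in zip(x, labels):
        groups[label].append(value)
    explained = 0.0  # between-group part: spread of the group means
    residual = 0.0   # within-group part: spread around each group mean
    for members in groups.values():
        group_mean = sum(members) / len(members)
        explained += len(members) * (group_mean - mean) ** 2 / n
        residual += sum((v - group_mean) ** 2 for v in members) / n
    return explained, residual, total


def chi_square(table):
    """Chi-square index of a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for k, row in enumerate(table):
        for l, n_kl in enumerate(row):
            # Assumption: s_kl is the expected count under independence.
            s_kl = row_totals[k] * col_totals[l] / n
            stat += (n_kl - s_kl) ** 2 / s_kl
    return stat


# Hypothetical data: two well-separated groups.
x = [1, 2, 3, 7, 8, 9]
y = ["a", "a", "a", "b", "b", "b"]
explained, residual, total = variance_decomposition(x, y)
print(f"link index = {explained / total:.3f}")
print(f"chi-square = {chi_square([[10, 20], [20, 10]]):.3f}")
```

A link index close to 1 means the groups defined by Y account for nearly all the variation of X; a chi-square of 0 means the observed table matches the independence table exactly.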

Towards multidimensional statistics
Adapting/generalising what has been seen previously:
◮ Matrix of correlations (symmetric, positive semi-definite).
◮ Point clouds (3D) / scatter plot matrix.
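The properties of the correlation matrix can be checked numerically. The sketch below assumes NumPy is available and uses hypothetical data in which the third variable is an exact linear function of the first, so that one eigenvalue is zero: the matrix is positive semi-definite but not positive definite.

```python
import numpy as np

# Hypothetical data: 6 statistical units (rows), 3 variables (columns).
X = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 1.0, 0.0],
    [3.0, 4.0, 0.0],
    [4.0, 3.0, 0.0],
    [5.0, 6.0, 0.0],
    [6.0, 5.0, 0.0],
])
X[:, 2] = 2.0 * X[:, 0] + 1.0  # third variable is redundant with the first

# rowvar=False: each column is a variable, each row an observation.
R = np.corrcoef(X, rowvar=False)

# eigvalsh is meant for symmetric matrices and returns ascending eigenvalues.
eigenvalues = np.linalg.eigvalsh(R)

print(np.round(R, 3))
print(np.round(eigenvalues, 6))
```

With real data and no exactly redundant variable, all eigenvalues are strictly positive and the matrix is positive definite.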

Principal Component Analysis (PCA): an introduction
◮ The bivariate study raised the obvious question of how to represent data sets with p > 2 variables.
◮ Mathematically speaking, it is only a change of basis (from the canonical basis to a factor-driven one). It is optimal in some sense: the first new axes capture as much of the variance as possible.

Toy example
          Math.  Phys.  Engl.  Fren.
Mike        32     31     25     26
Helen       41     38     39     42
Alan        30     36     55     49
Dona        74     73     79     74
Peter       71     71     59     62
Brigit      54     51     28     35
John        26     34     70     58
William     65     62     43     47
Pam         46     48     62     61
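The change of basis can be sketched on the toy example with NumPy. Two choices below are assumptions, since the slides do not state them: the analysis is run on the covariance matrix of the centred marks (not on standardised variables), and the 1/n convention is used for that matrix.

```python
import numpy as np

names = ["Mike", "Helen", "Alan", "Dona", "Peter",
         "Brigit", "John", "William", "Pam"]
subjects = ["Math.", "Phys.", "Engl.", "Fren."]
marks = np.array([
    [32, 31, 25, 26],
    [41, 38, 39, 42],
    [30, 36, 55, 49],
    [74, 73, 79, 74],
    [71, 71, 59, 62],
    [54, 51, 28, 35],
    [26, 34, 70, 58],
    [65, 62, 43, 47],
    [46, 48, 62, 61],
], dtype=float)

# Centre each variable, then form the covariance matrix (assumed 1/n).
centred = marks - marks.mean(axis=0)
cov = centred.T @ centred / len(marks)

# eigh handles symmetric matrices; eigenvalues come out in ascending order,
# so reorder them to put the axis with the largest variance first.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Coordinates of the students in the new (factor-driven) basis.
scores = centred @ eigenvectors
explained_ratio = eigenvalues / eigenvalues.sum()

print("share of variance per axis:", np.round(explained_ratio, 3))
for name, (pc1, pc2) in zip(names, scores[:, :2]):
    print(f"{name:8s} {pc1:8.2f} {pc2:8.2f}")
```

The sign of each axis is arbitrary, so the interpretation should rest on which students and subjects are opposed along an axis, not on which side is positive.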
