Multiblock Method for Categorical Variables



  1. Multiblock Method for Categorical Variables: Application to air quality in pig farms
     S. Bougeard (1), E.M. Qannari (2) & C. Fablet (1)
     (1) French Agency for Food, Environmental and Occupational Health & Safety (Anses), Department of Epidemiology, Ploufragan, France
     (2) Nantes-Atlantic National College of Veterinary Medicine, Food Science and Engineering (Oniris), Department of Chemometrics and Sensometrics, Nantes, France
     Sections: 1. Position of the problem; 2. Methods; 3. Case study; 4. Conclusions & perspectives

  2. Table of contents
     1. Position of the problem
     2. Methods
        - Categorical multiblock Redundancy Analysis (Cat-mbRA)
        - Alternative methods
     3. Case study
        - Study of air quality in pig farms
        - Relationships between variables
        - Risk factors for inappropriate air quality
        - Method comparison
     4. Conclusions & perspectives

  3. Statistical issues for epidemiological surveys
     1. Advantages & limits of usual procedures
        - Generalized linear models: well adapted to categorical variables; limited number of explanatory variables; constraints when y consists of more than two categories.
        - Decision trees, random forest, boosting, bagging, SVM: small misclassification errors; assessment of the risk factors; no regression coefficients.
     2. Expectations
        - Global optimization criterion with eigensolution.
        - Factorial representation of data.
     → Multiblock modelling extended to categorical data.



  7. Categorical multiblock Redundancy Analysis (Cat-mbRA)
     $P_{X_k}$ is the projector onto the subspace spanned by the dummy variables associated with $x_k$.
     Criterion to maximize:
     $\sum_k \mathrm{cov}^2\big(u^{(1)}, t_k^{(1)}\big)$, with $\|t_k^{(1)}\| = \|v^{(1)}\| = 1$,
     equivalently
     $\sum_k \|P_{X_k} u^{(1)}\|^2 = v^{(1)\prime}\, \tilde{Y}' \Big(\sum_k P_{X_k}\Big) \tilde{Y}\, v^{(1)}$, with $\|v^{(1)}\| = 1$.
     First-order solution:
     $v^{(1)}$ is the eigenvector of $\sum_k \tilde{Y}' P_{X_k} \tilde{Y}$ associated with the largest eigenvalue $\lambda^{(1)} = \sum_k \|P_{X_k} u^{(1)}\|^2$.
     The latent variables represent the categorical variable coding:
     $t_k^{(1)} = X_k w_k^{(1)}$, $u^{(1)} = \tilde{Y} v^{(1)}$.
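The first-order solution above can be sketched numerically. This is a minimal illustration, not the authors' implementation: it assumes that $\tilde{Y}$ is the column-centred indicator matrix of y (the slides do not define the transformation), and the helper names `dummies`, `projector` and `cat_mbra_first_order`, as well as the toy data, are hypothetical.

```python
import numpy as np

def dummies(codes):
    """Indicator (dummy) matrix of an integer-coded categorical variable."""
    codes = np.asarray(codes)
    levels = np.unique(codes)
    return (codes[:, None] == levels[None, :]).astype(float)

def projector(X):
    """Orthogonal projector onto the column space of X."""
    return X @ np.linalg.pinv(X)

def cat_mbra_first_order(y, xs):
    """Return v, u, lambda and the projectors P_{X_k} for the first dimension."""
    Y = dummies(y)
    # Assumption: Y-tilde is the column-centred indicator matrix of y.
    Y_tilde = Y - Y.mean(axis=0)
    P = [projector(dummies(x)) for x in xs]
    # Matrix whose leading eigenvector is v: sum_k Y~' P_{X_k} Y~
    M = sum(Y_tilde.T @ Pk @ Y_tilde for Pk in P)
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    v = eigvecs[:, -1]                    # unit-norm eigenvector, largest eigenvalue
    u = Y_tilde @ v
    return v, u, eigvals[-1], P

# Toy data (made up): one dependent variable y and two explanatory variables.
y = [0, 0, 1, 1, 2, 2, 0, 1]
x1 = [0, 0, 1, 1, 1, 2, 2, 0]
x2 = [0, 1, 0, 1, 0, 1, 0, 1]
v, u, lam, P = cat_mbra_first_order(y, [x1, x2])
```

With these definitions, `lam` coincides with $\sum_k \|P_{X_k} u^{(1)}\|^2$ up to rounding, which is the identity stated on the slide.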


  12. Categorical multiblock Redundancy Analysis (Cat-mbRA)
     $P_{X_k}$ is the projector onto the subspace spanned by the dummy variables associated with $x_k$.
     Partial components $(t_1, \ldots, t_K)$: projection of $u^{(1)}$ onto each subspace spanned by $X_k$:
     $t_k^{(1)} = \dfrac{P_{X_k} u^{(1)}}{\|P_{X_k} u^{(1)}\|}$
     Synthesis with a global component $t$: $t^{(1)}$ sums up all the partial codings:
     $t^{(1)} = \sum_k a_k^{(1)} t_k^{(1)}$, with $\sum_k a_k^{(1)2} = 1$,
     $t^{(1)} = \sum_k \dfrac{\|P_{X_k} u^{(1)}\|}{\sqrt{\sum_l \|P_{X_l} u^{(1)}\|^2}}\, t_k^{(1)} = \dfrac{\sum_k P_{X_k} u^{(1)}}{\sqrt{\sum_l \|P_{X_l} u^{(1)}\|^2}}$
     Higher-order solutions are obtained by considering the residuals of the orthogonal projections of $(X_1, \ldots, X_K)$ onto the subspaces spanned by $t^{(1)}$, $(t^{(1)}, t^{(2)})$, ...
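The partial and global components can be checked on a small example. The sketch below is illustrative only: the function names `projector_from_codes` and `partial_and_global`, the toy variables and the centred vector `u` are assumptions, and the code presumes that every $\|P_{X_k} u^{(1)}\|$ is non-zero.

```python
import numpy as np

def projector_from_codes(codes):
    """Projector onto the span of the dummy variables of an integer-coded variable."""
    codes = np.asarray(codes)
    X = (codes[:, None] == np.unique(codes)[None, :]).astype(float)
    return X @ np.linalg.pinv(X)

def partial_and_global(u, projectors):
    """Partial components t_k, weights a_k and global component t for a given u."""
    proj = [P @ u for P in projectors]                   # P_{X_k} u
    norms = np.array([np.linalg.norm(p) for p in proj])  # ||P_{X_k} u||, assumed non-zero
    partial = [p / nk for p, nk in zip(proj, norms)]     # unit-norm t_k
    a = norms / np.sqrt(np.sum(norms ** 2))              # weights with sum a_k^2 = 1
    t_global = sum(ak * tk for ak, tk in zip(a, partial))
    return partial, a, t_global

# Toy data (made up): two explanatory variables and a centred component u.
projectors = [projector_from_codes([0, 0, 1, 1, 2, 2]),
              projector_from_codes([0, 1, 0, 1, 0, 1])]
u = np.array([1.0, 2.0, -0.5, 0.5, -2.0, -1.0])
partial, a, t_global = partial_and_global(u, projectors)
```

The weighted sum of the partial components equals $\sum_k P_{X_k} u^{(1)}$ divided by $\sqrt{\sum_l \|P_{X_l} u^{(1)}\|^2}$, so the two expressions for $t^{(1)}$ on the slide agree.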

