Multiple Nested Reductions of Single Data Modes as a Tool to Deal with Large Data Sets
Iven Van Mechelen and Katrijn Van Deun
K.U.Leuven, Psychology Department and Center for Computational Systems Biology
Invited IFCS session at COMPSTAT 2010


  1. Multiple Nested Reductions of Single Data Modes as a Tool to Deal with Large Data Sets
     Iven Van Mechelen and Katrijn Van Deun
     K.U.Leuven, Psychology Department and Center for Computational Systems Biology
     Invited IFCS session at COMPSTAT 2010

  2. Overview:
     • introduction
     • principles
     • example 1: existing model
     • example 2: novel model
     • discussion

  6. Introduction
     • in many research areas:
       - accessibility of novel measurement technologies
       - data tsunami: high-dimensional data sets
       - example: various types of ‘omics’ data
     • concerted use of technologies in many settings
       - data sets with large number of experimental units

  10. Introduction (ctd)
      • problems:
        - redundancies, dependencies, ill-conditioned optimization problems
        - computational bottlenecks
        - displaying output prohibitive

  14. Introduction (ctd)
      • possible solution: classical reduction methods (categorical: clustering; continuous: dimension reduction)
      • however: often breakdown of such methods …
      • possible rescue missions: variable selection, sparseness penalty or constraints, …
      • alternative solution: multiple nested reductions of single data modes (within framework of global model for data, fitted with a simultaneous optimization procedure)

  16. Principles
      • data: I × J object by variable (e.g., tissue by gene) data matrix D, with entry $d_{ij}$ for object i (object mode, rows) and variable j (variable mode, columns)

  18. Principles (ctd)
      • (deterministic core of) generic decomposition model (Van Mechelen & Schepers, 2007):
        - reduction of object (tissue) mode by means of (binary or real-valued) I × P quantification matrix A
          examples:
            Tissue 1    1    0    0
            Tissue 2    1    0    0
            Tissue 3    0    0    1
            Tissue 4    0    0    1
            Tissue 5    0    1    0
            ...

  19. Principles (ctd)
      • (deterministic core of) generic decomposition model (Van Mechelen & Schepers, 2007):
        - reduction of object (tissue) mode by means of (binary or real-valued) I × P quantification matrix A
          examples:
            Tissue 1    1    1    0
            Tissue 2    1    1    0
            Tissue 3    1    0    1
            Tissue 4    1    0    1
            Tissue 5    1    0    1
            ...

  20. Principles (ctd)
      • (deterministic core of) generic decomposition model (Van Mechelen & Schepers, 2007):
        - reduction of object (tissue) mode by means of (binary or real-valued) I × P quantification matrix A
          examples:
            Tissue 1    3.2    5.2    5.1
            Tissue 2    4.1   -6.7    3.4
            Tissue 3    5.8    3.9    1.9
            Tissue 4    1.0   -2.1    0.5
            Tissue 5   -2.3    8.0   -1.7
            ...

  23. Principles (ctd)
      • (deterministic core of) generic decomposition model (Van Mechelen & Schepers, 2007):
        - reduction of object (tissue) mode by means of (binary or real-valued) I × P quantification matrix A
        - reduction of variable (gene) mode by means of (binary or real-valued) J × Q quantification matrix B
        - P × Q core matrix W
        - decomposition operator f, which is such that:
          $D = f(A, B, W) + E$
          with $f(A, B, W)_{ij}$ only depending on $A_{i\cdot}$ and $B_{j\cdot}$

  25. Principles (ctd)
      $D = f(A, B, W) + E$
      • special cases:
        - A and B binary, f additive operator:
          $f(A, B, W) = A W B^t$
          $f(A, B, W)_{ij} = \sum_{p=1}^{P} \sum_{q=1}^{Q} a_{ip} b_{jq} w_{pq}$
          (general additive two-mode clustering model)

  26. $f(A, B, W)_{ij} = \sum_{p=1}^{P} \sum_{q=1}^{Q} a_{ip} b_{jq} w_{pq}$
      Example (objects O1 to O6, variables V1 to V7, P = Q = 2):

      Data matrix with object cluster memberships A:
             V1  V2  V3  V4  V5  V6  V7  |  A·1  A·2
        O1    0   0   0   0   0   0   0  |   0    0
        O2    0   2   2   2   0   0   0  |   1    0
        O3    0   2   2   2   0   0   0  |   1    0
        O4    0   2   2   5   3   3   0  |   1    1
        O5    0   0   0   3   3   3   0  |   0    1
        O6    0   0   0   0   0   0   0  |   0    0

      Variable cluster memberships B (shown transposed):
             V1  V2  V3  V4  V5  V6  V7
        B·1   0   1   1   1   0   0   0
        B·2   0   0   0   1   1   1   0

      Core matrix W:
             B·1  B·2
        A·1   2    0
        A·2   0    3
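
The additive operator on this slide can be checked by hand or with a few lines of code. The following is a minimal sketch in plain Python, assuming the values of A, B and W as they can be read off the slide's figure (the reading of the figure layout is an assumption, and the function name `additive` is introduced here for illustration only). It evaluates the double sum for every cell and prints the reconstructed data rows.

```python
# Additive two-mode clustering operator:
#   f(A, B, W)_ij = sum_p sum_q a_ip * b_jq * w_pq   (i.e. A W B^t)
# The matrices below are read off the slide's worked example; this reading
# of the garbled figure is an assumption of the sketch.

def additive(A, B, W):
    """Return the I x J matrix A W B^t for list-of-lists inputs."""
    P, Q = len(W), len(W[0])
    return [
        [sum(a_i[p] * b_j[q] * W[p][q] for p in range(P) for q in range(Q))
         for b_j in B]
        for a_i in A
    ]

# Rows = objects O1..O6, columns = object clusters (overlap allowed: O4 is in both).
A = [[0, 0], [1, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
# Rows = variables V1..V7, columns = variable clusters (V4 is in both).
B = [[0, 0], [1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
# Core matrix: one value per (object cluster, variable cluster) pair.
W = [[2, 0], [0, 3]]

D_hat = additive(A, B, W)
for label, row in zip(["O1", "O2", "O3", "O4", "O5", "O6"], D_hat):
    print(label, row)
```

Because O4 and V4 each belong to both clusters, the cell (O4, V4) receives both core values, 2 + 3 = 5, which is what makes the model additive rather than a plain partitioning.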

  27. Principles (ctd)
      $D = f(A, B, W) + E$
      • special cases (ctd):
        - A and B real-valued, W identity matrix, f additive operator:
          $f(A, B, W) = A B^t$
          $f(A, B, W)_{ij} = \sum_{p=1}^{P} a_{ip} b_{jp}$
          (principal component analysis)

  28. Principles (ctd)
      $D = f(A, B, W) + E$
      • special cases (ctd):
        - A and B real-valued, W identity matrix, f Euclidean distance-based operator:
          $f(A, B, W)_{ij} = \left[ \sum_{p=1}^{P} (a_{ip} - b_{jp})^2 \right]^{1/2}$
          (multidimensional unfolding)
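
As a small illustration of the Euclidean distance-based operator, the sketch below computes the model value for each object/variable pair as the distance between the corresponding rows of A and B. The coordinates are hypothetical toy values and the function name `unfolding` is introduced here for illustration; neither comes from the slides.

```python
import math

def unfolding(A, B):
    """f(A, B, W)_ij = Euclidean distance between row i of A and row j of B
    (W is the identity, so it does not appear)."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(a_i, b_j))) for b_j in B]
        for a_i in A
    ]

# Hypothetical coordinates in a P = 2 dimensional space.
A = [[0.0, 0.0], [3.0, 4.0]]   # one point per object
B = [[3.0, 4.0], [0.0, 1.0]]   # one point per variable

F = unfolding(A, B)
for row in F:
    print(row)
```

Objects and variables are represented as points in one joint space, so a small model value means that an object lies close to a variable.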

  29. Principles (ctd)
      $D = f(A, B, W) + E$
      • multiple nested reductions:
        - decomposition of core matrix W:
          $W = f^*(A^*, B^*, W^*)$
          and therefore:
          $D = f(A, B, f^*(A^*, B^*, W^*)) + E$
          with A* denoting a P × P* quantification matrix, B* a Q × Q* quantification matrix, f* a decomposition operator, and with $f^*(A^*, B^*, W^*)_{pq}$ only depending on $A^*_{p\cdot}$ and $B^*_{q\cdot}$
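
The nesting can be read as plain function composition: the inner model first rebuilds the core matrix W, which the outer model then uses. The sketch below illustrates this with the additive operator at both levels. The toy matrices, the choice of an additive inner operator, and the function names `additive` and `nested` are all assumptions made for illustration.

```python
def additive(A, B, W):
    """f(A, B, W)_ij = sum_p sum_q a_ip * b_jq * w_pq."""
    P, Q = len(W), len(W[0])
    return [
        [sum(a_i[p] * b_j[q] * W[p][q] for p in range(P) for q in range(Q))
         for b_j in B]
        for a_i in A
    ]

def nested(f, A, B, f_star, A_star, B_star, W_star):
    """Deterministic core of the nested model: f(A, B, f*(A*, B*, W*))."""
    W = f_star(A_star, B_star, W_star)  # P x Q core rebuilt by the inner model
    return f(A, B, W)

# Hypothetical toy setup.
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]  # 4 objects in P = 3 clusters
B = [[1, 0], [1, 0], [0, 1]]                      # 3 variables in Q = 2 clusters
A_star = [[1, 0], [1, 0], [0, 1]]  # P x P*: object clusters 1 and 2 are merged
B_star = [[1, 0], [0, 1]]          # Q x Q* identity: no inner reduction here
W_star = [[1, 4], [2, 0]]          # P* x Q* inner core

D_hat = nested(additive, A, B, additive, A_star, B_star, W_star)
for row in D_hat:
    print(row)
```

Passing the operators as arguments keeps the sketch generic: any pair of functions with the signature `(A, B, W)` could be plugged in for `f` and `f_star`.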

  30. Principles (ctd)
      $D = f(A, B, f^*(A^*, B^*, W^*)) + E$
      • remarks:
        - each of the quantification matrices (A, A*, B, B*) can be an identity matrix (no reduction), a binary matrix (categorical, cluster-based reduction), or a real-valued matrix (continuous, dimension reduction)
        - model is to be estimated as a whole, making use of one overall objective or loss function (unlike in ‘tandem’ approaches)

  32. Example 1: Existing model
      $D = f(A, B, f^*(A^*, B^*, W^*)) + E$
      • two-mode unfolding clustering:
        - A and B binary partition matrices, f additive operator (i.e., outer model = two-mode partitioning)
        - A* and B* real-valued matrices, W* identity matrix, f* Euclidean distance-based operator (i.e., inner model = multidimensional unfolding)
        $d_{ij} = \sum_{p=1}^{P} \sum_{q=1}^{Q} a_{ip} b_{jq} \left[ \sum_{p^*=1}^{P^*} (a^*_{pp^*} - b^*_{qp^*})^2 \right]^{1/2} + e_{ij}$
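
To make the formula concrete, the sketch below computes the model part of $d_{ij}$ for a toy data set: each object and each variable is assigned to one cluster, and the predicted value is the Euclidean distance between the points that represent those two clusters. All numbers and function names are hypothetical. The least-squares loss is likewise an assumption made for illustration, since the slides only state that a single overall loss function is used.

```python
import math

def predict(A, B, A_star, B_star):
    """Model part of d_ij: sum_p sum_q a_ip * b_jq * ||A*_p. - B*_q.||."""
    # Inner model: distances between object-cluster and variable-cluster points.
    W = [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(a_p, b_q))) for b_q in B_star]
        for a_p in A_star
    ]
    P, Q = len(A_star), len(B_star)
    # Outer model: two-mode partitioning picks one core entry per data cell.
    return [
        [sum(a_i[p] * b_j[q] * W[p][q] for p in range(P) for q in range(Q))
         for b_j in B]
        for a_i in A
    ]

def least_squares_loss(D, D_hat):
    """Sum of squared residuals over all cells (assumed loss, see lead-in)."""
    return sum((d - dh) ** 2
               for row, row_hat in zip(D, D_hat)
               for d, dh in zip(row, row_hat))

# Hypothetical toy values.
A = [[1, 0], [1, 0], [0, 1]]        # 3 objects partitioned into P = 2 clusters
B = [[1, 0], [0, 1], [0, 1]]        # 3 variables partitioned into Q = 2 clusters
A_star = [[0.0, 0.0], [3.0, 0.0]]   # object-cluster points, P* = 2 dimensions
B_star = [[0.0, 4.0], [3.0, 4.0]]   # variable-cluster points in the same space
D = [[4.0, 5.0, 6.0], [4.0, 5.0, 5.0], [5.0, 3.0, 4.0]]  # made-up observations

D_hat = predict(A, B, A_star, B_star)
loss = least_squares_loss(D, D_hat)
print(D_hat, loss)
```

In an actual fit, A, B, A* and B* would all be updated to reduce this one loss value, rather than clustering first and unfolding the cluster summaries afterwards.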

  33. Example 1: Existing model (ctd)
      $d_{ij} = \sum_{p=1}^{P} \sum_{q=1}^{Q} a_{ip} b_{jq} \left[ \sum_{p^*=1}^{P^*} (a^*_{pp^*} - b^*_{qp^*})^2 \right]^{1/2} + e_{ij}$
      • two-mode unfolding clustering (ctd):
        - originally proposed (in deterministic form) by Van Mechelen & Schepers (2007)
        - stochastic variant (making use of double mixture approach) proposed by Vera, Macías & Heiser (2009) under the name dual latent class unfolding
        - special case: A or B identity matrix (outer categorical reduction of one mode only): latent class unfolding as proposed by De Soete & Heiser (1993)

  34. Example 1: Existing model (ctd)
      • application (Vera et al.): respondent by statement data on internet use
