Multiple Nested Reductions of Single Data Modes as a Tool to Deal with Large Data Sets
Iven Van Mechelen and Katrijn Van Deun
K.U.Leuven, Psychology Department and Center for Computational Systems Biology
Invited IFCS session at COMPSTAT 2010

Overview:
• introduction
• principles
• example 1: existing model
• example 2: novel model
• discussion

Introduction
• in many research areas:
  - accessibility of novel measurement technologies
  - data tsunami: high-dimensional data sets
  - example: various types of ‘omics’ data
• concerted use of technologies in many settings
  - data sets with a large number of experimental units

Introduction (ctd)
• problems:
  - redundancies, dependencies, ill-conditioned optimization problems
  - computational bottlenecks
  - displaying the output becomes prohibitive

Introduction (ctd)
• possible solution: classical reduction methods (categorical: clustering; continuous: dimension reduction)
• however: such methods often break down …
• possible rescue missions: variable selection, sparseness penalties or constraints, …
• alternative solution: multiple nested reductions of single data modes (within the framework of a global model for the data, fitted with a simultaneous optimization procedure)

Principles
• data: $I \times J$ object by variable (e.g., tissue by gene) data matrix $D$
  - rows: object mode, indexed by $i$
  - columns: variable mode, indexed by $j$
  - entries: $d_{ij}$

Principles (ctd)
• (deterministic core of) generic decomposition model (Van Mechelen & Schepers, 2007):
  - reduction of object (tissue) mode by means of (binary or real-valued) $I \times P$ quantification matrix $A$; examples:

    binary, partition:
      Tissue 1    1    0    0
      Tissue 2    1    0    0
      Tissue 3    0    0    1
      Tissue 4    0    0    1
      Tissue 5    0    1    0
      ...

    binary, overlapping clusters:
      Tissue 1    1    1    0
      Tissue 2    1    1    0
      Tissue 3    1    0    1
      Tissue 4    1    0    1
      Tissue 5    1    0    1
      ...

    real-valued:
      Tissue 1    3.2    5.2    5.1
      Tissue 2    4.1   -6.7    3.4
      Tissue 3    5.8    3.9    1.9
      Tissue 4    1.0   -2.1    0.5
      Tissue 5   -2.3    8.0   -1.7
      ...

  - reduction of variable (gene) mode by means of (binary or real-valued) $J \times Q$ quantification matrix $B$
  - $P \times Q$ core matrix $W$
  - decomposition operator $f$, which is such that:
    $$D = f(A, B, W) + E$$
    with $f(A, B, W)_{ij}$ only depending on $A_{i\cdot}$ and $B_{j\cdot}$

Principles (ctd)
• special cases of $D = f(A, B, W) + E$:
  - $A$ and $B$ binary, $f$ additive operator:
    $$f(A, B, W) = A W B^{t}, \qquad f(A, B, W)_{ij} = \sum_{p=1}^{P} \sum_{q=1}^{Q} a_{ip}\, b_{jq}\, w_{pq}$$
    (general additive two-mode clustering model)

Illustration of the additive model $f(A, B, W)_{ij} = \sum_{p=1}^{P} \sum_{q=1}^{Q} a_{ip}\, b_{jq}\, w_{pq}$ with 6 objects (O1 to O6), 7 variables (V1 to V7) and $P = Q = 2$:

          A•1  A•2 |  V1   V2   V3   V4   V5   V6   V7
    O1     0    0  |   0    0    0    0    0    0    0
    O2     1    0  |   0    2    2    2    0    0    0
    O3     1    0  |   0    2    2    2    0    0    0
    O4     1    1  |   0    2    2    5    3    3    0
    O5     0    1  |   0    0    0    3    3    3    0
    O6     0    0  |   0    0    0    0    0    0    0
    ---------------+----------------------------------
    B•1            |   0    1    1    1    0    0    0
    B•2            |   0    0    0    1    1    1    0

    W:     B•1  B•2
    A•1     2    0
    A•2     0    3

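
The illustration can be checked numerically. The sketch below is not part of the original slides: it assumes the matrices A, B and W as read off the slide's figure, and uses small helper functions (`matmul`, `transpose`, names chosen here) in plain Python to evaluate $A W B^{t}$.

```python
# Additive two-mode clustering: f(A, B, W) = A W B^t with binary A and B.
# Matrices are transcribed from the slide's illustration (assumed reading).

def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Y_cols = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Y_cols] for row in X]

# 6 x 2 object cluster memberships (rows O1..O6, columns A.1, A.2)
A = [[0, 0], [1, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
# 7 x 2 variable cluster memberships (rows V1..V7, columns B.1, B.2)
B = [[0, 0], [1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
# 2 x 2 core matrix
W = [[2, 0], [0, 3]]

D_model = matmul(matmul(A, W), transpose(B))

for label, row in zip(["O1", "O2", "O3", "O4", "O5", "O6"], D_model):
    print(label, row)
```

Object O4 belongs to both object clusters and variable V4 to both variable clusters, so its model value is the sum of the two core entries, 2 + 3 = 5.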
Principles (ctd)
• special cases (ctd):
  - $A$ and $B$ real-valued, $W$ identity matrix, $f$ additive operator:
    $$f(A, B, W) = A B^{t}, \qquad f(A, B, W)_{ij} = \sum_{p=1}^{P} a_{ip}\, b_{jp}$$
    (principal component analysis)

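
A short worked step, added here as a check rather than taken from the slides: with $W$ the $P \times P$ identity (so $Q = P$), the double sum of the additive operator collapses to the single sum of the principal component form. The Kronecker delta $\delta_{pq}$ is notation introduced for this step only.

$$f(A, B, W)_{ij} = \sum_{p=1}^{P} \sum_{q=1}^{P} a_{ip}\, b_{jq}\, w_{pq} = \sum_{p=1}^{P} \sum_{q=1}^{P} a_{ip}\, b_{jq}\, \delta_{pq} = \sum_{p=1}^{P} a_{ip}\, b_{jp}$$
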
Principles (ctd)
• special cases (ctd):
  - $A$ and $B$ real-valued, $W$ identity matrix, $f$ Euclidean distance-based operator:
    $$f(A, B, W)_{ij} = \left[ \sum_{p=1}^{P} \left( a_{ip} - b_{jp} \right)^{2} \right]^{1/2}$$
    (multidimensional unfolding)

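
A minimal sketch of the Euclidean distance-based operator, assuming small hypothetical coordinate matrices (the values of `A` and `B` below are invented for illustration, and the function name `unfolding` is chosen here):

```python
import math

def unfolding(A, B):
    """Return the I x J matrix whose (i, j) entry is the Euclidean
    distance between row i of A and row j of B."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(a_row, b_row)))
             for b_row in B]
            for a_row in A]

# Hypothetical coordinates in P = 2 dimensions.
A = [[0.0, 0.0], [3.0, 4.0]]               # I = 2 objects
B = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]   # J = 3 variables

F = unfolding(A, B)
for row in F:
    print(row)
```

Each model value depends only on one row of `A` and one row of `B`, which is the defining property required of a decomposition operator.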
Principles (ctd)
• multiple nested reductions:
  - decomposition of core matrix $W$:
    $$W = f^{*}(A^{*}, B^{*}, W^{*})$$
    and therefore:
    $$D = f\left(A, B, f^{*}(A^{*}, B^{*}, W^{*})\right) + E$$
    with $A^{*}$ denoting a $P \times P^{*}$ quantification matrix, $B^{*}$ a $Q \times Q^{*}$ quantification matrix, $f^{*}$ a decomposition operator, and with $f^{*}(A^{*}, B^{*}, W^{*})_{pq}$ only depending on $A^{*}_{p\cdot}$ and $B^{*}_{q\cdot}$

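
The nesting can be sketched as plain function composition. The example below is illustrative only: it assumes the additive operator for both the outer and the inner model, and all matrices and function names (`additive`, `nested`) are hypothetical.

```python
def additive(A, B, W):
    """Additive operator: entry (i, j) is sum_p sum_q a_ip * b_jq * w_pq."""
    P, Q = len(W), len(W[0])
    return [[sum(a_row[p] * b_row[q] * W[p][q]
                 for p in range(P) for q in range(Q))
             for b_row in B]
            for a_row in A]

def nested(f, A, B, f_star, A_star, B_star, W_star):
    """Evaluate f(A, B, f*(A*, B*, W*)): the inner model produces the
    P x Q core matrix that the outer model then expands to I x J."""
    W = f_star(A_star, B_star, W_star)
    return f(A, B, W)

# Outer reduction: binary partitions of 4 objects and 3 variables (P = Q = 2).
A = [[1, 0], [1, 0], [0, 1], [0, 1]]
B = [[1, 0], [0, 1], [0, 1]]
# Inner reduction: one real-valued dimension (P* = Q* = 1), identity W*.
A_star = [[1], [2]]
B_star = [[3], [5]]
W_star = [[1]]

D_model = nested(additive, A, B, additive, A_star, B_star, W_star)
for row in D_model:
    print(row)
```

Swapping either operator for another one (for instance a distance-based operator for the inner model) changes the model family without changing the composition itself.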
Principles (ctd)
• remarks on $D = f\left(A, B, f^{*}(A^{*}, B^{*}, W^{*})\right) + E$:
  - each of the quantification matrices ($A$, $A^{*}$, $B$, $B^{*}$) can be an identity matrix (no reduction), a binary matrix (categorical, cluster-based reduction), or a real-valued matrix (continuous, dimension reduction)
  - the model is to be estimated as a whole, making use of one overall objective or loss function (unlike in ‘tandem’ approaches)

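
As an illustration of "one overall objective", a least-squares criterion could look as follows. This specific loss is an assumption made here for concreteness; the slides do not state which loss function is used.

$$L(A, B, A^{*}, B^{*}, W^{*}) = \sum_{i=1}^{I} \sum_{j=1}^{J} \left[ d_{ij} - f\left(A, B, f^{*}(A^{*}, B^{*}, W^{*})\right)_{ij} \right]^{2}$$

Under this assumption, all of $A$, $B$, $A^{*}$, $B^{*}$ and $W^{*}$ are optimized jointly against the same criterion, whereas a tandem approach would first fit the outer reduction and only afterwards decompose the resulting core matrix.
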
Example 1: Existing model
• two-mode unfolding clustering, as an instance of $D = f\left(A, B, f^{*}(A^{*}, B^{*}, W^{*})\right) + E$:
  - $A$ and $B$ binary partition matrices, $f$ additive operator (i.e., outer model = two-mode partitioning)
  - $A^{*}$ and $B^{*}$ real-valued matrices, $W^{*}$ identity matrix, $f^{*}$ Euclidean distance-based operator (i.e., inner model = multidimensional unfolding)
  $$d_{ij} = \sum_{p=1}^{P} \sum_{q=1}^{Q} a_{ip}\, b_{jq} \left[ \sum_{p^{*}=1}^{P^{*}} \left( a^{*}_{pp^{*}} - b^{*}_{qp^{*}} \right)^{2} \right]^{1/2} + e_{ij}$$

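
A minimal sketch of the deterministic part of this model, assuming hypothetical partitions and cluster coordinates (all values and the function names `cluster_distances` and `two_mode_unfolding_clustering` are invented for illustration). Each data entry is modelled as the distance between the point of its object cluster and the point of its variable cluster.

```python
import math

def cluster_distances(A_star, B_star):
    """Inner model: P x Q matrix of Euclidean distances between the
    object-cluster points (rows of A*) and variable-cluster points (rows of B*)."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(a_row, b_row)))
             for b_row in B_star]
            for a_row in A_star]

def two_mode_unfolding_clustering(A, B, A_star, B_star):
    """Outer model: expand the cluster-level distances to all I x J
    object-variable pairs via the binary partition matrices A and B."""
    W = cluster_distances(A_star, B_star)
    P, Q = len(W), len(W[0])
    return [[sum(a_row[p] * b_row[q] * W[p][q]
                 for p in range(P) for q in range(Q))
             for b_row in B]
            for a_row in A]

# Hypothetical partitions: 3 objects in 2 clusters, 2 variables in 2 clusters.
A = [[1, 0], [1, 0], [0, 1]]
B = [[1, 0], [0, 1]]
# Hypothetical cluster coordinates in P* = 2 dimensions.
A_star = [[0.0, 0.0], [3.0, 4.0]]
B_star = [[0.0, 0.0], [3.0, 0.0]]

D_model = two_mode_unfolding_clustering(A, B, A_star, B_star)
for row in D_model:
    print(row)
```

Objects in the same cluster share an identical model row, so only the cluster points need to be displayed in the unfolding configuration.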
Example 1: Existing model (ctd)
• two-mode unfolding clustering (ctd):
  - originally proposed (in deterministic form) by Van Mechelen & Schepers (2007)
  - stochastic variant (making use of a double mixture approach) proposed by Vera, Macías & Heiser (2009) under the name dual latent class unfolding
  - special case: $A$ or $B$ identity matrix (outer categorical reduction of one mode only): latent class unfolding as proposed by De Soete & Heiser (1993)

Example 1: Existing model (ctd)
• application (Vera et al.): respondent by statement data on internet use
