Orthogonal grey simultaneous component analysis to distinguish common and distinctive information in coupled data. Martijn Schouteden, Katrijn Van Deun, Iven Van Mechelen
Outline • Introduction – Coupled data – Research questions • Method – Simultaneous component method – Problem – Solution: DISCO-GSCA • Illustration – Results • Conclusion
Introduction • Coupled data: data that consist of different data blocks, which all contain information about the same entities – E.g., • Data blocks = GC/MS and LC/MS • Variables = E. coli metabolites • Objects = conditions • [Figure: the LC/MS block ($I$ conditions × $J_1$ metabolites) next to the GC/MS block ($I$ conditions × $J_2$ metabolites), sharing the same rows] Smilde et al. (2005)
• Finding mechanisms that underlie the coupled data • RESEARCH QUESTIONS: which mechanisms are – common to both data blocks and – distinctive for a single data block? Which metabolome processes are measured by both separation techniques? Which processes are measured by just one of the two?
Simultaneous Component Analysis • Finding underlying mechanisms in – ONE data block: Principal Component Analysis (PCA; Jolliffe, 2002) – More data blocks: Simultaneous Component Analysis (SCA; Van Deun et al., 2009)
Simultaneous Component Analysis • Concatenate the LC/MS block ($I \times J_1$) and the GC/MS block ($I \times J_2$), which share the same $I$ rows, into $X_{conc} = [X_{LC} \mid X_{GC}]$ of size $I \times (J_1 + J_2)$ • Model: Data = Scores × Loadings + Error

$$X_{conc} = T\,[P_{LC}' \mid P_{GC}'] + [E_{LC} \mid E_{GC}] = T P_{conc}' + E_{conc}$$

with $X_{conc}$ and $E_{conc}$ of size $I \times (J_1 + J_2)$, $T$ of size $I \times R$, and $P_{conc}'$ of size $R \times (J_1 + J_2)$

• Objective:

$$\min_{T,\,P_{conc}} \left\| X_{conc} - T P_{conc}' \right\|^2$$
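The least-squares objective above can be minimized with a truncated singular value decomposition of the concatenated data. The following is a minimal sketch in Python with NumPy, assuming orthonormal scores ($T'T = I$) as the identification constraint. The function name `sca`, the block sizes, and the random data are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def sca(blocks, n_components):
    """Fit SCA to blocks that share the same rows; return scores T and loadings P_conc."""
    X_conc = np.hstack(blocks)                       # I x (J1 + J2)
    U, s, Vt = np.linalg.svd(X_conc, full_matrices=False)
    T = U[:, :n_components]                          # orthonormal scores, T'T = I
    # Loadings: right singular vectors scaled by their singular values.
    P_conc = Vt[:n_components].T * s[:n_components]  # (J1 + J2) x R
    return T, P_conc

# Hypothetical data: 28 conditions, 10 LC/MS and 6 GC/MS variables.
rng = np.random.default_rng(0)
X_lc = rng.standard_normal((28, 10))
X_gc = rng.standard_normal((28, 6))

T, P_conc = sca([X_lc, X_gc], n_components=3)
P_lc, P_gc = P_conc[:10], P_conc[10:]                # block-specific loadings
loss = np.sum((np.hstack([X_lc, X_gc]) - T @ P_conc.T) ** 2)
```

Splitting `P_conc` by rows recovers the block-specific loadings $P_{LC}$ and $P_{GC}$; nothing in this fit forces any of them to be zero.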
• Distinctive mechanisms = simultaneous components that underlie only one data block • Common mechanisms = simultaneous components that underlie both data blocks • E.g., with $X_{conc} = T P_{conc}'$ and three components, the target loading pattern is

$$\left(P_{conc}^{target}\right)' = \left[\left(P_{LC}^{target}\right)' \mid \left(P_{GC}^{target}\right)'\right] = \left[\begin{array}{ccc|ccc} x & \cdots & x & 0 & \cdots & 0 \\ 0 & \cdots & 0 & x & \cdots & x \\ x & \cdots & x & x & \cdots & x \end{array}\right] \begin{array}{l} D_1 \\ D_2 \\ C \end{array}$$

where $x$ marks a loading that is left free. $D_1$ is distinctive for LC/MS, $D_2$ is distinctive for GC/MS (zeros in the LC/MS part), and $C$ is common.

Problem • However, with the SC method, whether such a pattern is obtained is outside the user's control.
Solution: DISCO-GSCA • Predecessors: – DISCO-SCA (Schouteden et al., 2010) – Grey Component Analysis (GCA, Westerhuis et al., 2007)
Solution: DISCO-GSCA • Impose the target structure with a certain strength ($\lambda$):

$$\min_{T,\,P_{conc}} \left\| X_{conc} - T P_{conc}' \right\|^2 + \lambda \left\| W \bullet \left(P_{conc} - P_{conc}^{target}\right) \right\|^2 \quad \text{subject to } T'T = I$$

• $\bullet$ = elementwise product • $P_{conc}$ stacks the loadings of the $J_1$ LC/MS variables on top of those of the $J_2$ GC/MS variables, with one column per component • $W$ has a 1 wherever the target loading is 0 and a 0 elsewhere • E.g., for three components, a variable $j$ with target row $(0 \;\; x \;\; x)$ gets weight row $(1 \;\; 0 \;\; 0)$:

$$(1 \;\; 0 \;\; 0) \bullet \left[(p_{j1} \;\; p_{j2} \;\; p_{j3}) - (0 \;\; x \;\; x)\right] = (p_{j1} \;\; 0 \;\; 0)$$

so only the loading that should be zero is penalized.
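One way to minimize the penalized objective (residual sum of squares plus $\lambda$ times the squared weighted deviation of the loadings from the target, under $T'T = I$) is to alternate between closed-form updates of the loadings and the scores. The sketch below illustrates that idea under stated assumptions and is not necessarily the authors' algorithm. It assumes that every penalized target entry is zero, so the penalty reduces to $\lambda \| W \bullet P_{conc} \|^2$. The function names, sizes, and random data are hypothetical.

```python
import numpy as np

def objective(X, T, P, W, lam):
    """Residual sum of squares plus lam times the penalty on weighted loadings."""
    return np.sum((X - T @ P.T) ** 2) + lam * np.sum((W * P) ** 2)

def disco_gsca_als(X, W, lam, n_iter=50):
    """Alternate P and T updates; return T, P and the loss after each iteration."""
    R = W.shape[1]
    T = np.linalg.svd(X, full_matrices=False)[0][:, :R]   # start from SCA scores
    history = []
    for _ in range(n_iter):
        # P-step: with T'T = I the problem separates per loading, giving a
        # ridge-like shrinkage of the entries that W penalizes.
        P = (X.T @ T) / (1.0 + lam * W ** 2)
        # T-step: orthogonal Procrustes, T = U V' from the SVD of X P.
        U, _, Vt = np.linalg.svd(X @ P, full_matrices=False)
        T = U @ Vt
        history.append(objective(X, T, P, W, lam))
    return T, P, history

# Hypothetical setup: 10 LC/MS and 6 GC/MS variables, three components.
rng = np.random.default_rng(1)
J1, J2, R = 10, 6, 3
X = rng.standard_normal((28, J1 + J2))
W = np.zeros((J1 + J2, R))
W[J1:, 0] = 1   # component 1: zero target on GC/MS rows (distinctive for LC/MS)
W[:J1, 1] = 1   # component 2: zero target on LC/MS rows (distinctive for GC/MS)
T, P, history = disco_gsca_als(X, W, lam=100.0)
```

Each step solves its subproblem exactly, so the recorded loss cannot increase from one iteration to the next. Setting `lam=0` leaves the loadings unpenalized, while larger values push the penalized loadings toward zero.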
Solution: DISCO-GSCA • Model selection: 3 steps – FIRST: Select the number of simultaneous components • (SCA, Van Deun et al., 2009) – SECOND: characterize these components • i.e., how many of them are common/distinctive? • (DISCO-SCA, Schouteden et al., 2010) – THIRD: define λ • L-curve (Hansen, 1992)
• Data: E. coli • Model: – 5 simultaneous components – Target: • 1 common component • 2 distinctive components for GC/MS • 2 distinctive components for LC/MS
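The target for this model can be encoded as a binary weight matrix with a 1 wherever a loading should be zero. The sketch below assumes that the LC/MS variables come first in the concatenation and uses an arbitrary component order (common, then LC/MS-distinctive, then GC/MS-distinctive). The helper `target_weights` and the toy sizes $J_1 = 4$ and $J_2 = 3$ are hypothetical and do not reflect the real E. coli data.

```python
import numpy as np

def target_weights(J1, J2, n_common, n_dist_block1, n_dist_block2):
    """Binary W: 1 where the target loading is zero, 0 where the loading is free."""
    R = n_common + n_dist_block1 + n_dist_block2
    W = np.zeros((J1 + J2, R), dtype=int)
    first_d2 = n_common + n_dist_block1
    # Components distinctive for block 1 must not load on block-2 variables.
    W[J1:, n_common:first_d2] = 1
    # Components distinctive for block 2 must not load on block-1 variables.
    W[:J1, first_d2:] = 1
    return W

# 1 common, 2 LC/MS-distinctive and 2 GC/MS-distinctive components (toy sizes).
W = target_weights(J1=4, J2=3, n_common=1, n_dist_block1=2, n_dist_block2=2)
print(W)
```

The common component's column contains only zeros, so none of its loadings are penalized.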