
Inferring Sparse Gaussian Graphical Models for Biological Networks. Christophe Ambroise, Camille Charbonnier, Julien Chiquet, Gilles Grasseau, Catherine Matias, Yves Grandvalet. Laboratoire Statistique et Génome, UMR CNRS 8071.


  1. Neighborhood vs. Likelihood. Pseudo-likelihood (Besag, 1975): $P(X_1, \dots, X_p) \simeq \prod_{j=1}^p P(X_j \mid \{X_k\}_{k \neq j})$. The corresponding pseudo-log-likelihood is $\tilde L(\Theta; S) = \frac{n}{2}\log\det(D) - \frac{n}{2}\,\mathrm{trace}(S D^{-1}\Theta^2) - \frac{n}{2}\log(2\pi)$, with $D = \mathrm{diag}(\Theta)$, to be compared with the full log-likelihood $L(\Theta; S) = \frac{n}{2}\log\det(\Theta) - \frac{n}{2}\,\mathrm{trace}(S\Theta) - \frac{n}{2}\log(2\pi)$. Proposition: neighborhood selection leads to the graph maximizing the penalized pseudo-log-likelihood. Proof sketch: the neighborhood regression coefficients satisfy $\hat\beta^{(i)}_j = -\hat\theta_{ij}/\hat\theta_{jj}$, so the neighborhood estimate maximizes the penalized pseudo-log-likelihood. SIMoNe: inferring structured Gaussian networks 13
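The neighborhood approach above can be tried out numerically. Below is a minimal sketch of nodewise (Meinshausen-Bühlmann style) selection in Python with NumPy, not the SIMoNe implementation: `lasso_cd` and `neighborhood_select` are hypothetical helper names, the Lasso is solved by plain cyclic coordinate descent, and the chain graph and penalty level are made up for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent: min (1/2n)||y - Xb||^2 + lam ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).mean(axis=0)                 # (1/n) ||X_j||^2
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]         # partial residual
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def neighborhood_select(X, lam):
    """Regress each variable on all the others; keep edge (i,j) when the
    corresponding Lasso coefficient is non-zero (OR rule to symmetrize)."""
    n, p = X.shape
    A = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        b = lasso_cd(X[:, others], X[:, i], lam)
        A[i, others] = b != 0
    return A | A.T

rng = np.random.default_rng(0)
# precision matrix of a 3-node chain: edges (0,1) and (1,2), no edge (0,2)
Theta = np.array([[1.0, 0.4, 0.0], [0.4, 1.0, 0.4], [0.0, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Theta), size=2000)
A = neighborhood_select(X, lam=0.1)
print(A)
```

With enough observations, the recovered adjacency matrix matches the support of $\Theta$: the chain edges survive the soft-thresholding, the absent edge does not.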

  2. Penalized log-likelihood. Banerjee et al., JMLR 2008: $\hat\Theta_\lambda = \arg\max_\Theta L_{\mathrm{iid}}(\Theta; S) - \lambda \|\Theta\|_{\ell_1}$, efficiently solved by the graphical Lasso of Friedman et al., 2008. Ambroise, Chiquet, Matias, EJS 2009: use adaptive penalty parameters for the different coefficients, $\tilde L_{\mathrm{iid}}(\Theta; S) - \lambda \|P_Z \star \Theta\|_{\ell_1}$, where $P_Z$ is a matrix of weights depending on the underlying clustering $Z$. Works with the pseudo-log-likelihood (computationally efficient).
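As a sketch of the objective being maximized, the following evaluates the penalized log-likelihood for a candidate precision matrix, assuming the form of $L_{\mathrm{iid}}$ stated on the previous slide. `penalized_loglik` is a hypothetical name, and the candidate $\Theta$ is a crude ridge-type inverse, not the graphical-Lasso solution.

```python
import numpy as np

def penalized_loglik(Theta, S, n, lam, P=None):
    """L_iid(Theta; S) - lam ||P * Theta||_1, with
    L_iid = (n/2) log det(Theta) - (n/2) trace(S Theta) - (n/2) log(2 pi)."""
    if P is None:
        P = np.ones_like(Theta)            # uniform weights = plain l1 penalty
    sign, logdet = np.linalg.slogdet(Theta)
    assert sign > 0, "Theta must be positive definite"
    loglik = 0.5 * n * (logdet - np.trace(S @ Theta) - np.log(2 * np.pi))
    penalty = lam * np.abs(P * Theta).sum()
    return loglik - penalty

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
S = X.T @ X / 50
Theta = np.linalg.inv(S + 0.1 * np.eye(4))   # a crude positive-definite candidate
val = penalized_loglik(Theta, S, n=50, lam=0.2)
print(val)
```

Replacing the uniform weights `P` by a matrix $P_Z$ built from a clustering gives the adaptive criterion of the slide.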


  4. Outline. Statistical models: steady-state data, time-course data. Multitask learning: Group-Lasso, Coop-Lasso. Algorithms and methods: overall view, model selection, latent structure. Numerical experiments: performance on simulated data, R package demo on the breast cancer data set.

  5. The Gaussian model for time-course data (1). Let $X_1, \dots, X_n$ be a first-order vector autoregressive process, $X_t = \Theta X_{t-1} + b + \varepsilon_t$, $t \in [1, n]$, where we are looking for $\Theta = (\theta_{ij})_{i,j \in \mathcal{P}}$ and: $X_0 \sim \mathcal{N}(0_p, \Sigma_0)$; $\varepsilon_t$ is a Gaussian white noise with covariance $\sigma^2 I_p$; $\mathrm{cov}(X_t, \varepsilon_s) = 0$ for $s > t$, so that $X_t$ is Markovian. Graphical interpretation: since $\theta_{ij} = \mathrm{cov}(X_t(i), X_{t-1}(j) \mid X_{t-1}(\mathcal{P} \setminus j)) / \mathrm{var}(X_{t-1}(j) \mid X_{t-1}(\mathcal{P} \setminus j))$, we have $\theta_{ij} = 0 \Leftrightarrow X_t(i) \perp\!\!\!\perp X_{t-1}(j) \mid X_{t-1}(\mathcal{P} \setminus j) \Leftrightarrow$ edge $(j \to i) \notin$ network.

  6. The Gaussian model for time-course data (2). Interpretation: a homogeneous Markov process.

  7. The Gaussian model for time-course data (3). Let $\mathbf{X}$ be the $n \times p$ matrix whose $k$th row is $X_k$; let $S = n^{-1}\mathbf{X}_{\setminus n}^\intercal \mathbf{X}_{\setminus n}$ be the within-time covariance matrix and $V = n^{-1}\mathbf{X}_{\setminus n}^\intercal \mathbf{X}_{\setminus 0}$ the across-time covariance matrix. The log-likelihood is $L_{\mathrm{time}}(\Theta; S, V) = n\,\mathrm{Trace}(V\Theta) - \frac{n}{2}\,\mathrm{Trace}(\Theta^\intercal S \Theta) + c$. The MLE $\hat\Theta = S^{-1}V$ of $\Theta$ is still not defined for $n < p$.
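A quick numerical check of the VAR(1) model and this MLE, under the assumptions above; the network, noise level and sample size are made up for illustration, and the final transpose only accounts for observations being stored as rows here.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 1000
Theta = np.array([[0.5, 0.4, 0.0],      # theta_ij: effect of gene j at t-1 on gene i at t
                  [0.0, 0.5, 0.4],
                  [0.0, 0.0, 0.5]])

X = np.zeros((n + 1, p))
X[0] = rng.standard_normal(p)           # X_0
for t in range(1, n + 1):               # X_t = Theta X_{t-1} + eps_t  (b = 0 here)
    X[t] = Theta @ X[t - 1] + 0.5 * rng.standard_normal(p)

S = X[:-1].T @ X[:-1] / n               # within-time covariance (rows X_0 .. X_{n-1})
V = X[:-1].T @ X[1:] / n                # across-time covariance
Theta_hat = np.linalg.solve(S, V).T     # MLE S^{-1} V, transposed for the row convention
print(np.round(Theta_hat, 2))
```

Since $n > p$ here, $S$ is invertible and the estimate is close to the true transition matrix; for $n < p$ the penalized criterion of the next slide is needed.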

  8. Penalized log-likelihood. Charbonnier, Chiquet, Ambroise, SAGMB 2010: $\hat\Theta_\lambda = \arg\max_\Theta L_{\mathrm{time}}(\Theta; S, V) - \lambda \|P_Z \star \Theta\|_{\ell_1}$, where $P_Z$ is a (non-symmetric) matrix of weights depending on the underlying clustering $Z$. Major difference with the i.i.d. case: the graph is directed, since $\theta_{ij} = \mathrm{cov}(X_t(i), X_{t-1}(j) \mid X_{t-1}(\mathcal{P}\setminus j)) / \mathrm{var}(X_{t-1}(j) \mid X_{t-1}(\mathcal{P}\setminus j)) \neq \mathrm{cov}(X_t(j), X_{t-1}(i) \mid X_{t-1}(\mathcal{P}\setminus i)) / \mathrm{var}(X_{t-1}(i) \mid X_{t-1}(\mathcal{P}\setminus i))$.




  12. Coupling related problems. Consider $T$ samples concerning the expressions of the same $p$ genes: $X^{(t)}_1, \dots, X^{(t)}_{n_t}$ is the $t$th sample, drawn from $\mathcal{N}(0_p, \Sigma^{(t)})$, with empirical covariance matrix $S^{(t)}$. Remarks: in the sequel, $Z$ is omitted for clarity (no loss of generality); multitask learning is easily adapted to time-course data, yet only the steady-state version is presented here. Ignoring the relationships between the tasks leads to $\arg\max_{\Theta^{(t)}, t = 1, \dots, T} \sum_{t=1}^T L(\Theta^{(t)}; S^{(t)}) - \lambda\, \mathrm{pen}_{\ell_1}(\Theta^{(t)}, Z)$. Breaking the separability: either by modifying the objective function, or the constraints.

  13. Coupling problems through the objective function. The Intertwined Lasso: $\max_{\Theta^{(t)}, t = 1, \dots, T} \sum_{t=1}^T \tilde L(\Theta^{(t)}; \tilde S^{(t)}) - \lambda \|\Theta^{(t)}\|_{\ell_1}$, where $\bar S = \frac{1}{n} \sum_{t=1}^T n_t S^{(t)}$ is an "across-task" covariance matrix and $\tilde S^{(t)} = \alpha S^{(t)} + (1-\alpha) \bar S$ is a mixture of the within- and across-task covariance matrices. Setting $\alpha = 0$ is equivalent to pooling all the data and inferring one common network; setting $\alpha = 1$ is equivalent to treating $T$ independent problems.
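The intertwined mixing is a one-liner once the task covariances are available. A small sketch, with `intertwined_covariances` a hypothetical helper and random data standing in for the expression samples (assumed centered):

```python
import numpy as np

def intertwined_covariances(samples, alpha=0.5):
    """Mix each task's empirical covariance S^(t) with the pooled covariance
    S_bar = (1/n) sum_t n_t S^(t):  S~(t) = alpha S^(t) + (1 - alpha) S_bar."""
    S_list = [X.T @ X / X.shape[0] for X in samples]
    n_t = np.array([X.shape[0] for X in samples])
    S_bar = sum(n * S for n, S in zip(n_t, S_list)) / n_t.sum()
    return [alpha * S + (1 - alpha) * S_bar for S in S_list], S_bar

rng = np.random.default_rng(3)
samples = [rng.standard_normal((30, 4)) for _ in range(3)]   # T = 3 tasks, p = 4
S_tilde, S_bar = intertwined_covariances(samples, alpha=0.5)

# the two limit cases of the slide:
S_pooled, _ = intertwined_covariances(samples, alpha=0.0)    # one common network
S_indep, _ = intertwined_covariances(samples, alpha=1.0)     # T independent problems
print(np.allclose(S_pooled[0], S_bar))
```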


  15. Coupling Through Penalties: Group-Lasso. We group parameters by sets of corresponding edges across graphs. Graphical Group-Lasso: $\max_{\Theta^{(t)}, t = 1, \dots, T} \sum_{t=1}^T \tilde L(\Theta^{(t)}; S^{(t)}) - \lambda \sum_{i,j \in \mathcal{P},\, i \neq j} \big( \sum_{t=1}^T (\theta^{(t)}_{ij})^2 \big)^{1/2}$. Most relationships between the genes are kept or removed across all tasks simultaneously.
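The penalty term above groups the $T$ copies of each edge into one $\ell_2$ norm. A minimal sketch of its computation (`group_lasso_penalty` is a hypothetical name; the two example matrices are made up):

```python
import numpy as np

def group_lasso_penalty(Thetas):
    """Sum, over off-diagonal pairs (i, j), of the l2 norm of the vector
    (theta_ij^(1), ..., theta_ij^(T)) across the T tasks."""
    Th = np.stack(Thetas)                    # shape (T, p, p)
    norms = np.sqrt((Th ** 2).sum(axis=0))   # per-edge group norm
    off = ~np.eye(Th.shape[1], dtype=bool)   # drop the diagonal (i = j)
    return norms[off].sum()

# two tasks on p = 3 nodes sharing edge (0,1); task 2 has an extra edge (1,2)
T1 = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.0]])
T2 = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]])
print(group_lasso_penalty([T1, T2]))
```

Because the whole group is penalized through one norm, shrinking a group to zero removes the edge from every task at once, which is exactly the behavior the slide describes.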


  21. A Geometric View of Sparsity: constrained optimization. The penalized problem $\max_{\beta_1, \beta_2} L(\beta_1, \beta_2) - \lambda\, \Omega(\beta_1, \beta_2)$ is equivalent to the constrained problem $\max_{\beta_1, \beta_2} L(\beta_1, \beta_2)$ s.t. $\Omega(\beta_1, \beta_2) \le c$.


  23. Group-Lasso balls. Admissible set for $T = 2$ tasks and $p = 2$ coefficients: $\sum_{j=1}^2 \big( \sum_{t=1}^2 (\beta^{(t)}_j)^2 \big)^{1/2} \le 1$. (Figure: cross-sections of the ball in the $(\beta^{(1)}_1, \beta^{(1)}_2)$ plane, at $\beta^{(2)}_1, \beta^{(2)}_2 \in \{0, 0.3\}$.)


  28. Coupling Through Penalties: Coop-Lasso. Same grouping, plus the bet that correlations are likely to be sign-consistent: gene interactions are either inhibitory or activating across assays. Graphical Coop-Lasso: $\max_{\Theta^{(t)}, t = 1, \dots, T} \sum_{t=1}^T \tilde L(\Theta^{(t)}; S^{(t)}) - \lambda \sum_{i,j \in \mathcal{P},\, i \neq j} \Big[ \big( \sum_{t=1}^T ([\theta^{(t)}_{ij}]_+)^2 \big)^{1/2} + \big( \sum_{t=1}^T ([\theta^{(t)}_{ij}]_-)^2 \big)^{1/2} \Big]$, where $[u]_+ = \max(0, u)$ and $[u]_- = \min(0, u)$. Inside a group, interactions are most likely sign-consistent; this is plausible in many other situations.
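The Coop-Lasso penalty splits each group into its positive and negative parts before taking norms, so sign-inconsistent groups are penalized more. A small sketch (`coop_lasso_penalty` is a hypothetical name; the toy matrices are made up):

```python
import numpy as np

def coop_lasso_penalty(Thetas):
    """For each off-diagonal pair (i, j): group norm of the positive parts
    plus group norm of the negative parts, with [u]+ = max(0, u), [u]- = min(0, u)."""
    Th = np.stack(Thetas)                                    # (T, p, p)
    pos = np.sqrt((np.maximum(Th, 0.0) ** 2).sum(axis=0))
    neg = np.sqrt((np.minimum(Th, 0.0) ** 2).sum(axis=0))
    off = ~np.eye(Th.shape[1], dtype=bool)
    return (pos + neg)[off].sum()

# one edge across two tasks: same sign vs. opposite signs
A = np.array([[0.0, 0.3], [0.3, 0.0]])
B = np.array([[0.0, 0.4], [0.4, 0.0]])
consistent = coop_lasso_penalty([A, B])      # both interactions activating
flipped = coop_lasso_penalty([A, -B])        # activating in one task, inhibitory in the other
print(consistent, flipped)
```

With the same coefficient magnitudes, the sign-consistent pair pays $\sqrt{0.3^2 + 0.4^2} = 0.5$ per entry while the flipped pair pays $0.3 + 0.4 = 0.7$, illustrating the "cooperation" encouraged by the penalty.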

  35. Coop-Lasso balls. Admissible set for $T = 2$ tasks and $p = 2$ coefficients: $\sum_{j=1}^2 \big( \sum_{t=1}^2 ([\beta^{(t)}_j]_+)^2 \big)^{1/2} + \sum_{j=1}^2 \big( \sum_{t=1}^2 ([\beta^{(t)}_j]_-)^2 \big)^{1/2} \le 1$. (Figure: cross-sections of the ball in the $(\beta^{(1)}_1, \beta^{(1)}_2)$ plane, at $\beta^{(2)}_1, \beta^{(2)}_2 \in \{0, 0.3\}$.)


  42. The overall strategy. Our basic criterion is of the form $L(\Theta; \text{data}) - \lambda \|P_Z \star \Theta\|_{\ell_1}$. What we are looking for: the edges, through $\Theta$; the correct level of sparsity $\lambda$; the underlying clustering $Z$, with connectivity matrix $\pi_Z$. What SIMoNe does: 1. infer a family of networks $\mathcal{G} = \{\hat\Theta_\lambda : \lambda \in [\lambda_{\max}, 0]\}$; 2. select the $\mathcal{G}^\star$ that maximizes an information criterion; 3. learn $Z$ on the selected network $\mathcal{G}^\star$; 4. infer a family of networks with $P_Z \propto 1 - \pi_Z$; 5. select the $\mathcal{G}^\star_Z$ that maximizes an information criterion.
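Step 4 turns the estimated clustering into penalty weights. A minimal sketch of one way to build $P_Z \propto 1 - \pi_Z$ (the helper name `penalty_matrix`, the labels and the connectivity values are illustrative, not taken from the package):

```python
import numpy as np

def penalty_matrix(Z, pi):
    """Penalty weights P_Z proportional to 1 - pi_Z: pairs of nodes lying in
    densely connected classes receive a smaller penalty."""
    P = 1.0 - pi[np.ix_(Z, Z)]        # entry (i, j) = 1 - pi_{Z_i, Z_j}
    np.fill_diagonal(P, 0.0)          # no penalty on the diagonal
    return P

Z = np.array([0, 0, 1, 1])                   # cluster labels of the p = 4 nodes
pi = np.array([[0.8, 0.1],                   # within/between-class connectivity
               [0.1, 0.6]])
P = penalty_matrix(Z, pi)
print(P)
```

Feeding such a matrix back into the penalized criterion is what lets the second inference pass favor edges that the latent structure makes plausible.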

  44. SIMoNe in pictures. Suppose you want to recover a clustered network (target network and its adjacency matrix). Start from microarray data: a first run of SIMoNe without prior yields the adjacency matrix corresponding to $\mathcal{G}^\star$; Mixer then estimates the connectivity matrix $\pi_Z$, which a decreasing transformation turns into the penalty matrix $P_Z$; a second run of SIMoNe with this prior yields the adjacency matrix corresponding to $\mathcal{G}^\star_Z$.


  50. Tuning the penalty parameter: what does the literature say? Theory-based penalty choices: 1. the optimal order of the penalty in the $p \gg n$ framework is $\sqrt{n \log p}$ (Bunea et al. 2007, Bickel et al. 2009); 2. control of the probability of connecting two distinct connectivity sets (Meinshausen et al. 2006, Banerjee et al. 2008, Ambroise et al. 2009), which is in practice much too conservative. Cross-validation: optimal in terms of prediction, not in terms of selection; problematic with small samples, since it changes the sparsity constraint due to the sample size.

  51. Tuning the penalty parameter: BIC / AIC. Theorem (Zou et al. 2008): $\mathrm{df}(\hat\beta^{\mathrm{lasso}}_\lambda) = \|\hat\beta^{\mathrm{lasso}}_\lambda\|_0$. Straightforward extensions to the graphical framework: $\mathrm{BIC}(\lambda) = L(\hat\Theta_\lambda; \mathbf{X}) - \mathrm{df}(\hat\Theta_\lambda) \frac{\log n}{2}$ and $\mathrm{AIC}(\lambda) = L(\hat\Theta_\lambda; \mathbf{X}) - \mathrm{df}(\hat\Theta_\lambda)$. These criteria rely on asymptotic approximations, but remain relevant for small data sets, and are easily adapted to $L_{\mathrm{iid}}$, $\tilde L_{\mathrm{iid}}$, $L_{\mathrm{time}}$ and the multitask framework.
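A sketch of these criteria, assuming the degrees of freedom are counted as the number of non-zero entries of $\hat\Theta_\lambda$ (a simplification; `bic_aic` and `loglik_iid` are hypothetical names, and the two candidate matrices are made up rather than Lasso solutions):

```python
import numpy as np

def loglik_iid(Theta, S, n):
    sign, logdet = np.linalg.slogdet(Theta)
    return 0.5 * n * (logdet - np.trace(S @ Theta) - np.log(2 * np.pi))

def bic_aic(Theta, S, n):
    """df = number of non-zero coefficients (Zou et al. 2008):
    BIC penalizes each by log(n)/2, AIC by 1."""
    df = np.count_nonzero(Theta)
    L = loglik_iid(Theta, S, n)
    return L - df * np.log(n) / 2, L - df

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 3))
S = X.T @ X / 40
dense = np.linalg.inv(S + 0.05 * np.eye(3))
sparse = np.diag(np.diag(dense))             # keep the diagonal only
bic_d, aic_d = bic_aic(dense, S, 40)
bic_s, aic_s = bic_aic(sparse, S, 40)
print(bic_d, bic_s)
```

Scanning $\lambda$ over the regularization path and keeping the maximizer of BIC (or AIC) gives the selection rule used on the previous slides.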


  53. MixNet: Erdős-Rényi Mixture for Networks. The data is now the network itself, i.e. the adjacency matrix associated with $\Theta$: $A = (a_{ij})_{i,j \in \mathcal{P}}$ with $a_{ij} = 1_{\{\theta_{ij} \neq 0\}}$. (Figure: a network with $n = 10$ nodes colored by class, e.g. $Z_{5\bullet} = 1$, $a_{12} = 1$, $a_{15} = 0$, with between-class connection probabilities $\pi_{\bullet\bullet}$.) Binary case: $Q$ groups (= colors); the $\{Z_i\}_{1 \le i \le n}$ are i.i.d. vectors $Z_i = (Z_{i1}, \dots, Z_{iQ}) \sim \mathcal{M}(1, \alpha)$; conditional on the $\{Z_i\}$, the random variables $A_{ij}$ are independent $\mathcal{B}(\pi_{Z_i Z_j})$.
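The generative model is easy to simulate, which is a good way to see what the latent structure looks like. A minimal sketch for an undirected binary graph (the proportions and connectivity matrix are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, Q = 300, 2
alpha = np.array([0.3, 0.7])              # class proportions, Z_i ~ M(1, alpha)
pi = np.array([[0.25, 0.02],
               [0.02, 0.10]])             # connection probabilities pi_qr

Z = rng.choice(Q, size=n, p=alpha)        # latent class of each node
U = rng.random((n, n))
A = (U < pi[np.ix_(Z, Z)]).astype(int)    # A_ij ~ B(pi_{Z_i Z_j}), conditionally independent
A = np.triu(A, 1)
A = A + A.T                               # undirected graph: symmetric, empty diagonal

idx0 = np.where(Z == 0)[0]                # empirical edge density inside class 0
sub = A[np.ix_(idx0, idx0)]
density00 = sub.sum() / (len(idx0) * (len(idx0) - 1))
print(round(density00, 3))
```

The within-class density concentrates around $\pi_{00}$, which is the kind of signal the estimation strategy of the next slides tries to recover from $A$ alone.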

  57. Estimation strategy. Likelihoods: the observed data, $P(A \mid \alpha, \pi) = \sum_Z P(A, Z \mid \alpha, \pi)$; the complete data, $P(A, Z \mid \alpha, \pi)$. The EM criterion $\mathbb{E}\big[\log P(A, Z \mid \alpha, \pi) \mid A\big]$ requires $P(Z \mid A, \alpha, \pi)$, which is not tractable!

  58. Variational inference. Principle: approximate $P(Z \mid A, \alpha, \pi)$ by $R_\tau(Z)$, chosen to minimize $\mathrm{KL}(R_\tau(Z); P(Z \mid A, \alpha, \pi))$, where $R_\tau$ is such that $\log R_\tau(Z) = \sum_{iq} Z_{iq} \log \tau_{iq}$ and the $\tau$ are the variational parameters to optimize. Variational Bayes (Latouche et al.): appropriate priors on $\alpha$ and $\pi$; good performance, especially for the choice of $Q$, which makes it relevant in the SIMoNe context.


  60. Example 1: time-course data with star pattern. Simulation settings: 1. 50 networks with $p = 100$ nodes, time series of length $n = 100$; 2. two classes, hubs and leaves, with proportions $\alpha = (0.1, 0.9)$; 3. $P(\text{hub to leaf}) = 0.3$, $P(\text{hub to hub}) = 0.1$, 0 otherwise. (Figures: boxplots of precision = TP/(TP+FP), recall = TP/P (power) and fallout = FP/N (type-I error), with and without structure inference, under BIC and AIC.)
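The precision/recall/fallout scores used throughout these experiments can be computed directly from the true and estimated adjacency matrices. A small sketch (`edge_scores` is a hypothetical helper; scores are computed over the ordered off-diagonal entries):

```python
import numpy as np

def edge_scores(A_true, A_hat):
    """precision = TP/(TP+FP), recall = TP/P (power), fallout = FP/N (type-I error)."""
    off = ~np.eye(A_true.shape[0], dtype=bool)       # ignore the diagonal
    t, h = A_true[off].astype(bool), A_hat[off].astype(bool)
    TP = np.sum(t & h)
    FP = np.sum(~t & h)
    P, N = np.sum(t), np.sum(~t)
    return TP / max(TP + FP, 1), TP / max(P, 1), FP / max(N, 1)

A_true = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A_hat  = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])   # one hit, one miss, one false alarm
prec, rec, fall = edge_scores(A_true, A_hat)
print(prec, rec, fall)
```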

  64. Example 2: steady-state, multitask framework. Simulating the tasks: 1. generate an "ancestor" with $p = 20$ nodes and $K = 20$ edges; 2. generate $T = 4$ children by adding and deleting $\delta$ edges; 3. generate $T = 4$ Gaussian samples. (Figure: ancestor and children with $\delta$ perturbations.)

  68. Results. Precision/recall curves (precision = TP/(TP+FP), recall = TP/P (power)) comparing the CoopLasso, GroupLasso, Intertwined, Independent and Pooled strategies as $\lambda$ decreases from $\lambda_{\max}$ to 0. (Figures: curves for large ($n_t = 100$), medium ($n_t = 50$) and small ($n_t = 25$) sample sizes, each with perturbations $\delta = 1$ and $\delta = 5$.)


  76. Breast cancer: prediction of the outcome of preoperative chemotherapy. Two types of patients: the response can be classified as 1. a pathologic complete response (PCR), or 2. residual disease (not PCR). Gene expression data: 133 patients (99 not PCR, 34 PCR); 26 genes identified by differential analysis.
