Neighborhood vs. Likelihood

Pseudo-likelihood (Besag, 1975)
\[ P(X_1, \dots, X_p) \simeq \prod_{j=1}^p P\big(X_j \mid \{X_k\}_{k \neq j}\big) \]
The corresponding pseudo-log-likelihood and full log-likelihood are
\[ \tilde{L}(\Theta; S) = \frac{n}{2} \log\det(D) - \frac{n}{2} \operatorname{trace}\big(S D^{-1} \Theta^2\big) - \frac{n}{2}\log(2\pi), \]
\[ L(\Theta; S) = \frac{n}{2} \log\det(\Theta) - \frac{n}{2} \operatorname{trace}(S\Theta) - \frac{n}{2}\log(2\pi), \]
with D = diag(Θ).

Proposition: neighborhood selection leads to the graph maximizing the penalized pseudo-log-likelihood.
Proof: \( \hat{\beta}_j = -\big( \hat{\theta}_{ij} / \hat{\theta}_{jj} \big)_{i \neq j} \), where \( \hat{\Theta} \) maximizes the penalized pseudo-log-likelihood.

SIMoNe: inferring structured Gaussian networks 13
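The proposition can be illustrated numerically. Below is a minimal sketch of neighborhood selection in the Meinshausen–Bühlmann style: each variable is regressed on all the others with a bare-bones coordinate-descent Lasso, and an edge is kept when both directions agree (the conservative AND rule). The toy chain-graph data, the regularization level, and all function names are illustrative choices, not the SIMoNe implementation.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            resid = y - X @ beta + X[:, j] * beta[j]  # partial residual
            z = X[:, j] @ resid
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]
    return beta

def neighborhood_selection(X, lam):
    """Regress each variable on all others; keep edge (i, j) only if
    both regression coefficients are nonzero (AND rule)."""
    n, p = X.shape
    B = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        B[j, others] = lasso_cd(X[:, others], X[:, j], lam)
    return ((B != 0) & (B.T != 0)).astype(int)

rng = np.random.default_rng(0)
n = 400
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)      # chain graph 1 - 2 - 3
x3 = x2 + 0.5 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
X -= X.mean(axis=0)
A = neighborhood_selection(X, lam=0.1 * n)
print(A)
```

With enough samples and a suitable penalty, the direct edges of the chain survive while the indirect 1–3 association, explained away by conditioning on variable 2, is shrunk to zero.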
Penalized log-likelihood

Banerjee et al., JMLR 2008
\[ \hat{\Theta}_\lambda = \arg\max_{\Theta} \; L_{\mathrm{iid}}(\Theta; S) - \lambda \|\Theta\|_{\ell_1}, \]
efficiently solved by the graphical Lasso of Friedman et al., 2008.

Ambroise, Chiquet, Matias, EJS 2009
Use adaptive penalty parameters for the different coefficients:
\[ \tilde{L}_{\mathrm{iid}}(\Theta; S) - \lambda \|P_Z \star \Theta\|_{\ell_1}, \]
where P_Z is a matrix of weights depending on the underlying clustering Z. This works with the pseudo-log-likelihood (computationally efficient).
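As a rough illustration of what the weighted penalty ‖P_Z ⋆ Θ‖ induces, here is a minimal numpy sketch (not the SIMoNe implementation, which is an R package) of the elementwise-weighted soft-thresholding step that appears inside coordinate-wise solvers; the weight matrix and λ are made-up toy values.

```python
import numpy as np

def weighted_soft_threshold(theta, weights, lam):
    """Proximal operator of lam * ||P ⋆ Θ||_1: elementwise shrinkage,
    where entries with larger weights are shrunk more aggressively."""
    return np.sign(theta) * np.maximum(np.abs(theta) - lam * weights, 0.0)

theta = np.array([[ 2.0, -0.5],
                  [-0.5,  1.5]])
pz = np.array([[1.0, 4.0],     # hypothetical cluster-based weights
               [4.0, 1.0]])
shrunk = weighted_soft_threshold(theta, pz, lam=0.2)
print(shrunk)   # the off-diagonal entries are penalized 4x harder
```

Entries whose weight-scaled penalty exceeds their magnitude are set exactly to zero, which is how the clustering prior sparsifies inter-cluster edges more strongly.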
Outline
  Statistical models
    Steady-state data
    Time-course data
    Multitask learning
      Group-Lasso
      Coop-Lasso
  Algorithms and methods
    Overall view
    Model selection
    Latent structure
  Numerical experiments
    Performance on simulated data
    R package demo: the breast cancer data set
The Gaussian model for time-course data (1)

Let X_1, ..., X_n be a first-order vector autoregressive process:
\[ X_t = \Theta X_{t-1} + b + \varepsilon_t, \qquad t \in [1, n], \]
where we are looking for Θ = (θ_ij)_{i,j ∈ P} and
- X_0 ~ N(0_p, Σ_0),
- ε_t is a Gaussian white noise with covariance σ² I_p,
- cov(X_t, ε_s) = 0 for s > t, so that X_t is Markovian.

Graphical interpretation: since
\[ \theta_{ij} = \frac{\operatorname{cov}\big(X_t(i), X_{t-1}(j) \mid X_{t-1}(\mathcal{P}\setminus j)\big)}{\operatorname{var}\big(X_{t-1}(j) \mid X_{t-1}(\mathcal{P}\setminus j)\big)}, \]
\[ \theta_{ij} = 0 \;\Leftrightarrow\; X_t(i) \perp\!\!\!\perp X_{t-1}(j) \mid X_{t-1}(\mathcal{P}\setminus j) \;\Leftrightarrow\; \text{edge } (j \to i) \notin \text{network}. \]
The Gaussian model for time-course data (2)

Interpretation: a homogeneous Markov process (figure).
The Gaussian model for time-course data (3)

Let
- X be the n × p matrix whose k-th row is X_k,
- S = n^{-1} X_{∖n}ᵀ X_{∖n} be the within-time covariance matrix,
- V = n^{-1} X_{∖n}ᵀ X_{∖0} be the across-time covariance matrix.

The log-likelihood:
\[ L_{\mathrm{time}}(\Theta; S, V) = n \operatorname{Trace}(V\Theta) - \frac{n}{2} \operatorname{Trace}\big(\Theta^\intercal S \Theta\big) + c. \]
The MLE \( \hat{\Theta} = S^{-1}V \) of Θ is still not defined for n < p.
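A minimal sketch of this model and its unpenalized estimator: simulate a stable VAR(1) process (the Θ matrix, noise level, and sample size below are made-up toy values), form the within-time and across-time covariances, and check that with n ≫ p the estimator S⁻¹V recovers Θ.

```python
import numpy as np

rng = np.random.default_rng(42)
p, n = 3, 5000
Theta = np.array([[0.5, 0.2, 0.0],     # made-up stable VAR(1) matrix
                  [0.0, 0.4, 0.0],
                  [0.0, 0.3, 0.3]])

x = np.zeros((n + 1, p))
for t in range(1, n + 1):
    x[t] = Theta @ x[t - 1] + rng.normal(size=p)   # b = 0, sigma = 1

Xpast, Xfut = x[:-1], x[1:]
S = Xpast.T @ Xpast / n        # within-time covariance
V = Xpast.T @ Xfut / n         # across-time covariance
Theta_hat = np.linalg.solve(S, V).T   # MLE S^{-1} V (up to transpose convention)
print(np.round(Theta_hat, 2))
```

For n < p the matrix S becomes singular and this estimator no longer exists, which is exactly what motivates the penalized version on the next slide.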
Penalized log-likelihood

Charbonnier, Chiquet, Ambroise, SAGMB 2010
\[ \hat{\Theta}_\lambda = \arg\max_{\Theta} \; L_{\mathrm{time}}(\Theta; S, V) - \lambda \|P_Z \star \Theta\|_{\ell_1}, \]
where P_Z is a (non-symmetric) matrix of weights depending on the underlying clustering Z.

Major difference with the i.i.d. case: the graph is directed, since
\[ \theta_{ij} = \frac{\operatorname{cov}\big(X_t(i), X_{t-1}(j) \mid X_{t-1}(\mathcal{P}\setminus j)\big)}{\operatorname{var}\big(X_{t-1}(j) \mid X_{t-1}(\mathcal{P}\setminus j)\big)} \neq \frac{\operatorname{cov}\big(X_t(j), X_{t-1}(i) \mid X_{t-1}(\mathcal{P}\setminus i)\big)}{\operatorname{var}\big(X_{t-1}(i) \mid X_{t-1}(\mathcal{P}\setminus i)\big)}. \]
Coupling related problems

Consider
- T samples concerning the expressions of the same p genes,
- X^(t)_1, ..., X^(t)_{n_t}, the t-th sample, drawn from N(0_p, Σ^(t)), with empirical covariance matrix S^(t).

Multiple samples setup: ignoring the relationships between the tasks leads to
\[ \arg\max_{\Theta^{(t)},\, t = 1, \dots, T} \; \sum_{t=1}^T L\big(\Theta^{(t)}; S^{(t)}\big) - \lambda \operatorname{pen}_{\ell_1}\big(\Theta^{(t)}, Z\big). \]

Breaking the separability:
- either by modifying the objective function,
- or the constraints.
Remarks
- In the sequel, Z is omitted for clarity (no loss of generality).
- Multitask learning is easily adapted to time-course data, yet only the steady-state version is presented here.
Coupling problems through the objective function

The Intertwined Lasso
\[ \max_{\Theta^{(t)},\, t = 1, \dots, T} \; \sum_{t=1}^T \tilde{L}\big(\Theta^{(t)}; \tilde{S}^{(t)}\big) - \lambda \|\Theta^{(t)}\|_{\ell_1} \]
- \( \bar{S} = \frac{1}{n} \sum_{t=1}^T n_t S^{(t)} \) is an "across-task" covariance matrix,
- \( \tilde{S}^{(t)} = \alpha S^{(t)} + (1 - \alpha)\bar{S} \) is a mixture of the inner-task and across-task covariance matrices.

Setting α = 0 is equivalent to pooling all the data and inferring one common network; setting α = 1 is equivalent to treating T independent problems.
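The covariance blending above is easy to state in code. A minimal sketch (the two task covariances and sample sizes are toy values) that also checks the two limiting cases α = 1 (independent tasks) and α = 0 (pooled data):

```python
import numpy as np

def intertwined(S_list, n_list, alpha):
    """Blend each task covariance with the pooled one:
    S~(t) = alpha * S(t) + (1 - alpha) * Sbar."""
    n = sum(n_list)
    Sbar = sum(nt * St for St, nt in zip(S_list, n_list)) / n
    return [alpha * St + (1 - alpha) * Sbar for St in S_list]

S1 = np.array([[1.0, 0.8], [0.8, 1.0]])   # strongly correlated task
S2 = np.array([[1.0, 0.0], [0.0, 1.0]])   # uncorrelated task
mixed = intertwined([S1, S2], [100, 100], alpha=0.5)
print(mixed[0])
```

Each task then runs the usual ℓ1-penalized estimation on its blended matrix S̃^(t), so information is shared without forcing a single common network.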
Coupling Through Penalties: Group-Lasso

We group parameters by sets of corresponding edges across graphs (figure: the same edge highlighted across several example graphs on nodes X_1, ..., X_4).

Graphical Group-Lasso
\[ \max_{\Theta^{(t)},\, t = 1, \dots, T} \; \sum_{t=1}^T \tilde{L}\big(\Theta^{(t)}; S^{(t)}\big) - \lambda \sum_{\substack{i,j \in \mathcal{P} \\ i \neq j}} \left( \sum_{t=1}^T \big(\theta^{(t)}_{ij}\big)^2 \right)^{1/2}. \]
Most relationships between the genes are kept or removed across all tasks simultaneously.
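In proximal or coordinate-wise algorithms, the group penalty above acts on the vector of coefficients θ_ij^(1), ..., θ_ij^(T) of one edge through groupwise soft-thresholding. A minimal sketch of that generic operator (not SIMoNe's actual solver; the vectors and λ are toy values):

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Prox of lam * ||v||_2: shrink the whole group toward zero;
    the group vanishes entirely when ||v||_2 <= lam."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

edge_across_tasks = np.array([3.0, 4.0])   # theta_ij over T = 2 tasks
strong = group_soft_threshold(edge_across_tasks, lam=1.0)  # scaled by 1 - 1/5
weak = group_soft_threshold(np.array([0.3, -0.2]), lam=1.0)  # whole group -> 0
print(strong, weak)
```

This all-or-nothing behavior at the group level is precisely why an edge is kept or removed across all tasks simultaneously.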
A Geometric View of Sparsity: Constrained Optimization
\[ \max_{\beta_1, \beta_2} \; L(\beta_1, \beta_2) - \lambda \,\Omega(\beta_1, \beta_2) \quad\Leftrightarrow\quad \max_{\beta_1, \beta_2} \; L(\beta_1, \beta_2) \;\; \text{s.t.} \;\; \Omega(\beta_1, \beta_2) \leq c \]
(Figure: level sets of L(β_1, β_2) meeting the admissible set Ω(β_1, β_2) ≤ c.)
Group-Lasso balls

Admissible set for 2 tasks (T = 2) and 2 coefficients (p = 2):
\[ \sum_{j=1}^2 \left( \sum_{t=1}^2 \big(\beta^{(t)}_j\big)^2 \right)^{1/2} \leq 1. \]
(Figure: cross-sections of the ball in the (β^(1)_1, β^(1)_2) plane, at β^(2)_2 ∈ {0, 0.3} and β^(2)_1 ∈ {0, 0.3}.)
Coupling Through Penalties: Coop-Lasso

Same grouping, plus the bet that interactions are likely to be sign-consistent: gene interactions are either inhibitory or activating across all assays.

Graphical Coop-Lasso
\[ \max_{\Theta^{(t)},\, t = 1, \dots, T} \; \sum_{t=1}^T \tilde{L}\big(\Theta^{(t)}; S^{(t)}\big) - \lambda \sum_{\substack{i,j \in \mathcal{P} \\ i \neq j}} \left[ \left( \sum_{t=1}^T \big[\theta^{(t)}_{ij}\big]_+^2 \right)^{1/2} + \left( \sum_{t=1}^T \big[\theta^{(t)}_{ij}\big]_-^2 \right)^{1/2} \right], \]
where [u]_+ = max(0, u) and [u]_− = min(0, u).

- Inside a group, interactions are most likely sign-consistent.
- Plausible in many other situations.
Coop-Lasso balls

Admissible set for 2 tasks (T = 2) and 2 coefficients (p = 2):
\[ \sum_{j=1}^2 \left( \sum_{t=1}^2 \big[\beta^{(t)}_j\big]_+^2 \right)^{1/2} + \sum_{j=1}^2 \left( \sum_{t=1}^2 \big[-\beta^{(t)}_j\big]_+^2 \right)^{1/2} \leq 1. \]
(Figure: cross-sections of the ball in the (β^(1)_1, β^(1)_2) plane, at β^(2)_2 ∈ {0, 0.3} and β^(2)_1 ∈ {0, 0.3}.)
The overall strategy

Our basic criterion is of the form
\[ L(\Theta; \text{data}) - \lambda \|P_Z \star \Theta\|_{\ell_1}. \]

What we are looking for
- the edges, through Θ,
- the correct level of sparsity λ,
- the underlying clustering Z, with connectivity matrix π_Z.

What SIMoNe does
1. Infer a family of networks G = {Θ̂_λ : λ ∈ [λ_max, 0]}
2. Select the G⋆ that maximizes an information criterion
3. Learn Z on the selected network G⋆
4. Infer a family of networks with P_Z ∝ 1 − π_Z
5. Select the G⋆_Z that maximizes an information criterion
SIMoNe

Suppose you want to recover a clustered network (figures: target network and its adjacency matrix). Starting from microarray data, a first SIMoNe run without prior yields the adjacency matrix corresponding to G⋆; Mixer then estimates the connectivity matrix π_Z, which a decreasing transformation turns into the penalty matrix P_Z; a second run with this prior yields the adjacency matrix corresponding to G⋆_Z.
Tuning the penalty parameter: what does the literature say?

Theory-based penalty choices
1. Optimal order of the penalty in the p ≫ n framework: √(n log p) (Bunea et al. 2007, Bickel et al. 2009).
2. Control of the probability of connecting two distinct connectivity sets (Meinshausen et al. 2006, Banerjee et al. 2008, Ambroise et al. 2009) — practically much too conservative.

Cross-validation
- Optimal in terms of prediction, not in terms of selection.
- Problematic with small samples: it changes the sparsity constraint due to the reduced sample size.
Tuning the penalty parameter: BIC / AIC

Theorem (Zou et al. 2008)
\[ \widehat{\mathrm{df}}\big(\hat{\beta}^{\mathrm{lasso}}_\lambda\big) = \big\|\hat{\beta}^{\mathrm{lasso}}_\lambda\big\|_0 \]

Straightforward extensions to the graphical framework:
\[ \mathrm{BIC}(\lambda) = L\big(\hat{\Theta}_\lambda; X\big) - \mathrm{df}\big(\hat{\Theta}_\lambda\big) \frac{\log n}{2}, \qquad \mathrm{AIC}(\lambda) = L\big(\hat{\Theta}_\lambda; X\big) - \mathrm{df}\big(\hat{\Theta}_\lambda\big). \]

- These rely on asymptotic approximations, but remain relevant for small data sets.
- Easily adapted to L_iid, L̃_iid, L_time, and the multitask framework.
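With the degrees of freedom counted as the number of nonzero estimated coefficients (following the ℓ0 result above), model selection along a λ path reduces to a one-liner. A minimal sketch in which the path values (log-likelihoods and edge counts) are made-up numbers, not results from a real fit:

```python
import numpy as np

def bic(loglik, df, n):
    """BIC(lambda) = L - df * log(n) / 2 (to be maximized)."""
    return loglik - df * np.log(n) / 2.0

def aic(loglik, df):
    """AIC(lambda) = L - df (to be maximized)."""
    return loglik - df

# hypothetical path: (lambda, fitted log-likelihood, nonzero coefficient count)
path = [(1.0, -320.0, 2), (0.5, -300.0, 6), (0.1, -295.0, 20)]
n = 50
bics = [bic(ll, df, n) for _, ll, df in path]
best = max(range(len(path)), key=lambda k: bics[k])
print(path[best][0])   # lambda selected by BIC
```

Here the middle model wins: the densest fit gains too little likelihood to pay for its 20 parameters at the BIC rate log(n)/2.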
MixNet: Erdős–Rényi Mixture for Networks

The data is now the network itself: A = (a_ij)_{i,j ∈ P}, the adjacency matrix associated with Θ, with a_ij = 1_{θ_ij ≠ 0}.
(Figure: a 10-node graph colored by class; e.g. n = 10, a_12 = 1, a_15 = 0, and node 5 belongs to class •, i.e. Z_5• = 1.)

Binary case
- Q groups (= colors).
- {Z_i}_{1 ≤ i ≤ n} are i.i.d. vectors, Z_i = (Z_i1, ..., Z_iQ) ~ M(1, α).
- Conditional on the {Z_i}, the random variables A_ij are independent B(π_{Z_i Z_j}).
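The generative model is short enough to write down directly: draw each node's class from M(1, α), then each edge from B(π_{z_i z_j}). A minimal sketch with hub/leaf-style toy parameters (echoing the simulation settings used later; the numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, Q = 200, 2
alpha = np.array([0.1, 0.9])             # hub and leaf class proportions
pi = np.array([[0.1, 0.3],               # pi[q, l] = P(edge | classes q, l)
               [0.3, 0.0]])              # leaf-leaf connections forbidden

z = rng.choice(Q, size=n, p=alpha)       # latent classes Z_i ~ M(1, alpha)
A = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):            # undirected, no self-loops
        A[i, j] = A[j, i] = rng.random() < pi[z[i], z[j]]
print(A.sum() // 2, "edges")
```

SIMoNe goes the other way: it observes A and must recover z, α, and π.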
Estimation strategy

Likelihoods
- the observed data: P(A | α, π) = Σ_Z P(A, Z | α, π),
- the complete data: P(A, Z | α, π).

The EM criterion:
\[ \mathbb{E}\big[ \log P(A, Z \mid \alpha, \pi) \mid A \big]. \]
This requires P(Z | A, α, π), which is not tractable!
Variational inference

Principle: approximate P(Z | A, α, π) by R_τ(Z), chosen to minimize
\[ \mathrm{KL}\big( R_\tau(Z);\; P(Z \mid A, \alpha, \pi) \big), \]
where R_τ is such that log R_τ(Z) = Σ_{iq} Z_iq log τ_iq and the τ are the variational parameters to optimize.

Variational Bayes (Latouche et al.)
- Appropriate priors on α and π.
- Good performance, especially for the choice of Q, and thus relevant in the SIMoNe context.
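Under the factorized form of R_τ, the standard mean-field fixed-point update for the τ in the binary undirected case is log τ_iq ∝ log α_q + Σ_{j≠i} Σ_l τ_jl log B(A_ij; π_ql). A minimal sketch of one such sweep, applied to a toy two-triangle graph with an informative start (this is the generic SBM update, not SIMoNe's exact code; all parameter values are made up):

```python
import numpy as np

def update_tau(A, tau, alpha, pi, eps=1e-10):
    """One mean-field fixed-point sweep for class memberships:
    log tau_iq = log alpha_q + sum_{j != i} sum_l tau_jl * log B(A_ij; pi_ql),
    then normalize each row."""
    n, Q = tau.shape
    logB = np.log(pi + eps)        # term when A_ij = 1
    log1mB = np.log(1 - pi + eps)  # term when A_ij = 0
    new = np.zeros_like(tau)
    for i in range(n):
        lt = np.log(alpha + eps)
        for j in range(n):
            if j == i:
                continue
            lt = lt + tau[j] @ (logB.T if A[i, j] else log1mB.T)
        lt -= lt.max()                       # stabilize before exponentiating
        new[i] = np.exp(lt) / np.exp(lt).sum()
    return new

# two clean triangles: nodes {0, 1, 2} and {3, 4, 5}
A = np.zeros((6, 6), dtype=int)
for grp in ([0, 1, 2], [3, 4, 5]):
    for a in grp:
        for b in grp:
            if a != b:
                A[a, b] = 1
alpha = np.array([0.5, 0.5])
pi = np.array([[0.9, 0.05], [0.05, 0.9]])
tau = np.array([[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3)  # informative start
tau = update_tau(A, tau, alpha, pi)
print(tau.argmax(axis=1))   # → [0 0 0 1 1 1]
```

In practice the τ updates alternate with closed-form M-steps for α and π until the variational bound stabilizes.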
Example 1: time-course data with a star pattern

Simulation settings
1. 50 networks with p = 100 nodes, time series of length n = 100,
2. two classes, hubs and leaves, with proportions α = (0.1, 0.9),
3. P(hub to leaf) = 0.3, P(hub to hub) = 0.1, 0 otherwise.
(Boxplots over the 50 simulations of precision = TP/(TP+FP), recall = TP/P (power), and fallout = FP/N (type I error), each without ("wocl") and with ("wcl") structure inference, under BIC and AIC.)
Example 2: steady-state, multitask framework

Simulating the tasks
1. generate an "ancestor" with p = 20 nodes and K = 20 edges,
2. generate T = 4 children by adding and deleting δ edges,
3. generate T = 4 Gaussian samples.

Figure: ancestor and children with δ perturbations.
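The ancestor/children protocol can be sketched directly (an illustrative reconstruction of the described simulation, not the authors' exact code): draw a random graph with K edges, then derive each child by deleting δ existing edges and adding δ absent ones, so every child keeps K edges while differing from the ancestor in exactly 2δ edges.

```python
import numpy as np

def random_graph(p, K, rng):
    """Ancestor: p nodes, K undirected edges chosen uniformly at random."""
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    A = np.zeros((p, p), dtype=int)
    for k in rng.choice(len(pairs), size=K, replace=False):
        i, j = pairs[k]
        A[i, j] = A[j, i] = 1
    return A

def perturb(A, delta, rng):
    """Child: delete delta existing edges and add delta absent ones."""
    B = A.copy()
    p = len(B)
    ones = [(i, j) for i in range(p) for j in range(i + 1, p) if B[i, j] == 1]
    zeros = [(i, j) for i in range(p) for j in range(i + 1, p) if B[i, j] == 0]
    for k in rng.choice(len(ones), size=delta, replace=False):
        i, j = ones[k]
        B[i, j] = B[j, i] = 0
    for k in rng.choice(len(zeros), size=delta, replace=False):
        i, j = zeros[k]
        B[i, j] = B[j, i] = 1
    return B

rng = np.random.default_rng(0)
ancestor = random_graph(p=20, K=20, rng=rng)
children = [perturb(ancestor, delta=5, rng=rng) for _ in range(4)]
```

The Gaussian samples of step 3 would then be drawn from precision matrices built on these child graphs.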
Results: precision/recall curves, with precision = TP/(TP+FP) and recall = TP/P (power).
(Figures: precision/recall curves as λ decreases from λ_max to 0, comparing the CoopLasso, GroupLasso, Intertwined, Independent, and Pooled estimators, for large (n_t = 100), medium (n_t = 50), and small (n_t = 25) sample sizes, each with δ = 1 and δ = 5 perturbations.)
Breast cancer: prediction of the outcome of preoperative chemotherapy

Two types of patients: patient response can be classified as
1. either a pathologic complete response (PCR),
2. or residual disease (not PCR).

Gene expression data
- 133 patients (99 not PCR, 34 PCR)
- 26 identified genes (differential analysis)