Kernel based methods for microarray and mass spectrometry data analysis

Fabian Ojeda
ESAT-SCD-SISTA Division, Department of Electrical Engineering
Katholieke Universiteit Leuven, Leuven, Belgium
May 20, 2011

Prof. dr. ir. B. De Moor, promotor
Prof. dr. ir. J.A.K. Suykens, co-promotor
Prof. dr. ir. P. Sas, chairman
Prof. dr. ir. Y. Moreau
Prof. dr. J. Rozenski
Prof. dr. ir. M. Van Barel
Prof. dr. ir. G. Bontempi, ULB

Outline
1 Background
2 Low rank updated LS-SVM
3 Sparse linear models
4 Entropy based spectral clustering
5 Conclusions

Motivation and problem description

Goal
Apply regularization/kernel based methods, adapted to high dimensional, low sample and large scale data.

Methods
Prediction models, model selection, variable selection, clustering.

Applications
- Efficient variable selection algorithms for microarray data.
- Incorporation of structural information of MSI data.
- Clustering methods for large scale gene clustering.

Motivation and problem description

Microarray / mass spectrometry data
- Simultaneous measurement of thousands of genes / proteins.
- Structural and prior information.
- Large number of variables, low sample size.
- Irrelevant variables.
- Lack of labeled data.

Regularized learning models

Microarray/MSI data representation: $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$; $n$ samples measured over $d$ variables.

$$y_i = \sum_{k=1}^{d} w_k x_i^k + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \qquad (1)$$

$x_i^k$: $k$-th component of $x_i$.

Solve for $\hat{w} = (\hat{w}_1, \ldots, \hat{w}_d)^\top \in \mathbb{R}^d$:

$$\hat{w} = \arg\min_{w} \; \|y - Xw\|_2^2 + \lambda P(w). \qquad (2)$$

- $\lambda > 0$: regularization parameter ($\lambda = 0$ gives OLS).
- Ridge regression: $P(w) = \|w\|_2^2$; LASSO: $P(w) = \|w\|_1$.
- $P(\cdot)$ encodes a priori assumptions to make the problem well-posed.

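For illustration only (synthetic data, not from the dissertation), a minimal sketch of solving (2) with the ridge penalty via its closed-form solution:

```python
import numpy as np

# Ridge regression sketch: w_hat = argmin ||y - Xw||^2 + lam * ||w||^2,
# with closed-form solution w_hat = (X^T X + lam * I)^{-1} X^T y.
rng = np.random.default_rng(0)
n, d = 60, 2000                      # low sample size, many variables (microarray-like)
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:5] = 1.0                     # only a few relevant variables
y = X @ w_true + 0.1 * rng.standard_normal(n)

lam = 1.0                            # regularization parameter lambda > 0
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```
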
Kernel methods

- No assumptions about data structure.
- Allow introduction of prior knowledge.
- Input space $\mathcal{X}$ mapped to a high dimensional feature space $\mathcal{F}$.
- Non-linear generalizations of linear algorithms.

Kernel trick: mapping $x \to \varphi(x)$, kernel $K(x, z) = \varphi(x)^\top \varphi(z)$.

[Figure: input space $\mathcal{X}$ mapped via $\varphi(\cdot)$ to feature space $\mathcal{F}$, classes $+1$ and $-1$.]

- Linear: $K(x, z) = x^\top z$.
- Polynomial: $K(x, z) = (x^\top z + \tau)^p$, $p \in \mathbb{N}$, $\tau \geq 0$.
- Gaussian: $K(x, z) = \exp(-\|x - z\|_2^2 / \sigma^2)$, $\sigma \in \mathbb{R}$ the kernel width.

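A small sketch (toy matrices, numpy only) of evaluating the three kernels above on data matrices $X$ ($n \times d$) and $Z$ ($m \times d$):

```python
import numpy as np

def linear_kernel(X, Z):
    # K(x, z) = x^T z for all pairs of rows
    return X @ Z.T

def polynomial_kernel(X, Z, p=2, tau=1.0):
    # K(x, z) = (x^T z + tau)^p
    return (X @ Z.T + tau) ** p

def gaussian_kernel(X, Z, sigma=1.0):
    # ||x - z||^2 = ||x||^2 + ||z||^2 - 2 x^T z, computed for all pairs
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    return np.exp(-sq / sigma**2)
```
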
Least Squares Support Vector Machines (LS-SVM)

Optimization problem
Model: $f(x) = w^\top \varphi(x) + b$

$$\min_{w, b, e} \; \frac{1}{2} w^\top w + \gamma \frac{1}{2} \sum_{i=1}^{n} e_i^2$$
$$\text{s.t.} \quad y_i = w^\top \varphi(x_i) + b + e_i, \quad i = 1, \ldots, n,$$

with parameters $w \in \mathbb{R}^{d_h}$, $b \in \mathbb{R}$ and feature map $\varphi(\cdot): \mathbb{R}^d \to \mathbb{R}^{d_h}$.

Dual
Linear system, solved in $\alpha \in \mathbb{R}^n$ via the kernel trick:

$$\begin{bmatrix} \Omega + \gamma^{-1} I_n & 1 \\ 1^\top & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ b \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}.$$

Classifier: $f(x) = \mathrm{sign}\left(\sum_{i=1}^{n} \alpha_i K(x, x_i) + b\right)$.

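A minimal sketch of solving this dual system with numpy, assuming a linear kernel and synthetic labels (illustrative only, not the LS-SVMlab implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
X = rng.standard_normal((n, 5))
y = np.sign(X[:, 0] + 0.3 * rng.standard_normal(n))   # labels in {-1, +1}

gamma = 10.0
Omega = X @ X.T                                        # linear kernel matrix

# Assemble [[Omega + gamma^{-1} I, 1], [1^T, 0]] [alpha; b] = [y; 0]
A = np.zeros((n + 1, n + 1))
A[:n, :n] = Omega + np.eye(n) / gamma
A[:n, n] = 1.0
A[n, :n] = 1.0
rhs = np.concatenate([y, [0.0]])

sol = np.linalg.solve(A, rhs)
alpha, b = sol[:n], sol[n]

# Fitted labels on the training points: sign(Omega @ alpha + b)
f_train = np.sign(Omega @ alpha + b)
```
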
Outline
1 Background
2 Low rank updated LS-SVM
3 Sparse linear models
4 Entropy based spectral clustering
5 Conclusions

Variable selection problem

Definition
Given $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$, $x_i \in \mathbb{R}^d$, let $S = \{x^1, \ldots, x^k, \ldots, x^d\}$ be the set of variables. Find $S^* \subset S$ with $m < d$ variables such that $J_{S^*} \leq J_S$, where $J$ is a selection criterion, e.g. the LOO error.

Elements
- $J_{S^*}$ should be easy/cheap to evaluate.
- Exploit any (if possible) structure of the predictor.
- Reduce computational complexity.

Rank-one updates

The linear kernel matrix can be written in outer product form:

$$\Omega = \begin{bmatrix} x^1, \ldots, x^d \end{bmatrix} \begin{bmatrix} x^1, \ldots, x^d \end{bmatrix}^\top = \sum_{k=1}^{d} x^k (x^k)^\top,$$

$$H = \Omega + \gamma^{-1} I_n = \sum_{k=1}^{d} x^k (x^k)^\top + \gamma^{-1} I_n.$$

At the level of variable $x^k$:

$$H_k = \sum_{j=1}^{k-1} x^j (x^j)^\top + \gamma^{-1} I_n + x^k (x^k)^\top$$
$$H_k = H_{k-1} + x^k (x^k)^\top. \qquad (3)$$

Key point: compute $H_k^{-1}$ from $H_{k-1}^{-1}$ and obtain $\alpha^*$, $b^*$.

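One standard way to realize this key point (an illustrative sketch, not necessarily the dissertation's exact routine) is the Sherman-Morrison formula for the rank-one modification (3):

```python
import numpy as np

def sherman_morrison_update(H_inv, x):
    """Return (H + x x^T)^{-1} given H^{-1} and a column x of shape (n,)."""
    Hx = H_inv @ x
    return H_inv - np.outer(Hx, Hx) / (1.0 + x @ Hx)

# Usage: start from H_0 = gamma^{-1} I_n and add variables one by one.
n, gamma = 50, 10.0
H_inv = gamma * np.eye(n)                                 # (gamma^{-1} I_n)^{-1}
x_k = np.random.default_rng(1).standard_normal(n)         # k-th variable over n samples
H_inv = sherman_morrison_update(H_inv, x_k)               # now equals H_1^{-1}
```
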
Low rank updates

With the Cholesky factorization $L L^\top = \Omega + \gamma^{-1} I_n$, adding a new variable $x^k$ results in a rank-1 modification of $L$:

$$\tilde{L} \tilde{L}^\top = L L^\top + x^k (x^k)^\top. \qquad (4)$$

The modified Cholesky factor is

$$\tilde{L} \tilde{L}^\top = L L^\top + u u^\top = L (I + q q^\top) L^\top = L \bar{L} \bar{L}^\top L^\top, \qquad (5)$$

so $\tilde{L}$ can be computed directly from $L$. The updated model parameters become

$$\tilde{b} = 1^\top \tilde{\chi} \, (1^\top \tilde{\nu})^{-1}, \qquad \tilde{\alpha} = \tilde{\chi} - \tilde{b} \, \tilde{\nu}, \qquad (6)$$

where $\tilde{L} \tilde{L}^\top \tilde{\chi} = y$ and $\tilde{L} \tilde{L}^\top \tilde{\nu} = 1$.

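As a sketch of (4), a rank-one Cholesky update can be done with the standard Givens-based algorithm (assumed here; the dissertation's exact routine may differ), avoiding a full refactorization:

```python
import numpy as np

def chol_update(L, x):
    """Return Lt with Lt Lt^T = L L^T + x x^T, given lower-triangular L."""
    L, x = L.copy(), x.copy()
    n = x.size
    for i in range(n):
        r = np.hypot(L[i, i], x[i])           # sqrt(L_ii^2 + x_i^2)
        c, s = r / L[i, i], x[i] / L[i, i]
        L[i, i] = r
        if i + 1 < n:
            L[i+1:, i] = (L[i+1:, i] + s * x[i+1:]) / c
            x[i+1:] = c * x[i+1:] - s * L[i+1:, i]
    return L

# Quick check against a full refactorization on a toy matrix.
rng = np.random.default_rng(4)
n, gamma = 20, 5.0
H = rng.standard_normal((n, n))
H = H @ H.T + np.eye(n) / gamma
L = np.linalg.cholesky(H)
x = rng.standard_normal(n)
assert np.allclose(chol_update(L, x), np.linalg.cholesky(H + np.outer(x, x)))
```
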
Experiments

Data: gene expression data.
- Leukemia: $n = 72$, $d = 7129$.
- Colon cancer: $n = 60$, $d = 2000$.

Algorithms
- SVM-RFE with and without retraining.
- Naive LS-SVM with forward selection.
- LS-SVM with fast LOO and rank-one modifications.

Validation
- Computational complexity.
- 10-fold cross-validation.

Computational time. Colon data set.

Goal: select 200 and 500 genes.

[Figure: CPU time (seconds, log scale) versus number of variables. Left, forward algorithms: baseline LS-SVM, LOO bound, low rank updated LS-SVM. Right, backward algorithms: SVM-RFE1, SVM-RFE2, low rank downdated LS-SVM.]

Improvement of about two orders of magnitude.

Prediction performance. Test error.

[Figure: test error versus number of removed genes (Colon data set) and number of variables (Leukemia data set), comparing SVM-RFE1, SVM-RFE2, the LOO bound, and the low rank updated/downdated LS-SVM.]

Extension to polynomial kernels

Explicit feature maps yield low rank matrices. The explicit feature map $\varphi_p(\cdot)$ for a polynomial kernel of degree $p$ is

$$\varphi_p(z) = \left(1, \; \sqrt{\tbinom{p}{1}}\, z, \; \ldots, \; \sqrt{\tbinom{p}{p-1}}\, z^{p-1}, \; z^p \right)^\top, \qquad (7)$$

with $\varphi_p(\cdot): \mathbb{R} \to \mathbb{R}^{p+1}$. Hence the Gram matrix becomes

$$\Omega_p^d = \sum_{k=1}^{d} \Omega_p^k \quad \text{with} \quad \Omega_p^k = \sum_{l=0}^{p} (\varphi_l \circ x^k)(\varphi_l \circ x^k)^\top. \qquad (8)$$

In matrix notation,

$$\Omega_p^k = \Phi_p^k \, (\Phi_p^k)^\top, \qquad (9)$$

where $\Phi_p^k = \left[(\varphi_0 \circ x^k), \ldots, (\varphi_p \circ x^k)\right]$ is an $n \times (p+1)$ matrix.

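A small sketch of (7) and (9) for a single variable $x^k \in \mathbb{R}^n$, assuming $\tau = 1$ so that $\Phi_p^k (\Phi_p^k)^\top$ reproduces the one-dimensional polynomial kernel $(x_i^k x_j^k + 1)^p$:

```python
import numpy as np
from math import comb

def poly_feature_map(xk, p):
    """Return the n x (p+1) matrix Phi_p^k with columns sqrt(C(p, l)) * xk**l."""
    return np.column_stack([np.sqrt(comb(p, l)) * xk**l for l in range(p + 1)])

rng = np.random.default_rng(2)
xk, p = rng.standard_normal(30), 3
Phi = poly_feature_map(xk, p)
# Phi Phi^T equals the 1-D polynomial kernel (xk_i * xk_j + 1)^p
assert np.allclose(Phi @ Phi.T, (np.outer(xk, xk) + 1.0) ** p)
```
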
Polynomial updates

For a Gram matrix the following holds:

$$\mathrm{rank}\left(\Omega_p^k\right) = \mathrm{rank}\left(\Phi_p^k (\Phi_p^k)^\top\right) = \mathrm{rank}\left(\Phi_p^k\right), \qquad (10)$$

that is, $\mathrm{rank}\left(\Omega_p^k\right) \leq p + 1$.

For all inputs, $k = 1, \ldots, d$, $\Omega_p^d$ is a sum of $d$ rank-$(p+1)$ matrices:

$$\mathrm{rank}\left(\Omega_p^d\right) = \mathrm{rank}\Big(\sum_{k=1}^{d} \Omega_p^k\Big) \leq \sum_{k=1}^{d} \mathrm{rank}\left(\Omega_p^k\right). \qquad (11)$$

Note: for the linear kernel $\Omega^d = \Omega$, the outer product definition.

Rank-$(p+1)$ updates:

$$\tilde{L} \tilde{L}^\top = L L^\top + \Phi_p^k (\Phi_p^k)^\top. \qquad (12)$$

Apply $(p+1)$ rank-1 updates sequentially over the columns of $\Phi_p^k$.

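A sketch of the rank-$(p+1)$ update in (12), reusing the hypothetical `chol_update` and `poly_feature_map` helpers sketched on the earlier slides (one rank-1 update per column of $\Phi_p^k$):

```python
def chol_update_block(L, Phi):
    """Return Lt with Lt Lt^T = L L^T + Phi Phi^T, applied column by column."""
    for j in range(Phi.shape[1]):
        L = chol_update(L, Phi[:, j])   # rank-1 update per column, as in eq. (12)
    return L

# e.g. adding variable x^k with a degree-p polynomial feature map:
# L_new = chol_update_block(L, poly_feature_map(xk, p))
```
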
Experimental results

Synthetic data set with $n = 100$ and $d = 500$:

$$y_i = 10\,\mathrm{sinc}(x_i^1) + 20 (x_i^2 - 0.5)^2 + 10 x_i^3 + 5 x_i^4 + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, 1).$$

[Figure: average LOO PRESS versus number of ranked variables for polynomial degrees 1 to 5 (left); time in seconds (log scale) required to compute 50 updates versus the degree of the polynomial kernel (right).]

Linear and quadratic models do not retrieve the true variables.

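For reference, a sketch of generating data of this form (assuming the normalized sinc, $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$, and uniform inputs; the slide does not state the input distribution):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 100, 500
X = rng.uniform(0.0, 1.0, size=(n, d))      # only the first 4 variables are relevant
y = (10 * np.sinc(X[:, 0])
     + 20 * (X[:, 1] - 0.5) ** 2
     + 10 * X[:, 2]
     + 5 * X[:, 3]
     + rng.standard_normal(n))              # epsilon ~ N(0, 1)
```
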