
Stochastic Iterative Hard Thresholding for Graph-Structured Sparsity Optimization



  1. Stochastic Iterative Hard Thresholding for Graph-Structured Sparsity Optimization. Baojian Zhou (1), Feng Chen (1), and Yiming Ying (2). (1) Department of Computer Science, (2) Department of Mathematics and Statistics, University at Albany, NY, USA. 06/13/2019, Poster #92.

  2. Motivation and problem formulation

Graph structure information as a prior often yields:
• better classification and regression performance
• stronger interpretability

Current limitations of existing methods:
• only focus on a specific loss
• expensive full-gradient calculation
• cannot handle complex structure

Our goals are to propose/provide:
• an algorithm for general losses under the stochastic setting
• convergence analysis
• real-world applications

Structured sparse learning. Given the model set M(M) = { w : supp(w) ∈ M }, the structured sparse learning problem is formulated as

    min_{w ∈ M(M)} F(w) := (1/n) Σ_{i=1}^{n} f_i(w),

where F(w) is a convex loss such as the least-squares or logistic loss, and M(M) models structured sparsity such as connected subgraphs, dense subgraphs, and subgraphs isomorphic to a query graph. (Figure: an example graph G on nodes w_1, ..., w_6 with a connected support.)
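To make the formulation concrete, the following minimal Python sketch evaluates F(w) for the least-squares instance and implements the exact projection P(·, M) for the simplest model, the s-sparse set. The names X, y, and blocks and the equal-size block split are illustrative assumptions, not the authors' code.

```python
import numpy as np

def f_i(w, X, y, block):
    """Least-squares loss on one block: f_i(w) = ||X_B w - y_B||^2 / (2|B|)."""
    r = X[block] @ w - y[block]
    return 0.5 * (r @ r) / len(block)

def F(w, X, y, blocks):
    """Full objective F(w) = (1/n) * sum_i f_i(w)."""
    return sum(f_i(w, X, y, b) for b in blocks) / len(blocks)

def project_s_sparse(w, s):
    """Exact P(w, M) for M = {S : |S| <= s}: keep the s largest-magnitude
    entries. Graph models (e.g., connected subgraphs) replace this with
    approximate head/tail projections, since exact projection onto such
    models is intractable in general."""
    out = np.zeros_like(w)
    top = np.argsort(np.abs(w))[-s:]
    out[top] = w[top]
    return out
```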

  3. GraphStoIHT, inspired by two recent works (Hegde et al., 2016; Nguyen et al., 2017)

Algorithm 1 GraphStoIHT
1: Input: η_t, F(·), M_H, M_T
2: Initialize: w^0 and t = 0
3: for t = 0, 1, 2, ... do
4:   Choose ξ_t from [n] with prob. p_{ξ_t}
5:   b^t = P(∇f_{ξ_t}(w^t), M_H)
6:   w^{t+1} = P(w^t − η_t b^t, M_T)
7: end for
8: Return w^{t+1}

Orthogonal projection operator P(·, M): R^p → R^p, defined as

    P(w, M) = arg min_{w' ∈ M(M)} ‖w − w'‖_2,

where M can be, e.g., the s-sparse set or the Weighted Graph Model. (Figure: the Weighted Graph Model M = { S : |S| ≤ 3, S is connected } on an example graph; Hegde et al., 2015a.)

Two differences from StoIHT:
• it projects the stochastic gradient ∇f_{ξ_t}(·) onto M(M_H)
• it projects the proxy onto M(M_T)
Both steps solve the same projection problem.

Why the projection b^t = P(∇f_{ξ_t}(w^t), M_H)? Intuitively, sparsity lives in both the primal and dual spaces, so projecting the gradient removes some noisy directions at the first stage.
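A runnable sketch of Algorithm 1 for the least-squares instance is below. It reuses project_s_sparse from the sketch above as a stand-in for both P(·, M_H) and P(·, M_T), i.e., the unstructured s-sparse special case; the paper instead uses approximate head/tail projections for the Weighted Graph Model. Uniform sampling of ξ_t and the defaults for η and the iteration count are illustrative assumptions.

```python
import numpy as np

def graph_sto_iht(X, y, blocks, s, eta=1.0, n_iter=500, seed=0):
    """Algorithm 1 (GraphStoIHT) with s-sparse stand-ins for head/tail steps."""
    rng = np.random.default_rng(seed)
    n, p = len(blocks), X.shape[1]
    w = np.zeros(p)                                  # w^0
    for _ in range(n_iter):
        B = blocks[rng.integers(n)]                  # choose xi_t from [n] (uniform p_xi)
        grad = X[B].T @ (X[B] @ w - y[B]) / len(B)   # stochastic gradient of f_{xi_t}(w^t)
        b = project_s_sparse(grad, s)                # b^t = P(grad, M_H): head step
        w = project_s_sparse(w - eta * b, s)         # w^{t+1} = P(w^t - eta*b^t, M_T): tail step
    return w
```

Swapping the two project_s_sparse calls for head and tail projections over the Weighted Graph Model yields the graph-structured variant.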

  4. Convergence analysis

Two assumptions on M(M), stated via the Bregman divergence B_f(w, w') = f(w) − f(w') − ⟨∇f(w'), w − w'⟩:
• each f_i(w) satisfies β-Restricted Strong Smoothness (RSS): B_{f_i}(w, w') ≤ (β/2)‖w − w'‖²
• F(w) satisfies α-Restricted Strong Convexity (RSC): B_F(w, w') ≥ (α/2)‖w − w'‖²

Efficient approximate projections:
• P(·, M_H) with head-approximation factor c_H
• P(·, M_T) with tail-approximation factor c_T

Theorem 1 (linear convergence). Let w^0 be the starting point and choose η_t = η. Then w^{t+1} of Algorithm 1 satisfies

    E_{ξ_[t]} ‖w^{t+1} − w*‖ ≤ κ^{t+1} ‖w^0 − w*‖ + σ/(1 − κ),

where, for step sizes η, τ ∈ (0, 2/β), the contraction factor κ is (1 + c_T) times a combination of the per-step terms

    α_0 = sqrt(αβη² − 2αη + 1),    β_0 = (1 + c_H) sqrt(αβτ² − 2ατ + 1),

together with the head factor c_H, and the tolerance σ is a weighted sum of E_{ξ_t} ‖∇_I f_{ξ_t}(w*)‖ terms with coefficients built from α_0, β_0, and η. Linear convergence to a ball around w* holds whenever κ < 1.
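The per-step term α_0 = sqrt(αβη² − 2αη + 1) in Theorem 1 is the contraction contributed by one restricted gradient step: it is strictly below 1 exactly when η ∈ (0, 2/β), which is why the theorem restricts η and τ to that interval, and it is minimized at η = 1/β with value sqrt(1 − α/β). A quick numerical check follows; the RSC/RSS values α and β are arbitrary illustrative choices.

```python
import numpy as np

alpha, beta = 0.8, 1.6                               # alpha-RSC, beta-RSS, alpha <= beta
etas = np.linspace(1e-4, 2.0 / beta - 1e-4, 10_000)  # admissible step sizes (0, 2/beta)
alpha0 = np.sqrt(alpha * beta * etas**2 - 2 * alpha * etas + 1)

assert alpha0.max() < 1.0                            # contraction for every admissible eta
print(etas[np.argmin(alpha0)], 1 / beta)             # minimizer ~= 1/beta = 0.625
print(alpha0.min(), np.sqrt(1 - alpha / beta))       # minimum ~= sqrt(1 - alpha/beta) ~= 0.7071
```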

  5. Two instantiations

Graph linear regression: y = Xw* + ε, with X ∈ R^{m×p} and ε ~ N(0, I_m). Consider the least-squares loss

    min_{supp(w) ∈ M(M)} F(w) := (1/n) Σ_{i=1}^{n} (n/2m) ‖X_{B_i} w − y_{B_i}‖²,

where the blocks B_1, ..., B_n partition the m rows. Under a RIP-type condition with constant δ, both GraphIHT and GraphStoIHT admit explicit contraction factors κ depending on δ and the projection factors, and κ < 1 holds whenever:
• δ ≤ 0.0527 for GraphIHT
• δ ≤ 0.0142 for GraphStoIHT

Graph logistic regression: labels follow P(y_i | x_i) = (1 + e^{−y_i ⟨w*, x_i⟩})^{−1}, with x_i ∈ R^p and y_i ∈ {+1, −1}. Consider the regularized logistic loss

    min_{supp(w) ∈ M(M)} F(w) := (1/n) Σ_{i=1}^{n} f_i(w),    f_i(w) = (n/m) Σ_{j=1}^{m/n} h(w, i_j) + (λ/2)‖w‖²,

where h(w, i_j) = log(1 + exp(−y_{i_j} ⟨x_{i_j}, w⟩)). If the x_i are normalized, then F(w) satisfies λ-RSC and each f_i(w) satisfies (λ + n(1 + ν)θ_max/(4m))-RSS, and the condition for κ < 1,

    λ / (λ + n(1 + ν)θ_max/(4m)) ≥ 243/250,

holds with probability at least 1 − p·exp(−θ_max ν/4), where θ_max = λ_max(Σ_{j=1}^{m/n} E[x_{i_j} x_{i_j}^T]) and ν ≥ 1.
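Here is a sketch of the graph linear regression instance, wired to the graph_sto_iht sketch above. The dimensions, the unit-variance design (the slide's scaling √m · X_ij ~ N(0,1) is equivalent up to rescaling the noise), and the equal-size row partition are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
m, p, s, n = 120, 256, 8, 10                    # rows, features, sparsity, blocks

w_star = np.zeros(p)                            # s-sparse ground truth
w_star[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)

X = rng.standard_normal((m, p))                 # design matrix, unit-variance entries
y = X @ w_star + rng.standard_normal(m)         # y = X w* + eps, eps ~ N(0, I_m)
blocks = np.array_split(np.arange(m), n)        # B_1, ..., B_n partition the m rows

w_hat = graph_sto_iht(X, y, blocks, s, eta=0.8, n_iter=3000)
print(np.linalg.norm(w_hat - w_star) / np.linalg.norm(w_star))  # relative error
```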

  6. Experiments

Simulation dataset: each design entry satisfies √m · X_ij ~ N(0, 1); supp(w*) is generated by a random walk on the graph; the entries of w* are drawn from N(0, 1); the sparsity model is the Weighted Graph Model (Hegde et al., 2015b). (Figures: benchmark graphs BackGround, Angio, and Text; probability of recovery versus the oversampling ratio m/s for NIHT, IHT, StoIHT, CoSaMP, GraphIHT, GraphCoSaMP, and GraphStoIHT; estimation error ‖x − x̂‖ versus epochs for block sizes b = 1, ..., 180 and versus iterations for learning rates η = 0.1, ..., 1.6.)

Breast cancer dataset: 295 samples with 78 positives (metastatic) and 217 negatives (non-metastatic), provided in (Van De Vijver et al., 2002). The PPI network with 637 pathways is provided in (Jacob et al., 2009). We restrict our analysis to 3,243 genes (nodes) with 19,938 edges. These cancer-related genes form a connected subgraph.

Algorithm       Cancer-related genes                    ‖w_t‖_0   AUC
GraphStoIHT     BRCA2, CCND2, CDKN1A, ATM, AR, TOP2A    51.7      0.715
GraphIHT        ATM, CDKN1A, BRCA2, AR, TOP2A           55.2      0.714
ℓ1-Path         BRCA1, CDKN1A, ATM, DSC2                61.2      0.675
StoIHT          MKI67, NAT1, AR, TOP2A                  59.6      0.708
ℓ1/ℓ2-Edge      CCND3, ATM, CDH3                        51.4      0.705
ℓ1-Edge         CCND3, AR, CDH3                         39.9      0.698
ℓ1/ℓ2-Path      BRCA1, CDKN1A                           147.6     0.705
IHT             NAT1, TOP2A                             67.9      0.707
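The random-walk support generation is what ties the simulation to graph structure. A small sketch follows, using a 2-D grid graph as a hypothetical stand-in for the benchmark graphs; the grid size, walk length, and noiseless measurements are illustrative assumptions.

```python
import numpy as np

def random_walk_support(side, s, seed=0):
    """Walk on a side x side grid graph until s distinct nodes are visited,
    so supp(w*) is a connected subgraph, as in the Weighted Graph Model."""
    rng = np.random.default_rng(seed)
    r, c = rng.integers(side), rng.integers(side)
    visited = {(r, c)}
    while len(visited) < s:
        steps = [(r + dr, c + dc) for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0))
                 if 0 <= r + dr < side and 0 <= c + dc < side]
        r, c = steps[rng.integers(len(steps))]
        visited.add((r, c))
    return np.array(sorted(rr * side + cc for rr, cc in visited))

side, s, m = 16, 8, 80
p = side * side
w_star = np.zeros(p)
w_star[random_walk_support(side, s)] = np.random.default_rng(1).standard_normal(s)
X = np.random.default_rng(2).standard_normal((m, p)) / np.sqrt(m)  # sqrt(m)*X_ij ~ N(0,1)
y = X @ w_star   # noiseless measurements for a probability-of-recovery run
```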

  7. See you at Poster #92. Thank you!
