Lecture 14: Planted Sparse Vector
Lecture Outline • Part I: Planted Sparse Vector and 2 to 4 Norm • Part II: SOS and 2 to 4 Norm on Random Subspaces • Part III: Warmup: Showing ‖x‖ ≈ 1 • Part IV: 4-Norm Analysis • Part V: SOS-symmetry to the Rescue • Part VI: Observations and Loose Ends • Part VII: Open Problems
Part I: Planted Sparse Vector and 2 to 4 Norm
Planted Sparse Vector • Planted Sparse Vector problem: Given the span of d − 1 random vectors in ℝⁿ and one unit vector v ∈ ℝⁿ of sparsity k, can we recover v? • More precisely, let V be an n × d matrix where: 1. d − 1 columns of V are vectors of length ≈ 1 chosen randomly from ℝⁿ 2. One column of V is a unit vector v with ≤ k nonzero entries. • Given VR where R is an arbitrary invertible d × d matrix, can we recover v?
Theorem Statement • Theorem 1.4 [BKS14]: There is a constant c > 0 and an algorithm based on constant degree SOS such that for every vector v₀ supported on at most cn·min{1, n/d²} coordinates, if v₁, …, v_d are chosen independently at random from the Gaussian distribution on ℝⁿ, then given any basis for V = span{v₀, …, v_d}, the algorithm outputs an ε-approximation to v₀ in poly(n, log(1/ε)) time.
Random Distribution • Random Distribution: We choose each entry of V independently from N(0, 1/n), the normal distribution with mean 0 and standard deviation 1/√n. • We then choose R to be a random d × d orthogonal/rotation matrix and take VR to be our input matrix.
Random Distribution • Remark: If R is any d × d orthogonal/rotation matrix then VR can also be generated by choosing each entry independently from N(0, 1/n) (i.e., VR has the same distribution as V). • Idea: Each row of V comes from a multivariate normal distribution with covariance matrix (1/n)·Id_d, which is invariant under rotations.
Planted Distribution • Planted Distribution: We choose each entry of the first d − 1 columns of V independently from N(0, 1/n). The last column of V is our sparse unit vector v. • We then choose R to be a random d × d orthogonal/rotation matrix and take VR to be our input matrix.
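For concreteness, here is a minimal numpy sketch of how the two distributions might be sampled (the function names, parameter values, and the QR-based random rotation are illustrative choices, not part of the lecture):

```python
import numpy as np

def random_instance(n, d, rng):
    # Each entry of V is drawn i.i.d. from N(0, 1/n) (standard deviation 1/sqrt(n)).
    V = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))
    # R is a random d x d rotation, taken from the QR factorization of a Gaussian matrix.
    R, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return V @ R

def planted_instance(n, d, k, rng):
    V = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))
    # Replace the last column of V by a k-sparse unit vector v.
    v = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    v[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    V[:, -1] = v
    R, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return V @ R

rng = np.random.default_rng(0)
M = planted_instance(n=2000, d=20, k=50, rng=rng)  # the algorithm only sees M = VR
```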
Output • We ask for an x such that: 1. ‖VRx‖ = 1 2. VRx is k-sparse (i.e. at most k indices of VRx are nonzero). • It is hard to search for an x such that VRx is k-sparse, so we’ll need to relax the problem.
Distinguishing Sparse Vectors • Key idea: All unit vectors have the same 2-norm. However, sparse vectors will have a higher 4-norm. • The 4-norm of a k-sparse unit vector in ℝⁿ is at least 1/k^{1/4} (the minimum is obtained by setting k coordinates to ±1/√k and the rest to 0). • Relaxation Attempt #1: Search for an x such that: 1. ‖VRx‖₂ = 1 2. ‖VRx‖₄ ≥ 1/k^{1/4}
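To spell out where the 1/k^{1/4} bound comes from (a one-line derivation): for any k-sparse unit vector v, Cauchy–Schwarz over the support gives
\[
1 = \sum_{i \in \mathrm{supp}(v)} v_i^2 \;\le\; \sqrt{k}\,\Big(\sum_i v_i^4\Big)^{1/2} = \sqrt{k}\,\|v\|_4^2,
\qquad \text{so} \qquad \|v\|_4 \ge k^{-1/4},
\]
with equality for the flat vector whose k nonzero coordinates are ±1/√k. Taking k = n gives the trivial bound ‖x‖₄ ≥ 1/n^{1/4} for every unit vector, which reappears below.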
2 to 4 Norm Problem • This is the 2 to 4 Norm Problem: Given a matrix A, find the unit vector x which maximizes ‖Ax‖₄.
Part II: SOS and 2 to 4 Norm on Random Subspaces
2 to 4 Norm Hardness • Unfortunately, the 2 to 4 norm problem is hard [BBH+12]: – NP-hard to obtain an approximation ratio of 1 + 1/n^{polylog(n)} – Assuming ETH (the exponential time hypothesis), it is hard to approximate to within a constant factor. • Thus, we’ll need to relax our problem further.
SOS Relaxation • Relaxation: Find a pseudo-expectation Ẽ which respects the following constraints: 1. ‖VRx‖₂² = Σᵢ (VRx)ᵢ² = 1 2. ‖VRx‖₄⁴ = Σᵢ (VRx)ᵢ⁴ ≥ 1/k
Showing a Distinguishing Algorithm • Constraints: 1. ‖VRx‖₂² = Σᵢ (VRx)ᵢ² = 1 2. ‖VRx‖₄⁴ = Σᵢ (VRx)ᵢ⁴ ≥ 1/k • To show that SOS distinguishes between the random and planted distributions, it is sufficient to show that, with high probability over the random distribution, there is no Ẽ which respects these constraints and has a PSD moment matrix M. • Remark: Although the 2 to 4 Norm problem is hard in general, we just need to show that SOS can approximate it on random subspaces.
2 to 4 Norm on Random Subspaces • Given a random subspace, what is the expected value of the largest 4-norm of a unit vector in the subspace? • Trivial strategy: Any unit vector’s 4-norm is at least 1/n^{1/4}. • Can we do better?
2 to 4 Norm on Random Subspaces • Another strategy: Take a basis for this space and take a linear combination which maximizes one coordinate (subject to the combination having length 1). • If we add together d random vectors with entries ≈ ±1/√n, w.h.p. the result will have norm Θ̃(√d). Dividing the resulting vector by Θ̃(√d), the maximized entry will have magnitude Θ̃(√(d/n)), while the other entries will have magnitude Õ(1/√n).
2 to 4 Norm on Random Subspaces • Calling our final result w, w.h.p. the maximized entry of w contributes Θ̃(d²/n²) to ‖w‖₄⁴ while the other entries contribute Θ̃(1/n). • It turns out that this strategy is essentially optimal. Thus, with high probability the maximum 4-norm of a unit vector in a d-dimensional random subspace will be Θ̃(max(√(d/n), 1/n^{1/4})).
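The heuristic can be sanity-checked numerically; the following sketch implements one version of the coordinate-maximizing strategy (the sizes are illustrative and the polylog factors are ignored):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4000, 200
# A basis of d random vectors with i.i.d. N(0, 1/n) entries (length ~ 1 each).
V = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))

# Choose unit-length coefficients proportional to the first row of V, which
# (approximately) maximizes the first coordinate of the combination.
c = V[0, :] / np.linalg.norm(V[0, :])
w = V @ c
w /= np.linalg.norm(w)          # renormalize; the columns are only nearly orthonormal

four_norm = np.sum(w ** 4) ** 0.25
print(f"empirical 4-norm of the combination: {four_norm:.3f}")
print(f"sqrt(d/n) = {np.sqrt(d / n):.3f}, n^(-1/4) = {n ** -0.25:.3f}")
```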
Algorithm Boundary • Planted dist: max 4-norm ≥ 1/k^{1/4} • Random dist: max 4-norm is Θ̃(max(√(d/n), 1/n^{1/4})) • If SOS can certify the upper bound for a random subspace, this gives a distinguishing algorithm when max(√(d/n), 1/n^{1/4}) ≪ 1/k^{1/4} (which happens when d ≤ √n and k ≪ n, or when d ≥ √n and k ≪ n²/d²).
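Unpacking the comparison (raise everything to the fourth power):
\[
\sqrt{d/n} \ll k^{-1/4} \iff k \ll n^2/d^2,
\qquad
n^{-1/4} \ll k^{-1/4} \iff k \ll n,
\]
so for d ≤ √n the binding constraint is k ≪ n, and for d ≥ √n it is k ≪ n²/d², matching the cn·min{1, n/d²} sparsity bound in Theorem 1.4.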
Part III: Warmup: Showing ‖x‖ ≈ 1
Showing ‖x‖ ≈ 1 • Take w = VRx. • We expect that ‖w‖ ≈ ‖x‖. Since we require that ‖w‖ = 1, this implies that we will have ‖x‖ ≈ 1. • To check that ‖w‖ ≈ ‖x‖, observe that ‖w‖₂² = xᵀRᵀVᵀVRx. Thus, it is sufficient to show that RᵀVᵀVR ≈ Id_d.
Checking RᵀVᵀVR ≈ Id_d • We have that RᵀVᵀVR ≈ Id_d because the columns of VR are d random, approximately unit vectors (where d ≪ n) and are thus approximately orthonormal. • However, we will use graph matrices to analyze the 4-norm, so as a warm-up, let’s check that RᵀVᵀVR ≈ Id_d using graph matrices.
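A direct numerical check of this claim (illustrative parameters; the spectral-norm error should scale roughly like √(d/n)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4000, 40
V = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))
R, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random d x d rotation

M = (V @ R).T @ (V @ R)                        # R^T V^T V R
err = np.linalg.norm(M - np.eye(d), ord=2)     # spectral-norm distance from Id_d
print(err, np.sqrt(d / n))
```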
Graph Matrices Over N(0,1) • So far we have worked over {−1, +1}ᵐ. • How can we use graph matrices over N(0,1)ᵐ? • Key idea: Look at the Fourier characters over N(0,1).
Fourier Analysis Over N(0,1) • Inner product on N(0,1): ⟨f, g⟩ = E_{x∼N(0,1)}[f(x)g(x)] • Fourier characters: Hermite polynomials • The first few Hermite polynomials (up to normalization) are as follows: 1. h₀ = 1 2. h₁ = x 3. h₂ = x² − 1 4. h₃ = x³ − 3x • To normalize, divide h_j by √(j!)
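A quick Monte Carlo check of the orthogonality and of the normalization E[h_j(x)²] = j! under x ∼ N(0,1) (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=2_000_000)

# Unnormalized (probabilists') Hermite polynomials h_0, ..., h_3.
h = [np.ones_like(x), x, x**2 - 1, x**3 - 3*x]

for a in range(4):
    for b in range(a, 4):
        # E[h_a(x) h_b(x)] should be ~0 for a != b and ~a! for a == b.
        print(a, b, round(float(np.mean(h[a] * h[b])), 3))
```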
Graph Matrices Over N(0,1) • Graph matrices over {−1,1}ᵐ: 1 and x are a basis for functions over {−1,1}. We represent x by an edge and 1 by the absence of an edge. • Graph matrices over N(0,1)ᵐ: {h_j} are a basis for functions over N(0,1). We represent h_j by a multi-edge with multiplicity j.
Graph Matrices for RᵀVᵀVR • For convenience, take A = √n·VR and think of the entries of A as the input. Now each entry of A is chosen independently from N(0,1). • A_{ij} is represented by an edge from node i to node j. • In class challenge: What is RᵀVᵀVR in terms of graph matrices? [Diagram: RᵀVᵀVR = (1/n)·AᵀA, drawn as a product of two graph matrices with column vertices j₁, j₂ (d possibilities each) and a shared row vertex i (n possibilities).]
Graph Matrices for RᵀVᵀVR • In class challenge answer: [Diagram: (1/n)·AᵀA expands into three terms: the identity Id_d, a graph matrix whose two column vertices j₁ ≠ j₂ (d possibilities each) are joined to a middle row vertex i (n possibilities) by single edges, and a diagonal graph matrix whose single column vertex j is joined to a middle row vertex i by a double edge; the two nontrivial terms carry a factor of 1/n.]
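In formulas, the decomposition the diagram depicts is the following (indices j₁, j₂ range over [d] and i over [n]):
\[
\big(R^T V^T V R\big)_{j_1 j_2} \;=\; \frac{1}{n}\big(A^T A\big)_{j_1 j_2} \;=\; \frac{1}{n}\sum_{i=1}^{n} A_{i j_1} A_{i j_2}.
\]
For j₁ ≠ j₂ each summand is a product of the single edges {i, j₁} and {i, j₂}; for j₁ = j₂ we write A_{ij}² = h₂(A_{ij}) + 1. Collecting terms,
\[
\frac{1}{n} A^T A \;=\; \mathrm{Id}_d \;+\; \frac{1}{n} M_1 \;+\; \frac{1}{n} M_2,
\]
where (M₁)_{j₁j₂} = Σᵢ A_{ij₁}A_{ij₂} for j₁ ≠ j₂ (and 0 on the diagonal) and M₂ is the diagonal matrix with (M₂)_{jj} = Σᵢ h₂(A_{ij}).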
Generalizing Rough Norm Bounds • Here we have two different types of vertices, one for the rows of A (which have n possibilities) and one for the columns of A (which have d possibilities) • Can generalize the rough norm bounds to handle multiple types of vertices (writing this up is on my to-do list)
Generalizing Rough Norm Bounds • Generalized rough norm bounds: • Each isolated vertex outside of U and V contributes a factor equal to the number of possibilities for that vertex • Each vertex in the minimum vertex separator (the separator which minimizes the total number of possibilities for its vertices) contributes nothing • Every other vertex contributes a factor equal to the square root of the number of possibilities for that vertex
Norm Bounds for RᵀVᵀVR • [Diagram repeated from the challenge answer: RᵀVᵀVR = (1/n)·AᵀA written as Id_d plus the two-single-edge term and the double-edge diagonal term, with the index sets U and V marked on each term.] • Applying the generalized rough norm bounds gives RᵀVᵀVR = Id_d + Õ(√(d/n)) + Õ(1/√n), i.e., RᵀVᵀVR ≈ Id_d since d ≪ n.
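In more detail, applying the rules from the previous slide to the two nontrivial terms of the decomposition (polylog factors hidden in the Õ):
• For M₁ (single edges j₁–i and i–j₂, with U = {j₁} and V = {j₂}): the minimum-weight vertex separator is {j₁} (or {j₂}), which contributes nothing; the other column vertex contributes √d and the middle vertex i contributes √n, so ‖M₁‖ ≤ Õ(√(dn)) and (1/n)‖M₁‖ ≤ Õ(√(d/n)).
• For M₂ (one double edge j–i, with U = V = {j}): the separator {j} contributes nothing and i contributes √n, so ‖M₂‖ ≤ Õ(√n) and (1/n)‖M₂‖ ≤ Õ(1/√n).
These are exactly the error terms above.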
Part IV: 4-Norm Analysis
4-Norm Analysis • We want to bound ‖VRx‖₄⁴ = (1/n²)·‖Ax‖₄⁴ • Take B to be the matrix with entries B_{i,(j₁,j₂)} = A_{ij₁}·A_{ij₂} • ‖Ax‖₄⁴ = (x ⊗ x)ᵀBᵀB(x ⊗ x), so ‖VRx‖₄⁴ = (1/n²)·(x ⊗ x)ᵀBᵀB(x ⊗ x) • Can try to bound ‖BᵀB‖
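A small numpy check of the identity ‖Ax‖₄⁴ = (x ⊗ x)ᵀBᵀB(x ⊗ x) (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 10
A = rng.normal(size=(n, d))
x = rng.normal(size=d)
x /= np.linalg.norm(x)

# B has rows indexed by i and columns indexed by pairs (j1, j2),
# with B_{i,(j1,j2)} = A_{i,j1} * A_{i,j2}; then (B (x ⊗ x))_i = (Ax)_i^2.
B = np.einsum('ij,ik->ijk', A, A).reshape(n, d * d)

lhs = np.sum((A @ x) ** 4)                       # ||Ax||_4^4
rhs = np.kron(x, x) @ B.T @ B @ np.kron(x, x)    # (x⊗x)^T B^T B (x⊗x)
print(lhs, rhs)                                  # agree up to floating-point error
```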