Lecture 14: Planted Sparse Vector
Lecture Outline • Part I: Planted Sparse Vector and 2 to 4 Norm • Part II: SOS and 2 to 4 Norm on Random Subspaces • Part III: Warmup: Showing ‖x‖ ≈ 1 • Part IV: 4-Norm Analysis • Part V: SOS-symmetry to the Rescue • Part VI: Observations and Loose Ends • Part VII: Open Problems
Part I: Planted Sparse Vector and 2 to 4 Norm
Planted Sparse Vector • Planted Sparse Vector problem: Given the span of d − 1 random vectors in ℝⁿ and one unit vector v ∈ ℝⁿ of sparsity k, can we recover v? • More precisely, let V be an n × d matrix where: 1. d − 1 columns of V are vectors of length ≈ 1 chosen randomly from ℝⁿ 2. One column of V is a unit vector v with ≤ k nonzero entries. • Given VR where R is an arbitrary invertible d × d matrix, can we recover v?
Theorem Statement • Theorem 1.4 [BKS14]: There is a constant c > 0 and an algorithm based on constant degree SOS such that for every vector v₀ supported on at most cn·min{1, n/d²} coordinates, if v₁, …, v_d are chosen independently at random from the Gaussian distribution on ℝⁿ, then given any basis for V = span{v₀, …, v_d}, the algorithm outputs an ε-approximation to v₀ in poly(n, log(1/ε)) time.
Random Distribution • Random Distribution: We choose each entry of V independently from N(0, 1/n), the normal distribution with mean 0 and standard deviation 1/√n. • We then choose R to be a random d × d orthogonal/rotation matrix and take VR to be our input matrix.
Random Distribution • Remark: If R is any d × d orthogonal/rotation matrix then VR can also be generated by choosing each entry independently from N(0, 1/n) (i.e., VR has the same distribution as V). • Idea: Each row of V comes from a multivariate normal distribution with covariance matrix (1/n)·Id_d, which is invariant under rotations.
Planted Distribution • Planted Distribution: We choose each entry of the first d − 1 columns of V independently from N(0, 1/n). The last column of V is our sparse unit vector v. • We then choose R to be a random d × d orthogonal/rotation matrix and take VR to be our input matrix.
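For concreteness, here is a minimal numpy sketch of how the two distributions might be sampled (the function names, parameter values, and the QR-based random rotation are illustrative choices, not part of the lecture):

```python
import numpy as np

def random_instance(n, d, rng):
    # Each entry of V is drawn i.i.d. from N(0, 1/n) (standard deviation 1/sqrt(n)).
    V = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))
    # R is a random d x d rotation, taken from the QR factorization of a Gaussian matrix.
    R, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return V @ R

def planted_instance(n, d, k, rng):
    V = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))
    # Replace the last column of V by a k-sparse unit vector v.
    v = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    v[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    V[:, -1] = v
    R, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return V @ R

rng = np.random.default_rng(0)
M = planted_instance(n=2000, d=20, k=50, rng=rng)  # the algorithm only sees M = VR
```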
Output • We ask for an x such that: 1. ‖VRx‖ = 1 2. VRx is k-sparse (i.e. at most k indices of VRx are nonzero). • It is hard to search for an x such that VRx is k-sparse, so we’ll need to relax the problem.
Distinguishing Sparse Vectors • Key idea: All unit vectors have the same 2-norm. However, sparse vectors will have a higher 4-norm. • The 4-norm of a k-sparse unit vector in ℝⁿ is at least 1/k^{1/4} (the minimum is obtained by setting k coordinates to ±1/√k and the rest to 0). • Relaxation Attempt #1: Search for an x such that: 1. ‖VRx‖₂ = 1 2. ‖VRx‖₄ ≥ 1/k^{1/4}
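To spell out where the 1/k^{1/4} bound comes from (a one-line derivation): for any k-sparse unit vector v, Cauchy–Schwarz over the support gives
\[
1 = \sum_{i \in \mathrm{supp}(v)} v_i^2 \;\le\; \sqrt{k}\,\Big(\sum_i v_i^4\Big)^{1/2} = \sqrt{k}\,\|v\|_4^2,
\qquad \text{so} \qquad \|v\|_4 \ge k^{-1/4},
\]
with equality for the flat vector whose k nonzero coordinates are ±1/√k. Taking k = n gives the trivial bound ‖x‖₄ ≥ 1/n^{1/4} for every unit vector, which reappears below.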
2 to 4 Norm Problem • This is the 2 to 4 Norm Problem: Given a matrix A, find the unit vector x which maximizes ‖Ax‖₄.
Part II: SOS and 2 to 4 Norm on Random Subspaces
2 to 4 Norm Hardness • Unfortunately, the 2 to 4 norm problem is hard [BBH+12]: – NP-hard to obtain an approximation ratio of 1 + 1/n^{polylog(n)} – Assuming ETH (the exponential time hypothesis), it is hard to approximate to within a constant factor. • Thus, we’ll need to relax our problem further.
SOS Relaxation • Relaxation: Find a pseudo-expectation Ẽ which respects the following constraints: 1. ‖VRx‖₂² = Σᵢ (VRx)ᵢ² = 1 2. ‖VRx‖₄⁴ = Σᵢ (VRx)ᵢ⁴ ≥ 1/k
Showing a Distinguishing Algorithm • Constraints: 1. ‖VRx‖₂² = Σᵢ (VRx)ᵢ² = 1 2. ‖VRx‖₄⁴ = Σᵢ (VRx)ᵢ⁴ ≥ 1/k • To show that SOS distinguishes between the random and planted distributions, it is sufficient to show that, with high probability over the random distribution, there is no Ẽ which respects these constraints and has a PSD moment matrix M. • Remark: Although the 2 to 4 Norm problem is hard in general, we just need to show that SOS can approximate it on random subspaces.
2 to 4 Norm on Random Subspaces • Given a random subspace, what is the expected value of the largest 4-norm of a unit vector in the subspace? • Trivial strategy: Any unit vector’s 4-norm is at least 1/n^{1/4}. • Can we do better?
2 to 4 Norm on Random Subspaces • Another strategy: Take a basis for this space and take a linear combination which maximizes one coordinate (subject to the combination having length 1). • If we add together d random vectors with entries ≈ ±1/√n, w.h.p. the result will have norm Θ̃(√d). Dividing the resulting vector by Θ̃(√d), the maximized entry will have magnitude Θ̃(√(d/n)), while the other entries will have magnitude Õ(1/√n).
2 to 4 Norm on Random Subspaces • Calling our final result w, w.h.p. the maximized entry of w contributes Θ̃(d²/n²) to ‖w‖₄⁴ while the other entries contribute Θ̃(1/n). • It turns out that this strategy is essentially optimal. Thus, with high probability the maximum 4-norm of a unit vector in a d-dimensional random subspace will be Θ̃(max(√(d/n), 1/n^{1/4})).
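The heuristic can be sanity-checked numerically; the following sketch implements one version of the coordinate-maximizing strategy (the sizes are illustrative and the polylog factors are ignored):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4000, 200
# A basis of d random vectors with i.i.d. N(0, 1/n) entries (length ~ 1 each).
V = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))

# Choose unit-length coefficients proportional to the first row of V, which
# (approximately) maximizes the first coordinate of the combination.
c = V[0, :] / np.linalg.norm(V[0, :])
w = V @ c
w /= np.linalg.norm(w)          # renormalize; the columns are only nearly orthonormal

four_norm = np.sum(w ** 4) ** 0.25
print(f"empirical 4-norm of the combination: {four_norm:.3f}")
print(f"sqrt(d/n) = {np.sqrt(d / n):.3f}, n^(-1/4) = {n ** -0.25:.3f}")
```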
Algorithm Boundary • Planted dist: max 4-norm ≥ 1/k^{1/4} • Random dist: max 4-norm is Θ̃(max(√(d/n), 1/n^{1/4})) • If SOS can certify the upper bound for a random subspace, this gives a distinguishing algorithm when max(√(d/n), 1/n^{1/4}) ≪ 1/k^{1/4} (which happens when d ≤ √n and k ≪ n, or when d ≥ √n and k ≪ n²/d²).
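Unpacking the comparison (raise everything to the fourth power):
\[
\sqrt{d/n} \ll k^{-1/4} \iff k \ll n^2/d^2,
\qquad
n^{-1/4} \ll k^{-1/4} \iff k \ll n,
\]
so for d ≤ √n the binding constraint is k ≪ n, and for d ≥ √n it is k ≪ n²/d², matching the cn·min{1, n/d²} sparsity bound in Theorem 1.4.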
Part III: Warmup: Showing ‖x‖ ≈ 1
Showing ‖x‖ ≈ 1 • Take w = VRx. • We expect that ‖w‖ ≈ ‖x‖. Since we require that ‖w‖ = 1, this implies that we will have ‖x‖ ≈ 1. • To check that ‖w‖ ≈ ‖x‖, observe that ‖w‖₂² = xᵀRᵀVᵀVRx. Thus, it is sufficient to show that RᵀVᵀVR ≈ Id_d.
Checking RᵀVᵀVR ≈ Id_d • We have that RᵀVᵀVR ≈ Id_d because the columns of VR are d random, approximately unit vectors (where d ≪ n) and are thus approximately orthonormal. • However, we will use graph matrices to analyze the 4-norm, so as a warm-up, let’s check that RᵀVᵀVR ≈ Id_d using graph matrices.
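A direct numerical check of this claim (illustrative parameters; the spectral-norm error should scale roughly like √(d/n)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4000, 40
V = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))
R, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random d x d rotation

M = (V @ R).T @ (V @ R)                        # R^T V^T V R
err = np.linalg.norm(M - np.eye(d), ord=2)     # spectral-norm distance from Id_d
print(err, np.sqrt(d / n))
```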
Graph Matrices Over N(0,1) • So far we have worked over {−1, +1}ᵐ. • How can we use graph matrices over N(0,1)ᵐ? • Key idea: Look at the Fourier characters over N(0,1).
Fourier Analysis Over N(0,1) • Inner product on N(0,1): ⟨f, g⟩ = E_{x∼N(0,1)}[f(x)g(x)] • Fourier characters: Hermite polynomials • The first few Hermite polynomials (up to normalization) are as follows: 1. h₀ = 1 2. h₁ = x 3. h₂ = x² − 1 4. h₃ = x³ − 3x • To normalize, divide h_j by √(j!)
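A quick Monte Carlo check of the orthogonality and of the normalization E[h_j(x)²] = j! under x ∼ N(0,1) (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=2_000_000)

# Unnormalized (probabilists') Hermite polynomials h_0, ..., h_3.
h = [np.ones_like(x), x, x**2 - 1, x**3 - 3*x]

for a in range(4):
    for b in range(a, 4):
        # E[h_a(x) h_b(x)] should be ~0 for a != b and ~a! for a == b.
        print(a, b, round(float(np.mean(h[a] * h[b])), 3))
```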
Graph Matrices Over N(0,1) • Graph matrices over {−1,1}ᵐ: 1 and x are a basis for functions over {−1,1}. We represent x by an edge and 1 by the absence of an edge. • Graph matrices over N(0,1)ᵐ: {h_j} are a basis for functions over N(0,1). We represent h_j by a multi-edge with multiplicity j.
Graph Matrices for RᵀVᵀVR • For convenience, take A = √n·VR and think of the entries of A as the input. Now each entry of A is chosen independently from N(0,1). • A_{ij} is represented by an edge from node i to node j. • In class challenge: What is RᵀVᵀVR in terms of graph matrices? [Diagram: RᵀVᵀVR = (1/n)·AᵀA, drawn as a product of two graph matrices with column vertices j₁, j₂ (d possibilities each) and a shared row vertex i (n possibilities).]
Graph Matrices for RᵀVᵀVR • In class challenge answer: [Diagram: (1/n)·AᵀA expands into three terms: the identity Id_d, a graph matrix whose two column vertices j₁ ≠ j₂ (d possibilities each) are joined to a middle row vertex i (n possibilities) by single edges, and a diagonal graph matrix whose single column vertex j is joined to a middle row vertex i by a double edge; the two nontrivial terms carry a factor of 1/n.]
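In formulas, the decomposition the diagram depicts is the following (indices j₁, j₂ range over [d] and i over [n]):
\[
\big(R^T V^T V R\big)_{j_1 j_2} \;=\; \frac{1}{n}\big(A^T A\big)_{j_1 j_2} \;=\; \frac{1}{n}\sum_{i=1}^{n} A_{i j_1} A_{i j_2}.
\]
For j₁ ≠ j₂ each summand is a product of the single edges {i, j₁} and {i, j₂}; for j₁ = j₂ we write A_{ij}² = h₂(A_{ij}) + 1. Collecting terms,
\[
\frac{1}{n} A^T A \;=\; \mathrm{Id}_d \;+\; \frac{1}{n} M_1 \;+\; \frac{1}{n} M_2,
\]
where (M₁)_{j₁j₂} = Σᵢ A_{ij₁}A_{ij₂} for j₁ ≠ j₂ (and 0 on the diagonal) and M₂ is the diagonal matrix with (M₂)_{jj} = Σᵢ h₂(A_{ij}).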
Generalizing Rough Norm Bounds • Here we have two different types of vertices, one for the rows of A (which have n possibilities) and one for the columns of A (which have d possibilities) • Can generalize the rough norm bounds to handle multiple types of vertices (writing this up is on my to-do list)
Generalizing Rough Norm Bounds • Generalized rough norm bounds: • Each isolated vertex outside of U and V contributes a factor equal to the number of possibilities for that vertex • Each vertex in the minimum vertex separator (the separator which minimizes the total number of possibilities for its vertices) contributes nothing • Every other vertex contributes a factor equal to the square root of the number of possibilities for that vertex
Norm Bounds for RᵀVᵀVR • [Diagram repeated from the challenge answer: RᵀVᵀVR = (1/n)·AᵀA written as Id_d plus the two-single-edge term and the double-edge diagonal term, with the index sets U and V marked on each term.] • Applying the generalized rough norm bounds gives RᵀVᵀVR = Id_d + Õ(√(d/n)) + Õ(1/√n), i.e., RᵀVᵀVR ≈ Id_d since d ≪ n.
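In more detail, applying the rules from the previous slide to the two nontrivial terms of the decomposition (polylog factors hidden in the Õ):
• For M₁ (single edges j₁–i and i–j₂, with U = {j₁} and V = {j₂}): the minimum-weight vertex separator is {j₁} (or {j₂}), which contributes nothing; the other column vertex contributes √d and the middle vertex i contributes √n, so ‖M₁‖ ≤ Õ(√(dn)) and (1/n)‖M₁‖ ≤ Õ(√(d/n)).
• For M₂ (one double edge j–i, with U = V = {j}): the separator {j} contributes nothing and i contributes √n, so ‖M₂‖ ≤ Õ(√n) and (1/n)‖M₂‖ ≤ Õ(1/√n).
These are exactly the error terms above.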
Part IV: 4-Norm Analysis
4-Norm Analysis • We want to bound ‖VRx‖₄⁴ = (1/n²)·‖Ax‖₄⁴ • Take B to be the matrix with entries B_{i,(j₁,j₂)} = A_{ij₁}·A_{ij₂} • ‖Ax‖₄⁴ = (x ⊗ x)ᵀBᵀB(x ⊗ x), so ‖VRx‖₄⁴ = (1/n²)·(x ⊗ x)ᵀBᵀB(x ⊗ x) • Can try to bound ‖BᵀB‖
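A small numpy check of the identity ‖Ax‖₄⁴ = (x ⊗ x)ᵀBᵀB(x ⊗ x) (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 10
A = rng.normal(size=(n, d))
x = rng.normal(size=d)
x /= np.linalg.norm(x)

# B has rows indexed by i and columns indexed by pairs (j1, j2),
# with B_{i,(j1,j2)} = A_{i,j1} * A_{i,j2}; then (B (x ⊗ x))_i = (Ax)_i^2.
B = np.einsum('ij,ik->ijk', A, A).reshape(n, d * d)

lhs = np.sum((A @ x) ** 4)                       # ||Ax||_4^4
rhs = np.kron(x, x) @ B.T @ B @ np.kron(x, x)    # (x⊗x)^T B^T B (x⊗x)
print(lhs, rhs)                                  # agree up to floating-point error
```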