Suppose the matrix $A = \Phi\Psi$ of size $m \times n$ (where the sensing matrix $\Phi$ has size $m \times n$, and the basis matrix $\Psi$ has size $n \times n$) satisfies the RIP of order $2S$ with $\delta_{2S} < \sqrt{2} - 1 \approx 0.414$. Let the solution of the following be denoted as $\theta^*$ (for signal $f = \Psi\theta$ and measurement vector $y$ with $\|y - \Phi\Psi\theta\|_2 \le \varepsilon$):

$$\min_{\theta} \|\theta\|_1 \ \text{ such that } \ \|y - \Phi\Psi\theta\|_2 \le \varepsilon$$

Then we have:

$$\|\theta^* - \theta\|_2 \le \frac{C_0}{\sqrt{S}}\,\|\theta - \theta_S\|_1 + C_1\,\varepsilon$$

where $\theta_S$ is created by retaining the $S$ largest magnitude elements of $\theta$, and setting the rest to 0.
The proof can be found in a paper by Candès, "The restricted isometry property and its implications for compressed sensing", published in 2008. The proof itself is only between one and two pages long.
The proof uses various properties of vectors in Euclidean space.

The Cauchy–Schwarz inequality: $|\langle v, w \rangle| \le \|v\|_2\,\|w\|_2$

The triangle inequality: $\|v + w\|_2 \le \|v\|_2 + \|w\|_2$

The reverse triangle inequality: $\|v + w\|_2 \ge \|v\|_2 - \|w\|_2$
Relationship between various norms:

$$\|v\|_2 \le \|v\|_1 \le \sqrt{n}\,\|v\|_2 \quad (v \in \mathbb{R}^n)$$

$$\|v\|_1 \le \sqrt{k}\,\|v\|_2 \ \text{ if } v \text{ is a } k\text{-sparse vector}$$

Refer to Theorem 3. For the sake of simplicity alone, we shall assume $\Psi$ to be the identity matrix; hence $x = \theta$. Even if $\Psi$ were not the identity, the proof as such does not change.
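As a quick sanity check, here is a small NumPy sketch (not part of the original notes; sizes n and k are arbitrary) that verifies these norm inequalities on random vectors:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 10

    v = rng.standard_normal(n)     # a generic dense vector
    assert np.linalg.norm(v, 2) <= np.linalg.norm(v, 1) <= np.sqrt(n) * np.linalg.norm(v, 2)

    vk = np.zeros(n)               # a k-sparse vector
    vk[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    assert np.linalg.norm(vk, 1) <= np.sqrt(k) * np.linalg.norm(vk, 2)
    print("norm inequalities verified on these examples")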
This result is called the tube constraint. We have:

$$\|\Phi(x^* - x_0)\|_2 \le \|\Phi x^* - y\|_2 + \|\Phi x_0 - y\|_2 \le 2\varepsilon$$

The first step is the triangle inequality; the second uses the constraint obeyed by the solution $x^*$ together with the feasibility of the true signal $x_0$ (here $y = \Phi x_0 + \eta$ with $\|\eta\|_2 \le \varepsilon$).
Define the vector $h = x^* - x_0$. Decompose $h$ into vectors $h_{T_0}, h_{T_1}, h_{T_2}, \ldots$ which are all at most $s$-sparse. $T_0$ contains the indices of the $s$ largest magnitude elements of $x_0$, $T_1$ contains the indices of the $s$ largest magnitude elements of $h_{T_0^c} = h - h_{T_0}$, $T_2$ contains the indices of the $s$ largest magnitude elements of $h_{(T_0 \cup T_1)^c} = h - h_{T_0} - h_{T_1}$, and so on.
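A minimal NumPy sketch (illustrative only, not part of the proof) of this partitioning of indices into blocks of size s by decreasing magnitude:

    import numpy as np

    def partition_support(x0, h, s):
        # T0: indices of the s largest-magnitude entries of x0.
        T0 = np.argsort(-np.abs(x0))[:s]
        blocks = [T0]
        # Remaining indices, sorted by decreasing magnitude of h, chopped into
        # blocks of size s: these are T1, T2, ...
        rest = np.setdiff1d(np.arange(len(h)), T0)
        rest = rest[np.argsort(-np.abs(h[rest]))]
        for start in range(0, len(rest), s):
            blocks.append(rest[start:start + s])
        return blocks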
We will assume $x_0$ is $s$-sparse (later we will remove this requirement). We now establish the so-called cone constraint. The vector $h$ has its origin at $x_0$ and it lies in the intersection of the $L_1$ ball and the tube. Since $x^* = x_0 + h$ is the minimizer, $\|x_0 + h\|_1 \le \|x_0\|_1$, and hence

$$\|x_0\|_1 \ge \sum_{i \in T_0}|x_{0,i} + h_i| + \sum_{i \in T_0^c}|x_{0,i} + h_i| \ge \|x_{0,T_0}\|_1 - \|h_{T_0}\|_1 + \|h_{T_0^c}\|_1 - \|x_{0,T_0^c}\|_1$$

Since $x_0$ is $s$-sparse, the entries $x_{0,i}$ for $i \in T_0^c$ are 0-valued, so the vector $h$ must also necessarily obey this constraint – the cone constraint:

$$\|h_{T_0^c}\|_1 \le \|h_{T_0}\|_1$$
We will now show that such a vector $h$ cannot lie in the null space of $\Phi$ (except when $h = 0$). In fact, we will bound $\|h\|_2$ in terms of $\|\Phi h\|_2 \le 2\varepsilon$. In other words, we will prove that the magnitude of $h$ is not much greater than $2\varepsilon$, which means that the solution $x^*$ of the optimization problem is close enough to $x_0$.
In step 3, we use a series of algebraic manipulations to prove that the magnitude of $h$ outside of $T_0 \cup T_1$ is upper bounded by the magnitude of $h$ on $T_0 \cup T_1$. In other words, we prove that:

$$\|h_{(T_0 \cup T_1)^c}\|_2 \le \|h_{T_0}\|_2 \le \|h_{T_0 \cup T_1}\|_2$$

The algebra involves various inequalities mentioned earlier; the full chain is spelled out below.
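For completeness, the chain of inequalities (as in Candès' argument, using the cone constraint of the sparse case) is:

$$\|h_{(T_0\cup T_1)^c}\|_2 \le \sum_{j\ge 2}\|h_{T_j}\|_2 \le \frac{\|h_{T_0^c}\|_1}{\sqrt{s}} \le \frac{\|h_{T_0}\|_1}{\sqrt{s}} \le \|h_{T_0}\|_2 \le \|h_{T_0\cup T_1}\|_2$$

Here the second inequality uses the fact that every entry of $h_{T_{j+1}}$ is no larger in magnitude than the average magnitude of the entries of $h_{T_j}$, so $\|h_{T_{j+1}}\|_2 \le \|h_{T_j}\|_1/\sqrt{s}$; the remaining steps use the cone constraint and the norm relationships listed earlier.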
We now prove that the magnitude of $h$ on $T_0 \cup T_1$ is upper bounded by a reasonable quantity. For this, we show, using the RIP of $\Phi$ of order $2s$ and a series of manipulations, that:

$$(1 - \delta_{2s})\,\|h_{T_0 \cup T_1}\|_2^2 \le \|\Phi h_{T_0 \cup T_1}\|_2^2 \le \|h_{T_0 \cup T_1}\|_2\left(2\varepsilon\sqrt{1+\delta_{2s}} + \sqrt{2}\,\delta_{2s}\,\|h_{T_0}\|_2\right)$$

Since $\|h_{T_0}\|_2 \le \|h_{T_0 \cup T_1}\|_2$, this implies that

$$\|h_{T_0 \cup T_1}\|_2 \le \frac{2\sqrt{1+\delta_{2s}}}{1 - (1+\sqrt{2})\,\delta_{2s}}\,\varepsilon, \qquad \|h\|_2 \le \|h_{T_0 \cup T_1}\|_2 + \|h_{(T_0 \cup T_1)^c}\|_2 \le 2\,\|h_{T_0 \cup T_1}\|_2 \le \frac{4\sqrt{1+\delta_{2s}}}{1 - (1+\sqrt{2})\,\delta_{2s}}\,\varepsilon$$
The steps change a bit. The cone constraint changes: since $\|x_0 + h\|_1 \le \|x_0\|_1$,

$$\|x_0\|_1 \ge \sum_{i \in T_0}|x_{0,i} + h_i| + \sum_{i \in T_0^c}|x_{0,i} + h_i| \ge \|x_{0,T_0}\|_1 - \|h_{T_0}\|_1 + \|h_{T_0^c}\|_1 - \|x_{0,T_0^c}\|_1$$

which yields

$$\|h_{T_0^c}\|_1 \le \|h_{T_0}\|_1 + 2\,\|x_{0,T_0^c}\|_1$$
All the other steps remain as is, except the last one, which now produces the following bound:

$$\|h_{T_0 \cup T_1}\|_2 \le \frac{2\sqrt{1+\delta_{2s}}}{1 - (1+\sqrt{2})\,\delta_{2s}}\,\varepsilon + \frac{2\sqrt{2}\,\delta_{2s}}{1 - (1+\sqrt{2})\,\delta_{2s}}\cdot\frac{\|x_0 - x_{0,T_0}\|_1}{\sqrt{s}}$$
Step 3 of the proof uses the following corollary of the RIP for two $s$-sparse unit vectors $x_1, x_2$ with disjoint support:

$$|\langle \Phi x_1, \Phi x_2 \rangle| \le \delta_{2s}$$

Proof of corollary: by the RIP of order $2s$,

$$(1 - \delta_{2s})\,\|x_1 \pm x_2\|_2^2 \le \|\Phi(x_1 \pm x_2)\|_2^2 \le (1 + \delta_{2s})\,\|x_1 \pm x_2\|_2^2$$

By the parallelogram identity,

$$|\langle \Phi x_1, \Phi x_2 \rangle| = \frac{1}{4}\left|\,\|\Phi(x_1 + x_2)\|_2^2 - \|\Phi(x_1 - x_2)\|_2^2\,\right| \le \frac{1}{4}\left[(1+\delta_{2s})\,\|x_1 + x_2\|_2^2 - (1-\delta_{2s})\,\|x_1 - x_2\|_2^2\right] = \delta_{2s}$$

since $\|x_1 + x_2\|_2^2 = \|x_1 - x_2\|_2^2 = 2$ (as $\langle x_1, x_2 \rangle = 0$ and $\|x_1\|_2 = \|x_2\|_2 = 1$).
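A tiny NumPy check of the parallelogram identity used above (purely illustrative; the matrix and vectors are random):

    import numpy as np

    rng = np.random.default_rng(1)
    Phi = rng.standard_normal((20, 50)) / np.sqrt(20)   # a random sensing matrix
    x1 = rng.standard_normal(50)
    x2 = rng.standard_normal(50)

    lhs = np.dot(Phi @ x1, Phi @ x2)
    rhs = 0.25 * (np.linalg.norm(Phi @ (x1 + x2))**2 - np.linalg.norm(Phi @ (x1 - x2))**2)
    print(np.isclose(lhs, rhs))    # True: <a,b> = (||a+b||^2 - ||a-b||^2) / 4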
This step also uses the following corollary of the RIP for two $s$-sparse unit vectors with disjoint support: $|\langle \Phi x_1, \Phi x_2 \rangle| \le \delta_{2s}$. What if the original vectors $x_1$ and $x_2$ were not unit vectors, but both were $s$-sparse? Applying the above result to $x_1/\|x_1\|_2$ and $x_2/\|x_2\|_2$ gives

$$|\langle \Phi x_1, \Phi x_2 \rangle| \le \delta_{2s}\,\|x_1\|_2\,\|x_2\|_2$$
The bound is

$$\|h\|_2 \le \frac{4\sqrt{1+\delta_{2s}}}{1 - (1+\sqrt{2})\,\delta_{2s}}\,\varepsilon + \frac{2\left(1 + (\sqrt{2}-1)\,\delta_{2s}\right)}{1 - (1+\sqrt{2})\,\delta_{2s}}\cdot\frac{\|x_0 - x_{0,T_0}\|_1}{\sqrt{s}}$$

Note the requirement that $\delta_{2s}$ should be less than $\sqrt{2} - 1 \approx 0.414$, which keeps the denominators positive. You can prove that the two constant factors – the one before $\varepsilon$ and the one before $\|x_0 - x_{0,T_0}\|_1$ – are both increasing functions of $\delta_{2s}$ over $[0, \sqrt{2}-1)$. So sensing matrices with smaller values of $\delta_{2s}$ are always nicer!
Suppose the matrix $A = \Phi\Psi$ of size $m \times n$ (where the sensing matrix $\Phi$ has size $m \times n$, and the basis matrix $\Psi$ has size $n \times n$) satisfies the RIP of order $S$ with $\delta_S < 0.307$. Let the solution of the following be denoted as $\theta^*$ (for signal $f = \Psi\theta$ and measurement vector $y$ with $\|y - \Phi\Psi\theta\|_2 \le \varepsilon$):

$$\min_{\theta}\|\theta\|_1 \ \text{ such that } \ \|y - \Phi\Psi\theta\|_2 \le \varepsilon$$

Then we have a bound of the same form as before,

$$\|\theta^* - \theta\|_2 \le \frac{C_0}{\sqrt{S}}\,\|\theta - \theta_S\|_1 + C_1\,\varepsilon$$

where $C_0$ and $C_1$ are explicit constants depending only on $\delta_S$ (finite whenever $\delta_S < 0.307$), and $\theta_S$ is created by retaining the $S$ largest magnitude elements of $\theta$, and setting the rest to 0.
Theorems 3, 5 and 6 refer to orthonormal bases in which the signal has a sparse or compressible representation. However, that is not a necessary condition. There exist the so-called "over-complete bases" in which the number of columns exceeds the number of rows ($n \times K$, $K > n$). Such matrices afford even sparser signal representations.
Why? We explain with an example. A cosine wave (with grid-aligned frequency) has a sparse representation in the DCT basis $\Psi_1$. An impulse signal has a sparse representation in the identity basis $\Psi_2$. Now consider a signal which is the superposition of a small number of cosines and impulses. The combined signal has a sparse representation in neither the DCT basis nor the identity basis. But the combined signal will have a sparse representation in the combined dictionary $[\Psi_1\ \Psi_2]$, as the sketch below illustrates.
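A minimal NumPy/SciPy sketch of this example (illustrative only; the frequencies, impulse locations and the threshold used to count "significant" coefficients are arbitrary choices):

    import numpy as np
    from scipy.fft import dct, idct

    n = 256
    c = np.zeros(n)
    c[5], c[20] = 1.0, 1.0
    x_cos = idct(c, norm='ortho')          # superposition of two grid-aligned cosines (2-sparse in DCT basis)
    x_imp = np.zeros(n)
    x_imp[40], x_imp[200] = 3.0, 3.0       # two impulses (2-sparse in identity basis)
    x = x_cos + x_imp

    thresh = 0.05
    print("significant DCT coefficients:     ", np.sum(np.abs(dct(x, norm='ortho')) > thresh))
    print("significant identity coefficients:", np.sum(np.abs(x) > thresh))
    # Both counts are much larger than 4, yet by construction x uses only
    # 2 columns of the DCT basis Psi1 and 2 columns of the identity basis Psi2,
    # i.e. just 4 columns of the combined dictionary [Psi1 Psi2].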
We know that certain classes of random matrices satisfy the RIP with very high probability. However, we also know that small RICs are desirable. This gives rise to the question: Can we design matrices with smaller RIC than a randomly generated matrix?
Unfortunately, there is no known efficient algorithm for even computing the RIC of a given matrix! But we know that the mutual coherence $\mu$ of $\Phi\Psi$ yields an upper bound on the RIC:

$$\delta_s \le (s-1)\,\mu(\Phi\Psi)$$

So we can design a CS matrix by starting with a random one, and then performing a gradient descent on the mutual coherence to reach a matrix with a smaller mutual coherence!
The procedure is summarized below:

Randomly pick an $m \times n$ matrix $\Phi$.
Repeat until convergence {
    $\Phi \leftarrow \Phi - \alpha\,\nabla_{\Phi}\,\mu(\Phi\Psi)$
    (Pick the step-size $\alpha$ adaptively so that you actually descend on the mutual coherence.)
}

where the mutual coherence is

$$\mu(\Phi\Psi) = \max_{i \ne j} \frac{|\langle (\Phi\Psi)_i, (\Phi\Psi)_j \rangle|}{\|(\Phi\Psi)_i\|_2\,\|(\Phi\Psi)_j\|_2}$$

with $(\Phi\Psi)_i$ denoting the $i$-th column of $\Phi\Psi$.
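A small NumPy helper (illustrative, not the original authors' code) for computing the mutual coherence; here the example uses $\Psi = I$, so the effective matrix is just $\Phi$:

    import numpy as np

    def mutual_coherence(A):
        # A is the effective CS matrix Phi @ Psi; its mutual coherence is the
        # largest absolute normalized inner product between two distinct columns.
        norms = np.linalg.norm(A, axis=0)
        G = np.abs(A.T @ A) / np.outer(norms, norms)
        np.fill_diagonal(G, 0.0)       # exclude i == j
        return G.max()

    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((30, 100))
    print(mutual_coherence(Phi))       # coherence of a random Gaussian matrix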
The aforementioned is one example of a procedure to "design" a CS matrix – as opposed to picking one randomly. Note that mutual coherence has one more advantage over the RIC – it is not tied to any particular sparsity level! But one must bear in mind that $(s-1)\mu$ is only an upper bound on the RIC – reducing the coherence does not guarantee an equally small RIC!
The main problem is how to find a derivative of the "max" function, which is non-differentiable! Use the softmax (log-sum-exp) approximation, which is differentiable:

$$\max_i\{x_i\} = \lim_{\lambda \to \infty} \frac{1}{\lambda}\log\left(\sum_{i=1}^{n}\exp(\lambda\,x_i)\right)$$
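A tiny NumPy illustration (the test vector and the values of λ are arbitrary choices) of how the log-sum-exp expression approaches the max as λ grows; this smoothed max is the quantity one would differentiate in the gradient-descent loop above:

    import numpy as np

    def softmax_approx(x, lam):
        # (1/lam) * log(sum_i exp(lam * x_i)); tends to max_i x_i as lam grows
        return np.log(np.sum(np.exp(lam * np.asarray(x)))) / lam

    x = [0.10, 0.70, 0.65, 0.30]
    for lam in (1, 10, 100, 1000):
        print(lam, softmax_approx(x, lam))   # approaches max(x) = 0.7 from above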