Least squares problem

Given $(x_i)_{i=1}^n$ from $\mathbb{R}^d$ and $(y_i)_{i=1}^n$ from $\mathbb{R}$, minimize
$$F(\beta, \pi) := \sum_{i=1}^n \left( x_i^\top \beta - y_{\pi(i)} \right)^2.$$
• $d = 1$: $O(n \log n)$-time algorithm.
• $d = \Omega(n)$: (strongly) NP-hard to decide if $\min F = 0$. (Observed by Pananjady, Wainwright, & Courtade, 2016; reduction from 3-PARTITION in H., Shi, & Sun, 2017.)

Naïve brute-force search: $\Omega(|S_n|) = \Omega(n!)$.
Least squares with known correspondence: $O(nd^2)$ time.
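To make the brute-force baseline concrete, here is a minimal NumPy sketch (not from the original slides; function names are our own) that enumerates all $n!$ permutations and solves an ordinary least squares problem for each:

```python
import itertools
import numpy as np

def f_objective(X, y, beta, perm):
    """F(beta, pi) = sum_i (x_i^T beta - y_{pi(i)})^2."""
    return float(np.sum((X @ beta - y[list(perm)]) ** 2))

def brute_force(X, y):
    """Minimize F over all permutations (Omega(n!) work) -- only feasible for tiny n."""
    best = (np.inf, None, None)
    for perm in itertools.permutations(range(X.shape[0])):
        # For a fixed permutation, the inner problem is ordinary least squares.
        beta, *_ = np.linalg.lstsq(X, y[list(perm)], rcond=None)
        val = f_objective(X, y, beta, perm)
        if val < best[0]:
            best = (val, beta, perm)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
y = rng.permutation(X @ np.array([1.0, -2.0]))   # noise-free, shuffled responses
print(brute_force(X, y)[0])                      # ~0 in this toy example
```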
Least squares problem ($d = 1$)

Given $(x_i)_{i=1}^n$ and $(y_i)_{i=1}^n$ from $\mathbb{R}$, minimize
$$F(\beta, \pi) := \sum_{i=1}^n \left( x_i \beta - y_{\pi(i)} \right)^2.$$

Example: $(x_1, y_1) = (3, 2)$, $(x_2, y_2) = (4, 1)$, ..., $(x_n, y_n) = (6, 7)$.

Cost with $\pi(i) = i$ for all $i = 1, \ldots, n$:
$$(3\beta - 2)^2 + (4\beta - 1)^2 + \cdots + (6\beta - 7)^2.$$

If $\beta > 0$, then can improve cost with $\pi(1) = 2$ and $\pi(2) = 1$:
$$25\beta^2 - 20\beta + 5 + \cdots \;>\; 25\beta^2 - 22\beta + 5 + \cdots$$
Algorithm for least squares problem ($d = 1$) [PWC'16]

1. "Guess" sign of optimal $\beta$. (Only two possibilities.)
2. Assuming WLOG that $x_1\beta \le x_2\beta \le \cdots \le x_n\beta$, find optimal $\pi$ such that $y_{\pi(1)} \le y_{\pi(2)} \le \cdots \le y_{\pi(n)}$ (via sorting).
3. Solve classical least squares problem
$$\min_{\beta \in \mathbb{R}} \sum_{i=1}^n \left( x_i \beta - y_{\pi(i)} \right)^2$$
to get optimal $\beta$.

Overall running time: $O(n \log n)$. What about $d > 1$?
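A minimal NumPy sketch of this $d = 1$ procedure, written by us from the slide's description (sign guess, sort-and-match, then a closed-form scalar least squares step); the function name is hypothetical:

```python
import numpy as np

def solve_d1(x, y):
    """O(n log n) algorithm for d = 1: for each sign guess, match sorted x*beta
    with sorted y, then solve the scalar least squares problem in closed form."""
    best_val, best = np.inf, None
    for sign in (+1.0, -1.0):
        # If sign(beta) = sign, then x*beta is ordered like sign*x.
        order_x = np.argsort(sign * x)
        order_y = np.argsort(y)
        perm = np.empty_like(order_x)
        perm[order_x] = order_y              # pi pairs i-th smallest sign*x with i-th smallest y
        y_matched = y[perm]
        beta = float(x @ y_matched) / float(x @ x)   # closed-form 1-d least squares
        val = float(np.sum((x * beta - y_matched) ** 2))
        if val < best_val:
            best_val, best = val, (beta, perm)
    return best_val, best

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = rng.permutation(2.5 * x)                 # shuffled, noise-free responses
print(solve_d1(x, y)[1][0])                  # recovers beta ~ 2.5
```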
Alternating minimization

Pick initial $\hat\beta \in \mathbb{R}^d$ (e.g., randomly).
Loop until convergence:
$$\hat\pi \leftarrow \arg\min_{\pi \in S_n} \sum_{i=1}^n \left( x_i^\top \hat\beta - y_{\pi(i)} \right)^2, \qquad \hat\beta \leftarrow \arg\min_{\beta \in \mathbb{R}^d} \sum_{i=1}^n \left( x_i^\top \beta - y_{\hat\pi(i)} \right)^2.$$

(Image credit: Wolfram|Alpha)

• Each loop-iteration efficiently computable.
• But can get stuck in local minima. So try many initial $\hat\beta \in \mathbb{R}^d$.
(Open: How many restarts? How many iterations?)
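A minimal sketch of the alternating-minimization loop with random restarts (our own code, not the authors'). The $\hat\pi$-step is efficient because, with scalar predictions $x_i^\top \hat\beta$, the optimal matching simply pairs sorted predictions with sorted responses; the $\hat\beta$-step is ordinary least squares.

```python
import numpy as np

def alt_min(X, y, n_restarts=20, n_iters=50, seed=0):
    """Alternating minimization with random restarts (may get stuck in local minima)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_val, best_beta = np.inf, None
    for _ in range(n_restarts):
        beta = rng.normal(size=d)                    # random initial beta-hat
        for _ in range(n_iters):
            # pi-step: optimal permutation matches sorted X @ beta with sorted y.
            preds = X @ beta
            perm = np.empty(n, dtype=int)
            perm[np.argsort(preds)] = np.argsort(y)
            # beta-step: ordinary least squares against the re-matched responses.
            beta, *_ = np.linalg.lstsq(X, y[perm], rcond=None)
        val = float(np.sum((X @ beta - y[perm]) ** 2))
        if val < best_val:
            best_val, best_beta = val, beta
    return best_val, best_beta
```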
Approximation result

Theorem (H., Shi, & Sun, 2017)
There is an algorithm that, given any inputs $(x_i)_{i=1}^n$, $(y_i)_{i=1}^n$, and $\epsilon \in (0, 1)$, returns a $(1 + \epsilon)$-approximate solution to the least squares problem in time
$$\left( \frac{n}{\epsilon} \right)^{O(d)} + \mathrm{poly}(n, d).$$

(No other previous algorithm with approximation guarantee.)
Recall: Brute-force solution needs $\Omega(n!)$ time.
Statistical recovery of $\beta^*$: algorithms and lower-bounds
Motivation

When does the best fit model shed light on the "truth" ($\pi^*$ & $\beta^*$)?

Approach: Study question in context of statistical model for data.
1. Understand information-theoretic limits on recovering truth.
2. Natural "average-case" setting for algorithms.
Statistical model

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_{\pi^*(1)}^\top \\ x_{\pi^*(2)}^\top \\ \vdots \\ x_{\pi^*(n)}^\top \end{pmatrix} \beta^* + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

Assume $(x_i)_{i=1}^n$ iid from $P$ and $(\varepsilon_i)_{i=1}^n$ iid from $\mathrm{N}(0, \sigma^2)$.

Recoverability of $\beta^*$ depends on signal-to-noise ratio:
$$\mathrm{SNR} := \frac{\|\beta^*\|^2}{\sigma^2}.$$

Classical setting (where $\pi^*$ is known): Just need $\mathrm{SNR} \gtrsim d/n$ to approximately recover $\beta^*$.
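A small data generator for this statistical model (our own sketch, with hypothetical names), useful for average-case experiments:

```python
import numpy as np

def shuffled_regression_data(n, d, beta_star, sigma, seed=0):
    """Generate y_i = x_{pi*(i)}^T beta* + eps_i with a uniformly random hidden permutation."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))                  # rows x_i iid from P = N(0, I_d)
    pi_star = rng.permutation(n)                 # hidden correspondence
    eps = rng.normal(scale=sigma, size=n)
    y = X[pi_star] @ beta_star + eps
    snr = float(beta_star @ beta_star) / sigma**2
    return X, y, pi_star, snr
```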
High-level intuition

Suppose $\beta^*$ is either $e_1 = (1, 0, 0, \ldots, 0)$ or $e_2 = (0, 1, 0, \ldots, 0)$.

$\pi^*$ known: distinguishability of $e_1$ and $e_2$ can improve with $n$.
$\pi^*$ unknown: distinguishability is less clear.
$$\{\!\{ y_i \}\!\}_{i=1}^n = \begin{cases} \{\!\{ x_{i,1} \}\!\}_{i=1}^n + \mathrm{N}(0, \sigma^2) & \text{if } \beta^* = e_1, \\ \{\!\{ x_{i,2} \}\!\}_{i=1}^n + \mathrm{N}(0, \sigma^2) & \text{if } \beta^* = e_2. \end{cases}$$
($\{\!\{\cdot\}\!\}$ denotes unordered multi-set.)
Effect of noise

[Figure: histograms of $\{\!\{ x_{i,1} \}\!\}_{i=1}^n$ and $\{\!\{ x_{i,2} \}\!\}_{i=1}^n$ without noise ($P = \mathrm{N}(0, I_d)$), alongside a histogram of the observed values "??? $+ \mathrm{N}(0, \sigma^2)$" with noise.]
Lower bound on SNR

Theorem (H., Shi, & Sun, 2017)
For $P = \mathrm{N}(0, I_d)$, no estimator $\hat\beta$ can guarantee
$$\mathbb{E}\left[ \|\hat\beta - \beta^*\|^2 \right] \le \frac{\|\beta^*\|^2}{3}$$
unless
$$\mathrm{SNR} \ge C \cdot \frac{d}{\log\log(n)}.$$

"Known correspondence" setting: $\mathrm{SNR} \gtrsim d/n$ suffices.

Another theorem: for $P = \mathrm{Uniform}([-1, 1]^d)$, must have $\mathrm{SNR} \ge 1/9$, even as $n \to \infty$.
Previous works

(Unnikrishnan, Haghighatshoar, & Vetterli, 2015; Pananjady, Wainwright, & Courtade, 2016):

High SNR regime: If $\mathrm{SNR} \gg \mathrm{poly}(n)$, then can recover $\pi^*$ (and $\beta^*$, approximately) using Maximum Likelihood Estimation, i.e., least squares.

Related ($d = 1$): broken random sample (DeGroot and Goel, 1980). Estimate sign of correlation between $x_i$ and $y_i$. Have estimator for $\mathrm{sign}(\beta^*)$ that is correct w.p. $1 - \tilde{O}(\mathrm{SNR}^{-1/4})$.

Does high SNR also permit efficient algorithms?
(Recall: our approximate MLE algorithm has running time $n^{O(d)}$.)
Average-case recovery with very high SNR
Noise-free setting ($\mathrm{SNR} = \infty$)

$$\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_0^\top \\ x_{\pi^*(1)}^\top \\ \vdots \\ x_{\pi^*(n)}^\top \end{pmatrix} \beta^*$$

Assume $(x_i)_{i=0}^n$ iid from $\mathrm{N}(0, I_d)$. Also assume $\pi^*(0) = 0$.

If $n + 1 \ge d$, then recovery of $\pi^*$ gives exact recovery of $\beta^*$ (a.s.).
We'll assume $n + 1 \ge d + 1$ (i.e., $n \ge d$).

Claim: $n \ge d$ suffices to recover $\pi^*$ with high probability.
Result on exact recovery

Theorem (H., Shi, & Sun, 2017)
In the noise-free setting, there is a $\mathrm{poly}(n, d)$-time⋆ algorithm that returns $\pi^*$ and $\beta^*$ with high probability.
⋆ Assuming problem is appropriately discretized.
Main idea: hidden subset

Measurements:
$$y_0 = x_0^\top \beta^*; \qquad y_i = x_{\pi^*(i)}^\top \beta^*, \quad i = 1, \ldots, n.$$

For simplicity: assume $n = d$, and $x_i = e_i$ for $i = 1, \ldots, d$, so
$$\{\!\{ y_1, \ldots, y_d \}\!\} = \{\!\{ \beta^*_1, \ldots, \beta^*_d \}\!\}.$$

We also know:
$$y_0 = x_0^\top \beta^* = \sum_{j=1}^d x_{0,j}\, \beta^*_j.$$
Reduction to Subset Sum

$$y_0 = x_0^\top \beta^* = \sum_{j=1}^d x_{0,j}\, \beta^*_j = \sum_{i=1}^d \sum_{j=1}^d x_{0,j}\, y_i \cdot \mathbf{1}\{\pi^*(i) = j\}.$$

Subset Sum problem:
• $d^2$ "source" numbers $c_{i,j} := x_{0,j}\, y_i$, "target" sum $y_0$.
• The subset $\{ c_{i,j} : \pi^*(i) = j \}$ adds up to $y_0$.
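A minimal sketch of this instance construction under the simplifying assumption $n = d$ and $x_i = e_i$ (our own code; variable names are hypothetical), verifying that the planted subset hits the target:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
beta_star = rng.normal(size=d)
x0 = rng.normal(size=d)
pi_star = rng.permutation(d)             # y_i = beta*_{pi*(i)} since x_i = e_i

y = beta_star[pi_star]                   # observed responses y_1, ..., y_d
y0 = x0 @ beta_star                      # extra measurement: the "target" sum

# d^2 source numbers c_{i,j} = x_{0,j} * y_i; the planted subset {c_{i,j} : pi*(i) = j}
# sums to y0.
C = np.outer(y, x0)                      # C[i, j] = y_i * x0_j
planted = sum(C[i, pi_star[i]] for i in range(d))
print(np.isclose(planted, y0))           # True
```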
NP-Completeness of Subset Sum (a.k.a. "Knapsack") (Karp, 1972)
Easiness of Subset Sum

• But Subset Sum is only "weakly" NP-hard (efficient algorithm exists for unary-encoded inputs).
• Lagarias & Odlyzko (1983): solving certain random instances can be reduced to solving Approximate Shortest Vector Problem in lattices.
• Lenstra, Lenstra, & Lovász (1982): efficient algorithm to solve Approximate SVP.
• Our algorithm is based on similar reduction but requires a somewhat different analysis.
Reducing subset sum to shortest vector problem

Lagarias & Odlyzko (1983): random instances of Subset Sum efficiently solvable when $N$ source numbers $c_1, \ldots, c_N$ chosen independently and u.a.r. from sufficiently wide interval of $\mathbb{Z}$.

Main idea: (w.h.p.) every incorrect subset will "miss" the target sum $T$ by a noticeable amount.

Reduction: construct lattice basis in $\mathbb{R}^{N+1}$ such that
• correct subset of basis vectors gives short lattice vector $v^\star$;
• any other lattice vector $\not\propto v^\star$ is more than $2^{N/2}$-times longer.
$$[\, b_0 \;\; b_1 \;\; \cdots \;\; b_N \,] := \begin{bmatrix} 0 & & I_N & \\ MT & -Mc_1 & \cdots & -Mc_N \end{bmatrix}$$
for sufficiently large $M > 0$.
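A sketch of this basis construction (our own code following the displayed matrix). Actually running lattice reduction on it would need an external LLL implementation, e.g. the fpylll package, which we mention only as a hypothetical next step:

```python
import numpy as np

def lo_basis(sources, target, M=10**6):
    """Columns b_0, ..., b_N of the Lagarias-Odlyzko basis in R^{N+1}:
    b_0 = (0, ..., 0, M*T) and b_j = (e_j, -M*c_j). The 0/1 combination picking
    the correct subset gives a short lattice vector (its last coordinate vanishes)."""
    c = np.asarray(sources, dtype=float)
    N = c.size
    B = np.zeros((N + 1, N + 1))
    B[:N, 1:] = np.eye(N)            # top block: identity part of b_1, ..., b_N
    B[N, 0] = M * target             # last coordinate of b_0
    B[N, 1:] = -M * c                # last coordinates of b_1, ..., b_N
    return B

# Example: the correct subset's coefficient vector yields a short lattice vector.
c = [3.0, 5.0, 7.0, 11.0]
B = lo_basis(c, target=3.0 + 11.0)
coeffs = np.array([1, 1, 0, 0, 1])   # pick b_0 + b_1 + b_4
print(B @ coeffs)                    # last coordinate is 0: the short vector
```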
Our random subset sum instance

Catch: Our source numbers $c_{i,j} = y_i\, x_j^\top x_0$ are not independent, and not uniformly distributed on some wide interval of $\mathbb{Z}$.
• Instead, have some joint density derived from $\mathrm{N}(0, 1)$.
• To show that Lagarias & Odlyzko reduction still works, use Gaussian anti-concentration for quadratic and quartic forms.

Key lemma: (w.h.p.) for every $Z \in \mathbb{Z}^{d \times d}$ that is not an integer multiple of the permutation matrix corresponding to $\pi^*$,
$$\left| y_0 - \sum_{i,j} Z_{i,j} \cdot c_{i,j} \right| \;\ge\; \frac{\|\beta^*\|_2}{2^{\mathrm{poly}(d)}}.$$
Some remarks

• In general, $x_1, \ldots, x_n$ are not $e_1, \ldots, e_d$, but similar reduction works via Moore-Penrose pseudoinverse.
• Algorithm strongly exploits assumption of noise-free measurements. Unlikely to tolerate much noise.

Open problem: robust efficient algorithm in high SNR setting.
Correspondence retrieval
Correspondence retrieval problem

Goal: recover $k$ unknown "signals" $\beta^*_1, \ldots, \beta^*_k \in \mathbb{R}^d$.

Measurements: $(x_i, Y_i)$ for $i = 1, \ldots, n$, where
• $(x_i)$ iid from $\mathrm{N}(0, I_d)$;
• $Y_i = \{\!\{ x_i^\top \beta^*_1 + \varepsilon_{i,1}, \ldots, x_i^\top \beta^*_k + \varepsilon_{i,k} \}\!\}$ as unordered multi-set;
• $(\varepsilon_{i,j})$ iid from $\mathrm{N}(0, \sigma^2)$.

Correspondence across measurements is lost.

[Figure: a point $x_i$ together with signals $\beta^*_1$, $\beta^*_2$, $\beta^*_3$.]
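A small generator for this measurement model (our own sketch); each $Y_i$ is returned as a sorted tuple to model the unordered multi-set:

```python
import numpy as np

def correspondence_retrieval_data(n, d, betas, sigma, seed=0):
    """Measurements (x_i, Y_i) with Y_i = {{x_i^T beta_j + eps_{i,j}}} as an unordered multi-set."""
    rng = np.random.default_rng(seed)
    B = np.asarray(betas)                        # shape (k, d): the k hidden signals
    X = rng.normal(size=(n, d))
    noise = rng.normal(scale=sigma, size=(n, B.shape[0]))
    vals = X @ B.T + noise                       # vals[i, j] = x_i^T beta_j + eps_{i,j}
    Y = [tuple(sorted(row)) for row in vals]     # discard the correspondence j
    return X, Y
```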
Special cases

• $k = 1$: classical linear regression model.
• $k = 2$ and $\beta^*_1 = -\beta^*_2$: (real variant of) phase retrieval.
  Note that $\{\!\{ x_i^\top \beta^*, -x_i^\top \beta^* \}\!\}$ has same information as $|x_i^\top \beta^*|$.
  Existing methods require $n > 2d$.
Algorithmic results (Andoni, H., Shi, & Sun, 2017)

• Noise-free setting (i.e., $\sigma = 0$): Algorithm based on reduction to Subset Sum that requires $n \ge d + 1$, which is optimal.
• General setting: Method-of-moments algorithm, based on forming averages over the data.

Questions: SNR limits? Sub-optimality of "method-of-moments"?