
Learning without correspondence
Daniel Hsu
Computer Science Department & Data Science Institute, Columbia University

Introduction. Example #1: unlinked data sources. Two separate data sources about the same entities; the record linkage is unknown.


  1. Least squares problem
Given (x_i)_{i=1}^n from R^d and (y_i)_{i=1}^n from R, minimize
    F(β, π) := ∑_{i=1}^n (x_i^⊤ β − y_{π(i)})².
• d = 1: O(n log n)-time algorithm (observed by Pananjady, Wainwright, & Courtade, 2016).
• d = Ω(n): (strongly) NP-hard to decide whether min F = 0 (reduction from 3-PARTITION; H., Shi, & Sun, 2017).
Naïve brute-force search: Ω(|S_n|) = Ω(n!). Least squares with known correspondence: O(nd²) time.
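As a concrete reference point for the Ω(n!) brute-force baseline, here is a minimal Python sketch (not from the talk; function and variable names are mine) that minimizes F by solving an ordinary least squares problem for every permutation. It is only feasible for tiny n.

```python
import itertools
import numpy as np

def brute_force_f_min(X, y):
    """Exhaustively minimize F(beta, pi) = sum_i (x_i^T beta - y_{pi(i)})^2.
    X: (n, d) array, y: (n,) array.  Omega(n!) time -- tiny n only."""
    best_cost, best_beta, best_perm = np.inf, None, None
    for perm in itertools.permutations(range(len(y))):
        # For a fixed permutation, the inner problem is ordinary least squares.
        y_perm = y[list(perm)]
        beta, *_ = np.linalg.lstsq(X, y_perm, rcond=None)
        cost = np.sum((X @ beta - y_perm) ** 2)
        if cost < best_cost:
            best_cost, best_beta, best_perm = cost, beta, perm
    return best_cost, best_beta, best_perm
```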

  2–6. Least squares problem (d = 1): worked example
Given (x_i)_{i=1}^n and (y_i)_{i=1}^n from R, minimize
    F(β, π) := ∑_{i=1}^n (x_i β − y_{π(i)})².
Example data: (x_1, y_1) = (3, 2), (x_2, y_2) = (4, 1), ..., (x_n, y_n) = (6, 7).
Cost with π(i) = i for all i = 1, ..., n: (3β − 2)² + (4β − 1)² + ··· + (6β − 7)².
If β > 0, then the cost can be improved by taking π(1) = 2 and π(2) = 1:
    25β² − 20β + 5 + ··· > 25β² − 22β + 5 + ···.

  7–11. Algorithm for the least squares problem (d = 1) [PWC'16]
1. "Guess" the sign of the optimal β. (Only two possibilities.)
2. Assuming WLOG that x_1 β ≤ x_2 β ≤ ··· ≤ x_n β, find the optimal π such that
   y_{π(1)} ≤ y_{π(2)} ≤ ··· ≤ y_{π(n)} (via sorting).
3. Solve the classical least squares problem
    min_{β ∈ R} ∑_{i=1}^n (x_i β − y_{π(i)})²
   to get the optimal β.
Overall running time: O(n log n). What about d > 1?
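A minimal Python sketch of the d = 1 procedure above, assuming 1-d NumPy arrays as input (names are mine): for each sign guess, sorting gives the optimal matching, and the inner least squares problem has a closed form.

```python
import numpy as np

def least_squares_1d_unknown_permutation(x, y):
    """Sketch of the O(n log n) d = 1 algorithm: guess sign(beta), sort,
    then solve the 1-d least squares problem in closed form."""
    best_cost, best_beta = np.inf, None
    for sign in (+1.0, -1.0):
        # If the optimal beta has this sign, x_i * beta is ordered like sign * x_i,
        # so the optimal pi pairs sorted x_i * beta with sorted y values.
        xs = x[np.argsort(sign * x)]
        ys = y[np.argsort(y)]
        beta = (xs @ ys) / (xs @ xs)          # closed-form 1-d least squares (assumes some x_i != 0)
        cost = np.sum((xs * beta - ys) ** 2)
        if cost < best_cost:
            best_cost, best_beta = cost, beta
    return best_beta, best_cost
```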

  12–16. Alternating minimization
Pick an initial β̂ ∈ R^d (e.g., randomly).
Loop until convergence:
    π̂ ← arg min_{π ∈ S_n} ∑_{i=1}^n (x_i^⊤ β̂ − y_{π(i)})²,
    β̂ ← arg min_{β ∈ R^d} ∑_{i=1}^n (x_i^⊤ β − y_{π̂(i)})².
(Image credit: Wolfram|Alpha)
• Each loop iteration is efficiently computable.
• But it can get stuck in local minima, so try many initial β̂ ∈ R^d.
  (Open: How many restarts? How many iterations?)
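A sketch of the alternating-minimization heuristic above, with random restarts (the restart and iteration counts are arbitrary placeholders; names are mine). The π-step uses the fact that matching sorted fitted values with sorted responses is optimal for a fixed β̂.

```python
import numpy as np

def alternating_minimization(X, y, n_restarts=10, n_iters=50, seed=None):
    """X: (n, d) covariates, y: (n,) responses with unknown correspondence."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_cost, best_beta = np.inf, None
    for _ in range(n_restarts):                 # random restarts: local minima are common
        beta = rng.standard_normal(d)
        for _ in range(n_iters):
            z = X @ beta
            # pi-step: minimize sum_i (z_i - y_{pi(i)})^2 by sorting both sides.
            perm = np.empty(n, dtype=int)
            perm[np.argsort(z)] = np.argsort(y)
            # beta-step: ordinary least squares against the permuted responses.
            beta, *_ = np.linalg.lstsq(X, y[perm], rcond=None)
        cost = np.sum((X @ beta - y[perm]) ** 2)
        if cost < best_cost:
            best_cost, best_beta = cost, beta
    return best_beta, best_cost
```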

  17–18. Approximation result
Theorem (H., Shi, & Sun, 2017). There is an algorithm that, given any inputs (x_i)_{i=1}^n, (y_i)_{i=1}^n, and ϵ ∈ (0, 1), returns a (1 + ϵ)-approximate solution to the least squares problem in time
    (n/ϵ)^{O(d)} + poly(n, d).
Recall: brute-force search needs Ω(n!) time. (No other previous algorithm with an approximation guarantee.)

  19. Statistical recovery of β*: algorithms and lower bounds

  20–22. Motivation
When does the best-fit model shed light on the "truth" (π* and β*)?
Approach: study the question in the context of a statistical model for the data.
1. Understand the information-theoretic limits on recovering the truth.
2. Natural "average-case" setting for algorithms.

  23–25. Statistical model
    y_i = x_{π*(i)}^⊤ β* + ε_i,  i = 1, ..., n.
Assume (x_i)_{i=1}^n iid from P and (ε_i)_{i=1}^n iid from N(0, σ²).
Recoverability of β* depends on the signal-to-noise ratio:
    SNR := ‖β*‖² / σ².
Classical setting (where π* is known): just need SNR ≳ d/n to approximately recover β*.
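To make the model concrete, here is a small data-generation sketch under the stated assumptions, with P = N(0, I_d) as a specific choice (names are mine).

```python
import numpy as np

def sample_shuffled_regression(n, d, beta_star, sigma, seed=None):
    """Draw (x_i) iid N(0, I_d), noise iid N(0, sigma^2), and return the
    responses observed under a hidden uniformly random permutation pi*."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    eps = sigma * rng.standard_normal(n)
    pi_star = rng.permutation(n)           # hidden correspondence
    y = X[pi_star] @ beta_star + eps       # y_i = x_{pi*(i)}^T beta* + eps_i
    return X, y, pi_star
```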

  26–29. High-level intuition
Suppose β* is either e_1 = (1, 0, 0, ..., 0) or e_2 = (0, 1, 0, ..., 0), with
    y_i = x_{π*(i)}^⊤ β* + ε_i.
π* known: distinguishability of e_1 and e_2 can improve with n.
π* unknown: distinguishability is less clear, since
    ⟨y_i⟩_{i=1}^n = ⟨x_{i,1}⟩_{i=1}^n + N(0, σ²)  if β* = e_1,
    ⟨y_i⟩_{i=1}^n = ⟨x_{i,2}⟩_{i=1}^n + N(0, σ²)  if β* = e_2.
(⟨·⟩ denotes an unordered multi-set.)

  30–31. Effect of noise
[Histograms for P = N(0, I_d): without noise, the empirical distributions of ⟨x_{i,1}⟩_{i=1}^n and ⟨x_{i,2}⟩_{i=1}^n; with noise, a single histogram of "??? + N(0, σ²)" that could have come from either.]
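The histograms summarized above are easy to reproduce; this small simulation (my own, with arbitrary parameter choices) shows that once N(0, σ²) noise is added, each y_i is marginally N(0, 1 + σ²) under either hypothesis, so the two noisy histograms look essentially the same.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 1000, 1.0
X = rng.standard_normal((n, 2))

# Candidate response multisets, before and after adding noise.
noisy_e1 = X[:, 0] + sigma * rng.standard_normal(n)   # if beta* = e_1
noisy_e2 = X[:, 1] + sigma * rng.standard_normal(n)   # if beta* = e_2

# Both histograms are draws from N(0, 1 + sigma^2): hard to tell apart by eye.
print(np.histogram(noisy_e1, bins=20, range=(-6, 6))[0])
print(np.histogram(noisy_e2, bins=20, range=(-6, 6))[0])
```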

  32–34. Lower bound on SNR
Theorem (H., Shi, & Sun, 2017). For P = N(0, I_d), no estimator β̂ can guarantee
    E[‖β̂ − β*‖²] ≤ ‖β*‖² / 3
unless SNR ≥ C · d / log log(n).
"Known correspondence" setting: SNR ≳ d/n suffices.
Another theorem: for P = Uniform([−1, 1]^d), must have SNR ≥ 1/9, even as n → ∞.

  35–38. Previous works
(Unnikrishnan, Haghighatshoar, & Vetterli, 2015; Pananjady, Wainwright, & Courtade, 2016.)
High SNR regime: if SNR ≫ poly(n), then one can recover π* (and β*, approximately) using maximum likelihood estimation, i.e., least squares.
Related (d = 1): broken random sample (DeGroot and Goel, 1980). Estimate the sign of the correlation between x_i and y_i; this gives an estimator for sign(β*) that is correct w.p. 1 − Õ(SNR^{−1/4}).
Does high SNR also permit efficient algorithms?
(Recall: our approximate MLE algorithm has running time n^{O(d)}.)

  39. Average-case recovery with very high SNR

  40–44. Noise-free setting (SNR = ∞)
    y_i = x_{π*(i)}^⊤ β*,  i = 0, 1, ..., n.
Assume (x_i)_{i=0}^n iid from N(0, I_d). Also assume π*(0) = 0.
If n + 1 ≥ d, then recovery of π* gives exact recovery of β* (a.s.).
We'll assume n + 1 ≥ d + 1 (i.e., n ≥ d).
Claim: n ≥ d suffices to recover π* with high probability.

  45. Result on exact recovery
Theorem (H., Shi, & Sun, 2017). In the noise-free setting, there is a poly(n, d)-time⋆ algorithm that returns π* and β* with high probability.
⋆ Assuming the problem is appropriately discretized.

  46–48. Main idea: hidden subset
Measurements: y_0 = x_0^⊤ β*;  y_i = x_{π*(i)}^⊤ β* for i = 1, ..., n.
For simplicity: assume n = d and x_i = e_i for i = 1, ..., d, so
    ⟨y_1, ..., y_d⟩ = ⟨β*_1, ..., β*_d⟩.
We also know:
    y_0 = x_0^⊤ β* = ∑_{j=1}^d x_{0,j} β*_j.

  49–52. Reduction to Subset Sum
    y_0 = x_0^⊤ β* = ∑_{j=1}^d x_{0,j} β*_j = ∑_{i=1}^d ∑_{j=1}^d x_{0,j} y_i · 1{π*(i) = j}.
• d² "source" numbers c_{i,j} := x_{0,j} y_i, "target" sum y_0.
• The subset {c_{i,j} : π*(i) = j} adds up to y_0: a Subset Sum problem (sketch below).
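A small sketch of the instance construction in the simplified setting x_i = e_i (names and the sanity check are mine): the d² source numbers are c_{i,j} = x_{0,j} · y_i, the target is y_0, and the subset selected by the true permutation sums exactly to the target.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
beta_star = rng.standard_normal(d)
x0 = rng.standard_normal(d)
pi_star = rng.permutation(d)            # hidden permutation

y0 = x0 @ beta_star                     # y_0 = x_0^T beta*
y = beta_star[pi_star]                  # y_i = beta*_{pi*(i)} since x_i = e_i

# Source numbers c[i, j] = x0[j] * y[i]; target sum y0.
c = np.outer(y, x0)
correct_subset_sum = sum(c[i, pi_star[i]] for i in range(d))
assert np.isclose(correct_subset_sum, y0)
```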

  53. NP-completeness of Subset Sum (a.k.a. "Knapsack") (Karp, 1972)

  54–57. Easiness of Subset Sum
• But Subset Sum is only "weakly" NP-hard (an efficient algorithm exists for unary-encoded inputs).
• Lagarias & Odlyzko (1983): solving certain random instances can be reduced to solving the Approximate Shortest Vector Problem in lattices.
• Lenstra, Lenstra, & Lovász (1982): efficient algorithm to solve Approximate SVP.
• Our algorithm is based on a similar reduction but requires a somewhat different analysis.

  58–60. Reducing Subset Sum to the shortest vector problem
Lagarias & Odlyzko (1983): random instances of Subset Sum are efficiently solvable when the N source numbers c_1, ..., c_N are chosen independently and u.a.r. from a sufficiently wide interval of Z.
Main idea: (w.h.p.) every incorrect subset will "miss" the target sum T by a noticeable amount.
Reduction: construct a lattice basis in R^{N+1} such that
• the correct subset of basis vectors gives a short lattice vector v⋆;
• any other lattice vector ̸∝ v⋆ is more than 2^{N/2}-times longer.
    [ b_0  b_1 ··· b_N ] := [  0     I_N
                               MT   −Mc_1 ··· −Mc_N ]
for sufficiently large M > 0.
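A sketch of the basis construction displayed above (illustrative only; the choice of M and any downstream lattice reduction are not shown, and names are mine).

```python
import numpy as np

def lagarias_odlyzko_basis(c, T, M):
    """Build the (N+1) x (N+1) lattice basis [b_0 b_1 ... b_N] with
    b_0 = (0, ..., 0, M*T) and b_j = (e_j, -M*c_j), as on the slide.
    An integer combination z_0*b_0 + sum_j z_j*b_j has last coordinate
    M*(z_0*T - sum_j z_j*c_j); for z_0 = 1 it vanishes exactly when the
    selected c_j sum to the target T, leaving a short 0/1 vector."""
    N = len(c)
    B = np.zeros((N + 1, N + 1))
    B[:N, 1:] = np.eye(N)              # top block: identity over the N source numbers
    B[N, 0] = M * T                    # b_0's only nonzero entry
    B[N, 1:] = -M * np.asarray(c)      # bottom row: -M * c_j
    return B                           # columns are the basis vectors b_0, ..., b_N
```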

  61–64. Our random Subset Sum instance
Catch: our source numbers c_{i,j} = y_i x_j^⊤ x_0 are not independent, and not uniformly distributed on some wide interval of Z.
• Instead, they have a joint density derived from N(0, 1).
• To show that the Lagarias & Odlyzko reduction still works, use Gaussian anti-concentration for quadratic and quartic forms.
Key lemma: (w.h.p.) for every Z ∈ Z^{d×d} that is not an integer multiple of the permutation matrix corresponding to π*,
    | y_0 − ∑_{i,j} Z_{i,j} · c_{i,j} | ≥ 2^{−poly(d)} · ‖β*‖₂.

  65–66. Some remarks
• In general, x_1, ..., x_n are not e_1, ..., e_d, but a similar reduction works via the Moore-Penrose pseudoinverse (sketch below).
• The algorithm strongly exploits the assumption of noise-free measurements; it is unlikely to tolerate much noise.
Open problem: a robust, efficient algorithm in the high SNR setting.
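The pseudoinverse step mentioned in the first bullet can be sketched as follows (a hypothetical helper, not the paper's code): once a candidate permutation π̂ is in hand, β is recovered by solving the noise-free linear system.

```python
import numpy as np

def recover_beta(X, y, pi_hat):
    """Given a candidate permutation pi_hat for the noise-free model
    y_i = x_{pi(i)}^T beta, recover beta via the Moore-Penrose pseudoinverse.
    Exact when pi_hat is correct and X has full column rank."""
    return np.linalg.pinv(X[pi_hat]) @ y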

  67. Correspondence retrieval

  68–71. Correspondence retrieval problem
Goal: recover k unknown "signals" β*_1, ..., β*_k ∈ R^d.
Measurements: (x_i, Y_i) for i = 1, ..., n, where
• (x_i) iid from N(0, I_d);
• Y_i = ⟨x_i^⊤ β*_1 + ε_{i,1}, ..., x_i^⊤ β*_k + ε_{i,k}⟩ as an unordered multi-set;
• (ε_{i,j}) iid from N(0, σ²).
Correspondence across measurements is lost.
[Figure: a single measurement direction x_i together with the signals β*_1, β*_2, β*_3.]
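A data-generation sketch for this measurement model under the stated Gaussian assumptions (names are mine; the unordered multi-set is stored as a sorted row).

```python
import numpy as np

def sample_correspondence_retrieval(n, d, B_star, sigma, seed=None):
    """B_star: (k, d) array whose rows are the signals beta*_1, ..., beta*_k.
    Returns the x_i and the unordered multi-sets Y_i."""
    rng = np.random.default_rng(seed)
    k = B_star.shape[0]
    X = rng.standard_normal((n, d))
    Y = X @ B_star.T + sigma * rng.standard_normal((n, k))  # (i, j) entry: x_i^T beta*_j + eps_{i,j}
    Y = np.sort(Y, axis=1)     # sorting each row discards the correspondence across measurements
    return X, Y
```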

  72–73. Special cases
• k = 1: classical linear regression model.
• k = 2 and β*_1 = −β*_2: (real variant of) phase retrieval. Note that ⟨x_i^⊤ β*, −x_i^⊤ β*⟩ has the same information as |x_i^⊤ β*|. Existing methods require n > 2d.

  74. Algorithmic results (Andoni, H., Shi, & Sun, 2017)
• Noise-free setting (i.e., σ = 0): algorithm based on a reduction to Subset Sum that requires n ≥ d + 1, which is optimal.
• General setting: method-of-moments algorithm, i.e., based on forming averages over the data (one illustrative average is sketched below).
Questions: SNR limits? Sub-optimality of "method-of-moments"?
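As one illustration of "forming averages over the data" (this particular moment is my own example, not necessarily one used by Andoni, H., Shi, & Sun): since E[x x^⊤] = I_d and the noise has mean zero, averaging x_i times the sum of the entries of Y_i gives an unbiased estimate of β*_1 + ··· + β*_k.

```python
import numpy as np

def first_moment_estimate(X, Y):
    """Estimate beta*_1 + ... + beta*_k from (X, Y) as in the sketch above.
    E[x_i * sum(Y_i)] = beta*_1 + ... + beta*_k.  Illustrative only."""
    return (X * Y.sum(axis=1, keepdims=True)).mean(axis=0)
```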
