The Johnson-Lindenstrauss Lemma in Linear Programming
Leo Liberti, Vu Khac Ky, Pierre-Louis Poirion
CNRS LIX, Ecole Polytechnique, France
Aussois COW 2016
The gist
• Goal: solve very large LPs min { c⊤x | Ax = b ∧ x ≥ 0 }
• Trade-off: approximate / wrong with low probability: OK
• Means: project the columns of Ax = b into a random subspace via T, getting Ax = b ∧ x ≥ 0 ⇔ TAx = Tb ∧ x ≥ 0 with high probability
• Bisection: solve the LP using [TAx = Tb ∧ x ≥ 0] as a feasibility oracle
Plan
• Restricted Linear Membership
• Johnson-Lindenstrauss Lemma
• Applying JLL to RLM
• Towards solving LPs
Restricted Linear Membership
Linear feasibility with constrained multipliers
Restricted Linear Membership (RLM)
Given vectors A_1, ..., A_n, b ∈ R^m and X ⊆ R^n, is there x ∈ X s.t. b = ∑_{i≤n} x_i A_i?
RLM_X is a fundamental problem class, which subsumes:
• Linear Feasibility Problem (LFP) with X = R^n_+
• Integer Feasibility Problem (IFP) with X = Z^n_+
• Efficient solution of LFP/IFP yields a solution of LP/IP via bisection
The shape of a set of points
• Lose dimensions but not too much accuracy:
Given A_1, ..., A_n ∈ R^m, find k ≪ m and points A′_1, ..., A′_n ∈ R^k s.t. A and A′ “have almost the same shape”
• What is the shape of a set of points? Congruent sets have the same shape
• Approximate congruence: A, A′ have almost the same shape if
(1 − ε) ‖A_i − A_j‖ ≤ ‖A′_i − A′_j‖ ≤ (1 + ε) ‖A_i − A_j‖  ∀ i < j ≤ n
for some small ε > 0
Assume all norms are Euclidean
Losing dimensions in the RLM
Given X ⊆ R^n and b, A_1, ..., A_n ∈ R^m, find k ≪ m and b′, A′_1, ..., A′_n ∈ R^k such that, with high probability:
∃x ∈ X  b = ∑_{i≤n} x_i A_i  (high dimensional)   iff   ∃x ∈ X  b′ = ∑_{i≤n} x_i A′_i  (low dimensional)
• If this is possible, then solve RLM_X(b′, A′)
• Since k ≪ m, solving RLM_X(b′, A′) should be faster
• RLM_X(b′, A′) = RLM_X(b, A) with high probability
Losing dimensions = “projection”
In the plane, hopeless. In 3D: no better.
[Figure: projections of a point set onto lines in the plane and in 3D]
The Johnson-Lindenstrauss Lemma
Johnson-Lindenstrauss Lemma
Thm. Given A ⊆ R^m with |A| = n and ε > 0, there is k ∼ O((1/ε²) ln n) and a k × m matrix T s.t.
∀x, y ∈ A  (1 − ε)‖x − y‖ ≤ ‖Tx − Ty‖ ≤ (1 + ε)‖x − y‖
If the k × m matrix T is sampled componentwise from N(0, 1/√k), then A and TA have almost the same shape
Discrete approximations of N(0, 1/√k) can also be used, e.g.
P(T_ij = √3/√k) = P(T_ij = −√3/√k) = 1/6,  P(T_ij = 0) = 2/3
(this makes T sparser)
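As an illustration, here is a minimal numpy sketch of both samplers; the function names are ours, not from the paper:

```python
import numpy as np

def gaussian_projector(k, m, rng):
    # each entry sampled from N(0, 1/sqrt(k)): std 1/sqrt(k), so E(||Tu||^2) = ||u||^2
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

def sparse_projector(k, m, rng):
    # sparse discrete approximation: +/- sqrt(3)/sqrt(k) w.p. 1/6 each, 0 w.p. 2/3
    vals = np.array([np.sqrt(3.0 / k), 0.0, -np.sqrt(3.0 / k)])
    return rng.choice(vals, size=(k, m), p=[1.0 / 6, 2.0 / 3, 1.0 / 6])

# usage: T = gaussian_projector(k, m, np.random.default_rng(0))
```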
Sampling to desired accuracy
• Distortion has low probability:
∀x, y ∈ A  P(‖Tx − Ty‖ ≤ (1 − ε)‖x − y‖) ≤ 1/n²
∀x, y ∈ A  P(‖Tx − Ty‖ ≥ (1 + ε)‖x − y‖) ≤ 1/n²
• Probability that ∃ pair x, y ∈ A distorting Euclidean distance: union bound over the (n choose 2) pairs:
P(¬(A and TA have almost the same shape)) ≤ 2 (n choose 2) / n² = 1 − 1/n
⇒ P(A and TA have almost the same shape) ≥ 1/n
⇒ re-sampling T gives the JLL with arbitrarily high probability
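A hedged sketch of that re-sampling loop (hypothetical helper; assumes a Gaussian T as above and scipy for pairwise distances):

```python
import numpy as np
from scipy.spatial.distance import pdist

def resample_until_good(points, k, eps, rng, max_tries=100):
    # points: n x m array; re-sample T until every pairwise distance
    # of the projected points falls in the (1 - eps, 1 + eps) band
    d_orig = pdist(points)
    for _ in range(max_tries):
        T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, points.shape[1]))
        d_proj = pdist(points @ T.T)
        if np.all(d_proj >= (1 - eps) * d_orig) and np.all(d_proj <= (1 + eps) * d_orig):
            return T
    raise RuntimeError("no projector within distortion eps after max_tries samples")
```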
Sketch of a possible JLL proof
Thm. Let T be a k × m rectangular matrix with each component sampled from N(0, 1/√k), and u ∈ R^m s.t. ‖u‖ = 1. Then E(‖Tu‖²) = 1
[Figure: concentration of ‖Tu‖² around 1 for n = 3, 11, 101; sketch of Tu on the sphere S^{m−1}]
In practice
• Empirical estimation of C in k = (C/ε²) ln n: C ≈ 1.8 [Venkatasubramanian & Wang 2011]
• Empirically, sample T very few times (e.g. once will do!): on average ‖Tx − Ty‖ ≈ ‖x − y‖, and distortion decreases exponentially with n
• We only need a number of dimensions logarithmic in the number of points
• Surprising fact: k is independent of the original number of dimensions m
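For concreteness, a one-liner computing k with that empirical constant (helper name is ours):

```python
import math

def jl_dim(n, eps, C=1.8):
    # k = (C / eps^2) ln n, with the empirically estimated C ~ 1.8
    return math.ceil(C / eps**2 * math.log(n))

# e.g. jl_dim(10**6, 0.1) == 2487, regardless of the original dimension m
```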
Typical applications of JLL
Problems involving Euclidean distances only:
• Euclidean clustering: k-means, k-nearest neighbors
• Linear regression: min_x ‖Ax − b‖₂ where A is m × n with m ≫ n
Applying the JLL to the RLM
Projecting infeasibility
Thm. Let T : R^m → R^k be a JLL random projection and b, A_1, ..., A_n ∈ R^m an RLM_X instance. For any given vector x ∈ X, we have:
(i) If b = ∑_{i=1}^n x_i A_i then Tb = ∑_{i=1}^n x_i TA_i
(ii) If b ≠ ∑_{i=1}^n x_i A_i then P(Tb ≠ ∑_{i=1}^n x_i TA_i) ≥ 1 − 2e^{−Ck}
(iii) If b ≠ ∑_{i=1}^n y_i A_i for all y ∈ X ⊆ R^n, where |X| is finite, then
P(∀y ∈ X  Tb ≠ ∑_{i=1}^n y_i TA_i) ≥ 1 − 2|X| e^{−Ck}
for some constant C > 0 (independent of n, k). [VPL, arXiv:1507.00990v1/math.OC]
Proof of (ii)
Cor. ∀ε ∈ (0, 1) and z ∈ R^m, there is a constant C such that
P((1 − ε)‖z‖ ≤ ‖Tz‖ ≤ (1 + ε)‖z‖) ≥ 1 − 2e^{−Cε²k}
Proof: by the JLL.
Lemma. If z ≠ 0, there is a constant C such that P(Tz ≠ 0) ≥ 1 − 2e^{−Ck}
Proof: consider the events A: Tz ≠ 0 and B: (1 − ε)‖z‖ ≤ ‖Tz‖ ≤ (1 + ε)‖z‖
⇒ Aᶜ ∩ B = ∅, otherwise Tz = 0 ⇒ (1 − ε)‖z‖ ≤ ‖Tz‖ = 0 ⇒ z = 0, contradiction
⇒ B ⊆ A ⇒ P(A) ≥ P(B) ≥ 1 − 2e^{−Cε²k} by the Corollary
This holds ∀ε ∈ (0, 1), hence the result
Now it suffices to apply the Lemma to Ax − b
Consequences of the main theorem
• (i) and (ii): checking certificates given x — with high probability, b = ∑_i x_i A_i ⇔ Tb = ∑_i x_i TA_i (a minimal numeric check is sketched below)
• (iii): handles RLM_X whenever |X| is polynomially bounded, e.g. the knapsack set {x ∈ {0,1}^n | ∑_{i≤n} α_i x_i ≤ d} for a fixed d with α > 0
• (iii) hints that the LFP case is more complicated, as X = R^n_+ is not polynomially bounded
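A minimal sketch of the certificate check in the projected space (function name and tolerance are ours):

```python
import numpy as np

def check_certificate(A, b, x, T, tol=1e-9):
    # by (i)-(ii): with high probability T(Ax - b) = 0 iff Ax = b,
    # so the k-dimensional residual certifies the m-dimensional equation
    return np.linalg.norm(T @ (A @ x - b)) <= tol
```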
Separating hyperplanes
When |X| is large, project separating hyperplanes instead
• Convex C ⊆ R^m, x ∉ C: then ∃ hyperplane c separating x and C
• In particular, this holds if C = cone(A_1, ..., A_n) for A ⊆ R^m
• We aim to show x ∈ C ⇔ Tx ∈ TC with high probability
• As above, if x ∈ C then Tx ∈ TC by linearity of T; the real issue is proving the converse
Projecting the separation
Thm. Given c, b, A_1, ..., A_n ∈ R^m of unit norm s.t. cone{A_1, ..., A_n} is pointed, b ∉ cone{A_1, ..., A_n}, ε > 0, c⊤b < −ε, c⊤A_i ≥ ε (i ≤ n), and T a random projector:
P(Tb ∉ cone{TA_1, ..., TA_n}) ≥ 1 − 4(n + 1)e^{−C(ε²−ε³)k}
for some constant C.
Proof: Let A be the event that T approximately preserves ‖c − χ‖² and ‖c + χ‖² for all χ ∈ {b, A_1, ..., A_n}. Since A consists of 2(n + 1) events, by the JLL Corollary (squared version) and the union bound, we get P(A) ≥ 1 − 4(n + 1)e^{−C(ε²−ε³)k}
Now consider χ = b:
⟨Tc, Tb⟩ = (1/4)(‖T(c + b)‖² − ‖T(c − b)‖²) ≤ (1/4)(‖c + b‖² − ‖c − b‖²) + (ε/4)(‖c + b‖² + ‖c − b‖²) by JLL = c⊤b + ε < 0
and similarly ⟨Tc, TA_i⟩ ≥ 0 [VPL, arXiv:1507.00990v1/math.OC]
Is this useful?
The previous results look like:
orig. LFP infeasible ⇒ P(proj. LFP infeasible) ≥ 1 − p(n)e^{−C r(ε) k}
where p, r are two polynomials
• Pick a suitable δ > 0
• Choose k ∼ O((1/(C r(ε)))(ln p(n) + ln(1/δ))) so that the RHS is ≥ 1 − δ
• This preserves infeasibility with probability ≥ 1 − δ
• Useful for m ≤ n large enough that k ≪ m
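For instance, with p(n) = 4(n + 1) and r(ε) = ε² − ε³ from the separation theorem, a sketch (the constant C is unknown in practice; C = 1 here is an assumption):

```python
import math

def k_for_confidence(n, eps, delta, C=1.0):
    # k ~ (1 / (C r(eps))) (ln p(n) + ln(1/delta)),
    # here with p(n) = 4(n + 1) and r(eps) = eps^2 - eps^3 (C = 1 assumed)
    r = eps**2 - eps**3
    return math.ceil((math.log(4 * (n + 1)) + math.log(1.0 / delta)) / (C * r))

# e.g. k_for_confidence(10**5, 0.2, 0.05) is about 500
```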
Consequences of projecting separations
• Applicable to the LFP
• The probability depends on ε (the larger the better)
• The largest ε is given by the LP max{ε ≥ 0 | c⊤b ≤ −ε ∧ ∀i ≤ n (c⊤A_i ≥ ε)}
• If cone(A_1, ..., A_n) is almost non-pointed, ε can be very small
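A possible scipy sketch of that LP; the bound ‖c‖∞ ≤ 1 is our added normalization (needed to keep the LP bounded, since c and ε scale together), not part of the slide's formulation:

```python
import numpy as np
from scipy.optimize import linprog

def largest_eps(A, b):
    # variables z = (c in R^m, eps); maximize eps subject to
    # c.b + eps <= 0, -c.A_i + eps <= 0 (i <= n), and |c_j| <= 1 (assumption)
    m, n = A.shape
    obj = np.zeros(m + 1)
    obj[-1] = -1.0                                   # linprog minimizes, so min -eps
    rows = [np.append(b, 1.0)]                       # c.b + eps <= 0
    rows += [np.append(-A[:, i], 1.0) for i in range(n)]
    res = linprog(obj, A_ub=np.vstack(rows), b_ub=np.zeros(n + 1),
                  bounds=[(-1, 1)] * m + [(0, None)])
    return res.x[-1] if res.success else None
```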
Projecting minimum distances to a cone
• Thm.: the minimum distance to a cone is approximately preserved
• This result also works with non-pointed cones; trade-off: need larger k, m, n
• We appear to be all set for LFPs
• Using bisection and the LFP, also for LPs
Main theorem for LFP projections
Established so far:
Thm. Given δ > 0, ∃ sufficiently large m ≤ n such that: for any LFP input A, b where A is m × n, we can sample a random k × m matrix T with k ≪ m and
P(orig. LFP feasible ⇔ proj. LFP feasible) ≥ 1 − δ
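To make the statement concrete, a hedged sketch of the projected feasibility oracle, using a phase-I LP via scipy (names are ours):

```python
import numpy as np
from scipy.optimize import linprog

def lfp_feasible(A, b):
    # phase-I LP with zero objective: is there x >= 0 with Ax = b?
    n = A.shape[1]
    res = linprog(np.zeros(n), A_eq=A, b_eq=b, bounds=[(0, None)] * n)
    return res.status == 0                 # status 0: optimum found, hence feasible

def projected_lfp_feasible(A, b, k, rng):
    # sample T and test the k-dimensional system instead of the m-dimensional one
    T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, A.shape[0]))
    return lfp_feasible(T @ A, T @ b)
```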
Towards solving LPs
Some results on uniform dense LFPs
• The matrix product TA takes too long (call this an “implementation detail” and don't count it)
• Infeasible instances (sizes from 1000 × 1500 to 2000 × 2400):

  Uniform   ε      k ≈      CPU saving   accuracy
  (−1, 1)   0.1    0.5m     30%          50%
  (−1, 1)   0.15   0.25m    92%          0%
  (−1, 1)   0.2    0.12m    99.2%        0%
  (0, 1)    0.1    0.5m     10%          100%
  (0, 1)    0.15   0.25m    90%          100%
  (0, 1)    0.2    0.12m    97%          100%

• Feasible instances:
  – similar CPU savings
  – obviously 100% accuracy
Certificates
• Ax = b ⇒ TAx = Tb by linearity; however:
• Thm.: for x ≥ 0 s.t. TAx = Tb, we have Ax = b with probability 0
• We can't get a certificate for the original LFP from the projected LFP!
Can we solve LPs by bisection?
– The projected certificate is infeasible in the original problem
– We only get an approximate optimal objective function value
– No bound on the error, and no idea how large m, n should be
– Validated on “large enough” NetLib instances (with k ≈ 0.95m)
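For completeness, a hedged sketch of the bisection scheme (our own code; assumes the optimum lies in [lo, hi] and takes the feasibility oracle as a parameter, e.g. the lfp_feasible or projected_lfp_feasible sketched earlier):

```python
import numpy as np

def lp_by_bisection(A, b, c, lo, hi, feasible, tol=1e-6):
    # min c.x s.t. Ax = b, x >= 0, via bisection on the objective value v:
    # "c.x <= v" is rewritten with a slack s >= 0 as c.x + s = v
    m = A.shape[0]
    A_aug = np.vstack([np.hstack([A, np.zeros((m, 1))]),
                       np.append(c, 1.0)])
    while hi - lo > tol:
        v = 0.5 * (lo + hi)
        if feasible(A_aug, np.append(b, v)):
            hi = v                         # value v attainable: optimum is at most v
        else:
            lo = v                         # not attainable: optimum exceeds v
    return hi

# usage: lp_by_bisection(A, b, c, lo, hi, feasible=lfp_feasible)
```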