Sublinear Algorithms. Lectures 1 and 2. Sofya Raskhodnikova, Penn State University
Tentative Topics. Introduction, examples and general techniques. Sublinear-time algorithms for • graphs • strings • basic properties of functions • algebraic properties and codes • metric spaces • distributions. Tools: probability, Fourier analysis, combinatorics, codes, … Sublinear-space algorithms: streaming.
Tentative Plan. Introduction, examples and general techniques. Lecture 1. Background. Testing properties of images and lists. Lecture 2. Properties of functions and graphs. Sublinear approximation. Lectures 3-5. Background in probability. Techniques for proving hardness. Other models for sublinear computation.
Motivation for Sublinear-Time Algorithms. Massive datasets • world-wide web • online social networks • genome project • sales logs • census data • high-resolution images • scientific measurements. Long access time • communication bottleneck (dial-up connection) • implicit data (an experiment per data point)
What Can We Hope For? • What can an algorithm compute if it – reads only a sublinear portion of the data? – runs in sublinear time? • Some problems have exact deterministic solutions. • For most interesting problems, algorithms must be – approximate – randomized
A Sublinear-Time Algorithm. [Figure: a sublinear-time algorithm reads only a few positions (marked ?) of a long input such as "BLA-BLA-BLA-…" and outputs an approximate answer.] Trade-off: resources (number of samples, running time) vs. quality of approximation.
Types of Approximation. Classical approximation • need to compute a value: output is close to the desired value; examples: average, median values • need to compute the best structure: output is a structure with "cost" close to optimal; examples: furthest pair of points, minimum spanning tree. Property testing • need to answer YES or NO: output is a correct answer for the given input, or at least for some input close to it
Classical Approximation: A Simple Example
Approximate Diameter of a Point Set [Indyk]. Input: $m$ points, described by a distance matrix $D$ – $D_{ij}$ is the distance between points $i$ and $j$ – $D$ satisfies the triangle inequality and symmetry (Note: input size is $n = m^2$). Let $i, j$ be indices that maximize $D_{ij}$. The maximum $D_{ij}$ is the diameter. • Output: $(k, \ell)$ such that $D_{k\ell} \ge D_{ij}/2$.
Algorithm and Analysis. Algorithm$(m, D)$: 1. Pick $k$ arbitrarily. 2. Pick $\ell$ to maximize $D_{k\ell}$. 3. Output $(k, \ell)$. • Approximation guarantee: $D_{ij} \le D_{ik} + D_{kj}$ (triangle inequality) $\le D_{k\ell} + D_{k\ell}$ (choice of $\ell$ + symmetry of $D$) $= 2D_{k\ell}$. • Running time: $O(m) = O(\sqrt{n})$. A rare example of a deterministic sublinear-time algorithm.
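A minimal Python sketch of the three steps above, assuming $D$ is given as a list of rows indexed from 0; the function name `approx_diameter` and the choice $k = 0$ are illustrative, not from the lecture. Only row $k$ is read, which is where the $O(m) = O(\sqrt{n})$ bound comes from. The usage example puts points on a line so that the factor-2 bound is attained exactly.

```python
from typing import Sequence, Tuple


def approx_diameter(D: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return (k, l) with D[k][l] >= diameter / 2.

    Assumes D is an m x m symmetric matrix satisfying the triangle
    inequality. Only row k is read: m of the n = m^2 entries.
    """
    m = len(D)
    k = 0  # Step 1: pick k arbitrarily (here, simply the first point).
    # Step 2: pick l maximizing D[k][l] by scanning row k once.
    l = max(range(m), key=lambda j: D[k][j])
    return k, l  # Step 3: output (k, l).


# Usage: points on a line, distance = |a - b|, diameter = 10.
points = [5, 0, 10, 6]
D = [[abs(a - b) for b in points] for a in points]
print(approx_diameter(D))  # (0, 1): D[0][1] = 5 = diameter / 2
```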
Property Testing
Property Testing: YES/NO Questions. Does the input satisfy some property? (YES/NO) "In the ballpark" vs. "out of the ballpark": does the input satisfy the property or is it far from satisfying it? • sometimes it is the right question (probabilistically checkable proofs (PCPs)) • just as good when the data is constantly changing (WWW) • fast sanity check to rule out inappropriate inputs (airport security questioning)
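Property Tester Definition. [Figure: a probabilistic algorithm compared with a property tester.] A probabilistic algorithm must accept YES instances with probability $\ge 2/3$ and reject NO instances with probability $\ge 2/3$. A property tester must accept YES instances with probability $\ge 2/3$ and reject instances that are $\varepsilon$-far from YES with probability $\ge 2/3$; on instances that are close to YES (but not YES), either answer is allowed ("don't care"). $\varepsilon$-far = differs from every YES instance in many places (in $\ge$ an $\varepsilon$ fraction of places).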
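One common way to write this definition as formulas, assuming inputs are length-$n$ strings, a property $\mathcal{P}$ is a set of YES inputs, and distance is the fraction of differing positions (relative Hamming distance); the names $\operatorname{dist}$ and $T$ are notation chosen here for illustration:

```latex
% Distance of an input x of length n to a property P (a set of YES inputs):
\operatorname{dist}(x, \mathcal{P}) = \min_{y \in \mathcal{P}} \frac{\left|\{\, i : x_i \neq y_i \,\}\right|}{n},
\qquad
x \text{ is } \varepsilon\text{-far from } \mathcal{P} \iff \operatorname{dist}(x, \mathcal{P}) \ge \varepsilon .

% Requirements on an epsilon-tester T:
x \in \mathcal{P} \;\Rightarrow\; \Pr[T(x) = \text{accept}] \ge 2/3,
\qquad
x \text{ is } \varepsilon\text{-far from } \mathcal{P} \;\Rightarrow\; \Pr[T(x) = \text{reject}] \ge 2/3 .
```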
Randomized Sublinear Algorithms: Toy Examples
Property Testing: a Toy Example. Input: a string $w \in \{0,1\}^n$ (e.g., 0 0 0 1 … 0 1 0 0). Question: Is $w = 00\dots0$? Answering exactly requires reading the entire input. Approximate version: Is $w = 00\dots0$, or does it have $\ge \varepsilon n$ 1's ("errors")? Test$(n, w)$: 1. Sample $s = 2/\varepsilon$ positions uniformly and independently at random. 2. If a 1 is found, reject; otherwise, accept. Analysis: If $w = 00\dots0$, it is always accepted. If $w$ is $\varepsilon$-far, $\Pr[\text{error}] = \Pr[\text{no 1's in the sample}] \le (1 - \varepsilon)^s \le e^{-\varepsilon s} = e^{-2} < 1/3$, using $1 - x \le e^{-x}$. Witness Lemma: If a test catches a witness with probability $\ge p$, then $s = 2/p$ iterations of the test catch a witness with probability $\ge 2/3$.
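A minimal Python sketch of Test$(n, w)$ above, assuming $w$ is a list of 0/1 values and using `random.Random` as the source of uniform, independent positions; the name `all_zeros_tester` is hypothetical. It is the Witness Lemma with $p = \varepsilon$: each sample finds a 1 with probability $\ge \varepsilon$ when $w$ is $\varepsilon$-far, so $2/\varepsilon$ samples suffice.

```python
import math
import random
from typing import Sequence


def all_zeros_tester(w: Sequence[int], eps: float, rng: random.Random) -> bool:
    """Return True (accept) if no 1 is found, False (reject) otherwise.

    Never rejects 00...0. If w has >= eps * n ones, it is rejected with
    probability >= 1 - (1 - eps)^s >= 1 - e^{-2} > 2/3.
    """
    n = len(w)
    s = math.ceil(2 / eps)  # s = 2/eps samples (rounded up)
    for _ in range(s):
        i = rng.randrange(n)  # uniform, independent position
        if w[i] == 1:
            return False  # found a witness: reject
    return True


# Usage: the all-zeros string is always accepted (one-sided error).
rng = random.Random(0)
print(all_zeros_tester([0] * 1000, 0.1, rng))  # True
```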
Randomized Approximation: a Toy Example. Input: a string $w \in \{0,1\}^n$ (e.g., 0 0 0 1 … 0 1 0 0). Goal: Estimate the fraction of 1's in $w$ (like in polls). It suffices to sample $s = 1/\varepsilon^2$ positions and output the average to get the fraction of 1's $\pm\varepsilon$ (i.e., additive error $\varepsilon$) with probability $\ge 2/3$. Hoeffding Bound: Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$ and let $Y = \sum_{i=1}^{s} Y_i$ (sample sum). Then $\Pr[|Y - E[Y]| \ge \delta] \le 2e^{-2\delta^2/s}$. Analysis: Let $Y_i$ = value of sample $i$. Then $E[Y] = \sum_{i=1}^{s} E[Y_i] = s \cdot (\text{fraction of 1's in } w)$. Applying the Hoeffding Bound with $\delta = \varepsilon s$ and substituting $s = 1/\varepsilon^2$: $\Pr[|(\text{sample average}) - (\text{fraction of 1's in } w)| \ge \varepsilon] = \Pr[|Y - E[Y]| \ge \varepsilon s] \le 2e^{-2\varepsilon^2 s} = 2e^{-2} < 1/3$.
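A minimal Python sketch of the sampling estimator above, assuming $w$ is a list of 0/1 values; the name `estimate_fraction_of_ones` is hypothetical. Note that $s = \lceil 1/\varepsilon^2 \rceil$ does not depend on $n$.

```python
import math
import random
from typing import Sequence


def estimate_fraction_of_ones(w: Sequence[int], eps: float,
                              rng: random.Random) -> float:
    """Average of s = 1/eps^2 uniformly sampled positions of w.

    By the Hoeffding bound, the result is within +-eps of the true
    fraction of 1's with probability >= 1 - 2e^{-2} > 2/3.
    """
    n = len(w)
    s = math.ceil(1 / eps ** 2)  # sample size, independent of n
    total = sum(w[rng.randrange(n)] for _ in range(s))  # Y = sum of Y_i
    return total / s  # sample average = Y / s


# Usage: true fraction is 0.3; the estimate is usually within 0.1 of it.
rng = random.Random(1)
print(estimate_fraction_of_ones([1] * 300 + [0] * 700, 0.1, rng))
```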
Property Testing: Simple Examples
Testing Properties of Images
Pixel Model. Input: an $n \times n$ matrix of pixels (0/1 values for black-and-white pictures). Query: point $(i_1, i_2)$. Answer: color of $(i_1, i_2)$.
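A minimal Python sketch of the pixel model as a query oracle, assuming 1 = black and 0 = white; the class `PixelOracle`, its query counter, and the helper `half_plane_image` (pixel $(i_1, i_2)$ is black iff $a i_1 + b i_2 \ge c$) are hypothetical scaffolding for experimenting with testers, not part of the lecture. The counter makes the cost of a tester visible as the number of pixels it reads.

```python
from typing import List


class PixelOracle:
    """An n x n 0/1 image that can only be read one pixel per query."""

    def __init__(self, image: List[List[int]]) -> None:
        self._image = image
        self.n = len(image)
        self.queries = 0  # number of pixels read so far

    def query(self, i1: int, i2: int) -> int:
        """Return the color of pixel (i1, i2): 1 = black, 0 = white."""
        self.queries += 1
        return self._image[i1][i2]


def half_plane_image(n: int, a: float, b: float, c: float) -> List[List[int]]:
    """Build an n x n image where pixel (i1, i2) is black iff a*i1 + b*i2 >= c."""
    return [[int(a * i1 + b * i2 >= c) for i2 in range(n)] for i1 in range(n)]
```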
Testing if an Image is a Half-plane [R03]. A half-plane or $\varepsilon$-far from a half-plane? $O(1/\varepsilon)$ time.
Half-plane Instances. [Figure: an image that is a half-plane, and an image that is 1/4-far from a half-plane.]
Strategy: "Testing by implicit learning" paradigm • Learn the outline of the image by querying a few pixels. • Test whether the image conforms to the outline by random sampling, and reject if something is wrong.
Half-plane Test. Claim. The number of sides with different corners is 0, 2, or 4. (Going once around the square, the color changes an even number of times.) Algorithm 1. Query the corners.
Half-plane Test: 4 Bi-colored Sides. Analysis • If all 4 sides have different-colored corners, the image cannot be a half-plane. Algorithm 1. Query the corners. 2. If the number of sides with different corners is 4, reject.
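A minimal Python sketch of steps 1 and 2, assuming the image is accessed through a callable `query(i1, i2)` returning 0/1; the names `corner_colors`, `bicolored_sides`, and `rejects_on_corners` are hypothetical. Listing the corners in cyclic order makes the even-count argument of the Claim concrete.

```python
from typing import Callable, List

# Assumed oracle type: query(i1, i2) returns the 0/1 color of pixel (i1, i2).
Query = Callable[[int, int], int]


def corner_colors(query: Query, n: int) -> List[int]:
    """Step 1: query the 4 corners, listed in cyclic order around the square."""
    return [query(0, 0), query(0, n - 1), query(n - 1, n - 1), query(n - 1, 0)]


def bicolored_sides(corners: List[int]) -> int:
    """Count sides whose two corners differ in color (always 0, 2, or 4)."""
    return sum(corners[t] != corners[(t + 1) % 4] for t in range(4))


def rejects_on_corners(query: Query, n: int) -> bool:
    """Step 2: reject when all 4 sides are bi-colored."""
    return bicolored_sides(corner_colors(query, n)) == 4


# Usage: a checkerboard has corners 0, 1, 0, 1, so it is rejected.
print(rejects_on_corners(lambda i1, i2: (i1 + i2) % 2, 4))  # True
```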
Half-plane Test: 0 Bi-colored Sides. Analysis • If all corners have the same color, the image is a half-plane if and only if it is unicolored. Algorithm 1. Query the corners. 2. If all corners have the same color $c$, test whether all pixels have color $c$ (as in Toy Example 1).
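A minimal Python sketch of step 2 for this case, reusing the sampling idea of Toy Example 1 on pixels, assuming a callable `query(i1, i2)` returning 0/1; the name `unicolored_tester` is hypothetical. Each sample is a uniformly random pixel, so an image with $\ge \varepsilon n^2$ pixels of the wrong color is rejected with probability $> 2/3$ after $2/\varepsilon$ samples.

```python
import math
import random
from typing import Callable


def unicolored_tester(query: Callable[[int, int], int], n: int, c: int,
                      eps: float, rng: random.Random) -> bool:
    """Accept (True) iff 2/eps uniformly random pixels all have color c."""
    s = math.ceil(2 / eps)
    for _ in range(s):
        i1, i2 = rng.randrange(n), rng.randrange(n)  # uniform random pixel
        if query(i1, i2) != c:
            return False  # a pixel of the wrong color is a witness: reject
    return True
```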
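Half-plane Test: 2 Bi-colored Sides. [Figure: on each bi-colored side, binary search finds two differently colored pixels within distance $\varepsilon n/2$; these determine a region $W$ that must be white and a region $B$ that must be black if the image is a half-plane.] Analysis • The area outside of $W \cup B$ has $\le \varepsilon n^2/2$ pixels. • If the image is a half-plane, $W$ contains only white pixels and $B$ contains only black pixels. • If the image is $\varepsilon$-far from half-planes, it has $\ge \varepsilon n^2/2$ wrong pixels in $W \cup B$. • Since $|W \cup B| \le n^2$, a uniform sample from $W \cup B$ is wrong with probability $\ge \varepsilon/2$, so by the Witness Lemma, $4/\varepsilon$ samples suffice to catch a wrong pixel. Algorithm 1. Query the corners. 2. If the number of sides with different corners is 2, on both of these sides find 2 differently colored pixels within distance $\varepsilon n/2$ by binary search. 3. Query $4/\varepsilon$ pixels from $W \cup B$. 4. Accept iff all $W$ pixels are white and all $B$ pixels are black.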
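A minimal Python sketch of the binary search in step 2 along one bi-colored side, assuming the side is exposed as a callable `color_at(t)` giving the 0/1 color of its $t$-th pixel; the name `find_color_change` and this interface are hypothetical. It keeps the invariant that the two endpoints differ in color and uses about $\log_2(n / (\varepsilon n/2)) = O(\log(1/\varepsilon))$ queries.

```python
from typing import Callable, Tuple


def find_color_change(color_at: Callable[[int], int], lo: int, hi: int,
                      max_gap: int) -> Tuple[int, int]:
    """Shrink [lo, hi] until hi - lo <= max_gap, keeping endpoint colors different.

    Requires color_at(lo) != color_at(hi) and max_gap >= 1.
    """
    c_lo = color_at(lo)
    if color_at(hi) == c_lo:
        raise ValueError("endpoints must have different colors")
    if max_gap < 1:
        raise ValueError("max_gap must be at least 1")
    while hi - lo > max_gap:
        mid = (lo + hi) // 2
        if color_at(mid) == c_lo:
            lo = mid  # invariant: color_at(lo) == c_lo
        else:
            hi = mid  # invariant: color_at(hi) != c_lo
    return lo, hi


# Usage: a side of n = 1024 pixels whose color switches at pixel 600.
n, eps = 1024, 0.1
side = lambda t: int(t >= 600)
gap = max(1, int(eps * n / 2))  # target distance eps*n/2, rounded down
print(find_color_change(side, 0, n - 1, gap))  # (575, 607)
```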
Other Results on Properties of Images • Pixel Model: Convexity [R03]: Convex or $\varepsilon$-far from convex? $O(1/\varepsilon^2)$ time. Connectedness [R03]: Connected or $\varepsilon$-far from connected? $O(1/\varepsilon^4)$ time. Partitioning [Kleiner Keren Newman 10]: Can the image be partitioned according to a template, or is it $\varepsilon$-far? Time independent of image size. • Properties of sparse images [Ron Tsur 10]
Testing if a List is Sorted. Input: a list of $n$ numbers $x_1, x_2, \dots, x_n$. • Question: Is the list sorted? Requires reading the entire list: $\Omega(n)$ time. • Approximate version: Is the list sorted or $\varepsilon$-far from sorted? (At least an $\varepsilon$ fraction of the $x_i$'s have to be changed to make it sorted.) [Ergün Kannan Kumar Rubinfeld Viswanathan 98, Fischer 01]: $O((\log n)/\varepsilon)$ time; $\Omega(\log n)$ queries. • Attempts: 1. Test: Pick a random $i$ and reject if $x_i > x_{i+1}$. Fails on: 1 1 1 1 1 1 1 0 0 0 0 0 0 0 ← 1/2-far from sorted. 2. Test: Pick random $i < j$ and reject if $x_i > x_j$. Fails on: 1 0 2 1 3 2 4 3 5 4 6 5 7 6 ← 1/2-far from sorted.
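A small Python check of the two failed attempts, assuming "sorted" means non-decreasing and entries may be changed to arbitrary reals. It relies on the standard fact that the minimum number of entries to change equals $n$ minus the length of a longest non-decreasing subsequence (computed here by patience sorting with `bisect_right`). The function names are hypothetical. Both lists are 1/2-far, yet each attempt rejects its counterexample with probability only $1/(n-1)$.

```python
from bisect import bisect_right
from fractions import Fraction
from typing import List, Sequence


def distance_to_sorted(x: Sequence[float]) -> int:
    """Min entries to change so x is non-decreasing: n - longest non-decreasing subsequence."""
    tails: List[float] = []  # tails[k]: smallest last value of such a subsequence of length k+1
    for v in x:
        pos = bisect_right(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(x) - len(tails)


def adjacent_test_rejection_prob(x: Sequence[float]) -> Fraction:
    """Attempt 1: random i, reject if x[i] > x[i+1]."""
    n = len(x)
    bad = sum(x[i] > x[i + 1] for i in range(n - 1))
    return Fraction(bad, n - 1)


def pair_test_rejection_prob(x: Sequence[float]) -> Fraction:
    """Attempt 2: random pair i < j, reject if x[i] > x[j]."""
    n = len(x)
    bad = sum(x[i] > x[j] for i in range(n) for j in range(i + 1, n))
    return Fraction(bad, n * (n - 1) // 2)


ones_then_zeros = [1] * 7 + [0] * 7
interleaved = [1, 0, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5, 7, 6]
for x in (ones_then_zeros, interleaved):
    print(distance_to_sorted(x), adjacent_test_rejection_prob(x), pair_test_rejection_prob(x))
# 7 1/13 7/13
# 7 7/13 1/13
```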