
PROPERTY TESTING Arnab BHATTACHARYYA (in lieu of Seth) 29/08/2019



  1. CS5234: Algorithms at Scale. PROPERTY TESTING. Arnab BHATTACHARYYA (in lieu of Seth), 29/08/2019

  2. Lecture Outline ■ What is property testing? ■ Identify what goes into showing correctness of a testing algorithm. Some examples. ■ Identify what goes into showing impossibility of fast testing. Some examples.

  3. A motivating example • DNA: strings over 4 characters {A, C, T, G} • Problem: Given two DNA strands $X$ and $Y$, are they from the same species or from different species?


  5. ■ If $X$ and $Y$ are from the same species, then we expect the strings to be similar. Otherwise, not. ■ But similar in what sense? Need a metric. – One possibility is Levenshtein distance (the number of insertions, deletions, or substitutions needed to turn one string into the other).

  6. Want an algorithm that outputs: – SAME if $d_L(X, Y)$ is “small” – DIFFERENT if $d_L(X, Y)$ is “large” (here $d_L$ denotes the Levenshtein distance)

  7. For exactly computing $d_L$ on strings of length $n$, only $O(n^2)$ algorithms are known. Too expensive for bio applications.
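
For reference, the quadratic cost mentioned above can be seen in the textbook dynamic program for Levenshtein distance. This is a minimal sketch, not code from the lecture; the function name `levenshtein` is an illustrative choice.

```python
def levenshtein(x: str, y: str) -> int:
    """Edit distance by the textbook dynamic program: O(len(x) * len(y)) time."""
    # prev[j] = distance between the current prefix of x and y[:j]
    prev = list(range(len(y) + 1))  # empty prefix of x: j insertions
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

Every pair of positions is visited once, which is where the quadratic running time comes from.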


  9. Is there a more efficient algorithm that outputs – SAME if $d_L(X, Y) \le T_1$ – DIFFERENT if $d_L(X, Y) \ge T_2$? Indeed, there is! If $T_1$ and $T_2$ are sufficiently far apart, you only need to look at $\ll n$ characters of the strings to make the correct decision with high probability!


  11. Property Testing Framework. Bad inputs are $\epsilon$-far from good, which means: for a distance function $d\colon \text{Inputs} \times \text{Inputs} \to [0,1]$, for any good $X$ and bad $Y$, $d(X, Y) > \epsilon$.

  12. Property Testing Framework. Definition. An algorithm is a tester for a property $\mathcal{P}$ if: • The inputs are: an integer $n > 0$, a real $\epsilon \in (0,1)$, and query access to an object $x$ of size $n$. • It accepts with probability $\ge 2/3$ if $x \in \mathcal{P}$. • It rejects with probability $\ge 2/3$ if $x$ is $\epsilon$-far from $\mathcal{P}$.

  13. Property Testing Framework. Query complexity: The number of queries made by the tester. The main focus of this course will be understanding the query complexity for various properties $\mathcal{P}$.

  14. Property Testing Framework. Data representation decides what is revealed by each query. For example, a graph can be represented as an adjacency matrix or as adjacency lists.

  15. Property Testing Framework. Distance function decides what is meant by $\epsilon$-far. The default choice is the Hamming distance. For two functions $f, g\colon [n] \to R$: $d_H(f, g) = \frac{|\{i \in [n] : f(i) \ne g(i)\}|}{n}$.
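
As a small illustration of the normalized Hamming distance, here is a sketch that treats the two functions as equal-length sequences of values. The name `hamming_distance` is a hypothetical helper, not something defined in the slides.

```python
from typing import Sequence


def hamming_distance(f: Sequence, g: Sequence) -> float:
    """Fraction of positions on which f and g disagree (a value in [0, 1])."""
    if len(f) != len(g) or len(f) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    disagreements = sum(a != b for a, b in zip(f, g))
    return disagreements / len(f)
```

With this normalization, being more than 0.1 away means disagreeing on more than 10% of the positions.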

  16. Property Testing Framework. Often, our testers will be one-sided, meaning the tester will accept with probability 1 if $x \in \mathcal{P}$.

  17. A Simple Example ■ Inputs are strings of length $n$. Property $\mathcal{P}$ is satisfied only by the all-1’s string. Distance measure is the Hamming distance, $d_H$. ■ Want tester to accept $x$ with probability $\ge 2/3$ if $x = 1^n$. Want tester to reject $x$ with probability $\ge 2/3$ if $\#\{i : x_i \ne 1\} > \epsilon n$. ■ Tester: Sample $2/\epsilon$ random locations $i \in [n]$. Accept iff $x_i = 1$ for all such $i$. ■ One-sided error. If $x$ is $\epsilon$-far from $\mathcal{P}$, each sample hits a non-1 position with probability more than $\epsilon$, so $\Pr[\text{tester rejects}] \ge 1 - (1 - \epsilon)^{2/\epsilon} \ge 1 - e^{-2} > 2/3$.
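
The all-1's tester can be sketched as follows. This is an illustration under assumptions: query access is modeled as a callable `query(i)` on 0-indexed positions, the sample count is rounded up to an integer, and the name `all_ones_tester` is hypothetical.

```python
import math
import random
from typing import Callable, Optional


def all_ones_tester(query: Callable[[int], int], n: int, eps: float,
                    rng: Optional[random.Random] = None) -> bool:
    """Accept (True) iff every sampled position holds a 1.

    Makes ceil(2 / eps) queries, independent of n.
    """
    rng = rng or random.Random()
    for _ in range(math.ceil(2 / eps)):
        i = rng.randrange(n)  # uniform position in 0..n-1
        if query(i) != 1:
            return False      # a witness was found: reject
    return True               # the all-1's string is never rejected
```

For example, `all_ones_tester(x.__getitem__, len(x), 0.1)` tests a list `x` with 20 queries, whatever its length.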

  18. To show that an algorithm $\mathcal{A}$ is a tester for a property $\mathcal{P}$ with query complexity $q(\epsilon, n)$, you need to do three things: 1. Prove that for any $x \in \mathcal{P}$, $\mathcal{A}$ accepts with probability $\ge 2/3$ (or 1 for one-sided). 2. Prove that for any $x$ that is $\epsilon$-far from $\mathcal{P}$, $\mathcal{A}$ rejects with probability $\ge 2/3$. 3. Prove that the number of queries is at most $q(\epsilon, n)$ for all inputs.

  19. $\mathcal{P}$ = monotonicity ■ Input: array of $n$ distinct numbers. ■ Array $A$ is monotone if $A[i] < A[j]$ whenever $i < j$. ■ Array $A$ is $\epsilon$-far from monotone if: $\min_{\text{monotone } B} d_H(A, B) > \epsilon$.

  20. Test1(ε, n, A):
        for t = 1, …, q:
          choose random i ∈ [1, n − 1]
          output “NO” if A[i] > A[i+1]
        output “YES”
      For what choice of $q$ is Test1 a tester for monotonicity?
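
One way to explore the question is to run Test1 on an array that is far from monotone yet has a single adjacent inversion. The sketch below uses 0-indexed arrays; `monotone_test1` and `two_blocks` are hypothetical names, and the two-block array is a standard illustration rather than something taken from the slides.

```python
import random


def monotone_test1(a, q, rng):
    """Test1: compare q uniformly random adjacent pairs."""
    n = len(a)
    for _ in range(q):
        i = rng.randrange(n - 1)  # 0-indexed: compares a[i] with a[i + 1]
        if a[i] > a[i + 1]:
            return False          # "NO"
    return True                   # "YES"


def two_blocks(n):
    """For even n: [n/2, ..., n-1, 0, ..., n/2 - 1].

    At least half of the entries must change to make it monotone,
    yet only one adjacent pair is out of order.
    """
    half = n // 2
    return list(range(half, n)) + list(range(half))
```

On `two_blocks(n)` each round rejects with probability 1/(n − 1), which suggests that Test1 needs a number of rounds that grows linearly with n.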

  21. Test2(ε, n, A):
        for t = 1, …, q:
          choose random i ∈ [1, n − 1]
          choose random j ∈ [i + 1, n]
          output “NO” if A[i] > A[j]
        output “YES”
      For what choice of $q$ is Test2 a tester for monotonicity?
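
A similar experiment for Test2, again only a sketch with hypothetical names (`monotone_test2`, `swapped_pairs`, `reject_prob_per_round`): on the array [1, 0, 3, 2, 5, 4, …] at least half of the entries must change, but the only out-of-order pairs are the adjacent swapped ones, so a random pair rarely catches a violation.

```python
import random
from fractions import Fraction


def monotone_test2(a, q, rng):
    """Test2: compare q random pairs (i, j) with i < j."""
    n = len(a)
    for _ in range(q):
        i = rng.randrange(n - 1)     # 0-indexed first position
        j = rng.randrange(i + 1, n)  # uniform position to the right of i
        if a[i] > a[j]:
            return False
    return True


def swapped_pairs(n):
    """For even n: [1, 0, 3, 2, 5, 4, ...]."""
    return [i ^ 1 for i in range(n)]


def reject_prob_per_round(a):
    """Exact probability that a single round of Test2 sees a violation."""
    n = len(a)
    total = Fraction(0)
    for i in range(n - 1):
        bad = sum(a[i] > a[j] for j in range(i + 1, n))
        total += Fraction(bad, n - 1 - i)  # j is uniform over n - 1 - i choices
    return total / (n - 1)
```

For this input the per-round rejection probability shrinks roughly like (log n)/n, so no choice of q that depends on ε alone can work for distinct-valued arrays.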

  22. Test3(ε, n, A):
        for t = 1, …, 2/ε:
          choose random i ∈ [1, n]
          x ← A[i]
          output “NO” if binary search for x does not end at i
        output “YES”
      Theorem: Test3 is a one-sided tester for monotonicity with query complexity $O((\log n)/\epsilon)$.
      To prove: NO case, YES case, query complexity.
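
A runnable sketch of Test3, assuming 0-indexed arrays of distinct numbers and a standard midpoint binary search; the names `binary_search_ends_at` and `monotone_test3` are illustrative, and the round count is rounded up to an integer.

```python
import math
import random


def binary_search_ends_at(a, i):
    """Run binary search for the value a[i]; report whether it stops at index i."""
    x = a[i]
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid == i  # values are distinct, so this holds only at i
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return False             # the search fell off without finding x


def monotone_test3(a, eps, rng):
    """Test3: ceil(2 / eps) rounds, each costing O(log n) queries."""
    n = len(a)
    for _ in range(math.ceil(2 / eps)):
        i = rng.randrange(n)
        if not binary_search_ends_at(a, i):
            return False     # "NO"
    return True              # "YES": sorted input always gets here
```

On a sorted array every binary search ends where it should, which is the one-sidedness in the theorem.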

  23. NO case analysis. Call a coordinate $i$ searchable if the binary search for $A[i]$ ends at $i$. Claim 1: If $A$ is $\epsilon$-far from monotone, then the number of searchable $i$’s is at most $(1 - \epsilon) n$. NO case done with this claim. Why?

  24. Proof of Claim 1. Claim 2: The array $A$ restricted to its searchable coordinates is monotone. Claim 1 follows from Claim 2. Why?

  25. Proof of Claim 2. Claim 3: If $i < j$ and both $i$ and $j$ are searchable, then $A[i] < A[j]$.

  26. Some notes ■ Tester is adaptive, meaning that its queries may depend on the answers to its past queries. ■ It is possible to make the tester non-adaptive. ■ Test2 is a valid tester with query complexity $O(\epsilon^{-1})$ when the inputs are Boolean arrays.

  27. Lower bounds on query complexity. Three common approaches: Yao’s Minimax Principle, Gap-Preserving Reductions, Communication Complexity.


  29. Lower bounds for randomized testers ■ Testers are randomized algorithms. You can think of a randomized algorithm as a random element of a collection of deterministic algorithms: $\mathcal{A} = \{A_1, A_2, A_3, \ldots\}$ ■ Showing limitations for randomized algorithms is usually trickier than for deterministic algorithms.

  30. For any randomized tester $T$ making $q$ queries, there exists an input $x$ such that: $\Pr_T[T(x) \text{ is wrong}] > 1/3$. There exists a distribution $\mathcal{D}$ on inputs such that for any deterministic tester $T$ making $q$ queries: $\Pr_{x \sim \mathcal{D}}[T(x) \text{ is wrong}] > 1/3$.



  33. It suffices to come up with a distribution of inputs that is hard on average for any low-query deterministic tester. Yao’s Minimax Principle: $\mathcal{P}$ is a property over objects. Suppose there are two distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ such that: • $\Pr_{x \sim \mathcal{D}_1}[x \in \mathcal{P}] \ge 1 - \eta_1$ • $\Pr_{x \sim \mathcal{D}_2}[x \text{ is } \epsilon\text{-far from } \mathcal{P}] \ge 1 - \eta_2$ • For any deterministic algorithm $T$ making $q(n, \epsilon)$ queries: $\left|\Pr_{x \sim \mathcal{D}_1}[T \text{ accepts}] - \Pr_{x \sim \mathcal{D}_2}[T \text{ accepts}]\right| \le \eta_3$. If $\eta_1 + \eta_2 + \eta_3 < 1/3$, then the query complexity of testing $\mathcal{P}$ is more than $q(n, \epsilon)$.

  34. Example. Suppose $\mathcal{P} = \{1^n\}$. The query complexity of testing $\mathcal{P}$ is $\Omega(\epsilon^{-1})$.
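
One way to see a bound of this shape through Yao's minimax principle, sketched under assumptions the slide does not spell out: let the first distribution be the all-1's string and the second a uniformly random string with exactly k zeros, where k is just above εn. A deterministic tester that has seen only 1's so far has queried a fixed set of positions, so it can tell the two distributions apart only if one of its q positions holds a zero. The helper names below are hypothetical.

```python
from fractions import Fraction
from math import comb


def miss_probability(n: int, k: int, q: int) -> Fraction:
    """Probability that q fixed distinct positions avoid all k random zeros."""
    return Fraction(comb(n - q, k), comb(n, k))


def distinguishing_gap_bound(n: int, k: int, q: int) -> Fraction:
    """Upper bound on the gap in acceptance probability between the two
    distributions for any deterministic q-query algorithm."""
    return 1 - miss_probability(n, k, q)
```

With n = 10000 and k = 101 (so ε = 0.01), `distinguishing_gap_bound(10000, 101, 30)` stays below 1/3, so under these assumptions 30 queries are not enough; the threshold scales like a constant times 1/ε.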

  35. What about $\mathcal{P} = \{0^n, 1^n\}$? $\mathcal{P} = \{z\}$ for a fixed string $z \in \{0,1\}^n$?

  36. Example. Suppose $\mathcal{P} = \{x \in \{0,1\}^n : |x| \le \frac{n}{2}(1 - \epsilon)\}$. The query complexity of testing $\mathcal{P}$ is $\Omega(\epsilon^{-2})$.

  37. Takeaways ■ Property testing is about how you can uncover differences in the global structure by using local queries. ■ For showing correctness of a tester, you need to verify its query complexity and its performance on YES and NO input instances. ■ For proving lower bounds on the query complexity via Yao’s minimax principle, you explicitly come up with a hard input distribution for deterministic testers.
