Artur Czumaj DIMAP (Centre for Discrete Maths and it - PowerPoint PPT Presentation

Testing Continuous Distributions
Artur Czumaj
DIMAP (Centre for Discrete Maths and its Applications) & Department of Computer Science, University of Warwick
Joint work with A. Adamaszek & C. Sohler

Testing probability distributions
• General question:
– Test a given property of a given probability distribution
• The distribution is available only through samples drawn from it
Examples:
- Is a given probability distribution uniform?
- Are two probability distributions independent?

Testing probability distributions
For more details/introduction: see R. Rubinfeld's talk on Wednesday
• Typical result:
– Given a probability distribution on n points, we can test if it's uniform after seeing ~√n random samples [Batu et al. '01]
Testing = distinguishing between the uniform distribution and distributions that are ε-far from uniform
ε-far from uniform: $\sum_{x \in \Omega} \left| \Pr[x] - \frac{1}{n} \right| \ge \epsilon$
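The ε-far condition above can be checked directly when the probabilities are known explicitly. The following is a minimal sketch under that assumption (a tester only sees samples, so this is for illustration only); the function name `l1_distance_from_uniform` is hypothetical and not from the talk.

```python
def l1_distance_from_uniform(probs):
    """Return sum over x of |Pr[x] - 1/n| for a distribution on n points.

    Assumption: `probs` lists the probability of every point in the domain,
    including points of probability zero.
    """
    n = len(probs)
    return sum(abs(p - 1.0 / n) for p in probs)


# A distribution that puts all its mass on half of a 4-point domain.
half_support = [0.5, 0.5, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]

distance_half = l1_distance_from_uniform(half_support)
distance_uniform = l1_distance_from_uniform(uniform)
```

A distribution is ε-far from uniform in the sense of the slide exactly when this quantity is at least ε.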

Testing probability distributions
For more details/introduction: see R. Rubinfeld's talk on Wednesday
• Typical result:
– Given a probability distribution on n points, we can test if it's uniform after seeing ~√n random samples [Batu et al. '01]
• What if the distribution has infinite support?
• Continuous probability distributions?

Testing continuous probability distributions
• Typical result:
– Given a probability distribution on n points, we can test if it's uniform after seeing ~√n random samples
– ~√n random samples are necessary
• Given a continuous probability distribution on [0,1], can we test if it's uniform?
• Impossible
• Follows from the lower bound for the discrete case with n → ∞

Testing continuous probability distributions
• More direct proof:
• Suppose tester A distinguishes in at most t steps between the uniform distribution and distributions that are ε-far from uniform
• D_1: the uniform distribution
• D_2 is ½-far from uniform and is defined as follows:
– Partition [0,1] into t^3 intervals of identical length
– Split each interval into two halves
– Randomly choose one half: the chosen half gets the uniform distribution, the other half has zero probability
• In t steps, with high probability no interval will be chosen more than once in D_2
⇒ A cannot distinguish between D_1 and D_2
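The construction of D_2 can be simulated to see how rarely t samples hit the same interval twice. This is only an illustrative sketch: the helper names `sample_d2` and `has_repeated_interval`, the seed, and the choices t = 50 and 200 trials are assumptions made for the example, not part of the talk.

```python
import random


def sample_d2(t, num_samples, rng):
    """Draw samples from a freshly chosen D_2-style distribution on [0, 1].

    [0, 1] is cut into t**3 equal intervals; in each interval one half is
    picked at random and carries all of that interval's probability mass.
    Halves are chosen lazily, only for intervals that are actually hit.
    """
    m = t ** 3
    chosen_half = {}
    samples, intervals = [], []
    for _ in range(num_samples):
        j = rng.randrange(m)                    # interval index, uniform
        if j not in chosen_half:
            chosen_half[j] = rng.randrange(2)   # 0 = left half, 1 = right half
        left_end = (j + 0.5 * chosen_half[j]) / m
        samples.append(left_end + rng.random() * 0.5 / m)
        intervals.append(j)
    return samples, intervals


def has_repeated_interval(intervals):
    """True if some interval was hit by more than one sample."""
    return len(set(intervals)) < len(intervals)


rng = random.Random(0)
t = 50
trials = 200
repeats = 0
all_samples = []
for _ in range(trials):
    samples, intervals = sample_d2(t, t, rng)
    all_samples.extend(samples)
    if has_repeated_interval(intervals):
        repeats += 1
```

By a birthday-style estimate the chance that t samples share an interval is at most about t^2 / (2 t^3) = 1/(2t), so `repeats` should be a small fraction of `trials`. When no interval repeats, each sample looks like an independent uniform point, which is the intuition behind the indistinguishability claim.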

Testing continuous probability distributions
• What can be tested?
• First question: test if the distribution is indeed continuous

Testing continuous probability distributions
• Test if a probability distribution is discrete
• A probability distribution D on Ω is discrete on N points if there is a set X ⊆ Ω, |X| ≤ N, s.t. Pr_D[X] = 1
• D is ε-far from discrete on N points if ∀ X ⊆ Ω, |X| ≤ N: Pr_D[X] < 1 − ε
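For a distribution with finitely many explicitly known probabilities, the definition above reduces to looking at the N heaviest points, since they form the set X of size at most N with the largest mass. A minimal sketch under that assumption; the names `mass_outside_best_set` and `is_far_from_discrete` are hypothetical.

```python
def mass_outside_best_set(probs, n_points):
    """Probability mass not covered by the n_points heaviest points."""
    heaviest = sorted(probs, reverse=True)[:n_points]
    return max(0.0, 1.0 - sum(heaviest))


def is_far_from_discrete(probs, n_points, eps):
    """True if every set X with |X| <= n_points has Pr[X] < 1 - eps."""
    return mass_outside_best_set(probs, n_points) > eps


probs = [0.5, 0.2, 0.1, 0.1, 0.1]
outside_two = mass_outside_best_set(probs, 2)   # mass outside the best 2 points
```

Here the two heaviest points carry mass 0.7, so the distribution is ε-far from discrete on 2 points for every ε below 0.3 and for no larger ε.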

Testing if distribution is discrete on N points
• We repeatedly draw random points from D
• All we can see:
– Count frequency of each point
– Count number of points drawn
For some D (e.g., uniform or close to uniform):
• we need Ω(√N) samples to see the first multiple occurrence
This gives hope that the problem can be solved in sublinear time

Testing if distribution is discrete on N points
Raskhodnikova et al. '07 (Valiant '08):
Distinct Elements Problem:
• D is discrete, with each element having prob. ≥ 1/N
• Estimate the support size
Ω(N^{1−o(1)}) queries are needed to distinguish instances with support size ≤ N/100 from those with support size ≥ N/11
Key step: two distributions that have identical first log^{Θ(1)} N moments
• their expected frequencies up to log^{Θ(1)} N are identical

Testing if distribution is discrete on N points
Raskhodnikova et al. '07 (Valiant '08):
Distinct Elements Problem:
• D is discrete, with each element having prob. ≥ 1/N
• Estimate the support size
Ω(N^{1−o(1)}) queries are needed to distinguish instances with support size ≤ N/100 from those with support size ≥ N/11
Corollary: Testing if a distribution is discrete on N points requires Ω(N^{1−o(1)}) samples

Testing if distribution is discrete on N points
• We repeatedly draw random points from D
• All we can see:
– Count frequency of each point
– Count number of points drawn
• Can we get O(N) time?

Testing if distribution is discrete on N points
• Testing if a distribution is discrete on N points:
• Draw a sample S = (s_1, …, s_t) with t = cN/ε
• If S has more than N distinct elements then REJECT else ACCEPT
If D is discrete on N points then we will accept D
• We only have to prove that
• if D is ε-far from discrete on N points, then we will reject
• with probability > 2/3
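The tester above can be sketched in a few lines. The slide does not fix the constant c, so the default `c=20` below is an arbitrary assumption for illustration, as are the function name `accepts_as_discrete` and the sampling interface `draw` (a zero-argument callable returning one sample). Stopping as soon as more than N distinct elements have been seen gives the same answer as counting after all t draws.

```python
import math
import random


def accepts_as_discrete(draw, n_points, eps, c=20):
    """Draw t = ceil(c * N / eps) samples; REJECT (False) if more than
    N distinct values appear, otherwise ACCEPT (True).

    `c` is a placeholder constant; the slide leaves its value unspecified.
    """
    t = math.ceil(c * n_points / eps)
    seen = set()
    for _ in range(t):
        seen.add(draw())
        if len(seen) > n_points:
            return False   # REJECT: more than N distinct elements
    return True            # ACCEPT


rng = random.Random(1)
support = [rng.random() for _ in range(10)]

# Uniform on 10 fixed points: can never show more than 10 distinct values.
accepted_discrete = accepts_as_discrete(lambda: rng.choice(support), n_points=10, eps=0.1)

# Uniform on [0, 1]: fresh values keep appearing, so the tester rejects.
accepted_continuous = accepts_as_discrete(rng.random, n_points=10, eps=0.1)
```

The one-sided behaviour is visible here: a distribution that really is discrete on N points is always accepted, so only the rejection of ε-far distributions needs a probabilistic argument.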

Testing if distribution is discrete on N points
• Testing if a distribution is discrete on N points:
• Draw a sample S = (s_1, …, s_t) with t = cN/ε
• If S has more than N distinct elements then REJECT else ACCEPT
Can we do better (if we only count distinct elements)?
D has:
– 1 point with prob. 1 − 4ε
– 2N points with prob. 2ε/N each
D is ε-far from discrete on N points
We need Ω(N/ε) samples to see at least N points
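The hard instance above can be checked numerically. The sketch below builds D for one illustrative choice of parameters (N = 100 and ε = 0.1 are assumptions, as are all function names), confirms that it is ε-far from discrete on N points, and computes the expected number of distinct light points seen after t samples using linearity of expectation.

```python
def hard_instance(n_points, eps):
    """One heavy point of mass 1 - 4*eps plus 2N light points of mass 2*eps/N."""
    light = 2.0 * eps / n_points
    return [1.0 - 4.0 * eps] + [light] * (2 * n_points)


def mass_outside_best_set(probs, n_points):
    """Mass not covered by the n_points heaviest points."""
    return 1.0 - sum(sorted(probs, reverse=True)[:n_points])


def expected_distinct_light_points(n_points, eps, t):
    """Expected number of distinct light points among t samples.

    Each light point is missed by all t samples with probability
    (1 - 2*eps/N)**t, and there are 2N light points.
    """
    light = 2.0 * eps / n_points
    return 2 * n_points * (1.0 - (1.0 - light) ** t)


N, EPS = 100, 0.1
probs = hard_instance(N, EPS)
uncovered = mass_outside_best_set(probs, N)                 # 2*eps + 2*eps/N
few_samples = expected_distinct_light_points(N, EPS, 100)   # t = 0.1 * N/eps
many_samples = expected_distinct_light_points(N, EPS, 10000)  # t = 10 * N/eps
```

With t well below N/ε the expected number of distinct light points stays far below N, so a tester that only counts distinct elements cannot reject; with t a sufficiently large multiple of N/ε it exceeds N.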

Testing if distribution is discrete on N points
Assume D is ε-far from discrete on N points
Order points in Ω so that Pr[X_i] = p_i and p_i ≥ p_{i+1}
A = {X_1, …, X_N}, B = other points from the support
p_1 + p_2 + … + p_N < 1 − ε
α = # points from A drawn by the algorithm
β = # points from B drawn by the algorithm
We consider 3 cases (all bounds hold with prob. > 0.99):
1) p_N < ε/2N ⇒ β > N
• all points in B have small prob. ⇒ not too many repetitions
2) p_N ≥ cN/ε ⇒ β ≥ ε/2p_N
• points in B have small prob. ⇒ bound for #distinct points
3) p_N ≥ ε/2N ⇒ α ≥ N − ε/2p_N
• either many distinct points from A or p_N is very small (then β will be large)

Testing if distribution is discrete on N points
Assume D is ε-far from discrete on N points
Order points in Ω so that Pr[X_i] = p_i and p_i ≥ p_{i+1}
A = {X_1, …, X_N}, B = other points from the support
α = # points from A drawn by the algorithm
β = # points from B drawn by the algorithm
Main ideas:
Case 2) p_N ≥ cN/ε ⇒ β ≥ ε/2p_N
• Worst case: all points in B have uniform and maximum probability = p_N
• Z_i = random variable: number of steps to get the i-th new point from B
• We have to prove that with prob. > 0.99: $\sum_{i=1}^{\epsilon/2p_N} Z_i < t$
• Z_1, Z_2, … have geometric distributions: $E[Z_i] = \frac{1}{(r-i)\,p_N}$, r = number of points in B
• $\sum_{i=1}^{\epsilon/2p_N} E[Z_i] \le \frac{2}{p_N}$
→ Markov gives with prob. ≥ 0.99: $\sum_{i=1}^{\epsilon/2p_N} Z_i < t$
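The expectation in Case 2 can be evaluated for concrete numbers. The sketch below assumes the worst case named on the slide (r points in B, each with probability p) and counts already-seen points from 0, so that (r − i) points of B are still unseen while waiting for the next new one; that indexing, the function name, and the example values r = 10, p = 0.05, k = 5 are assumptions for illustration.

```python
def expected_steps_for_new_points(r, p, k):
    """Expected number of draws until k distinct points of B have appeared.

    While i points of B have been seen, each draw hits a new point of B
    with probability (r - i) * p, so the waiting time is geometric with
    mean 1 / ((r - i) * p). Summing over i = 0 .. k-1 uses linearity of
    expectation.
    """
    if k > r:
        raise ValueError("cannot see more distinct points than B contains")
    return sum(1.0 / ((r - i) * p) for i in range(k))


r, p, k = 10, 0.05, 5                 # illustrative values, not from the talk
expected = expected_steps_for_new_points(r, p, k)
markov_threshold = 100 * expected     # exceeded with probability at most 0.01
```

When k is at most r/2, every term is at most 2/(r p), so the sum is at most 1/p, which is consistent with the 2/p_N bound on the slide. Markov's inequality then says the total waiting time exceeds 100 times its expectation with probability at most 0.01.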

Testing if distribution is discrete on N points
• We repeatedly draw random points from D
• All we can see:
– Count frequency of each point
– Count number of points drawn
By sampling O(N/ε) points one can distinguish between
• distributions discrete on N points and
• those ε-far from discrete on N points
The algorithm may fail with prob. < 1/3

Testing continuous probability distributions
• What can we test efficiently?
– Complexity for discrete distributions should be "independent" of the support size
• Uniform distribution … under some conditions
• Rubinfeld & Servedio '05:
– testing monotone distributions for uniformity
