Estimating Statistical Characteristics Under Interval Uncertainty and Constraints:
Mean, Variance, Covariance, and Correlation

Ali Jalal-Kamali
Department of Computer Science
The University of Texas at El Paso
El Paso, TX 79968, USA

December 2011
1. Need for Estimating Statistical Characteristics

• Often, we have a sample of values x_1, ..., x_n corresponding to objects of a certain type.

• A standard way to describe the population is to describe its mean, variance, and standard deviation:

    E = (1/n) · Σ_{i=1}^{n} x_i;   V = (1/n) · Σ_{i=1}^{n} (x_i − E)²;   σ = √V.

• When we measure two quantities x and y:
  – we describe the means E_x, E_y, variances V_x, V_y, and standard deviations σ_x, σ_y of both;
  – we also estimate their covariance and correlation:

    C_{x,y} = (1/n) · Σ_{i=1}^{n} (x_i − E_x) · (y_i − E_y);   ρ_{x,y} = C_{x,y} / (σ_x · σ_y).
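The characteristics above are straightforward to compute; as a minimal illustration (the sample data is made up, and the thesis itself contains no code), using the population (1/n) normalization from the formulas:

```python
# Minimal sketch of the characteristics defined above, with the
# population (1/n) normalization; the sample data is made up.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    e = mean(xs)
    return sum((x - e) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    ex, ey = mean(xs), mean(ys)
    return sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    return covariance(xs, ys) / (variance(xs) ** 0.5 * variance(ys) ** 0.5)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(mean(xs))                      # 2.5
print(variance(xs))                  # 1.25
print(round(correlation(xs, ys), 6)) # 1.0 (ys is a linear function of xs)
```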
2. Case of Interval Uncertainty

• The above formulas assume that we know the exact values of the quantities x_1, ..., x_n.

• In practice, values usually come from measurements, and measurements are never absolutely exact.

• The measurement results x̃_i are, in general, different from the actual (unknown) values x_i: x̃_i ≠ x_i.

• Often, it is assumed that we know the probability distribution of the measurement errors Δx_i ≝ x̃_i − x_i.

• However, often, the only information available is the upper bound on the measurement error: |Δx_i| ≤ Δ_i.

• In this case, the only information that we have about the actual value x_i is that x_i ∈ x_i = [x̲_i, x̄_i], where x̲_i = x̃_i − Δ_i and x̄_i = x̃_i + Δ_i.
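In code, going from measurement results and error bounds to the guaranteed intervals is a one-liner; a minimal sketch (the helper name is ours, not from the thesis):

```python
# Given measurement results x~_i and error bounds Delta_i with |x~_i - x_i| <= Delta_i,
# the interval [x~_i - Delta_i, x~_i + Delta_i] is guaranteed to contain the actual x_i.

def measurement_intervals(measured, deltas):
    return [(x - d, x + d) for x, d in zip(measured, deltas)]

print(measurement_intervals([10.0, 20.0], [0.5, 0.25]))
# [(9.5, 10.5), (19.75, 20.25)]
```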
3. Need to Preserve Privacy in Statistical Databases

• In order to find relations between different quantities, we collect a large amount of data.

• Example: we collect medical data to try to find correlations between a disease and lifestyle factors.

• In some cases, we are looking for commonsense correlations, e.g., between smoking and lung diseases.

• For statistical databases to be most useful, we need to allow researchers to ask arbitrary questions.

• However, this may inadvertently disclose some private information about the individuals.

• Therefore, it is desirable to preserve privacy in statistical databases.
4. Intervals as a Way to Preserve Privacy in Statistical Databases

• One way to preserve privacy is to store ranges (intervals) rather than the exact data values.

• This makes sense from the viewpoint of a statistical database.

• In general, this is how data is often collected:
  – we set some threshold values t_0, ..., t_N, and
  – ask a person whether the actual value x_i is in the interval [t_0, t_1], or ..., or in the interval [t_{N−1}, t_N].

• As a result, for each quantity x and for each person i:
  – instead of the exact value x_i,
  – we store an interval x_i = [x̲_i, x̄_i] that contains x_i.

• Each of these intervals coincides with one of the given ranges [t_0, t_1], [t_1, t_2], ..., [t_{N−1}, t_N].
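This threshold-based scheme can be sketched as follows (a hypothetical helper; it assumes sorted thresholds t_0 < ... < t_N and t_0 ≤ x ≤ t_N):

```python
import bisect

# Replace an exact value x by the threshold interval [t_{k-1}, t_k] that
# contains it, as in the collection scheme described above.

def privacy_interval(x, thresholds):
    # thresholds = [t_0, t_1, ..., t_N], sorted; assumes t_0 <= x <= t_N
    k = bisect.bisect_left(thresholds, x)
    k = max(k, 1)  # x == t_0 falls into the first bin [t_0, t_1]
    return (thresholds[k - 1], thresholds[k])

thresholds = [0, 10, 20, 30]
print(privacy_interval(17, thresholds))  # (10, 20)
```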
5. Need to Estimate Statistical Characteristics S(x_1, ...) Under Interval Uncertainty

• In both situations (measurement errors and privacy):
  – instead of the actual values x_i (and y_i),
  – we only know the intervals x_i (and y_i) that contain the actual values.

• Different values of x_i (and y_i) from these intervals lead, in general, to different values of each characteristic.

• It is desirable to find the range of possible values of these characteristics when x_i ∈ x_i (and y_i ∈ y_i):

    S = { S(x_1, ..., x_n) : x_1 ∈ x_1, ..., x_n ∈ x_n };

    S = { S(x_1, ..., x_n, y_1, ..., y_n) : x_1 ∈ x_1, ..., x_n ∈ x_n, y_1 ∈ y_1, ..., y_n ∈ y_n }.
6. Estimating Statistical Characteristics under Interval Uncertainty: What is Known

• The mean E = (1/n) · Σ_{i=1}^{n} x_i is an increasing function of all its inputs x_1, ..., x_n.

• Hence, E is the smallest when all the inputs x_i ∈ [x̲_i, x̄_i] are the smallest (x_i = x̲_i):

    E̲ = (1/n) · Σ_{i=1}^{n} x̲_i;   Ē = (1/n) · Σ_{i=1}^{n} x̄_i.

• However, variance, covariance, and correlation are, in general, non-monotonic.

• It is known that computing the ranges of these characteristics under interval uncertainty is NP-hard.

• The problem gets even more complex because in practice, we often have additional constraints.
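Because E is increasing in each input, both endpoints of its range are obtained by plugging in the corresponding interval endpoints, a linear-time computation; a minimal sketch with intervals given as (lower, upper) pairs:

```python
# Range of the mean E over the box x_1 x ... x x_n: since E is increasing
# in every x_i, the minimum uses all lower endpoints and the maximum all
# upper endpoints.

def mean_range(intervals):
    n = len(intervals)
    e_low = sum(lo for lo, hi in intervals) / n
    e_high = sum(hi for lo, hi in intervals) / n
    return (e_low, e_high)

print(mean_range([(0.0, 2.0), (1.0, 3.0), (2.0, 4.0)]))  # (1.0, 3.0)
```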
7. Formulation of the Problem and What We Did

• Reminder: under interval uncertainty,
  – in the absence of constraints, computing the range E of the mean E is feasible;
  – computing the ranges V, C, and [ρ̲, ρ̄] is NP-hard.

• Problem: find practically useful cases when feasible algorithms are possible.

• What is known: for V, we can feasibly compute:
  – one of the endpoints (V̲) – always; and
  – both endpoints – in the privacy case.

• We designed feasible algorithms for computing:
  – the range E under constraints;
  – the range C in the privacy case; and
  – one of the endpoints ρ̲ or ρ̄.
8. Computing E under Variance Constraints

• In the previous expressions, we assumed only that x_i belongs to the interval x_i = [x̲_i, x̄_i].

• In some cases, we have an additional a priori constraint on the x_i: V ≤ V_0, for a given V_0.

• For example, we may know that within a species, the variance of a certain characteristic is ≤ 0.1.

• Thus, we arrive at the following problem:
  – given: n intervals x_i = [x̲_i, x̄_i] and a number V_0 ≥ 0;
  – compute: the range

      [E̲, Ē] = { E(x_1, ..., x_n) : x_i ∈ x_i & V(x_1, ..., x_n) ≤ V_0 };

  – under the assumption that there exist values x_i ∈ x_i for which V(x_1, ..., x_n) ≤ V_0.

• This is one of the problems that we solve in this thesis.
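The thesis develops a feasible exact algorithm for this problem; purely as a naive baseline for comparison, one can sample the box and keep only the points satisfying the constraint. This Monte Carlo sketch (our own, not from the thesis) gives an inner approximation of the range:

```python
import random

# Brute-force baseline, NOT the thesis algorithm: sample points from the box,
# keep those with V <= V_0, and report the observed range of E.  Every reported
# value of E is attainable, but the true endpoints may lie slightly outside
# the sampled range (an inner approximation).

def mean_range_under_variance_constraint(intervals, v0, trials=20000, seed=0):
    rng = random.Random(seed)
    n = len(intervals)
    e_min, e_max = float("inf"), float("-inf")
    for _ in range(trials):
        xs = [rng.uniform(lo, hi) for lo, hi in intervals]
        e = sum(xs) / n
        v = sum((x - e) ** 2 for x in xs) / n
        if v <= v0:
            e_min, e_max = min(e_min, e), max(e_max, e)
    return e_min, e_max

# With a loose V_0, the constraint never binds and E ranges over (almost) [0, 1]:
lo, hi = mean_range_under_variance_constraint([(0.0, 1.0), (0.0, 1.0)], v0=10.0)
print(0.0 <= lo < hi <= 1.0)  # True
```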
9. Cases Where This Problem Is (Relatively) Easy to Solve

• First case: V_0 is ≥ the largest possible value V̄ of the variance corresponding to the given sample.

• In this case, the constraint V ≤ V_0 is always satisfied.

• Thus, in this case, the desired range simply coincides with the range of all possible values of E.

• Second case: V_0 = 0.

• In this case, the constraint V ≤ V_0 means that the variance V should be equal to 0, i.e., x_1 = ... = x_n.

• In this case, we know that this common value x_i belongs to each of the n intervals x_i.

• So, the set of all possible values of E is the intersection:

    E = x_1 ∩ ... ∩ x_n.
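The second (V_0 = 0) case thus reduces to intersecting the n intervals, again a linear-time computation; a minimal sketch:

```python
# Intersection of intervals: the largest lower endpoint paired with the
# smallest upper endpoint; an empty intersection (lo > hi) means the
# constraint V = 0 is infeasible for this sample.

def intersection(intervals):
    lo = max(l for l, h in intervals)
    hi = min(h for l, h in intervals)
    return (lo, hi) if lo <= hi else None

print(intersection([(0.0, 5.0), (2.0, 7.0), (1.0, 4.0)]))  # (2.0, 4.0)
```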