
Selecting the Most Representative Sample is NP-Hard: Need for Expert (Fuzzy) Knowledge - PowerPoint PPT Presentation



  1. Selecting the Most Representative Sample is NP-Hard: Need for Expert (Fuzzy) Knowledge
     J. Esteban Gamez (1), François Modave (1), and Olga Kosheleva (2)
     Departments of (1) Computer Science and (2) Teacher Education, University of Texas, El Paso, TX 79968, USA
     Contact email: olgak@utep.edu
     Sections: Introduction to the . . .; Population: exact . . .; Statistical characteristics; Sample; Statistics; How to describe closeness; Formulation of the . . .; Main results; Auxiliary result; Proof: main idea; Proof (cont-d)

  2. 1. Outline
     • One of the main applications of fuzzy is to formalize the notions of “typical”, “representative”, etc.
     • The main idea behind fuzzy: formalize expert knowledge expressed by words from natural language.
     • In this talk, we show that
       – if we do not use this knowledge, i.e., if we only use the data,
       – then selecting the most representative sample becomes computationally difficult (NP-hard).
     • Thus, the need to find such samples in reasonable time justifies the use of fuzzy techniques.

  3. 2. Introduction to the problem
     • In practice: the population is often large, so we analyze a sample.
     • Examples: poll, educational survey.
     • Idea: the more “representative” the sample, the larger our confidence in the statistical results.
     • Requirement: a representative sample should have the same averages as the population.
     • Example: the same average age, average income, etc.
     • Additional requirement: the sample should exhibit the same variety as the population.
     • Example: the sample should include both poorer and richer people.
     • Formalization: a representative sample should have the same variance as the population.

  4. 3. Population: exact description
     By a population, we mean a tuple $p \stackrel{\text{def}}{=} \langle N, k, \{x_{j,i}\} \rangle$, where:
     • $N$ is an integer; this integer will be called the population size;
     • $k$ is an integer; this integer is called the number of characteristics;
     • $x_{j,i}$ ($1 \le j \le k$, $1 \le i \le N$) are real numbers;
     • the real number $x_{j,i}$ will be called the value of the $j$-th characteristic for the $i$-th object.

  5. 4. Statistical characteristics
     • Let $p = \langle N, k, \{x_{j,i}\} \rangle$ be a population, and let $j$ be an integer from 1 to $k$.
     • By the population mean $E_j$ of the $j$-th characteristic, we mean the value $E_j = \frac{1}{N} \cdot \sum_{i=1}^{N} x_{j,i}$.
     • By the population variance $V_j$ of the $j$-th characteristic, we mean the value $V_j = \frac{1}{N} \cdot \sum_{i=1}^{N} (x_{j,i} - E_j)^2$.
     • For every integer $d \ge 1$, by the central moment $M_j^{(2d)}$ of order $2d$ of the $j$-th characteristic, we mean the value $M_j^{(2d)} = \frac{1}{N} \cdot \sum_{i=1}^{N} (x_{j,i} - E_j)^{2d}$.
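As a rough illustration of the definitions on this slide, the following Python sketch computes the population mean, variance, and even-order central moments for a single characteristic. The function names and the toy values are assumptions made for illustration; they do not come from the slides.

```python
def population_mean(xs):
    """E_j = (1/N) * sum of x_{j,i} over i = 1..N."""
    return sum(xs) / len(xs)


def central_moment(xs, d=1):
    """M_j^(2d) = (1/N) * sum of (x_{j,i} - E_j)^(2d)."""
    e = population_mean(xs)
    return sum((x - e) ** (2 * d) for x in xs) / len(xs)


def population_variance(xs):
    """V_j is the central moment of order 2 (d = 1)."""
    return central_moment(xs, d=1)


# Hypothetical values of one characteristic for N = 4 objects.
values = [1.0, 2.0, 3.0, 4.0]
print(population_mean(values))       # 2.5
print(population_variance(values))   # 1.25
print(central_moment(values, d=2))   # 2.5625
```

Note that the variance here divides by N, as in the slide's definition, rather than by N - 1.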

  6. 5. Sample
     • Let $N$ be a population size.
     • By a sample, we mean a non-empty subset $I \subseteq \{1, 2, \ldots, N\}$.
     • For every sample $I$, by its size, we mean the number $n$ of elements in $I$.
     • By the sample mean $E_j(I)$ of the $j$-th characteristic, we mean the value $E_j(I) = \frac{1}{n} \cdot \sum_{i \in I} x_{j,i}$.
     • By the sample variance $V_j(I)$ of the $j$-th characteristic, we mean the value $V_j(I) = \frac{1}{n} \cdot \sum_{i \in I} (x_{j,i} - E_j(I))^2$.
     • For every $d \ge 1$, by the sample central moment $M_j^{(2d)}(I)$ of order $2d$ of the $j$-th characteristic, we mean the value $M_j^{(2d)}(I) = \frac{1}{n} \cdot \sum_{i \in I} (x_{j,i} - E_j(I))^{2d}$.
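The sample versions of these statistics can be sketched the same way. In the Python snippet below, a sample is a set of 1-based indices, matching $I \subseteq \{1, \ldots, N\}$; the function names and the data are illustrative assumptions. The example shows two samples with the same mean but different variances, which is why the mean alone may not capture "variety".

```python
def sample_mean(xs, I):
    """E_j(I) = (1/n) * sum of x_{j,i} over i in I (indices are 1-based)."""
    return sum(xs[i - 1] for i in I) / len(I)


def sample_central_moment(xs, I, d=1):
    """M_j^(2d)(I) = (1/n) * sum of (x_{j,i} - E_j(I))^(2d) over i in I."""
    e = sample_mean(xs, I)
    return sum((xs[i - 1] - e) ** (2 * d) for i in I) / len(I)


def sample_variance(xs, I):
    """V_j(I) is the sample central moment of order 2."""
    return sample_central_moment(xs, I, d=1)


values = [1.0, 2.0, 3.0, 4.0]  # hypothetical population of N = 4 objects

# Both samples reproduce the population mean 2.5 ...
print(sample_mean(values, {1, 4}), sample_mean(values, {2, 3}))          # 2.5 2.5
# ... but their variances differ (the population variance is 1.25).
print(sample_variance(values, {1, 4}), sample_variance(values, {2, 3}))  # 2.25 0.25
```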

  7. 6. Statistics
     • Let $p = \langle N, k, \{x_{j,i}\} \rangle$ be a population, and let $I$ be a sample.
     • By an $E$-statistics tuple corresponding to $p$, we mean a tuple $t^{(1)} \stackrel{\text{def}}{=} (E_1, \ldots, E_k)$.
     • By an $E$-statistics tuple corresponding to $I$, we mean a tuple $t^{(1)}(I) \stackrel{\text{def}}{=} (E_1(I), \ldots, E_k(I))$.
     • By an $(E, V)$-statistics tuple corresponding to $p$, we mean a tuple $t^{(2)} \stackrel{\text{def}}{=} (E_1, \ldots, E_k, V_1, \ldots, V_k)$.
     • By an $(E, V)$-statistics tuple corresponding to $I$, we mean a tuple $t^{(2)}(I) \stackrel{\text{def}}{=} (E_1(I), \ldots, E_k(I), V_1(I), \ldots, V_k(I))$.
     • For every integer $d \ge 1$, we can similarly define a statistics tuple of order $2d$.
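A minimal sketch of how the $E$- and $(E, V)$-statistics tuples might be assembled in Python is given below. The data layout (`x[j]` holding the values of one characteristic), the helper names, and the convention that `I=None` means "the whole population" are all assumptions for illustration.

```python
def _mean(values):
    return sum(values) / len(values)


def _variance(values):
    e = _mean(values)
    return sum((v - e) ** 2 for v in values) / len(values)


def _rows(x, I=None):
    """Restrict every characteristic to the 1-based index set I (or keep all)."""
    if I is None:
        return x
    return [[row[i - 1] for i in sorted(I)] for row in x]


def e_tuple(x, I=None):
    """t^(1) = (E_1, ..., E_k), for the population or for a sample I."""
    return tuple(_mean(row) for row in _rows(x, I))


def ev_tuple(x, I=None):
    """t^(2) = (E_1, ..., E_k, V_1, ..., V_k)."""
    rows = _rows(x, I)
    return tuple(_mean(r) for r in rows) + tuple(_variance(r) for r in rows)


# Hypothetical population: k = 2 characteristics, N = 4 objects.
x = [[1.0, 2.0, 3.0, 4.0],
     [10.0, 20.0, 30.0, 40.0]]
print(ev_tuple(x))           # (2.5, 25.0, 1.25, 125.0)
print(ev_tuple(x, {1, 4}))   # (2.5, 25.0, 2.25, 225.0)
```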

  8. 7. How to describe closeness
     • By a distance function, we mean a mapping $\rho$ that maps tuples $t$ and $t'$ into a real value $\rho(t, t')$ s.t.
       • $\rho(t, t) = 0$ for all tuples $t$ and
       • $\rho(t, t') > 0$ for all $t \ne t'$.
     • Example: Euclidean metric between the tuples $t = (t_1, t_2, \ldots)$ and $t' = (t'_1, t'_2, \ldots)$: $\rho(t, t') = \sqrt{\sum_j (t_j - t'_j)^2}$.
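The Euclidean example can be written as a short Python function. This is only a sketch; the name `euclidean_distance` and the length check are illustrative choices, not part of the slides.

```python
import math


def euclidean_distance(t, t_prime):
    """rho(t, t') = sqrt(sum over j of (t_j - t'_j)^2)."""
    if len(t) != len(t_prime):
        raise ValueError("tuples must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, t_prime)))


print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
print(euclidean_distance((2.5, 1.25), (2.5, 1.25)))  # 0.0, since rho(t, t) = 0
```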

  9. 8. Formulation of the problem
     • Let $\rho$ be a distance function.
     • $E$-sample selection problem corresponding to $\rho$:
       – Given:
         ∗ a population $p = \langle N, k, \{x_{j,i}\} \rangle$, and
         ∗ an integer $n < N$.
       – Find: a sample $I \subseteq \{1, \ldots, N\}$ of size $n$ for which the distance $\rho(t^{(1)}(I), t^{(1)})$ is the smallest possible.
     • $(E, V)$-sample selection problem corresponding to $\rho$:
       – Given:
         ∗ a population $p = \langle N, k, \{x_{j,i}\} \rangle$, and
         ∗ an integer $n < N$.
       – Find: a sample $I \subseteq \{1, \ldots, N\}$ of size $n$ for which the distance $\rho(t^{(2)}(I), t^{(2)})$ is the smallest possible.
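To make the $E$-sample selection problem concrete, here is a naive exhaustive-search sketch in Python: it tries every index set of size $n$ and keeps the one whose $E$-statistics tuple is closest to the population's. This is not a method from the slides, only an illustration of the problem statement; the function names, data layout, and toy data are assumptions.

```python
from itertools import combinations
import math


def e_tuple(x, indices):
    """t^(1) over the given 1-based indices; x[j] holds one characteristic."""
    n = len(indices)
    return tuple(sum(row[i - 1] for i in indices) / n for row in x)


def euclidean(t, t_prime):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, t_prime)))


def select_e_sample(x, n, rho=euclidean):
    """Return a size-n index tuple I minimizing rho(t1(I), t1) by brute force."""
    N = len(x[0])
    everyone = tuple(range(1, N + 1))
    target = e_tuple(x, everyone)  # population E-statistics tuple
    return min(combinations(everyone, n),
               key=lambda I: rho(e_tuple(x, I), target))


# Hypothetical population: k = 1 characteristic, N = 5 objects, mean 4.0.
x = [[1.0, 2.0, 3.0, 4.0, 10.0]]
print(select_e_sample(x, 2))  # (3, 4): sample mean 3.5 is closest to 4.0
```

The $(E, V)$ variant would differ only in the statistics tuple passed to the distance function.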

  10. 9. Main results
     • For every distance function $\rho$, the corresponding $E$-sample selection problem is NP-hard.
     • For every distance function $\rho$, the corresponding $(E, V)$-sample selection problem is NP-hard.
     • For every distance function $\rho$ and for every $d \ge 1$, the $(2d)$-th order sample selection problem is NP-hard.
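NP-hardness does not by itself say how any particular algorithm behaves, but a quick count shows why naive enumeration of all samples does not scale. The Python sketch below (an illustration, not part of the slides) counts the candidate samples of size $N/2$ for a few hypothetical population sizes.

```python
import math


def candidate_samples(N, n):
    """Number of distinct samples of size n from a population of size N."""
    return math.comb(N, n)


for N in (10, 20, 40):
    # Half-size samples: the count grows exponentially with N.
    print(N, candidate_samples(N, N // 2))
```

For N = 20 there are already 184756 half-size samples, and for N = 40 more than 10^11.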

  11. 10. Auxiliary result
     • In our proofs: we considered the case when the desired sample contains half of the original population.
     • In practice: samples usually form a much smaller portion of the population.
     • A natural question:
       – fix $2P \gg 2$, and
       – look for samples which constitute the $(2P)$-th part of the original population.
     • Result: the resulting problems of selecting the most representative sample are still NP-hard.
