Welcome back.

Announcements:
Project comments are available on Glookup. Turn in homework!
Project progress report due Monday; have a handle on your project before then.
I am away April 15-20. The midterm goes out when I get back: a take-home lasting a few days.

Which population?
DNA data at snp 843 (snp = Single Nucleotide Polymorphism):
Population 1: Pr[A] = .4, Pr[T] = .6.
Population 2: Pr[A] = .6, Pr[T] = .4.
human1: A ··· C ··· T ··· A
human2: C ··· C ··· A ··· T
human3: A ··· G ··· T ··· T
Given an individual x = (x1, x2, x3, ..., xd): which population?
In general, population 1 has Pr[xi = 1] = pi(1) at snp i, and population 2 has Pr[xi = 1] = pi(2). Model: a population interbreeds, so everyone in it shares the same snp frequencies; the question is whether two samples come from the same population.
Comment: snps could be movie preferences and populations could be types, e.g. republican/democrat or shopper/saver.

Gaussians.
Population 1: Gaussian with mean µ1 ∈ R^d, standard deviation σ in each dimension.
Population 2: Gaussian with mean µ2 ∈ R^d, standard deviation σ in each dimension.
Difference between humans: σ per snp. Difference between populations: ε per snp, so (µ1 − µ2)² = dε².
How many snps must we collect to determine the population for an individual x?

Simpler Calculation (suppose we know µ1 and µ2, and compare distances):
If x is in population 1: E[(x − µ1)²] = dσ².
If x is in population 1: E[(x − µ2)²] = dσ² + (µ1 − µ2)².
The signal is the difference between the expectations: roughly dε².
Variance of the estimator (x − µ)²? Roughly dσ⁴, so its standard deviation is √d σ².
Signal >> Noise ↔ dε² >> √d σ² → need d >> σ⁴/ε⁴.

Projection.
Project x onto the unit vector v in the direction µ2 − µ1.
If x is in population 1: E[(x − µ1)·v] = 0.
If x is in population 2: E[((x − µ1)·v)²] ≥ (µ1 − µ2)² = dε².
The standard deviation of the projection is just σ, not √d σ²: no loss in signal!
So if (µ1 − µ2)² = dε² >> σ², the two populations separate → take d >> σ²/ε².
Versus d >> σ⁴/ε⁴ for the distance calculation: a quadratic difference in the amount of data!
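To make the two calculations concrete, here is a minimal numpy sketch (an illustration, not lecture code; the values of d, σ, ε are assumptions chosen so that d >> σ⁴/ε⁴). It draws one individual from population 2 and runs both the distance test and the projection test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative parameters: d >> sigma**4/eps**4 (= 625 here).
d, sigma, eps = 20000, 1.0, 0.2
mu1 = np.zeros(d)
mu2 = mu1 + eps                      # per-snp gap eps, so ||mu1 - mu2||^2 = d*eps^2

x = mu2 + sigma * rng.standard_normal(d)   # an individual from population 2

# Distance test: E[||x - mu||^2] is d*sigma^2 for the right mean and
# d*sigma^2 + d*eps^2 for the wrong one; fluctuations are ~ sqrt(d)*sigma^2.
print(np.sum((x - mu1) ** 2) - d * sigma**2)   # ~ d*eps^2 = 800 (signal)
print(np.sum((x - mu2) ** 2) - d * sigma**2)   # ~ 0, up to ~ sqrt(d)*sigma^2 noise

# Projection test (means known): signal sqrt(d)*eps, noise only sigma.
v = (mu2 - mu1) / np.linalg.norm(mu2 - mu1)
print((x - mu1) @ v)                 # ~ sqrt(d)*eps ≈ 28, many sigmas from 0
```

At these values the projection statistic lands about 28σ away from zero, while the distance signal of 800 is only about 4 times its ≈ 200 noise: the quadratic gap between the two bounds.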
Don't know µ1 or µ2? Without the means, three ideas: nets, the near-neighbors approach, and principal components analysis.

Nets.
Remember projection! If we knew the direction µ2 − µ1 we would be done, so try (approximately) every direction.
A "δ-net" is a set D of directions such that every unit vector v is close to some x ∈ D: x·v ≥ 1 − δ.
Construction: vectors with coordinates of the form iδ/√d for integers i ∈ [−√d/δ, √d/δ]. This gives a total of N ∝ (√d/δ)^d vectors in the net, so log N = O(d log(d/δ)).
Take a sample of n people. Along a typical direction the total squared projection is ≈ nσ², with noise (standard deviation) ∝ √n σ². Along the direction µ1 − µ2 the expected squared projection picks up an extra signal ∝ n(µ1 − µ2)² = ndε².
To isolate the right direction: Signal >> Noise, with a union-bound factor of log N = O(d log(d/δ)) over the vectors in the net. This works out to nd >> (σ⁴/ε⁴) log d together with d >> σ²/ε²; we need d >> σ²/ε² at least, even knowing the direction.

Near Neighbors Approach.
Sample of n people: some (say half) from population 1, some from population 2. Don't know µ1 or µ2. Which are which?
Compute the squared Euclidean distance d(x, y) between every pair (O(d) time each), and cluster using a threshold. (A code sketch appears after the summary below.)
The signal E[d(x1, x2)] − E[d(x1, y1)], where the x's are from one population and the y's from the other, should be larger than the noise in d(x, y).
The signal is proportional to dε²; the noise is proportional to √d σ².
d >> σ⁴/ε⁴ → same-type people are closer to each other, so nearest neighbors works.
A union bound over the (n choose 2) pairs costs a log n factor: d >> (σ⁴/ε⁴) log n suffices for threshold clustering.

Principal components analysis.
Find the direction v of maximum variance: maximize Σx (x·v)² (zero-center the points first).
Recall: (x·v)², for v along µ1 − µ2, could determine the population.
When will PCA pick the correct direction with good probability? Again a union bound over directions; with a reasonable number of sample points, PCA reduces d to the "knowing the centers" case, paying a log n factor for the union bound over pairs. Best one can do? Infinity and beyond!

PCA calculation.
Let A be the matrix whose rows are the (centered) points. Then Av is the vector of projections onto v, so v^T Bv = (Av)^T (Av) = Σx (x·v)², where B = A^T A.
The first eigenvector of B is the maximum-variance direction: it maximizes v^T Bv, since Bv = λv for the maximum eigenvalue λ → v^T Bv = λ for unit v.
The eigenvectors form an orthonormal basis, so any other unit vector av + w with w·v = 0 is composed of vectors with possibly smaller eigenvalues → v^T Bv ≥ (av + w)^T B(av + w).

Computing eigenvectors: the power method.
Choose a random x. Repeat: let x = Bx; scale x to a unit vector.
Writing x = a1 v1 + a2 v2 + ···, we get x_t ∝ B^t x = a1 λ1^t v1 + a2 λ2^t v2 + ···, which is mostly v1 after a while, since λ1^t >> λ2^t.

Cluster Algorithm.
Choose a random partition. Repeat: compute the means of the two parts; project the points onto the direction between the means; cluster.
This generic clustering algorithm is a rounded version of the power method: choose a random +1/−1 vector; multiply by A^T (the direction between the means); multiply by A (project the points); cluster (round back to a +1/−1 vector). It is, sort of, repeatedly multiplying by AA^T: the power method. (A code sketch appears after the summary below.)

Sum up.
Clustering a mixture of Gaussians:
Near neighbor works with sufficient data (d >> (σ⁴/ε⁴) log n).
Projection onto the subspace of the means is better (d >> σ²/ε²).
Principal components analysis can find the subspace of the means.
The power method computes the principal component.
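A minimal sketch of the near-neighbors threshold clustering (my illustration, not lecture code; n, d, σ, ε are assumed values satisfying d >> (σ⁴/ε⁴) log n):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed parameters: sigma**4/eps**4 ≈ 39, so d >> (sigma**4/eps**4) * log(n).
n, d, sigma, eps = 40, 20000, 1.0, 0.4
half = n // 2
pts = sigma * rng.standard_normal((n, d))
pts[half:] += eps                    # second half drawn from population 2

# Pairwise squared distances via ||x||^2 + ||y||^2 - 2 x·y.
norms = np.sum(pts ** 2, axis=1)
sq = norms[:, None] + norms[None, :] - 2 * (pts @ pts.T)

# Same-type pairs concentrate near 2*d*sigma^2, cross-type pairs near
# 2*d*sigma^2 + d*eps^2; a threshold halfway between separates them.
threshold = 2 * d * sigma**2 + d * eps**2 / 2
cluster_of_0 = np.nonzero(sq[0] < threshold)[0]
print(cluster_of_0)                  # ≈ [0, 1, ..., half-1]
```

The √d σ² fluctuations in each pairwise distance are small next to the dε² gap precisely because of the d >> σ⁴/ε⁴ assumption.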
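And a sketch of the PCA route via the power method (again an illustration with assumed parameters). It never forms the d×d matrix B = A^T A: each iteration applies v ↦ A^T(Av), i.e., project the points, then recombine.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed parameters; the spike d*eps^2 must dominate a typical direction's variance.
n, d, sigma, eps = 200, 5000, 1.0, 0.2
labels = rng.integers(0, 2, size=n)          # hidden population of each row
A = sigma * rng.standard_normal((n, d))
A[labels == 1] += eps                        # population-2 rows shifted by eps
A -= A.mean(axis=0)                          # zero-center the points

# Power method on B = A^T A: v -> A^T (A v), normalized each round.
v = rng.standard_normal(d)
for _ in range(100):
    v = A.T @ (A @ v)
    v /= np.linalg.norm(v)

# Projections onto the maximum-variance direction split the populations at 0.
cluster = (A @ v) > 0
agree = np.mean(cluster == labels)
print(max(agree, 1 - agree))                 # ≈ 1.0 (sign of v is arbitrary)
```

Moving the rounding step inside the loop, clustering A @ v to a ±1 vector on every iteration, gives exactly the generic clustering algorithm above: a rounded power method on AA^T.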
See you on Thursday.