Welcome back.

Announcements:
Project comments are available on Glookup. Turn in homework!
Project progress report due Monday; have a handle on your project before then.
I am away April 15-20. The midterm goes out when I get back: a take-home lasting a few days.

Which population?
DNA data at snp 843 (snp = Single Nucleotide Polymorphism):
Population 1: Pr[A] = .4, Pr[T] = .6.
Population 2: Pr[A] = .6, Pr[T] = .4.
human1: A ··· C ··· T ··· A
human2: C ··· C ··· A ··· T
human3: A ··· G ··· T ··· T
Given an individual x = (x1, x2, x3, ..., xd): which population?
In general, population 1 has Pr[xi = 1] = pi(1) at snp i, and population 2 has Pr[xi = 1] = pi(2). Model: a population interbreeds, so everyone in it shares the same snp frequencies; the question is whether two samples come from the same population.
Comment: snps could be movie preferences and populations could be types, e.g. republican/democrat or shopper/saver.

Gaussians.
Population 1: Gaussian with mean µ1 ∈ R^d, standard deviation σ in each dimension.
Population 2: Gaussian with mean µ2 ∈ R^d, standard deviation σ in each dimension.
Difference between humans: σ per snp. Difference between populations: ε per snp, so (µ1 − µ2)² = dε².
How many snps must we collect to determine the population for an individual x?

Simpler Calculation (suppose we know µ1 and µ2, and compare distances):
If x is in population 1: E[(x − µ1)²] = dσ².
If x is in population 1: E[(x − µ2)²] = dσ² + (µ1 − µ2)².
The signal is the difference between the expectations: roughly dε².
Variance of the estimator (x − µ)²? Roughly dσ⁴, so its standard deviation is √d σ².
Signal >> Noise ↔ dε² >> √d σ² → need d >> σ⁴/ε⁴.

Projection.
Project x onto the unit vector v in the direction µ2 − µ1.
If x is in population 1: E[(x − µ1)·v] = 0.
If x is in population 2: E[((x − µ1)·v)²] ≥ (µ1 − µ2)² = dε².
The standard deviation of the projection is just σ, not √d σ²: no loss in signal!
So if (µ1 − µ2)² = dε² >> σ², the two populations separate → take d >> σ²/ε².
Versus d >> σ⁴/ε⁴ for the distance calculation: a quadratic difference in the amount of data!
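To make the two calculations concrete, here is a minimal numpy sketch (an illustration, not lecture code; the values of d, σ, ε are assumptions chosen so that d >> σ⁴/ε⁴). It draws one individual from population 2 and runs both the distance test and the projection test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative parameters: d >> sigma**4/eps**4 (= 625 here).
d, sigma, eps = 20000, 1.0, 0.2
mu1 = np.zeros(d)
mu2 = mu1 + eps                      # per-snp gap eps, so ||mu1 - mu2||^2 = d*eps^2

x = mu2 + sigma * rng.standard_normal(d)   # an individual from population 2

# Distance test: E[||x - mu||^2] is d*sigma^2 for the right mean and
# d*sigma^2 + d*eps^2 for the wrong one; fluctuations are ~ sqrt(d)*sigma^2.
print(np.sum((x - mu1) ** 2) - d * sigma**2)   # ~ d*eps^2 = 800 (signal)
print(np.sum((x - mu2) ** 2) - d * sigma**2)   # ~ 0, up to ~ sqrt(d)*sigma^2 noise

# Projection test (means known): signal sqrt(d)*eps, noise only sigma.
v = (mu2 - mu1) / np.linalg.norm(mu2 - mu1)
print((x - mu1) @ v)                 # ~ sqrt(d)*eps ≈ 28, many sigmas from 0
```

At these values the projection statistic lands about 28σ away from zero, while the distance signal of 800 is only about 4 times its ≈ 200 noise: the quadratic gap between the two bounds.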
Don't know µ1 or µ2? Without the means, three ideas: nets, the near-neighbors approach, and principal components analysis.

Nets.
Remember projection! If we knew the direction µ2 − µ1 we would be done, so try (approximately) every direction.
A "δ-net" is a set D of directions such that every unit vector v is close to some x ∈ D: x·v ≥ 1 − δ.
Construction: vectors with coordinates of the form iδ/√d for integers i ∈ [−√d/δ, √d/δ]. This gives a total of N ∝ (√d/δ)^d vectors in the net, so log N = O(d log(d/δ)).
Take a sample of n people. Along a typical direction the total squared projection is ≈ nσ², with noise (standard deviation) ∝ √n σ². Along the direction µ1 − µ2 the expected squared projection picks up an extra signal ∝ n(µ1 − µ2)² = ndε².
To isolate the right direction: Signal >> Noise, with a union-bound factor of log N = O(d log(d/δ)) over the vectors in the net. This works out to nd >> (σ⁴/ε⁴) log d together with d >> σ²/ε²; we need d >> σ²/ε² at least, even knowing the direction.

Near Neighbors Approach.
Sample of n people: some (say half) from population 1, some from population 2. Don't know µ1 or µ2. Which are which?
Compute the squared Euclidean distance d(x, y) between every pair (O(d) time each), and cluster using a threshold. (A code sketch appears after the summary below.)
The signal E[d(x1, x2)] − E[d(x1, y1)], where the x's are from one population and the y's from the other, should be larger than the noise in d(x, y).
The signal is proportional to dε²; the noise is proportional to √d σ².
d >> σ⁴/ε⁴ → same-type people are closer to each other, so nearest neighbors works.
A union bound over the (n choose 2) pairs costs a log n factor: d >> (σ⁴/ε⁴) log n suffices for threshold clustering.

Principal components analysis.
Find the direction v of maximum variance: maximize Σx (x·v)² (zero-center the points first).
Recall: (x·v)², for v along µ1 − µ2, could determine the population.
When will PCA pick the correct direction with good probability? Again a union bound over directions; with a reasonable number of sample points, PCA reduces d to the "knowing the centers" case, paying a log n factor for the union bound over pairs. Best one can do? Infinity and beyond!

PCA calculation.
Let A be the matrix whose rows are the (centered) points. Then Av is the vector of projections onto v, so v^T Bv = (Av)^T (Av) = Σx (x·v)², where B = A^T A.
The first eigenvector of B is the maximum-variance direction: it maximizes v^T Bv, since Bv = λv for the maximum eigenvalue λ → v^T Bv = λ for unit v.
The eigenvectors form an orthonormal basis, so any other unit vector av + w with w·v = 0 is composed of vectors with possibly smaller eigenvalues → v^T Bv ≥ (av + w)^T B(av + w).

Computing eigenvectors: the power method.
Choose a random x. Repeat: let x = Bx; scale x to a unit vector.
Writing x = a1 v1 + a2 v2 + ···, we get x_t ∝ B^t x = a1 λ1^t v1 + a2 λ2^t v2 + ···, which is mostly v1 after a while, since λ1^t >> λ2^t.

Cluster Algorithm.
Choose a random partition. Repeat: compute the means of the two parts; project the points onto the direction between the means; cluster.
This generic clustering algorithm is a rounded version of the power method: choose a random +1/−1 vector; multiply by A^T (the direction between the means); multiply by A (project the points); cluster (round back to a +1/−1 vector). It is, sort of, repeatedly multiplying by AA^T: the power method. (A code sketch appears after the summary below.)

Sum up.
Clustering a mixture of Gaussians:
Near neighbor works with sufficient data (d >> (σ⁴/ε⁴) log n).
Projection onto the subspace of the means is better (d >> σ²/ε²).
Principal components analysis can find the subspace of the means.
The power method computes the principal component.
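A minimal sketch of the near-neighbors threshold clustering (my illustration, not lecture code; n, d, σ, ε are assumed values satisfying d >> (σ⁴/ε⁴) log n):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed parameters: sigma**4/eps**4 ≈ 39, so d >> (sigma**4/eps**4) * log(n).
n, d, sigma, eps = 40, 20000, 1.0, 0.4
half = n // 2
pts = sigma * rng.standard_normal((n, d))
pts[half:] += eps                    # second half drawn from population 2

# Pairwise squared distances via ||x||^2 + ||y||^2 - 2 x·y.
norms = np.sum(pts ** 2, axis=1)
sq = norms[:, None] + norms[None, :] - 2 * (pts @ pts.T)

# Same-type pairs concentrate near 2*d*sigma^2, cross-type pairs near
# 2*d*sigma^2 + d*eps^2; a threshold halfway between separates them.
threshold = 2 * d * sigma**2 + d * eps**2 / 2
cluster_of_0 = np.nonzero(sq[0] < threshold)[0]
print(cluster_of_0)                  # ≈ [0, 1, ..., half-1]
```

The √d σ² fluctuations in each pairwise distance are small next to the dε² gap precisely because of the d >> σ⁴/ε⁴ assumption.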
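And a sketch of the PCA route via the power method (again an illustration with assumed parameters). It never forms the d×d matrix B = A^T A: each iteration applies v ↦ A^T(Av), i.e., project the points, then recombine.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed parameters; the spike d*eps^2 must dominate a typical direction's variance.
n, d, sigma, eps = 200, 5000, 1.0, 0.2
labels = rng.integers(0, 2, size=n)          # hidden population of each row
A = sigma * rng.standard_normal((n, d))
A[labels == 1] += eps                        # population-2 rows shifted by eps
A -= A.mean(axis=0)                          # zero-center the points

# Power method on B = A^T A: v -> A^T (A v), normalized each round.
v = rng.standard_normal(d)
for _ in range(100):
    v = A.T @ (A @ v)
    v /= np.linalg.norm(v)

# Projections onto the maximum-variance direction split the populations at 0.
cluster = (A @ v) > 0
agree = np.mean(cluster == labels)
print(max(agree, 1 - agree))                 # ≈ 1.0 (sign of v is arbitrary)
```

Moving the rounding step inside the loop, clustering A @ v to a ±1 vector on every iteration, gives exactly the generic clustering algorithm above: a rounded power method on AA^T.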
See you on Thursday.