Bayes and Lancaster at the Chinese restaurant


Bayes and Lancaster at the Chinese restaurant. Statistical uses of the Fleming-Viot Process. Dario Spanò, University of Warwick. 1st Berlin-Padova Young Researchers Workshop, 23-25 October 2014. Based on joint works with Bob Griffiths.


  1. Bayes and Lancaster at the Chinese restaurant. Statistical uses of the Fleming-Viot Process. Dario Spanò, University of Warwick. 1st Berlin-Padova Young Researchers Workshop, 23-25 October 2014.

  2. Based on joint works with Bob Griffiths (Oxford), Paul Jenkins (Warwick), Matteo Ruggiero (Torino) and Omiros Papaspiliopoulos (Barcelona).

  3. Outline: Chinese Restaurant Process and Bayes; computable filters; Fleming-Viot; Lancaster joins the restaurant.


  5. Dirichlet measures and the Chinese restaurant process. Infinitely many delegates participate in an important probability young researchers workshop. Day 1: dinner at the Chinese restaurant. Delegates enter the room one by one and, if $k$ tables are occupied by $n_1, \dots, n_k$ persons ($\sum_{i=1}^k n_i = n$), then the $(n+1)$-th delegate: joins the table with $n_j$ people with probability $n_j/(n+\theta)$ ($j = 1, \dots, k$); chooses a new table with probability $\theta/(n+\theta)$. Each new table is labelled with a color chosen at random from a space $E$ of colors, using a probability distribution $P_0$. Let $X_n$ = "color of the table occupied by the $n$-th delegate", $n \in \mathbb{N}$. Denote $X^{(n)} = (X_1, \dots, X_n)$ and $\tilde{e}_n(X^{(n)}) := \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$, $n \in \mathbb{N}$.
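The seating rule above can be sketched directly in code. A minimal simulation; the values of $\theta$, the number of delegates, and the uniform base measure $P_0$ on three colors are illustrative choices, not from the talk:

```python
# Chinese restaurant process with colored tables: each arriving delegate
# joins an occupied table proportionally to its occupancy, or opens a new
# table with weight theta; new tables get a color drawn from P0.
import random

def crp_colors(n, theta, colors, rng):
    """Seat n delegates; return each delegate's table color and the tables."""
    tables = []           # tables[j] = (occupancy n_j, color)
    seating = []          # X_1, ..., X_n
    for i in range(n):
        r = rng.uniform(0, i + theta)   # i delegates already seated
        acc = 0.0
        for j, (occ, col) in enumerate(tables):
            acc += occ
            if r < acc:                 # join table j with prob. n_j/(i+theta)
                tables[j] = (occ + 1, col)
                seating.append(col)
                break
        else:                           # new table with prob. theta/(i+theta)
            col = rng.choice(colors)    # color drawn from (uniform) P0
            tables.append((1, col))
            seating.append(col)
    return seating, tables

rng = random.Random(0)
X, tables = crp_colors(100, theta=2.0, colors=["red", "blue", "green"], rng=rng)
```

The empirical measure $\tilde{e}_n$ is then just the vector of color frequencies in `X`.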




  9. Bayes at the Chinese restaurant. The sequence $(X_1, X_2, \dots)$ is infinitely exchangeable. Prior: $\tilde{e}_n(X^{(n)}) \xrightarrow{a.s.} F$ as $n \to \infty$, where $F \sim \pi_{\theta, P_0}$. Ferguson-Dirichlet: $\pi_{\theta, P_0}$ is the law with $(F(A_1), \dots, F(A_d)) \sim \mathrm{Dir}(\theta P_0(A_1), \dots, \theta P_0(A_d))$, for every $d \in \mathbb{N}$ and every partition $(A_1, \dots, A_d)$ of $E$. Likelihood: $\mathcal{L}(X^{(n)} \mid F = \mu) = \mu^{\otimes n}$. Posterior: $\mathcal{L}(F \mid x^{(n)}) = \pi_{\theta + n,\; \frac{\theta}{\theta + n} P_0 + \frac{n}{\theta + n} \tilde{e}_n(x^{(n)})}$.
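On a fixed finite partition, the posterior update above reduces to Dirichlet-multinomial conjugacy. A small numerical sketch; $\theta$, the base weights $P_0(A_j)$ and the observed counts are illustrative:

```python
# Dirichlet conjugacy on a finite partition A_1, A_2, A_3:
# prior Dir(theta*P0(A_j)) plus counts n_j gives Dir(theta*P0(A_j) + n_j).
theta = 2.0
p0 = [0.5, 0.3, 0.2]        # P0(A_j)
counts = [4, 1, 0]          # observed colors among n = 5 delegates
n = sum(counts)

prior = [theta * p for p in p0]
posterior = [a + c for a, c in zip(prior, counts)]

# The same posterior written as pi_{theta+n, base}: total mass theta + n,
# base measure (theta/(theta+n)) * P0 + (n/(theta+n)) * empirical measure.
base = [(theta * p + c) / (theta + n) for p, c in zip(p0, counts)]
assert abs(sum(base) - 1.0) < 1e-12
assert all(abs((theta + n) * b - q) < 1e-9 for b, q in zip(base, posterior))
```

The two parametrizations agree: scaling the convex-combination base measure by $\theta + n$ recovers the Dirichlet parameters.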

  10. How crowded is your table? Under $\pi_{\theta, P_0}$, the pdf of $(F(A_1), \dots, F(A_d))$ is proportional to $\left[ \prod_{j=1}^{d} x_j^{\theta P_0(A_j) - 1} \right] \mathbb{I}\{ (x_1, \dots, x_d) \in [0,1]^d : |x| = 1 \}$. If $E = \{0, 1\}$, then $P_0 = p_0 \in [0,1]$, so $\pi_{\theta, p_0} = \mathrm{Beta}(\theta p_0, \theta(1 - p_0))$. If $E$ is any Polish space, then $F(A) \sim \mathrm{Beta}(\theta P_0(A), \theta(1 - P_0(A)))$.


  12. Diffusion model. The time-evolution of a genetic variant, or allele, is well approximated by a diffusion process on the interval $[0,1]$. [Figure: allele frequency trajectory over time.] Wright-Fisher SDE: $dF_t = b_\theta(F_t)\, dt + \sqrt{F_t(1 - F_t)}\, dW_t$, $F_0 = \mu$, $t \ge 0$. The infinitesimal drift $b_\theta(x)$ encapsulates directional forces such as natural selection, migration, mutation, ...
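A crude Euler-Maruyama discretization gives a feel for this SDE. The drift used below is the neutral mutation drift $b_{\alpha,\beta}(x) = \frac{1}{2}[\alpha(1-x) - \beta x]$ from a later slide; the step size and the boundary clipping are ad-hoc choices for illustration, not the scheme used in the talk:

```python
# Euler-Maruyama sketch of the Wright-Fisher SDE on [0, 1].
import math
import random

def simulate_wf(f0, alpha, beta, t_end, dt, rng):
    """Simulate dF = b(F) dt + sqrt(F(1-F)) dW with neutral drift b."""
    f, t = f0, 0.0
    path = [f]
    while t < t_end:
        drift = 0.5 * (alpha * (1.0 - f) - beta * f)
        diffusion = math.sqrt(max(f * (1.0 - f), 0.0))
        f += drift * dt + diffusion * rng.gauss(0.0, math.sqrt(dt))
        f = min(max(f, 0.0), 1.0)   # crude clipping at the boundaries
        path.append(f)
        t += dt
    return path

rng = random.Random(1)
path = simulate_wf(0.3, alpha=1.0, beta=1.0, t_end=1.0, dt=1e-3, rng=rng)
```

Clipping is only a quick fix: near 0 and 1 the true diffusion needs more careful numerics than plain Euler-Maruyama.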

  13. Filtering with genetic time series data. We do not observe the path of the frequency diffusion $F = (F_t : t \ge 0)$, but only samples taken at distinct time points $t_1 < \dots < t_k$. Key assumption on the likelihood: $X_1(t), \dots, X_{n(t)}(t) \mid F_t \overset{iid}{\sim} F_t$, $t \in \{t_1, \dots, t_k\}$. [Figure: allele frequency path with samples at $t_1, \dots, t_k$.] How to infer diffusion sample path properties given the data?

  14. Optimal filter. Assume the diffusion has stationary measure $\pi$ and transition function $P_t(\mu, d\nu)$. Let $f_\mu(x)$ be the likelihood of the data given the signal, both at stationarity. Two operators. Update operator (Bayes' rule): $\phi_x(\pi)(d\mu) = \frac{f_\mu(x)\, \pi(d\mu)}{\int_{M_1} f_\nu(x)\, \pi(d\nu)}$. Prediction operator (propagator): $\psi_t(\pi)(d\nu) = \int_{M_1} \pi(d\mu)\, P_t(\mu, d\nu)$. Definition. The optimal filter is the solution of the recursion $\pi_0 = \phi_{x_{t_0}}(\pi)$, $\pi_n = \phi_{x_{t_n}}(\psi_{t_n - t_{n-1}}(\pi_{n-1}))$. It is called a computable filter if iterating $n$ update/propagation steps involves finite sums whose number of terms depends on $n$.
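The update/prediction recursion can be sketched on a finite grid of candidate frequencies. Everything here is illustrative: the lazy random-walk transition matrix is a placeholder for the true semigroup $P_t$, and the observations are invented Bernoulli draws:

```python
# Grid approximation of the filter recursion pi -> update -> predict.
grid = [i / 10 for i in range(11)]   # candidate frequencies mu in [0, 1]

def update(pi, x):
    """Bayes update phi_x: reweight by the Bernoulli likelihood f_mu(x)."""
    w = [p * (mu if x == 1 else 1.0 - mu) for p, mu in zip(pi, grid)]
    z = sum(w)                       # normalizing constant
    return [v / z for v in w]

def predict(pi, P):
    """Prediction psi_t: push pi through the transition matrix, pi P."""
    m = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]

# Lazy random walk as a stand-in for P_{t_n - t_{n-1}} (rows sum to 1).
m = len(grid)
P = [[0.8 if i == j else (0.1 if abs(i - j) == 1 else 0.0) for j in range(m)]
     for i in range(m)]
for i in (0, m - 1):
    P[i][i] = 0.9                    # keep boundary rows stochastic

pi = [1.0 / m] * m                   # flat prior on the grid
for x in [1, 1, 0, 1]:               # one observation per time point
    pi = predict(predict(pi, P), P) if False else update(pi, x)
    pi = predict(pi, P)
```

The filter stays a probability vector at every step; the point of the "computable filter" definition is that for special models this recursion stays within a finite-parameter family instead of needing a grid.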


  16. Filtering with genetic time series data (same setting). A priori, $F_{t_1} \sim \pi$.

  17. Filtering with genetic time series data (same setting). Update: $F_{t_1} \mid$ data at $t_1$.

  18. Filtering with genetic time series data (same setting). Predict: $F_{t_2}$ from the posterior update at $t_1$, via $P_{t_2 - t_1}(F^{x^{(n_1)}}(t_1), \cdot)$.

  19. Filtering with genetic time series data (same setting). Update the distribution of $F_{t_2}$ given the data at $t_2$. Carry on for $t_3, t_4, \dots$.


  21. Tractability of a filter. $dF_t = b_\theta(F_t)\, dt + \sqrt{F_t(1 - F_t)}\, dW_t$, $F_0 = \mu$, $t \ge 0$. Ideally we would like to: know the stationary distribution $\pi$ (here: the $\mathrm{Beta}(\alpha, \beta)$ distribution); know how to compute the posterior $P(F_t \mid \text{data at } t)$ (here: via the CRP); know how to compute $P_t(\mu, d\nu)$ (here: a Lancaster probability). Generally all three aspects are intractable. Neutral Fleming-Viot models have them all! $b_{\alpha, \beta}(x) = \frac{1}{2}[\alpha(1 - x) - \beta x]$, $\alpha, \beta > 0$.
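For the two-type neutral model the update step is plain Beta-Bernoulli conjugacy, which is what makes the first two items tractable. A one-screen sketch with illustrative parameters and invented data:

```python
# Conjugate update for the neutral two-type model: the stationary law
# Beta(alpha, beta) updated with n Bernoulli samples from F_t stays Beta.
alpha, beta = 1.0, 1.0          # illustrative mutation parameters
data = [1, 0, 1, 1, 0, 1]       # sample of size n from F_t at one time point
s, n = sum(data), len(data)     # s successes out of n

post_alpha = alpha + s          # Beta(alpha + s, beta + n - s)
post_beta = beta + n - s
post_mean = post_alpha / (post_alpha + post_beta)
```

So the posterior is indexed by just two numbers; the hard remaining item is propagating this law through $P_t$, which is where the Lancaster structure enters.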

  22. What are Lancaster probabilities? Definition. Let $(X, Y)$ be an exchangeable pair of random variables with (identical) marginal distribution $\pi$. The joint distribution of $(X, Y)$ is a Lancaster probability distribution if, for every $n$, $E[Y^n \mid X = x] = \rho_n x^n + $ a polynomial in $x$ of degree less than $n$. The coefficients $\{\rho_n\}$ are termed canonical correlation coefficients. In the neutral Fleming-Viot model, $P_t \mu^n = e^{-\frac{1}{2} n(n + \theta - 1) t} \mu^n + \dots$, $\theta = \alpha + \beta$. Benefit in filtering: given $F_0 = \mu$, a sample of size $n$ from $\mu$ is sufficient to predict a sample of size $n$ at time $t$.

  23. Genealogy and eigenvalues. The canonical correlation coefficients $e^{-\frac{1}{2} n(n + \theta - 1) t}$ are the eigenvalues of the semigroup $P_t$. A probabilistic interpretation is in terms of the model's genealogy (dual to the diffusion). [Figure: allele frequency path with genealogical tree.]

  24. Neutral model, a closer look. Finite population of size $N$; discrete, non-overlapping generations. At each generation, the types of the individuals $J_1, \dots, J_N$ are labeled with points in some Polish space $E$ (e.g. $E = \{0, 1\}$). At each time $k$, each individual picks her parent uniformly at random from generation $k - 1$. With probability $1 - u$ an individual inherits her parent's type; with probability $u$ it mutates to a new type chosen from $E$ according to a probability distribution $P_0$ on $E$ (if $E = \{0, 1\}$, then $P_0\{1\} \in [0, 1]$). Let $F_N(k) := \frac{1}{N} \sum_{i=1}^N \delta_{J_i(k)}$, $k = 0, 1, \dots$.
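The discrete neutral model is easy to simulate directly. A minimal sketch; the values of $N$, $u$, $P_0\{1\}$ and the number of generations are illustrative:

```python
# Discrete neutral Wright-Fisher model with mutation on E = {0, 1}:
# each child picks a uniform parent, copies its type with prob. 1 - u,
# and otherwise mutates to a fresh type drawn from P0.
import random

def step(types, u, p0_one, rng):
    """One generation: resample N individuals from the previous one."""
    out = []
    for _ in range(len(types)):
        if rng.random() < u:
            out.append(1 if rng.random() < p0_one else 0)  # mutation via P0
        else:
            out.append(rng.choice(types))                  # copy a parent
    return out

rng = random.Random(42)
types = [0] * 50                      # generation 0: all of type 0
for _ in range(200):
    types = step(types, u=0.02, p0_one=0.5, rng=rng)
f_N = sum(types) / len(types)         # empirical frequency F_N of type 1
```

As $N \to \infty$ (with $u$ scaled suitably with $N$ and time sped up), the frequency process $F_N$ converges to the Wright-Fisher diffusion of the earlier slides.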
