Using DNA from many samples to distinguish pedigree relationships of close relatives Amy L. Williams @amythewilliams February 24, 2020 Family History Technology Workshop
Massive datasets: Many close relatives / small pedigrees
>100,000 samples, >9 million samples, ~500,000 samples, >14 million samples
In a dataset with $N$ individuals, have $\binom{N}{2} = \frac{N(N-1)}{2} = \mathcal{O}(N^2)$ pairs
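To give a feel for the quadratic growth in the number of pairs, here is a minimal sketch that evaluates the pair count at the dataset sizes quoted on the slide. The function name `pair_count` is illustrative, and the slide's lower bounds (">100,000", and so on) are used as if they were exact sizes.

```python
import math


def pair_count(n: int) -> int:
    """Number of unordered pairs among n individuals: n * (n - 1) / 2."""
    return math.comb(n, 2)


# Sizes taken from the slide; the bounds are treated as exact (an assumption).
for n in (100_000, 500_000, 9_000_000, 14_000_000):
    print(f"{n:>12,} samples -> {pair_count(n):,} pairs")
```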
Goal: detect and reconstruct pedigrees using only DNA …
Signal: Identical by descent (IBD) sharing
• Close (and some distant) relatives share large regions identical by descent (IBD)
  – Represented here as same color
• Each generation, parents transmit a random ½ of their genome to children
  ⇒ Relatives separated by $G$ generations share an average of $1/2^G$ of their genome
• Average IBD sharing fractions:
  – Full siblings: 50%, Aunt-nephew: 25%, First cousins: 12.5%
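The halving rule can be checked against the three sharing fractions listed on the slide. This is a small sketch; assigning 1, 2 and 3 to full siblings, aunt-nephew and first cousins is an assumption that reads the slide's "generations" as the degree of relatedness.

```python
from fractions import Fraction


def expected_ibd_fraction(g: int) -> Fraction:
    """Average fraction of the genome shared IBD after g halvings: 1 / 2**g."""
    if g < 1:
        raise ValueError("g must be a positive integer")
    return Fraction(1, 2 ** g)


# Mapping of the slide's examples to g (assumed, see the lead-in).
SLIDE_EXAMPLES = {"Full siblings": 1, "Aunt-nephew": 2, "First cousins": 3}

for name, g in SLIDE_EXAMPLES.items():
    print(f"{name}: {float(expected_ibd_fraction(g)):.1%}")
```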
Second degree relatives: All share ~25% of genome IBD
Grandparent-grandchild (GP), Avuncular (AV), Half-sibling (HS)
⇒ Difficult to distinguish using only data from the pairs
IBD sharing rates for these relationships heavily overlap
Idea: analyze IBD sharing of pair to other relatives
CREST: Classification of Relationship Types Ying Qiao Jens Sannerud
Approach: ratios of IBD sharing in three samples versus two
For a second degree pair $x_1$, $x_2$ and a mutual relative $y$:
$R_1 = \frac{\text{length}(\text{IBD}(x_1, y) \cap \text{IBD}(x_2, y))}{\text{length}(\text{IBD}(x_1, y))}$
$R_2 = \frac{\text{length}(\text{IBD}(x_1, y) \cap \text{IBD}(x_2, y))}{\text{length}(\text{IBD}(x_2, y))}$
For GP, expect $R_1 = 1/4$, $R_2 = 1$
For AV, expect $R_1 = 1/4$, $R_2 = 1/2$
For HS, expect $R_1 = 1/2$, $R_2 = 1/2$
Ying Qiao
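The two ratios can be sketched with plain interval arithmetic. This is an illustration rather than CREST's implementation: it assumes IBD segments on a single chromosome given as `(start, end)` tuples in arbitrary units, and the helper names (`merge`, `intersect`, `length`, `sharing_ratios`) and the toy coordinates are made up for the example.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) on one chromosome


def merge(segments: List[Interval]) -> List[Interval]:
    """Sort segments and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two segment lists using a two-pointer sweep."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def length(segments: List[Interval]) -> float:
    """Total length covered by the segments, counting overlaps once."""
    return sum(end - start for start, end in merge(segments))


def sharing_ratios(ibd_x1_y: List[Interval], ibd_x2_y: List[Interval]):
    """Return (R1, R2); raises ZeroDivisionError if either input is empty."""
    shared = length(intersect(ibd_x1_y, ibd_x2_y))
    return shared / length(ibd_x1_y), shared / length(ibd_x2_y)


# Toy GP-like case: everything x2 shares with y lies inside what x1 shares with y.
r1, r2 = sharing_ratios([(0, 40), (60, 100)], [(10, 30)])
print(r1, r2)
```

In this toy input the three-way overlap covers a quarter of what x1 shares with y and all of what x2 shares with y, which matches the pattern the slide gives for GP.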
CREST uses kernel density estimators to infer relationships
Trained kernel density estimators (KDEs) using simulated data
Features: $R_1$, $R_2$
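A KDE-based classifier over the two ratio features can be sketched in pure Python. Everything below is a toy stand-in: the training points are Gaussian noise around the expected (R1, R2) values for GP, AV and HS rather than pedigree simulations, and the noise level, bandwidth and function names are assumptions, not CREST's settings.

```python
import math
import random

# Expected (R1, R2) per relationship type, as given on the slides.
EXPECTED = {"GP": (0.25, 1.0), "AV": (0.25, 0.5), "HS": (0.5, 0.5)}


def toy_training_set(n=200, noise=0.05, seed=0):
    """Synthetic training points; a placeholder for real simulated pedigrees."""
    rng = random.Random(seed)
    return {
        label: [(rng.gauss(r1, noise), rng.gauss(r2, noise)) for _ in range(n)]
        for label, (r1, r2) in EXPECTED.items()
    }


def kde_density(point, samples, bandwidth=0.05):
    """Average of isotropic 2D Gaussian kernels centred on each sample."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for s1, s2 in samples:
        d2 = (point[0] - s1) ** 2 + (point[1] - s2) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total / len(samples)


def classify(point, training, bandwidth=0.05):
    """Return the label whose KDE gives the highest density, plus all densities."""
    densities = {
        label: kde_density(point, pts, bandwidth) for label, pts in training.items()
    }
    return max(densities, key=densities.get), densities


training = toy_training_set()
label, _ = classify((0.26, 0.97), training)
print(label)
```

Picking the label with the highest density treats the three types as equally likely a priori; a real classifier might weight the densities or report them as probabilities.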
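Can combine multiple relatives by taking union of IBD sharing
Mutual relatives: $y_j$'s
$R_i = \frac{\text{length}\left(\bigcup_j \text{IBD}(x_1, y_j) \,\cap\, \bigcup_j \text{IBD}(x_2, y_j) \,\cap\, \text{IBD}(x_1, x_2)\right)}{\text{length}\left(\bigcup_j \text{IBD}(x_i, y_j)\right)}$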
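The multi-relative version can be sketched the same way: pool each sample's IBD segments across all mutual relatives, then intersect the two unions with the pair's own IBD segments. As before, this assumes one chromosome and `(start, end)` tuples; the function names and toy coordinates are illustrative, not taken from CREST.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) on one chromosome


def merge(segments: List[Interval]) -> List[Interval]:
    """Union of segments: sort, then merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two segment lists using a two-pointer sweep."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def length(segments: List[Interval]) -> float:
    return sum(end - start for start, end in merge(segments))


def combined_ratios(ibd_x1_ys, ibd_x2_ys, ibd_x1_x2):
    """(R1, R2) from per-relative segment lists for x1 and x2.

    ibd_x1_ys[j] holds the segments x1 shares with relative y_j, and
    likewise for ibd_x2_ys; ibd_x1_x2 holds the segments the pair shares.
    """
    union_1 = merge([seg for segs in ibd_x1_ys for seg in segs])
    union_2 = merge([seg for segs in ibd_x2_ys for seg in segs])
    shared = length(intersect(intersect(union_1, union_2), ibd_x1_x2))
    return shared / length(union_1), shared / length(union_2)


# Two mutual relatives; toy coordinates.
r1, r2 = combined_ratios(
    [[(0, 40)], [(30, 80)]],   # x1 with y_1, y_2
    [[(10, 20)], [(50, 60)]],  # x2 with y_1, y_2
    [(0, 55)],                 # x1 with x2
)
print(r1, r2)
```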
CREST highly sensitive, highly specific
Ran PADRE, CREST on 200 replicates of various pedigree structures
[Figure legend: CREST, PADRE]
Qiao, Sannerud et al. (in revision, 2019)
CREST infers relative types in Generation Scotland data
Generation Scotland data: 205 GP, 1,949 AV, and 121 HS pairs with at least one mutual relative
Given data equivalent to one first cousin (10% of genome covered by IBD regions), CREST's sensitivity is 0.99 in GP, 0.86 in AV, and 0.95 in HS pairs
Qiao, Sannerud et al. (in revision, 2019)
Secondary aim: infer whether relatives are paternal or maternal
[Figure: paternal and maternal pedigrees for grandparent and half-sibling pairs]
Key insight: males / females have different crossover locations
[Figure: female and male crossover rates (cM/Mb) versus physical position (Mb); data from human chromosome 10]
Average number of crossovers:
• Females: 2.04
• Males: 1.27
Genetic map from Bhérer et al. (2017)
CREST infers maternal / paternal type in Generation Scotland
Analyzed all 848 GP and 381 HS pairs in Generation Scotland
Using LOD = 0 as boundary, inferred correctly:
• 99.7% of HS
• 93.5% of GP
[Figure panels: Half-siblings, Grandparent-grandchild]
Qiao, Sannerud et al. (in revision, 2019)
Conclusions
• CREST classifies second degree relationship types
  – Enabled by multi-way IBD sharing
• Male / female crossovers reveal the paternal / maternal type of half-siblings and grandparent-grandchild pairs
• Can apply to pedigree reconstruction: other methods subject to ambiguities for second degree pairs
• Preliminary results indicate CREST also applies to third degree pairs
Acknowledgements Generation Scotland Caroline Hayward Archie Campbell Ying Qiao Jens Sannerud Nancy E. and Peter C. Meinig
Approach: IBD segment ends approximate crossover locations
• Model IBD segments as regions flanked by two crossovers
  – No-crossover interval: interior of IBD segment, $n_{\text{int}}$
  – Locations of crossovers: windows $w_0$, $w_1$ surrounding IBD segment ends
• For each IBD segment $i$, likelihood of parent being $p \in \{F, M\}$ is
  $P(i \mid p) = P(w_0 \mid p) \cdot P(n_{\text{int}} \mid p) \cdot P(w_1 \mid p)$
• Taking all IBD segments to be independent, we compute
  $\text{LOD} = \log_{10} \frac{\prod_i P(i \mid F)}{\prod_i P(i \mid M)}$
Jens Sannerud
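One way to turn the three-factor segment likelihood into numbers is sketched below. The probability model is an assumption added for illustration and is not stated on the slide: crossovers are treated as a Poisson process with no interference, so an interval of genetic length d Morgans has no crossover with probability exp(-d) and at least one with probability 1 - exp(-d). The class and function names and the example lengths are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentLengths:
    """Genetic lengths (Morgans) of one IBD segment under one sex-specific map."""
    w0: float        # window around the left segment end (must be > 0)
    interior: float  # interior of the IBD segment
    w1: float        # window around the right segment end (must be > 0)


def log10_segment_likelihood(seg: SegmentLengths) -> float:
    """log10 of P(w0) * P(no crossover in interior) * P(w1), Poisson model (assumed)."""
    log_w0 = math.log10(-math.expm1(-seg.w0))    # >= 1 crossover in left window
    log_int = -seg.interior * math.log10(math.e)  # log10(exp(-interior))
    log_w1 = math.log10(-math.expm1(-seg.w1))    # >= 1 crossover in right window
    return log_w0 + log_int + log_w1


def lod(female_lengths: List[SegmentLengths], male_lengths: List[SegmentLengths]) -> float:
    """Sum of per-segment log10 likelihoods under F minus the same under M."""
    log_f = sum(log10_segment_likelihood(s) for s in female_lengths)
    log_m = sum(log10_segment_likelihood(s) for s in male_lengths)
    return log_f - log_m


# One segment measured on hypothetical female and male maps.
female = [SegmentLengths(w0=0.02, interior=0.3, w1=0.02)]
male = [SegmentLengths(w0=0.005, interior=0.5, w1=0.005)]
print(lod(female, male))
```

With these toy lengths the female map makes crossovers at the segment ends more likely and a crossover-free interior more likely, so the score comes out positive, which under the slide's convention points to the female (maternal) parent. Working in log space avoids underflow when many segments are multiplied together.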