The 1000 Genomes Project

The 1000 Genomes Project: genetic variation - PowerPoint PPT Presentation



  1. The 1000 genomes project

  2. The 1000 genomes project ● Genetic variation > 1% ● 1000 → 2500 individuals ● China, Germany, the UK, the USA ● 28 populations from Europe, East Asia, West Africa, America, South Asia

  3. The 1000 genomes project

  4. The 1000 genomes project

     | Pilot | Purpose | Coverage | Strategy | Status |
     | --- | --- | --- | --- | --- |
     | 1 - low coverage | Assess strategy of sharing data across samples | 2-4X | Whole-genome sequencing of 180 samples | Sequencing completed October 2008 |
     | 2 - trios | Assess coverage and platforms and centers | 20-60X | Whole-genome sequencing of 2 mother-father-adult child trios | Sequencing completed October 2008 |
     | 3 - gene regions | Assess methods for gene-region-capture | 50X | 1000 gene regions in 900 samples | Sequencing completed June 2009 |

  5. The 1001 Genomes Project Arabidopsis thaliana

  6. The 1001 Genomes Project ● First plant with a known genome sequence ● 125 – 150 Mb, 5 chromosomes, 30000 genes ● Self-fertilizing ● Big genetic and phenotypic diversity ● Few known alleles responsible for phenotypic variations

  7. The 1001 Genomes Project ● 10x10x10+1 samples ● The seeds are available in Arabidopsis stock centers ● Includes morphological analysis

  8. SHORE ● Mapping and analysis pipeline ● Short DNA sequences ● Mapping to a reference sequence ● Weighted and gapped alignments ● SHOREmap

  9. Sequencing Arabidopsis thaliana ● Two naturally inbred accessions (Bur-0, Tsu-1) ● Reference genome sequence (Col-0) ● 120 – 173 million SBS reads ● Aligned to Col-0 (up to 4 mismatches, indels of up to 3 bp) ● Minimum read support for base calls
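
The "minimum read support for base calls" step can be illustrated with a small Python sketch. It is not SHORE's implementation: the function names, the pileup representation (a string of observed bases per position) and the thresholds `min_support` and `min_fraction` are all assumptions chosen for illustration.

```python
from collections import Counter


def call_base(pileup, min_support=3, min_fraction=0.8):
    """Return the consensus base at one position, or None if support is too weak.

    pileup: string of bases observed in the reads covering the position.
    min_support / min_fraction: illustrative thresholds, not values from the slides.
    """
    if not pileup:
        return None
    base, count = Counter(pileup).most_common(1)[0]
    if count >= min_support and count / len(pileup) >= min_fraction:
        return base
    return None


def call_snps(reference, pileups, **thresholds):
    """Compare called bases against the reference; return (position, ref, alt) tuples."""
    snps = []
    for pos, (ref_base, pileup) in enumerate(zip(reference, pileups)):
        called = call_base(pileup, **thresholds)
        # Positions without a confident call are skipped rather than guessed.
        if called is not None and called != ref_base:
            snps.append((pos, ref_base, called))
    return snps
```

For example, `call_snps("ACGT", ["AAA", "TTTT", "GG", "TTT"])` reports a single substitution at position 1, because position 2 is covered by only two reads and is left uncalled.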

  10. Identifying polymorphic regions ● 4.3 Mb non-repetitive or moderately repetitive regions not covered ● GC poor regions ● 8 non-repetitive or moderately repetitive positions ● Col-0: 28kb ● Bur-0: 3.25 Mb, Tsu-1: 3.13 Mb
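
One simple way to find regions not covered by any aligned read is to scan a per-base coverage vector for runs of zeros. The sketch below assumes such a vector is already available as a Python list; the function name `uncovered_regions` and the `min_length` parameter are hypothetical, and the actual analysis behind the slide may use additional criteria (for example repetitiveness or GC content).

```python
def uncovered_regions(coverage, min_length=1):
    """Return half-open (start, end) intervals where read coverage is zero.

    coverage: per-base read depth along one reference sequence.
    min_length: shortest run of uncovered bases to report (illustrative).
    """
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth == 0 and start is None:
            start = pos  # a new uncovered run begins here
        elif depth != 0 and start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_length:
        regions.append((start, len(coverage)))
    return regions
```

Summing `end - start` over the returned intervals gives the total uncovered length, which is the kind of per-accession figure the slide reports.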

  11. De novo assembly of dissimilar sequences ● Unmapped reads of high quality ● Retain high-confidence reads ● Alignment to the homologous target in the reference genome ● Bur-0: 7396 contigs ● Tsu-1: 3525 contigs ● Col-0: 20 contigs

  12. Detection of duplications ● Higher than expected coverage ● Several reads support more than one base ● Segmentation into regions of 250bp ● Search for “heterozygous” positions ● Bur-0: 332 kb ● Tsu-1: 364 kb ● Col-0: 11 kb
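
The duplication criteria listed above (higher than expected coverage, 250 bp segments, apparent "heterozygous" positions in an inbred line) can be combined in a short Python sketch. Requiring both signals in the same window is an assumption, as are the function name, the `depth_factor` and `min_het` thresholds, and the input format.

```python
def candidate_duplications(coverage, het_positions, expected_depth,
                           window=250, depth_factor=1.5, min_het=1):
    """Flag fixed-size windows that look like collapsed duplications.

    coverage: per-base read depth along one reference sequence.
    het_positions: positions where reads support more than one base.
    expected_depth: genome-wide average depth.
    depth_factor, min_het: illustrative thresholds, not values from the slides.
    """
    het = set(het_positions)
    candidates = []
    for start in range(0, len(coverage), window):
        end = min(start + window, len(coverage))
        mean_depth = sum(coverage[start:end]) / (end - start)
        n_het = sum(1 for p in range(start, end) if p in het)
        # Keep windows with both excess depth and multi-base support.
        if mean_depth >= depth_factor * expected_depth and n_het >= min_het:
            candidates.append((start, end))
    return candidates
```

In a self-fertilizing, naturally inbred accession, true heterozygosity should be rare, so positions with reads supporting two bases more plausibly reflect reads from two copies of a duplicated segment mapping to a single reference locus.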
