The 1000 genomes project
The 1000 genomes project ● Genetic variants with frequency > 1% ● 1000 → 2500 individuals ● China, Germany, the UK, the USA ● 28 populations from Europe, East Asia, West Africa, America, South Asia
The 1000 genomes project
| Pilot | Purpose | Coverage | Strategy | Status |
|---|---|---|---|---|
| 1 - low coverage | Assess strategy of sharing data across samples | 2-4X | Whole-genome sequencing of 180 samples | Sequencing completed October 2008 |
| 2 - trios | Assess coverage and platforms and centers | 20-60X | Whole-genome sequencing of 2 mother-father-adult child trios | Sequencing completed October 2008 |
| 3 - gene regions | Assess methods for gene-region-capture | 50X | 1000 gene regions in 900 samples | Sequencing completed June 2009 |
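The coverage figures in the pilot table (2-4X, 20-60X, 50X) are mean sequencing depths. As a rough sketch, mean depth can be estimated as reads × read length / genome size. The function names and the numbers in the usage lines are illustrative assumptions, not values from the project.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Estimate mean depth as total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: 100 bp reads on a 3,000,000,000 bp genome.
print(mean_coverage(60_000_000, 100, 3_000_000_000))  # mean depth for 60 million reads
print(reads_needed(4, 100, 3_000_000_000))            # reads required for 4X
```

This estimate ignores unmappable reads and uneven coverage, so real projects typically sequence more than the minimum.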
The 1001 Genomes Project Arabidopsis thaliana
The 1001 Genomes Project ● First plant with a known genome sequence ● 125 – 150 Mb, 5 chromosomes, 30000 genes ● Self-fertilizing ● Big genetic and phenotypic diversity ● Few known alleles responsible for phenotypic variations
The 1001 Genomes Project ● 10x10x10+1 samples ● The seeds are available in Arabidopsis stock centers ● Includes morphological analysis
SHORE ● Mapping and analysis pipeline ● Short DNA sequences ● Mapping to a reference sequence ● Weighted and gapped alignments ● SHOREmap
Sequencing Arabidopsis thaliana ● Two naturally inbred accessions (Bur-0, Tsu-1) ● Reference genome sequence (Col-0) ● 120 – 173 million SBS reads ● Aligned to Col-0 (up to 4 mismatches, indels up to 3 bp) ● Minimum read support for base calls
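The "minimum read support for base calls" step can be illustrated with a small consensus caller: a base is called at a position only if enough aligned reads agree on it. This is a minimal sketch, not SHORE's actual scoring. The function names and the thresholds `min_support` and `min_fraction` are assumptions chosen for illustration.

```python
from collections import Counter


def call_base(pileup, min_support=3, min_fraction=0.8):
    """Return the consensus base for one reference position, or 'N'.

    pileup: string of bases observed in the reads covering the position.
    min_support / min_fraction: hypothetical thresholds, not SHORE defaults.
    """
    if not pileup:
        return "N"  # no read covers this position
    base, count = Counter(pileup).most_common(1)[0]
    if count >= min_support and count / len(pileup) >= min_fraction:
        return base
    return "N"  # too few reads, or too much disagreement


def call_consensus(pileups, **thresholds):
    """Call every position and join the calls into one sequence."""
    return "".join(call_base(p, **thresholds) for p in pileups)


# Toy pileups for five consecutive positions.
pileups = ["AAAA", "CC", "GGGT", "TTTTT", ""]
print(call_consensus(pileups))  # prints ANNTN
```

Positions that fail the thresholds are reported as `N` rather than guessed, which keeps low-coverage sites out of the polymorphism calls.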
Identifying polymorphic regions ● 4.3 Mb non-repetitive or moderately repetitive regions not covered ● GC-poor regions ● 8 non-repetitive or moderately repetitive positions ● Col-0: 28 kb ● Bur-0: 3.25 Mb, Tsu-1: 3.13 Mb
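Regions "not covered" by any aligned read can be found by scanning per-position read depth for runs of zeros. The sketch below assumes the depth is already available as a list of integers. The function name and the `min_length` parameter are hypothetical.

```python
def uncovered_regions(coverage, min_length=1):
    """Return half-open (start, end) intervals where read depth is zero."""
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth == 0 and start is None:
            start = pos  # a zero-coverage run begins
        elif depth != 0 and start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None  # the run has ended
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_length:
        regions.append((start, len(coverage)))
    return regions


# Toy depth profile for ten positions.
coverage = [5, 4, 0, 0, 0, 7, 0, 3, 0, 0]
regions = uncovered_regions(coverage, min_length=2)
total = sum(end - start for start, end in regions)
print(regions, total)  # prints [(2, 5), (8, 10)] 5
```

Summing the interval lengths gives the total uncovered sequence, the kind of figure reported per accession on this slide.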
De novo assembly of dissimilar sequences ● Unmapped reads of high quality ● Retain high-confidence reads ● Alignment to the homologous target in the reference genome ● Bur-0: 7396 contigs ● Tsu-1: 3525 contigs ● Col-0: 20 contigs
Detection of duplications ● Higher than expected coverage ● Several reads support more than one base ● Segmentation into regions of 250bp ● Search for “heterozygous” positions ● Bur-0: 332 kb ● Tsu-1: 364 kb ● Col-0: 11 kb
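The duplication criteria above can be sketched as a window scan: split the sequence into 250 bp segments and flag those whose mean depth is well above the expected depth and that contain several "heterozygous" positions (sites where reads support more than one base, which is unexpected in an inbred accession). The function name and the thresholds `depth_ratio` and `min_het` are illustrative assumptions. Only the 250 bp segment size comes from the slide.

```python
def flag_duplicated_windows(depth, het, expected_depth, window=250,
                            depth_ratio=1.5, min_het=2):
    """Return (start, end) windows that look like collapsed duplications.

    depth: per-position read depth.
    het:   per-position booleans, True where reads support more than one base.
    depth_ratio / min_het: hypothetical cutoffs for illustration only.
    """
    flagged = []
    for start in range(0, len(depth), window):
        end = min(start + window, len(depth))
        mean_depth = sum(depth[start:end]) / (end - start)
        het_count = sum(het[start:end])
        if mean_depth >= depth_ratio * expected_depth and het_count >= min_het:
            flagged.append((start, end))
    return flagged


# Toy data: three 250 bp windows with an expected depth of 20.
depth = [20] * 250 + [45] * 250 + [40] * 250
het = [False] * 750
for pos in (260, 300, 410, 600):
    het[pos] = True

print(flag_duplicated_windows(depth, het, expected_depth=20))  # prints [(250, 500)]
```

The third window has elevated depth but only one ambiguous site, so it is not flagged. Requiring both signals is a design choice that reduces false positives from coverage noise alone.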