The 1000 genomes project

The 1000 genomes project

The 1000 genomes project
● Genetic variants with a population frequency above 1%
● Sample size raised from 1000 to 2500 individuals
● Participating countries: China, Germany, the UK, the USA
● 28 populations from Europe, East Asia, West Africa, the Americas, South Asia
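The 1% threshold can be illustrated with a minimal sketch, assuming it refers to the frequency of the alternate allele among diploid individuals. The function names and the variant counts below are hypothetical and are not taken from the project's data.

```python
def allele_frequency(alt_allele_count: int, n_individuals: int) -> float:
    """Frequency of the alternate allele, assuming diploid individuals."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    # Each diploid individual carries two copies of every autosomal site.
    return alt_allele_count / (2 * n_individuals)


def is_common_variant(alt_allele_count: int, n_individuals: int,
                      threshold: float = 0.01) -> bool:
    """True if the variant exceeds the frequency threshold (1% by default)."""
    return allele_frequency(alt_allele_count, n_individuals) > threshold


# Hypothetical alternate-allele counts observed in 2500 individuals.
variants = {"var_a": 60, "var_b": 12, "var_c": 400}
common = sorted(name for name, count in variants.items()
                if is_common_variant(count, 2500))
print(common)
```

With 2500 individuals there are 5000 chromosomes, so a variant needs more than 50 alternate alleles to pass the threshold in this sketch.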

The 1000 genomes project

The 1000 genomes project

Pilot | Purpose | Coverage | Strategy | Status
1 - low coverage | Assess strategy of sharing data across samples | 2-4X | Whole-genome sequencing of 180 samples | Sequencing completed October 2008
2 - trios | Assess coverage and platforms and centers | 20-60X | Whole-genome sequencing of 2 mother-father-adult child trios | Sequencing completed October 2008
3 - gene regions | Assess methods for gene-region-capture | 50X | 1000 gene regions in 900 samples | Sequencing completed June 2009
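The coverage figures of the pilots (2-4X, 20-60X, 50X) can be related to read counts with the usual estimate: coverage = number of reads × read length / target size. The sketch below is only illustrative; the genome size and read length are assumed values, not figures from the slides.

```python
import math


def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of reads covering each base of the target."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int,
                 genome_size_bp: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Assumed values for illustration only (not from the slides).
GENOME_SIZE_BP = 3_000_000_000  # rough size of a human genome
READ_LENGTH_BP = 36             # hypothetical short-read length

for label, depth in [("pilot 1, 2X", 2), ("pilot 1, 4X", 4), ("pilot 2, 20X", 20)]:
    print(label, reads_needed(depth, READ_LENGTH_BP, GENOME_SIZE_BP))
```

This is a mean over the whole target; real coverage is uneven, so some positions end up well below the mean.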

The 1001 Genomes Project: Arabidopsis thaliana

The 1001 Genomes Project
● First plant with a known genome sequence
● 125 – 150 Mb, 5 chromosomes, 30000 genes
● Self-fertilizing
● Big genetic and phenotypic diversity
● Few known alleles responsible for phenotypic variations

The 1001 Genomes Project
● 10x10x10+1 samples
● The seeds are available in Arabidopsis stock centers
● Includes morphological analysis

SHORE
● Mapping and analysis pipeline
● Short DNA sequences
● Mapping to a reference sequence
● Weighted and gapped alignments
● SHOREmap

Sequencing Arabidopsis thaliana
● Two naturally inbred accessions (Bur-0, Tsu-1)
● Reference genome sequence (Col-0)
● 120 – 173 million SBS reads
● Aligned to Col-0 (up to 4 mismatches, indels up to 3 bp)
● Minimum read support for base calls
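The alignment limits and the read-support rule can be sketched as two small filters. This is a simplified illustration, not SHORE's implementation: it reads "4 MM, 3 bp indels" as upper limits per alignment, and the `Alignment` class and the support threshold of 3 reads are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

MAX_MISMATCHES = 4    # limit taken from the slide
MAX_INDEL_BP = 3      # limit taken from the slide
MIN_READ_SUPPORT = 3  # hypothetical value; the slide gives no number


@dataclass
class Alignment:
    """Hypothetical summary of one read aligned to the reference."""
    mismatches: int
    longest_indel_bp: int


def passes_filter(aln: Alignment) -> bool:
    """Keep alignments within the mismatch and indel limits."""
    return (aln.mismatches <= MAX_MISMATCHES
            and aln.longest_indel_bp <= MAX_INDEL_BP)


def call_base(observed_bases: str,
              min_support: int = MIN_READ_SUPPORT) -> Optional[str]:
    """Return the most frequent base at a position, or None if too few reads support it."""
    if not observed_bases:
        return None
    base, count = Counter(observed_bases).most_common(1)[0]
    return base if count >= min_support else None


print(passes_filter(Alignment(mismatches=2, longest_indel_bp=0)))
print(call_base("AAAT"))
```

A production base caller would also weigh base qualities and handle ties explicitly; this sketch only counts reads.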

Identifying polymorphic regions
● 4.3 Mb non-repetitive or moderately repetitive regions not covered
● GC poor regions
● 8 non-repetitive or moderately repetitive positions
● Col-0: 28 kb
● Bur-0: 3.25 Mb, Tsu-1: 3.13 Mb
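Finding regions that are not covered by any read, and checking whether they are GC poor, can be sketched as follows. The sketch assumes a per-position list of read depths is already available; the function names and the toy inputs are hypothetical.

```python
from typing import List, Tuple


def uncovered_intervals(coverage: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where the read depth is zero."""
    intervals = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth == 0 and start is None:
            start = pos                      # an uncovered run begins
        elif depth != 0 and start is not None:
            intervals.append((start, pos))   # the run ended before pos
            start = None
    if start is not None:                    # run extends to the end
        intervals.append((start, len(coverage)))
    return intervals


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


depths = [3, 0, 0, 5, 0]  # toy per-position coverage
gaps = uncovered_intervals(depths)
print(gaps, sum(end - start for start, end in gaps))
print(gc_fraction("ATGC"))
```

Summing the interval lengths gives the total uncovered size, which is the kind of figure the slide reports in kb and Mb.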

De novo assembly of dissimilar sequences
● Unmapped reads of high quality
● Retain high-confidence reads
● Alignment to the homologous target in the reference genome
● Bur-0: 7396 contigs
● Tsu-1: 3525 contigs
● Col-0: 20 contigs
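Selecting unmapped reads of high quality before assembly might look like the sketch below. It assumes FASTQ-style quality strings with a Phred+33 offset and a hypothetical mean-quality cutoff of 25; the encoding and thresholds actually used in the study are not given in the slides.

```python
from typing import List


def phred_scores(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def is_high_quality(quality_string: str, min_mean_quality: float = 25.0,
                    offset: int = 33) -> bool:
    """True if the read's mean Phred score reaches the (hypothetical) cutoff."""
    scores = phred_scores(quality_string, offset)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_quality


# Toy unmapped reads as (sequence, quality string) pairs.
unmapped = [("ACGT", "IIII"), ("TTGA", "####")]
retained = [seq for seq, qual in unmapped if is_high_quality(qual)]
print(retained)
```

The retained reads would then be passed to an assembler; that step is outside the scope of this sketch.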

Detection of duplications
● Higher than expected coverage
● Several reads support more than one base
● Segmentation into regions of 250bp
● Search for “heterozygous” positions
● Bur-0: 332 kb
● Tsu-1: 364 kb
● Col-0: 11 kb
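The idea behind these bullets can be sketched as a window scan: split the genome into 250 bp segments, then flag segments whose mean coverage is clearly above the expected depth and which contain at least one position supported by more than one base. The 1.5-fold cutoff and the rule for combining the two signals are assumptions made for illustration, not the method used in the study.

```python
from typing import Iterable, List, Tuple

WINDOW_BP = 250  # segment size taken from the slide


def window_means(coverage: List[int], window: int = WINDOW_BP) -> List[float]:
    """Mean read depth of each consecutive window (the last one may be shorter)."""
    means = []
    for start in range(0, len(coverage), window):
        chunk = coverage[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def candidate_duplications(coverage: List[int], het_positions: Iterable[int],
                           expected_depth: float, fold: float = 1.5,
                           window: int = WINDOW_BP) -> List[Tuple[int, int]]:
    """Half-open windows with elevated coverage and a 'heterozygous' position.

    `fold` is a hypothetical cutoff for "higher than expected" coverage.
    """
    het_windows = {pos // window for pos in het_positions}
    hits = []
    for idx, mean in enumerate(window_means(coverage, window)):
        if mean > fold * expected_depth and idx in het_windows:
            start = idx * window
            hits.append((start, min(start + window, len(coverage))))
    return hits


# Toy data: three windows, the second and third with elevated depth.
depths = [10] * 250 + [25] * 250 + [25] * 250
print(candidate_duplications(depths, het_positions={300}, expected_depth=10))
```

In an inbred accession true heterozygosity is not expected, which is why positions with reads supporting two bases point to collapsed duplicated copies.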
