Apicomplexan Genome Sequencing at Sanger. Arnab Pain, The Pathogen Sequencing Unit (PSU). 2nd November, 2005; ICGEB, New Delhi
Overview • Overview of Apicomplexan genome sequencing projects at Sanger • Update on Plasmodium genome projects • Theileria and Tropical Theileriosis • Sequencing & annotation strategies • Genome architecture • Gene families • Metabolic reconstruction • Comparative genomics (e.g. dN/dS analysis, synteny) • SNPs/INDELs
Apicomplexans and ongoing genome sequencing projects
Overview of Eukaryotic Pathogen Sequencing in PSU
The World of Apicomplexans! Figure labels: Oyster parasite; Dog parasites; Malaria; Tropical theileriosis; Babesiosis; Parasitise brain/kidney of rodent; Periph. eosinophilia (30%); Brain infection (chinchilla); Parasitises heart muscle/brain; Toxoplasmosis; Parasitic disease of falcons; Intestinal disease in mammals; Bowel disease in humans; Coccidiosis; Insect gut parasites; Insect gut parasites; Cryptosporidiosis
Update on Plasmodium genome projects: 3D7: 3 gaps; Clinical isolate: 8x; P. falciparum (Gardner et al 2002); IT strain: 31,000 reads (0.8x); P. reichenowi (3x shotgun in progress); P. gallinaceum (3x shotgun in progress); P. vivax (complete sequencing); P. knowlesi (3x shotgun) 8x complete, prefinishing, annotation; P. berghei (3x shotgun complete) 8x complete, some finishing; P. yoelii (5.6x, Carlton et al 2002); P. chabaudi (3x shotgun complete) 8x complete, prefinish
Theileria and Tropical Theileriosis
Genus: Theileria • Theileria annulata: parasite of cattle (S. Europe, North Africa, Middle East, Asia); ‘Tropical Theileriosis’ • Theileria parva: parasite of cattle (East/Central Africa); ‘East-Coast Fever’ • Theileria hirci: parasite of sheep/goats (S. Europe, North Africa, Middle East & Asia) (B. Shiels)
Theileria annulata • Disease: Tropical Theileriosis • 250 million cattle are at risk • Pathogenic in exotic animals, up to 70% mortality • Mild to moderate pathogenicity in indigenous breeds, but productivity loss
Theileria parva • Disease: East Coast Fever • 50 million cattle at risk • Highly pathogenic in naïve animals • 97-100% mortality rates
Theileria Life Cycle Macroschizont H.nuc Clonal expansion of infected cells merozoites Merozoite production Piroplasm infected erythrocytes
Clinical Pathology: Tropical Theileriosis (T. annulata) • Lymph node enlargement is followed by fever • Marked anaemia: pale mucous membranes, which may become jaundiced; diarrhoea and blood-stained faeces are common • Sub-acute/chronic cases show intermittent fever and anaemia, and jaundice can be seen • Condition is poor and convalescence is protracted
Sequencing & annotation strategies
Shotgun sequencing (figure labels: DNA; STS-1, STS-2, STS-3, STS-4; contiguous sequence; pUC clone end sequence; large clone end sequence; sequence gap; physical gap; “scaffold”)
Strategy • Separate chromosomes by PFGE. • Shotgun sequence individual chromosomes. • Align contigs to the map and close gaps using PCR/primer walking.
Map Resources • Mapped STS markers. – Short sequence markers, mapped genetically. • Mapped YAC clones. – Reads from mapped YAC clones align with contigs and thus position them. • Optical map. – DNA fragments from partial digestion of the genome are sized optically and tiled, providing ordered restriction fragments. • HAPPY map. – Fragmented DNA is diluted and replicated; STS markers are detected by PCR.
Curating gene models in Artemis: use of multiple lines of evidence
T. annulata Genome From Karyotype: • Four chromosomes – 2.6 Mb (3 gaps) – 2 Mb (Finished) – 1.9 Mb (2 small gaps) – 1.8 Mb (Finished) From Sequencing: • Number of bases: 8,351,610 • Gene number: 3792 • Genes with orthologues in T. parva : 3265 • GC percentage: 32.5 • Unique T. annulata genes: 34 (60 in T. parva )
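The base count and GC percentage quoted above are simple summary statistics over the assembled sequences. A minimal sketch of how such figures could be computed is given below; the function name and the toy sequences are hypothetical, and whether ambiguous bases (N) were counted in the slide's totals is not stated, so excluding them here is an assumption.

```python
def genome_stats(sequences):
    """Return (total_bases, gc_percent) for an iterable of DNA strings.

    Assumption: only unambiguous bases (A, C, G, T) are counted, so that
    runs of N in unfinished regions do not dilute the GC percentage.
    """
    total = 0
    gc = 0
    for seq in sequences:
        s = seq.upper()
        total += sum(s.count(base) for base in "ACGT")
        gc += s.count("G") + s.count("C")
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return total, 100.0 * gc / total


# Toy sequences standing in for chromosome sequences (not real data).
toy_chromosomes = ["ATGCATNN", "ggccaatt"]
total_bases, gc_percent = genome_stats(toy_chromosomes)
print(f"bases: {total_bases}, GC: {gc_percent:.1f}%")
```

In practice the input would be the four chromosome sequences read from a FASTA file rather than inline strings.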
Gene finding & Annotation
e.g. Theileria annulata (~8.4 Mb). Two plots: total contig length vs. number of reads, and contig number vs. number of reads, each marked at 3X, 4X, 6X and 8X coverage.
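The general shape of such coverage curves can be approximated with the simplified Lander-Waterman model, which is not necessarily how the plots on this slide were produced. The sketch below assumes reads land uniformly at random and ignores the minimum overlap needed to join two reads; the 500 bp read length is a hypothetical value, and only the ~8.4 Mb genome size comes from the slide.

```python
import math


def lander_waterman(genome_size, read_length, coverage):
    """Simplified Lander-Waterman expectations for a random shotgun.

    Returns (number of reads, expected bases covered, expected contigs).
    Assumes uniformly random read starts and no minimum overlap.
    """
    n_reads = coverage * genome_size / read_length
    covered = genome_size * (1.0 - math.exp(-coverage))
    contigs = n_reads * math.exp(-coverage)
    return n_reads, covered, contigs


GENOME_SIZE = 8_400_000  # ~8.4 Mb, from the slide
READ_LENGTH = 500        # hypothetical average read length in bp

for c in (3, 4, 6, 8):
    n, covered, contigs = lander_waterman(GENOME_SIZE, READ_LENGTH, c)
    print(f"{c}X: reads={n:,.0f} covered={covered:,.0f} bp contigs={contigs:,.0f}")
```

Under this model the total contig length levels off near the genome size while the contig count keeps falling as coverage increases, which is why deeper shotgun coverage mainly reduces the amount of gap closure left for finishing.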
Genome architecture
The Chromosomes of P. falciparum (figure labels: telomere; subtelomeric repeats TARE2-5 and Rep20; VAR genes; Rifin/stevor genes; other gene families; antigenic variation; house-keeping)
Chromosome Structure ( Theileria ) Repeats Family_3 (0-3) Family_1 (up to 28), Family_3 (0-3) Family_5 (1-3) Other families (0-2) telomere (T)TTAGGG Putative centromere Secreted antigens Secreted antigens House-keeping genes
Telomeres (figure, panels A to C, comparing T. annulata and T. parva chromosome ends; labels: telomeric repeat [(T)TTTAGGG]n; species-specific subtelomeric repeats TaSR3 and [TaSrpt2, TaSrpt1]m in T. annulata, TpSR3, TpSR2 and TpSrpt1 in T. parva; gene families [Fam-1 (up to 28), Fam-3 (0 to 4)], Fam-5 (1-3), other families (0-2), [Fam-3 (0 to 3)])
P. falciparum & P. vivax : sub-telomeric species-specific gene families P. falciparum P. vivax
Centromeres (figure labels: Chromosome 2; Chromosome 3; T. annulata; P. falciparum)
Synteny: TA & TP ACT comparison
Comparative Genomics: synteny
Comparing genomes with the Artemis Comparison Tool (ACT): Chr 02, T. annulata vs T. parva (TBlastX and BlastN comparisons)
Pain et al. Science (2005)
T. parva TPR loci: Chr_02, T. parva vs T. annulata; TPR_related family shown in pink
ACT comparison of 3 malaria species, including P. knowlesi
“Species-specific” genes at interruptions in synteny: Plasmodium falciparum, Plasmodium knowlesi, Plasmodium yoelii
Plasmodium core proteome and “species-specific” genes. Hall et al. Science (2005)
ACT Comparison: 3D7 vs PFCLIN
Gene Families
Clustering Theileria proteins • All peptide sets from TA and TP were combined. • The combined set was BLASTed against itself with an E-value cutoff of 10^-5. • TRIBE-MCL was run with an inflation value of 5 (quite stringent). • Each cluster was checked for the number of peptides from each organism, to identify which gene families have expanded in which organism. • Clusters were annotated using predicted products in TA & TP.
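Two of the steps above, the E-value cutoff and the per-organism count for each cluster, can be sketched as follows. This is an illustration rather than the pipeline actually used: it assumes 12-column tabular BLAST output (E-value in column 11), clusters written one per line as whitespace-separated peptide identifiers, and that identifiers beginning with "TA" or "TP" mark the organism. All identifiers in the example are hypothetical.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-5  # cutoff quoted on the slide


def filter_hits(blast_tab_lines, cutoff=E_VALUE_CUTOFF):
    """Yield (query, subject, evalue) for hits at or below the cutoff.

    Assumes 12-column tabular BLAST output; column 11 holds the E-value.
    """
    for line in blast_tab_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        evalue = float(fields[10])
        if evalue <= cutoff:
            yield fields[0], fields[1], evalue


def species_of(peptide_id):
    """Assumed naming convention: TA... = T. annulata, TP... = T. parva."""
    if peptide_id.startswith("TA"):
        return "TA"
    if peptide_id.startswith("TP"):
        return "TP"
    return "other"


def count_by_species(cluster_lines):
    """Return one Counter of organism labels per non-empty cluster line."""
    counts = []
    for line in cluster_lines:
        members = line.split()
        if members:
            counts.append(Counter(species_of(m) for m in members))
    return counts


def larger_side(count):
    """Report which organism contributes more peptides to a cluster."""
    if count["TA"] > count["TP"]:
        return "TA"
    if count["TP"] > count["TA"]:
        return "TP"
    return "equal"


# Hypothetical clusters, one per line.
toy_clusters = [
    "TA00001\tTA00002\tTP01_0001",
    "TP01_0002\tTP01_0003",
    "TA00003\tTP01_0004",
]
for i, count in enumerate(count_by_species(toy_clusters), start=1):
    print(i, dict(count), larger_side(count))
```

Comparing raw counts is the simplest possible criterion; a real analysis would likely also inspect cluster annotation and orthology before calling a family expanded.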
Theileria-specific gene families: Family 1 (SVSP) • Exclusively subtelomeric • Contain 1 or more DUF529 (now called FAINT) domains • Majority contain signal peptides and conserved C-termini • Unequally expanded (48 in TA, 85 in TP) • Expressed during macroschizont stage (EST evidence)
DUF529 domain-containing proteins. FAINT: Frequently Appears IN Theileria • Only found in Theileria proteins • Highly diverged ~70 residue domain • Majority of FAINT domain-containing proteins have signal peptides • >900 copies per genome (in at least 166 Theileria annulata proteins) • Many are expressed at least at the macroschizont stage
Comparative Genomics: protein domains
Architecture of Theileria proteins with FAINT domain: [TA20090 / TP01_0603, TashHN] (332 aa); [TA03125 / TP01_0608, Tash1] (416 aa); [TA20085 / TP01_0604, TashAT1] (465 aa); [TA20082, TashAT3] (994 aa); [TA20095 / TP01_0602, TashAT2] (1163 aa); [TA17375 / TP03_0861, Polymorphic antigen precursor / P150] (1338 aa); [TA18950, Subtelomeric hypothetical protein (SVSP), family 1] (605 aa); [TA18865, Subtelomeric hypothetical protein (SVSP), family 1] (502 aa); [TA08425 / TP04_0437, Microneme-rhoptry protein] (893 aa); [TA17505, Sfi I-fragment-related hypothetical protein, family 3] (2732 aa). Domain keys: Signal peptide; PEST; PT; FAINT; AT-hook
Whole genome domain organisation of PfEMP1 proteins • 59 var genes in total. • Expressed on red cell surface and involved in sequestration. • 3 types of domain. – DBL: Duffy binding-like – CIDR: cysteine-rich interdomain region – C2: constant 2
Comparative Genomics: metabolic reconstruction
KEGG: Phospholipid metabolism X8