Apicomplexan Genome Sequencing in Sanger


  1. Apicomplexan Genome Sequencing in Sanger • Arnab Pain, The Pathogen Sequencing Unit (PSU) • 2nd November, 2005; ICGEB, New Delhi

  2. Overview • Overview of Apicomplexan genome sequencing projects in Sanger • Update on Plasmodium genome projects • Theileria and Tropical Theileriosis • Sequencing & annotation strategies • Genome architecture • Gene families • Metabolic reconstruction • Comparative genomics (e.g. dN/dS analysis, synteny) • SNPs/INDELs

  3. Apicomplexans and ongoing genome sequencing projects

  4. Overview of Eukaryotic Pathogen Sequencing in PSU

  5. The World of Apicomplexans! • Oyster parasite • Dog parasites • Malaria • Tropical theileriosis • Babesiosis • Parasitise brain/kidney of rodent • Peripheral eosinophilia (30%) • Brain infection (chinchilla) • Parasitises heart muscle/brain • Toxoplasmosis • Parasitic disease of falcons • Intestinal disease in mammals • Bowel disease in humans • Coccidiosis • Insect gut parasites • Insect gut parasites • Cryptosporidiosis

  6. Update on Plasmodium genome projects • P. falciparum (Gardner et al 2002): 3D7, 3 gaps; clinical isolate, 8x; IT strain, 31,000 reads (0.8x) • P. reichenowi (3x shotgun in progress) • P. gallinaceum (3x shotgun in progress) • P. vivax (complete sequencing) • P. knowlesi (3x shotgun): 8x complete, prefinishing, annotation • P. berghei (3x shotgun complete): 8x complete, some finishing • P. yoelii (5.6x, Carlton et al 2002) • P. chabaudi (3x shotgun complete): 8x complete, prefinish
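The coverage figures on this slide (0.8x, 3x, 8x) follow from simple arithmetic: fold coverage is the total number of sequenced bases divided by the genome size. The sketch below works backwards from the IT strain entry (31,000 reads, 0.8x) to the mean read length it implies. The genome size of about 23 Mb for P. falciparum is an assumption brought in for illustration and is not stated on the slide; the function names are hypothetical.

```python
def fold_coverage(n_reads, mean_read_len_bp, genome_size_bp):
    """Average sequencing depth: total sequenced bases / genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def implied_read_length(n_reads, coverage, genome_size_bp):
    """Mean read length needed for n_reads to give the stated coverage."""
    return coverage * genome_size_bp / n_reads


# ASSUMPTION: P. falciparum nuclear genome of roughly 23 Mb (not from the slide).
ASSUMED_GENOME_SIZE_BP = 23_000_000

# From the slide: IT strain, 31,000 reads, 0.8x coverage.
it_read_len = implied_read_length(31_000, 0.8, ASSUMED_GENOME_SIZE_BP)
print(f"Implied mean read length: {it_read_len:.0f} bp")

# Reads needed to take the same genome to 8x at that read length.
reads_for_8x = 8 * ASSUMED_GENOME_SIZE_BP / it_read_len
print(f"Reads needed for 8x: {reads_for_8x:,.0f}")
```

Under that assumed genome size the implied read length comes out a little under 600 bp, and reaching 8x takes ten times as many reads as 0.8x, since coverage scales linearly with read count.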

  7. Theileria and Tropical Theileriosis

  8. Genus Theileria • Theileria annulata: parasite of cattle in S. Europe, North Africa, Middle East, Asia; ‘Tropical Theileriosis’ • Theileria parva: parasite of cattle in East/Central Africa; ‘East-Coast Fever’ • Theileria hirci: parasite of sheep/goats in S. Europe, North Africa, Middle East & Asia (B. Shiels)

  9. Theileria annulata • Disease: Tropical Theileriosis • 250 million cattle are at risk • Pathogenic in exotic animals, up to 70% mortality • Mild to moderate pathogenicity in indigenous breeds, but with productivity loss

  10. Theileria parva • Disease: East Coast Fever • 50 million cattle at risk • Highly pathogenic in naïve animals • 97-100% mortality rates

  11. Theileria Life Cycle (figure labels: macroschizont; H.nuc; clonal expansion of infected cells; merozoite production; merozoites; piroplasm-infected erythrocytes)

  12. Clinical Pathology: Tropical Theileriosis (T. annulata) • Lymph node enlargement is followed by fever • Marked anaemia: pale mucous membranes, which may become jaundiced • Diarrhoea and blood-stained faeces are common • Sub-acute/chronic cases show intermittent fever and anaemia; jaundice can be seen • Condition is poor and convalescence is protracted

  13. Sequencing & annotation strategies

  14. Shotgun sequencing (figure labels: DNA; STS-1 to STS-4; pUC clone end sequence; large clone end sequence; contiguous sequence; sequence gap; physical gap; “scaffold”)

  15. Strategy • Separate chromosomes by PFGE. • Shotgun sequence individual chromosomes. • Align contigs to the map and close gaps using PCR/primer walking.

  16. Map Resources • Mapped STS markers. – Short sequence markers, mapped genetically. • Mapped YAC clones. – Reads from mapped YAC clones align with contigs and thus position them. • Optical map. – DNA fragments from a partial digestion of the genome are sized optically and tiled, providing ordered restriction fragments. • HAPPY map. – Fragmented DNA is diluted and replicated; STS markers are detected by PCR.
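As a rough illustration of how mapped markers position contigs, the sketch below orders contigs along a chromosome by the map positions of the STS markers found in them. It is a minimal sketch, not the PSU pipeline: the function name, the contig names and the marker positions are all hypothetical, and real data would also need conflict detection (a contig carrying markers from distant map locations).

```python
from statistics import median


def order_contigs(marker_positions, contig_markers):
    """Order contigs by the median map position of their STS markers.

    marker_positions: dict of marker name -> position on the map
    contig_markers:   dict of contig name -> list of markers found in it
    Returns (ordered_contigs, unplaced_contigs).
    """
    anchored = {}
    unplaced = []
    for contig, markers in contig_markers.items():
        positions = [marker_positions[m] for m in markers if m in marker_positions]
        if positions:
            anchored[contig] = median(positions)
        else:
            # No mapped marker hit: the contig cannot be positioned from this map.
            unplaced.append(contig)
    ordered = sorted(anchored, key=anchored.get)
    return ordered, unplaced


# Hypothetical example: four mapped STS markers and four contigs.
sts_map = {"STS-1": 10, "STS-2": 20, "STS-3": 30, "STS-4": 40}
hits = {
    "contig_b": ["STS-3", "STS-4"],
    "contig_a": ["STS-1"],
    "contig_c": ["STS-2"],
    "contig_d": [],
}
ordered, unplaced = order_contigs(sts_map, hits)
print(ordered, unplaced)
```

Gaps between neighbouring contigs in the resulting order are the natural targets for PCR or primer walking.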

  17. Curating gene models in Artemis: use of multiple lines of evidence

  18. T. annulata Genome From Karyotype: • Four chromosomes – 2.6 Mb (3 gaps) – 2 Mb (Finished) – 1.9 Mb (2 small gaps) – 1.8 Mb (Finished) From Sequencing: • Number of bases: 8,351,610 • Gene number: 3792 • Genes with orthologues in T. parva : 3265 • GC percentage: 32.5 • Unique T. annulata genes: 34 (60 in T. parva )
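The karyotype and sequencing figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the derived quantities (orthologue fraction, average spacing per gene) are simple ratios and not values reported on the slide.

```python
# Values taken from the slide.
chromosome_sizes_mb = [2.6, 2.0, 1.9, 1.8]
total_bases = 8_351_610
gene_count = 3792
genes_with_tparva_orthologue = 3265

# Karyotype estimate versus sequenced length.
karyotype_total_mb = sum(chromosome_sizes_mb)
sequenced_mb = total_bases / 1e6

# Derived ratios (not stated on the slide).
orthologue_fraction = genes_with_tparva_orthologue / gene_count
bases_per_gene = total_bases / gene_count

print(f"Karyotype total: {karyotype_total_mb:.1f} Mb, sequenced: {sequenced_mb:.2f} Mb")
print(f"Genes with a T. parva orthologue: {orthologue_fraction:.1%}")
print(f"Average spacing: one gene per {bases_per_gene:.0f} bp")
```

The two size estimates agree closely (about 8.3 Mb from the karyotype against 8.35 Mb sequenced), roughly 86% of genes have a T. parva orthologue, and the genome averages about one gene per 2.2 kb.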

  19. Gene finding & Annotation

  20. e.g. Theileria annulata (~8.4 Mb): plots of total contig length vs. number of reads and of contig number vs. number of reads, each marked at 3X, 4X, 6X and 8X coverage
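The shape of these two curves can be approximated with the simplified Lander-Waterman model, in which the expected fraction of the genome covered is 1 - e^(-c) and the expected number of contigs is N * e^(-c) for N reads at coverage c. The sketch below is an idealised model, not the data behind the slide: it ignores the minimum overlap needed to join reads, repeats and cloning bias, and the 500 bp read length is an assumption.

```python
import math


def lander_waterman(genome_size_bp, read_len_bp, coverage):
    """Simplified Lander-Waterman expectations (no minimum-overlap term).

    Returns (number of reads, fraction of genome covered, expected contigs).
    """
    n_reads = coverage * genome_size_bp / read_len_bp
    fraction_covered = 1.0 - math.exp(-coverage)
    expected_contigs = n_reads * math.exp(-coverage)
    return n_reads, fraction_covered, expected_contigs


GENOME_SIZE_BP = 8_400_000   # ~8.4 Mb, from the slide
ASSUMED_READ_LEN_BP = 500    # ASSUMPTION: not stated on the slide

for c in (3, 4, 6, 8):
    n, frac, contigs = lander_waterman(GENOME_SIZE_BP, ASSUMED_READ_LEN_BP, c)
    print(f"{c}X: {n:>9,.0f} reads, {frac:.2%} covered, ~{contigs:,.0f} contigs")
```

In this model the total contig length plateaus quickly while the contig count keeps falling, which is why the step from 3X to 8X mainly buys contiguity rather than new sequence.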

  21. Genome architecture

  22. The Chromosomes of P. falciparum (figure labels: telomere; subtelomeric repeats TARE2-5 and Rep20; VAR genes; Rifin/stevor genes; other gene families; antigenic variation; house-keeping)

  23. Chromosome Structure (Theileria) (figure labels: telomere (T)TTAGGG; repeats; Family_3 (0-3); Family_1 (up to 28), Family_3 (0-3); Family_5 (1-3); other families (0-2); secreted antigens (two regions); putative centromere; house-keeping genes)

  24. Telomeres (figure, panels A-C comparing T. parva and T. annulata; labels: TaSR3; [TaSrpt2, TaSrpt1]m; [(T)TTTAGGG]n; TpSR3; TpSR2; TpSrpt1; [Fam-3 (0 to 3)]; other fam (0-2); Fam-5 (1-3); [Fam-1 (up to 28), Fam-3 (0 to 4)]; subtelomeric repeats (species-specific))

  25. P. falciparum & P. vivax: sub-telomeric species-specific gene families (figure panels: P. falciparum; P. vivax)

  26. Centromeres (figure labels: Chromosome 2; Chromosome 3; T. annulata; P. falciparum; P. falciparum)

  27. Synteny: TA & TP ACT comparison

  28. Comparative Genomics: synteny

  29. Comparing genomes with Artemis Comparison Tool (ACT): Chr 02 (figure labels: T. annulata; T. parva; TBlastX; BlastN)

  30. Pain et al. Science (2005)

  31. T. parva TPR loci (figure labels: T. parva; T. annulata; Chr_02; TPR_related family shown in pink)

  32. P. knowlesi ACT comparison: 3 malaria species P P

  33. “Species-specific” genes at interruptions in synteny (Plasmodium falciparum, Plasmodium knowlesi, Plasmodium yoelii)

  34. Plasmodium core proteome and “species-specific” genes • Hall et al. Science (2005)

  35. ACT Comparison: 3D7 vs PFCLIN

  36. Gene Families

  37. Clustering Theileria proteins • All peptide sets from TA and TP combined. • Combined set BLASTed against itself with an E-value cutoff of 10^-5. • TRIBE-MCL run with an inflation value of 5 (quite stringent). • Each cluster checked for the number of peptides from each organism, to identify which gene families have expanded in which organism. • Clusters annotated using predicted products in TA & TP.
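The per-organism check in the fourth step can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used for the analysis: it assumes the clustering output lists one cluster per line with whitespace-separated protein identifiers, that T. annulata and T. parva identifiers start with "TA" and "TP" (as in the gene IDs quoted later in the deck), and that a family counts as expanded when one organism has more than 1.5 times as many members as the other. The function names, the threshold and the example cluster memberships are hypothetical.

```python
from collections import Counter


def organism_of(protein_id):
    """Assign a protein to an organism from its identifier prefix."""
    if protein_id.startswith("TA"):
        return "TA"
    if protein_id.startswith("TP"):
        return "TP"
    return "other"


def count_by_organism(cluster_lines):
    """Return one Counter per cluster with the member count per organism."""
    counts = []
    for line in cluster_lines:
        members = line.split()
        if members:
            counts.append(Counter(organism_of(m) for m in members))
    return counts


def expanded_in(count, ratio=1.5):
    """Label a cluster 'TA', 'TP' or 'balanced' (the ratio is an arbitrary choice)."""
    ta, tp = count["TA"], count["TP"]
    if ta > ratio * tp:
        return "TA"
    if tp > ratio * ta:
        return "TP"
    return "balanced"


# Hypothetical cluster file content (memberships invented for illustration).
example_lines = [
    "TA20090\tTP01_0603",
    "TA18950\tTA18865\tTA17505\tTP04_0437",
]
for count in count_by_organism(example_lines):
    print(dict(count), expanded_in(count))

# Family 1 (SVSP) sizes quoted in the deck: 48 in TA, 85 in TP.
print(expanded_in(Counter(TA=48, TP=85)))
```

With the Family 1 sizes quoted in the deck, this rule labels the family as expanded in T. parva.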

  38. Theileria-specific gene families: Family 1 (SVSP) • Exclusively sub-telomeric • Contain 1 or more DUF529 (now called FAINT) domains • Majority contain signal peptides and conserved C-termini • Unequally expanded (48 in TA, 85 in TP) • Expressed during macroschizont stage (EST evidence)

  39. DUF529 domain-containing proteins (Frequently Appears IN Theileria: FAINT) • Only found in Theileria proteins • Highly diverged ~70-residue domain • Majority of FAINT domain-containing proteins have signal peptides • > 900 copies per genome (in at least 166 Theileria annulata proteins) • Many are expressed at least at the macroschizont stage

  40. Comparative Genomics: protein domains

  41. Architecture of Theileria proteins with FAINT domain • [TA20090 / TP01_0603, TashHN] (332 aa) • [TA03125 / TP01_0608, Tash1] (416 aa) • [TA20085 / TP01_0604, TashAT1] (465 aa) • [TA20082, TashAT3] (994 aa) • [TA20095 / TP01_0602, TashAT2] (1163 aa) • [TA17375 / TP03_0861, Polymorphic antigen precursor / P150] (1338 aa) • [TA18950, Subtelomeric hypothetical protein (SVSP), family 1] (605 aa) • [TA18865, Subtelomeric hypothetical protein (SVSP), family 1] (502 aa) • [TA08425 / TP04_0437, Microneme-rhoptry protein] (893 aa) • [TA17505, Sfi I-fragment-related hypothetical protein, family 3] (2732 aa) • Domain keys: signal peptide, PEST, PT, FAINT, AT-hook

  42. Whole genome domain organisation of PfEMP1 proteins • 59 var genes in total. • Expressed on red cell surface and involved in sequestration. • 3 types of domain. – DBL: Duffy binding-like – CIDR: cysteine-rich interdomain region – C2: constant 2

  43. Comparative Genomics: metabolic reconstruction

  44. KEGG: Phospholipid metabolism X8
