FRIDAY, 8TH JUNE 2018
INSTITUTO DE TECNOLOGIA QUIMICA E BIOLOGICA (ITQB NOVA), OEIRAS, PORTUGAL

Dr José Lourenço
DEPARTMENTAL LECTURER IN INFECTIOUS DISEASE
DEPARTMENT OF ZOOLOGY, UNIVERSITY OF OXFORD

POPULATION STRUCTURE OF THE PNEUMOCOCCUS IS DRIVEN BY SELECTION ON THE GROEL HEAT-SHOCK PROTEIN
A WHOLE-GENOME MACHINE LEARNING PERSPECTIVE
MY PATH / PAST EXPERIENCE

→ BSc Software Engineering @IST
→ RESEARCH ASSISTANT @IGC (2005-2008)
→ PhD (PDBC programme): 1y courses @IGC (2008), 3y project @Oxford (2009)
→ DENV POSTDOC @ImperialCollege (2013-2014)
→ POSTDOC @Oxford (2014-2018)
→ DEPARTMENTAL RESEARCH LECTURER ON INFECTIOUS DISEASE @Oxford (2018-2022)

Pathogens: DENV, CHIKV, YFV, ZIKV; FluA, HIV, HCV, HBV; Pneumococcus

POPULATION STRUCTURE OF THE PNEUMOCOCCUS
THE PNEUMOCOCCUS
STREPTOCOCCUS PNEUMONIAE

Bacteria
→ also known as the pneumococcus
→ gram-positive bacterial pathogen
→ usually found in pairs (diplococci) and does not form spores
→ presents high levels of recombination and horizontal gene transfer

Disease & carriage
→ commonly carried asymptomatically (nasopharynx)
→ can cause invasive disease, e.g. pneumonia, meningitis

Serotypes
→ cell capsule dictates antigenic type (serotype)
→ there are 100+ known serotypes
→ 10 to 15 serotypes are classically responsible for carriage and disease
THE CAPSULE & VACCINE

[Figure: pneumococcal cell surface, with labels: polysaccharide capsule, PspA, LytA, LytB, CbpA]

Pneumococcal conjugate vaccine (PCV)
→ contains the capsule sugars
→ PCV7: 4, 6B, 9V, 14, 18C, 19F and 23F
→ PCV13: PCV7 + 1, 3, 5, 6A, 7F and 19A

Engholm DH et al. FEMS Microbiology Reviews. 2017
EPIDEMIOLOGICAL POPULATION STRUCTURE

pre-vaccination
→ disease is dominated by a small subset of serotypes (PCV7)
→ majority of serotypes circulate at low frequency

post-vaccination
→ disease of vaccine types is reduced
→ disease of non-vaccine types is increased

Waight PA et al. Lancet ID 2015 (England and Wales)
GENETIC POPULATION STRUCTURE

[Figures from Azarian T et al. PLoS Pathogens 2018 and Croucher et al. Nature Genetics 2013; surviving figure labels: 23F, B, A]
OUR GROUP'S RESEARCH HISTORY

epidemiological population structure
→ Chaos, Persistence, and Evolution of Strain Structure in Antigenically Diverse Infectious Agents. Science (1998).
→ Role of selection in the emergence of lineages and the evolution of virulence in Neisseria meningitidis. PNAS (2008).
→ Long-term evolution of antigen repertoires among carried meningococci. Proc. Roy. Soc. B (2010).

genetic population structure
→ Lineage structure of Streptococcus pneumoniae may be driven by immune selection on the groEL heat-shock protein. Nature Scientific Reports (2017).
→ Identifying Streptococcus pneumoniae genes associated with invasive disease using pangenome-based whole genome sequence typing. Under review (2018).

vaccine response
→ Vaccination can drive an increase in frequencies of antibiotic resistance among nonvaccine serotypes of Streptococcus pneumoniae. PNAS (2018).
→ Vaccination Drives Changes in Metabolic and Virulence Profiles of Streptococcus pneumoniae. PLoS Pathogens (2015).
→ High prevalence of vaccine serotype Streptococcus pneumoniae carriage six years after 13-valent pneumococcal vaccine introduction in Malawi: a prospective serial cross-sectional study. Under review (2018).
RESEARCH ON GENETIC POPULATION STRUCTURE
IN SEARCH OF POPULATION STRUCTURE

What determines genetic population structure?
→ which genes determine phylogenetic branching?
→ which genes determine Sequence cluster (SC)?
→ which genes determine Serotype (Sero)?
→ are genes that determine SC the same as those that determine Sero?

Dataset
→ 616 genomes of S. pneumoniae, Massachusetts (USA, 2001-2007)
→ full genomes with 2135 genes per sample
→ known SC and Sero per sample

Approach
→ whole-genome multi-locus sequence typing approach (wgMLST)
→ machine learning to explore the determinants of SC and Sero

Croucher et al. Nature Genetics 2013
WHOLE-GENOME MULTI-LOCUS SEQUENCE TYPING

Input: 2135 genes for each of sample 1 ... sample N
Reference genome: ATCC700669, serotype 23F

For all genes
1. find gene in reference genome
2. compare gene's sequence to reference gene's sequence
→ discretize and collapse gene's sequence into allele (integers): equal to the reference = 1, different from the reference → 2

Allelic matrix (alleles of gene A in the first column)

            gene A   other genes
sample 1    1        1 4 1 1 8 1
...         2        1 3 1 1 7 1
sample N    1        3 1 1 1 3 1
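The discretization step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the function name build_allele_matrix, the toy gene names and sequences, the exact-string comparison of sequences, and the use of 0 for a gene that is absent from a sample are all hypothetical choices made for the example.

```python
def build_allele_matrix(samples, reference):
    """Collapse gene sequences into integer alleles, one column per gene.

    samples:   {sample name: {gene name: sequence}}
    reference: {gene name: reference sequence}
    Returns (genes, matrix) where matrix[sample] is a list of integers.
    """
    genes = sorted(reference)
    # For each gene, the reference sequence is allele 1.
    known = {gene: {reference[gene]: 1} for gene in genes}
    matrix = {}
    for name, sequences in samples.items():
        row = []
        for gene in genes:
            seq = sequences.get(gene)
            if seq is None:
                row.append(0)  # assumption: 0 marks a gene not found in this sample
            else:
                # A sequence not seen before gets the next unused integer.
                row.append(known[gene].setdefault(seq, len(known[gene]) + 1))
        matrix[name] = row
    return genes, matrix


# Toy data (hypothetical sequences, two genes instead of 2135).
reference = {"geneA": "ATGAAA", "geneB": "ATGCCC"}
samples = {
    "sample1": {"geneA": "ATGAAA", "geneB": "ATGCCT"},
    "sample2": {"geneA": "ATGAAG", "geneB": "ATGCCT"},
    "sample3": {"geneA": "ATGAAA"},
}
genes, matrix = build_allele_matrix(samples, reference)
for name, row in matrix.items():
    print(name, row)
```

Each row of the result is one sample and each column one gene, which is the shape of table that the classifier later takes as predictors.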
MACHINE LEARNING

→ Random Forest Algorithm (RFA) - is a collection of Classification Trees (CT)
→ We set predictor variables (genes) to classify a variable of interest (SC, Sero)

            2135 predictors        variables of interest
                                   SC    Sero
sample 1    1 1 4 1 1 8 1 ...      10    6A
...         2 1 3 1 1 7 1 ...      2     19A
sample N    1 3 1 1 1 3 1 ...      4     19A

Critical outputs
→ how well it can classify (predict) variables of interest
→ scores predictor variables according to their importance in getting classification right
CLASSIFICATION TREE - CLASSIC EXAMPLE

go play golf today?

outlook    humidity   windy   golf ok?
rainy      high       false   no
rainy      high       true    no
overcast   high       false   yes
sunny      high       false   yes
sunny      normal     false   yes
sunny      normal     true    no
overcast   normal     true    yes
rainy      high       false   no
rainy      normal     false   yes
sunny      normal     false   yes
rainy      normal     true    yes
overcast   high       true    yes
....       ....       ....    ....
(many more samples)

Tree shown on the slide (annotation beside it: importance of predictor)

OUTLOOK
→ sunny: WINDY
    → false: yes
    → true: no
→ overcast: yes
→ rainy: HUMIDITY
    → high: no
    → normal: yes
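One common way to choose the predictor at the root of a classification tree is to pick the one whose split gives the largest decrease in Gini impurity. The slide does not say which splitting criterion was used, so Gini is an assumption here; the 12 rows are those listed on the slide, and the function names are hypothetical. A minimal sketch:

```python
from collections import Counter

# Rows as listed on the slide: (outlook, humidity, windy, golf ok?)
ROWS = [
    ("rainy", "high", "false", "no"),
    ("rainy", "high", "true", "no"),
    ("overcast", "high", "false", "yes"),
    ("sunny", "high", "false", "yes"),
    ("sunny", "normal", "false", "yes"),
    ("sunny", "normal", "true", "no"),
    ("overcast", "normal", "true", "yes"),
    ("rainy", "high", "false", "no"),
    ("rainy", "normal", "false", "yes"),
    ("sunny", "normal", "false", "yes"),
    ("rainy", "normal", "true", "yes"),
    ("overcast", "high", "true", "yes"),
]
FEATURES = ("outlook", "humidity", "windy")


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure group)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(rows, col):
    """Impurity before the split minus the size-weighted impurity after it."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    after = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return gini([row[-1] for row in rows]) - after


scores = {name: gini_decrease(ROWS, i) for i, name in enumerate(FEATURES)}
best = max(scores, key=scores.get)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.4f}")
```

On these 12 rows the largest decrease comes from outlook, which agrees with OUTLOOK sitting at the root of the tree on the slide.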
RANDOM FOREST - TOY EXAMPLE

outlook    humidity   windy   golf ok?
rainy      high       false   no
rainy      high       true    no
overcast   high       false   yes
sunny      high       false   yes
sunny      normal     false   yes
sunny      normal     true    no
overcast   normal     true    yes
rainy      high       false   no
rainy      normal     false   yes
sunny      normal     false   yes
rainy      normal     true    yes
overcast   high       true    yes
....       ....       ....    ....

bootstrap (...) → solutions assembled
→ prediction accuracy
→ error rates (e.g. false positives)
→ predictor variable importance
→ etc.

RF increases prediction accuracy and allows for robust error estimations, since the ensemble of slightly different classification results adjusts for the instability of individual trees and avoids data overfitting.
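The bootstrap-and-assemble idea can be sketched on the same toy table. This is a teaching sketch under assumptions, not the implementation used in the study: the Gini split criterion, the 2 candidate predictors tried per split (mtry), the 200 trees, the fixed seed and all function names are hypothetical choices. Each tree is grown on a bootstrap resample, the forest predicts by majority vote, and rows left out of a resample (out-of-bag) give an error estimate.

```python
import random
from collections import Counter

# Rows as listed on the slide: (outlook, humidity, windy, golf ok?)
ROWS = [
    ("rainy", "high", "false", "no"), ("rainy", "high", "true", "no"),
    ("overcast", "high", "false", "yes"), ("sunny", "high", "false", "yes"),
    ("sunny", "normal", "false", "yes"), ("sunny", "normal", "true", "no"),
    ("overcast", "normal", "true", "yes"), ("rainy", "high", "false", "no"),
    ("rainy", "normal", "false", "yes"), ("sunny", "normal", "false", "yes"),
    ("rainy", "normal", "true", "yes"), ("overcast", "high", "true", "yes"),
]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, features, rng, mtry=2):
    """Grow one tree; a leaf is a label, a node is (column, branches, default)."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not features:
        return majority(labels)

    def weighted_gini(col):
        groups = {}
        for r in rows:
            groups.setdefault(r[col], []).append(r[-1])
        return sum(len(g) * gini(g) for g in groups.values()) / len(rows)

    # Only a random subset of predictors is considered at each split.
    candidates = rng.sample(features, min(mtry, len(features)))
    col = min(candidates, key=weighted_gini)
    remaining = [f for f in features if f != col]
    branches = {}
    for value in sorted({r[col] for r in rows}):  # sorted for reproducibility
        subset = [r for r in rows if r[col] == value]
        branches[value] = build_tree(subset, remaining, rng, mtry)
    return (col, branches, majority(labels))


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        col, branches, default = tree
        tree = branches.get(row[col], default)  # unseen value: node majority
    return tree


def random_forest(rows, n_trees=200, seed=0):
    rng = random.Random(seed)
    features = list(range(len(rows[0]) - 1))
    forest, oob_votes = [], [Counter() for _ in rows]
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap resample
        tree = build_tree([rows[i] for i in idx], features, rng)
        forest.append(tree)
        for i in set(range(len(rows))) - set(idx):  # rows this tree never saw
            oob_votes[i][predict_tree(tree, rows[i])] += 1
    return forest, oob_votes


def predict_forest(forest, row):
    return majority([predict_tree(tree, row) for tree in forest])


def oob_accuracy(rows, oob_votes):
    scored = [(v.most_common(1)[0][0], r[-1]) for v, r in zip(oob_votes, rows) if v]
    return sum(pred == truth for pred, truth in scored) / len(scored)


forest, oob_votes = random_forest(ROWS)
print("overcast / high / not windy:", predict_forest(forest, ("overcast", "high", "false")))
print("out-of-bag accuracy:", round(oob_accuracy(ROWS, oob_votes), 2))
```

With only 12 rows the out-of-bag accuracy is noisy; the point of the sketch is the mechanism (resample, grow, vote), not the number it prints.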
SEROTYPE PREDICTION SUCCESS

Good classification: lower success for some serotypes was related to varying sample sizes (e.g. N=8 for 16F, N=5 for 17F, N=1 for 21)
GENES PREDICTING SEROTYPE

top-scoring genes:
→ 38% of top genes were located within 10 genes upstream or downstream of the capsular locus
→ 62% had compelling support for a functional background that mediates competitive interactions or niche specialization
SEQUENCE CLUSTER PREDICTION SUCCESS

Perfect classification.
GENES PREDICTING SEQUENCE CLUSTER

top-scoring genes:
→ 75% were randomly distributed along the genome (expectation)
→ 10 genes were contiguous and within the groESL operon (p-value = 1.52×10^-6)

Croucher et al. Nature Genetics 2013
DISCORDANT GENES PREDICTING SERO & SC
IMPLICATIONS / CONCLUSIONS (SO FAR)

→ major lineages are determined by variation in the groESL operon (and have been determined or locked long ago)
→ minor lineages are determined by variation in and around the capsular locus (and have been determined more recently and are known to be in constant flux)

Croucher et al. Nature Genetics 2013
GROESL?
GROESL OPERON

Operon
→ encodes groEL (chaperonin) and groES (cofactor)

groEL / groES protein complex
→ 'a nano-cage for protein folding'

most of what we know is from E. coli
→ groEL is a heat-shock protein (stress protein)
→ at least 50 essential proteins need groESL for folding
→ groESL is required for cell viability

Hayer-Hartl M et al. Trends in Biochemical Sciences 2016
GROEL IS UNIVERSAL

groEL is present across bacterial species
→ only exceptions found are 2 species of mycoplasma (intracellular)

groEL is a homolog of HSP60
→ HSP60 is present across the kingdoms, including plants and vertebrates

groEL and HSP60 are functionally, structurally and genetically similar

Wong P et al. Journal of Structural Biology 2004