Phylogenetics Tutorial 1



  1. Phylogenetics Tutorial 1: Making Phylogenies
     Finlay Maguire, Faculty of Computer Science

  2. Table of contents
     1. Overview
     2. Installation
     3. Data
     4. Multiple Sequence Alignment
     5. Trimming
     6. Approximate ML Tree
     7. Maximum-Likelihood Tree
     8. Phylogenomics

  3. Overview

  4. Protein Phylogeny Aims
     • Get a protein
     • Use pairwise alignment to find potential homologs
     • Perform a multiple sequence alignment
     • Trim the alignment
     • Infer an NJ distance phylogeny
     • Infer an approximate Maximum Likelihood phylogeny
     • Infer an accurate Maximum Likelihood phylogeny
     • Compare the trees

  5. Core Genome Phylogeny
     • Get genomes
     • Find core genome
     • Extract SNPs
     • Infer a Maximum Likelihood phylogeny
     • Visualise Phylogeny

  6. Requirements
     • mafft
     • trimal
     • aliview
     • FastTree2
     • iqtree
     • FigTree
     • prokka
     • roary
     • snp-sites

  7. Installation

  8. miniconda
     If you don’t have miniconda: https://docs.conda.io/en/latest/miniconda.html
     conda create -n phylo -c bioconda mafft trimal prokka fasttree iqtree roary snp-sites
     conda activate phylo
     or, with an older miniconda version:
     source activate phylo

  9. Other tools
     Unfortunately, not everything is in bioconda:
     • AliView https://github.com/AliView/AliView/releases
     • FigTree https://github.com/rambaut/figtree/releases

  10. Data

  11. Starting Sequence
      http://www.uniprot.org
      Figure 1: High-quality protein reference database: Swiss-Prot

  12. Starting Sequence
      Figure 2: Choose ‘Gene Ontology’ and ‘biological process’

  13. Starting Sequence
      Figure 3: Go down to ‘detoxification’ and expand

  14. Starting Sequence
      Figure 4: Select ‘8 results’ next to ‘detoxification of arsenic’

  15. Using BLAST to find related sequences
      Figure 5: Select the C. elegans sequence and BLAST

  16. Using BLAST to find related sequences
      Figure 6: Wait...

  17. Using BLAST to find related sequences
      Figure 7: Download 10 sequences across a range of similarity

  18. Multiple Sequence Alignment

  19. MAFFT
      mafft-linsi arsenic.faa > arsenic.afa

  20. Trimming

  21. Inspecting the alignment
      java -jar aliview.jar

  22. TrimAL
      trimal -nogaps -in arsenic.afa -out arsenic_nogaps.mask
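
To make the effect of gap-free trimming concrete, here is a minimal Python sketch that drops every alignment column containing a gap. It illustrates the idea behind the `-nogaps` option only and is not trimAl's implementation; the function name and the toy sequences are made up for the example.

```python
def strip_gap_columns(alignment, gap_chars="-"):
    """Return a copy of the alignment with every gap-containing column removed.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # Keep a column only if no sequence has a gap character at that position.
    kept = [col for col in zip(*seqs) if not any(c in gap_chars for c in col)]
    # Transpose the kept columns back into one string per sequence.
    rows = ["".join(chars) for chars in zip(*kept)] if kept else [""] * len(seqs)
    return dict(zip(names, rows))


# Hypothetical toy alignment: columns 2 and 3 each contain a gap.
toy = {"seqA": "MK-LV", "seqB": "MKTLV", "seqC": "M-TLI"}
trimmed = strip_gap_columns(toy)
print(trimmed)
```

Gap-free trimming can discard a lot of signal when a single sequence has many gaps, which is one reason to compare it against an automated heuristic.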

  23. TrimAL
      trimal -automated1 -in arsenic.afa -out arsenic_auto.mask

  24. Compare Trimming
      java -jar aliview.jar

  25. Approximate ML Tree

  26. FastTree
      FastTree -lg arsenic_auto.mask > arsenic_dist.tree

  27. Inspect
      FigTree

  28. Maximum-Likelihood Tree

  29. IQ-Tree
      • Generate 100 parsimony trees
      • Optimise all 100 with lazy SPR moves
      • Collect resulting unique topologies and optimise branch lengths
      • Select top 20 by likelihood
      • Perform hill-climbing NNI (stochastic followed by hill-climbing) on each and optimise
      • Retain top 5 topologies as candidate trees
      • Randomly perturb candidates (stochastic NNI) and optimise (hill-climbing)
      • If new tree is better than top candidate, replace
      • If top candidate doesn’t change after 100 random perturbations then output.
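
The control flow of the search steps listed above can be sketched in Python on a toy problem. This is not IQ-Tree code: integers stand in for trees, a made-up bumpy function stands in for the likelihood, a step to a neighbouring integer stands in for hill-climbing NNI, a random jump stands in for stochastic NNI, and the lazy SPR step is left out. All names and parameters are hypothetical.

```python
import math
import random


def score(x):
    # Made-up bumpy objective standing in for a tree's log-likelihood.
    return -((x - 700) ** 2) / 1000 + 30 * math.cos(x / 5)


def hill_climb(x, lo=0, hi=999):
    # Stand-in for hill-climbing NNI: move to the better neighbour
    # until neither neighbour improves the score.
    while True:
        best = max((n for n in (x - 1, x + 1) if lo <= n <= hi), key=score)
        if score(best) <= score(x):
            return x
        x = best


def perturb(x, rng, lo=0, hi=999):
    # Stand-in for stochastic NNI: a random jump away from the current state.
    return min(hi, max(lo, x + rng.randint(-50, 50)))


def search(rng, n_start=100, n_top=20, n_keep=5, patience=100):
    # 100 random starting points play the role of the parsimony trees.
    starts = [rng.randint(0, 999) for _ in range(n_start)]
    # Select the top 20 by score and optimise each one.
    top = sorted(set(starts), key=score, reverse=True)[:n_top]
    optimised = {hill_climb(x) for x in top}
    # Retain the top 5 as the candidate set, best first.
    candidates = sorted(optimised, key=score, reverse=True)[:n_keep]
    unchanged = 0
    while unchanged < patience:
        # Perturb a random candidate, then optimise it again.
        new = hill_climb(perturb(rng.choice(candidates), rng))
        if score(new) > score(candidates[0]):
            # New best: it enters the set and the worst candidate drops out.
            candidates = [new] + candidates[: n_keep - 1]
            unchanged = 0
        else:
            unchanged += 1
    return candidates[0], starts


best, starts = search(random.Random(0))
print(best, round(score(best), 2))
```

The stopping rule is the part worth noticing: the search ends only after 100 consecutive perturbations fail to improve on the top candidate.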

  38. Running IQ-Tree
      iqtree -mset LG,JTT,WAG -s arsenic_auto.mask
      Note: IQ-Tree also outputs a neighbour-joining distance tree (.bionj).

  39. Inspect
      FigTree

  40. Phylogenomics

  41. Get genomes
      Download the 6 Listeria genomes:
      wget finlaymagui.re/assets/listeria_genomes.tar.gz
      tar xvf listeria_genomes.tar.gz

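  42. Annotate genomes
      For genome GCA_000008285:
      prokka --kingdom Bacteria --outdir prokka_GCA_000008285 --prefix GCA_000008285 --genus Listeria --locustag GCA_000008285 GCA_000008285.1_ASM828v1_genomic.fna
      Repeat for all genomes, changing the accession each time. --prefix names the output files after the genome, so the .gff files from different genomes do not share a file name.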
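
Repeating the annotation by hand for six genomes is error-prone, so here is a hedged Python sketch that builds one prokka command per genome. It assumes the FASTA files sit in the current directory and are named like `GCA_000008285.1_ASM828v1_genomic.fna`, with the accession before the first dot. The `--prefix` argument is an addition made here so that each genome's output files get a distinct name; the helper name is hypothetical.

```python
from pathlib import Path


def prokka_command(fasta, genus="Listeria"):
    """Build the prokka argument list for one genome FASTA file."""
    # Assumption: the accession is everything before the first '.' in the
    # file name, e.g. GCA_000008285 for GCA_000008285.1_ASM828v1_genomic.fna.
    accession = Path(fasta).name.split(".")[0]
    return [
        "prokka",
        "--kingdom", "Bacteria",
        "--outdir", f"prokka_{accession}",
        "--prefix", accession,  # added here: gives each .gff a distinct name
        "--genus", genus,
        "--locustag", accession,
        str(fasta),
    ]


# Print the commands for inspection rather than running them.
for genome in sorted(Path(".").glob("*_genomic.fna")):
    print(" ".join(prokka_command(genome)))
    # To run for real: subprocess.run(prokka_command(genome), check=True)
```

Printing first makes it easy to check the accessions before committing to a long annotation run.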

  43. Find shared parts
      mkdir annotations
      cp */*.gff annotations
      roary -f core_genome -e -n -v annotations/*.gff

  44. Extract SNPs
      snp-sites -o listeria_snps.fna core_genome/core_gene_alignment.aln
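
What SNP extraction does to an alignment can be shown with a small Python sketch: keep only the columns where the genomes differ. This illustrates the idea and is not the snp-sites implementation; in particular, counting only A, C, G and T and ignoring gaps and ambiguous bases is an assumption made here, and the toy sequences are invented.

```python
def variable_columns(alignment):
    """Return indices of columns with more than one distinct A/C/G/T base."""
    keep = []
    for i, col in enumerate(zip(*alignment.values())):
        # Assumption: gaps and ambiguity codes do not count as variation.
        bases = {c.upper() for c in col if c.upper() in "ACGT"}
        if len(bases) > 1:
            keep.append(i)
    return keep


def extract_snps(alignment):
    """Reduce each sequence to the variable columns only."""
    keep = variable_columns(alignment)
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Hypothetical toy core-genome alignment: columns 1 and 3 vary.
toy = {"g1": "ACGTAC", "g2": "ACGAAC", "g3": "ATGAAC"}
snps = extract_snps(toy)
print(variable_columns(toy), snps)
```

The reduced alignment is far smaller than the core-gene alignment, which is what makes the downstream tree inference quick.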

  45. Infer ML Phylogeny
      iqtree -mset GTR -s listeria_snps.fna

  46. Visualise Tree
      Figure 8: Roary Tutorial

  47. Questions?
