Parsimony

          123456789...
Taxon1    CGACC A GGT...
Taxon2    CGACC A GGT...
Taxon3    CGGTC C GGT...
Taxon4    CGGCC T GGT...

[Figure: one of the three possible unrooted trees, with Taxon1 and Taxon2 on one side and Taxon3 and Taxon4 on the other. A second copy shows the same tree with the data from site 6 (A, A, C, T) inserted in place of the taxon names.]

"Standard" Parsimony

Important things to note about that last slide:
• Two (2) steps was the minimum
  – there is no way to explain the observed data with just 1 evolutionary change
• There is more than one way to assign ancestral character states to get 2 steps
  – one interior node must have A, but the other interior node can have anything except G
• Enumerating all possible combinations of ancestral states is not the most efficient way to determine the number of steps
  – more on this later
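The brute-force enumeration described above can be sketched for site 6 (tip states A, A, C, T) on the tree that groups Taxon1 with Taxon2. This is an illustrative sketch, not code from the slides; the names `steps`, `scores`, and `optimal` are made up here.

```python
from itertools import product

STATES = "ACGT"

def steps(x, y, tips=("A", "A", "C", "T")):
    """Count changes on the tree ((1,2),(3,4)).

    x is the interior node next to taxa 1 and 2,
    y is the interior node next to taxa 3 and 4.
    """
    t1, t2, t3, t4 = tips
    branches = [(x, t1), (x, t2), (x, y), (y, t3), (y, t4)]
    return sum(a != b for a, b in branches)

# Score all 16 combinations of ancestral states.
scores = {(x, y): steps(x, y) for x, y in product(STATES, repeat=2)}
best = min(scores.values())
optimal = sorted(pair for pair, s in scores.items() if s == best)

print(best)     # minimum number of steps for site 6
print(optimal)  # every (x, y) assignment that achieves it
```

The optimal assignments all put A at the first interior node and never put G at the second, which matches the bullets above.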

Parsimony Steps

          123456789...
Taxon1    CGACC A GGT...
Taxon2    CGACC A GGT...
Taxon3    CGGTC C GGT...
Taxon4    CGGCC T GGT...
Steps     00110 2 000...

[Figure: unrooted tree with Taxon1 and Taxon2 on one side, Taxon3 and Taxon4 on the other.]

Let's call this tree 1: (1,2,(3,4))
Tree 1's length for first 9 sites = 4

Parsimony Steps

          123456789...
Taxon1    CG A CC A GGT...
Taxon2    CG A CC A GGT...
Taxon3    CG G TC C GGT...
Taxon4    CG G CC T GGT...
Steps     00 2 10 2 000...

[Figure: unrooted tree with Taxon1 and Taxon3 on one side, Taxon2 and Taxon4 on the other.]

Tree 2: (1,3,(2,4))
Tree 2's length for first 9 sites = 5

Parsimony Steps

          123456789...
Taxon1    CG A CC A GGT...
Taxon2    CG A CC A GGT...
Taxon3    CG G TC C GGT...
Taxon4    CG G CC T GGT...
Steps     00 2 10 2 000...

[Figure: unrooted tree with Taxon1 and Taxon4 on one side, Taxon2 and Taxon3 on the other.]

Tree 3: (1,4,(2,3))
Tree 3's length for first 9 sites = 5

Parsimony (using only 9 sites)

[Figure: the three possible unrooted trees side by side.]
• Tree 1: 4 steps (most parsimonious)
• Tree 2: 5 steps
• Tree 3: 5 steps

Tree 1 is the simplest explanation of the data for the first 9 sites according to the parsimony criterion. Choosing one of the other two trees requires additional (ad hoc) justification.
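The three tree lengths can be checked with a short script. This is a sketch that assumes only the nine sites shown on the slides; `quartet_steps` and `tree_length` are hypothetical helper names, and the step count uses the Fitch rule specialized to four taxa.

```python
# First nine sites of each taxon, copied from the slides.
SEQS = {
    1: "CGACCAGGT",
    2: "CGACCAGGT",
    3: "CGGTCCGGT",
    4: "CGGCCTGGT",
}

def quartet_steps(a, b, c, d):
    """Unordered (Fitch) step count for one site on the tree ((a,b),(c,d))."""
    left = {a} & {b} or {a, b}    # state set at the interior node joining a and b
    right = {c} & {d} or {c, d}   # state set at the interior node joining c and d
    # One step per mismatched pair, plus one if the two interior sets are disjoint.
    return (a != b) + (c != d) + (not (left & right))

def tree_length(split):
    """Sum the per-site steps for a split such as ((1, 2), (3, 4))."""
    (i, j), (k, l) = split
    columns = zip(SEQS[i], SEQS[j], SEQS[k], SEQS[l])
    return sum(quartet_steps(*col) for col in columns)

trees = {
    "tree 1": ((1, 2), (3, 4)),
    "tree 2": ((1, 3), (2, 4)),
    "tree 3": ((1, 4), (2, 3)),
}
lengths = {name: tree_length(split) for name, split in trees.items()}
print(lengths)
```

The shortest of the three is tree 1, in agreement with the slide.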

Wagner vs. Fitch Parsimony (distinction exists only in case of multistate characters)

Wagner (ordered characters)
[Figure: character state tree with state 1 in the center, connected to states 0, 2, and 3.]
• This "tree" says that all changes between 0 and 2, 0 and 3, or 2 and 3 must go through state 1 (and thus require 2 steps)
• Note: this is just one possible character state tree

Fitch (unordered characters)
[Figure: states 0, 1, 2, and 3, each directly connected to every other state.]
• In Fitch parsimony, a change between any two states is possible, and all changes count just 1 step
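The difference between the two schemes comes down to the cost assigned to each pair of states. The sketch below derives the ordered costs from a character state tree by counting edges on the path between states; the edge list encodes the tree described on the slide, and `state_tree_costs` is a made-up helper name.

```python
from collections import deque

def state_tree_costs(edges, states):
    """Cost of changing i -> j = number of edges between i and j in the state tree."""
    neighbors = {s: set() for s in states}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    costs = {}
    for start in states:
        # Breadth-first search gives the path length from start to every state.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in neighbors[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        for end in states:
            costs[start, end] = dist[end]
    return costs

states = [0, 1, 2, 3]
# Ordered: states 0, 2 and 3 each connect only to state 1.
ordered = state_tree_costs([(0, 1), (2, 1), (3, 1)], states)
# Unordered (Fitch): every change costs one step.
unordered = {(i, j): int(i != j) for i in states for j in states}

print(ordered[0, 2], unordered[0, 2])
```

A strictly linear ordering 0, 1, 2, 3 would be expressed with the edges `[(0, 1), (1, 2), (2, 3)]`, giving a cost of `abs(i - j)`.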

Transversion Parsimony
• Transitions (A ↔ G, C ↔ T) are more common than transversions (all other changes)
• Transitions saturate faster than transversions, thus transversions are sometimes more reliable for reconstructing history
• Transversion parsimony is extreme: it ignores all transitions and counts 1 step for each transversion

Saturation

[Figure: tree with tip states C, G, A, A, A, G, showing two independent A → G changes and one C → A change.]
• Transitions (A → G) are common and often involved in parallelism (shown here), convergence, or reversal
• Transversions (C → A) are rarer, so we should trust them more
• Saturation refers to the loss of historical information due to the effect of "multiple hits"

Implementing Transversion Parsimony
• Ambiguity codes:
  – R means purine (A or G)
  – Y means pyrimidine (C or T)
• Replace nucleotides with either R or Y
  – only transversions will be detectable
• Note: Nexus data file format allows you to do this substitution virtually
  – no need to actually modify your data
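The R/Y substitution can also be done explicitly. The following is a minimal sketch applied to the nine sites from the earlier slides; `ry_recode` is a hypothetical helper and is not part of any Nexus tool.

```python
# Map purines to R and pyrimidines to Y.
RY = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y"})

def ry_recode(seq):
    """Replace each nucleotide with its purine/pyrimidine ambiguity code."""
    return seq.upper().translate(RY)

seqs = {
    "Taxon1": "CGACCAGGT",
    "Taxon2": "CGACCAGGT",
    "Taxon3": "CGGTCCGGT",
    "Taxon4": "CGGCCTGGT",
}
recoded = {name: ry_recode(seq) for name, seq in seqs.items()}
for name, seq in recoded.items():
    print(name, seq)
```

After recoding, sites 3 and 4 (which differ only by transitions) become constant, and site 6 keeps a single detectable difference between purines and pyrimidines.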

Transversion Parsimony

Step Matrices

              To
          A  C  G  T
      A   0  1  1  1
From  C   1  0  1  1
      G   1  1  0  1
      T   1  1  1  0

Step matrix for Fitch parsimony

Step Matrices

              To
          A  C  G  T
      A   0  5  1  5
From  C   5  0  5  1
      G   1  5  0  5
      T   5  1  5  0

This step matrix implements something like transversion parsimony, but less severe. It counts 5 for each transversion and 1 step for each transition.
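A step matrix can be scored on a tree with Sankoff's dynamic programming algorithm. The slides do not name the algorithm they use, so treat this as one possible sketch; `step_cost`, `sankoff`, and `tree_cost` are illustrative names, and the tree is written as nested tuples rooted on the interior branch.

```python
STATES = "ACGT"
PURINES = {"A", "G"}

def step_cost(a, b, transition=1, transversion=5):
    """Entry of the 1:5 step matrix for a change from a to b."""
    if a == b:
        return 0
    same_class = (a in PURINES) == (b in PURINES)
    return transition if same_class else transversion

def sankoff(node, cost):
    """Return {state: minimum cost of the subtree below node given that state}."""
    if isinstance(node, str):
        # Leaf: the observed state costs nothing, any other state is impossible.
        return {s: (0 if s == node else float("inf")) for s in STATES}
    total = {s: 0 for s in STATES}
    for child in node:
        below = sankoff(child, cost)
        for s in STATES:
            total[s] += min(cost(s, t) + below[t] for t in STATES)
    return total

def tree_cost(tree, cost=step_cost):
    """Minimum total cost over all states at the root."""
    return min(sankoff(tree, cost).values())

# Site 6 on tree 1: taxa 1 and 2 have A, taxon 3 has C, taxon 4 has T.
site6_tree1 = (("A", "A"), ("C", "T"))
weighted = tree_cost(site6_tree1)
unweighted = tree_cost(site6_tree1, cost=lambda a, b: int(a != b))
print(weighted, unweighted)
```

With the 1:5 matrix, site 6 costs one transversion plus one transition; with the Fitch matrix it costs the 2 steps counted earlier.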

Generalized Parsimony

Important points
• Do not compare scores across parsimony variants
  – a tree with a transversion parsimony score of 25 is not necessarily better than a tree with a Fitch parsimony score of 31
• Parsimony does not provide any guidance for selecting weights for step matrices
  – parsimony cannot tell us that the transition:transversion weight ratio 1:5 is better than 1:1

Other variants
• Camin-Sokal parsimony
  – characters are assumed irreversible
  – ancestral state assumed known
  – forces use of rooted trees
• Dollo parsimony
  – derived state can arise only once, but as many reversals as needed are allowed
  – popular for modeling restriction sites (which are lost more easily than they are gained)
• Unweighted parsimony, equal-weighted parsimony
  – usually means Fitch parsimony (what I call standard parsimony)

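Counting steps with a minimum of effort

[Figure: tree with tip states T, C, C, A, A, G. Interior nodes are labeled with the state sets {A,C}, {A,G}, {A}, {A,C}, and {A,C,T}; a step (+1) is added wherever the two sets being combined have no state in common. 4 steps total.]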
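The set-based counting shown on this slide is the Fitch algorithm: at each interior node take the intersection of the two child sets, or, if it is empty, take the union and add one step. The sketch below assumes a tree topology, because the original figure did not survive; the nested-tuple tree was chosen only because it reproduces the state sets and the total given on the slide.

```python
def fitch(node):
    """Return (state_set, steps) for the subtree rooted at node.

    A leaf is a one-letter string; an interior node is a pair of subtrees.
    """
    if isinstance(node, str):
        return {node}, 0
    left_set, left_steps = fitch(node[0])
    right_set, right_steps = fitch(node[1])
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra step.
        return common, left_steps + right_steps
    # The children share no state: take the union and count one step.
    return left_set | right_set, left_steps + right_steps + 1

# Assumed topology for the tips T, C, C, A, A, G (not confirmed by the source).
tree = ("T", ("C", (("C", "A"), ("A", "G"))))
state_set, steps = fitch(tree)
print(sorted(state_set), steps)
```

Each tip is visited once, so the count takes a single pass over the tree instead of an enumeration of all ancestral state combinations.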

What is "weighted" parsimony?

When someone says they are using weighted parsimony, this can mean more than one thing:
• Some changes weighted more than others
  – i.e. generalized parsimony
• Some sites weighted more than other sites
  – weighting may be determined a priori
  – weighting may be dynamic (i.e. a function of the number of changes reconstructed)
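Site weighting with a priori weights amounts to a weighted sum of the per-site step counts. The sketch below reuses the step counts reported earlier for trees 1 and 2; the weight vector is invented purely for illustration.

```python
# Per-site steps for the first nine sites, as listed on the earlier slides.
steps_tree1 = [0, 0, 1, 1, 0, 2, 0, 0, 0]
steps_tree2 = [0, 0, 2, 1, 0, 2, 0, 0, 0]

def weighted_length(steps, weights):
    """Tree length when each site's step count is multiplied by that site's weight."""
    if len(steps) != len(weights):
        raise ValueError("need exactly one weight per site")
    return sum(w * s for w, s in zip(weights, steps))

equal = [1] * 9
# Hypothetical a priori weights: site 3 counts double.
upweight_site3 = [1, 1, 2, 1, 1, 1, 1, 1, 1]

print(weighted_length(steps_tree1, equal), weighted_length(steps_tree2, equal))
print(weighted_length(steps_tree1, upweight_site3), weighted_length(steps_tree2, upweight_site3))
```

Equal weights recover the unweighted lengths; other weight vectors change how much each site contributes to the comparison between trees.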
