Phylogenetic methods for taxonomic profiling
Siavash Mirarab, University of California at San Diego (UCSD)
Joint work with Tandy Warnow, Nam-Phuong Nguyen, Mike Nute, Mihai Pop, and Bo Liu
Phylogeny reconstruction pipeline
[Figure: samples go through sequencing and bioinformatic processing, yielding a set of unaligned sequences for each gene (gene 1, gene 2, ..., gene 1000).]
Phylogeny reconstruction pipeline
Step 1: Multiple sequence alignment
[Figure: each gene's unaligned sequences pass through an MSA step that produces a gapped alignment. For gene 1, the sequences ACTGCACACCG, ACTGCCCCCG, AATGCCCCCG, CTGCACACGG become ACTGCACACCG, ACTGC-CCCCG, AATGC-CCCCG, -CTGCACACGG.]
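As a small illustration of what the MSA step guarantees, the sketch below checks two basic properties on the gene 1 sequences shown on the slide: all aligned rows have the same length, and deleting the gap characters from each row gives back the original sequence. The function name `is_consistent_alignment` is hypothetical and introduced only for this example; it says nothing about how an aligner actually chooses where to put gaps.

```python
def is_consistent_alignment(unaligned, aligned, gap="-"):
    """Return True if `aligned` is a gapped version of `unaligned`.

    Checks only structural consistency (equal row lengths, and each row
    equals its input sequence once gaps are removed). It does not judge
    alignment quality.
    """
    if len(unaligned) != len(aligned):
        return False
    # Every row of an alignment must have the same number of columns.
    if len({len(row) for row in aligned}) > 1:
        return False
    # Removing gaps must recover the original sequence, in the same order.
    return all(row.replace(gap, "") == seq
               for seq, row in zip(unaligned, aligned))


# Gene 1 sequences as they appear on the slide.
gene1_unaligned = ["ACTGCACACCG", "ACTGCCCCCG", "AATGCCCCCG", "CTGCACACGG"]
gene1_aligned = ["ACTGCACACCG", "ACTGC-CCCCG", "AATGC-CCCCG", "-CTGCACACGG"]

print(is_consistent_alignment(gene1_unaligned, gene1_aligned))
```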
Phylogeny reconstruction pipeline
Step 2: Species tree reconstruction
Approach 1: Concatenation. The gene alignments from Step 1 are joined side by side into a supermatrix, and phylogeny inference is run on the supermatrix.
Approach 2: Summary methods. A gene tree is estimated from each gene alignment, and a summary method combines the gene trees into a species tree.
[Figure: both approaches end in a tree on Orangutan, Chimpanzee, Gorilla, and Human.]
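The concatenation approach can be sketched in a few lines: per-gene alignments are joined row by row into one supermatrix. The example uses the gene 1 and gene 2 alignments from the slide. The row labels `taxon_1` to `taxon_4` are placeholders, since the slide does not say which row belongs to which species, and the function name `concatenate` is hypothetical. Filling a taxon that is absent from a gene with gaps is a common convention and is assumed here; it is not shown on the slide.

```python
def concatenate(gene_alignments, gap="-"):
    """Join per-gene alignments (dicts of taxon -> aligned row) side by side."""
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in gene_alignments:
        widths = {len(row) for row in aln.values()}
        if len(widths) != 1:
            raise ValueError("rows of one gene alignment differ in length")
        width = widths.pop()
        for taxon in taxa:
            # A taxon missing from this gene contributes an all-gap block.
            supermatrix[taxon] += aln.get(taxon, gap * width)
    return supermatrix


# Aligned rows copied from the slide; taxon labels are placeholders.
gene1 = {"taxon_1": "ACTGCACACCG", "taxon_2": "ACTGC-CCCCG",
         "taxon_3": "AATGC-CCCCG", "taxon_4": "-CTGCACACGG"}
gene2 = {"taxon_1": "CTGAGCATCG", "taxon_2": "CTGAGC-TCG",
         "taxon_3": "ATGAGC-TC-", "taxon_4": "CTGA-CAC-G"}

supermatrix = concatenate([gene1, gene2])
for taxon, row in supermatrix.items():
    print(taxon, row)
```

The resulting supermatrix would then be handed to a tree inference program; that step is outside the scope of this sketch.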
Phylogeny reconstruction pipeline (build)
[Figure: the same Step 1 and Step 2 pipeline as on the previous slide, with additional unaligned sequences arriving from sequencing and bioinformatic processing: TGGCACGCAACG, ATGGCACGCTA, ATGGCACGCA, AGCTAACACGGAT, ATGGCACGA.]
Phylogeny reconstruction pipeline
Step 3: Phylogenetic placement
[Figure: the additional sequences are added as gapped rows (for example -ACATGGCT-----, ----ATGGCGA---, -----CATTGCT--) to the existing alignments of individual genes (gene 1000, gene 200, gene 30), each shown with its gene tree on Gorilla, Chimp, Human, and Orang.]
Phylogeny reconstruction pipeline (build)
[Figure: the same three-step pipeline, now annotated with the tool names PASTA and UPP, both of which are multiple sequence alignment methods.]
Phylogeny reconstruction pipeline (build)
[Figure: the same three-step pipeline, annotated with the method names PASTA, UPP, statistical binning, and ASTRAL.]
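To make the idea of a summary method concrete, here is a minimal sketch for the four-taxon case in the figure. An unrooted tree on four taxa is fully described by which two pairs it separates, so summarizing the gene trees reduces to counting these quartet topologies and keeping the most frequent one. ASTRAL optimizes a quartet-based score on larger trees; this toy only illustrates the counting idea and is not ASTRAL's algorithm. The three gene tree topologies below are read from the taxon labels in the figure and should be treated as an assumption, and the helper names `split` and `most_frequent_quartet` are hypothetical.

```python
from collections import Counter


def split(side_a, side_b):
    """Represent an unrooted four-taxon tree by its two sibling pairs.

    Using frozensets makes the representation independent of the order
    of the pairs and of the taxa inside each pair.
    """
    return frozenset([frozenset(side_a), frozenset(side_b)])


# Gene tree topologies as read from the figure labels (an assumption).
gene_trees = {
    "gene 1": split(("Chimp", "Gorilla"), ("Human", "Orangutan")),
    "gene 2": split(("Gorilla", "Chimp"), ("Human", "Orangutan")),
    "gene 1000": split(("Orangutan", "Chimp"), ("Human", "Gorilla")),
}


def most_frequent_quartet(trees):
    """Return the most common quartet topology and how many trees show it."""
    counts = Counter(trees)
    return counts.most_common(1)[0]


best_split, support = most_frequent_quartet(gene_trees.values())
pairs = sorted(sorted(pair) for pair in best_split)
print(pairs, "supported by", support, "of", len(gene_trees), "gene trees")
```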