CSCE 471/871 Lecture 6: Multiple Sequence Alignments

Stephen Scott (sscott@cse.unl.edu)

Introduction

- Start with a set of sequences
- In each column, residues are homologous
  - Residues occupy similar positions in 3D structure
  - Residues diverge from a common ancestral residue
  - Figure 6.1
- Can be done manually, but requires expertise and is very tedious
- Often there is no single, unequivocally "correct" alignment
  - Problems arise from low sequence identity and structural evolution

Outline

- Scoring a multiple alignment
  - Minimum entropy scoring
  - Sum of pairs (SP) scoring
- Multidimensional dynamic programming
  - Standard MDP algorithm
  - MSA
- Progressive alignment methods
  - Feng-Doolittle
  - Profile alignment
  - CLUSTALW
  - Iterative refinement
- Multiple alignment via profile HMMs
  - Multiple alignment with known profile HMM
  - Profile HMM training from unaligned sequences
    - Initial model
    - Baum-Welch
    - Avoiding local maxima
    - Model surgery

Scoring a Multiple Alignment

- Ideally, scoring is based on evolution, as in, e.g., the PAM and BLOSUM matrices
- Contrasts with pairwise alignments:
  1. Position-specific scoring (some positions are more conserved than others)
  2. Ideally, need to consider the entire phylogenetic tree to explain the evolution of the entire family
- I.e., build a complete probabilistic model of evolution
  - Not enough data to parameterize such a model ⇒ use approximations
- Assume columns are statistically independent:

  $S(m) = G + \sum_i S(m_i)$

  - $m_i$ is column $i$ of multiple alignment (MA) $m$, and $G$ is the (affine) score of the gaps in $m$

Scoring a Multiple Alignment: Minimum Entropy Scoring

- $m_i^j$ = symbol in column $i$ in sequence $j$; $c_{ia}$ = observed count of residue $a$ in column $i$
- Assume sequences are statistically independent, i.e., residues are independent within columns
- Then the probability of column $m_i$ is

  $P(m_i) = \prod_a p_{ia}^{c_{ia}}$,

  where $p_{ia}$ = probability of $a$ in column $i$

Scoring a Multiple Alignment: Minimum Entropy Scoring (2)

- Set the score to be

  $S(m_i) = -\log P(m_i) = -\sum_a c_{ia} \log p_{ia}$

  - Proportional to Shannon entropy
- Define the optimal alignment as

  $m^* = \operatorname{argmin}_m \left\{ \sum_{m_i \in m} S(m_i) \right\}$

- The independence assumption is valid only if all evolutionary subfamilies are represented equally; otherwise bias skews the results
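The minimum entropy column score above can be illustrated with a minimal Python sketch. It rests on assumptions the slides do not state: $p_{ia}$ is estimated as the observed frequency $c_{ia}$ divided by the number of residues in the column, gap symbols are simply skipped (the gap term $G$ is not modeled), and the function names are hypothetical.

```python
import math
from collections import Counter

GAP = "-"  # assumption: gaps are written as '-' in the alignment rows


def min_entropy_column_score(column):
    """S(m_i) = -sum_a c_ia * log(p_ia) for one alignment column.

    Assumption: p_ia is estimated as c_ia / (residues in the column),
    and gap symbols are ignored rather than scored.
    """
    counts = Counter(sym for sym in column if sym != GAP)
    n = sum(counts.values())
    return -sum(c * math.log(c / n) for c in counts.values())


def min_entropy_alignment_score(rows):
    """Sum of column scores for an alignment given as equal-length strings."""
    return sum(min_entropy_column_score(col) for col in zip(*rows))
```

Under these assumptions a fully conserved column scores 0 and a mixed column scores higher, which matches the definition of the optimal alignment as the one that minimizes the summed column scores.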

Scoring a Multiple Alignment: Sum of Pairs (SP) Scores

- Treat the multiple alignment as $\binom{N}{2}$ pairwise alignments
- If $s(a, b)$ = substitution score from, e.g., PAM or BLOSUM:

  $S(m_i) = \sum_{k < \ell} s(m_i^k, m_i^\ell)$

- Caveat: $s(a, b)$ was derived for pairwise comparisons, not $N$-way comparisons
  - Correct: $\log \frac{p_{abc}}{q_a q_b q_c}$
  - SP: $\log \frac{p_{ab}}{q_a q_b} + \log \frac{p_{bc}}{q_b q_c} + \log \frac{p_{ac}}{q_a q_c} = \log \frac{p_{ab}\, p_{bc}\, p_{ac}}{q_a^2 q_b^2 q_c^2}$

Scoring a Multiple Alignment: SP Problem

- Given an alignment with only "L" in column $i$, using BLOSUM50 yields an SP score of

  $S_1 = 5 \binom{N}{2} = 5N(N-1)/2$

- If one "L" is replaced with "G", then the SP score is

  $S_2 = S_1 - 9(N-1)$

- Problem:

  $\frac{S_2}{S_1} = 1 - \frac{9(N-1)}{5N(N-1)/2} = 1 - \frac{18}{5N}$,

  i.e., as $N$ increases, $S_2 / S_1 \to 1$
- But large $N$ should give more support for "L" in $m_i$ relative to $S_2$, not less (i.e., $S_2 / S_1$ should be decreasing)

Multidimensional Dynamic Programming

- Generalization of DP for pairwise alignments
- Assume statistical independence of columns and a linear gap penalty (can also handle affine gap penalties)
- $S(m) = \sum_i S(m_i)$, and $\alpha_{i_1, i_2, \ldots, i_N}$ = max score of an alignment of the subsequences $x^1_{1 \ldots i_1}, x^2_{1 \ldots i_2}, \ldots, x^N_{1 \ldots i_N}$

$$
\alpha_{i_1, i_2, \ldots, i_N} = \max
\begin{cases}
\alpha_{i_1 - 1,\, i_2 - 1,\, i_3 - 1, \ldots, i_N - 1} + S(x^1_{i_1}, x^2_{i_2}, x^3_{i_3}, \ldots, x^N_{i_N}), \\
\alpha_{i_1,\, i_2 - 1,\, i_3 - 1, \ldots, i_N - 1} + S(-, x^2_{i_2}, x^3_{i_3}, \ldots, x^N_{i_N}), \\
\alpha_{i_1 - 1,\, i_2,\, i_3 - 1, \ldots, i_N - 1} + S(x^1_{i_1}, -, x^3_{i_3}, \ldots, x^N_{i_N}), \\
\quad \vdots \\
\alpha_{i_1 - 1,\, i_2 - 1,\, i_3 - 1, \ldots, i_N} + S(x^1_{i_1}, x^2_{i_2}, x^3_{i_3}, \ldots, -), \\
\alpha_{i_1,\, i_2,\, i_3 - 1, \ldots, i_N - 1} + S(-, -, x^3_{i_3}, \ldots, x^N_{i_N}), \\
\quad \vdots
\end{cases}
$$

- In each column, take all gap-residue combinations except 100% gaps

Multidimensional Dynamic Programming (2)

- Assume all $N$ sequences are of length $L$
- Space complexity = Θ( )
- Time complexity = Θ( )
- Is it practical?

MSA [Carrillo & Lipman 88; Lipman et al. 89]

- Uses MDP, but eliminates many entries from consideration to save time
- Can optimally solve problems with $L = 300$ and $N = 7$ (old numbers); $L = 150$ and $N = 50$, $L = 500$ and $N = 25$, and $L = 1000$ and $N = 10$ (newer numbers)
- Uses SP scoring: $S(a) = \sum_{k < \ell} S(a^{k\ell})$, where $a$ is any MA and $a^{k\ell}$ is the pairwise alignment (PA) between $x^k$ and $x^\ell$ induced by $a$
- If $\hat{a}^{k\ell}$ is the optimal PA between $x^k$ and $x^\ell$ (easily computed), then $S(a^{k\ell}) \le S(\hat{a}^{k\ell})$ for all $k$ and $\ell$

MSA (2)

- Assume we have a lower bound $\sigma(a^*)$ on the score of the optimal alignment $a^*$:

$$
\sigma(a^*) \le S(a^*) = \sum_{k < \ell} S(a^{*k\ell})
= S(a^{*k\ell}) + \sum_{\substack{k' < \ell' \\ (k', \ell') \ne (k, \ell)}} S(a^{*k'\ell'})
\le S(a^{*k\ell}) + \sum_{\substack{k' < \ell' \\ (k', \ell') \ne (k, \ell)}} S(\hat{a}^{k'\ell'})
$$

- Thus

$$
S(a^{*k\ell}) \ge \beta^{k\ell} = \sigma(a^*) - \sum_{\substack{k' < \ell' \\ (k', \ell') \ne (k, \ell)}} S(\hat{a}^{k'\ell'})
$$

- When filling in the matrix, only need to consider PAs that score at least $\beta^{k\ell}$ (Figure 6.3)
- Can get $\sigma(a^*)$ from other (heuristic) alignment methods
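The SP column score and the multidimensional recursion described above can be sketched in Python as follows. This is an illustrative sketch, not the MSA program: it fills the full table with no Carrillo-Lipman pruning, it assumes a linear gap penalty with gap-gap pairs scoring 0, and the function names and the substitution function `s` passed in by the caller are hypothetical.

```python
from itertools import combinations, product

GAP = "-"


def sp_column_score(column, s, gap_penalty):
    """Sum-of-pairs score of one column: sum of s(a, b) over all pairs k < l."""
    total = 0
    for a, b in combinations(column, 2):
        if a == GAP and b == GAP:
            continue                  # assumption: gap-gap pairs contribute 0
        if a == GAP or b == GAP:
            total -= gap_penalty      # assumption: linear gap penalty
        else:
            total += s(a, b)
    return total


def mdp_best_score(seqs, s, gap_penalty):
    """Best SP score of a multiple alignment via the full N-dimensional DP."""
    n = len(seqs)
    # Every 0/1 vector except all zeros: 1 = consume a residue, 0 = emit a gap.
    moves = [d for d in product((0, 1), repeat=n) if any(d)]
    alpha = {(0,) * n: 0}
    # product() yields indices in lexicographic order, so every predecessor
    # of a cell has already been filled in when the cell is reached.
    for idx in product(*(range(len(x) + 1) for x in seqs)):
        if not any(idx):
            continue
        best = None
        for d in moves:
            prev = tuple(i - di for i, di in zip(idx, d))
            if min(prev) < 0:
                continue              # move would step outside the table
            column = [x[i - 1] if di else GAP
                      for x, i, di in zip(seqs, idx, d)]
            cand = alpha[prev] + sp_column_score(column, s, gap_penalty)
            if best is None or cand > best:
                best = cand
        alpha[idx] = best
    return alpha[tuple(len(x) for x in seqs)]
```

For sequences of lengths $L_1, \ldots, L_N$ the table holds $\prod_j (L_j + 1)$ cells and each cell examines up to $2^N - 1$ predecessors, so a sketch like this is only usable on very small inputs. Calling `sp_column_score` on an all-"L" column and on the same column with one "G", with `s` returning 5 for L-L and -4 for L-G, gives a quick numerical check of the $S_2 = S_1 - 9(N-1)$ relation from the SP problem.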
