The Plan BLAST CSE 427 Scoring Computational Biology Another - PowerPoint PPT Presentation

The Plan • BLAST CSE 427 • Scoring Computational Biology • Another Bio Interlude: PCR & Sequencing BLAST Alignment score significance PCR and DNA sequencing 1 2 A Protein Structure: (Dihydrofolate Reductase) Sequence Evolution Nothing in Biology Makes Sense Except in the Light of Evolution – Theodosius Dobzhansky , 1973 • Changes happen at random • Deleterious/neutral/advantageous changes unlikely/possibly/likely spread widely in a population • Changes are less likely to be tolerated in positions involved in many/close interactions, e.g. – enzyme binding pocket – protein/protein interaction surface – … 3 4 1

BLAST: BLAST: What Basic Local Alignment Search Tool Altschul, Gish, Miller, Myers, Lipman, J Mol Biol 1990 • Input: • The most widely used comp bio tool – a query sequence (say, 300 residues) • Which is better: long mediocre match or a few – a data base to search for other sequences similar to the nearby, short, strong matches with the same total query (say, 10 6 - 10 9 residues) score? – a score matrix σ (r,s), giving cost of substituting r for s (& – score-wise, exactly equivalent perhaps gap costs) – biologically, later may be more interesting, & is common – various score thresholds & tuning parameters – at least, if must miss some, rather miss the former • Output: • BLAST is a heuristic emphasizing the later – “all” matches in data base above threshold – speed/sensitivity tradeoff: BLAST may miss former, but – “E-value” of each gains greatly in speed 5 6 BLAST: How BLAST: Example ≥ 7 (thresh 1 ) Idea: find parts of data base near a good match to some query deadly short subword of the query • Break query into overlapping words w i of small fixed de (11) -> de ee dd dq dk length (e.g. 3 aa or 11 nt) ea ( 9) -> ea • For each w i , find (empirically, ~50) “neighboring” words ad (10) -> ad sd v ij with ungapped score σ (w i , v ij ) > thresh 1 dl (10) -> dl di dm dv • Look up each v ij in database (via prebuilt index) -- ly (11) -> ly my iy vy fy lf i.e., exact match to short, high-scoring word ddgearlyk . . . DB • Extend each such “seed match” (bidirectional) • Report those scoring > thresh 2 , calculate E-values ddge 10 hits ≥ 10 (thresh 2 ) early 18 7 8 2

BLOSUM 62 BLAST Refinements A R N D C Q E G H I L K M F P S T W Y V A 4 -1 -2 -2 0 -1 -1 0 -2 -1 -1 -1 -1 -2 -1 1 0 -3 -2 0 R -1 5 0 -2 -3 1 0 -2 0 -3 -2 2 -1 -3 -2 -1 -1 -3 -2 -3 • “Two hit heuristic” -- need 2 nearby, nonoverlapping, N -2 0 6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3 D -2 -2 1 6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3 gapless hits before trying to extend either C 0 -3 -3 -3 9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 • “Gapped BLAST” -- run heuristic version of Smith- Q -1 1 0 0 -3 5 2 -2 0 -3 -2 1 0 -3 -1 0 -1 -2 -1 -2 E -1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2 Waterman, bi-directional from hit, until score drops by G 0 -2 0 -1 -3 -2 -2 6 -2 -4 -4 -2 -3 -3 -2 0 -2 -2 -3 -3 fixed amount below max H -2 0 1 -1 -3 0 0 -2 8 -3 -3 -1 -2 -1 -2 -1 -2 -2 2 -3 I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 2 -3 1 0 -3 -2 -1 -3 -1 3 L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1 K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 -1 -3 -2 -2 • PSI-BLAST -- For proteins, iterated search, using M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 0 -2 -1 -1 -1 -1 1 “weight matrix” pattern from initial pass to find weaker F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 -4 -2 -2 1 3 -1 P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 -1 -1 -4 -3 -2 matches in subsequent passes S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 1 -3 -2 -2 T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 -2 -2 0 W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 2 -3 Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 -1 10 V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4 Hypothesis Testing: Significance of Alignments A Very Simple Example • Is “42” a good score? • Given: A coin, either fair (p(H)=1/2) or biased (p(H)=2/3) • Decide: which • Compared to what? • How? Flip it 5 times. Suppose outcome D = HHHTH • Null Model/Null Hypothesis M 0 : p(H)=1/2 • Usual approach: compared to a specific “null model”, • Alternative Model/Alt Hypothesis M 1 : p(H)=2/3 such as “random sequences” • Likelihoods: – P(D | M 0 ) = (1/2) (1/2) (1/2) (1/2) (1/2) = 1/32 – P(D | M 1 ) = (2/3) (2/3) (2/3) (1/3) (2/3) = 16/243 p ( D | M 1 ) p ( D | M 0 ) = 16/ 243 1/ 32 = 512 243 � 2.1 • Likelihood Ratio: I.e., alt model is ≈ 2.1x more likely than null model, given data 11 12 3

Hypothesis Testing, II p-values • Log of likelihood ratio is equivalent, often more • the p-value of such a test is the probability, assuming that the null model is true, of seeing data as extreme or more extreme convenient that what you actually observed – add logs instead of multiplying… • e.g., we observed 4 heads; p-value is prob of seeing 4 or 5 • “Likelihood Ratio Tests”: reject null if LLR > threshold heads in 5 tosses of a fair coin – LLR > 0 disfavors null, but higher threshold gives stronger • Why interesting? It measures probability that we would be evidence against making a mistake in rejecting null. • Neyman-Pearson Theorem: For a given error rate, • Usual scientific convention is to reject null only if p-value is < 0.05; sometimes demand p << 0.05 LRT is as good a test as any (subject to some fine print). • can analytically find p-value for simple problems like coins; often turn to simulation/permutation tests for more complex situations; as below 13 14 A Likelihood Ratio Test for Alignment Non- ad hoc Alignment Scores • Take alignments of homologs and look at frequency • Defn: two proteins are homologous if they are alike because of of x-y alignments vs freq of x, y overall shared ancestry; similarity by descent • Issues • suppose among proteins overall, residue x occurs with frequency p x – biased samples • then in a random alignment of 2 random proteins, you would expect – evolutionary distance p x y to find x aligned to y with prob p x p y 1 • suppose among homologs , x & y align with prob p xy � log 2 • BLOSUM approach p x p y • are seqs X & Y homologous? Which is – large collection of trusted alignments more likely, that the alignment reflects log p x i y i (the BLOCKS DB), � chance or homology? Use a likelihood – subsetted by similarity, e.g. ratio test. BLOSUM62 => 62% identity p x i p y i i – e.g. http://blocks.fhcrc.org/blocks-bin/getblock.pl?IPB013598 15 16 4

The Plan BLAST CSE 427 Scoring Computational Biology Another - PowerPoint PPT Presentation

The Plan BLAST CSE 427 Scoring Computational Biology Another Bio Interlude: PCR & Sequencing BLAST Alignment score significance PCR and DNA sequencing 1 2 A Protein Structure: (Dihydrofolate Reductase) Sequence Evolution

PHASE IA PLAN ULTIMATE PLAN 13 PHASE IB PLAN ULTIMATE PLAN 14 ULTIMATE PLAN ULTIMATE PLAN

To Plan or Not To Plan Failure to plan is a plan to fail Questions to ask yourself Who is this

( ) + ( ) + ( ) + What is an Area Plan ( ) + What is an Area Plan A Comprehensive

Best Available Technology SUPPLEMENTAL INFORMATION VMT C-Plan Components 4 Plan

Plan Growth Company Stats Plan Growth Plan Growth 133,527 91,873 66,972 48,573 28,369

STRATEGI C PLAN 20 18 -20 23 DEVELOPM ENT OF ASI S STRATEGI C PLAN THE STRATEGIC PLAN

Master Plan Open House #3 Preferred Alternative Master Plan Master Plan Process What is a

DISTRICT PLANS 2 Financial Plan Instructional Plan Strategic Plan STRATEGIC PLAN GOALS 3

Retirement Plan Information Session Todays meeting Why plan for retirement? Your Plan

40 Accident Plan Accident Plan New plan for 1/1/18 44,500 enrolled to date To date

What is a Neighbourhood Plan? A Neighbourhood Plan is a community led plan that sets out the

What is a Neighbourhood Plan? A Neighbourhood Plan is a community led plan that sets out the

Par Partnership ip Plan Plan 2.0 UnitedHealthcare re/Ox Oxfo ford rd w welcomes You t to

MESA/POWDERHORN PLAN UPDATE Meeting 3 October 23, 2012 Agenda 2 Plan Purpose Plan Area

Retirement Plan Changes Update Overview Retirement Plan Options at W&M No changes to

SAN TAN VALLEY AREA PLAN Community Development Department 7/06/16 San Tan Valley Area Plan

PLAN-E Workshop Dr. Patrick J.C. Aerts PLAN-E NLeSC 1 PLAN-E is is: A social platform for

NEIGHBOURHOOD PLAN SUBMISSION PLAN REGULATION 16 CONSULTATION Recap of what the Plan is for

1 VMT C-Plan Renewal VMT Coordination Workgroup VMT C-Plan Changes Tank Inspections

Site Plan May 2009 Site Plan February 2010 Site Plan May 5, 2010 Site Plan

Comprehensive Plan Rewrite JULY 2018 COMPREHENSIVE PLAN BACKGROUND Role of the Comprehensive

4 Updated Plan must be fiscally Every constrained Years Plan must conform to air quality

Glenview District 34 Presented by: Transition Plan: Superintendent Dr. Recommendations &

803 SOUTH AVENUE AERIAL MAP SITE PLAN CRAWL SPACE & FOUNDATION PLAN FIRST FLOOR PLAN

The Plan BLAST CSE 427 Scoring Computational Biology Another - PowerPoint PPT Presentation

The Plan BLAST CSE 427 Scoring Computational Biology Another Bio Interlude: PCR & Sequencing BLAST Alignment score significance PCR and DNA sequencing 1 2 A Protein Structure: (Dihydrofolate Reductase) Sequence Evolution

PHASE IA PLAN ULTIMATE PLAN 13 PHASE IB PLAN ULTIMATE PLAN 14 ULTIMATE PLAN ULTIMATE PLAN

To Plan or Not To Plan Failure to plan is a plan to fail Questions to ask yourself Who is this

( ) + ( ) + ( ) + What is an Area Plan ( ) + What is an Area Plan A Comprehensive

Best Available Technology SUPPLEMENTAL INFORMATION VMT C-Plan Components 4 Plan

Plan Growth Company Stats Plan Growth Plan Growth 133,527 91,873 66,972 48,573 28,369

STRATEGI C PLAN 20 18 -20 23 DEVELOPM ENT OF ASI S STRATEGI C PLAN THE STRATEGIC PLAN

Master Plan Open House #3 Preferred Alternative Master Plan Master Plan Process What is a

DISTRICT PLANS 2 Financial Plan Instructional Plan Strategic Plan STRATEGIC PLAN GOALS 3

Retirement Plan Information Session Todays meeting Why plan for retirement? Your Plan

40 Accident Plan Accident Plan New plan for 1/1/18 44,500 enrolled to date To date

What is a Neighbourhood Plan? A Neighbourhood Plan is a community led plan that sets out the

What is a Neighbourhood Plan? A Neighbourhood Plan is a community led plan that sets out the

Par Partnership ip Plan Plan 2.0 UnitedHealthcare re/Ox Oxfo ford rd w welcomes You t to

MESA/POWDERHORN PLAN UPDATE Meeting 3 October 23, 2012 Agenda 2 Plan Purpose Plan Area

Retirement Plan Changes Update Overview Retirement Plan Options at W&amp;M No changes to

SAN TAN VALLEY AREA PLAN Community Development Department 7/06/16 San Tan Valley Area Plan

PLAN-E Workshop Dr. Patrick J.C. Aerts PLAN-E NLeSC 1 PLAN-E is is: A social platform for

NEIGHBOURHOOD PLAN SUBMISSION PLAN REGULATION 16 CONSULTATION Recap of what the Plan is for

1 VMT C-Plan Renewal VMT Coordination Workgroup VMT C-Plan Changes Tank Inspections

Site Plan May 2009 Site Plan February 2010 Site Plan May 5, 2010 Site Plan

Comprehensive Plan Rewrite JULY 2018 COMPREHENSIVE PLAN BACKGROUND Role of the Comprehensive

4 Updated Plan must be fiscally Every constrained Years Plan must conform to air quality

Glenview District 34 Presented by: Transition Plan: Superintendent Dr. Recommendations &amp;

803 SOUTH AVENUE AERIAL MAP SITE PLAN CRAWL SPACE &amp; FOUNDATION PLAN FIRST FLOOR PLAN

Retirement Plan Changes Update Overview Retirement Plan Options at W&M No changes to

Glenview District 34 Presented by: Transition Plan: Superintendent Dr. Recommendations &

803 SOUTH AVENUE AERIAL MAP SITE PLAN CRAWL SPACE & FOUNDATION PLAN FIRST FLOOR PLAN