Practical Bioinformatics Mark Voorhies 4/29/2011

Our current tool set
  Data (strings, floats, lists, nested lists)
  Logic (if/then, try/except, for, while)
  Functions (def, import, reload)
  File I/O (open, csv)
  Random numbers (seed, shuffle, choice)
  Descriptive statistics (mean, pearson)

[Whiteboard image]

Dictionaries

    geneticCode = {
        "TTT": "F", "TTC": "F", "TTA": "L", "TTG": "L",
        "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
        "ATT": "I", "ATC": "I", "ATA": "I", "ATG": "M",
        "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
        "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S",
        "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",
        "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
        "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
        "TAT": "Y", "TAC": "Y", "TAA": "*", "TAG": "*",
        "CAT": "H", "CAC": "H", "CAA": "Q", "CAG": "Q",
        "AAT": "N", "AAC": "N", "AAA": "K", "AAG": "K",
        "GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E",
        "TGT": "C", "TGC": "C", "TGA": "*", "TGG": "W",
        "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R",
        "AGT": "S", "AGC": "S", "AGA": "R", "AGG": "R",
        "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    }

Exercise: Transforming sequences
  1. Write a function to return the antisense strand of a DNA sequence in 3' → 5' orientation.
  2. Write a function to return the complement of a DNA sequence in 5' → 3' orientation.
  3. Write a function to translate a DNA sequence.
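One possible sketch of the three transformations, assuming uppercase A/C/G/T input written 5' → 3'. The function names (`complement`, `reverse_complement`, `translate`) are illustrative and not from the slides. The codon table is rebuilt compactly here only so the block runs on its own; it is meant to hold the same mapping as the `geneticCode` dictionary on the Dictionaries slide.

```python
from itertools import product

# Rebuild the standard genetic code so this block is self-contained.
# Codons are enumerated with bases in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
geneticCode = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base complement; read left to right this is the
    antisense strand in 3' -> 5' orientation (exercise 1)."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Complement rewritten in 5' -> 3' orientation (exercise 2)."""
    return complement(seq)[::-1]


def translate(seq):
    """Translate in frame 0 (exercise 3). Assumption: a trailing partial
    codon is ignored and stop codons are emitted as '*', not used to stop."""
    usable = len(seq) - len(seq) % 3
    return "".join(geneticCode[seq[i:i + 3]] for i in range(0, usable, 3))


print(complement("ATGC"))
print(reverse_complement("ATGC"))
print(translate("ATGGCCTAA"))
```

Raising a `KeyError` on unexpected characters (lowercase, N, gaps) is a deliberate simplification; real sequence data would likely need explicit handling.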

[Whiteboard image]

Why compare sequences?
  To find genes with a common ancestor
  To infer conserved molecular mechanism and biological function
  To find short functional motifs
  To find repetitive elements within a sequence
  To predict cross-hybridizing sequences (e.g., in microarray design)
  To predict nucleotide secondary structure

Nomenclature
  Homologs: heritable elements with a common evolutionary origin.
  Orthologs: homologs arising from speciation.
  Paralogs: homologs arising from duplication and divergence within a single genome.
  Xenologs: homologs arising from horizontal transfer.
  Ohnologs: homologs arising from whole genome duplication.

Dotplots
If you're feeling ambitious:
  1. Given two sequences, write a dotplot in CDT format for JavaTreeView.
  2. Add a windowing function to smooth the dotplot.
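A minimal sketch of the underlying computation, assuming a dotplot is a matrix with one row per position in the first sequence and one column per position in the second. Writing the matrix out in CDT format is left aside here, and the names `dotplot` and `windowed_dotplot` are hypothetical. The smoothing shown is one simple choice (fraction of matches along a diagonal window), not necessarily the windowing function intended on the slide.

```python
def dotplot(x, y):
    """Cell [i][j] is 1.0 where x[i] == y[j], otherwise 0.0."""
    return [[1.0 if a == b else 0.0 for b in y] for a in x]


def windowed_dotplot(x, y, window=3):
    """Smoothed dotplot: cell [i][j] is the fraction of matching positions
    between x[i:i+window] and y[j:j+window], compared along the diagonal."""
    if window < 1 or window > min(len(x), len(y)):
        raise ValueError("window must be between 1 and the shorter sequence length")
    rows = len(x) - window + 1
    cols = len(y) - window + 1
    return [
        [
            sum(x[i + k] == y[j + k] for k in range(window)) / window
            for j in range(cols)
        ]
        for i in range(rows)
    ]


for row in windowed_dotplot("ACGTACGT", "ACGTTCGT", window=3):
    print(" ".join(f"{value:.2f}" for value in row))
```

With `window=1` the smoothed matrix reduces to the plain dotplot; larger windows suppress isolated chance matches at the cost of resolution.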

Types of alignments
  Global alignment: each letter of each sequence is aligned to a letter or a gap (e.g., Needleman-Wunsch).
  Local alignment: an optimal pair of subsequences is taken from the two sequences and globally aligned (e.g., Smith-Waterman).

Exercise: Scoring an ungapped alignment

    s = {"A": {"A": 1.0, "T": -1.0, "G": -1.0, "C": -1.0},
         "T": {"A": -1.0, "T": 1.0, "G": -1.0, "C": -1.0},
         "G": {"A": -1.0, "T": -1.0, "G": 1.0, "C": -1.0},
         "C": {"A": -1.0, "T": -1.0, "G": -1.0, "C": 1.0}}

$S(x, y) = \sum_{i}^{N} s(x_i, y_i)$

  1. Given two equal-length sequences and a scoring matrix, return the alignment score for a full-length, ungapped alignment.
  2. Given two sequences and a scoring matrix, find the offset that yields the best-scoring ungapped alignment.
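One way to approach both parts, as a sketch under stated assumptions. The function names are illustrative. For part 2, "offset" is taken to mean that `x[i]` is paired with `y[i - offset]`, only the overlapping positions are scored, and ties go to the smallest offset; the slide does not specify any of these conventions.

```python
# Same +1/-1 scoring matrix as on the slide, built programmatically.
s = {a: {b: (1.0 if a == b else -1.0) for b in "ATGC"} for a in "ATGC"}


def score_ungapped(x, y, s):
    """S(x, y): sum of s[x_i][y_i] over a full-length ungapped alignment."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(s[a][b] for a, b in zip(x, y))


def best_offset(x, y, s):
    """Slide y along x and return (offset, score) for the best overlap.
    Assumption: offset k pairs x[i] with y[i - k]; overhangs are unscored."""
    best = None
    for offset in range(-(len(y) - 1), len(x)):
        start = max(0, offset)
        stop = min(len(x), offset + len(y))
        score = sum(s[x[i]][y[i - offset]] for i in range(start, stop))
        if best is None or score > best[1]:
            best = (offset, score)
    return best


print(score_ungapped("ACGT", "ACGA", s))
print(best_offset("TTACGT", "ACG", s))
```

Leaving overhangs unscored favors long overlaps only through their matches; penalizing unaligned ends would be an equally defensible convention.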

Exercise: Scoring a gapped alignment
  1. Given two equal-length gapped sequences (where “-” represents a gap) and a scoring matrix, calculate an alignment score with a -1 penalty for each base aligned to a gap.
  2. Write a new scoring function with separate penalties for opening a zero-length gap (e.g., G = -11) and extending an open gap by one base (e.g., E = -1).

$S_{\mathrm{gapped}}(x, y) = S(x, y) + \sum_{i}^{\mathrm{gaps}} \left( G + E \cdot \mathrm{len}(i) \right)$
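A possible sketch of both scoring functions. The names `score_gapped` and `score_affine` are hypothetical, and the code assumes no column has a gap in both sequences (it raises an error if one does). In the affine version a run of n consecutive gap characters in one sequence costs G + E * n, matching the formula above; a gap that switches from one sequence to the other is treated as a new gap, which is an assumption the slide does not settle.

```python
s = {a: {b: (1.0 if a == b else -1.0) for b in "ATGC"} for a in "ATGC"}


def score_gapped(x, y, s, gap=-1.0):
    """Part 1: flat penalty for every base aligned to a gap."""
    if len(x) != len(y):
        raise ValueError("gapped sequences must have equal length")
    total = 0.0
    for a, b in zip(x, y):
        if a == "-" and b == "-":
            raise ValueError("column with a gap in both sequences")
        total += gap if "-" in (a, b) else s[a][b]
    return total


def score_affine(x, y, s, G=-11.0, E=-1.0):
    """Part 2: each gap costs G to open plus E per gapped position."""
    if len(x) != len(y):
        raise ValueError("gapped sequences must have equal length")
    total = 0.0
    open_in = None  # which sequence holds the currently open gap, if any
    for a, b in zip(x, y):
        if a == "-" and b == "-":
            raise ValueError("column with a gap in both sequences")
        if a == "-":
            current = "x"
        elif b == "-":
            current = "y"
        else:
            current = None
            total += s[a][b]
        if current is not None:
            if current != open_in:
                total += G  # opening a new gap
            total += E      # extending the gap by one position
        open_in = current
    return total


print(score_gapped("AC--GT", "ACTTGT", s))
print(score_affine("AC--GT", "ACTTGT", s))
```

With the example penalties, one gap of length two is much cheaper than two gaps of length one, which is the behavior separate open and extend penalties are meant to produce.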

Homework
  1. Read chapter 3 of the BLAST book (Sequence Alignment).
  2. Try initializing and filling in a dynamic programming matrix by hand (e.g., try reproducing one of the examples from the BLAST book on paper).
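For checking a hand-filled matrix, a short sketch of initializing and filling a global (Needleman-Wunsch style) score matrix may help. It assumes a single linear gap penalty rather than separate open and extend penalties, and the function name `fill_global_matrix` is hypothetical; the BLAST book's examples may use different scores, so the parameters would need adjusting to match them.

```python
s = {a: {b: (1.0 if a == b else -1.0) for b in "ATGC"} for a in "ATGC"}


def fill_global_matrix(x, y, s, gap=-1.0):
    """Return the (len(x)+1) by (len(y)+1) matrix F, where F[i][j] is the
    best score for globally aligning x[:i] with y[:j] (linear gap cost)."""
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0.0] * cols for _ in range(rows)]
    # Initialization: aligning a prefix against nothing costs one gap per base.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    # Fill: each cell takes the best of diagonal (match/mismatch),
    # up (gap in y), and left (gap in x).
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],
                F[i - 1][j] + gap,
                F[i][j - 1] + gap,
            )
    return F


for row in fill_global_matrix("GATTACA", "GCATGC", s):
    print(" ".join(f"{value:5.1f}" for value in row))
```

The bottom-right cell holds the optimal global alignment score; recovering the alignment itself would additionally require a traceback, which is omitted here.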
