Practical Bioinformatics
Mark Voorhies
5/19/2015

Review – Documentation
A program should communicate its intent to both the computer and human programmers.
- Comments
- Docstrings
- Code and inputs defining a complete protocol
- Positive and negative controls
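As a rough illustration of these points, the sketch below documents one small function with a docstring and comments, and pairs it with a positive and a negative control. The function `gc_content` is a hypothetical example chosen for this sketch; it does not appear in the slides.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence.

    seq -- a non-empty string of DNA bases (case-insensitive).
    (Hypothetical example function, not taken from the slides.)
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("cannot compute GC content of an empty sequence")
    # float() keeps the division exact under both Python 2 and Python 3.
    return (seq.count("G") + seq.count("C")) / float(len(seq))


# Positive control: a sequence made only of G and C must score 1.0.
assert gc_content("GGCC") == 1.0
# Negative control: a sequence with no G or C must score 0.0.
assert gc_content("ATAT") == 0.0
```

Keeping the controls next to the code means the script, together with its inputs, records the complete protocol.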

Review – “Top Down” design
- Experiment in the shell
- Factor working code into functions and modules
- Refine from problem-specific to generally applicable functions
- “As simple as possible, but no simpler”

Dictionaries
    # Map each base to its complement.
    dictionary = {"A": "T", "T": "A", "G": "C", "C": "G"}
    # Look up a value by key.
    dictionary["G"]
    # Add (or overwrite) an entry.
    dictionary["N"] = "N"
    # Test for a key; this replaces dictionary.has_key("C"), which was removed in Python 3.
    "C" in dictionary

Dictionaries
    geneticCode = {
        "TTT": "F", "TTC": "F", "TTA": "L", "TTG": "L",
        "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
        "ATT": "I", "ATC": "I", "ATA": "I", "ATG": "M",
        "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
        "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S",
        "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",
        "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
        "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
        "TAT": "Y", "TAC": "Y", "TAA": "*", "TAG": "*",
        "CAT": "H", "CAC": "H", "CAA": "Q", "CAG": "Q",
        "AAT": "N", "AAC": "N", "AAA": "K", "AAG": "K",
        "GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E",
        "TGT": "C", "TGC": "C", "TGA": "*", "TGG": "W",
        "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R",
        "AGT": "S", "AGC": "S", "AGA": "R", "AGG": "R",
        "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    }

[Whiteboard image]

Exercise: Transforming sequences
1. Write a function to return the antisense strand of a DNA sequence in 3’ → 5’ orientation.
2. Write a function to return the complement of a DNA sequence in 5’ → 3’ orientation.
3. Write a function to translate a DNA sequence.
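One possible approach to this exercise is sketched below; it is not the course's reference solution. The function names are assumptions. The codon table is rebuilt compactly from the standard genetic code so that the block is self-contained. Reading the input as 5’ → 3’, `complement` returns the paired strand written 3’ → 5’, and `reverse_complement` returns the same strand written 5’ → 3’.

```python
from itertools import product

# Base-pairing rules, including the ambiguity code N from the dictionary slide.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}

# Standard genetic code, enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def complement(seq):
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Return the complementary strand read in the opposite direction."""
    return complement(seq)[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon, ignoring a trailing partial codon."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(GENETIC_CODE[codon] for codon in codons)
```

For example, `translate("ATGGCCTAA")` reads the codons ATG, GCC and TAA. This sketch assumes uppercase input and raises `KeyError` on characters outside the tables.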

Why compare sequences?
- To find genes with a common ancestor
- To infer conserved molecular mechanism and biological function
- To find short functional motifs
- To find repetitive elements within a sequence
- To predict cross-hybridizing sequences (e.g., in microarray design)
- To find genomic origin of imperfectly sequenced fragments (e.g., in deep sequencing experiments)
- To predict nucleotide secondary structure

Nomenclature
Homologs: heritable elements with a common evolutionary origin.
Orthologs: homologs arising from speciation.
Paralogs: homologs arising from duplication and divergence within a single genome.
Xenologs: homologs arising from horizontal transfer.
Ohnologs: homologs arising from whole genome duplication.

Dotplots
1. Unbiased view of all ungapped alignments of two sequences.
2. Noise can be filtered by applying a smoothing window to the diagonals.
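A minimal sketch of both ideas follows. The slides do not specify the smoothing rule, so the rule used here is an assumption: a dot is kept only if it lies in a diagonal window of length `window` containing at least `min_matches` matches. The function and parameter names are also hypothetical.

```python
def dotplot(x, y):
    """Return a matrix with 1 where x[i] == y[j] and 0 elsewhere."""
    return [[1 if a == b else 0 for b in y] for a in x]


def smooth_diagonals(dots, window=3, min_matches=3):
    """Keep only dots that sit in a diagonal window with enough matches."""
    n_rows = len(dots)
    n_cols = len(dots[0]) if dots else 0
    filtered = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows - window + 1):
        for j in range(n_cols - window + 1):
            # Count matches along the diagonal starting at (i, j).
            matches = sum(dots[i + k][j + k] for k in range(window))
            if matches >= min_matches:
                for k in range(window):
                    # Copy through only the cells that were dots originally.
                    filtered[i + k][j + k] = dots[i + k][j + k]
    return filtered
```

Each diagonal of the matrix corresponds to one ungapped alignment offset, so a long surviving diagonal run marks a region of similarity, while isolated chance matches are dropped.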

Types of alignments
Global Alignment: Each letter of each sequence is aligned to a letter or a gap (e.g., Needleman-Wunsch).
Local Alignment: An optimal pair of subsequences is taken from the two sequences and globally aligned (e.g., Smith-Waterman).
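To make the distinction concrete, here is a score-only dynamic programming sketch under simplifying assumptions: a flat match/mismatch score and a linear gap penalty. The function name, parameters, and default scores are illustrative choices, not taken from the slides. With `local=False` it follows the Needleman-Wunsch recurrence; with `local=True` it follows the Smith-Waterman variant, where cell scores are clamped at zero and the best cell anywhere in the table is reported.

```python
def alignment_score(x, y, match=1.0, mismatch=-1.0, gap=-1.0, local=False):
    """Return the optimal global (default) or local alignment score of x and y."""
    # Row 0: aligning a prefix of y against nothing costs one gap per letter
    # in the global case, and nothing in the local case.
    prev = [0.0 if local else j * gap for j in range(len(y) + 1)]
    best = 0.0
    for i in range(1, len(x) + 1):
        curr = [0.0 if local else i * gap]
        for j in range(1, len(y) + 1):
            pair = match if x[i - 1] == y[j - 1] else mismatch
            cell = max(prev[j - 1] + pair,  # align x[i-1] with y[j-1]
                       prev[j] + gap,       # align x[i-1] with a gap
                       curr[j - 1] + gap)   # align y[j-1] with a gap
            if local:
                cell = max(cell, 0.0)       # a local alignment may restart anywhere
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]
```

Only two rows of the table are kept because the sketch returns a score rather than the alignment itself; recovering the alignment would require storing the full table and tracing back.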

Exercise: Scoring an ungapped alignment
    s = {"A": {"A": 1.0, "T": -1.0, "G": -1.0, "C": -1.0},
         "T": {"A": -1.0, "T": 1.0, "G": -1.0, "C": -1.0},
         "G": {"A": -1.0, "T": -1.0, "G": 1.0, "C": -1.0},
         "C": {"A": -1.0, "T": -1.0, "G": -1.0, "C": 1.0}}
    \[ S(x, y) = \sum_{i}^{N} s(x_i, y_i) \]
1. Given two equal-length sequences and a scoring matrix, return the alignment score for a full-length, ungapped alignment.
2. Given two sequences and a scoring matrix, find the offset that yields the best-scoring ungapped alignment.
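The two parts of this exercise could be approached along the following lines. This is a sketch, not the course's reference solution. The function names and the offset convention are assumptions: offset `k` aligns `y[0]` with `x[k]`, and negative `k` means `y` overhangs the start of `x`. Only the overlapping region is scored.

```python
# Same +1 / -1 matrix as on the slide, built programmatically.
S_MATRIX = {a: {b: (1.0 if a == b else -1.0) for b in "ATGC"} for a in "ATGC"}


def score_ungapped(x, y, s):
    """Sum s[x_i][y_i] over the columns of a full-length, ungapped alignment."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(s[a][b] for a, b in zip(x, y))


def best_offset(x, y, s):
    """Return (offset, score) for the best-scoring ungapped overlap of x and y."""
    best = None
    for k in range(-(len(y) - 1), len(x)):
        start_x = max(k, 0)    # first overlapping position in x
        start_y = max(-k, 0)   # first overlapping position in y
        n = min(len(x) - start_x, len(y) - start_y)
        score = score_ungapped(x[start_x:start_x + n],
                               y[start_y:start_y + n], s)
        # Strict '>' keeps the smallest offset when scores tie.
        if best is None or score > best[1]:
            best = (k, score)
    return best
```

Because overhanging ends are not penalized in this sketch, a short perfect overlap can outscore a longer overlap with mismatches; whether that is desirable depends on the application.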

Exercise: Scoring a gapped alignment
1. Given two equal-length gapped sequences (where “-” represents a gap) and a scoring matrix, calculate an alignment score with a -1 penalty for each base aligned to a gap.
2. Write a new scoring function with separate penalties for opening a zero-length gap (e.g., G = -11) and extending an open gap by one base (e.g., E = -1).
    \[ S_{\mathrm{gapped}}(x, y) = S(x, y) + \sum_{i}^{\mathrm{gaps}} \left( G + E \cdot \mathrm{len}(i) \right) \]
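A single function can cover both parts of this exercise if the gap-open penalty is allowed to be zero. The sketch below makes several assumptions beyond the slide: the function and parameter names are hypothetical, a gap is taken to be a maximal run of “-” in one sequence, a run in `x` followed directly by a run in `y` counts as two gaps, and a column with “-” in both sequences is rejected.

```python
# Same +1 / -1 matrix as in the ungapped exercise, built programmatically.
S_MATRIX = {a: {b: (1.0 if a == b else -1.0) for b in "ATGC"} for a in "ATGC"}


def score_gapped(x, y, s, gap_open=-11.0, gap_extend=-1.0):
    """Score a gapped alignment as S(x, y) plus G + E * len for every gap."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    total = 0.0
    open_side = None  # which sequence ("x" or "y") carries the currently open gap
    for a, b in zip(x, y):
        if a == "-" and b == "-":
            raise ValueError("column with a gap in both sequences")
        if a == "-" or b == "-":
            side = "x" if a == "-" else "y"
            if side != open_side:
                total += gap_open      # pay G once per new gap
                open_side = side
            total += gap_extend        # pay E for every gapped base
        else:
            total += s[a][b]
            open_side = None           # an aligned pair closes any open gap
    return total
```

Calling `score_gapped(x, y, S_MATRIX, gap_open=0.0, gap_extend=-1.0)` reproduces the flat -1 penalty per gapped base from part 1, while the defaults correspond to the example values G = -11 and E = -1 from part 2.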
