Compression of Genetic Coding Sequences MohammadReza Ghodsi

May 07, 2023

Genetic Code (Recap)
● The code defines a mapping between tri-nucleotide sequences, called codons, and amino acids.
● A coding sequence begins with a start codon (ATG) and ends with a stop codon (TAG, TGA, or TAA).
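The codon-to-amino-acid mapping can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and one-letter amino acid symbols with "*" for stop; the names CODON_TABLE and translate are hypothetical and not part of the presentation.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G
# (third base varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence (DNA, upper case) into a protein string.

    Assumes the sequence starts with ATG, ends with a stop codon,
    and has a length that is a multiple of three.
    """
    if len(cds) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG":
        raise ValueError("coding sequence must begin with ATG")
    if CODON_TABLE[codons[-1]] != "*":
        raise ValueError("coding sequence must end with a stop codon")
    # Drop the stop codon; it does not code for an amino acid.
    return "".join(CODON_TABLE[c] for c in codons[:-1])


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

Treating the sequence as a list of codons rather than a string of nucleotides is also the view taken later, where codons are the source symbols for Huffman coding.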
Genetic Code (Recap) - 2
● The genetic code has redundancy but no ambiguity: there are 4^3 = 64 codons but only 20 amino acids, so several codons can specify the same amino acid, while each codon specifies exactly one.
● A position of a codon is said to be a degenerate site if different nucleotides at this position specify the same amino acid.
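The definition of a degenerate site can be checked mechanically. The sketch below assumes the standard genetic code; the helper names degeneracy and is_degenerate_site are hypothetical and chosen for illustration only.

```python
from itertools import product

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy(codon, position):
    """Count the nucleotides at `position` (0, 1 or 2) that leave the
    encoded amino acid unchanged. The count includes the original base."""
    target = CODON_TABLE[codon]
    same = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == target:
            same += 1
    return same


def is_degenerate_site(codon, position):
    """A site is degenerate if at least one other nucleotide gives the same amino acid."""
    return degeneracy(codon, position) > 1


if __name__ == "__main__":
    for codon in ("GCT", "TTA", "ATG"):
        print(codon, [degeneracy(codon, p) for p in range(3)])
```

For example, the third position of GCT (alanine) is fourfold degenerate, while ATG (methionine) has no degenerate site at all. Degenerate sites are one source of the redundancy that a compressor could exploit.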
Lossless Data Compression
● Completely random data streams cannot be compressed: no lossless scheme can shorten every possible input, so compression depends on redundancy in the data.
● Many different algorithms exist, each designed either with a specific type of input data in mind or with specific assumptions about what kinds of redundancy the uncompressed data are likely to contain.
● I am planning to use Lempel-Ziv (LZ77) + Huffman coding.
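The contrast between random and redundant input can be observed with Python's standard zlib module, whose DEFLATE format itself combines LZ77 with Huffman coding. This is only a stand-in for illustration, not the suffix-tree implementation planned in the presentation; the sample inputs and the helper name compression_ratio are assumptions.

```python
import random
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Pseudo-random bytes with a fixed seed, so the experiment is repeatable.
rng = random.Random(0)
random_bytes = bytes(rng.getrandbits(8) for _ in range(30000))

# Highly redundant input of the same length: one short motif repeated.
redundant_bytes = b"ATGGCC" * 5000

if __name__ == "__main__":
    print("random:   ", round(compression_ratio(random_bytes), 3))
    print("redundant:", round(compression_ratio(redundant_bytes), 3))
```

The random input should come out at roughly its original size or slightly larger, while the repetitive input shrinks to a small fraction of it.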
Lempel-Ziv LZ77 (Recap)
● Idea: at each position i, find the longest prefix of S[i..n] that is a substring of S[1..i-1].
● Compression can be done using suffix trees, in linear time and space. This is the method that I intend to use.
● Most practical implementations use a sliding-window algorithm to achieve better space efficiency (online algorithms).
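The greedy parsing idea can be sketched with a deliberately naive search. Note that this is not the linear-time suffix-tree method mentioned above: it uses repeated substring searches and is far slower, but it produces the same kind of factorization. The token format (a literal character, or a (position, length) pair with 0-based positions) and the function names are assumptions made for illustration.

```python
def lz_factorize(s):
    """Greedy parse of s: at each position i, emit the longest prefix of s[i:]
    that occurs entirely inside s[:i] as a (position, length) pair,
    or a single literal character if no such prefix exists."""
    tokens = []
    i = 0
    while i < len(s):
        best_pos, best_len = -1, 0
        length = 1
        # If a prefix of some length occurs earlier, so do all shorter ones,
        # so we can stop at the first length that fails.
        while i + length <= len(s):
            pos = s.find(s[i:i + length], 0, i)
            if pos == -1:
                break
            best_pos, best_len = pos, length
            length += 1
        if best_len == 0:
            tokens.append(s[i])
            i += 1
        else:
            tokens.append((best_pos, best_len))
            i += best_len
    return tokens


def lz_expand(tokens):
    """Inverse of lz_factorize: rebuild the original string from the tokens."""
    out = ""
    for token in tokens:
        if isinstance(token, str):
            out += token
        else:
            pos, length = token
            out += out[pos:pos + length]
    return out


if __name__ == "__main__":
    sequence = "ATGATGATGTAA"
    parsed = lz_factorize(sequence)
    print(parsed)
    print(lz_expand(parsed) == sequence)
```

In a complete compressor the tokens would then be passed to an entropy coder such as Huffman coding.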
Huffman coding
● A variable-length code table for encoding source symbols (codons in our case), derived from the frequency of each possible value of the source symbol so that more frequent symbols receive shorter codes.
[Figure: Huffman tree generated from the exact frequencies in the sentence "this is an example of a huffman tree"]
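A Huffman code table over codons can be built with a priority queue. The sketch below is a textbook construction using Python's heapq, assuming codon counts are taken from the sequence being compressed; the function name huffman_code and the sample sequence are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freqs):
    """Build a prefix-free code table {symbol: bit string} from symbol frequencies."""
    if len(freqs) == 1:
        # A single symbol still needs one bit per occurrence.
        return {symbol: "0" for symbol in freqs}
    tiebreak = count()  # unique counter so equal frequencies never compare the dicts
    heap = [(f, next(tiebreak), {symbol: ""}) for symbol, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        f1, _, codes1 = heapq.heappop(heap)
        f2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encoded_length(freqs, table):
    """Total number of bits needed to encode all symbols with the given table."""
    return sum(freqs[s] * len(table[s]) for s in freqs)


if __name__ == "__main__":
    cds = "ATGGCCGCCGCCAAATAA"
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    freqs = Counter(codons)
    table = huffman_code(freqs)
    print(table)
    print(encoded_length(freqs, table), "bits")
```

The exact bit patterns depend on how ties are broken, but the total encoded length is the same for any valid Huffman tree over the same frequencies.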
Hypothetical Application
● Folding@Home is a distributed computing project designed to perform computationally intensive simulations of protein folding.
● The client periodically connects to a server to retrieve "work units," which are packets of data upon which to perform calculations. Each completed work unit is then sent back to the server.
