
COMP364: Biopython
Jérôme Waldispühl, McGill University

What is Biopython?
A package to make your life easy (for bioinformatics applications)!
• Parse bioinformatics files (FASTA, GenBank, PDB, etc.) and store them in appropriate data structures.
• Code to deal with popular online bioinformatics destinations (e.g. BLAST & PubMed at NCBI).
• Interfaces to common bioinformatics programs (e.g. ClustalW, EMBOSS).
• Tools for performing common operations on sequences.
• Code to perform classification.
• Code for dealing with alignments.
• GUI-based programs to do basic sequence manipulations, translations, BLASTing, etc.
• And much more!

Starting with Biopython
Import the module:
    >>> import Bio
Create a sequence object:
    >>> import Bio.Seq
    >>> s = Bio.Seq.Seq("ACGT")
    >>> s
    Seq('ACGT', Alphabet())
    >>> print(s)
    ACGT
Alphabet() defines the alphabet used by your sequences (here a generic, unspecified one). The Bio.Alphabet module was removed in Biopython 1.78; newer versions simply display Seq('ACGT').

Sequence object
Works like a string:
    >>> for index, letter in enumerate(s):
    ...     print(index, letter)
    ...
    0 A
    1 C
    2 G
    3 T
With additional capabilities:
    >>> s.complement()
    Seq('TGCA', Alphabet())
    >>> s.reverse_complement()
    Seq('ACGT', Alphabet())
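
To make the two extra methods concrete, here is a plain-Python sketch of what complement() and reverse_complement() compute for unambiguous DNA (A, C, G, T only). It is an illustration written for these notes, not Biopython's own implementation, which also handles IUPAC ambiguity codes. Note that ACGT happens to be its own reverse complement, which is why the slide shows the same sequence back.

```python
# Illustrative sketch (not Biopython's code): complement and reverse
# complement of an unambiguous DNA string.

# Translation table pairing each base with its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna: str) -> str:
    """Replace each base by its complementary base."""
    return dna.translate(COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse, so the result reads 5' to 3'."""
    return complement(dna)[::-1]


print(complement("ACGT"))          # TGCA
print(reverse_complement("ACGT"))  # ACGT (a palindromic sequence)
print(reverse_complement("AACG"))  # CGTT
```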

Parsing (FASTA)
FASTA format:
    >gi|2765658|emb|Z78533.1|CIZ78533
    CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAAT
    AAACGATCGAGTGAATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCC
    …
Read and display each entry:
    from Bio import SeqIO
    for seq_record in SeqIO.parse("input.fasta", "fasta"):
        print(seq_record.id)
        print(repr(seq_record.seq))
        print(len(seq_record))
Output:
    gi|2765658|emb|Z78533.1|CIZ78533
    Seq('CGTAACAAGGTTTCCGTAGGTGAA...CGC', SingleLetterAlphabet())
    740
    ...
    gi|2765564|emb|Z78439.1|PBZ78439
    Seq('CATTGTTGAGATCACATAATAATT...GCC', SingleLetterAlphabet())
    592
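
The sketch below shows the idea behind reading FASTA records: each record starts with a ">" header line, seq_record.id is the first whitespace-delimited word of that header, and the following sequence lines are joined together. The function read_fasta and the two toy records are hypothetical, written only for illustration; in practice SeqIO.parse handles this (and many edge cases) for you.

```python
# Hypothetical minimal FASTA reader illustrating what SeqIO.parse(..., "fasta")
# does conceptually. Not Biopython's parser.
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a FASTA text stream."""
    record_id: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header: emit the previous record, if any.
            if record_id is not None:
                yield record_id, "".join(chunks)
            words = line[1:].split()
            record_id = words[0] if words else ""  # id = first word of header
            chunks = []
        elif record_id is not None:
            chunks.append(line)  # sequence lines are concatenated
    if record_id is not None:
        yield record_id, "".join(chunks)


# Toy input (made up for illustration), wrapped over several lines.
example = io.StringIO(">seq1 toy example\nACGTAC\nGGTT\n>seq2\nTTA\n")
for record_id, seq in read_fasta(example):
    print(record_id, seq, len(seq))
# seq1 ACGTACGGTT 10
# seq2 TTA 3
```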

Parsing other formats
Biopython supports many formats: clustal, embl, genbank, phd, phylip, swiss, stockholm…
To parse them, you just need to change the 2nd argument:
    >>> x = SeqIO.parse("input.gbk", "genbank")
The rest works exactly the same!

Slicing
    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC
    >>> my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC", IUPAC.unambiguous_dna)
Slice with start & stop:
    >>> my_seq[4:12]
    Seq('GATGGGCC', IUPACUnambiguousDNA())
Stride with step size:
    >>> my_seq[1::3]
    Seq('AGGCATGCATC', IUPACUnambiguousDNA())
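
Seq objects follow the same zero-based, end-exclusive slicing rules as ordinary Python strings, so the results on this slide can be checked on a plain str without Biopython installed. This is a sketch for checking the indices, not a Biopython example.

```python
# Plain-string check of the slicing rules used by Seq objects.
my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"

print(len(my_seq))    # 32
print(my_seq[4:12])   # GATGGGCC    (positions 4 to 11; 12 is excluded)
print(my_seq[1::3])   # AGGCATGCATC (every third letter, starting at position 1)
print(my_seq[::-1])   # the sequence reversed (not complemented)
```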

Utils
GC content:
    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC
    >>> from Bio.SeqUtils import GC
    >>> my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC", IUPAC.unambiguous_dna)
    >>> GC(my_seq)
    46.875
Concatenation:
    >>> dna_seq1 = Seq("ACGT", IUPAC.unambiguous_dna)
    >>> dna_seq2 = Seq("ACCA", IUPAC.unambiguous_dna)
    >>> dna_seq1 + dna_seq2
    Seq('ACGTACCA', IUPACUnambiguousDNA())
WARNING: The alphabets must be compatible!
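
GC content is the percentage of bases that are G or C. For the 32-base sequence above there are 9 G's and 6 C's, so the value is 100 × 15 / 32 = 46.875. A plain-Python sketch of that arithmetic follows; the function name gc_percent is hypothetical, and unlike Biopython's GC() it counts only G and C and makes no attempt to handle ambiguity codes.

```python
# Hypothetical helper reproducing the GC-content arithmetic on a plain string.
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in seq (case-insensitive)."""
    if not seq:
        return 0.0  # avoid division by zero on an empty sequence
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(gc_percent("GATCGATGGGCCTATATAGGATCGAAAATCGC"))  # 46.875
```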

Transcription
    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC
    >>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
    >>> coding_dna
    Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
    >>> messenger_rna = coding_dna.transcribe()
    >>> messenger_rna
    Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
Complete transcription from the template DNA strand:
    >>> template_dna = coding_dna.reverse_complement()
    >>> template_dna
    Seq('CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT', …)
    >>> template_dna.reverse_complement().transcribe()
    Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', …)
Back-transcription (RNA back to coding DNA):
    >>> messenger_rna
    Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', …)
    >>> messenger_rna.back_transcribe()
    Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', …)
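
In these examples transcribe() behaves like replacing every T of the coding strand with U, and back_transcribe() does the reverse. Starting from the template strand, you first take the reverse complement to recover the coding strand. The plain-Python sketch below mirrors those three steps on strings; the function names are hypothetical and this is not Biopython's implementation.

```python
# Illustrative string-level sketch of transcription steps (not Biopython's code).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(coding_dna: str) -> str:
    """Coding-strand DNA -> mRNA: T becomes U."""
    return coding_dna.replace("T", "U")


def back_transcribe(rna: str) -> str:
    """mRNA -> coding-strand DNA: U becomes T."""
    return rna.replace("U", "T")


def transcribe_template(template_dna: str) -> str:
    """Template strand -> mRNA: reverse complement first, then transcribe."""
    coding = template_dna.translate(COMPLEMENT)[::-1]
    return transcribe(coding)


coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
template_dna = "CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT"

print(transcribe(coding_dna))             # AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
print(transcribe_template(template_dna))  # same mRNA as above
print(back_transcribe(transcribe(coding_dna)) == coding_dna)  # True
```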

Translation
    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC
    >>> mrna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG", …)
    >>> mrna
    Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', …)
    >>> mrna.translate()
    Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))
Translation also works directly on coding DNA, e.g. coding_dna.translate().
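
translate() reads the sequence three bases at a time and looks each codon up in a genetic code table, writing "*" for stop codons. The sketch below builds the standard genetic code (NCBI table 1) from its usual 64-letter encoding and reproduces the slide's result. It is a simplified illustration, not Biopython's implementation: the name translate is our own, unknown codons become "X", and trailing bases that do not form a full codon are silently dropped (Biopython warns in that case).

```python
# Simplified translation sketch using the standard genetic code (NCBI table 1).
# Not Biopython's implementation.
BASES = "UCAG"
# Amino acids for codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate RNA or coding DNA; '*' marks stop codons, 'X' unknown codons."""
    rna = seq.upper().replace("T", "U")  # accept DNA by converting T to U
    usable = len(rna) - len(rna) % 3     # ignore an incomplete trailing codon
    return "".join(CODON_TABLE.get(rna[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"))  # MAIVMGR*KGAR*
print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR*KGAR*
```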
