Practical Bioinformatics, Mark Voorhies, 5/12/2015


  1. Practical Bioinformatics, Mark Voorhies, 5/12/2015

  6. Gotchas
     Strings are quoted, names of things are not: mystring = "mystring"
     Case matters for variable names: mystring ≠ MyString
     Case matters for string comparison: "atg" ≠ "ATG"
     Normalize sequence comparison to uppercase: "ATGCTGTA".upper() == "ATgcTgTA".upper()
     (And treat RNA as cDNA)
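
The normalization advice on this slide can be sketched as a small helper. This is an illustration rather than code from the lecture: the names `normalize_seq` and `same_sequence` are hypothetical, and reading "treat RNA as cDNA" as replacing U with T is an assumption.

```python
def normalize_seq(seq):
    """Uppercase a nucleotide string and map RNA's U to DNA's T.

    Assumption: "treat RNA as cDNA" means comparing in the DNA alphabet.
    """
    return seq.upper().replace("U", "T")


def same_sequence(a, b):
    """Compare two sequences, ignoring case and the RNA/DNA alphabet."""
    return normalize_seq(a) == normalize_seq(b)


print("atg" == "ATG")                         # False: string comparison is case-sensitive
print(same_sequence("ATGCTGTA", "ATgcTgTA"))  # True
print(same_sequence("AUGCUGUA", "atgctgta"))  # True
```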

  9. Gotchas
     Statements that precede code blocks (if, def, for, while, ...) end with a colon.

         def mean(x):
             s = 0.0
             for i in x:
                 s += i
             return s / len(x)

     You can use tab and shift-tab in IPython to indent/unindent blocks of code.
     Loop variables retain their state after the loop is finished (so if you want to reuse the variable, you need to reinitialize it).
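
The point about loop variables keeping their state can be seen in a few lines. This is a minimal sketch with made-up values; the variable names are chosen only for the illustration.

```python
# After a for loop ends, the loop variable still holds its last value.
for i in [1, 2, 3]:
    pass
print(i)  # 3

# An accumulator also keeps its value, so reset it before reusing it.
total = 0.0
for value in [1.0, 2.0]:
    total += value
first_total = total   # 3.0

total = 0.0           # reinitialize; without this line the next sum would start at 3.0
for value in [10.0, 20.0]:
    total += value
second_total = total  # 30.0

print(first_total, second_total)
```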

  10. Mean

          def mean(x):
              s = 0.0
              for i in x:
                  s += i
              return s / len(x)

      Equivalently, using the built-in sum:

          def mean(x):
              return sum(x) / float(len(x))
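
One likely reason for the `float(...)` call in the second version is integer division: in Python 2, `/` between two integers discards the remainder. The sketch below assumes Python 3, where `//` shows the truncating behaviour and `/` always returns a float; the data values are made up.

```python
values = [1, 2, 4]

# Floor division: what "/" does to two integers in Python 2.
floor_result = sum(values) // len(values)

# Forcing a float divisor gives the true mean in both Python 2 and Python 3.
true_result = sum(values) / float(len(values))

print(floor_result)  # 2
print(true_result)   # roughly 2.33
```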

  12. Standard Deviation

      $$\sigma_x = \sqrt{\frac{\sum_{i}^{N} (x_i - \bar{x})^2}{N - 1}}$$

          def stdev(x):
              m = mean(x)
              s = 0.0
              for i in x:
                  s += (i - m)**2
              from math import sqrt
              return sqrt(s / (len(x) - 1))
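
As a sanity check, the slide's function can be compared with the standard library. This sketch assumes Python 3.4 or later, where the `statistics` module provides `stdev` with the same N − 1 denominator; the data values are made up.

```python
import statistics
from math import sqrt


def mean(x):
    return sum(x) / float(len(x))


def stdev(x):
    """Sample standard deviation (N - 1 denominator), as on the slide."""
    m = mean(x)
    s = 0.0
    for i in x:
        s += (i - m) ** 2
    return sqrt(s / (len(x) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(stdev(data))             # hand-written version
print(statistics.stdev(data))  # library version, should agree closely
```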

  14. Pearson's Correlation Coefficient

      $$r(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$

          def pearson(x, y):
              mx = mean(x)
              my = mean(y)
              sxy = 0.0
              ssx = 0.0
              ssy = 0.0
              for i in range(len(x)):
                  dx = x[i] - mx
                  dy = y[i] - my
                  sxy += dx * dy
                  ssx += dx**2
                  ssy += dy**2
              from math import sqrt
              return sxy / sqrt(ssx * ssy)
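
A quick way to check a correlation function is to feed it data whose answer is known in advance. The sketch below repeats the slide's `pearson` so it runs on its own, then applies it to made-up vectors that are perfectly correlated and perfectly anti-correlated.

```python
from math import sqrt


def mean(x):
    return sum(x) / float(len(x))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, as on the slide."""
    mx = mean(x)
    my = mean(y)
    sxy = ssx = ssy = 0.0
    for i in range(len(x)):
        dx = x[i] - mx
        dy = y[i] - my
        sxy += dx * dy
        ssx += dx ** 2
        ssy += dy ** 2
    return sxy / sqrt(ssx * ssy)


x = [1, 2, 3, 4]
r_up = pearson(x, [2, 4, 6, 8])    # y rises linearly with x: r should be 1
r_down = pearson(x, [8, 6, 4, 2])  # y falls linearly with x: r should be -1
print(r_up, r_down)
```

Note that the function divides by zero if either input is constant, because its sum of squared deviations is then 0.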

  15. Subject, verb that noun!
      return value = object.function(parameter, ...)
      "Object, do function to parameter"
      file = open("myfile.txt")
      file.read()
      file.readlines()
      for line in file:
      string.split() and string.join()
      file.write()
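
The methods listed on this slide can be strung together in one short round trip. This is a sketch under assumptions: it uses Python 3, writes a throwaway file in a temporary directory, and the file contents are made up for the illustration.

```python
import os
import tempfile

# Hypothetical file contents, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

with open(path, "w") as out:
    out.write("gene\tvalue\n")      # file, write this string
    out.write("YAL001C\t0.5\n")

with open(path) as handle:
    text = handle.read()            # whole file as one string

with open(path) as handle:
    lines = handle.readlines()      # list of lines, newlines included

rows = []
with open(path) as handle:
    for line in handle:             # iterate one line at a time
        fields = line.rstrip("\n").split("\t")  # string, split yourself on tabs
        rows.append(fields)

rejoined = ",".join(rows[1])        # separator string, join these fields
print(rows)
print(rejoined)
```

The `with` statement closes each file automatically, which replaces the explicit `close()` call.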

  16. Binary files are like genomic DNA
      hexdump -C computers.png
      fp = open("computers.png")
      fp.read(50)
      fp.close()
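
In Python 3, reading a binary file calls for the `"rb"` mode, since the default text mode tries to decode the bytes as characters. The sketch below is an assumption-laden stand-in for the slide's example: it builds a tiny fake file that starts with the standard 8-byte PNG signature instead of using a real `computers.png`.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed first 8 bytes of every PNG file

# Stand-in for computers.png: the signature plus a few filler bytes.
path = os.path.join(tempfile.mkdtemp(), "example.png")
with open(path, "wb") as out:
    out.write(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR")

# "rb" returns raw bytes without any decoding or newline translation.
with open(path, "rb") as fp:
    head = fp.read(50)  # at most 50 bytes; fewer if the file is shorter

print(head)
print(head.startswith(PNG_SIGNATURE))
```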

  17. Text files are like ORFs
      hexdump -C 3 4 2010.txt

  18. OS X sometimes uses CR newlines
      hexdump -C macfile.txt
      tr '\r' '\n' < macfile.txt > unixfile.txt

  19. Windows uses CRLF newlines
      hexdump -C dosfile.txt
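
The three newline conventions can also be handled from Python. This sketch uses made-up byte strings in place of real files; `str.splitlines()` recognizes LF, CR, and CRLF, and the two `replace` calls are one possible Python counterpart to the `tr` command above that also copes with CRLF.

```python
unix_bytes = b"a\tb\nc\td\n"      # LF
mac_bytes = b"a\tb\rc\td\r"       # CR
dos_bytes = b"a\tb\r\nc\td\r\n"   # CRLF

# splitlines() treats all three conventions as line boundaries.
parsed = [raw.decode("ascii").splitlines()
          for raw in (unix_bytes, mac_bytes, dos_bytes)]
print(parsed)

# Convert to UNIX newlines: handle CRLF first so it does not become two LFs.
converted = [raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
             for raw in (mac_bytes, dos_bytes)]
print(converted)
```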

  20. supp2data.csv
      Diagram: CSV File

  21. open("supp2data.csv")
      Diagram: CSV File → File object

  22. open("supp2data.csv").next()
      Diagram: CSV File → File object → single line

  23. open("supp2data.csv").read()
      Diagram: CSV File → File object → single line, whole file

  24. csv.reader(open("supp2data.csv")).next()
      Diagram: CSV File → File object → reader → list

  25. csv.reader(urlopen("http://example.com/csv")).next()
      Diagram: CSV File → Web service → urllib object → reader → list
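
The chain from file to file object to reader to list can be traced step by step. The slides call the `.next()` method, which is Python 2 syntax; this sketch assumes Python 3 and uses the built-in `next()` instead. The file here is a made-up stand-in, not the real supp2data.csv.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for supp2data.csv with made-up contents.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(path, "w", newline="") as out:
    out.write('ORF,NAME,cond1\nYAL001C,"TFC3, subunit",0.5\n')

with open(path, newline="") as handle:
    first_line = next(handle)    # file object -> one raw line (a string)

with open(path, newline="") as handle:
    whole_file = handle.read()   # file object -> the whole file (a string)

with open(path, newline="") as handle:
    reader = csv.reader(handle)
    header = next(reader)        # reader -> one parsed row (a list)
    first_row = next(reader)

print(repr(first_line))
print(header)
print(first_row)
```

The quoted field shows why a CSV reader is preferable to `split(",")`: the comma inside `"TFC3, subunit"` stays within a single cell.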

  26. The CDT file format
      Minimal CLUSTER input
      Cluster3 CDT output
      Tab delimited (\t)
      UNIX newlines (\n)
      Missing values → empty cells
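
The three format rules (tabs, UNIX newlines, empty cells for missing values) translate into a few lines of parsing. This is a sketch for a single made-up row, not a full CDT parser: the helper name `parse_cell` is hypothetical, and the column layout (name, annotation, then values) is an assumption for illustration only.

```python
def parse_cell(cell):
    """Convert one tab-delimited cell to a float, or None if it is empty."""
    cell = cell.strip()
    return float(cell) if cell else None


# Made-up row: the second value and the final value are missing.
line = "YAL001C\tTFC3\t0.12\t\t-0.5\t\n"

# Strip only the newline: a plain strip() would also eat the trailing tab
# and silently drop the last (empty) cell.
fields = line.rstrip("\n").split("\t")
name, annotation = fields[0], fields[1]
ratios = [parse_cell(c) for c in fields[2:]]
print(name, annotation, ratios)
```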

  27. Homework
      1. Try reading the first few bytes of different files on your computer. Can you distinguish binary files from text files?
      2. Create a simple data table in your favorite spreadsheet program and save it in a text format (e.g., save as CSV or tab-delimited text from Excel [1]). Practice reading the data from Python.
      3. Write a function to dissect supp2data.cdt into three lists of strings (gene names, gene annotations, and experimental conditions) and one matrix (list of lists) of log ratio values (as floats, using None or 0. to represent missing values).
      4. If you are familiar with Python classes, write a CDT class based on the parse in the previous exercise. Provide methods for looking up annotations and log ratios by gene name.
      [1] Note for Mac users: Excel will offer you Macintosh and DOS/Windows text formats. Choose DOS/Windows; otherwise, Python will think that the entire file is a single line.
