
Bioinformatics Vocabulary: Processing, analyzing, experimenting with data



  1. Bioinformatics Vocabulary
     - Processing, analyzing, experimenting with data
     - Where does the data come from?
     - How do we get it?
     - What does it mean?
     - What do we do with it?
     - From nucleotide to protein to gene
     - Identification is important
     - Annotation is important
     2.1 Genome Revolution: COMPSCI 004G

  2. What does DNA (data) look like?
     - TGAAC v ACTTG
     - Which direction is right?
     - What is a base-pair? A nucleotide?
     - What is a protein, and how is it coded?
     - Identification?
     - What is an amino acid?
     - Codon? Coding?
     - Why are proteins important?
     - Finding? Using? …
     http://www.blc.arizona.edu/Molecular_Graphics/DNA_Structure/DNA_Tutorial.HTML
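
The pair TGAAC v ACTTG on this slide reads as one strand and its base-paired partner (A pairs with T, C pairs with G). The following is a minimal sketch of that pairing rule, not code from the course: the class name `StrandDemo` and its methods are hypothetical, and upper-case input is assumed. The reversed version is included because the two strands are conventionally read in opposite directions.

```java
// Sketch only: names here are illustrative assumptions, not from the slides.
class StrandDemo {
    // Return the base that pairs with the given base (upper case assumed).
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:
                throw new IllegalArgumentException("not a DNA base: " + base);
        }
    }

    // Pair every base with its partner, keeping left-to-right order.
    static String complementStrand(String dna) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < dna.length(); k++) {
            sb.append(complement(dna.charAt(k)));
        }
        return sb.toString();
    }

    // The partner strand read in the opposite direction.
    static String reverseComplement(String dna) {
        return new StringBuilder(complementStrand(dna)).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(complementStrand("TGAAC"));  // prints ACTTG
        System.out.println(reverseComplement("TGAAC")); // prints GTTCA
    }
}
```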

  3. How do we get CGATC into software?
     http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/sequencing.html

  4. From Shotgun to Gene
     - Comparing two approaches
       - HGP: Human Genome Project
       - Celera Genomics
     - Why was there a race? Is the race over?
     - Who owns the data?
     - What public good does the data serve?
     - Should scientists be concerned about public policy?
     - Was the Manhattan Project like the HGP?

  5. What is a program? What is code?
     - Instructions in a language a computer executes
     - Languages have different characteristics, strengths, weaknesses
       - Scheme, BASIC, C++, Fortran, Java, Perl, PHP, …
     - Computer executes one instruction at a time
       - Memory and state of machine change
       - Execute the next instruction
       - Repeat
       - Stop, run out of memory, pull plug, …

  6. From browser to genome analysis
     - Netscape, first widely distributed browser
       - Who wrote it?
       - What operating systems did it run on?
       - What does it mean for a program to run?
     - When you execute a Google query, what happens?
       - Where does code run?
       - How do you see the results?
     - Search at NCBI, jimwatsonsequence, …
       - Where does the code execute?

  7. Writing a program
     - Create the program using a computer language
       - Design, test, document, maintain, …
     - Test and debug the program
       - Does the program do what you want?
       - How do you know what the program does?
       - How do you fix it?
     - What skills are needed?

  8. More on understanding programs
     - You write code in Java, or Perl, or C++ or PHP or …
     - The code must run/execute somewhere
     - You must understand what it does (how?)
     - In your mind and on paper, simulate/understand the computer's execution of your code
       - What you wrote, not what you meant
       - How do you make a drawing?

  9. Creating a Program
     - Specify the problem
       - remove ambiguities
       - identify constraints
     - Develop algorithms, design classes, design software architecture
     - Implement program
       - revisit design
       - test, code, debug
       - revisit design
     - Documentation, testing, maintenance of program
     - From ideas to electrons

  10. Writing and Understanding Java
     - Language-independent skills in programming
       - What is a loop, how do you design a program?
       - What is an array, how do you access files?
     - However, writing programs in any language requires understanding the syntax and semantics of the programming language
     - Syntax is similar to rules of spelling and grammar:
       - i before e except after c
       - Two spaces after a period, then use a capital letter

  11. Syntax and Semantics
     - Semantics is what a program (or English sentence) means
       - You ain't nothing but a hound dog.
       - La chienne de ma tante est sur votre tete.
     - At first it seems like the syntax is hard to master, but the semantics are much harder
     - Natural languages are more forgiving than programming languages.

  12. Toward an Understanding of Java
     - Traditional first program; it doesn't convey the power of computing, but it illustrates the basic components of a simple program

         public class SayHello {
             // traditional first program
             public static void main(String[] args) {
                 System.out.println("Hello World!");
             }
         }

     - This program must be edited/typed, compiled, and executed

  13. How Things Work: PrintLots.java

         public class PrintLots {
             // …
             public void once(){
                 twice();
                 twice();
             }
             public static void main(String[] args){
                 PrintLots printer = new PrintLots();
                 printer.once();
             }
         }
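
The slide elides the part of PrintLots that defines `twice()`, so the class as shown does not compile by itself. The sketch below fills that gap with an assumed body purely to make the call chain traceable: `twice()`, `say()`, the printed text, and the `linesPrinted` counter are all hypothetical, and the class is renamed `PrintLotsSketch` to avoid implying it is the course's file.

```java
// Sketch only: twice(), say(), and the counter are assumptions, not from the slide.
class PrintLotsSketch {
    private int linesPrinted = 0; // counts output lines so the trace can be checked

    // Assumed helper: prints one line and records that it did so.
    private void say() {
        System.out.println("hello");
        linesPrinted++;
    }

    // Assumed body for the method the slide elides.
    public void twice() {
        say();
        say();
    }

    // As on the slide: once() calls twice() two times.
    public void once() {
        twice();
        twice();
    }

    public int getLinesPrinted() {
        return linesPrinted;
    }

    public static void main(String[] args) {
        PrintLotsSketch printer = new PrintLotsSketch();
        printer.once(); // main -> once -> twice (x2) -> say (x4): four lines printed
    }
}
```

Tracing by hand, `main` calls `once()`, which calls `twice()` two times, each of which calls `say()` two times, so four lines appear under these assumptions.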

  14. Java Vocabulary
     - Variable, object, identifier, method, call
       - Name of something: object or method
       - The car starts, the dog barks, I speak
       - Invoke or call method: method lives in object (or in a class)
     - An object is an instance of a class
       - My car is a Volvo 850, yours is a BMW …
       - My car starts, yours stops: v850.start(); bmw.stop();
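
To connect the vocabulary above to code, here is a minimal sketch of a class with two objects and a few method calls. The `Car` class, its fields, and its method names are hypothetical illustrations chosen to match the car example; nothing here is taken from the course materials.

```java
// Sketch only: Car and its members are illustrative assumptions.
class Car {
    private final String model;      // state stored inside each object
    private boolean running = false;

    Car(String model) {
        this.model = model;
    }

    void start() { running = true; }   // a method that changes the object's state
    void stop()  { running = false; }

    boolean isRunning() { return running; }
    String getModel()   { return model; }

    public static void main(String[] args) {
        Car v850 = new Car("Volvo 850"); // v850 is an identifier naming one object
        Car bmw = new Car("BMW");        // a second instance of the same class

        v850.start();                    // call a method that lives in the object
        bmw.start();
        bmw.stop();

        System.out.println(v850.getModel() + " running: " + v850.isRunning()); // true
        System.out.println(bmw.getModel() + " running: " + bmw.isRunning());   // false
    }
}
```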

  15. Methods/Functions can return values
     - What does the square root function do?
       - When called with parameters of 4, 6.2, -1
     - What does the method getGcount() return?

         public class DNAstuff {
             public int getGcount(String dna) {
                 int total = 0;
                 for(int k=0; k < dna.length(); k++){
                     if (dna.charAt(k) == 'g'){
                         total = total + 1;
                     }
                 }
                 return total;
             }
         }
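
The questions above can be explored by calling the methods and printing what they return. This is a small sketch rather than course code: the class name `ReturnValuesDemo` and the test strings are assumptions, and `getGcount` is copied from the slide but made `static` so it can be called without creating an object. `Math.sqrt` is the standard library square root.

```java
// Sketch only: class name and inputs are illustrative assumptions.
class ReturnValuesDemo {
    // Same counting loop as getGcount() on the slide, made static for brevity.
    static int getGcount(String dna) {
        int total = 0;
        for (int k = 0; k < dna.length(); k++) {
            if (dna.charAt(k) == 'g') {
                total = total + 1;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(Math.sqrt(4));    // prints 2.0
        System.out.println(Math.sqrt(6.2));  // roughly 2.49
        System.out.println(Math.sqrt(-1));   // prints NaN (not a number)

        System.out.println(getGcount("gattaca")); // prints 1
        System.out.println(getGcount("GATTACA")); // prints 0: 'G' is not 'g'
    }
}
```

Note that the comparison with `'g'` is case-sensitive, so upper-case sequence data would need to be lower-cased first (or the test extended) for the count to be meaningful.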

  16. Lydia Kavraki
     - Awards
       - Grace Murray Hopper
       - Brilliant 10
     "I like to work on problems that will generally improve the quality of our life."
     What's the thing you love most about science?
     "Working with students and interacting with people from diverse intellectual backgrounds. Discovery and the challenge of solving a tough problem, especially when it can really affect the quality of our lives. I find the whole process energizing."

  17. John Kemeny (1926-1992)
     - Co-invented BASIC (with Thomas Kurtz), assistant to Einstein, Professor and President of Dartmouth
     "If you have a large number of unrelated ideas, you have to get quite a distance away from them to get a view of all of them, and this is the role of abstraction."
     "...it is the greatest achievement of a teacher to enable his students to surpass him."

  18. Anatomy of for-loop

         String s = new String("AGTCCG");
         String rs = new String("");
         for(int k=0; k < 3; k++){
             rs = rs + s.charAt(k);
         }

     - Initialization happens once
     - Loop test evaluated
       - If true, body executes
       - If false, skip to after loop
     - After loop body, increment executed and test re-evaluated
     - What should be true about test?
       - What about body?
       - What about together?
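
One way to see the order of initialization, test, body, and increment is to print the loop variable and the growing string on every pass. The sketch below wraps the slide's loop in a hypothetical method `firstThree` inside a hypothetical class `ForLoopTrace`; the print statement is an addition for tracing only.

```java
// Sketch only: class and method names are illustrative assumptions.
class ForLoopTrace {
    // Builds a string from the first three characters of s, tracing each pass.
    static String firstThree(String s) {
        String rs = "";
        for (int k = 0; k < 3; k++) {      // init once; test before every pass
            rs = rs + s.charAt(k);         // body
            System.out.println("k=" + k + " rs=" + rs);
        }                                  // increment, then the test runs again
        return rs;
    }

    public static void main(String[] args) {
        // Prints: k=0 rs=A, then k=1 rs=AG, then k=2 rs=AGT
        String result = firstThree("AGTCCG");
        System.out.println(result);        // prints AGT
    }
}
```

The test `k < 3` is checked four times here (for k = 0, 1, 2, 3) while the body runs three times; the loop stops because the increment eventually makes the test false.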

  19. Program Style
     - People who use your program don't read your code
       - You'll write programs to match user needs
     - People who maintain or modify your program do read code
       - Must be readable, understandable without you next door
       - Use a consistent programming style, adhere to conventions
     - Identifiers are names of functions, parameters, (variables, classes, …)
       - Sequence of letters, numbers, underscore __ characters
       - Cannot begin with a number (we won't begin with __)
       - big_head vs. BigHead, we'll use AlTeRnAtInG format
       - Make identifiers meaningful, not droll and witty

  20. Equality of values and objects

         int x = 3*12;
         if (x == 36) { /* is executed */ }
         String s = new String("genetic");
         String t = s.substring(0,4);
         if (t == "gene") { /* not executed */ }
         if (t.equals("gene")) { /* is executed */ }

     - Primitive types are boxes
     - Object types are labels on boxes
     - If we don't call new, there's no box for the label
       - No box is called null; it means no object is referred to or referenced by the variable/pointer/reference
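
The difference between `==` and `equals` on this slide can be checked directly. The following is a minimal sketch with hypothetical names (`EqualityDemo`, `sameObject`, `sameContents`); it relies only on standard `String` behavior, where `substring(0,4)` on a longer string produces a new object.

```java
// Sketch only: class and helper names are illustrative assumptions.
class EqualityDemo {
    // True only when both labels are on the very same box (same object).
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // True when the boxes hold the same characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        int x = 3 * 12;
        System.out.println(x == 36);                 // true: primitives compare values

        String s = new String("genetic");
        String t = s.substring(0, 4);                // a new String holding "gene"
        System.out.println(sameObject(t, "gene"));   // false: two different boxes
        System.out.println(sameContents(t, "gene")); // true: same characters

        String u = null;                             // a label with no box
        System.out.println("gene".equals(u));        // false, and no exception
    }
}
```

Calling `u.equals("gene")` instead would throw a `NullPointerException`, since there is no object for the method to live in.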

  21. Objects and values
     - Primitive variables are boxes
       - think memory location with value
     - Object variables are labels that are put on boxes

         String s = new String("genome");
         String t = new String("genome");
         if (s == t) { /* they label the same box */ }
         if (s.equals(t)) { /* contents of boxes the same */ }

     - What's in the boxes? "genome" is in both boxes: one labeled s, one labeled t

  22. Objects, values, classes
     - For primitive types: int, char, double, boolean
       - Variables have names and are themselves boxes (metaphorically)
       - Two int variables assigned 17 are equal with ==
     - For object types: String, Sequence, others
       - Variables have names and are labels for boxes
       - If no box is assigned or created, then the label is applied to null
       - Can assign label to existing box (via another label)
       - Can create new box using new
     - Object types are references or pointers or labels to storage
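
The label-and-box picture is easiest to see with an object whose contents can change. The sketch below is an illustration under assumptions: it uses the standard `StringBuilder` class (not mentioned on the slides) because `String` contents cannot be modified, and the names `LabelsDemo` and `appendViaAlias` are hypothetical.

```java
// Sketch only: StringBuilder is swapped in to make shared boxes visible.
class LabelsDemo {
    // Puts a second label on the same box, changes the box through the second
    // label, then reads it back through the first label.
    static String appendViaAlias(String start, String suffix) {
        StringBuilder s = new StringBuilder(start); // new creates a box
        StringBuilder t = s;                        // second label, same box
        t.append(suffix);                           // change made via t
        return s.toString();                        // visible via s as well
    }

    public static void main(String[] args) {
        int a = 17;
        int b = 17;
        System.out.println(a == b);                        // true: equal values

        System.out.println(appendViaAlias("genome", "s")); // prints genomes

        StringBuilder u = null;                            // label applied to null
        System.out.println(u == null);                     // true: no box yet
    }
}
```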

  23. Don Knuth (The Art of Computer Programming)
     "My feeling is that when we prepare a program, it can be like composing poetry or music; as Andrei Ershov has said, programming can give us both intellectual and emotional satisfaction, because it is a real achievement to master complexity and to establish a system of consistent rules."
     "We have seen that computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty."
