1/22/09
CSCI1950-Z Computational Methods for Biology* (*Working Title)
Lecture 1
Ben Raphael
January 21, 2009

Course Particulars
• Three major topics
  1. Phylogeny: ~50% of lectures
  2. Functional Genomics: ~25% of lectures
  3. Network/Systems Biology: ~25% of lectures
• Tools
  – Computer Science: Algorithms and discrete math (e.g. graph theory), Programming
  – Mathematics: Discrete Probability, Linear algebra (vectors and matrices)
  – Biology: Basics. (What is DNA?)
Course Particulars
• Webpage: http://cs.brown.edu/courses/csci1950-z/
  – Readings (including some background material)
• Textbook: None
• Assignments: mens et manus
  1. 4 written assignments: ~40% of grade
  2. 3 programming assignments: ~40% of grade
  3. Take home final: ~20% of grade

Graduate credit
• Extra assignment/project
  – Talk to me before March 1

Survey

Topic 1: Phylogeny
Early Evolutionary Studies
200th anniversary of the birth of Charles Darwin
From On the Origin of Species (1859)
(Timeline labels: Darwin, 1960's)
• Anatomical features were the dominant criteria used to derive evolutionary relationships between species.
• Imprecise, often subjective, observations often led to inconclusive, contradictory, or incorrect evolutionary relationships between species.
• Molecular data (DNA and protein sequences) dramatically improved the situation.
Species Trees
Is a panda more closely related to a bear or a raccoon?
[Table comparing Bear and Raccoon by Looks, Hibernation Pattern, …; cell values not recoverable]
~100 years of arguments
Tree derived from DNA sequence data. Steven O'Brien et al. (1985)

Human Evolutionary History
From: Molecular Evolution: A Phylogenetic Approach, R. Page & E. Holmes
More Recent Human History
Out of Africa Hypothesis: Most ancient ancestor lived in Africa roughly 200,000 years ago
[Figure with numbered labels 1 to 5. Source: http://www.becominghuman.org]

The Origin of Humans: "Out of Africa" vs Multiregional Hypothesis
Out of Africa:
  – Humans evolved in Africa ~200,000 years ago
  – Humans migrated out of Africa, replacing other humanoids around the globe
Multiregional:
  • Humans evolved in the last two million years as a single species. Independent appearance of modern traits in different areas
  • Humans migrated out of Africa, mixing with other humanoids on the way
Human Evolutionary Tree
DNA-based reconstruction of the human evolutionary tree
http://www.mun.ca/biology/scarr/Out_of_Africa2.htm

Evolutionary Tree of Humans (mtDNA)
• African population is the most diverse (sub-populations had more time to diverge)
• Evolutionary tree separates one group of Africans from a group containing all five populations.
• Tree rooted on branch between groups of greatest difference.
Vigilant, Stoneking, Harpending, Hawkes, and Wilson (1991)
Evolutionary Tree of Humans (microsatellites)
• Neighbor joining tree for 14 human populations genotyped with 30 microsatellite loci.

Lineage of Genghis Khan?
In humans, the Y chromosome is passed down from the father only, so it can be used to identify paternal lineages.
~8% of males in parts of Asia and 0.5% world-wide are estimated to be descendants of a resident of Mongolia ~1000 years ago (Zerjal et al. AJHG 2003).
Lafayette, Louisiana, 1994:
• A woman claimed her ex-lover (who was a physician) injected her with HIV+ blood
• Records show the physician had drawn blood from an HIV+ patient that day
• Is there a way to show that blood from that HIV+ patient ended up in the woman?

HIV Transmission
• HIV has a high mutation rate, which can be used to trace paths of transmission
• Two people who were infected from different sources will have very different HIV sequences
[Figure: Alignment of fourteen amino acid sequences from V3 region of HIV-1 gp120 genes. Azizi et al. BMC Immunology 2006 7:25]
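One simple way to quantify how different two aligned sequences are is the fraction of positions at which they differ (often called the p-distance). The sketch below is only an illustration: the function name `p_distance` and the toy sequences are made up, gaps are not handled, and this is not necessarily the measure used in the case described above.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


# Hypothetical toy sequences (not real HIV data).
source = "ACGTACGTAC"
close_relative = "ACGTACGAAC"   # differs from source at 1 of 10 positions
unrelated = "TCGAACGTTG"        # differs from source at 4 of 10 positions

print(p_distance(source, close_relative))  # 0.1
print(p_distance(source, unrelated))       # 0.4
```

A table of such pairwise values is one form of the "pairwise distances" input that distance-based tree methods start from.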
To the Lab!
Wet lab
• Take multiple samples from the patient, the woman, and controls (non-related HIV+ people)
• Obtain DNA sequence from two HIV genes (gp120 and RT).
Computer lab
• Build phylogenetic tree from the DNA sequences.

Phylogenetic Tree Conviction
• Three different tree reconstruction techniques used.
• In every reconstruction, victim's sequences were related to patient's sequences.
• Nesting of the victim's sequences within the patient's sequences indicated the direction of transmission was from patient to victim
• First time phylogenetic analysis was used in a court case as evidence (Metzker et al., 2002)
Phylogenetic Trees
How to build a phylogenetic tree from data?
Data
  1. Characters/Features
  2. Pairwise distances
(Diagram label: Algorithm)

Phylogenetic Trees
What is a phylogenetic tree?
Biology definition:
• None (picture)
• A "branching diagram…"
• Intuition:
  – Leaves represent existing species.
  – Branch points represent the most recent common ancestor.
  – Lengths of branches represent evolutionary time.
  – Root represents the oldest evolutionary ancestor.
Phylogenetic Trees
What is a phylogenetic tree?
Computer science definition
tree: A connected acyclic graph G = (V, E).
graph: A set V of vertices and a set E of edges, where each edge connects a pair of vertices.

Tree Definitions
tree: A connected acyclic graph G = (V, E).
graph: A set V of vertices and a set E of edges, where each edge (v_i, v_j) connects a pair of vertices.
A path in G is a sequence (v_1, v_2, …, v_n) of vertices in V such that (v_i, v_{i+1}) are edges in E.
A graph is connected provided for every pair v_i, v_j of vertices, there is a path between v_i and v_j.
A cycle is a path with the same starting and ending vertices.
A graph is acyclic provided it has no cycles.
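The definitions above can be checked mechanically. The following is a minimal sketch, assuming an undirected graph given as a list of vertices and a list of edge pairs; the name `is_tree` and the example graphs are illustrative. It uses the standard fact that a connected graph on |V| vertices is acyclic exactly when it has |V| - 1 edges.

```python
from collections import defaultdict


def is_tree(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is connected and acyclic."""
    vertices = list(vertices)
    if not vertices:
        return False  # convention chosen here: the empty graph is not a tree
    # A connected graph on |V| vertices is acyclic exactly when it has |V| - 1 edges.
    if len(edges) != len(vertices) - 1:
        return False
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Depth-first search from an arbitrary vertex to test connectivity.
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


print(is_tree("abcd", [("a", "b"), ("b", "c"), ("b", "d")]))  # True
print(is_tree("abc", [("a", "b"), ("b", "c"), ("c", "a")]))   # False: contains a cycle
print(is_tree("abcd", [("a", "b"), ("b", "c"), ("c", "a")]))  # False: d is not connected
```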
Tree Definitions
tree: A connected acyclic graph G = (V, E).
The degree of a vertex v is the number of edges incident to v.
A phylogenetic tree is a tree with a label for each leaf (vertex of degree one).
A binary phylogenetic tree is a phylogenetic tree where every interior (non-leaf) vertex has degree 3; i.e. two children.
A rooted (*binary) phylogenetic tree is a phylogenetic tree with a single designated vertex r (* of degree 2).

Rooted and Unrooted Trees
In the unrooted tree the position of the root ("oldest ancestor") is unknown. Otherwise, they are like rooted trees.
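A small sketch of the degree-based definitions, assuming the edge list is already known to form a tree. The function names and the two example trees (a four-leaf binary tree with interior vertices x and y, and a four-leaf star) are hypothetical illustrations.

```python
from collections import Counter


def degrees(edges):
    """Map each vertex to the number of edges incident to it."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def leaves(edges):
    """Vertices of degree one, in sorted order."""
    return sorted(v for v, d in degrees(edges).items() if d == 1)


def is_binary_unrooted(edges):
    """True if every interior (non-leaf) vertex has degree 3.

    Assumes the edges already form a tree; that is not re-checked here.
    """
    return all(d in (1, 3) for d in degrees(edges).values())


# Unrooted binary tree on leaves A, B, C, D with interior vertices x, y.
quartet = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
# Star tree: one interior vertex c of degree 4, so not binary.
star = [("A", "c"), ("B", "c"), ("C", "c"), ("D", "c")]

print(leaves(quartet))              # ['A', 'B', 'C', 'D']
print(is_binary_unrooted(quartet))  # True
print(is_binary_unrooted(star))     # False
```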
Evaluating Different Phylogenies
Character   Value1   Value2
Mouth       Smile    Frown
Eyebrows    Normal   Pointed

Character-Based Tree Reconstruction
Which tree is better?
Character-Based Tree Reconstruction
Count changes on tree

Character-Based Tree Reconstruction
Parsimony: minimize number of changes on edges of tree
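For a fixed tree, the minimum number of changes can be counted with Fitch's algorithm, a standard method for this "small parsimony" problem; the slides do not say which method the course uses, so treat this as one possible sketch. The tree encoding (nested pairs), the species names A to D, and the character values assigned to them are all hypothetical.

```python
def fitch(tree, states):
    """Return (candidate state set, minimum number of changes) for one character.

    tree:   a leaf name (str) or a pair (left_subtree, right_subtree)
    states: dict mapping each leaf name to its observed state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no change needed at this vertex.
        return common, left_cost + right_cost
    # Children cannot agree: one change is forced on an edge below this vertex.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, characters):
    """Total number of changes over all characters."""
    return sum(fitch(tree, states)[1] for states in characters)


# Hypothetical data for four species and the two characters Mouth and Eyebrows.
mouth = {"A": "smile", "B": "smile", "C": "frown", "D": "frown"}
eyebrows = {"A": "normal", "B": "normal", "C": "normal", "D": "pointed"}

tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

print(parsimony_score(tree1, [mouth, eyebrows]))  # 2
print(parsimony_score(tree2, [mouth, eyebrows]))  # 3
```

Under the parsimony criterion, tree1 would be preferred for this toy data because it needs fewer changes.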
Character-Based Tree Reconstruction
Maximum Likelihood: Given Pr[change], what is the tree with maximum probability?

Identifying Highest Scoring Tree
• Naïve, exhaustive algorithm: check all trees.
• How many possibilities?
  – Restrict to binary trees.
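The slide leaves the count as a question. A standard counting argument (each new leaf can be attached to any edge of the current tree) gives (2n-5)!! unrooted and (2n-3)!! rooted binary trees on n labeled leaves. The sketch below computes these numbers; the function names are illustrative.

```python
def num_unrooted_binary_trees(n):
    """Number of unrooted binary trees on n >= 3 labeled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1  # exactly one unrooted binary tree on 3 leaves
    for k in range(4, n + 1):
        # A binary tree on k - 1 leaves has 2(k - 1) - 3 = 2k - 5 edges,
        # and leaf k can be attached to any one of them.
        count *= 2 * k - 5
    return count


def num_rooted_binary_trees(n):
    """Number of rooted binary trees on n >= 2 labeled leaves: (2n - 3)!!.

    Rooting a tree is equivalent to attaching one extra 'root' leaf.
    """
    if n < 2:
        raise ValueError("need at least 2 leaves")
    return num_unrooted_binary_trees(n + 1)


for n in (4, 5, 10):
    print(n, num_unrooted_binary_trees(n), num_rooted_binary_trees(n))
# 4 3 15
# 5 15 105
# 10 2027025 34459425
```

The count already exceeds two million unrooted trees for 10 leaves, which is why checking all trees is only feasible for very small inputs.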
Phylogenetic Trees
How to efficiently build trees from data?
Data
  1. Characters/Features
  2. Pairwise distances
[Figure: example trees with leaves labeled 1 to 5]

Phylogenetic Trees
How to efficiently build trees from data?
Methods
  1. Characters/Features
     • Parsimony: Minimum number of changes
     • Probabilistic Model
  2. Pairwise distances
     • Clustering (UPGMA, Neighbor joining, …)
[Figure: example trees with leaves labeled 1 to 5]
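As an illustration of the distance-based route, here is a minimal UPGMA sketch: repeatedly merge the two closest clusters and define the distance from the merged cluster to every other cluster as the average leaf-to-leaf distance. It returns only the tree topology as nested pairs and omits branch lengths. The input format (a dict keyed by `frozenset` pairs) and the toy distances are assumptions made for this example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster leaves by UPGMA and return the rooted topology as nested pairs.

    names: list of leaf labels
    dist:  dict mapping frozenset({a, b}) to the distance between leaves a and b
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves in it
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # Size-weighted mean keeps d equal to the average leaf-to-leaf distance.
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical pairwise distances between four taxa.
toy = {
    frozenset(("A", "B")): 2, frozenset(("C", "D")): 4,
    frozenset(("A", "C")): 8, frozenset(("A", "D")): 8,
    frozenset(("B", "C")): 8, frozenset(("B", "D")): 8,
}
print(upgma(["A", "B", "C", "D"], toy))  # (('A', 'B'), ('C', 'D'))
```

UPGMA implicitly assumes that all leaves are equally far from the root; neighbor joining relaxes that assumption and is not shown here.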
Additional Models and Extensions
• Comparing trees
  – Distances between trees.
  – Statistical tests: bootstrap, permutation tests, etc.
• Supertrees and consensus
• Gene trees vs. species trees.
• Whole-genome phylogeny.

Topic 2: Functional Genomics
Biology 101

Biology 101
Central Dogma
What can we measure?
[Figure annotations, in order of appearance:]
• Sequencing (expensive)
• Hybridization (noisy)
• Sequencing (expensive)
• Hybridization (noisy)
• Mass spectrometry (noisy)
• Hybridization (very noisy!)

DNA Basepairing
DNA Microarrays

Clustering of Gene Expression
Each microarray experiment: expression vector u = (u_1, …, u_n), where u_i = expression value of gene i.
Group similar vectors.
[Figure: heat map with axis labels "Samples" and "Gene expression". BMC Genomics 2006, 7:279]
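"Similar" needs a concrete measure. A minimal sketch, assuming Euclidean distance between expression vectors (correlation-based measures are another common choice); the sample names and values are made up.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest_pair(vectors):
    """Return the names of the two samples whose vectors are closest.

    vectors: dict mapping sample name -> expression vector
    """
    return min(
        combinations(sorted(vectors), 2),
        key=lambda pair: euclidean(vectors[pair[0]], vectors[pair[1]]),
    )


# Hypothetical expression values for three samples over three genes.
samples = {
    "s1": (1.0, 2.0, 3.0),
    "s2": (1.1, 2.1, 2.9),
    "s3": (5.0, 0.5, 0.2),
}
print(closest_pair(samples))  # ('s1', 's2')
```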
Clustering
• Clustering algorithms are related to distance-based phylogenetic algorithms.
• Phylogeny gives a grouping of related data points.
[Figure: data points labeled 1 to 5 and the corresponding tree]

Classification
Binary classification
Given a set of examples (x_i, y_i), where y_i = ±1, from an unknown distribution D.
Design a function f: R^n → {-1, +1} that assigns additional samples x to one of two classes optimally.
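One of the simplest choices for f is a nearest-neighbor rule: give a new sample the label of the closest training example. The sketch below makes no claim of optimality; the function name and training data are hypothetical.

```python
def nearest_neighbor(train, x):
    """Return the label (-1 or +1) of the training example closest to x.

    train: list of (vector, label) pairs
    x:     vector of the same length as the training vectors
    """
    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    _, label = min(train, key=lambda example: squared_distance(example[0], x))
    return label


# Hypothetical training set in R^2.
train = [
    ((0.0, 0.0), -1),
    ((0.0, 1.0), -1),
    ((3.0, 3.0), +1),
    ((4.0, 3.0), +1),
]
print(nearest_neighbor(train, (0.5, 0.2)))  # -1
print(nearest_neighbor(train, (3.5, 2.5)))  # 1
```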
Topics
• Methods for Clustering
  – Hierarchical, Matrix-based (PCA), Graph-based (Clique-finding)
• Methods for Classification
  – Nearest neighbors, support vector machines
• Data Integration: Bayesian Networks

Topic 3: Network and Systems Biology
Biological Interaction Networks
Many types:
• Protein-DNA (regulatory)
• Protein-metabolite (metabolic)
• Protein-protein (signaling)
• RNA-RNA (regulatory)
• Genetic interactions (gene knockouts)

Regulatory Networks
Cis-regulatory Network

Metabolic Networks
Nodes = reactants
Edges = reactions, labeled by the enzyme (protein) that catalyzes the reaction
Protein-Protein Interaction (PPI) Network

Protein-Protein Interaction Network?
• Proteins are nodes
• Interactions are edges
• Edges may have weights
[Figure: Yeast PPI network. H. Jeong et al. Nature 411, 41 (2001)]
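A PPI network of this kind maps directly onto a weighted undirected graph. A minimal sketch using a dictionary of dictionaries; the protein names P1 to P4 and the weights are placeholders, not real interaction data.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an adjacency map from (protein_a, protein_b, weight) triples."""
    network = defaultdict(dict)
    for a, b, weight in interactions:
        network[a][b] = weight
        network[b][a] = weight  # interactions are undirected
    return network


def degree(network, protein):
    """Number of interaction partners of a protein (0 if it is not in the network)."""
    return len(network.get(protein, {}))


# Hypothetical weighted interactions.
ppi = build_network([
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.4),
    ("P1", "P4", 0.7),
    ("P2", "P3", 0.8),
])

print(degree(ppi, "P1"))                       # 3
print(max(ppi, key=lambda p: degree(ppi, p)))  # P1 (most interaction partners)
print(ppi["P2"]["P3"])                         # 0.8
```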