Blast summary
• Basic ideas:
  • Alignment (global/local/affine gaps)
  • Scoring matrices (DNA; AA: PAM, BLOSUM62), position specific (later in the course)
  • p-value
  • Seed selection, algorithms for keyword search
• Flavors: blastn, blastx, tblastn, …
• Other variants: psi-blast (later in the course)
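The seed selection idea can be sketched in a few lines. This is a minimal illustration, not BLAST's actual implementation: it indexes every length-k word of the subject and reports exact word matches with the query. The function names and the default word length are assumptions made for this example; protein BLAST additionally accepts "neighborhood" words that score above a threshold, which is left out here.

```python
from collections import defaultdict


def build_word_index(subject, k):
    """Map each length-k word of `subject` to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = build_word_index(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        # .get avoids adding empty entries to the defaultdict on a miss
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


if __name__ == "__main__":
    # Both seeds lie on the same diagonal (subject_pos - query_pos = 2),
    # which is the kind of cluster an extension step would try to grow.
    print(find_seeds("ACGT", "TTACGTT", k=3))
```

Building the index once and probing it with each query word keeps the search linear in the two sequence lengths (plus the number of hits), rather than comparing every query position against every subject position.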
Assignment 2 schematic
[Figure: query (genomic sequence) with exons and 3' UTR; subject (aa seq); predicted cDNA]
• Why does it not match the subject perfectly?
Proteins
CS view of a protein
>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDT
SNQAKAQRPDFCLEPPYTGPCKARIIRY
FYNAKAGLCQTFVYGGCRAKRNNFKSA
EDCMRTCGGAIGPWENL
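From the CS point of view, the record above is a header line plus a string over the amino-acid alphabet. A minimal sketch of reading such a FASTA record follows; `parse_fasta` and `BPTI_FASTA` are names invented for this example, and the header is shortened for readability.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


BPTI_FASTA = """\
>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor
MKMSRLCLSVALLVLLGTLAASTPGCDT
SNQAKAQRPDFCLEPPYTGPCKARIIRY
FYNAKAGLCQTFVYGGCRAKRNNFKSA
EDCMRTCGGAIGPWENL
"""

header, sequence = parse_fasta(BPTI_FASTA)[0]
accession = header.split("|")[1]  # second "|"-separated field of the header

if __name__ == "__main__":
    print(accession, len(sequence))
```

Once parsed, the protein is an ordinary string, so the usual string algorithms (search, alignment, pattern matching) apply directly.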
Protein structure basics
Bond angles form structural constraints
Alpha-helix
• 3.6 residues per turn
• H-bonds between residue i and residue i+4 stabilize the structure.
• First described by Linus Pauling
Beta-sheet
• Each strand by itself has 2 residues per turn, and is not stable.
• Adjacent strands hydrogen-bond to form stable beta-sheets, parallel or anti-parallel.
• Beta sheets have long-range interactions that stabilize the structure, while alpha-helices have local interactions.
Domains
• The basic structures (helix, strand, loop) combine to form complex 3D structures.
• Certain combinations are popular. Many sequences, but only a few folds.
3D structure
• Predicting tertiary structure is an important problem in Bioinformatics.
• Premise: Clues to structure can be found in the sequence.
• While de novo tertiary structure prediction is hard, there are many intermediate and tractable goals.
Protein Domains
• An important realization (in the last decade) is that proteins have a modular architecture of domains/folds.
• Example: The zinc finger domain is a DNA-binding domain.
Zinc Finger domain
Proteins containing zf domains
• How can we find a motif corresponding to a zf domain?
The sequence analysis perspective
• Zinc Finger motif: #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]
• 2 conserved C, and 2 conserved H
• How can we search a database using these motifs?
• The 'regular expression' motif is weak. How can we make it stronger?
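One way to search with this motif is to translate it into an ordinary regular expression and scan each database sequence. The sketch below makes two assumptions that the slide does not state: that "#" stands for a hydrophobic residue, and that the hydrophobic set is A, V, I, L, M, F, Y, W. The test sequence is synthetic, built by hand to satisfy the motif, and `find_zf_motifs` is a name chosen for this example.

```python
import re

# ASSUMPTION: '#' in the motif means "any hydrophobic residue", and this
# particular residue set is an illustrative choice, not taken from the slide.
HYDROPHOBIC = "AVILMFYW"
H = "[" + HYDROPHOBIC + "]"

# #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]
ZF_REGEX = re.compile(
    H + r".C"        # #-X-C
    + r".{1,5}C"     # X(1-5)-C
    + r".{3}" + H    # X3-#
    + r".{5}" + H    # X5-#
    + r".{2}H"       # X2-H
    + r".{3,6}[HC]"  # X(3-6)-[H/C]
)


def find_zf_motifs(sequence):
    """Return (start, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in ZF_REGEX.finditer(sequence)]


if __name__ == "__main__":
    # Synthetic sequence constructed to contain one motif instance.
    print(find_zf_motifs("MKFSCPECGKSFSRSDELTRHIRIHTGQ"))
```

A pattern like this only answers yes or no at each position: every X position accepts any residue, and a single mismatch at a conserved position rejects the whole hit, which is one reason the regular-expression motif is considered weak.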