SSA Signal Search Analysis II

SSA – other applications - translation
• In contrast to translation initiation in bacteria, translation initiation in eukaryotes is not guided by a Shine-Dalgarno-like motif.
• In eukaryotes, translation initiation is guided by the so-called Kozak consensus sequence.
• This sequence on an mRNA molecule is recognized by the ribosome as the translational start site, from which point a protein is coded by that mRNA molecule.
• The ribosome requires this sequence, or a possible variation of it, to initiate translation.

SSA – other applications - translation
• The Kozak sequence is not to be confused with the ribosomal binding site (RBS).
• In vivo, this site is often not matched exactly on different mRNAs, and the amount of protein synthesized from a given mRNA depends on the strength of the Kozak sequence.
• Some nucleotides in this sequence are more important than others: the AUG is essential, since it is the actual initiation codon, encoding a methionine amino acid at the N-terminus of the protein.

SSA – other applications - polyA
• Polyadenylation occurs after transcription of DNA into RNA in the nucleus.
• After the polyadenylation signal has been transcribed, the mRNA chain is cleaved through the action of an endonuclease complex associated with RNA polymerase.
• The cleavage site is characterized by the presence of the base sequence AAUAAA nearby.
• After the mRNA has been cleaved, 50 to 250 adenine residues are added to the free 3' end at the cleavage site. This reaction is catalyzed by polyadenylate polymerase.
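
The AAUAAA signal described on the polyA slide is a fixed hexamer, so locating it can be illustrated with a plain string search. The following is a minimal sketch, not part of the SSA programs; the function name `find_polya_signals` and the example sequence are made up for illustration.

```python
import re


def find_polya_signals(rna: str) -> list[int]:
    """Return the 0-based start positions of every AAUAAA hexamer."""
    # Accept lower-case and DNA-style (T instead of U) input as well.
    rna = rna.upper().replace("T", "U")
    # A zero-width lookahead reports overlapping occurrences too.
    return [m.start() for m in re.finditer(r"(?=AAUAAA)", rna)]


if __name__ == "__main__":
    example = "GGCAAUAAAGCUUCGAUCGUAGCA"  # hypothetical 3' UTR fragment
    print(find_polya_signals(example))
```

An exact match like this only finds the canonical hexamer; a search that tolerates variants would need a weight matrix or a mismatch allowance.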

SSA – other applications - splicing
• An intron usually contains a clear signal for splicing (e.g., the beta globin gene).
• In some cases (e.g., the sex lethal gene of the fruit fly), a splicing signal may be masked by a regulatory protein, resulting in alternative splicing.
• In rare cases (e.g., HIV genes), a pre-mRNA may contain several ambiguous splicing signals, resulting in a few alternatively spliced mRNAs.

SSA – other applications - splicing
Splicing signal
• Most introns start with the sequence GU and end with the sequence AG (in the 5' to 3' direction).
• These are referred to as the splice donor and splice acceptor sites, respectively.
• In over 60% of cases, the exon sequence is (A/C)AG at the donor site, and G at the acceptor site.
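
The GU and AG dinucleotides above can be located with a simple scan. The sketch below is only an illustration with hypothetical names (`candidate_splice_sites` and the example sequence are not from the slides). It returns candidate positions only, since these dinucleotides also occur frequently by chance.

```python
def candidate_splice_sites(pre_mrna: str) -> tuple[list[int], list[int]]:
    """Return 0-based positions of GU (donor) and AG (acceptor) dinucleotides."""
    # Accept lower-case and DNA-style (T instead of U) input as well.
    seq = pre_mrna.upper().replace("T", "U")
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GU"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


if __name__ == "__main__":
    donors, acceptors = candidate_splice_sites("CAGGUAAGUCUUUCAGG")
    print("candidate donor sites:", donors)
    print("candidate acceptor sites:", acceptors)
```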

SSA – other applications - splicing
• The sequences at the two ends of the intron are not sufficient to signal the presence of an intron. Another important sequence is the branch site, located 20 - 50 bases upstream of the acceptor site. The consensus sequence of the branch site is "CU(A/G)A(C/U)", where A is conserved in all genes.

SSA – parameters
Search Parameters
Window size and shift
The fixed-length sequences extracted from the data library (their length is determined by the 5' and 3' borders) are organised into a matrix of oligonucleotides. This matrix is divided into cross-sections, or windows. Two parameters define this sub-division process:
• Window length - the number of bases (matrix columns) covered by each window.
• Window displacement length - the number of bases by which the window is displaced rightwards with respect to the preceding one.
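
The window sub-division can be sketched as follows. This is an illustration under stated assumptions rather than the actual SSA implementation: the function names are hypothetical, positions are 0-based offsets from the 5' border, and a trailing partial window is simply dropped.

```python
def window_starts(seq_length: int, window_length: int, displacement: int) -> list[int]:
    """Start offsets of all full windows that fit into the sequence range."""
    return list(range(0, seq_length - window_length + 1, displacement))


def split_into_windows(sequences: list[str], window_length: int, displacement: int):
    """Cut every row of the sequence matrix into the same set of windows.

    Returns a list of (start_offset, [segment of each sequence]) pairs.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    return [
        (start, [s[start:start + window_length] for s in sequences])
        for start in window_starts(length, window_length, displacement)
    ]


if __name__ == "__main__":
    for start, segments in split_into_windows(["ACGTACGTAC", "TTGACATATA"], 4, 2):
        print(start, segments)
```

With a displacement smaller than the window length, consecutive windows overlap, so a signal lying near a window border is still fully contained in a neighbouring window.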

SSA – parameters
Search mode
This parameter specifies which DNA strand is searched. If a signal is specified as bidirectional, the complementary strands of the window segments are also considered.
Signal description
The signals can be described in two different formats:
• Consensus sequence - indicates the predominant nucleotide at each position of the signal sequence.
• Weight matrix - a two-dimensional table containing the frequency of each nucleotide at each position of the motif. For an example, see the default matrix for the TATA box printed on the OPROF web site.

SSA – parameters - SList
Selection criteria: Five different signal selection criteria have to be provided by the user.
• Occurrence frequency: This criterion allows users to look for both over- and under-represented signal sequences in the data set.
• Selection mode: This parameter determines which subsequences will be considered. The options are:
  • If "global maxima/minima" is specified, unique global best fits are selected.
  • If "local maxima/minima" is specified, each candidate motif that is not overlapped by an equal or better match is selected.
  • If "all" is specified, all occurrences are processed.
• St-dev cut-off: A standard deviation is calculated for the frequencies of all signals that are over- or under-represented in the DNA sequence matrix; this parameter sets the standard-deviation cut-off for the signals to be selected.
• Sort list by: This criterion allows the user to sort the list of over- or under-represented signals either by position or by the standard deviation of the signals.
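
The relation between the two signal description formats (weight matrix and consensus sequence) can be shown with a small sketch. The aligned sites below are invented for illustration and are not the default TATA-box matrix from the OPROF web site; the function names are hypothetical as well.

```python
from collections import Counter

BASES = "ACGT"


def frequency_matrix(sites: list[str]) -> list[dict[str, float]]:
    """Frequency of each nucleotide at each position of the aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        matrix.append({base: counts[base] / len(sites) for base in BASES})
    return matrix


def consensus(matrix: list[dict[str, float]]) -> str:
    """Predominant nucleotide per position (ties resolved in A, C, G, T order)."""
    return "".join(max(BASES, key=lambda base: column[base]) for column in matrix)


if __name__ == "__main__":
    sites = ["TATAAA", "TATATA", "TATAAG", "CATAAA"]  # made-up aligned sites
    matrix = frequency_matrix(sites)
    for pos, column in enumerate(matrix, start=1):
        print(pos, column)
    print("consensus:", consensus(matrix))
```

The consensus keeps only the most frequent base per position, whereas the matrix also records how strongly each position is conserved.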

SSA – parameters - SList
Signal collection
A signal sequence collection is required to look for over- or under-represented signals in the data set. The program offers three ways of choosing a signal sequence collection:
• A complete signal sequence collection contains all possible sequences of a given "signal sequence length".
• A random signal sequence collection contains a random set of signal sequences of a specified number and length.
• If the special option is specified, gapped oligonucleotides are used as signal sequences. Gapped oligonucleotides are signal sequences in which distinct positions are unspecified. These positions are represented by an additional character (hyphen or N), which plays the role of a wild card.
# Min. of matches: This criterion can be used to search for imperfect occurrences of signal sequences.

SSA – parameters - Cpr
Parameters: range: -199/100; wsize: 10; wshift: 2; # of bases: 4 complete; 3 mismatches

SSA – parameters - Cpr
Parameters: range: -199/100; wsize: 15; wshift: 2; # of bases: 4 complete; 3 mismatches

SSA – parameters - Cpr
Parameters: range: -199/100; wsize: 20; wshift: 2; # of bases: 4 complete; 3 mismatches

SSA – parameters - Cpr
Parameters: range: -199/100; wsize: 10; wshift: 2; # of bases: 6 complete; 3 mismatches

SSA – parameters - Cpr
Parameters: range: -199/100; wsize: 10; wshift: 2; # of bases: 6 random; 3 mismatches

SSA – parameters - Cpr
Parameters: range: -199/100; wsize: 10; wshift: 2; # of bases: 7 random; 3 mismatches

SSA – parameters - Cpr
Parameters: range: -199/100; wsize: 10; wshift: 2; # of bases: 6 random; 2 mismatches

SSA – parameters - OProf
Parameters: range: -199/100; wsize: 30; wshift: 5; 1 mismatch

SSA – parameters - OProf
Parameters: range: -199/100; wsize: 30; wshift: 5; 0 mismatches

SSA – parameters - OProf
Parameters: range: -199/100; wsize: 30; wshift: 5; 2 mismatches

SSA – parameters - OProf
Parameters: range: -199/100; wsize: 30; wshift: 3; 1 mismatch

SSA – parameters - OProf
Parameters: range: -199/100; wsize: 20; wshift: 5; 1 mismatch

SSA – parameters - OProf
Parameters: range: -199/100; wsize: 40; wshift: 5; 1 mismatch

SSA – parameters - OProf
Parameters: range: -199/100; wsize: 50; wshift: 5; 1 mismatch

SSA – parameters - OProf
Parameters: range: -199/100; wsize: 60; wshift: 5; 1 mismatch
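
The signal collection and minimum-match settings from the SList slides can be sketched as follows. This is a rough illustration with hypothetical function names, not the SSA code. It assumes that wild-card positions (written as N here) are ignored when counting matches, so the minimum refers to specified positions only.

```python
from itertools import product


def complete_collection(length: int, alphabet: str = "ACGT") -> list[str]:
    """All possible signal sequences of the given length."""
    return ["".join(p) for p in product(alphabet, repeat=length)]


def count_matches(signal: str, segment: str, wildcard: str = "N") -> int:
    """Number of specified (non-wildcard) signal positions that agree with the segment."""
    if len(signal) != len(segment):
        raise ValueError("signal and segment must have the same length")
    return sum(1 for s, b in zip(signal, segment) if s != wildcard and s == b)


def find_occurrences(signal: str, sequence: str, min_matches: int,
                     wildcard: str = "N") -> list[int]:
    """0-based start positions where at least min_matches specified positions agree."""
    k = len(signal)
    return [
        i for i in range(len(sequence) - k + 1)
        if count_matches(signal, sequence[i:i + k], wildcard) >= min_matches
    ]


if __name__ == "__main__":
    print(len(complete_collection(4)))  # number of 4-base signals
    # Gapped signal: two unspecified positions in the middle.
    print(find_occurrences("TANNAA", "GGTATCAAGG", min_matches=4))
```

Lowering `min_matches` below the number of specified positions admits imperfect occurrences, at the cost of more chance hits.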