Analysis of DNA Sequence Data Using Freeware Programs: Sequencher 4.9 Sequence Scanner 1.0 Daniel Williams willida@shelterisland.k12.ny.us
It's possible that you have received your sequence data directly from the Sequencing Lab in WordPad FastA format. FastA is a standard gene sequence format that allows certain gene analysis programs to recognize the gene name and the gene sequence. Everything that follows the ‘>’ is considered the gene or sequence name. The next line is recognized as a DNA sequence. You may also wish to receive the trace files for your own analysis.
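To make the FastA layout concrete, here is a minimal Python sketch of how a program might read it. The function name `parse_fasta` and the example record are hypothetical, and the sketch assumes a sequence may continue over several lines until the next ‘>’ line.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FastA-formatted text."""
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' line starts a new record; store the previous one first.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            # Every other line is sequence data for the current record.
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Hypothetical example record, for illustration only.
example = ">Dragonfly-Sample-12S_Ai_01\nACGTTGCA\nGGCTA\n"
for record_name, sequence in parse_fasta(example):
    print(record_name, len(sequence))
```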
Cold Spring Harbor If you send your DNA to CSHL to be sequenced it is published on their server.
Cold Spring Harbor Your sequence data is published on their database. Use the dropdown menu to select different samples. You can either highlight the text of the sequence and copy it into Word or WordPad and reformat it into FastA, or you can download a copy of the Sequence Trace File for your own analysis. Right-click and choose “Save As” to save a copy of your Trace File.
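Reformatting copied text into FastA can also be scripted. The following is a small Python sketch, not part of any of the programs described here. The function name `to_fasta` and the 60-base line width are assumptions chosen for illustration.

```python
def to_fasta(name, raw_sequence, width=60):
    """Build a FastA record from pasted sequence text."""
    # Keep only letters: pasted text often carries spaces, line breaks, or digits.
    bases = "".join(ch for ch in raw_sequence if ch.isalpha()).upper()
    # Break the sequence into lines of at most `width` bases.
    lines = [bases[i:i + width] for i in range(0, len(bases), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical sample name and pasted text.
print(to_fasta("Sample_01", "acgt tgca\n  ggcta"))
```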
Sequencher 4.9 Demo http://www.genecodes.com/ You can download an unlimited Freeware Demo Version that has all the functionality of the software used in the lab, except that you cannot save or print.
Open Sequencher Open the Sequencher 4.9 Demo software. A dialog box will open alerting you to the fact that this is a “Demo Version”. Press “OK”. You will now have an empty project window.
Import Sequence Data From the File menu choose “Import”, and from the list choose “Folder of Sequences”. Select the folder that contains your DNA data. A dialog box will open prompting you to import ## files. Select the “Import All Files in Folder” command button.
First Check Your Data The imported DNA sequences are listed by name. Important fields:
• Name: At Brookhaven, Mike Blewitt names all sequences so that they can be easily combined and examined.
• Size: Indicates the number of bases in your sequence.
• Quality: IMPORTANT FIELD. Indicates how reliable your data is. You want to have sequences as close to 90% as possible.
• Comments: If you have received analyzed sequence files, comments are often inserted to highlight irregularities.
Data Clean Up… Delete any sequences that have extremely low quality numbers (e.g., 5%) or whose sequenced fragment is too small for analysis.
• Sort data by clicking on a column heading.
• Select sequences by left-clicking (hold the Shift key to select multiple sequences).
• Delete sequences: right-click to obtain the drop-down menu and choose “Remove From Project”. Sequencher will ask whether you are sure you wish to delete; select “Throw Them Away”.
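The same clean-up rule can be written as a short Python sketch. The cut-offs `min_quality=20.0` and `min_size=100` are hypothetical values picked for illustration, not thresholds recommended by Sequencher or by the sequencing lab. The record fields mirror the Name, Size and Quality columns.

```python
def clean_up(records, min_quality=20.0, min_size=100):
    """Split records into (keep, drop) by quality percentage and size."""
    keep, drop = [], []
    for rec in records:
        if rec["quality"] >= min_quality and rec["size"] >= min_size:
            keep.append(rec)
        else:
            drop.append(rec)
    # Sort the kept records best-first, like clicking the Quality column heading.
    keep.sort(key=lambda r: r["quality"], reverse=True)
    return keep, drop


# Hypothetical project contents.
project = [
    {"name": "Sample_Ai_01", "size": 650, "quality": 88.0},
    {"name": "Sample_Bi_01", "size": 640, "quality": 5.0},
    {"name": "Sample_Ai_02", "size": 40, "quality": 91.0},
]
kept, removed = clean_up(project)
print([r["name"] for r in kept], [r["name"] for r in removed])
```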
What Data Should Look Like. A Chromatogram is a graphical representation of the results of the sequence reaction. You should see evenly-spaced peaks, each with only one color. Peak heights may vary 3-fold, which is normal. "Noise" (baseline) peaks may be present, but with good template and primer they will be quite minimal.
Trim Sequences Usually the sequence reaction does not work well (poor quality) at either extreme end of your DNA sequence. Therefore you should “Trim” the ends to exclude them from your analysis.
Trimming and Aligning Raw DNA
1. Select all of the imported sequences. From the Select menu choose “Select All”.
2. Select the “Sequence” menu. From the menu list choose “Trim”.
3. A window will open showing all of your sequences, with a blue line indicating “good bases” and red lines indicating “poor bases”.
4. Press the “Trim Checked Items” box on the icon bar above if you agree with the trimming. If not, you must change the trimming parameters.
5. The window will show the trimmed sequences in blue.
6. After reviewing the sequences you can close the window by pressing the ‘X’ in the upper right corner.
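The idea behind end trimming can be sketched in Python. This is not Sequencher's trimming algorithm: it assumes each base comes with a numeric quality score (a Phred-like scale is assumed here), and the cut-off `min_qual=20` is a hypothetical value. It simply removes poor or ambiguous bases from both ends and leaves the interior untouched.

```python
def trim_ends(bases, quals, min_qual=20):
    """Trim low-quality or 'N' bases from both ends of a read.

    Returns (trimmed_sequence, start, end), where start and end are the
    positions kept from the original read (end is exclusive).
    """
    def good(i):
        return quals[i] >= min_qual and bases[i] != "N"

    start = 0
    while start < len(bases) and not good(start):
        start += 1  # walk in from the left end
    end = len(bases)
    while end > start and not good(end - 1):
        end -= 1  # walk in from the right end
    return bases[start:end], start, end


# Hypothetical read with poor bases at both ends.
print(trim_ends("NNACGTNN", [5, 5, 30, 30, 10, 30, 5, 5]))
```

Note that the low-scoring base in the middle of the example is kept: only the ends are trimmed, and interior problems are left for editing against the chromatogram.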
Establish a Reference Sequence A reference sequence is like a tie breaker. When you are analyzing a chromatogram, a base sometimes appears ambiguous. If you have a well-known reference sequence, it can help determine what the ambiguous base was supposed to be. Select the sequence you wish to establish as a Reference Sequence, then right-click and choose “Reference Sequence” from the drop-down menu.
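As a rough illustration of the tie-breaker idea, the Python sketch below fills each ‘N’ in a read with the base at the same position in a reference. It assumes the read and the reference are already aligned and the same length, which is a simplification. The function name `resolve_with_reference` is hypothetical, and in practice each suggested base should still be checked against the chromatogram.

```python
def resolve_with_reference(read, reference):
    """Replace each 'N' in an aligned read with the reference base.

    Returns (resolved_sequence, positions), where positions lists the
    indexes that were filled in from the reference.
    """
    if len(read) != len(reference):
        raise ValueError("read and reference must be aligned to the same length")
    resolved = []
    positions = []
    for i, (base, ref_base) in enumerate(zip(read, reference)):
        if base == "N":
            resolved.append(ref_base)
            positions.append(i)  # remember which calls came from the reference
        else:
            resolved.append(base)
    return "".join(resolved), positions


# Hypothetical aligned read and reference.
print(resolve_with_reference("ACNTGNA", "ACGTGCA"))
```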
Assemble Sequences for Analysis As long as you use a consistent naming scheme with your sequencing reactions, you should be able to use the Assemble by Name function to assemble your fragments.
Naming An example of a naming scheme uses “Dragonfly-Sample-12S_Ai_01”. “Sample” indicates the organism that the DNA came from. “12S_Ai” indicates the forward primer 12sai. “01” indicates the well number. There should be a corresponding 12S_Bi_01 for the same DNA with a reverse primer. Therefore we want all the “_01” names to be combined, forward sequence and reverse.
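The pairing logic implied by this naming scheme can be sketched in Python. This is only an illustration of grouping names by their well-number suffix, not how Sequencher implements Assemble by Name. It assumes the well number is always the text after the last underscore.

```python
from collections import defaultdict


def group_by_well(names):
    """Group sequence names by the well number after the last underscore."""
    groups = defaultdict(list)
    for name in names:
        well = name.rsplit("_", 1)[-1]  # e.g. "01" from "..._Ai_01"
        groups[well].append(name)
    return dict(groups)


names = [
    "Dragonfly-Sample-12S_Ai_01",
    "Dragonfly-Sample-12S_Bi_01",
    "Dragonfly-Sample-12S_Ai_02",  # hypothetical read with no reverse partner
]
groups = group_by_well(names)
# Wells with fewer than two reads have no forward/reverse pair to combine.
unpaired = [well for well, members in groups.items() if len(members) < 2]
print(groups)
print(unpaired)
```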
Assemble By Name
1. Select all of your sequences: from the Select menu choose “Select All”.
2. Press the “ABN” (Assemble By Name) icon on the tool bar. The icon toolbar will change.
3. Press Auto Assemble By Name. A preview dialog box will appear. Scroll through the expected ‘contigs’ to ensure everything paired up correctly.
4. Press the Assemble button. If your sequences are named correctly and they have pairs, they will be assembled into “contigs”.
View Contig Assembly Double-click on the Contig[0001] icon to open the Contig Overview window. To view restriction enzyme sites, select the View menu, choose Bases Map Overview, and select Restriction Map.
Contig Overview window The Overview contains three sections. The top section displays a schematic of how the fragments are assembled in this contig. The arrows indicate the direction of the fragment in relation to the assembly.
Contig Overview window The next section provides coverage information.
Contig Overview window Below the coverage bar is the open reading frame map. It shows three bars marked with green flags and red lines, which represent start and stop codons respectively. Press the Bases icon to edit or view the base sequence.
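What an open reading frame map records can be sketched in Python. The sketch assumes the three bars correspond to the three forward reading frames and that the standard genetic code applies (ATG as start; TAA, TAG and TGA as stops). Neither assumption is stated on the slide, and the function name `codon_map` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
START_CODON = "ATG"


def codon_map(seq):
    """Return start and stop codon positions for forward frames 1 to 3."""
    seq = seq.upper()
    frames = {}
    for frame in range(3):
        starts, stops = [], []
        # Step through the sequence one codon (three bases) at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START_CODON:
                starts.append(i)  # would be drawn as a green flag
            elif codon in STOP_CODONS:
                stops.append(i)  # would be drawn as a red line
        frames[frame + 1] = {"starts": starts, "stops": stops}
    return frames


# Hypothetical short sequence.
for frame, marks in codon_map("ATGAAATAGC").items():
    print(frame, marks)
```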
Edit Assembled Chromatograms The Contig Editor provides the tools for checking and editing sequences. It is divided into four quadrants. To begin the editing process, move your selection to base one in the consensus.
Edit Chromatograms Press the Show Chromatogram icon. Chromatograms will appear. If you see only one chromatogram, that is because you have not selected a region of the Contig that relates to three chromatograms. To examine a different base YOU MUST SELECT A BASE IN THE CONSENSUS SEQUENCE.
Errors in Base Calling Mis-spaced peaks: One good way to detect artifacts or errors in a sequencing chromatogram is to scan through it, looking for mis-spaced peaks. At the same time, watch for mis-spaced letters in the text sequence along the top. Sometimes, however, those spaces get misinterpreted as missing nucleotides and an ‘N’ is inserted.
Heterozygous (double) peaks A single peak position within a trace may have two peaks of different colors instead of just one. The rule of thumb appears to be that if the second peak is 35% or less of the height of the major peak, you can call the base for the major peak.
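The rule of thumb can be written as a small Python sketch. The 35% threshold comes from the slide. What to do when the second peak is larger than that is not stated, so returning the standard IUPAC ambiguity code for the two bases is an assumption made here for illustration. The function name `call_base` is hypothetical.

```python
# IUPAC two-base ambiguity codes.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(peaks, threshold=0.35):
    """Call one position from a dict of base -> peak height."""
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] <= 0:
        return "N"  # no usable signal at this position
    major, major_height = ranked[0]
    if len(ranked) == 1:
        return major
    minor, minor_height = ranked[1]
    if minor_height / major_height <= threshold:
        return major  # second peak is small enough to ignore
    # Assumption: otherwise report both bases as an ambiguity code.
    return IUPAC.get(frozenset((major, minor)), "N")


# Hypothetical peak heights.
print(call_base({"A": 1000, "G": 300}))
print(call_base({"A": 1000, "G": 600}))
```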
Sequence Scanner You can SAVE and PRINT https://www2.appliedbiosystems.com/support/software_community/free_ab_software.cfm
To Copy Gene Sequence Go to the Trace menu option and select Copy Basecalls. Paste into WordPad and edit to FastA.
Have a Blast! Select Nucleotide Blast to examine your sequence vs. Other Organisms
Enter your FastA Sequence Press the Blast Icon at the bottom of the page when you wish to execute your search
Compare Organisms Scroll down page to make a phylogenetic tree