Identify the open reading frame in the following DNA sequence, the protein that this gene encodes, its function, and the source. You can consult the bioinformatics exercise "Project 1: Databases for the Storage and 'Mining' of Genome Sequences" (ATTACHED). The procedure to identify the gene and the protein that it encodes is as follows:
Click in the DNA sequence at the start site of transcription, select the entire sequence, and copy it.
Go to the National Center for Biotechnology Information website http://www.ncbi.nlm.nih.gov/ and click on BLAST on the right-hand side under "Popular Resources". BLAST is a program that will allow you to find the protein sequence for the DNA sequence (gene) you submit. Next, click on blastx in the left-hand column under the title "Basic Blast".
Paste the DNA sequence into the box and click "BLAST!". The search may take a few seconds, and the page will keep updating until the search is completed.
When the search is complete, you will see a figure showing the most homologous results, or "sequences producing significant alignments", followed by a list of what these proteins are. Your protein will be the first one on the list. Clicking the accession number or sequence identifier on the left-hand side will bring up more information. You should be able to find the name, function, size (number of amino acids) and source (name of the organism) for the protein.
Your answer should include the:
Amino acid sequence of the protein
Size of the protein
Identity of the protein
Function of the protein
I would follow the instructions listed in the "Databases for the storage and mining of genome sequences" document, following approach "b" as it is simpler.
First of all, copy and paste your DNA sequence into the Translate tool on the Expasy server, using the verbose output. According to the guidelines in the document, you are looking for a long, uninterrupted string of amino acids. The results from the Expasy Translate tool contain at least two such strings, but only one is correct. Which do you think it is? What are the two elements that *must* be present in an open reading frame? (Hint: Where there is a beginning, there must be an end)
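If you would like to cross-check what the Translate tool shows you, the same idea can be sketched in a few lines of Python. This is only a minimal illustration, not how Expasy itself works: it assumes the standard genetic code, treats an ORF as a stretch that begins with methionine (M) and is closed by a stop codon, and the function names (`find_orfs` and friends) and the toy sequence are made up for this example.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_orfs(dna, min_len=1):
    """List (strand, frame, protein) for all six frames, longest first.

    Assumption for this sketch: an ORF starts at the first M of a
    stretch and must be followed by a stop codon.
    """
    # Drop whitespace and position numbers that often come with pasted sequences.
    dna = "".join(ch for ch in dna.upper() if ch.isalpha())
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            # Every piece except the last one was terminated by a stop codon.
            for piece in protein.split("*")[:-1]:
                start = piece.find("M")
                if start != -1 and len(piece) - start >= min_len:
                    orfs.append((strand, frame + 1, piece[start:]))
    return sorted(orfs, key=lambda orf: len(orf[2]), reverse=True)


# Toy sequence invented for illustration (not the assignment sequence).
for strand, frame, protein in find_orfs("CCATGAAATTTGGGTAACC"):
    print(strand, frame, len(protein), protein)
```

On a real gene you would raise `min_len` (for example to 50 or 100 residues) so that only the long candidates remain, then compare them with the frames reported by the Translate tool.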
Once you have identified the correct ORF, click Back in your browser and change the output to "Compact". Find the ORF that you identified previously, and copy its amino acid sequence, starting from the M (methionine). Go to the NCBI BLAST website and perform a "protein blast": paste your copied amino acid sequence into the query box, and then click BLAST. The software will crunch the sequence and align the amino acid sequence with the database selected ...
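Before pasting, it can help to tidy the copied sequence and note its size, since the size is part of the required answer. The helper below is a small sketch under my own assumptions: the name `to_fasta`, the default label `candidate_orf`, and the 60-residue line width are arbitrary choices, not something the exercise or NCBI prescribes. The BLAST query box accepts either a bare sequence or FASTA text like this.

```python
def to_fasta(protein, name="candidate_orf", width=60):
    """Format a protein sequence as FASTA text, with its length in the header."""
    # Remove whitespace and a trailing stop symbol left over from copying.
    seq = "".join(protein.split()).upper().rstrip("*")
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name} length={len(seq)}"] + lines)


# Toy peptide invented for illustration (not the assignment protein).
print(to_fasta("mkfg*"))
```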
This solution explains how to identify an open reading frame in a given DNA sequence in order to determine the protein sequence and, ultimately, its identity, using online databases such as those of the NCBI and Expasy.