Deciphering the information provided in gene annotations

See the attached file.

Describe the E-value distribution. Are the values small (close to zero) or large (close to 1)?

Are the subject sequences of the top 2 BLAST hits from a genus and/or a phylum that you would expect to be closely related to T. oceani?

Are there any inconsistencies found in the T-coffee results, such as large polypeptide regions that do not match? Where?

Identify 3 highly conserved amino acids using WebLogo, and give the location, identity, and chemistry of each.

Do you have stretches of conservation? If so, where?

Solution Preview

See the attached 'biotinformatics question.doc' file for the WebLogo images.

Q. Describe the E-value distribution. Are the values small (close to zero) or large (close to 1)?

http://www.ncbi.nlm.nih.gov/blast

A. The E-value, or expect value, is the number of hits with a score at least as good as the observed one that you would expect to find by random chance when searching a database of that size. It is therefore a measure of how statistically significant a match is, rather than a direct measure of how similar two sequences are. In the two examples you provided, the E-values 2e-131 and 7e-45 are very small and can be said to be close to zero. This indicates that both hits are very unlikely to be chance matches: the smaller the E-value, the higher the alignment score and the stronger the evidence of real resemblance to the query. It is interesting to note that the first query seems to be identical to the BLAST hit, which should give you as low an E-value as you can get for the parameters you used.

For an explanation of the scoring statistics given in a BLAST search:
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

The following .pdf gives a nice plain explanation of E-value including the ranges you can get and how you analyze them.
http://www.clcbio.com/sciencearticles/BE-blast.pdf
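
To get a feel for why E-values such as 2e-131 and 7e-45 count as "close to zero", it can help to compute a few by hand. The sketch below uses the standard relation between a normalized bit score S' and the E-value, E = m * n * 2^(-S'), where m is the query length and n is the total database length. The function name, the query length of 300, and the database size of one billion residues are assumptions chosen for illustration; they are not taken from the attached annotation.

```python
import math


def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'), where m is the query length, n is the
    total database length, and S' is the normalized bit score.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 300-residue query against a database holding
# one billion residues in total (both numbers are assumptions).
QUERY_LEN = 300
DB_LEN = 1_000_000_000

for score in (20, 40, 100, 200):
    e = expect_value(score, QUERY_LEN, DB_LEN)
    print(f"bit score {score:>3} -> E-value {e:.1e}")
```

Each additional bit of score halves the E-value, so high-scoring hits reach extremely small values very quickly. Note also that the E-value is an expected count rather than a probability, so for weak hits it can be larger than 1.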

Q. Is the subject sequence from the top 2 BLAST hits from a genus and/or a phylum ...

Solution Summary

This solution contains a detailed description of how to understand some of the information provided in gene annotations. The problem includes annotations associated with a gene from T. oceani in the form of an .htm file. Specific queries about E-value distributions, relatedness, and conservation using BLAST, T-Coffee, and WebLogo analyses are answered in reference to the annotated gene, along with a general description of each of these modules, in a .doc format. The problem and solution contain links to informational websites about gene annotations and to online sources for finding annotations for your own gene of interest. After working through this problem/solution pair, you should be able to use the links to the online tools provided in the problem to find stretches of conservation in your gene/protein of interest, extract information about conserved amino acid residues, and determine how closely related other genes are to your query.