2012_Gene_Annotation_13213005731179

Assignment

perez5773autogen_assignment

Organism

Thermosediminibacter oceani JW/IW-1228P, DSM 16646

IMG gene_oid

2503266088


Basic Information Module

DNA Coordinates

go to the IMG Gene Detail Page

http://img.jgi.doe.gov/cgi-bin/edu/main.cgi?section=GeneDetail&page=geneDetail&gene_oid=2503266088

DNA coordinates

2220583..2221134 (-)(552bp)
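
These coordinates can be checked with a little arithmetic, shown here as a minimal Python sketch. It assumes IMG coordinates are 1-based and inclusive on both ends, which agrees with the 552 bp stated above. The helper name parse_coords is hypothetical and is not an IMG function.

```python
import re

def parse_coords(text):
    """Parse an IMG-style coordinate string such as '2220583..2221134 (-)'."""
    m = re.match(r"\s*(\d+)\.\.(\d+)\s*\(([+-])\)", text)
    if m is None:
        raise ValueError(f"unrecognized coordinate string: {text!r}")
    return int(m.group(1)), int(m.group(2)), m.group(3)

start, end, strand = parse_coords("2220583..2221134 (-)(552bp)")
length_bp = end - start + 1      # inclusive coordinates, so add 1
codons = length_bp // 3          # includes the stop codon
protein_len = codons - 1         # the stop codon is not translated
# On the minus strand the gene is read from the higher coordinate downward,
# so the start codon begins at `end` rather than `start`.
start_codon_pos = end if strand == "-" else start
print(length_bp, codons, protein_len, start_codon_pos)  # 552 184 183 2221134
```

The 183-residue result matches the protein sequence length reported for this gene.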

DNA Sequence

go to the IMG Gene Detail Page (see above)

Nucleotide sequence (FASTA format)

FASTA format

Sequence Length 552bp

>2503266088 RNA polymerase, sigma-24 subunit, ECF subfamily [Thermosediminibacter oceani JW/IW-1228P, DSM 16646 : Tocea_unknown] (-)strand
ATGCTGTCCGATGAACAGCTGGCAAAAAGATGCCTTGCCGGCGATCAGGA
AGCATTAGAAGAACTGTTGGAACGGTACAAGGGCTACGTCTTTGCAATAA
TATTTAATTTCGTGAGCGACAGCGCTGAGGCCGAAGATATAGCCCAGGAG
GTATTCTTACAGGTTTACCGCTCCCTTCCCCGGTGCCGTTTCGATAATTT
CAAGTCCTGGATCGGGAGAATCGCCGTGAACAAGGCCATAGACCACCGGA
GAAAGGCGGAGCATCTTTCGCGAATTGTGCCGGAAGAAAAGGCCGGAGAA
TCGGTACTGCGAACGGAGCAGGCCCCTTCGCCGGAAGAACTGTACCTGGC
GAAGGAAGACTGGCAGAGAGTGCAGCATGTGGCGGGGGAGCTGCCGGAGA
TTTACGGGAGGACTCTGGCAAAATATTATTTCGAAGGCAAGAGCTGCCGG
GAGATCGCAGTGGAGGAAGGAATCAGCGTTAAGACGGTTGAGTCCCGCTT
GAGTCGGGCGAGAGCCCTGTTTAAGAAAAGGTGGGGGAGGGAAAAAGGGT
GA

Protein Sequence

go to the IMG Gene Detail Page (see above)

Amino acid sequence (FASTA format)

FASTA format

Sequence Length 183

>2503266088 RNA polymerase, sigma-24 subunit, ECF subfamily [Tocea_unknown]
MLSDEQLAKRCLAGDQEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQE
VFLQVYRSLPRCRFDNFKSWIGRIAVNKAIDHRRKAEHLSRIVPEEKAGE
SVLRTEQAPSPEELYLAKEDWQRVQHVAGELPEIYGRTLAKYYFEGKSCR
EIAVEEGISVKTVESRLSRARALFKKRWGREKG
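
To confirm that the protein record is the conceptual translation of the nucleotide record, the Python sketch below translates the CDS with the standard genetic code. It assumes the nucleotide FASTA above is already the coding strand (the header marks the gene as (-) strand and the sequence begins with ATG). The names CODON_TABLE and translate are illustrative only.

```python
# Standard genetic code. NCBI translation table 11 (bacteria) assigns the same
# amino acid to every codon; it differs from table 1 only in permitted start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Coding strand, copied from the nucleotide FASTA record above
DNA = "".join("""
ATGCTGTCCGATGAACAGCTGGCAAAAAGATGCCTTGCCGGCGATCAGGA
AGCATTAGAAGAACTGTTGGAACGGTACAAGGGCTACGTCTTTGCAATAA
TATTTAATTTCGTGAGCGACAGCGCTGAGGCCGAAGATATAGCCCAGGAG
GTATTCTTACAGGTTTACCGCTCCCTTCCCCGGTGCCGTTTCGATAATTT
CAAGTCCTGGATCGGGAGAATCGCCGTGAACAAGGCCATAGACCACCGGA
GAAAGGCGGAGCATCTTTCGCGAATTGTGCCGGAAGAAAAGGCCGGAGAA
TCGGTACTGCGAACGGAGCAGGCCCCTTCGCCGGAAGAACTGTACCTGGC
GAAGGAAGACTGGCAGAGAGTGCAGCATGTGGCGGGGGAGCTGCCGGAGA
TTTACGGGAGGACTCTGGCAAAATATTATTTCGAAGGCAAGAGCTGCCGG
GAGATCGCAGTGGAGGAAGGAATCAGCGTTAAGACGGTTGAGTCCCGCTT
GAGTCGGGCGAGAGCCCTGTTTAAGAAAAGGTGGGGGAGGGAAAAAGGGT
GA
""".split())

# Protein, copied from the amino acid FASTA record above
PROTEIN = "".join("""
MLSDEQLAKRCLAGDQEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQE
VFLQVYRSLPRCRFDNFKSWIGRIAVNKAIDHRRKAEHLSRIVPEEKAGE
SVLRTEQAPSPEELYLAKEDWQRVQHVAGELPEIYGRTLAKYYFEGKSCR
EIAVEEGISVKTVESRLSRARALFKKRWGREKG
""".split())

def translate(dna):
    """Translate codon by codon from the first base; '*' marks stop codons."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

translated = translate(DNA)
print(len(DNA), len(translated))  # 552 184
print(translated[:-1] == PROTEIN, translated.endswith("*"), translated.count("*"))  # True True 1
```

A mismatch here would point to a frameshift, an internal stop codon, or a mis-copied sequence.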

Isoelectric Point (pI)

go to the IMG Gene Detail Page (see above)

Isoelectric point

7.4369
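
The pI is the pH at which the protein's net charge is zero. The Python sketch below finds it by bisection on the Henderson-Hasselbalch net charge. The pKa values are an assumption (one commonly quoted textbook set). IMG's tool very likely uses a different table, so this sketch should not be expected to reproduce 7.4369 exactly. The function names are hypothetical.

```python
# Assumed pKa values (a textbook set); other tools use different tables.
PKA_BASIC = {"K": 10.53, "R": 12.48, "H": 6.00}
PKA_ACIDIC = {"D": 3.65, "E": 4.25, "C": 8.18, "Y": 10.07}
PKA_N_TERM, PKA_C_TERM = 9.69, 2.34

PROTEIN = "".join("""
MLSDEQLAKRCLAGDQEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQE
VFLQVYRSLPRCRFDNFKSWIGRIAVNKAIDHRRKAEHLSRIVPEEKAGE
SVLRTEQAPSPEELYLAKEDWQRVQHVAGELPEIYGRTLAKYYFEGKSCR
EIAVEEGISVKTVESRLSRARALFKKRWGREKG
""".split())

def net_charge(seq, ph):
    """Net charge at a given pH (fractional charges from Henderson-Hasselbalch)."""
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    for aa in seq:
        if aa in PKA_BASIC:
            positive += 1 / (1 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            negative += 1 / (1 + 10 ** (PKA_ACIDIC[aa] - ph))
    return positive - negative

def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH where net charge crosses zero (charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(isoelectric_point(PROTEIN), 2))
```

Because this protein has roughly equal numbers of acidic (D, E) and basic (K, R) residues, a pI near neutrality is expected whichever pKa table is used.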

Sequence-based Similarity Data Module

BLAST

go to http://www.ncbi.nlm.nih.gov/blast

Gene product name (top hit)

RNA polymerase, sigma-24 subunit, ECF subfamily

Organism

[Thermosediminibacter oceani DSM 16646]

Alignment length

183

Score

372 bits (955)

E-value

2e-131

Alignment of the top hit and the query sequence

Query  1    MLSDEQLAKRCLAGDQEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQEVFLQVYRSLP  60
            MLSDEQLAKRCLAGDQEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQEVFLQVYRSLP
Sbjct  1    MLSDEQLAKRCLAGDQEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQEVFLQVYRSLP  60

Query  61   RCRFDNFKSWIGRIAVNKAIDHRRKAEHLSRIVPEEKAGESVLRTEQAPSPEELYLAKED  120
            RCRFDNFKSWIGRIAVNKAIDHRRKAEHLSRIVPEEKAGESVLRTEQAPSPEELYLAKED
Sbjct  61   RCRFDNFKSWIGRIAVNKAIDHRRKAEHLSRIVPEEKAGESVLRTEQAPSPEELYLAKED  120

Query  121  WQRVQHVAGELPEIYGRTLAKYYFEGKSCREIAVEEGISVKTVESRLSRARALFKKRWGR  180
            WQRVQHVAGELPEIYGRTLAKYYFEGKSCREIAVEEGISVKTVESRLSRARALFKKRWGR
Sbjct  121  WQRVQHVAGELPEIYGRTLAKYYFEGKSCREIAVEEGISVKTVESRLSRARALFKKRWGR  180

Query  181  EKG  183
            EKG
Sbjct  181  EKG  183


Gene product name (second hit)

RNA polymerase, sigma-24 subunit, ECF subfamily

Organism

[Acetivibrio cellulolyticus CD2]

Alignment length

186

Score

153 bits (386)

E-value

7e-45

Alignment of the second hit and the query sequence

Query  1    MLSDEQLAKRCLAGDQEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQEVFLQVYRSLP  60
            M +DE L KR L GD +   ++E+Y+G V+ I FN      EAE+AQE F+QVYRSL
Sbjct  1    MDADELLVKRALKGDSDSFRSIVEKYQGLVYVICFNITGHRQEAENLAQETFIQVYRSLS  60

Query  61   RCRFDNFKSWIGRIAVNKAIDHRRKA--EHLSRIVPEEKAGESVLRTEQAPSPEELYLAK  118
            R     FKSWIG+IA NKAID +RK   E+  +V  E   E  + T     E+L + K
Sbjct  61   RYENKGFKSWIGKIATNKAIDWKRKMKLENEGKLVYLEDISE--ISTDDNSIHEQL-IKK  117

Query  119  EDWQRVQHVAGELPEIYGRTLAKYYFEGKSCREIAVEEGISVKTVESRLSRARALFKKRW  178
            E+ +RV  +  +LPEIY   L K+Y + KS  EI+ E+GIS+KTVESRL RA+   +K+W
Sbjct  118  ENAKRVLELCNKLPEIYSTVLVKFYIQSKSYNEISKEDGISIKTVESRLYRAKQAIRKQW  177
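
Percent identity for this hit can be counted directly from the Query and Sbjct rows. The Python sketch below counts alignment columns, identical columns and gap columns over the three row pairs shown above. The function name alignment_stats is hypothetical, and the counts cover only the rows as displayed here.

```python
# Query and Sbjct rows of the second-hit alignment above, one string per block
QUERY_ROWS = [
    "MLSDEQLAKRCLAGDQEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQEVFLQVYRSLP",
    "RCRFDNFKSWIGRIAVNKAIDHRRKA--EHLSRIVPEEKAGESVLRTEQAPSPEELYLAK",
    "EDWQRVQHVAGELPEIYGRTLAKYYFEGKSCREIAVEEGISVKTVESRLSRARALFKKRW",
]
SBJCT_ROWS = [
    "MDADELLVKRALKGDSDSFRSIVEKYQGLVYVICFNITGHRQEAENLAQETFIQVYRSLS",
    "RYENKGFKSWIGKIATNKAIDWKRKMKLENEGKLVYLEDISE--ISTDDNSIHEQL-IKK",
    "ENAKRVLELCNKLPEIYSTVLVKFYIQSKSYNEISKEDGISIKTVESRLYRAKQAIRKQW",
]

def alignment_stats(query_rows, sbjct_rows):
    """Return (columns, identical columns, gap columns) over paired alignment rows."""
    columns = identical = gaps = 0
    for q_row, s_row in zip(query_rows, sbjct_rows):
        if len(q_row) != len(s_row):
            raise ValueError("paired rows must have the same length")
        for q, s in zip(q_row, s_row):
            columns += 1
            if q == "-" or s == "-":
                gaps += 1
            elif q == s:
                identical += 1
    return columns, identical, gaps

columns, identical, gaps = alignment_stats(QUERY_ROWS, SBJCT_ROWS)
print(f"identities {identical}/{columns} ({100 * identical / columns:.0f}%), gaps {gaps}")
```

The same function applied to the top hit would report 100% identity, since its query and subject rows are identical.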

CDD

click on the CDD search results at the top of the BLAST results page

COG number (top hit)

enter in lab report

COG name

enter in lab report

E-value

enter in lab report


Significant COG number (second hit)

enter in lab report

COG name

enter in lab report

E-value

enter in lab report

T-Coffee

go to http://www.ebi.ac.uk/Tools/msa/tcoffee/

Multiple sequence alignment

2503266088      M--------------------------------LSDEQLAKRCLAG---D
637233668       MS-------QSITVSWSTVDARCPEA-SVQVDKLSNHDLILRCQVGLRPD
637459130       ME-------NSLPLPWPLVSA--AKE-PLCLHKMSNQELVVRCQQGFSPD
637720291       MS-------QSITVSWSTVDARCSEA-SVQVDKLSNHDLILRCQLGLRPD
639683629       MV-----------------------------LECTDRELVEGCRRG---E
640012105       MGLLGRKYNNSKPFAEKSNDRRLATGDSMISSQESDQQLVERVQKG---D
640181457       MP-------------------------------LEDQLLVERSKKG---D
640528109       MH-------------------------------PADEILVERSQNG---D
641611980       MN-------RSLSIPIPSQQGVVPKS-GVSPEKLSNYDLILRCQAGSKPE
642603124       MS-------QSITVSWSTVDAKYPEA-SVQVDKLPNHDLILRCQAGLRPD
643473443       MS-------QAIPASWSTAQAKESAA-KVPPEKLSNYDLILRCQEGFHPD
                *                                  :  *      *   :
 
2503266088      QEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQEVFLQVYRSLPRCRF-
637233668       RVAFAELLRRYQTQVDRVLYHLAPDWSDRADLAQEVWIRVYRNINRLQEP
637459130       RAAFAELMRRHQAHIDRLLYHLAPDWQDRSDLAQEVWIRVYRYVKRLKEP
637720291       RVAFAELLRRYQTQVDRVLYHLAPDWSDRADLAQEVWIRVYRNINRLQEP
639683629       REAFRVLFETYKDKIYSIALRFSGDQALAMDIAQDTFLKLYSSIADFRGE
640012105       NTAFDLLVLKYQHKIFGLISRYVRDSDEIQDVAQEAFIKAYRALPKFRGD
640181457       REAFEHLVRLYENKVYTIAYRLMGNHADASDLAQDAFIKIYQALPNFRGD
640528109       LEAFEMLVRRYENKVYTVAYRFLGNHADASDLAQEAFLRLYQALPRFRGE
641611980       RAAFVELLKRYQSHVDRLLYHLAPDWQDRSDLSQEVWIRVYRNLQRLNDP
642603124       RVAFSELLRRYQSQVDRVLYHLAPDWADRADLAQEVWIRVYRNINRLQEP
643473443       RGAFSELLNRYQSHVDRILYHLAPDWQDRADLAQEVWIRVYRNIKRLNEP
                  *:  *.  ::  :  :  .   :     *::*:.::: *  :   .
 
2503266088      DNFKSWIGRIAVNKAIDHRRKAEHLSRIVP-EE----KAG---ESVLRTE
637233668       SKFRGWLSRIATNLFYDELRKRKRVVSPLSLDAPRSVDDGEMD--WEIAG
637459130       EKFRSWAGRIATNLFYDELRRRRRGWPPLSLDAPIQTKDGELD--WELAA
637720291       SKFRGWLSRIATNLFYDELRKRKRVVSPLSLDAPRSVDDGEMD--WEIAG
639683629       SQFSTWVYRLVVNSCLDHKRKSWRMI-PLADEL-L------------AVM
640012105       SAFYTWLYRIAINTAKNHLVSRSRRP-PA-TDVD--VEDAEYYESASSLR
640181457       SSFSTWIYHITVNVCRDELRKRQRRP-TVSLDDNSSDSNNSNT--YEIRS
640528109       SSFMTWLYRITANACRDELRRRQRNS-TVSLDGENGLEHIQNY--SLFSG
641611980       QKFKGWLSRIITNLFYDELRKRKRVRRPLSLDNPFQTQDGEVA--WDVAS
642603124       AKFRGWLSRIATNLFYDELRKRKRVVSPLSLDAPRSLEDGEMD--WEIAG
643473443       VKFRGWLSRIATNLFYDELRKRKRVSHPVSLDAPRRVDDGEIE--WDIVS
                  *  *  ::  *   :.     :       :
 
2503266088      QAPSPE-ELYLAKEDWQRVQHVAGELPEIYGRTLAKYYFEGKSCREIAVE
637233668       DTPGPE-EELTTREFYEQLREAIADLPEVFRTTIVLREIEGMAYEEIAEI
637459130       QSPGPD-ENLVTCEFYEHLRRAIAELPEAFRTTIVLREIEGMAYEEIAET
637720291       DTPGPE-EELTTREFYEQLREAIADLPEVFRTTIVLREIEGMAYEEIAEI
639683629       RAPGDALHAVIRTEMSERVQRAVEKLPEEQRIVVVLRYTEGLAYEQIAEV
640012105       DIENPE-NALYGEELKQVVESAMKELPEDLRTAVTLREFDGLSYEDIADV
640181457       NDPGPE-EMLDRSETQAMIQACLNTLSDDYREIIVMREIQELAYEEIAEI
640528109       YIPSPE-DTVEKKELNELVQMCLNSISGEHRLILVMREIQGMTYDEISAV
641611980       DDPSPD-DDLATQEFYEHLREAIAELPEVFRTTIVLREIEGLAYEEIAEI
642603124       DTPGPE-EELTTREFYEQLREAIADLPEVFRTTIVLREIEGMAYEEIAEI
643473443       DYPSPD-DNLATREFYDRLQVAIADLPEAFRMTIVLREIEGMAYEEIAQL
                   .   .     *    :.     :.      :.    :  :  :*:
 
2503266088      EGISVKTVESRLSRARALFKKRWGRE--------------KG
637233668       TGVSLGTVKSRIARARSRLQTQLQTY-------------LDA
637459130       LGISVGTVKSRIARARRRLQSQLNAY---------------L
637720291       TGVSLGTVKSRIARARSRLQTQLQTY-------------LDA
639683629       IGCSMGTVASRLNRAHRALERRLANLK----------GAAHV
640012105       MECPVGTVRSRIFRAREAIDKRVKQQIFGAESDD-HLTLVKN
640181457       LGCSLGTVKSRLSRARQALKEKISKQMELI---T-PAKRLAK
640528109       MNCSLGTVKSRLSRARRAFREKFNGLKELMELTSRQEKYGGI
641611980       TGVSLGTVKSRIARARAKLQEMLQPY-------------LAD
642603124       TGVSLGTVKSRIARARSRLQAYLQNY-------------LDS
643473443       TGVSLGTVKSRIARARAKLQSVLQQY-------------L-D
                   .: ** **: **:  :
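
The T-Coffee output above is interleaved: each sequence is split across several blocks. The Python sketch below joins the blocks back into one full-length row per sequence ID, which is the form most downstream tools expect. It assumes sequence lines have the shape "ID  chunk" with no trailing residue counts. The function name parse_interleaved and the two-sequence excerpt are illustrative.

```python
import re

# "<sequence id> <aligned chunk>"; conservation lines (*, :, .) and blanks never match
SEQ_LINE = re.compile(r"^(\S+)\s+([A-Za-z-]+)$")

def parse_interleaved(text):
    """Join the blocks of an interleaved (CLUSTAL-style) alignment, one row per ID."""
    rows = {}
    for line in text.splitlines():
        if line.startswith("CLUSTAL"):  # skip a format header if present
            continue
        match = SEQ_LINE.match(line.strip())
        if match:
            seq_id, chunk = match.groups()
            rows[seq_id] = rows.get(seq_id, "") + chunk
    return rows

# Excerpt: two sequences, first two blocks, from the alignment above
EXAMPLE = """\
2503266088      M--------------------------------LSDEQLAKRCLAG---D
637233668       MS-------QSITVSWSTVDARCPEA-SVQVDKLSNHDLILRCQVGLRPD
                *                                  :  *      *   :

2503266088      QEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQEVFLQVYRSLPRCRF-
637233668       RVAFAELLRRYQTQVDRVLYHLAPDWSDRADLAQEVWIRVYRNINRLQEP
"""

rows = parse_interleaved(EXAMPLE)
for seq_id, row in rows.items():
    print(seq_id, len(row), row.replace("-", "")[:20])
```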
 
 

WebLogo

go to http://weblogo.berkeley.edu/

Sequence logo

logo image
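
In a sequence logo, the height of each stack is the information content of that alignment column in bits: log2(20) minus the Shannon entropy of the residues in the column. The Python sketch below computes that quantity. It ignores gaps and omits the small-sample correction that WebLogo applies, so its values will be somewhat higher than WebLogo's for an alignment of only 11 sequences. The function names are hypothetical.

```python
import math
from collections import Counter

def column_information(column, alphabet_size=20):
    """log2(alphabet_size) minus Shannon entropy, in bits.

    Gaps are ignored and no small-sample correction is applied.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy

def information_profile(rows):
    """Per-column information content for equal-length aligned rows."""
    return [column_information(col) for col in zip(*rows)]

# Three aligned rows from the last T-Coffee block above (GxSxxTVxSR region)
fragment = ["GISVKTVESRLSRAR", "GVSLGTVKSRIARAR", "GISVGTVKSRIARAR"]
for residue, bits in zip(fragment[0], information_profile(fragment)):
    print(residue, round(bits, 2))
```

Fully conserved columns such as the G, S, T, V, S, R positions reach the maximum of about 4.32 bits, matching the tallest stacks one expects in the logo.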

Comments

comments


Cellular Localization Data Module

Gram Stain

go to http://www.ncbi.nlm.nih.gov/pubmed/

Gram stain of the microbe

Gram-negative bacteria

TMHMM

go to http://www.cbs.dtu.dk/services/TMHMM/

Number of predicted TMHs

0

The transmembrane topology graph

http://www.cbs.dtu.dk/cgi-bin/nph-webface?jobid=TMHMM2,4EC30566028482F6&opt=none

comments
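
TMHMM uses a hidden Markov model, which is not reproduced here. As a much cruder cross-check, a Kyte-Doolittle hydropathy scan flags 19-residue windows whose mean hydropathy exceeds 1.6, the threshold Kyte and Doolittle suggested for membrane-spanning segments. The Python sketch below is that swapped-in heuristic, not TMHMM; the function name is hypothetical, and the protein sequence above can be passed to it directly.

```python
# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy exceeds threshold."""
    hits = []
    for i in range(len(seq) - window + 1):
        score = sum(KD[aa] for aa in seq[i:i + window]) / window
        if score > threshold:
            hits.append(i + 1)
    return hits

print(hydrophobic_windows("L" * 25))  # [1, 2, 3, 4, 5, 6, 7]
print(hydrophobic_windows("D" * 25))  # []
```

An empty result for the sigma factor would be consistent with the zero TMHs reported above, though the heuristic is far less reliable than TMHMM.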

SignalP

go to http://www.cbs.dtu.dk/services/SignalP/

Signal peptide probability

SignalP-4.0 euk predictions

Most likely cleavage site (between position # and #)

cleavage site

Signal peptide graph

http://www.cbs.dtu.dk/cgi-bin/webface?jobid=signalp,4EC305C902849BA0&opt=none

PSORT

go to http://www.psort.org/psortb/

Cytoplasmic score

9.97

CytoplasmicMembrane score

0.01

Cellwall score

0.0

Periplasmic score

0.01

OuterMembrane score

0.0

Extracellular score

0.0

PSORT final prediction

Cytoplasmic (score 9.97)
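
The final prediction is simply the localization site with the highest score, provided that score is high enough. The Python sketch below applies that rule to the scores listed above. The cutoff of 7.5 is an assumption based on how PSORTb 3.0 is commonly described; check the PSORTb documentation for the exact value. The function name is hypothetical.

```python
# Localization scores copied from the PSORT results above
PSORT_SCORES = {
    "Cytoplasmic": 9.97,
    "CytoplasmicMembrane": 0.01,
    "Cellwall": 0.0,
    "Periplasmic": 0.01,
    "OuterMembrane": 0.0,
    "Extracellular": 0.0,
}

def final_localization(scores, cutoff=7.5):
    """Top-scoring site, or 'Unknown' if no score reaches the (assumed) cutoff."""
    site, best = max(scores.items(), key=lambda item: item[1])
    return site if best >= cutoff else "Unknown"

print(final_localization(PSORT_SCORES))  # Cytoplasmic
```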

Phobius

go to http://phobius.sbc.su.se/

Phobius probability graph

http://phobius.sbc.su.se/cgi-bin/predict.pl

Hypothesis

Where do you expect to find this protein?

hypothesis


Alternative Open Reading Frame Module

go to the IMG Gene Detail Page

http://img.jgi.doe.gov/cgi-bin/edu/main.cgi?section=GeneDetail&page=geneDetail&gene_oid=2503266088

Proposed DNA coordinates

enter in lab report

Explanation of choice

explanation
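
One way to look for an alternative start is to list in-frame ATG, GTG and TTG codons near the annotated start and convert each to genomic coordinates. The Python sketch below does this for the first 150 nt of the CDS only; checking for upstream starts would need flanking sequence from IMG, which is not included here. The names inframe_starts and coords_for_start are hypothetical.

```python
# First 150 nt of the CDS, copied from the nucleotide FASTA record above
CDS_5PRIME = (
    "ATGCTGTCCGATGAACAGCTGGCAAAAAGATGCCTTGCCGGCGATCAGGA"
    "AGCATTAGAAGAACTGTTGGAACGGTACAAGGGCTACGTCTTTGCAATAA"
    "TATTTAATTTCGTGAGCGACAGCGCTGAGGCCGAAGATATAGCCCAGGAG"
)
GENE_START, GENE_END = 2220583, 2221134   # (-) strand: translation begins at GENE_END
START_CODONS = {"ATG", "GTG", "TTG"}      # common bacterial start codons

def inframe_starts(cds):
    """(0-based offset, codon) for each in-frame candidate start codon."""
    return [(i, cds[i:i + 3]) for i in range(0, len(cds) - 2, 3)
            if cds[i:i + 3] in START_CODONS]

def coords_for_start(offset):
    """Genomic coordinates if translation began `offset` nt into the CDS.

    On the minus strand the start codon sits at the high coordinate, so a
    downstream start lowers the upper bound; the stop-codon end is unchanged.
    """
    return GENE_START, GENE_END - offset

for offset, codon in inframe_starts(CDS_5PRIME):
    print(offset, codon, coords_for_start(offset))
```

For this stretch the scan finds the annotated ATG (offset 0), a TTG at offset 66 and a GTG at offset 111. Whether either downstream candidate is plausible would still depend on evidence such as a ribosome-binding site and the BLAST alignments, which begin at residue 1.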


Structure-based Evidence Module

TIGRfam

go to http://tigrblast.tigr.org/web-hmm/

TIGRfam number

enter in lab report

TIGRfam name

enter in lab report

Score

score

E-value

enter in lab report

Pfam

go to http://pfam.sanger.ac.uk/search

Pfam number (PF#####) for top hit

PF04542

Pfam name

Sigma-70 region 2

Clan name

Clan HTH

Clan number (CL####)

CL0123

Score

55.6

E-value

2.4e-15

Pairwise alignment

#HMM       lverylplvrrlarrllgsgadaeDlvQeaflrllraverfdpergvkfsawlltiirnaildalrrr
#MATCH     l+ery+  v++ ++++++a+aeD+Qe+fl r+    +fw i n++d r++
#PP        789***********************************999..66.8*****************96
#SEQ       LLERYKGYVFAIIFNFVSDSAEAEDIAQEVFLQVYRSLPRCR--FD-NFKSWIGRIAVNKAIDHRRKA
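
The #PP row is HMMER's posterior-probability annotation: each character estimates how confidently that residue is aligned. Digits 0 to 9 stand for ranges centred on 0%, 10%, ..., 90% (for example 9 means 0.85 to 0.95), "*" means 0.95 to 1.0, and "." appears opposite gaps. The Python sketch below decodes the row; the function name pp_range is hypothetical.

```python
def pp_range(ch):
    """(low, high) posterior probability encoded by one PP-line character."""
    if ch == "*":
        return (0.95, 1.0)
    if ch.isdigit():
        d = int(ch)
        return (max(0, 10 * d - 5) / 100, (10 * d + 5) / 100)
    return None  # '.' appears opposite gaps and carries no probability

# PP row of the Sigma-70 region 2 alignment above
PP_LINE = "789***********************************999..66.8*****************96"
confident = sum(1 for ch in PP_LINE if ch == "*")
scored = sum(1 for ch in PP_LINE if pp_range(ch) is not None)
print(f"{confident} of {scored} aligned residues have posterior probability >= 0.95")
```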

HMM logo

http://pfam.sanger.ac.uk/family/PF04542.8#tabview=tab4

Key functional/structural residues (e.g. I2, W7, F13)

http://pfam.sanger.ac.uk/family/alignment/download/html?acc=PF04542&alnType=seed&viewer=html

Pfam number (PF#####) for second hit

PF08281

Pfam name

Sigma-70, region 4

Clan name

Clan HTH

Clan number (CL####)

CL0123

Score

42.4

E-value

2.7e-11

Pairwise alignment


#HMM       qalrealaeLperqreifllryleglsykEIAellgisegtVksrlsRArk
#MATCH     q++   eLpe{}    +y+eg+s EIA  gis++tV+srlsRAr+
#PP        66778889*****************************************96
#SEQ       QRVQHVAGELPEIYGRTLAKYYFEGKSCREIAVEEGISVKTVESRLSRARA
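
The #SEQ rows of the two Pfam hits can be mapped back onto the full protein to get residue ranges for each domain. The Python sketch below removes gap characters and searches for the fragment, assuming it occurs only once in the protein. Pfam's own result table also reports coordinates; this is just a way to recover them from the pasted alignments. The function name domain_span is hypothetical.

```python
PROTEIN = "".join("""
MLSDEQLAKRCLAGDQEALEELLERYKGYVFAIIFNFVSDSAEAEDIAQE
VFLQVYRSLPRCRFDNFKSWIGRIAVNKAIDHRRKAEHLSRIVPEEKAGE
SVLRTEQAPSPEELYLAKEDWQRVQHVAGELPEIYGRTLAKYYFEGKSCR
EIAVEEGISVKTVESRLSRARALFKKRWGREKG
""".split())

# #SEQ rows of the PF04542 (region 2) and PF08281 (region 4) alignments above
REGION2_SEQ = "LLERYKGYVFAIIFNFVSDSAEAEDIAQEVFLQVYRSLPRCR--FD-NFKSWIGRIAVNKAIDHRRKA"
REGION4_SEQ = "QRVQHVAGELPEIYGRTLAKYYFEGKSCREIAVEEGISVKTVESRLSRARA"

def domain_span(protein, aligned_seq):
    """1-based inclusive span of an aligned #SEQ row within the full protein."""
    fragment = aligned_seq.replace("-", "").replace(".", "").upper()
    start = protein.find(fragment)
    if start < 0:
        raise ValueError("fragment not found in protein")
    return start + 1, start + len(fragment)

print("Sigma-70 region 2:", domain_span(PROTEIN, REGION2_SEQ))  # (22, 86)
print("Sigma-70 region 4:", domain_span(PROTEIN, REGION4_SEQ))  # (122, 172)
```

Region 2 lies N-terminal to region 4, with roughly 35 residues between them.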

HMM logo

http://pfam.sanger.ac.uk/family/PF08281.6#tabview=tab4

Key functional/structural residues (e.g. I2, W7, F13)

http://pfam.sanger.ac.uk/family/alignment/download/html?acc=PF08281&alnType=seed&viewer=html  

PDB

go to http://www.rcsb.org/pdb/home/home.do

PDB code

enter in lab report

PDB name

enter in lab report

Alignment length

enter in lab report

E-value

enter in lab report

Pairwise alignment

alignment

Enzymatic Function Module

KEGG

go to http://www.genome.jp/kegg/pathway.html

KEGG pathway ID

enter in lab report

Pathway map

map image

MetaCyc

go to http://metacyc.org/

Pathway map

map image

E.C. Number

http://www.expasy.ch/enzyme/enzyme-search-ec.html

EC Number

enter in lab report

EC Name

enter in lab report

Duplication and Degradation Module

Paralog

go to the IMG Gene Detail Page

http://img.jgi.doe.gov/cgi-bin/edu/main.cgi?section=GeneDetail&page=geneDetail&gene_oid=2503266088

Gene product name

name

Percent identity

identity

Alignment length

length

E-value

e-value

Pairwise alignment

alignment

Pseudogene

http://expasy.org/tools/scanprosite
http://pfam.sanger.ac.uk

Is this a pseudogene?

enter in lab report

Horizontal Gene Transfer Module

Phylogenetic Tree

http://www.phylogeny.fr/

Phylogenetic tree

tree image

Interpretation of phylogenetic tree

interpretation
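
Phylogeny.fr's default one-click pipeline builds a maximum-likelihood tree, which is beyond a short example. As a simpler, swapped-in illustration of the divergence a tree summarizes, the Python sketch below computes pairwise p-distances (the fraction of differing sites, skipping gap columns) from aligned rows such as those produced by T-Coffee. A matrix like this could feed a distance method such as neighbor joining (not shown). The function names and the toy alignment are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of compared sites that differ; columns with a gap in either row are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in sites) / len(sites)

def distance_matrix(rows):
    """Symmetric dict of pairwise p-distances keyed by (id1, id2)."""
    dist = {(seq_id, seq_id): 0.0 for seq_id in rows}
    for i, j in combinations(rows, 2):
        dist[(i, j)] = dist[(j, i)] = p_distance(rows[i], rows[j])
    return dist

# Hypothetical toy alignment; in practice use the full-length T-Coffee rows
toy = {"query": "MLSDE-QL", "hitA": "MLSDEKQL", "hitB": "MDADE-LL"}
d = distance_matrix(toy)
for (i, j), value in sorted(d.items()):
    if i < j:
        print(i, j, round(value, 3))
```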

Gene Context

go to the IMG Gene Detail Page

http://img.jgi.doe.gov/cgi-bin/edu/main.cgi?section=GeneDetail&page=geneDetail&gene_oid=2503266088

Ortholog Neighborhood Region of organism and examples of similarities or differences

gene neighborhood images

comment on the ortholog neighborhood regions

text

Chromosome Viewer GC Heat Map

go to the IMG Gene Detail Page (see above)

Characteristic GC% of the genome

enter in lab report

Average GC% of the gene

enter in lab report
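
The gene's average GC% can be computed directly from its nucleotide sequence and then compared with the genome value reported by IMG (not reproduced here). The Python sketch below computes the whole-gene GC% and a sliding-window profile loosely analogous to the heat map. The window and step sizes are arbitrary assumptions, and the function names are hypothetical.

```python
# Coding sequence, copied from the nucleotide FASTA record above
GENE = "".join("""
ATGCTGTCCGATGAACAGCTGGCAAAAAGATGCCTTGCCGGCGATCAGGA
AGCATTAGAAGAACTGTTGGAACGGTACAAGGGCTACGTCTTTGCAATAA
TATTTAATTTCGTGAGCGACAGCGCTGAGGCCGAAGATATAGCCCAGGAG
GTATTCTTACAGGTTTACCGCTCCCTTCCCCGGTGCCGTTTCGATAATTT
CAAGTCCTGGATCGGGAGAATCGCCGTGAACAAGGCCATAGACCACCGGA
GAAAGGCGGAGCATCTTTCGCGAATTGTGCCGGAAGAAAAGGCCGGAGAA
TCGGTACTGCGAACGGAGCAGGCCCCTTCGCCGGAAGAACTGTACCTGGC
GAAGGAAGACTGGCAGAGAGTGCAGCATGTGGCGGGGGAGCTGCCGGAGA
TTTACGGGAGGACTCTGGCAAAATATTATTTCGAAGGCAAGAGCTGCCGG
GAGATCGCAGTGGAGGAAGGAATCAGCGTTAAGACGGTTGAGTCCCGCTT
GAGTCGGGCGAGAGCCCTGTTTAAGAAAAGGTGGGGGAGGGAAAAAGGGT
GA
""".split())

def gc_percent(seq):
    """Percentage of G and C bases in a DNA string."""
    return 100.0 * sum(base in "GC" for base in seq.upper()) / len(seq)

def gc_windows(seq, window=100, step=50):
    """(1-based window start, GC%) for overlapping windows along the sequence."""
    return [(i + 1, gc_percent(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]

print(f"gene GC: {gc_percent(GENE):.1f}%")
for start, gc in gc_windows(GENE):
    print(start, round(gc, 1))
```

A gene GC% far from the genome's characteristic value is one (weak) hint of horizontal transfer.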

RNA Module

Rfam

go to http://rfam.sanger.ac.uk

Rfam number (RF#####)

enter in lab report

Rfam name

enter in lab report

Score

score

E-value

e-value

Pairwise Alignment

alignment

Proposed Annotation

Proposed annotation.

enter in lab report

Miscellaneous Information

Additional information requested by your instructor

information
