A student in a molecular genetics lab is using the SSCP technique to analyze DNA from breast cancer tumours and has identified five different shifts in fragments of the BARD1 gene. (BARD1 [GenBank accession number NM_000465] is a tumour suppressor gene, mutations in which are associated with breast cancer.) Partial sequences from those shifted bands are given below; for each, describe what type of mutation has occurred (e.g. transition, etc.), the molecular consequences of the mutation (i.e. what happens to the protein), and whether it is likely to be a benign polymorphism or a deleterious mutation. (N.B. Sequences are 5’ -> 3’ for the coding strand, given in triplets, with numbers corresponding to the GenBank sequence.)
(a) 269-tgt gag cac atc ttc tgt agt aat tgt gta agt gac tgc att gga act gga tga cca gtg tgt tac acc
(b) 944-act aag aga gga atg aag tag tga ctc ctg aga agg tct gca aaa att atc tta cat cta aga aat ctt
(c) 971-cct gag aag gtc tgc aaa aat tat ctt aca tct aag aat tct ttg cca tta gaa aat aat gga aaa cgt
(d) 1994-cga aga aaa gta cgt gaa cag gaa gaa aag tat gaa att cct gaa ggt cca cgc aga agc agg ctc
(e) 2225-aat aca gtc gca tac cat gcg aga ccc gat tct gat cag cgc ttc tgc aca cag tat atc gtc tat gaa
* I have looked in GenBank and tried to compare the mutations, but the GenBank sequence numbering skips by 60, and I don't understand that. It shows Query 1, then Query 61, and so on. When I tried to compare the mutations, all the bases looked different. I'm really having trouble with this one. Can I get an answer for part (a) or something? Then I might know what to do.
You're right: GenBank delivers your sequence information as 1... 61... 121...
This is because each line holds 60 nucleotides, so the first line runs from 1 to 60, the second from 61 to 120, and so on. The position numbers on each line are just there to help you count faster. So, to check your first mutation fragment:
(a) 269-tgt gag cac atc ttc tgt agt aat tgt gta agt gac tgc att gga act gga tga cca gtg tgt tac acc
You would have to go to this part of the GenBank sequence:
241 tctgagagag cctgtgtgtt taggaggatg tgagcacatc ttctgtagta attgtgtaag
301 tgactgcatt ggaactggat gtccagtgtg ttacaccccg gcctggatac aagacttgaa
And then you would have to count in to where your sequence starts. Let me help with that by breaking the above lines up (you can check back to make sure that they are the same nucleotides in the same order, not just to satisfy your own curiosity, but also to make sure I didn't make a cut-and-paste mistake).
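If counting by hand gets tedious, here is a rough Python sketch of the same counting step. It only uses the two GenBank lines quoted above, and it assumes each line starts with the coordinate of its first base followed by blocks of ten bases. The function names `parse_origin_lines` and `slice_by_coordinate` are made up for this illustration; they are not part of any GenBank tool.

```python
# Two sequence lines copied from the GenBank excerpt quoted above.
GENBANK_LINES = [
    "241 tctgagagag cctgtgtgtt taggaggatg tgagcacatc ttctgtagta attgtgtaag",
    "301 tgactgcatt ggaactggat gtccagtgtg ttacaccccg gcctggatac aagacttgaa",
]


def parse_origin_lines(lines):
    """Return (start, sequence) for consecutive GenBank sequence lines.

    start is the coordinate of the first base; sequence is all bases
    joined together with the spaces and line numbers removed.
    """
    first_start = None
    bases = []
    for line in lines:
        fields = line.split()
        if first_start is None:
            first_start = int(fields[0])  # leading number = coordinate of first base
        bases.append("".join(fields[1:]))  # the remaining fields are 10-base blocks
    return first_start, "".join(bases)


def slice_by_coordinate(start, sequence, position, length):
    """Return `length` bases beginning at the 1-based GenBank coordinate `position`."""
    offset = position - start  # e.g. 269 - 241 = 28 bases into the excerpt
    return sequence[offset:offset + length]


start, sequence = parse_origin_lines(GENBANK_LINES)
fragment = slice_by_coordinate(start, sequence, 269, 12)
print(fragment)  # first four triplets of the reference, starting at 269
```

The first 12 reference bases from position 269 come out as `tgtgagcacatc`, which matches the start of fragment (a) (`tgt gag cac atc`), so the numbering in the question and the numbering in GenBank line up.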
And let's take it a step further, and just get to the part of the sequence we need, and break it up into triplets (note: I didn't catch where the start codon was, so we'll just have to assume that the way the nucleotides are broken up into triplets is the way they will actually be coded into protein. That shouldn't be a bad assumption: even if we give ourselves a nonsense protein because we made a mistake in our triplet break-points, if that changes in a mutation, it will change in the real protein ...
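The triplet-by-triplet comparison can also be sketched in Python. This is only an illustration under the same assumption as above: that the triplets given in the question are in the real reading frame. It handles single-base substitutions only (no insertions or deletions), and the names `compare_triplets` and `substitution_type` are hypothetical helpers written for this example.

```python
# Reference bases 241-360, joined from the two GenBank lines quoted above.
REFERENCE = (
    "tctgagagagcctgtgtgtttaggaggatgtgagcacatcttctgtagtaattgtgtaag"
    "tgactgcattggaactggatgtccagtgtgttacaccccggcctggatacaagacttgaa"
)
REFERENCE_START = 241  # GenBank coordinate of REFERENCE[0]

# Fragment (a) from the question, already split into triplets, starting at 269.
FRAGMENT_START = 269
FRAGMENT = (
    "tgt gag cac atc ttc tgt agt aat tgt gta agt gac tgc att "
    "gga act gga tga cca gtg tgt tac acc"
)

PURINES = {"a", "g"}
STOP_CODONS = {"taa", "tag", "tga"}  # stop codons in the standard genetic code


def substitution_type(ref_base, new_base):
    """Transition stays within purines or within pyrimidines; otherwise transversion."""
    same_class = (ref_base in PURINES) == (new_base in PURINES)
    return "transition" if same_class else "transversion"


def compare_triplets(reference, reference_start, fragment, fragment_start):
    """Return one record per base that differs between fragment and reference."""
    offset = fragment_start - reference_start
    changes = []
    for i, new_codon in enumerate(fragment.split()):
        begin = offset + 3 * i
        ref_codon = reference[begin:begin + 3]
        for j, (ref_base, new_base) in enumerate(zip(ref_codon, new_codon)):
            if ref_base != new_base:
                creates_stop = (new_codon in STOP_CODONS) and (ref_codon not in STOP_CODONS)
                changes.append({
                    "position": fragment_start + 3 * i + j,  # GenBank coordinate
                    "ref_codon": ref_codon,
                    "new_codon": new_codon,
                    "type": substitution_type(ref_base, new_base),
                    "creates_stop": creates_stop,  # only meaningful if the frame is right
                })
    return changes


changes = compare_triplets(REFERENCE, REFERENCE_START, FRAGMENT, FRAGMENT_START)
for change in changes:
    print(change)
```

For fragment (a) this reports a single difference at position 322, where `tgt` in the reference becomes `tga` in the shifted band. The same function can be pointed at fragments (b) to (e) once you paste in the matching stretch of the GenBank sequence and its starting coordinate.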