    Molecular Biology: Sequence Analysis

    See attached. Please show all work in detail.

    --------------------------------------------------------

    (a)

    Given the two sequences x and y shown below, determine the minimum number of edit operations (substitutions and indels) required to transform one into the other.
    ________________________________________________________________________

    (b)

    Determine the Hamming distance between the strings: CENTURY and SANCTUARY
    Determine the Levenshtein distance between the strings: BIO-INFORMATICS and TRI-TELEMATICS
    _____________________________________________________________________

    Binary representation of a DNA sequence: Concept

    Sometimes DNA sequence analysis can be done by converting the sequence into binary format. For example, suppose the following dibit representation is used: A ==> (11); C ==> (01); G ==> (10); T ==> (00). Then a sequence such as ACCTGCA can be written as: 11 01 01 00 10 01 11
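
    The dibit mapping above can be sketched in a few lines of Python. The helper name `to_dibits` is hypothetical and only illustrates the encoding. The mapping itself is the one given in the text.

```python
# Dibit code taken from the text: A -> 11, C -> 01, G -> 10, T -> 00.
DIBIT = {"A": "11", "C": "01", "G": "10", "T": "00"}


def to_dibits(seq: str) -> str:
    """Return the binary string for a DNA sequence (hypothetical helper)."""
    # A base outside A/C/G/T raises KeyError, which is acceptable for a sketch.
    return "".join(DIBIT[base] for base in seq.upper())


if __name__ == "__main__":
    # The example from the text: ACCTGCA -> 11 01 01 00 10 01 11
    print(to_dibits("ACCTGCA"))  # 11010100100111
```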

    (c)

    By constructing the binary format of the pair of sequences x and y given below, determine the Hamming distance between them.
    (Hint: For binary strings a and b, the Hamming distance equals the number of ones in the result of a XOR b.)
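
    A minimal sketch of the hint, assuming the dibit code from the concept note (A=11, C=01, G=10, T=00). The function names are hypothetical. Each sequence is turned into an integer, the two integers are XORed, and the ones in the result are counted.

```python
# Dibit code from the concept note: A -> 11, C -> 01, G -> 10, T -> 00.
DIBIT = {"A": "11", "C": "01", "G": "10", "T": "00"}


def to_int(seq: str) -> int:
    """Encode a DNA sequence as an integer via its dibit string."""
    bits = "".join(DIBIT[base] for base in seq.upper())
    return int(bits, 2) if bits else 0


def binary_hamming(x: str, y: str) -> int:
    """Count the ones in (x XOR y) for two equal-length DNA sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return bin(to_int(x) ^ to_int(y)).count("1")


if __name__ == "__main__":
    # A (11) vs T (00) differ in both bits; A (11) vs C (01) differ in one.
    print(binary_hamming("A", "T"))  # 2
    print(binary_hamming("A", "C"))  # 1
```

    Note that this counts differing bits, not differing bases: under this code a single base change contributes either 1 or 2 to the total.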
    _____________________________________________________________________

    (d)

    Given a template sequence: CCCAAGGGGTTCCAATG. Identify the underlying mutations and derivatives (point mutations, deletions, inversions, transpositions, duplications, insertions) in the following set of strings that resemble the template:

    CCCAAGGGGTTTCAATG
    CCCAAGGGGTTTCxxxx
    CCGGAACGGTTTC
    TTTCCCGGAACGG
    TTTCCCGGGGAAGG
    TTTCCCGGTTAACTTTGG
    TTTCCCGGTTAACTTGG

    How will you designate the following sequence in relation to the template?
    AAAGGCCAATTGAAACC
    _____________________________________________________________________

    (e)

    Transition mutations are more common than transversion mutations.

    Construct a matrix to illustrate this characteristic of the mutations. Assume proportionate percentages to depict each type of mutation.

        A   T   G   C
    A
    T
    G
    C
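
    As a starting point for filling in the matrix, the sketch below labels each cell as a transition or a transversion. It relies only on the standard definitions: a transition stays within the purines (A, G) or within the pyrimidines (C, T), and a transversion crosses between the two groups. The function name is hypothetical, and no percentages are assigned because the exercise leaves those to be assumed.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "ATGC"  # same row/column order as the matrix in the question


def mutation_type(a: str, b: str) -> str:
    """Classify the substitution a -> b (hypothetical helper)."""
    if a == b:
        return "-"  # diagonal: no mutation
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


if __name__ == "__main__":
    print("  ", *(f"{col:>13}" for col in BASES))
    for row in BASES:
        print(f"{row} ", *(f"{mutation_type(row, col):>13}" for col in BASES))
```

    Each row contains one transition and two transversions, so any percentages chosen for the matrix should give the single transition cell more weight than either transversion cell.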

    x: T A G C T A T C G G G A A C T G
    y: G C T C A C G G T T G G G A C T

    © BrainMass Inc. brainmass.com October 10, 2019, 1:42 am
    https://brainmass.com/biology/molecular-biology/molecular-biology-sequence-analysis-345978

    Solution Preview

    Let me know if you have any questions or need more details or explanations.

    --------------------------------------

    (a)
    Given the two sequences x and y shown below, determine the minimum number of edit operations (substitutions and indels) required to transform one into the other.

    x: T A G C T A T C G G G A A C T G
    y: G C T C A C G G T T G G G A C T

    This problem is asking for the Levenshtein distance, which is defined as the minimum number of edits needed to transform one string into the other. The allowed edits are insertion, deletion, and substitution of a single character.

    I used the calculator here (http://www.miislita.com/searchito/levenshtein-edit-distance.html) to determine that the Levenshtein distance for these two strings is 13 (this is if you cut-and-paste the sequences, including all the spaces).

    If, instead, you use the strings that I believe the question is asking for (no spaces except one after the first string and one before the second string),

    TAGCTATCGGGAACTG
    GCTCACGGTTGGGACT

    the answer becomes 10.

    This website (http://www.merriampark.com/ld.htm) explains how the matrix used to compute the distance is created.
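
    For readers who would rather compute the distance locally than rely on an online calculator, here is a minimal sketch of the standard dynamic-programming approach. It is not the code behind either website, and the function name is an assumption. It keeps only two rows of the matrix at a time.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions to turn s into t."""
    # prev[j] = distance between the first i-1 characters of s and the first j of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning the first i characters of s into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs from s
                curr[j - 1] + 1,     # insert ct into s
                prev[j - 1] + cost,  # substitute cs -> ct (free if they match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    # The sequences from part (a), with spaces removed:
    print(levenshtein("TAGCTATCGGGAACTG", "GCTCACGGTTGGGACT"))
```

    Because spaces count as ordinary characters, running it on the spaced and unspaced versions of the sequences is a quick way to check the values quoted above.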

    (b)
    Determine the Hamming distance between the strings: CENTURY and SANCTUARY
    Determine the Levenshtein distance between the strings: BIO-INFORMATICS and TRI-TELEMATICS

    (1) The Hamming distance is similar to the Levenshtein distance, except that insertions and deletions are not allowed. You find it by counting the number of positions at which the two strings have different characters (the number of substitutions). (See http://www.ehow.com/how_5179242_calculate-hamming-distance.html for an explanation.)
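
    The position-by-position count can be sketched as follows. The function name is hypothetical, and rejecting strings of unequal length is an assumption of this sketch: it reflects the usual definition, which compares strings position by position.

```python
def hamming(s: str, t: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        # Assumption: the comparison is only meaningful for equal lengths.
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(s, t))


if __name__ == "__main__":
    print(hamming("karolin", "kathrin"))  # 3
```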

    Since CENTURY (7 letters) and SANCTUARY (9 letters) are of different length, the Hamming distance ...

    Solution Summary

    The expert examines the sequence analysis for molecular biology.
