# Sequence similarity between two 100 Mb DNA sequences

Q1) Determine where and how similar two 100 Mb DNA sequences are. Assume that similarity will vary over the lengths of the sequences.

1. Restate the biological problem in computational terms.

2. What kinds of data do you need?

3. What controls do you need?

4. Do you need to define the problem's conditions more precisely? How?

5. How would you represent the data and the problem?

6. Describe an algorithm for solving the problem.
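Q1 assumes that similarity varies along the sequences, so one output should show *where* the sequences are similar, not just a single overall score. A simple way to express this is percent identity in sliding windows over a pair of sequences that have already been aligned. The sketch below assumes gap characters are written as `-` and counts a gap column as a mismatch. The function name and the `window`/`step` parameters are illustrative, not taken from any particular tool.

```python
def window_identity(aligned_a: str, aligned_b: str, window: int, step: int) -> list[tuple[int, float]]:
    """Fraction of identical columns in sliding windows over two aligned strings.

    Assumptions (illustrative only): both strings have the same length,
    '-' marks a gap, and a gap column counts as a mismatch.
    Returns a list of (window start column, identity fraction).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    profile = []
    for start in range(0, len(aligned_a) - window + 1, step):
        a_win = aligned_a[start:start + window]
        b_win = aligned_b[start:start + window]
        # Count columns where both strings carry the same base (gaps excluded).
        same = sum(1 for x, y in zip(a_win, b_win) if x == y and x != "-")
        profile.append((start, same / window))
    return profile
```

For example, `window_identity("ACGTACGT", "ACGTTTTT", 4, 4)` gives `[(0, 1.0), (4, 0.25)]`: the first half is identical and the second half is not. With `step` equal to `window`, each column is read once, so the profile costs linear time even for 100 Mb alignments.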

Q2) Now do the same for two entire genomes. Does your algorithm scale? What other phenomena could you encounter that could confuse your algorithm or results? (You can answer this question later, in a separate post.)

Biology background to help reach the answers:

This question asks you to measure sequence similarity between two 100 Mb (about 100 million bases each) DNA sequences: which parts of the sequences are most similar, possibly even identical, and which parts match less well. In practice, the sequences are aligned with one another and the nucleotides are compared position by position. Unlike identity, sequence similarity also takes approximate matches into account, and it is meaningful only when substitutions are scored by some measure of 'difference' or 'sameness', with conservative or highly likely substitutions given higher scores than non-conservative or unlikely ones. In DNA, the bases are A, T, C, and G, so each 100 Mb sequence is a long string of these four bases in a specific order.
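To make "scoring substitutions" and "finding the most similar parts" concrete, here is a minimal sketch of Smith-Waterman local alignment in Python. The scores are assumptions chosen for illustration: a match scores +2, a transition (A↔G or C↔T, a comparatively likely substitution) scores -1, a transversion scores -2, and each gap costs -3. The function names are hypothetical. The sketch returns only the best local score and where it ends, not the full alignment.

```python
# Illustrative scoring scheme (assumed values, not a published matrix).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
MATCH = 2
TRANSITION = -1    # A<->G or C<->T: more likely, so penalised less
TRANSVERSION = -2  # purine <-> pyrimidine: less likely, penalised more
GAP = -3           # linear gap penalty per gap position


def substitution_score(a: str, b: str) -> int:
    """Score one aligned pair of bases."""
    if a == b:
        return MATCH
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return TRANSITION
    return TRANSVERSION


def smith_waterman(s: str, t: str) -> tuple[int, int, int]:
    """Return (best local score, end position in s, end position in t), 1-based ends.

    Keeps only the previous DP row (O(len(t)) memory), but the running
    time is still O(len(s) * len(t)).
    """
    prev = [0] * (len(t) + 1)
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            curr[j] = max(
                0,  # local alignment: a poor region resets the score to zero
                prev[j - 1] + substitution_score(s[i - 1], t[j - 1]),  # align s[i-1] with t[j-1]
                prev[j] + GAP,      # gap in t
                curr[j - 1] + GAP,  # gap in s
            )
            if curr[j] > best:
                best, best_i, best_j = curr[j], i, j
        prev = curr
    return best, best_i, best_j
```

For example, `smith_waterman("AAACGTAAA", "GGCGTGG")` returns `(6, 6, 5)`: the shared `CGT` is the best-scoring local region. Because the work grows with the product of the lengths, two 100 Mb sequences would need on the order of 10^16 cell updates, which is why large-scale tools such as BLAST first look for short exact seed matches and only extend alignments around them.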

2. For this analysis, you need the complete sequences of both 100 Mb DNA molecules. That is, you want both of them to be fully ...
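Having both full sequences also raises the question of how to hold them in memory. One common compact representation packs each base into 2 bits, so 100 Mb takes about 25 MB instead of about 100 MB at one byte per base. The sketch below is hypothetical and assumes the sequence contains only A, C, G, and T. Real assemblies also contain `N` and other ambiguity codes, which would need separate handling (here they raise `KeyError`).

```python
# Hypothetical 2-bit packing; assumes only A, C, G, T occur in the input.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytearray:
    """Pack four bases per byte, first base in the two highest bits."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        # KeyError here means a non-ACGT character (e.g. N) was found.
        packed[i // 4] |= CODE[base] << (6 - 2 * (i % 4))
    return packed


def unpack(packed: bytes, length: int) -> str:
    """Recover the first `length` bases from a packed buffer."""
    return "".join(
        BASES[(packed[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length)
    )
```

The original length must be stored alongside the buffer, because the last byte may be only partly filled.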

