Question

In this assignment, you will calculate the similarity of two sequences using the Hamming distance between two strings, search a genomic string for matches to a subsequence, and calculate the similarity scores for sample DNA sequences compared to known DNA sequences. Here we have provided a small portion of a DNA sequence from a human, a mouse, and an unknown species. Smaller DNA sequences will be compared to each of these larger DNA sequences to determine which has the best match. Each of the DNA sequences can be copied from this write-up and stored as a constant or variable in your program.

Part 1 – Compare two sequences to each other
Your program will ask the user for two small sequences (no more than 10 characters) and calculate a similarity score between the sequences. Pass the two sequences to a function named calcSimilarity() that returns a floating-point result (similarity score described above). Have your main code repeat the similarity calculations until the value of sequence 1 is a single character ‘*’.

float calcSimilarity(string sequenceOne, string sequenceTwo)

The calcSimilarity() function will take two arguments that are both strings. The function calculates the Hamming distance and returns the similarity score as a floating-point number. This function should only calculate the similarity if the two strings are the same length; otherwise it should return 0.

Part 2 – Find locations of matches between genome and sequence
Your program will ask the user for a small DNA sequence and search each of the given genomes (human, mouse, unknown) to find exact matches. Your main program should prompt the user for the search sequence, and pass each of the predefined genomes, species names and the small search sequence to the function listSequencePositions(), which does not return any value. The function will print the name of the genome followed by all locations of the exact matches. Repeat the search across all genomes in your main code until the small sequence given is a single character ‘*’.

void listSequencePositions(string genomeSequence, string genomeName, string seq)

The listSequencePositions() function will take three arguments that are all strings and print each of the positions where genomeSequence has an exact match to seq. The function will print the genomeName of the genome, followed by the locations of all exact matches on a single line and separated by spaces.

Part 3 – Find best match between genome and sequence
Your program will ask the user for a sequence that will be compared to each of the genomes (Human, Mouse, Unknown) to find which genome has the best match to the given sequence. Your program will provide a function compareDNA() to find the highest similarity score of the given sequence anywhere along the given genome. Your program will output the name of the genome with the best match. Have your main code repeat until the sequence given is a single character ‘*’.

float compareDNA(string genome, string seq)

The compareDNA() function should take two arguments that are both strings and return the best similarity score that can be found for that sequence in the genome. You should use the calcSimilarity() function described above to compare the sequence to all substrings of the genome.

void compare3Genomes(string genome1, string name1, string genome2, string name2, string genome3, string name3, string seq)

The compare3Genomes() function should take seven arguments that are all strings and print the name of the genome with the best similarity score that can be found for that sequence in the genome. In the case that multiple genomes have the same best similarity score, print the names of all of the genomes with the same score.

For part 1 I have

#include <iostream>
#include <string>
using namespace std;

float calcSimilarity(string sequence1, string sequence2)
{
    // Only strings of equal, non-zero length can be compared.
    if (sequence1.length() != sequence2.length() || sequence1.empty())
    {
        return 0;
    }

    // Hamming distance: the number of positions where the characters differ.
    int hamming_dist = 0;
    for (size_t i = 0; i < sequence1.length(); i++)
    {
        if (sequence1[i] != sequence2[i])
        {
            hamming_dist++;
        }
    }

    // Similarity is the fraction of positions that match.
    return (float)(sequence1.length() - hamming_dist) / (float)sequence1.length();
}


int main()
{
    string sequence1;
    string sequence2;   

    float sim;

    // Adjacent string literals are concatenated by the compiler, so each
    // genome can be split across several source lines.
    string humanDNA =
        "CGCAAATTTGCCGGATTTCCTTTGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGTATCATTCACCATTTTTCTTTTCGTTAA"
        "CTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCT"
        "TGATGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATG";
    string mouseDNA =
        "CGCAATTTTTACTTAATTCTTTTTCTTTTAATTCATATATTTTTAATATGTTTACTATTAATGGTT"
        "ATCATTCACCATTTAACTATTTGTTATTTTGACGTCATTTTTTTCTATTTCCTCTTTTTTCAATTCATGTTTATTTTCTGTATTTTTGTTAA"
        "GTTTTCACAAGTCTAATATAATTGTCCTTTGAGAGGTTATTTGGTCTATATTTTTTTTTCTTCATCTGTATTTTTATGATTTCATTTAATTGATTTTCATTGAC"
        "AGGGTTCTGCTGTGTTCTGGATTGTATTTTTCTTGTGGAGAGGAACTATTTCTTGAGTGGGATGTACCTTTGTTCTTG";
    string unknownDNA =
        "CGCATTTTTGCCGGTTTTCCTTTGCTGTTTATTCATTTATTTTAAACGATATTTATATCATCGGGTTTCATTCACTATTTTTCTTTTCGATAA"
        "ATTTTTGTCAGCATTTTCTTTTACCTCTTCTTTCTGTTTATGTTAATTTTCTGTTTCTTAACCCAGTCTTCTCGATTCTTATCTACCGGACCTATTATAGGTCACAGGGTCTTG"
        "ATGCTTTGGTTTTCATCTGCAAGAGTCTGACTTCCTGCTAATGCTGTTCTGTGTCAGGGTGCATCTGAGCACTGATGTGGAGTTTTCTTGTGGATATGAGCCATTCATAGTGTGGGATGTGCCATAGTTCATG";

    // Repeat until the user enters a single '*' for sequence 1.
    while (true)
    {
        cout << "enter sequence 1: " << endl;
        getline(cin, sequence1);
        if (!cin || sequence1 == "*")
        {
            break;
        }
        cout << "enter sequence 2: " << endl;
        getline(cin, sequence2);

        sim = calcSimilarity(sequence1, sequence2);
        cout << "similarity: " << endl;
        cout << sim << endl;
    }

    return 0;
}

Explanation / Answer

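    // Body of main(), after the variable declarations.
    // Repeat until the user enters a single '*' for sequence 1.
    while (true)
    {
        cout << "enter sequence 1: " << endl;
        getline(cin, sequence1);
        if (!cin || sequence1 == "*")
        {
            break;
        }
        cout << "enter sequence 2: " << endl;
        getline(cin, sequence2);

        sim = calcSimilarity(sequence1, sequence2);
        cout << "similarity: " << endl;
        cout << sim << endl;
    }

    return 0;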

This calculates and prints the similarity score for the sequences you enter.