Matlab Write a program that allows the user to repeatedly analyze the DNA sequen
Question
Matlab
Write a program that allows the user to repeatedly analyze the DNA sequence in the file “dna.txt” by counting
the number of times a particular pattern occurs. Include a main (primary) function that repeatedly asks for a
search pattern until no pattern is entered. Include a subfunction that receives a search pattern and returns the
quantity of invalid characters (not a, c, g, or t) in the pattern. If invalid characters are found, print an error
message indicating the quantity and do not perform the search.
Sample output: Below are examples of how your output might look:
>> problem1
Enter a search pattern: aagct
There are 5 occurrences.
Enter a search pattern: xagz
ERROR: Found 2 invalid characters.
Enter a search pattern: c
There are 818 occurrences.
Enter a search pattern:
(At this point, the program has terminated because the user has pressed ENTER without entering a
search pattern.)
About DNA: In genetics, a DNA sequence can be represented as a string comprised of four letters: a, c, g, and t
(see http://en.wikipedia.org/wiki/Nucleic_acid_sequence). Patterns in a sequence may indicate important
features in the genetic code (see http://en.wikipedia.org/wiki/Sequence_motif). For example, the sequence
‘gatcctccatatcc’ contains the pattern ‘tcc’ in three places.
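As a quick check of the example above, the following sketch counts the pattern with MATLAB's built-in strfind, which returns the starting index of every match (overlapping matches included). The variable names are chosen here for illustration and are not part of the assignment.

```matlab
% Count how many times 'tcc' occurs in the example sequence.
sequence = 'gatcctccatatcc';
pattern  = 'tcc';

% strfind returns the starting index of each match.
positions = strfind(sequence, pattern);

% The number of matches is the length of that index list.
count = length(positions);
fprintf('There are %d occurrences.\n', count);
```

The same call should work on the full sequence once the contents of dna.txt have been converted to characters.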
Tip (data types): The data file contains the ASCII codes for the characters in the DNA sequence. The following
example shows you how to look at the first four values:
>> dna = load('dna.txt');
>> dna(1:4)
ans =
97 99 97 103
To use them as characters, the array must be converted to the “char” data type using the “char” function. The
following example shows how to convert the first four values (you need to convert the whole array):
>> char(dna(1:4))
ans =
acag
Now try typing “char(dna)” to see the whole DNA sequence. You may have to scroll over to see all the
characters!
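A minimal sketch of the conversion step, using the four codes shown above in place of the real file so it runs without dna.txt. The variable names are illustrative only.

```matlab
% The first four ASCII codes from the tip, standing in for load('dna.txt').
codes = [97 99 97 103];

% char maps each numeric code to its character, giving a text row vector.
sequence = char(codes);

disp(sequence)          % the converted text
disp(class(sequence))   % the data type after conversion
```

In the actual program the same idea would presumably be applied once to the whole array, for example dna = char(load('dna.txt')), so that the result can be searched as text.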
Tip (checking for invalid characters): You’ll need a loop to check every character in the search pattern. To
check a single character, you can use a relational expression. The following examples would check to see if the
first character in the string “pattern” is not an ‘a’. Notice that the value that is returned (0 or 1) represents false
or true.
>> pattern = 'aagct';
>> pattern(1) ~= 'a'
ans =
0
>> pattern = 'xagct';
>> pattern(1) ~= 'a'
ans =
1
A character is invalid if it is not an ‘a’, not a ‘g’, not a ‘c’, and not a ‘t’. Note that the instructions say to write a
subfunction! This will help to organize your program.
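The loop described in this tip could look roughly like the sketch below. It is written as a plain script so it can be run on its own. In the assignment the same loop would form the body of the subfunction, with the pattern as its input and the count as its output. The names numInvalid, ch, and k are assumptions made for this example.

```matlab
% Count the characters in a pattern that are not a, c, g, or t.
pattern = 'xagz';
numInvalid = 0;

for k = 1:length(pattern)
    ch = pattern(k);
    % A character is invalid only if it differs from all four letters.
    if ch ~= 'a' && ch ~= 'c' && ch ~= 'g' && ch ~= 't'
        numInvalid = numInvalid + 1;
    end
end

fprintf('ERROR: Found %d invalid characters.\n', numInvalid);
```

Note that the conditions are joined with && rather than ||: using || would flag every character, because no character can equal all four letters at once.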
Explanation / Answer
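function problem1
% Repeatedly count occurrences of a user-entered pattern in dna.txt.
dna = char(load('dna.txt'));   % file stores ASCII codes; convert to text
while true
    pattern = input('Enter a search pattern: ', 's');   % 's' returns raw text
    if isempty(pattern), break; end   % ENTER alone ends the program
    numInvalid = countInvalid(pattern);
    if numInvalid > 0
        fprintf('ERROR: Found %d invalid characters.\n', numInvalid);
    else
        fprintf('There are %d occurrences.\n', length(strfind(dna, pattern)));
    end
end
end

function numInvalid = countInvalid(pattern)
% Subfunction: count the characters in pattern that are not a, c, g, or t.
numInvalid = 0;
for k = 1:length(pattern)
    numInvalid = numInvalid + ~any(pattern(k) == 'acgt');
end
end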