Me School of Computing and Information Systems comp10002 Foundations of Algorith
ID: 3744317 • Letter: M
Question
Me School of Computing and Information Systems comp10002 Foundations of Algorithms Semester 2, 2018 Assignment 1 Learning Outcomes In this project you will demonstrate your understanding of arrays, strings, and functions. You may also use typedets and structs if you wish (see Chapter 8) - and will probably find the program easier to assemble if you do-but you are not required to use them in order to obtain full marks. You should not make any use of malloc) (Chapter 10) or file operations (Chapter 11) in this project. Superstring Assembly Suppose that multiple copies of a long string are cut up into much smaller pieces. If you are given the available fragments, is it possible to rebuild the larger string? For example, if the original string was "algorithnsarefunfunfun", and you got given a list of fragments like "refun and "unfun" "rith" and "1go" and so on, plus lots more small fragments, could you rebuild the original string? Probably you could, right? But what about if the original string was millions of characters long, like War and Peace? Or billions of characters long, like your genetic sequence? Let's ask a slightly different question now: if you got given a set of small string fragments each from a larger string that you didn't know anything about, could you at least find the shortest string in which every supplied fragment occurs at least once? This problem is known as the shortest superstring problem, and can be thought of as having parameters rn, the number of string fragments provided, and m, the total length of those string fragments. The shortest superstring problem is very important in bioinformatics, where genome reassembly from short fragments called reads helps contribute to our understanding of the behavior of organisms. Unfortunately, finding optimal solutions to the shortest superstring problem has been shown to be very challenging, and falls into a family of intractable problems that we'll talk about later in the semester. In this project you will implement some approximation techniques that generate non-optimal superstrings from a collection of string fragments. Input Data Input to your program will (always) come from stdin, and will (always) consist of strictly lower-case alphabetic strings, one per input line, each line terminated by a single newline 'In' character. Be sure to read the message on the FAQ page about newlines. For example, the following file testo.txt containing six fragments is a valid input: ctag 2 gtacctacta J cgatgco As you develop your program according to the stages listed below, the output will evolve. Full output examples for this and two other test files are linked from the FAQ page. You should also check your program against other inputs too; and can easily generate your own tests using a text editor. Testing and debugging is your responsibilityExplanation / Answer
#include<stdio.h>
#include<string.h>
//calculating minimum of two numbers
int min(int a, int b)
{
if(a<b)
return a;
else
return b;
}
// calculating maximum overlap in two given strings
int findOverlappingPair(char str1[], char str2[], char str[])
{
// max will store maximum overlap i.e maximum
// length of the matching prefix and suffix
int max,i,j,isSame;
int len1 = strlen(str1);
int len2 = strlen(str2);
// check suffix of str1 matches with prefix of str2
for (int i = 1; i <= min(len1, len2); i++)
{
// compare last i characters in str1 with first i
// characters in str2
for(j=0,k=len2-1;j<len1,k>=0;j++,k++)
{
if(str2[j]!=str1[k])
{
isSame=0;
break;
}
}
if (isSame)
{
if (max < i)
{
//update max and str
max = i;
str = str1 + str2[i];
}
}
}
// check prefix of str1 matches with suffix of str2
for (int i = 1; i <= min(len1, len2); i++)
{
// compare first i characters in str1 with last i
// characters in str2
for(j=0,k=len2-1;j<len1,k>=0;j++,k++)
{
if(str1[j]!=str2[k])
{
isSame=0;
break;
}
}
if (isSame)
{
if (max < i)
{
//update max and str
max = i;
str = str2 + str1[i];
}
}
}
return max;
}
// Function to calculate smallest string that contains each string in the given set as substring.
char *findShortestSuperstring(char *strings, int len)
{
// run len-1 times to consider every pair
while(len != 1)
{
int max; // to store maximum overlap
int l, r; // to store array index of strings
// involved in maximum overlap
char resStr[]; // to store resultant string after
// maximum overlap
for (int i = 0; i < len; i++)
{
for (int j = i + 1; j < len; j++)
{
char str[];
// res will store maximum length of the matching
// prefix and suffix str is passed by reference.
int res = findOverlappingPair(arr[i], arr[j], str);
// check for maximum overlap
if (max < res)
{
max = res;
resStr=strncat(resStr,str,max);
l = i, r = j;
}
}
}
len--; //ignore last element in next cycle
// if no overlap, append arr[len] to arr[0]
if (max)
arr[0] += arr[len];
else
{
arr[l] = resStr; // copy resultant string to index l
arr[r] = arr[len]; // copy string at last index to index r
}
}
return arr[0];
}
void main()
{
int min();
char *findShortestSuperstring();
int findOverlappingPair();
char arr[][10] = {"catgc", "ctaagt", "gcta", "ttca", "atgcatc"};
int len = sizeof(arr)/sizeof(arr[0]);
printf("The Shortest Superstring is",findShortestSuperstring(arr, len));
}