Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

If you use a program for problem A, it can be any language but I just need the a

ID: 3796034 • Letter: I

Question

If you use a program for problem A, it can be any language but I just need the answer. Thank you. I need help from A-D


Problem 1 (20 points): Alignment Statistics

A). Write a computer program or use a calculator to calculate a series of scoring matrices for aligning DNA sequences with 60, 70, 80, and 90% of identity, assuming A, C, G and T have equal probabilities. Do not scale the scores to integers (so is equal to 1 in all the matrices).

B). Assuming that you are using one of the matrices above to perform ungapped local alignment (i.e., gap penalty = minus infinity) between two 1000bp-long DNA sequences, and got a score of 25. Using the extreme value distribution, calculate the E-value of the alignment score (assuming K = 0.1). Is this alignment significant?

C). Redo (B) for score = 10.

D). Redo (B), but you are comparing a sequence of length 1000bp to a database of 1 million sequences, each of which is 1000bp long and the highest score you get is 25.

Explanation / Answer

For any 2 Dna sequences you can align them and calculate the alignment score by using needleman wunsh algorithm . I have implemented it in R language.

I have taken some assumptions like

match=2
mismatch=-3
gap=-2

It depends on the question.

code --

seq1='ACTGATTCA'
seq2='ACGCATCA'

m=nchar(seq1)
n=nchar(seq2)
match=2
mismatch=-3
gap=-2

mat=matrix(0,m+1,n+1)
trav=matrix(0,m+1,n+1)
mat[1,1]=0
for(i in 2:(n+1))
{
mat[1,i]=mat[1,i-1]+gap
}
for(i in 2:(m+1))
{
mat[i,1]=mat[i-1,1]+gap
}
seq1=strsplit(seq1,"")[[1]]
seq2=strsplit(seq2,"")[[1]]
for(i in 2:(m+1))
{
for(j in 2:(n+1))
{
if(seq1[i-1]==seq2[j-1])
{
p=mat[i-1,j-1]+match
q=mat[i-1,j]+gap
r=mat[i,j-1]+gap
l=c(p,q,r)
ind=which.max(l)
mat[i,j]=l[ind]
if(ind==1)
{
trav[i,j]='d'
}
else if(ind==2)
{
trav[i,j]='u'
}
else
{
trav[i,j]='l'
}
  
#mat[i,j]=max(mat[i-1,j-1]+match,mat[i-1,j]+gap,mat[i,j-1]+gap)
}
else
{
p=mat[i-1,j-1]+mismatch
q=mat[i-1,j]+gap
r=mat[i,j-1]+gap
  
l=c(p,q,r)
ind=which.max(l)
mat[i,j]=l[ind]
if(ind==1)
{
trav[i,j]='d'
}
else if(ind==2)
{
trav[i,j]='u'
}
else
{
trav[i,j]='l'
}
#mat[i,j]=max(mat[i-1,j-1]+mismatch,mat[i-1,j]+gap,mat[i,j-1]+gap)
}
}
}
print(mat)
print(trav)
l1=c()
l2=c()
i=m+1
j=n+1
a=1
while(i>1 & j>1)
{
if(trav[i,j]=='d')
{
l1[a]=seq1[i-1]
l2[a]=seq2[j-1]
  
i=i-1
j=j-1
}
else if(trav[i,j]=='u')
{
l1[a]=seq1[i-1]
l2[a]='-'
i=i-1
  
}
else
{
l1[a]='-'
l2[a]=seq2[j-1]
j=j-1
}
a=a+1
}
l1=rev(l1)
l2=rev(l2)
print(l1)
print(l2)
print(mat[m+1,n+1])