
The following sequence was obtained (written in 5' to 3' orientation) CACAAAGG


Question

The following sequence was obtained (written in 5' to 3' orientation)

CACAAAGGGC CATAAAAATG TTCATAATCT GGTGGTGTG GTGGCTCATG

CCTGTAATCC CAGCATTTG GGAGGCCAAG GTGGGAGGAT GCCTTGAGTC

TAGGAGTTG AGAGATGCCT GGATAACACA GAGAGACCCT CATCTCTACA

AAA

Question 1) Using a BlastN search, identify matches to the above sequence (if the search results show sequences corresponding to a "human contig", these are not acceptable; choose sequences with a defined gene or sequence identity). Identify the name of the repetitive DNA sequence that is the best match. Remember that not all Blast hits are real, and you need to inspect the matches closely to identify real hits.

Question 2) Using information you can find in journal articles (from NCBI, for example), write a paragraph (no more than half a page) describing the biological features of the repetitive sequence you have identified. In your write-up, include details such as:

a) in which species is the repeat found?

b) estimation of copy number of the repeat in the genome

c) the dispersion pattern of the repeat

d) are there any associations of the repeat with disease?

Question 3) The human haploid genome contains 3x10^9 bp, and a repetitive sequence 35 bp in length is present 150,000 times. What is the copy number of the repeat in the human diploid genome, and what proportion of the diploid genome does the repeat occupy? Would you class this repeat as a highly or moderately repetitive sequence of the human genome?

Explanation / Answer

(1) In a BlastN search, the given sequence showed 91% identity to:

(a) An Arabidopsis thaliana P-glycoprotein 13 (ABCB13)

Query TGGATAACACAGAGAGACCCTC
||||||||||||| ||| ||||
Sbjct TGGATAACACAGAAAGATCCTC

(b) an Arabidopsis thaliana SOH1 family protein (MED31)

Query AGAGAGACCCTCATCTCTACAA
||||||||||| | ||||||||
Sbjct AGAGAGACCCTAAACTCTACAA
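
The 91% figure can be checked directly from the two alignments shown in (a) and (b). The following is a minimal Python sketch, not BLAST's own calculation: the function name percent_identity is illustrative, and it assumes ungapped alignments of equal length, which is what both hits show.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of aligned positions where query and subject agree (ungapped)."""
    if len(query) != len(subject):
        raise ValueError("ungapped alignment rows must have equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


# Aligned segments copied from the two hits reported in (a) and (b).
hits = {
    "ABCB13": ("TGGATAACACAGAGAGACCCTC", "TGGATAACACAGAAAGATCCTC"),
    "MED31": ("AGAGAGACCCTCATCTCTACAA", "AGAGAGACCCTAAACTCTACAA"),
}

for name, (query, subject) in hits.items():
    pid = percent_identity(query, subject)
    print(f"{name}: {len(query)} aligned positions, {pid:.1f}% identity")
```

Both alignments cover only 22 positions with 20 matches each, which is why the identity rounds to 91% for both hits.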

(2) No repeats could be identified in the given DNA.

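(3) The haploid genome carries 150,000 copies, so the copy number of the repeat in the human diploid genome is 2 x 150,000 = 300,000 copies. These copies span 35 bp x 300,000 = 1.05x10^7 bp of the 6x10^9 bp diploid genome, which means the repeat occupies 0.175% of the genome (the same proportion as in the haploid genome, 5.25x10^6 bp out of 3x10^9 bp). With well over 100,000 copies, this is a highly repetitive sequence of the human genome.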
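
The arithmetic for Question 3 can be verified with a short script. This is a sketch under the question's own assumptions: the 150,000 copies are counted per haploid genome, and the diploid genome is taken to be exactly twice the haploid size. The variable names are illustrative.

```python
# Values given in Question 3.
HAPLOID_GENOME_BP = 3 * 10**9    # haploid genome size in bp
REPEAT_LENGTH_BP = 35            # length of one repeat unit in bp
COPIES_PER_HAPLOID = 150_000     # copies per haploid genome

# A diploid cell carries two haploid sets, so copies and genome size both double.
diploid_copies = 2 * COPIES_PER_HAPLOID
diploid_genome_bp = 2 * HAPLOID_GENOME_BP

# Total bp covered by the repeat, and its share of the diploid genome.
repeat_bp = REPEAT_LENGTH_BP * diploid_copies
fraction = repeat_bp / diploid_genome_bp

print(f"Diploid copy number: {diploid_copies:,}")
print(f"Repeat DNA in diploid genome: {repeat_bp:,} bp")
print(f"Proportion of diploid genome: {fraction:.3%}")
```

Because copy number and genome size double together, the proportion is the same whether it is computed for the haploid or the diploid genome.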