Question
In this assignment, you will be concerned with de novo protein-coding gene annotation (identifying the location and extent of all protein-coding genes) of the M. genitalium prokaryotic genome using only the genome as input. In short, your goal is to identify which ORFs are CDSs. The best bioinformatics annotation pipelines use all kinds of additional evidence, including knowledge of genes in other genomes, to achieve the best annotations, but de novo annotation is usually a part of, or at least has informed, modern annotation pipelines.
Plan and write pseudocode for an algorithm to solve the problem. Answer the following questions in the process. How do you define an ORF? How do you find all ORFs in the M. genitalium genome? What signal will you use to identify CDSs? How will you compute this signal? How will you use the signal to make decisions about which ORFs are CDSs?
Explanation / Answer
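An open reading frame (ORF) is the part of a reading frame that has the ability to be translated. It is a continuous stretch of codons that begins with a start codon (usually AUG, written ATG in the DNA sequence) and ends with a stop codon (UAA, UAG or UGA in the standard genetic code). Note that Mycoplasma species, including M. genitalium, read UGA as tryptophan rather than as a stop (NCBI translation table 4), so for this genome only UAA and UAG should be treated as stop codons.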
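The definition above can be turned into a simple check. The following is a minimal sketch, not part of the original answer: the function name `is_orf` and its parameters are illustrative assumptions. The stop-codon set is left as a parameter because some genomes, Mycoplasma among them, read TGA as tryptophan instead of stop.

```python
# Illustrative defaults (assumption): ATG start, standard-code stops.
START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_orf(seq, starts=START_CODONS, stops=STOP_CODONS):
    """Return True if seq is a start codon, zero or more sense codons,
    and exactly one stop codon at the very end, all in one frame."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA spelling
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (
        codons[0] in starts
        and codons[-1] in stops
        and not any(c in stops for c in codons[1:-1])
    )
```

For example, `is_orf("ATGTGATAA")` is false under the default stops because TGA interrupts the frame, but it is true when called with `stops={"TAA", "TAG"}`.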
Take the genome sequence and divide it into six reading frames by reading the sequence in words of 3 nucleotides (codons). The first reading frame is obtained by grouping the sequence into words of 3 nucleotides starting at the first nucleotide. The second reading frame is formed by skipping the first nucleotide and then grouping the sequence into words of 3 nucleotides. The third reading frame is formed by skipping the first 2 nucleotides and then grouping in the same way. The other 3 reading frames are obtained by repeating this on the reverse complement of the sequence. Next, mark the start codons and stop codons in each reading frame and identify the open reading frames (ORFs): sequence stretches beginning with a start codon and ending at the first in-frame stop codon. Finally, the peptide sequence of each ORF is found by translating its codons using the amino acid (genetic code) table.
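The six-frame scan described above could be sketched in Python as follows. This is one possible implementation under stated assumptions, not the answer's own code: the names `reverse_complement`, `orfs_in_frame`, `find_orfs` and the `min_len` parameter are hypothetical. Each ORF runs from the first start codon seen after the previous stop to the next in-frame stop, and coordinates are reported as 0-based, end-exclusive positions on the forward strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, starts, stops):
    """Yield (start, end) indices (end exclusive) of ORFs in one frame."""
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon in starts:
            start = i              # first start codon since the last stop
        elif start is not None and codon in stops:
            yield start, i + 3     # include the stop codon in the ORF
            start = None


def find_orfs(genome, starts=("ATG",), stops=("TAA", "TAG", "TGA"), min_len=0):
    """Return (strand, frame, start, end, sequence) for every ORF in all
    six frames. start/end are forward-strand coordinates; sequence is
    always written 5'->3' on its own strand."""
    genome = genome.upper()
    n = len(genome)
    found = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            for s, e in orfs_in_frame(seq, offset, starts, stops):
                if e - s < min_len:
                    continue
                if strand == "-":
                    # map reverse-strand indices back to the forward strand
                    fs, fe = n - e, n - s
                else:
                    fs, fe = s, e
                found.append((strand, offset + 1, fs, fe, seq[s:e]))
    return found
```

For M. genitalium, calling `find_orfs(genome, stops=("TAA", "TAG"))` would treat TGA as a sense codon; a `min_len` cutoff is a common but arbitrary first filter and is an assumption here, not something the original answer specifies.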