Consider an experiment where you perform RNA-seq comparing human cells grown in

Question

Consider an experiment where you perform RNA-seq comparing human cells grown in glucose to cells grown in galactose.

(a) Gene A changes 10-fold between these two conditions and Gene B changes 1.2-fold. Explain how it could be that the 10-fold change is statistically insignificant whereas the 1.2-fold change is statistically significant.

(b) You are looking to find regions of statistically significant differential expression, and you consider two distinct ways of looking at the problem. In the first, you look at all windows of length 10 kb. In the second, you consider only the 20,000 annotated protein-coding genes. Give the pros and cons of these two approaches, being sure to comment on the statistical cutoff. (Recall that the human genome is 3.0 × 10^9 bp.)

Explanation / Answer

a) In statistical terms, a gene is differentially expressed if its expression level changes systematically between two treatment conditions, regardless of how small the difference might be.

In scientific terms, on the other hand, a gene is likely to be considered differentially expressed only if its expression level changes by a worthwhile amount.


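Statistical significance therefore depends on the size of the change relative to the variability between replicates and on the number of reads supporting the estimate, not on the fold change alone. Gene A's 10-fold change can be statistically insignificant if the gene is lowly expressed or varies widely between replicates, whereas Gene B's 1.2-fold change can be statistically significant if the gene is highly expressed and measured consistently across replicates.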
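To make part (a) concrete, here is a minimal sketch with made-up numbers. The expression values, the three replicates per condition, and the use of a pooled two-sample t statistic on log2 values are all assumptions chosen for illustration; dedicated RNA-seq tools use count-based models instead. Gene A is given low, noisy values and Gene B high, consistent ones.

```python
import math
from statistics import mean, variance

# Two-sided 5% critical value of Student's t with 4 degrees of freedom
# (valid here only because each condition has 3 replicates: 3 + 3 - 2 = 4).
T_CRIT_DF4 = 2.776

def fold_change(glucose, galactose):
    """Ratio of mean expression, galactose over glucose."""
    return mean(galactose) / mean(glucose)

def pooled_t(glucose, galactose):
    """Pooled two-sample t statistic computed on log2 expression values."""
    x = [math.log2(v) for v in glucose]
    y = [math.log2(v) for v in galactose]
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    std_err = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(y) - mean(x)) / std_err

# Hypothetical normalized expression values, 3 replicates per condition.
gene_a_glucose, gene_a_galactose = [4, 60, 8], [40, 600, 80]                # low, noisy
gene_b_glucose, gene_b_galactose = [1000, 1010, 990], [1200, 1212, 1188]    # high, tight

fc_a = fold_change(gene_a_glucose, gene_a_galactose)
fc_b = fold_change(gene_b_glucose, gene_b_galactose)
t_a = pooled_t(gene_a_glucose, gene_a_galactose)
t_b = pooled_t(gene_b_glucose, gene_b_galactose)

for name, fc, t in [("Gene A", fc_a, t_a), ("Gene B", fc_b, t_b)]:
    verdict = "significant" if abs(t) > T_CRIT_DF4 else "not significant"
    print(f"{name}: fold change {fc:.1f}, t = {t:.2f}, {verdict} at the 5% level")
```

With these invented values, Gene A's 10-fold change falls below the critical value while Gene B's 1.2-fold change is far above it, because the test weighs the change against the replicate-to-replicate spread.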

b) Understanding the global distribution of gene expression levels is important because it helps to determine whether a protein or RNA product is functional in a cell or tissue.

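A global indicator of the overall sequencing accuracy and of the presence of contaminating DNA is the percentage of mapped reads.
When reads are mapped against the transcriptome, which in this case corresponds to the 20,000 annotated protein-coding genes, slightly lower total mapping percentages are expected because reads coming from unannotated transcripts are lost, and significantly more multi-mapping reads are expected because some reads fall onto exons that are shared by different transcript isoforms of the same gene. Looking at all windows of length 10 kb does not depend on annotation, so it can capture those unannotated transcripts, but the windows do not line up with gene boundaries. It also involves many more tests (3.0 × 10^9 bp / 10 kb = 300,000 windows versus 20,000 genes), so it needs a stricter statistical cutoff after correcting for multiple testing.

If the reference transcriptome is well annotated, as with the 20,000 annotated protein-coding genes, we can also analyze the biotype composition of the sample, which is indicative of the quality of the RNA purification step.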
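The effect of the two approaches on the statistical cutoff can be worked out from the numbers given in the question. The sketch below assumes non-overlapping 10 kb windows, an overall significance level of 0.05, and a Bonferroni correction; all three are illustrative choices and are not stated in the question.

```python
GENOME_BP = 3.0e9    # human genome size given in the question
WINDOW_BP = 10_000   # 10 kb windows
N_GENES = 20_000     # annotated protein-coding genes
ALPHA = 0.05         # assumed overall significance level

# Assumption: windows tile the genome without overlapping.
n_windows = int(GENOME_BP // WINDOW_BP)

# Bonferroni correction: divide the overall level by the number of tests.
cutoff_windows = ALPHA / n_windows
cutoff_genes = ALPHA / N_GENES

print(f"windows: {n_windows} tests, per-test cutoff {cutoff_windows:.2e}")
print(f"genes:   {N_GENES} tests, per-test cutoff {cutoff_genes:.2e}")
print(f"window cutoff is {cutoff_genes / cutoff_windows:.0f}x stricter")
```

Under these assumptions the window approach runs 300,000 tests against 20,000 for the gene approach, so each window must clear a per-test cutoff 15 times stricter. That costs power, but the windows can detect expression outside annotated genes, which the gene-only approach cannot.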