This assignment is due by 5pm, Tuesday 24th March.
All questions are to be answered in the same R Markdown / PDF document, regardless of whether they require a plain-text answer or the execution of code.
We strongly advise working in the folder `~/transcriptomics/assignment1` on your virtual machine. Using a separate R Project for each individual assignment is also strongly advised.
If all files required for submission are contained on your VM:
.zip
If all files are on your local Windows machine:
Send to > Compressed (zipped) folder
.zip
If all the files are on your local macOS machine:
Choose two different RNA types and contrast them with each other. Aspects to consider include the method of transcription, cellular location, post-transcriptional processing, biological function, or any other aspect which you consider to be important.
In R, you will commonly encounter three types of ‘unexpected’ output: 1) errors, 2) warnings and 3) messages. Describe the role of each of these and how to interpret them.
Two possible definitions of a gene are given by the high-profile journal Nature and the US National Institutes of Health. Discuss the limitations of these definitions, giving particular consideration to promoters and to protein products which arise from multiple distinct locations within the genome. Two interesting discussions of this subject are available in this paper and this lecture. Feel free to use these resources, or find your own. Provide references where appropriate.
For this question, you will need a list of file names. Each student will be given a unique set so that everyone has their own unique problems to solve. This is specifically intended to encourage collaboration between students without any risk of plagiarism.
To obtain your own set of file names, please execute the following lines of code, using your own student number instead of the example given (`"a1234567"`).
source("https://uofabioinformaticshub.github.io/transcriptomics_applications/assignments/A1Funs.R")
makeSampleNames("a1234567")
After you have run these lines of code, you will have two objects in your workspace called `sampleNames` and `librarySizes`. These are the two objects which we will work with for the next two questions.
Use the function `pander()` from the package pander to present the sample names that you have, using in-line code of the style `r function(objectName)`. [1 mark]
Using the `sampleNames` provided, create a tibble containing the metadata for your experiment. This tibble should be named `metaData` and should minimally contain the columns 1) date, 2) sex, 3) group, 4) researcher, 5) reads, and 6) sampleID. You will have to use functions from stringr and dplyr to perform this task. [7 marks]
Use `pander()` to present this table in your submission, including an appropriate table caption. [3 marks]
Combine your `metaData` object created in Question 4 with the object `librarySizes` and generate a barplot of the library sizes for all samples. Colour your bars by the experimental treatment group, and ensure that all axes and other labels are of a standard suitable for publication.
Do you think that any of your metadata columns may have contributed to the variation in library sizes? Provide a clear explanation. (Please note that your answer may differ from any other student’s answer.)