Instructions

Submission Format [3 marks]

This assignment is due by 5pm, Tuesday 24th March.

  • Submissions must be made as a zip archive containing 3 files:
    1. Your source R Markdown Document (with an Rmd suffix)
    2. A compiled pdf, showing all code
    3. The signed cover sheet as required by the University [NB: This is no longer required]
  • All file names within the zip archive must start with your student number. However, the name of the zip archive itself is not important, as myUni will likely modify it during submission. See here for help creating a zip archive.

All questions are to be answered in the same R Markdown / PDF document, regardless of whether they require a plain text answer or the execution of code.

We strongly advise working in the folder ~/transcriptomics/assignment1 on your virtual machine. Using an R Project for each individual assignment is also strongly advised.

Creating a zip archive

On Your VM

If all files required for submission are contained on your VM:

  1. Select all three files using the Files pane in R Studio
  2. Click export
  3. They will automatically be placed into a single zip archive. Give the archive any informative name you consider suitable, but it should end with the suffix .zip

Windows

If all files are on your local Windows machine:

  1. Using File Explorer, enter the folder containing all 3 files
  2. Select all files simultaneously by using Ctrl + Click
  3. Right click on one of the files and select Send to > Compressed (zipped) folder
  4. Rename as appropriate, ensuring the archive ends with the suffix .zip

Mac OS

If all the files are on your local macOS machine:

  1. Locate the items to zip in the Mac Finder (file system)
  2. Right-click on the file, folder, or selection of files you want to zip
  3. Select “Compress Items”
  4. Find the newly created .zip archive in the same directory and name as appropriate

Questions

Question 1 [8 marks]

Choose two different RNA types and contrast them with each other. Aspects to consider may include the method of transcription, cellular location, post-transcriptional processing, biological function, or any other aspect which you determine to be important.

Question 2 [3 marks]

In R, you will commonly encounter 3 types of ‘unexpected’ output: 1) Errors, 2) Warnings and 3) Messages. Describe the role of each of these and how to interpret them.
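As a starting point for exploring these three outputs yourself, the following minimal sketch triggers each one using only base R. The function checkValue() and the objects it creates are hypothetical and exist purely for illustration; they are not part of the assignment materials.

```r
# Hypothetical helper that can produce a message, a warning, or an error
checkValue <- function(x) {
  message("Checking the value...")               # informational only
  if (!is.numeric(x)) stop("x must be numeric")  # halts execution
  if (x < 0) warning("x is negative; using its absolute value")
  sqrt(abs(x))
}

# A message is printed, but the result is returned as normal
resultMessage <- suppressMessages(checkValue(4))

# A warning is raised, but a value is still returned
resultWarning <- suppressWarnings(suppressMessages(checkValue(-9)))

# An error stops the function; tryCatch() lets us capture the error text
resultError <- tryCatch(
  suppressMessages(checkValue("a")),
  error = function(e) conditionMessage(e)
)
```

Running each call without the suppress*() wrappers shows how R Studio displays the three kinds of output differently.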

Question 3 [8 marks]

Two possible definitions of a gene are given by the high-profile journal Nature and the US National Institutes of Health. Discuss the limitations of these definitions, giving particular consideration to promoters and protein products which arise from multiple distinct locations within the genome. Two interesting discussions on this subject are available in this paper and this lecture. Feel free to use these resources, or find your own. Provide references where appropriate.

Question 4 [13 marks]

For this question, you will need a list of file names. Each student will be given a unique set so that everyone has their own unique problems to solve. This is specifically to encourage collaboration between students without any risk of plagiarism.

To obtain your own set of file names, please execute the following lines of code, using your own student number instead of the example given ("a1234567").

source("https://uofabioinformaticshub.github.io/transcriptomics_applications/assignments/A1Funs.R")
makeSampleNames("a1234567")

After you have run these lines of code, you will have two objects in your workspace called sampleNames and librarySizes. These are the two objects which we will work with for the next two questions.

  1. Include the above code chunk in your submission, giving it an informative chunk label that does not include any white-space. [2 marks]
  2. In a plain text paragraph or sentence, use the function pander() from the package pander to present the sample names that you have using in-line code of the style `r function(objectName)` [1 mark]
  3. Using the sampleNames provided, create a tibble containing the metadata for your experiment. This tibble should be named metaData and should minimally contain the columns 1) date, 2) sex, 3) group, 4) researcher, 5) reads, and 6) sampleID. You will have to use functions from stringr and dplyr to perform this task. [7 marks]
  4. Create a table summarising the number of samples per experimental group paying attention to the spread of sample sex within each group. Use pander() to present this table in your submission, including an appropriate table caption. [3 marks]
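The general approach to parts 3 and 4 can be sketched as below, assuming the stringr, dplyr and tibble packages are installed. The file names, their layout and the object names exampleNames, exampleMeta and groupSummary are all hypothetical; your own sampleNames will be structured differently and will need their own patterns.

```r
library(tibble)
library(dplyr)
library(stringr)

# Hypothetical file names; NOT the format produced by makeSampleNames()
exampleNames <- c(
  "2020-01-15_F_control_S01.fastq.gz",
  "2020-01-15_M_control_S02.fastq.gz",
  "2020-01-16_F_treated_S03.fastq.gz"
)

# Pull each field out of the file name with a regular expression
exampleMeta <- tibble(fileName = exampleNames) %>%
  mutate(
    date = str_extract(fileName, "^[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    sex = str_extract(fileName, "(?<=_)[MF](?=_)"),
    group = str_extract(fileName, "control|treated"),
    sampleID = str_extract(fileName, "S[0-9]+")
  )

# Count the samples for each combination of group and sex
groupSummary <- count(exampleMeta, group, sex)
```

Using one str_extract() call per column keeps each pattern easy to check on its own, which helps when the file names are not perfectly consistent.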

Question 5 [6 marks]

Combine your metaData object created in Question 4 with the object librarySizes and generate a barplot of the library sizes for all samples. Colour your bars by the experimental treatment group, and ensure that all axes and other labels are of a standard suitable for publication.

Do you think that any of your metadata columns may have contributed to the variation in library sizes? Provide a clear explanation. (Please note that your answer may differ from any other student’s answer.)

Total: 41 marks