Instructions

Submission Format [3 marks]

This is assignment is due by 5pm, Tuesday 9^th June.

Submissions must be made as a zip archive containing 2 files:
1. Your source R Markdown Document (with an Rmd suffix)
2. A compiled pdf, showing all code
All file names within the zip archive must start with your student number. However the name of the zip archive is not important as myUni will likely modify this during submission. See here for help creating a zip archive

All questions are to be answered on the same R Markdown / PDF, regardless of if they require a plain text answer, or require execution of code.

Marks directly correspond to the amount of time and effort we expect for each question, so please answer with this is mind.

We strongly advise working in the folder ~/transcriptomics/assignment4 on your virtual machine. Using an R Project for each individual assignment is also strongly advised.

Creating a zip archive

On Your VM

If all files required for submission are contained on your VM:

Select both files using the Files pane in R Studio
Click export
They will automatically be placed into a single zip archive. Please name this in whatever informative name you decide is suitable, but it should contain the suffix .zip

Windows

If all files are on your on your local Windows machine:

Using File Explorer, enter the folder containing both files
Select all files simultaneously by using Ctrl + Click
Right click on one of the files and select Send to > Compressed (zipped) folder
Rename as appropriate, ensuring the archive ends with the suffix .zip

Mac OS

If all the files are on your *local macOS machine`:

Locate the items to zip in the Mac Finder (file system)
Right-click on a file, folder, or files you want to zip
Select “Compress Items”
Find the newly created .zip archive in the same directory and name as appropriate

Questions

Question 1 [5 marks]

A Module Eigengene (ME) is one of the fundamental concepts of gene co-expression network analysis using WGCNA.

Is ME one of the genes in a co-expression module? Explain your reasoning
Describe the roles (usage) of ME in gene co-expression network analysis

Question 2 [4 marks]

We have discussed that the key point of WGCNA analysis is to discover biological significance. Describe how can we build the link between our co-expression network analysis and biological significance.

Question 3 [4 marks]

Bulk RNA-seq and scRNA-seq differ in many ways. Describe two common aspects shared between the two approaches, and provide details about two key differences between the two.

Question 4 [12 marks]

Use the given gene expression dataset and corresponding clinical data to identify co-expression module(s) with the strongest correlation with clinical data. The gene expression dataset includes raw counts for human reference genes across 37 samples (skin cancer patients), and the clinical data includes diagnosed cancer stage status of 37 patients. The results should minimally include:

A figure showing which and how the power beta is selected.
A gene dendrogram and coloured co-expression modules detection using dynamic tree cut

Question 5 [12 marks]

Using the SingleCellExperiment object provided, perform the following steps. Please note that undetectable genes and low quality cells have already been removed from this dataset. Counts are already normalised and log transformed, and contain expression patterns from 622 mouse neuronal cells.

Perform PCA and generate a plot of the cells using PC1 and PC2
Choosing a suitable method, define clusters of cell types
Visualise your clusters using only one of either a PCA, UMAP or tSNE plot
Find the top 10 marker genes for one of your clusters and visualise the expression patterns
Discuss your clusters and comment on how effective your approach was

Transcriptomics Applications

Assignment 5