Transcriptomics Applications

Major Project

In this project, you are to take an RNA-Seq dataset from the beginning to the end of a differential expression analysis, including enrichment and an exploration of the underlying biology. This will include all pre-processing steps and statistical analysis.

You are free to choose your own dataset; however, some suggested datasets are given in the table below, and these may serve as a useful guide for your own choice. You are encouraged to discuss any dataset of your own choosing with Steve or Dan to ensure you’re not taking on an insurmountable challenge. If you’re unsure about which steps to perform, or need advice on any aspect of the project, please contact us.

ID URL
a1787867 https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-71862/samples/
a1686683 https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-61024/samples/
a1792442 https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5622/samples
a1811380 https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-44384/samples/

Some of the important issues we’d like to see addressed are:

  • How did you obtain the data?
  • How did you process the data?
  • What biological experiment does the dataset investigate, and what is in the data?
  • Were there any biases in the data, and was it of good quality?
  • What methods were applied, and why?
  • How confident are you in the results, and what are they telling us?
  • How can you best visualise and communicate any results?

There may be many other issues, but from our experience these in particular can easily be forgotten or overlooked.

Submission Format

We expect all details to be submitted as a compiled HTML or PDF document, along with the source Rmd file. The report is due at 5pm, 19th June 2020.

Please prepare your submission in a form including the standard components of a scientific report:

  • Introduction (background on the study and identification of the research hypothesis)
  • Methods (analysis steps and programs used)
  • Results (what you found)
  • Discussion (how the results relate to the research hypothesis and the published literature).

Some steps may be best suited to being provided as supplementary material and this is at your discretion.

Marks will be awarded as follows:

Section Mark
Abstract 5%
Introduction and hypothesis 10%
Methods 20%
Results and Discussion 30%
References 5%
Analysis scripts 30%

Difficulties you may face

  • The capacity of the VM is insufficient to index a genome. We will provide indexed versions of the relevant genomes for you to import directly into your project to overcome this issue.
  • Be cautious with the disk space on your VM. It is limited, so please contact us if you’re having trouble so we can help find a solution.
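
One way to keep an eye on disk space before downloading or aligning large files is a small check like the one below. This is only a minimal sketch using Python's standard library: the function names `disk_report` and `is_low_on_space` and the 5 GiB threshold are illustrative assumptions, not course requirements, and the actual limits on your VM may differ.

```python
import shutil


def disk_report(path="."):
    """Return (total, used, free) in GiB for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return tuple(round(v / gib, 2) for v in (usage.total, usage.used, usage.free))


def is_low_on_space(path=".", min_free_gib=5.0):
    """True if free space is below `min_free_gib` (an assumed, arbitrary threshold)."""
    return disk_report(path)[2] < min_free_gib


if __name__ == "__main__":
    total, used, free = disk_report(".")
    print(f"total: {total} GiB, used: {used} GiB, free: {free} GiB")
    if is_low_on_space("."):
        print("Free space is low; consider removing intermediate files or asking for help.")
```

Running this in your project directory before each large step (for example, before downloading FASTQ files) gives an early warning rather than a failed job partway through.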

Good luck and remember we are here to help you develop your skills, not to make your life hard. Please contact us when you need help.

Assessment Checklist

Have you:

  • [ ] Answered all the questions?
  • [ ] Followed naming conventions for Assessments?
  • [ ] Checked that you have not breached the Academic Honesty Policy?
  • [ ] Identified the work as yours?
    • Emails should have the course and assessment task names.
    • Documents should be named with your name, the course name and the assessment task.
    • Printed documents should have your name, the course name and the assessment task in the text/footer/header.
  • [ ] Used appropriate electronic communication with assessors?
    • Emails should have a meaningful subject.
  • [ ] Handed in the assignment before the due time (see MyUni)?