Biotech7005

Major Project (15%)

In this course, the following next-generation sequencing (NGS) datasets/protocols are described in detail:

Each of these NGS approaches uses similar programs and analysis steps, such as quality control (quality and sequencing-adapter trimming), genome alignment, and downstream visualisation and statistical methods. Each also aims to address a particular scientific question and test a scientific hypothesis.

For the major project, you will take a published dataset, complete all the analysis tasks (from raw data to final results) and write up a report. The report must be structured like a short journal article, with the following sections: abstract (summarising the project), introduction (background on the study and identification of the research hypothesis), methods (analysis steps and programs used), results (what you found) and discussion (how the results relate to the research hypothesis). Marks will also be awarded for the bash/R or RMarkdown scripts that you use.
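As an illustration of the quality trimming step mentioned above, the following is a minimal Python sketch of 3' quality trimming on a single read. It assumes Phred+33 quality encoding and a cutoff of Q20; the function names, the cutoff and the toy read are all invented for illustration. In the project itself you would normally use an established trimming program rather than code like this.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding unless a different offset is given.
    """
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Drop bases from the 3' end until a base with quality >= min_q is reached.

    This is a deliberately simple rule; real trimmers use more robust schemes.
    """
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# Toy read, invented for illustration: 'I' encodes Q40 and '#' encodes Q2.
read_seq = "ACGTACGTAC"
read_qual = "IIIIIII###"

trimmed_seq, trimmed_qual = trim_3prime(read_seq, read_qual)
print(trimmed_seq)  # ACGTACG
```

The three low-quality bases at the 3' end are removed, while the high-quality part of the read is kept unchanged.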

Section Mark
Abstract 5%
Introduction + hypothesis 10%
Methods 20%
Results & Discussion 30%
References 5%
Analysis scripts 30%

You are free to choose any dataset from any research article, but you must let either Dan or Steve know beforehand.

If you cannot find a suitable dataset, we have provided one from a plant RNA-seq profiling study of multiple mutants of the histone deacetylase gene hda, which is involved in the regulation of flowering time in Arabidopsis thaliana. The details of the sequencing experiment are found at this GEO link. So that not everyone works on the same data, each student should work on the sample group listed against their ID below:

ID Group
a1762813 hda5-1
a1202401 hda5-1
a1776494 hda9-1
a1778618 hda6-6
a1705074 hda6-6
a1790457 hda9-1
a1758382 hda9-1
a1789913 hda5-1
a1776998 hda6-6
a1776430 hda6-6
a1770716 hda5-1
a1739682 hda6-6
a1758693 hda6-6
a1781987 hda9-1
a1768114 hda5-1
a1700271 hda6-6
a1646510 hda9-1
a1779487 hda6-6
a1778287 hda9-1
a1783535 hda6-6
a1789486 hda9-1
a1758223 hda9-1
a1769266 hda9-1
a1771465 hda9-1
a1768183 hda6-6
a1781873 hda6-6
a1755783 hda9-1
a1679475 hda5-1
a1662216 hda5-1
a1780859 hda6-6
a1775769 hda5-1

For this particular dataset, we expect you to run a differential expression analysis comparing the replicates of your sample group against the Col wild-type control. Report the differentially expressed genes in your results section and discuss how they relate to the study’s experimental hypothesis.
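To make the comparison concrete, here is a minimal Python sketch of the quantity at the heart of a differential expression analysis: the log2 fold change of each gene between the mutant replicates and the wild-type replicates, after scaling counts to counts per million (CPM). The gene names, the counts, the pseudocount of 1 and the function names are all hypothetical. A fold change on its own is not a statistical test, so the project analysis should use a dedicated differential expression package that models variation between replicates.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts by library size to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def mean_log2_cpm(samples: list[dict[str, int]], gene: str,
                  pseudocount: float = 1.0) -> float:
    """Average log2(CPM + pseudocount) of one gene across replicate samples."""
    values = [math.log2(counts_per_million(s)[gene] + pseudocount)
              for s in samples]
    return sum(values) / len(values)


def log2_fold_change(mutant: list[dict[str, int]],
                     wildtype: list[dict[str, int]], gene: str) -> float:
    """Mutant minus wild-type mean log2 CPM; positive means higher in mutant."""
    return mean_log2_cpm(mutant, gene) - mean_log2_cpm(wildtype, gene)


# Toy counts, invented for illustration (two replicates per group).
wildtype = [{"geneA": 100, "geneB": 900}, {"geneA": 110, "geneB": 890}]
mutant = [{"geneA": 400, "geneB": 600}, {"geneA": 380, "geneB": 620}]

lfc = {gene: log2_fold_change(mutant, wildtype, gene)
       for gene in ("geneA", "geneB")}
for gene, value in lfc.items():
    print(f"{gene}: log2FC = {value:.2f}")
```

In this toy example geneA has a positive log2 fold change (higher in the mutant) and geneB a negative one. Working on the log scale makes up- and down-regulation symmetric around zero.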

Raw FASTQ files will be made available via a data link sent in a MyUni announcement/email.