This is assignment is due by 5pm, Thursday 9th April.
All questions are to be answered on the same R Markdown / PDF, regardless of if they require a plain text answer, or require execution of code.
Marks directly correspond to the amount of time and effort we expect for each question, so please answer with this is mind.
We strongly advise working in the folder ~/transcriptomics/assignment2
on your virtual machine. Using an R Project for each individual assignment is also strongly advised.
If all files required for submission are contained on your VM:
.zip
If all files are on your on your local Windows machine:
Send to > Compressed (zipped) folder
.zip
If all the files are on your *local macOS machine`:
Many early technologies still have a relevant place in modern transcriptomics. Of the technologies covered in Lecture 2: Early Transcriptomic Strategies, which technology might be suitable for analysis of pri-miRNAs? Explain why in one or two brief sentences.
Briefly contrast two of the technologies presented in Lecture 2: Early Transcriptomic Strategies and Lecture 3: Microarrays. Discuss their strengths and limitations, paying particular attention to how each represented a breakthrough at the time they gained prominence.
Some possible discussion points are
When performing a statistical test, we usually frame our analysis in terms of a Null Hypothesis and an Alternate Hypothesis, eventually returning a \(p\)-value. Explain why we do this, including clear description of what a \(p\)-value represents?
When conducting a \(T\)-test, we estimate two population-level parameters. Describe both of these, using the context of comparing gene expression patterns across two treatment groups.
To obtain your own set of qPCR data, please execute the following lines of code, using your own student number instead of the example given ("a1234567"
). This will create an object called qPCR
in your R Environment. This object will contain \(C_t\) values for a gene of interest and a housekeeper gene, across two cell types. Each experiment is run as a series of pairs so that you will have 4 values for each pair (2X Cell Types + 2X Genes).
source("https://uofabioinformaticshub.github.io/transcriptomics_applications/assignments/A2Funs.R")
makeRT("a1234567")
For this question, please perform the following tasks, showing all code. Where suitable, use pander()
to display the results.
t.test()
and interpret the outputqPCR
## # A tibble: 16 x 5
## cellType pair gene Ct type
## <fct> <int> <fct> <dbl> <fct>
## 1 Th 1 RPL19 15.5 Housekeeper Gene
## 2 Th 1 FOXO1 12.9 Study Gene
## 3 Th 2 RPL19 14.6 Housekeeper Gene
## 4 Th 2 FOXO1 12.1 Study Gene
## 5 Th 3 RPL19 15.0 Housekeeper Gene
## 6 Th 3 FOXO1 13.5 Study Gene
## 7 Th 4 RPL19 15.0 Housekeeper Gene
## 8 Th 4 FOXO1 14.2 Study Gene
## 9 Treg 1 RPL19 15.6 Housekeeper Gene
## 10 Treg 1 FOXO1 15.4 Study Gene
## 11 Treg 2 RPL19 15.3 Housekeeper Gene
## 12 Treg 2 FOXO1 13.1 Study Gene
## 13 Treg 3 RPL19 14.7 Housekeeper Gene
## 14 Treg 3 FOXO1 13.9 Study Gene
## 15 Treg 4 RPL19 14.9 Housekeeper Gene
## 16 Treg 4 FOXO1 14.2 Study Gene
qPCR %>%
ggplot(aes(gene, Ct, fill = gene)) +
geom_boxplot() +
facet_wrap(~cellType) +
theme_bw()
Calculation of dCT using pivot_wider()
to rearrange columns. There are numerous alternative approaches, but this was the one I chose.
dCT <- qPCR %>%
pivot_wider(
id_cols = c(cellType, pair),
names_from = gene,
values_from = Ct
) %>%
mutate(
dCt = FOXO1 - RPL19
)
dCT
## # A tibble: 8 x 5
## cellType pair RPL19 FOXO1 dCt
## <fct> <int> <dbl> <dbl> <dbl>
## 1 Th 1 15.5 12.9 -2.65
## 2 Th 2 14.6 12.1 -2.53
## 3 Th 3 15.0 13.5 -1.51
## 4 Th 4 15.0 14.2 -0.826
## 5 Treg 1 15.6 15.4 -0.189
## 6 Treg 2 15.3 13.1 -2.22
## 7 Treg 3 14.7 13.9 -0.886
## 8 Treg 4 14.9 14.2 -0.709
Calculation of ddCT using a similar strategy
ddCT <- dCT %>%
pivot_wider(
id_cols = pair,
names_from = cellType,
names_prefix = "dCt_",
values_from = dCt
) %>%
mutate(
ddCT = dCt_Th - dCt_Treg
)
ddCT
## # A tibble: 4 x 4
## pair dCt_Th dCt_Treg ddCT
## <int> <dbl> <dbl> <dbl>
## 1 1 -2.65 -0.189 -2.46
## 2 2 -2.53 -2.22 -0.308
## 3 3 -1.51 -0.886 -0.624
## 4 4 -0.826 -0.709 -0.116
To perform the t-test
t.test(ddCT$ddCT)
##
## One Sample t-test
##
## data: ddCT$ddCT
## t = -1.6308, df = 3, p-value = 0.2014
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -2.5864868 0.8338008
## sample estimates:
## mean of x
## -0.876343
For \(H_0:\) true average change in FOXO1 expression is zero between Th and Treg, the t-test produced a p-value of 0.2014. As this is \(>0.05\) we would accept \(H_0\) and conclude that the true average difference in expression is zero.
For this question, you will need the objects cpm.tsv
, topTable.csv
and de.tsv
. You will be assigned a gene set using the following command, again remembering to use your own student ID number.
source("https://uofabioinformaticshub.github.io/transcriptomics_applications/assignments/A2Funs.R")
chooseGeneSet("a1234567")
For this question, your task is to:
de.tsv
, and these should be formed into a character
vector.cpm.tsv
. (Hint: You can use topTable.csv
to map from gene names to gene IDs)pheatmap()
, create a heatmap of these genes using the expression values contained in cpm.tsv
. For reference, these values are provided as logCPM
values, which are suitable for plotting directly. Include an annotation for each sample, indicating which genotype it represents.chooseGeneSet("a1234567")
## [1] "GO_RESPONSE_TO_CYTOKINE"
First we need to find which DE genes belong to this geneset
de <- here::here("practicals/data/de.tsv") %>%
read_tsv() %>%
dplyr::filter(gs_name == "GO_RESPONSE_TO_CYTOKINE")
deGenes <- str_split(de$DE, pattern = ";")[[1]]
deGenes
## [1] "creb1b" "rpl3" "otud4" "tuba1c" "srm" "rps16"
## [7] "uba52" "psmd14" "ccl25b" "cd40" "gbp1" "mif"
## [13] "sharpin" "ube2kb" "mt2" "mmp2" "twistnb" "rplp0"
## [19] "ptpn12" "stat5a" "ddost" "slc25a5" "irf1b" "jagn1a"
## [25] "hcls1" "vrk2" "selplg" "calcoco2" "psmb10" "tfr1a"
## [31] "pkz" "cmklr1"
Given that the object cpm.tsv
and topTable.csv
contain only a subset of genes, we can subset our vector of DE genes.
topTable <- here::here("practicals/data/topTable.csv") %>%
read_csv()
myDeGenes <- deGenes[deGenes %in% topTable$gene_name]
myDeGenes
## [1] "uba52" "gbp1" "sharpin" "ube2kb" "tfr1a"
Now we can use these to create a heatmap
logCPM <- here::here("practicals/data/cpm.tsv") %>%
read_tsv()
library(pheatmap)
library(scales)
topTable %>%
dplyr::filter(gene_name %in% myDeGenes) %>%
dplyr::select(gene_id, gene_name) %>%
left_join(logCPM) %>%
dplyr::select(gene_name, starts_with("Ps")) %>%
rename_all(
str_replace,
pattern = ".+(Het|WT).+_F3_([0-9]+).+",
replacement = "\\1_\\2"
) %>%
as.data.frame() %>%
column_to_rownames("gene_name") %>%
pheatmap(
color = viridis_pal(option = "magma")(100),
cutree_cols = 2,
cutree_rows = 2,
angle_col = 0
)
highest | median |
---|---|
9.2 | 6.7 |