Last updated: 2020-04-02
Checks: 7 0
Knit directory: 20170327_Psen2S4Ter_RNASeq/
This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20200119)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Untracked files:
Untracked: basic_sample_checklist.txt
Untracked: experiment_paired_fastq_spreadsheet_template.txt
Unstaged changes:
Modified: analysis/_site.yml
Modified: data/cpmPostNorm.rds
Modified: data/dgeList.rds
Modified: data/fit.rds
Modified: output/psen2HomVsHet.csv
Modified: output/psen2VsWT.csv
Staged changes:
Modified: analysis/_site.yml
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 33f5f93 | Steve Ped | 2020-03-07 | Updated html after minor formatting |
Rmd | 3b120d9 | Steve Ped | 2020-03-07 | Corrected table justifications |
html | a5cdf5d | Steve Ped | 2020-03-06 | Finished Revisions |
Rmd | 154085b | Steve Ped | 2020-03-06 | Reduced ranked lists to fry only |
Rmd | 433a729 | Steve Ped | 2020-03-05 | Prepared to remove EGSEA-type approach |
Rmd | b617456 | Steve Ped | 2020-03-05 | Tweaked comparison of mutants |
html | 1385ceb | Steve Ped | 2020-03-05 | Compiled after renaming columns |
Rmd | 2c56cef | Steve Ped | 2020-03-05 | Tidied column names for enrichment output |
html | bad81c1 | Steve Ped | 2020-03-05 | Updated MutVsWt UpSet plots for KEGG/HALLMARK |
Rmd | c0b89ed | Steve Ped | 2020-03-05 | Reran after changing to cpmPotNorm instead of the incorrect fit$fitted.values |
Rmd | afdf91e | Steve Ped | 2020-03-04 | Fixed typo |
html | e0288c6 | Steve Ped | 2020-02-19 | Generated enrichment tables |
Rmd | 541fbf0 | Steve Ped | 2020-02-19 | Revised Hom Vs Het Enrichment |
html | 876e40f | Steve Ped | 2020-02-17 | Compiled after minor corrections |
Rmd | 53ed0e3 | Steve Ped | 2020-02-17 | Corrected Ensembl Release |
Rmd | f55be85 | Steve Ped | 2020-02-17 | Added commas |
Rmd | 55143d4 | Steve Ped | 2020-02-17 | Corrected gsea analyses |
html | 9104ecd | Steve Ped | 2020-01-28 | First draft of Hom Vs Het |
Rmd | 2cc69b5 | Steve Ped | 2020-01-28 | Added data load correction |
Rmd | 207cdc8 | Steve Ped | 2020-01-28 | Added code for Hom Vs Het Enrichment |
library(tidyverse)
library(magrittr)
library(edgeR)
library(scales)
library(pander)
library(msigdbr)
library(AnnotationDbi)
library(RColorBrewer)
library(ngsReports)
theme_set(theme_bw())
panderOptions("table.split.table", Inf)
panderOptions("table.style", "rmarkdown")
panderOptions("big.mark", ",")
samples <- here::here("data/samples.csv") %>%
read_csv() %>%
distinct(sampleName, .keep_all = TRUE) %>%
dplyr::select(sample = sampleName, sampleID, genotype) %>%
mutate(
genotype = factor(genotype, levels = c("WT", "Het", "Hom")),
mutant = genotype %in% c("Het", "Hom"),
homozygous = genotype == "Hom"
)
genoCols <- samples$genotype %>%
levels() %>%
length() %>%
brewer.pal("Set1") %>%
setNames(levels(samples$genotype))
dgeList <- here::here("data/dgeList.rds") %>% read_rds()
cpmPostNorm <- here::here("data/cpmPostNorm.rds") %>% read_rds()
entrezGenes <- dgeList$genes %>%
dplyr::filter(!is.na(entrezid)) %>%
unnest(entrezid) %>%
dplyr::rename(entrez_gene = entrezid)
deTable <- here::here("output", "psen2HomVsHet.csv") %>%
read_csv() %>%
mutate(
entrezid = dgeList$genes$entrezid[gene_id]
)
formatP <- function(p, m = 0.0001){
out <- rep("", length(p))
out[p < m] <- sprintf("%.2e", p[p<m])
out[p >= m] <- sprintf("%.4f", p[p>=m])
out
}
Enrichment analysis for this dataset present some challenges. Despite normalisation to account for gene length and GC bias, some appeared to still be present in the final results. In addition, the confounding of incomplete rRNA removal with genotype may lead to other distortions in both DE genes and ranking statistics.
As the list of DE genes for this comparison was small (\(n_{\text{DE}} = 7\)), enrichment testing was only performed using ranked-list approaches. For enrichment within larger gene lists, fry
can take into account inter-gene correlations. Values supplied will be logCPM for each gene/sample after being adjusted for GC and length biases.
Data was sourced using the msigdbr
package. The initial database used for testing was the Hallmark Gene Sets, with mappings from gene-set to EntrezGene IDs performed by the package authors.
hm <- msigdbr("Danio rerio", category = "H") %>%
left_join(entrezGenes) %>%
dplyr::filter(!is.na(gene_id)) %>%
distinct(gs_name, gene_id, .keep_all = TRUE)
hmByGene <- hm %>%
split(f = .$gene_id) %>%
lapply(extract2, "gs_name")
hmByID <- hm %>%
split(f = .$gs_name) %>%
lapply(extract2, "gene_id")
Mappings are required from gene to pathway, and Ensembl identifiers were used to map from gene to pathway, based on the mappings in the previously used annotations (Ensembl Release 98). A total of 3,375 Ensembl IDs were mapped to pathways from the hallmark gene sets.
kg <- msigdbr("Danio rerio", category = "C2", subcategory = "CP:KEGG") %>%
left_join(entrezGenes) %>%
dplyr::filter(!is.na(gene_id)) %>%
distinct(gs_name, gene_id, .keep_all = TRUE)
kgByGene <- kg %>%
split(f = .$gene_id) %>%
lapply(extract2, "gs_name")
kgByID <- kg %>%
split(f = .$gs_name) %>%
lapply(extract2, "gene_id")
The same mapping process was applied to KEGG gene sets. A total of 3,530 Ensembl IDs were mapped to pathways from the KEGG gene sets.
goSummaries <- url("https://uofabioinformaticshub.github.io/summaries2GO/data/goSummaries.RDS") %>%
readRDS() %>%
mutate(
Term = Term(id),
gs_name = Term %>% str_to_upper() %>% str_replace_all("[ -]", "_"),
gs_name = paste0("GO_", gs_name)
)
minPath <- 3
go <- msigdbr("Danio rerio", category = "C5") %>%
left_join(entrezGenes) %>%
dplyr::filter(!is.na(gene_id)) %>%
left_join(goSummaries) %>%
dplyr::filter(shortest_path >= minPath) %>%
distinct(gs_name, gene_id, .keep_all = TRUE)
goByGene <- go %>%
split(f = .$gene_id) %>%
lapply(extract2, "gs_name")
goByID <- go %>%
split(f = .$gs_name) %>%
lapply(extract2, "gene_id")
For analysis of gene-sets from the GO database, gene-sets were restricted to those with 3 or more steps back to the ontology root terms. A total of 10,975 Ensembl IDs were mapped to pathways from restricted database of 8,825 GO gene sets.
gsSizes <- bind_rows(hm, kg, go) %>%
dplyr::select(gs_name, gene_symbol, gene_id) %>%
chop(c(gene_symbol, gene_id)) %>%
mutate(
gs_size = vapply(gene_symbol, length, integer(1)),
de_id = lapply(
X = gene_id,
FUN = intersect,
y = dplyr::filter(deTable, DE)$gene_id
),
de_size = vapply(de_id, length, integer(1))
)
hmFry <- cpmPostNorm %>%
fry(
index = hmByID,
design = dgeList$design,
contrast = "homozygous",
sort = "directional"
) %>%
rownames_to_column("gs_name") %>%
as_tibble()
Using the directional results from fry
, no Hallmark Gene Sets were detected as different between mutant genotypes, with a minimum FDR being found as 64%. Similarly for a non-directional approach, no Hallmark Gene Sets were detected as different, with a minimum FDR.Mixed
being 81%.
kgFry <-cpmPostNorm%>%
fry(
index = kgByID,
design = dgeList$design,
contrast = "homozygous",
sort = "directional"
) %>%
rownames_to_column("gs_name") %>%
as_tibble()
Using the directional results from fry
, no KEGG Gene Sets were detected as different between mutant genotypes, with a minimum FDR being found as 14%.
kgFry %>%
arrange(PValue.Mixed) %>%
dplyr::select(gs_name, NGenes, contains("Mixed")) %>%
set_names(str_remove(names(.), ".Mixed")) %>%
dplyr::filter(FDR < 0.05) %>%
dplyr::select(
`KEGG Gene Set` = gs_name,
`Expressed Genes` = NGenes,
PValue, FDR
) %>%
mutate_at(
vars(PValue, FDR), formatP
) %>%
pander(
justify = "lrrr",
caption = "The only KEGG Gene Set detected as different between mutants."
)
KEGG Gene Set | Expressed Genes | PValue | FDR |
---|---|---|---|
KEGG_GLYCINE_SERINE_AND_THREONINE_METABOLISM | 24 | 0.0003 | 0.0474 |
Using a non-directional (i.e. Mixed) approach, one KEGG gene set was detected as different. However, an FDR of 0.047 in this instance does not provide much confidence to this observation.
goFry <- cpmPostNorm %>%
fry(
index = goByID,
design = dgeList$design,
contrast = "homozygous",
sort = "directional"
) %>%
rownames_to_column("gs_name") %>%
as_tibble()
No GO Gene Sets were detected as different between mutant genotypes, with a minimum FDR being found as 15%.
goFry %>%
arrange(PValue.Mixed) %>%
dplyr::select(gs_name, NGenes, contains("Mixed")) %>%
set_names(str_remove(names(.), ".Mixed")) %>%
dplyr::filter(FDR < 0.05) %>%
dplyr::select(
`GO Gene Set` = gs_name,
`Expressed Genes` = NGenes,
PValue, FDR
) %>%
mutate_at(
vars(PValue, FDR), formatP
) %>%
pander(
justify = "lrrr",
caption = "The only GO Gene Set detected as different between mutants."
)
GO Gene Set | Expressed Genes | PValue | FDR |
---|---|---|---|
GO_REGULATION_OF_AUTOPHAGOSOME_MATURATION | 11 | 5.05e-06 | 0.0446 |
Again, using a non-directional (i.e. Mixed) approach, one GO gene set was detected as different. However, an FDR of 0.046 in this instance does not provide much confidence to this observation.
All enriched gene sets terms with an FDR adjusted p-value < 0.05 were exported as a single csv file.
bind_rows(
hmFry,
kgFry,
goFry
) %>%
dplyr::filter(FDR.Mixed < 0.05) %>%
left_join(gsSizes) %>%
dplyr::select(
gs_name, NGenes,
PValue = PValue.Mixed,
FDR = FDR.Mixed,
gene_symbol, de_size
) %>%
mutate(
DE = lapply(
X = gene_symbol,
FUN = intersect,
y = dplyr::filter(deTable, DE)$gene_name
),
DE = vapply(DE, paste, character(1), collapse = ";")
) %>%
dplyr::select(
`Gene Set` = gs_name,
`Nbr Detected Genes` = NGenes,
`Nbr DE Genes` = de_size,
PValue, FDR,
`DE Genes` = DE
) %>%
write_csv(
here::here("output", "Enrichment_Hom_V_Het.csv")
)
devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 3.6.3 (2020-02-29)
os Ubuntu 18.04.4 LTS
system x86_64, linux-gnu
ui X11
language en_AU:en
collate en_AU.UTF-8
ctype en_AU.UTF-8
tz Australia/Adelaide
date 2020-04-02
─ Packages ───────────────────────────────────────────────────────────────────
package * version date lib source
AnnotationDbi * 1.48.0 2019-10-29 [2] Bioconductor
assertthat 0.2.1 2019-03-21 [2] CRAN (R 3.6.0)
backports 1.1.5 2019-10-02 [2] CRAN (R 3.6.1)
Biobase * 2.46.0 2019-10-29 [2] Bioconductor
BiocGenerics * 0.32.0 2019-10-29 [2] Bioconductor
BiocParallel 1.20.1 2019-12-21 [2] Bioconductor
Biostrings 2.54.0 2019-10-29 [2] Bioconductor
bit 1.1-15.2 2020-02-10 [2] CRAN (R 3.6.2)
bit64 0.9-7 2017-05-08 [2] CRAN (R 3.6.0)
bitops 1.0-6 2013-08-17 [2] CRAN (R 3.6.0)
blob 1.2.1 2020-01-20 [2] CRAN (R 3.6.2)
broom 0.5.4 2020-01-27 [2] CRAN (R 3.6.2)
callr 3.4.2 2020-02-12 [2] CRAN (R 3.6.2)
cellranger 1.1.0 2016-07-27 [2] CRAN (R 3.6.0)
cli 2.0.1 2020-01-08 [2] CRAN (R 3.6.2)
cluster 2.1.0 2019-06-19 [2] CRAN (R 3.6.1)
colorspace 1.4-1 2019-03-18 [2] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [2] CRAN (R 3.6.0)
data.table 1.12.8 2019-12-09 [2] CRAN (R 3.6.2)
DBI 1.1.0 2019-12-15 [2] CRAN (R 3.6.2)
dbplyr 1.4.2 2019-06-17 [2] CRAN (R 3.6.0)
DelayedArray 0.12.2 2020-01-06 [2] Bioconductor
desc 1.2.0 2018-05-01 [2] CRAN (R 3.6.0)
devtools 2.2.2 2020-02-17 [2] CRAN (R 3.6.2)
digest 0.6.25 2020-02-23 [2] CRAN (R 3.6.2)
dplyr * 0.8.4 2020-01-31 [2] CRAN (R 3.6.2)
edgeR * 3.28.1 2020-02-26 [2] Bioconductor
ellipsis 0.3.0 2019-09-20 [2] CRAN (R 3.6.1)
evaluate 0.14 2019-05-28 [2] CRAN (R 3.6.0)
FactoMineR 2.2 2020-02-05 [2] CRAN (R 3.6.2)
fansi 0.4.1 2020-01-08 [2] CRAN (R 3.6.2)
flashClust 1.01-2 2012-08-21 [2] CRAN (R 3.6.1)
forcats * 0.4.0 2019-02-17 [2] CRAN (R 3.6.0)
fs 1.3.1 2019-05-06 [2] CRAN (R 3.6.0)
generics 0.0.2 2018-11-29 [2] CRAN (R 3.6.0)
GenomeInfoDb 1.22.0 2019-10-29 [2] Bioconductor
GenomeInfoDbData 1.2.2 2019-11-21 [2] Bioconductor
GenomicAlignments 1.22.1 2019-11-12 [2] Bioconductor
GenomicRanges 1.38.0 2019-10-29 [2] Bioconductor
ggdendro 0.1-20 2016-04-27 [2] CRAN (R 3.6.0)
ggplot2 * 3.2.1 2019-08-10 [2] CRAN (R 3.6.1)
ggrepel 0.8.1 2019-05-07 [2] CRAN (R 3.6.0)
git2r 0.26.1 2019-06-29 [2] CRAN (R 3.6.1)
glue 1.3.1 2019-03-12 [2] CRAN (R 3.6.0)
GO.db 3.10.0 2019-11-21 [2] Bioconductor
gtable 0.3.0 2019-03-25 [2] CRAN (R 3.6.0)
haven 2.2.0 2019-11-08 [2] CRAN (R 3.6.1)
here 0.1 2017-05-28 [2] CRAN (R 3.6.0)
hms 0.5.3 2020-01-08 [2] CRAN (R 3.6.2)
htmltools 0.4.0 2019-10-04 [2] CRAN (R 3.6.1)
htmlwidgets 1.5.1 2019-10-08 [2] CRAN (R 3.6.1)
httpuv 1.5.2 2019-09-11 [2] CRAN (R 3.6.1)
httr 1.4.1 2019-08-05 [2] CRAN (R 3.6.1)
hwriter 1.3.2 2014-09-10 [2] CRAN (R 3.6.0)
IRanges * 2.20.2 2020-01-13 [2] Bioconductor
jpeg 0.1-8.1 2019-10-24 [2] CRAN (R 3.6.1)
jsonlite 1.6.1 2020-02-02 [2] CRAN (R 3.6.2)
kableExtra 1.1.0 2019-03-16 [2] CRAN (R 3.6.1)
knitr 1.28 2020-02-06 [2] CRAN (R 3.6.2)
later 1.0.0 2019-10-04 [2] CRAN (R 3.6.1)
lattice 0.20-40 2020-02-19 [4] CRAN (R 3.6.2)
latticeExtra 0.6-29 2019-12-19 [2] CRAN (R 3.6.2)
lazyeval 0.2.2 2019-03-15 [2] CRAN (R 3.6.0)
leaps 3.1 2020-01-16 [2] CRAN (R 3.6.2)
lifecycle 0.1.0 2019-08-01 [2] CRAN (R 3.6.1)
limma * 3.42.2 2020-02-03 [2] Bioconductor
locfit 1.5-9.1 2013-04-20 [2] CRAN (R 3.6.0)
lubridate 1.7.4 2018-04-11 [2] CRAN (R 3.6.0)
magrittr * 1.5 2014-11-22 [2] CRAN (R 3.6.0)
MASS 7.3-51.5 2019-12-20 [4] CRAN (R 3.6.2)
Matrix 1.2-18 2019-11-27 [2] CRAN (R 3.6.1)
matrixStats 0.55.0 2019-09-07 [2] CRAN (R 3.6.1)
memoise 1.1.0 2017-04-21 [2] CRAN (R 3.6.0)
modelr 0.1.6 2020-02-22 [2] CRAN (R 3.6.2)
msigdbr * 7.0.1 2019-09-04 [2] CRAN (R 3.6.2)
munsell 0.5.0 2018-06-12 [2] CRAN (R 3.6.0)
ngsReports * 1.1.2 2019-10-16 [1] Bioconductor
nlme 3.1-144 2020-02-06 [4] CRAN (R 3.6.2)
pander * 0.6.3 2018-11-06 [2] CRAN (R 3.6.0)
pillar 1.4.3 2019-12-20 [2] CRAN (R 3.6.2)
pkgbuild 1.0.6 2019-10-09 [2] CRAN (R 3.6.1)
pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 3.6.1)
pkgload 1.0.2 2018-10-29 [2] CRAN (R 3.6.0)
plotly 4.9.2 2020-02-12 [2] CRAN (R 3.6.2)
plyr 1.8.5 2019-12-10 [2] CRAN (R 3.6.2)
png 0.1-7 2013-12-03 [2] CRAN (R 3.6.0)
prettyunits 1.1.1 2020-01-24 [2] CRAN (R 3.6.2)
processx 3.4.2 2020-02-09 [2] CRAN (R 3.6.2)
promises 1.1.0 2019-10-04 [2] CRAN (R 3.6.1)
ps 1.3.2 2020-02-13 [2] CRAN (R 3.6.2)
purrr * 0.3.3 2019-10-18 [2] CRAN (R 3.6.1)
R6 2.4.1 2019-11-12 [2] CRAN (R 3.6.1)
RColorBrewer * 1.1-2 2014-12-07 [2] CRAN (R 3.6.0)
Rcpp 1.0.3 2019-11-08 [2] CRAN (R 3.6.1)
RCurl 1.98-1.1 2020-01-19 [2] CRAN (R 3.6.2)
readr * 1.3.1 2018-12-21 [2] CRAN (R 3.6.0)
readxl 1.3.1 2019-03-13 [2] CRAN (R 3.6.0)
remotes 2.1.1 2020-02-15 [2] CRAN (R 3.6.2)
reprex 0.3.0 2019-05-16 [2] CRAN (R 3.6.0)
reshape2 1.4.3 2017-12-11 [2] CRAN (R 3.6.0)
rlang 0.4.4 2020-01-28 [2] CRAN (R 3.6.2)
rmarkdown 2.1 2020-01-20 [2] CRAN (R 3.6.2)
rprojroot 1.3-2 2018-01-03 [2] CRAN (R 3.6.0)
Rsamtools 2.2.3 2020-02-23 [2] Bioconductor
RSQLite 2.2.0 2020-01-07 [2] CRAN (R 3.6.2)
rstudioapi 0.11 2020-02-07 [2] CRAN (R 3.6.2)
rvest 0.3.5 2019-11-08 [2] CRAN (R 3.6.1)
S4Vectors * 0.24.3 2020-01-18 [2] Bioconductor
scales * 1.1.0 2019-11-18 [2] CRAN (R 3.6.1)
scatterplot3d 0.3-41 2018-03-14 [2] CRAN (R 3.6.1)
sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 3.6.0)
ShortRead 1.44.3 2020-02-03 [2] Bioconductor
statmod 1.4.34 2020-02-17 [2] CRAN (R 3.6.2)
stringi 1.4.6 2020-02-17 [2] CRAN (R 3.6.2)
stringr * 1.4.0 2019-02-10 [2] CRAN (R 3.6.0)
SummarizedExperiment 1.16.1 2019-12-19 [2] Bioconductor
testthat 2.3.1 2019-12-01 [2] CRAN (R 3.6.1)
tibble * 2.1.3 2019-06-06 [2] CRAN (R 3.6.0)
tidyr * 1.0.2 2020-01-24 [2] CRAN (R 3.6.2)
tidyselect 1.0.0 2020-01-27 [2] CRAN (R 3.6.2)
tidyverse * 1.3.0 2019-11-21 [2] CRAN (R 3.6.1)
truncnorm 1.0-8 2018-02-27 [2] CRAN (R 3.6.0)
usethis 1.5.1 2019-07-04 [2] CRAN (R 3.6.1)
vctrs 0.2.3 2020-02-20 [2] CRAN (R 3.6.2)
viridisLite 0.3.0 2018-02-01 [2] CRAN (R 3.6.0)
webshot 0.5.2 2019-11-22 [2] CRAN (R 3.6.1)
whisker 0.4 2019-08-28 [2] CRAN (R 3.6.1)
withr 2.1.2 2018-03-15 [2] CRAN (R 3.6.0)
workflowr * 1.6.0 2019-12-19 [2] CRAN (R 3.6.2)
xfun 0.12 2020-01-13 [2] CRAN (R 3.6.2)
xml2 1.2.2 2019-08-09 [2] CRAN (R 3.6.1)
XVector 0.26.0 2019-10-29 [2] Bioconductor
yaml 2.2.1 2020-02-01 [2] CRAN (R 3.6.2)
zlibbioc 1.32.0 2019-10-29 [2] Bioconductor
zoo 1.8-7 2020-01-10 [2] CRAN (R 3.6.2)
[1] /home/steveped/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library