library(pander)
library(magrittr)
library(dplyr)
library(scales)
library(ggplot2)
library(ggrepel)
library(parallel)
library(readr)
library(reshape2)
logit <- binomial()$linkfun
inv.logit <- binomial()$linkinv
w <- 169/25.4
h <- 169/25.4
This analysis takes the filtered SNPs from under analysis and simulates genetic drift under no selective pressure, in order to compare detected changes in allele frequencies to the range of values predicted by the drift model.
This analysis uses a custom set of scripts collected as an R package and available from https://github.com/steveped/driftSim. This is able to be installed as below.
devtools::install_github("steveped/driftSim")
library(driftSim)
nGen <- 16
nSim <- 5000
litterSize <- 20
migRate <- 0.15
alpha <- 0.001
Ne <- c(`1996` = 222, `2012` = 116)
pSurv <- 0.1
f0 <- seq(0.5, 0.95, by = 0.05)
nPops <- c(4, 8)
sigma <- c(0.5, 0.8)
As decided the population parameters were defined as:
Parameter | Value | Comment |
---|---|---|
Ne1996 | 222 | Effective Population Size in 1996 |
Ne2012 | 116 | Effective Population Size in 2012 |
p | 0.1 | Probability of survival for each generation |
g | 16 | The number of generations between 1996 and 2012 |
n | 4, 8 | The number of neighbouring populations |
l | 20 | The annual litter size |
r | 0.15 | The migration rate |
f0 | 0.5 to 0.95 | The starting allele frequency in the main population, increased in steps of 0.05 |
σ | 0.5, 0.8 | The variability around f0 for neighbouring populations |
No selective advantage was specified for any allele.
The simulation parameters were defined as:
Parameter | Value | Comment |
---|---|---|
nsim | 5000 | The number of simulations at each starting frequency f0 |
α | 0.001 | The significance level for plotting confidence bands |
To summarise the above:
nSamp <- 1e06
x_0 <- runif(nSamp, 0.5, 0.95)
y_0 <- sigma %>%
lapply(function(sigma){
inv.logit(rnorm(nSamp, logit(x_0), sigma))
})
In order to express these values for σ in terms of correlations, 1,000,000 random values were simulated for a starting frequency (f0) anywhere between 0.5 and 0.95. 1,000,000 neighbouring population starting values were randomly sampled around this using the values of σ = 0.5 and 0.8, with the random sampling taking place on the logit scale.
Sigma | Correlation |
---|---|
0.5 | 0.81 |
0.8 | 0.65 |
allParam <- expand.grid(f0 = f0,
mig = migRate,
n = nPops,
sd = sigma) %>%
mutate(N0 = round(Ne["1996"], 0),
Nt = round(Ne["2012"], 0),
t = nGen,
surv = pSurv,
litter = litterSize) %>%
split(f = factor(1:nrow(.)))
allSim <- allParam %>%
mclapply(function(x){
x <- as.list(x)
replicate(nSim, do.call(simDrift, args = x)$ft)
},
mc.cores = 20)
As expected the mean allele frequency after simulation of drift remained approximately equal to the initial starting frquency, however the variability of the final frequency was heavily affected by the number of neighbouring populations and their similarity to the to main population of interest.
allSim99 <- allSim %>%
lapply(quantile, probs = c(alpha/2, 1-alpha/2)) %>%
lapply(as.list) %>%
lapply(as.data.frame) %>%
bind_rows() %>%
set_names(c("lwr", "upr"))
driftIntervals <- allParam %>%
bind_rows() %>%
dplyr::select(f0, n, sd) %>%
as_data_frame() %>%
cbind(allSim99)
flkResults <- file.path("..", "results", "flkResults.tsv") %>%
read_tsv
genotypeResults <- file.path("..", "results", "genotypeResults.tsv") %>%
read_tsv
sigSnps <- c(filter(flkResults, FDR < 0.05)$snpID,
filter(genotypeResults, FDR < 0.1)$snpID) %>%
unique
## png
## 2
save.image("../temp/05_GeneticDrift.RData")