Due before 12pm, Monday 2nd September
Your answers to all questions should be submitted to myUni as a .zip file containing three files:
1) a bash script for Q1, 2) a bash script for Q2, and 3) the answers to the statistics questions (Q3 and Q4) in a single R Markdown document.
Note that the file my_species_gff_features.txt is not required as part of your submission for Q1, only the script which will generate this file!
Similarly, for Q2, only the script is required.
my_species_gff_features.txt where you use your assigned species name instead of my_species [1 mark].
NB: If your actual species is not included in the name, no marks will be given.
NB: If identical comments are identified in any submissions, a mark of zero will be given for this question for all suspicious submissions.
In a single R Markdown file, answer the following questions:
3 Two groups of people have volunteered to take part in a genetic study. Group 1 (n = 126) are volunteers with no history of Type I Diabetes in their immediate family, whilst Group 2 (n = 183) have all been diagnosed with Type I Diabetes. A genotyping study was undertaken on these volunteers using 25,786 SNPs selected due to their proximity to key immune genes.
Researchers are looking to identify any SNP genotypes which may increase the risk of Type I Diabetes. In your answer, consider the reference SNP allele as A and the alternate SNP allele as B, using the genotypes AA, AB and BB.
a. For an individual SNP, what test would be appropriate for this comparison? [1 mark]
b. Define H₀ and Hₐ for the genotype at each individual SNP. [2 marks]
c. If there was no true difference in any genotypes between the two groups, how many p-values would you expect to see < 0.05? [1 mark]
d. Using Bonferroni’s method, what would a suitable cutoff value be to consider a SNP as being associated with an increased risk of Type I Diabetes, i.e. to reject H₀? [1 mark]
e. Given the following genotype table, would you accept or reject H₀? Provide your working and a full explanation. [3 marks]
Group | AA | AB | BB |
---|---|---|---|
Control | 25 | 60 | 41 |
T1D | 21 | 55 | 103 |
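As a minimal sketch of how a genotype count table can be entered and tested in R, the following uses made-up counts (not those in the table above) and a hypothetical number of tests `m`. The use of `chisq.test()` is an assumption made for illustration; `fisher.test()` would be an alternative when counts are small.

```r
# Hypothetical genotype counts, for illustration only (not the assignment data)
counts <- matrix(
  c(30, 50, 20,   # Group1: AA, AB, BB
    20, 45, 35),  # Group2: AA, AB, BB
  nrow = 2, byrow = TRUE,
  dimnames = list(Group = c("Group1", "Group2"),
                  Genotype = c("AA", "AB", "BB"))
)

# Test of independence between group membership and genotype
res <- chisq.test(counts)
res$parameter   # degrees of freedom: (rows - 1) * (columns - 1)
res$p.value

# Hypothetical number of SNPs tested and per-test significance level
m <- 1000
alpha <- 0.05

# Expected number of p-values below alpha if every H0 were true
expected_false_positives <- m * alpha

# Bonferroni-adjusted cutoff, controlling the family-wise error rate
bonferroni_cutoff <- alpha / m

# Compare the raw p-value with the adjusted cutoff
res$p.value < bonferroni_cutoff
```

The same comparison can be made by adjusting the p-values instead of the cutoff, using `p.adjust(p, method = "bonferroni")` on the full vector of p-values.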
4 An experiment was repeated multiple times, in which GFP fluorescence was measured in a cell culture as a measurement of gene expression, both before and after viral transfection.
GFP was present on a plasmid as a reporter for activity at a specific promoter.
The changes in fluorescence obtained for each repeat are given below as the vector x, presented on the log2 scale for your individual subset of experiments.
a. Define H₀ and Hₐ [2 marks]
b. Calculate the sample mean and sample variance in R [2 marks]
c. Calculate the t-statistic using R. [1 mark]
d. What would the degrees of freedom be for your t-test? [1 mark]
e. Calculate the p-value using R [1 mark]
Show all working & code.
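The mechanics of parts b to e can be sketched in R on a made-up vector, named `y` here so that it does not clash with your own `x`. Testing against a mean of zero and using a two-sided alternative are assumptions of this sketch, so adapt them to the hypotheses you define in part a.

```r
# Hypothetical log2 changes, for illustration only (not any student's data)
y <- c(0.5, 1.2, -0.3, 0.8, 1.0)

n <- length(y)
y_bar <- mean(y)   # sample mean
s2 <- var(y)       # sample variance (uses the n - 1 denominator)

# One-sample t-statistic against a hypothesised mean of 0 (assumption)
mu0 <- 0
t_stat <- (y_bar - mu0) / sqrt(s2 / n)

# Degrees of freedom for a one-sample t-test
dof <- n - 1

# Two-sided p-value from the t distribution
p_val <- 2 * pt(abs(t_stat), df = dof, lower.tail = FALSE)

# Cross-check the manual calculation against the built-in test
check <- t.test(y, mu = mu0)
all.equal(unname(check$statistic), t_stat)
all.equal(check$p.value, p_val)
```

Working through the calculation by hand and then confirming it with `t.test()` is one way to show your working while also checking it.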
If your student number is not listed, please contact Dan to ensure you are added to the list.
ID | Species | Taxonomy ID | Common Name |
---|---|---|---|
a1202401 | Pogona vitticeps | 103695 | Central Bearded Dragon |
a1646510 | Otolemur garnettii | 30611 | Small-Eared Galago |
a1662216 | Xiphophorus couchianus | 32473 | Monterrey Platyfish |
a1679475 | Mus spretus | 10096 | Western Wild Mouse |
a1700271 | Bos mutus | 72004 | Wild Yak |
a1705074 | Microcebus murinus | 30608 | Gray Mouse Lemur |
a1739682 | Saccharomyces cerevisiae | 4932 | Baker’s Yeast |
a1755783 | Fukomys damarensis | 885580 | Damara Mole-Rat |
a1758223 | Astatotilapia calliptera | 8154 | Eastern Happy |
a1758382 | Erinaceus europaeus | 9365 | Western European Hedgehog |
a1758693 | Phascolarctos cinereus | 38626 | Koala |
a1762813 | Microtus ochrogaster | 79684 | Prairie Vole |
a1768114 | Meleagris gallopavo | 9103 | Turkey |
a1768183 | Anser brachyrhynchus | 132585 | Pink-Footed Goose |
a1769266 | Neovison vison | 452646 | American Mink |
a1770716 | Mus musculus | 10090 | House Mouse |
a1771465 | Rhinopithecus roxellana | 61622 | Golden Snub-Nosed Monkey |
a1775769 | Cercocebus atys | 9531 | Sooty Mangabey |
a1776430 | Cavia aperea | 37548 | Brazilian Guinea Pig |
a1776494 | Ochotona princeps | 9978 | American Pika |
a1776998 | Neovison vison | 452646 | American Mink |
a1778287 | Mastacembelus armatus | 205130 | Zig-Zag Eel |
a1778618 | Capra hircus | 9925 | Goat |
a1779487 | Procavia capensis | 9813 | Cape Rock Hyrax |
a1780859 | Theropithecus gelada | 9565 | Gelada |
a1781873 | Apteryx haastii | 8823 | Great Spotted Kiwi |
a1781987 | Apteryx rowi | 308060 | Okarito Brown Kiwi |
a1783535 | Rattus norvegicus | 10116 | Norway Rat |
a1789486 | Scleropages formosus | 113540 | Asian Bonytongue |
a1789913 | Ornithorhynchus anatinus | 9258 | Platypus |
a1790457 | Panthera tigris | 9694 | Tiger |
The results you are analysing for Q4 are as follows.
You can simply paste these values into your R Markdown document as the object x and perform all of your analysis on these values.
ID | Values |
---|---|
a1202401 | x <- c(-0.9513, 1.6303, -0.045, -0.7745, 0.4975, 0.548) |
a1646510 | x <- c(0.2462, -0.3341, 2.782, 1.3727, -2.2463, -1.302, -0.8207, 1.8788, 0.9241, 3.5149) |
a1662216 | x <- c(2.1101, 2.6845, -0.6054, 3.2921, 0.5918, -1.3179) |
a1679475 | x <- c(2.8489, -1.169, 0.194, 2.1414, 1.1182, 0.8952, 1.7453, 2.2781) |
a1700271 | x <- c(2.0825, -1.8909, 1.3076, -0.4417, -0.6932, 1.5703) |
a1705074 | x <- c(1.4906, -1.91, -0.3847, 1.8721, 0.0724, -0.2576, 0.7764, -0.2081, 1.6768, 0.5433) |
a1739682 | x <- c(-0.0698, -0.0496, 1.0847, -1.8116, 1.9262, -0.4381, 0.8199, -1.7124, 0.9042) |
a1755783 | x <- c(0.9639, 2.7818, 3.1581, -0.9618, 0.899) |
a1758223 | x <- c(2.4081, -0.1234, 1.0343, 0.1866, 1.2536, 3.0342, 1.3065, 1.9026, 1.9277, 0.8871) |
a1758382 | x <- c(0.9154, 3.0455, 0.9424, 0.2602) |
a1758693 | x <- c(0.8874, -1.2703, 1.5661, -1.4728, 0.3263, -0.5432, 1.2529, -1.058, 1.3997) |
a1762813 | x <- c(1.5863, 1.8493, 0.148, 0.9625) |
a1768114 | x <- c(-1.2482, 1.5015, 0.4375, -0.8861, 2.2182, -3.4134, 1.2606, 1.3364) |
a1768183 | x <- c(2.3343, 4.2557, 1.4483, -1.1698, 2.9268) |
a1769266 | x <- c(2.0184, -2.5565, 0.828, 0.7598, 2.4209, 0.9782) |
a1770716 | x <- c(0.9099, 1.5083, 3.8821, 1.857, -1.2774, -2.4848) |
a1771465 | x <- c(-2.1028, 2.2444, 0.2528, 0.4546, 0.5281, 1.9097) |
a1775769 | x <- c(2.1548, 2.6573, 0.0405, 1.8806, 1.2804) |
a1776430 | x <- c(1.1573, 0.9534, 1.6556, 0.3031, -0.7595) |
a1776494 | x <- c(-1.2097, -0.3836, -0.9921, 3.163) |
a1776998 | x <- c(1.17, 1.5779, 2.1255, 0.2414, -0.2254, 3.6169, -0.4608) |
a1778287 | x <- c(2.191, -1.2205, -0.1718, -0.8329, 1.3216, -0.2207, -0.4791, -1.6403, 1.4002) |
a1778618 | x <- c(0.2766, 0.9948, 1.0277, 1.446, 0.5142, -2.3367, 2.7549) |
a1779487 | x <- c(1.2239, 1.6841, 3.9703, 1.4247, 1.6411, 3.3794, 1.6216) |
a1780859 | x <- c(0.9849, 3.2516, -0.6605, -1.7701, 1.8174, 1.407, 0.998, 0.7658, 3.4027) |
a1781873 | x <- c(2.6989, -0.2447, -1.1002, 3.2813, 0.8118, 2.8494, 3.4665, -0.7619, -0.3138, 1.3657) |
a1781987 | x <- c(2.2343, 2.2869, 0.9392, 0.0993, -2.5061) |
a1783535 | x <- c(-0.7149, -0.507, 2.0843, 0.7257, -0.4206, 1.5086, 0.4469, -0.0513) |
a1789486 | x <- c(0.2182, -1.0116, 1.0219, -1.0481, -0.2522, 0.0143, 0.8276, -1.4257, 2.3013) |
a1789913 | x <- c(1.6467, 2.8552, -3.5222, 0.1931) |
a1790457 | x <- c(1.1158, 1.5058, -0.3448, -1.2364, 2.6939, 0.4026, -3.2485, -1.8346, -0.6862, 0.6051) |