22 July 2016
ShortRead
Rsamtools
samtoolsGRanges objectsNumerous functions for handling .fastq files
Uses Biostrings classes
DNAStringSet for sequences
BStringSet for quality scores
Personally I can't see any advantage to the set of external tools
GenomicRanges.bam filesGenomic RangesAnchor point between different analytic requirements
In "Day 3/data":
Let's load the data so we can form a GRanges object
Change into the "Day 3" directory
library(readr)
snpFile <- file.path("data", "HsSNPs.csv")
file.exists(snpFile)
snps <- read_csv(snpFile)
Genomic RangesLoad the important Bioconductor packages
library(GenomicRanges) library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
Genomic RangesAs we only have the autosomal SNPS:
Seqinfo objectGRanges objectautoSeq <- list(seqlevels = seqlevels(txdb)[1:22]) autoSeq$seqinfo <- seqinfo(txdb)[autoSeq$seqlevels]
Genomic RangesNow we have everything we need!
snpGR <- GRanges(Rle(snps$Chr),
IRanges(snps$BP, end = snps$BP),
strand = "*",
seqinfo = autoSeq$seqinfo)
We can explore this using length(), seqlengths(), seqlevels() etc.
head(start(snpGR))
autosomes <- GRanges(Rle(autoSeq$seqlevels),
IRanges(1, seqlengths(autoSeq$seqinfo)),
strand = "*",
seqinfo = autoSeq$seqinfo)
countOverlaps()Now we have our two GRanges objects
snpPerChrom <- countOverlaps(autosomes, snpGR, type="any") names(snpPerChrom) <- seqlevels(autosomes)
?countOverlapsNow the fun stuff:
Genomic RangesLet's look by gene first
Define a gene as the range between the start & end points of all transcripts
txGRL <- transcriptsBy(txdb, "gene") gnGRL <- range(txGRL)
This time we have GRangesList objects as both objects
GenomicRangesNow we can find the overlaps
gnOverlaps <- countOverlaps(gnGRL, snpGR) gnOverlaps <- gnOverlaps[gnOverlaps > 0]
Genomic RangesNow we can find the overlaps looking the other way
snpGnOverlaps <- countOverlaps(snpGR, gnGRL) snpGnOverlaps <- snpGnOverlaps[snpGnOverlaps > 0]
(Perhaps I should've simulated my data better…)
Genomic RangesWe can find the actual mappings from SNP to gene
mapSnpGns <- findOverlaps(snpGR,gnGRL)
Genomic RangesThere's an even shorter way to count SNP hits for exons
exBySNP <- exonsByOverlaps(txdb, snpGR)
And CDS
cdsBySNP <- cdsByOverlaps(txdb, snpGR)
Genomic RangesSNPs are pretty easy