Four common classes of annotation
| Object type | contents |
|---|---|
| OrgDb | gene based information |
| BSgenome | genome sequence |
| TxDb | transcriptome ranges |
| OrganismDb | composite information |
22 July 2016
Four common classes of annotation
| Object type | contents |
|---|---|
| OrgDb | gene based information |
| BSgenome | genome sequence |
| TxDb | transcriptome ranges |
| OrganismDb | composite information |
TxDb ObjectsThese are the objects with the transcriptome information
GRanges classesGenomicRanges & IRanges packageslibrary(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
txdb
This will load all the package dependencies as well
GRanges objectsLet's look at a GRanges object
Note that our txdb object used EntrezGene Ids
ids <- c(BRCA1="672", PTEN="5728") genes(txdb, filter=list(gene_id=ids))
Rle vectorsRun Length Encoding format vectors
Rle vectorsrle : Part of the base R package
Rle : S4Vectors version
Rle is used extensively in GenomicRanges
x <- c(1, 0, 0, 0, 1, 1, 2, 0, 0) Rle(x)
GRanges objectgr <- GRanges(seqnames=Rle(c("chr1", "chrMT"), c(2, 4)),
ranges=IRanges(15:20, 20),
strand=rep(c("+", "-", "*"), 2))
Print the object by typing gr
The essential components are:
seqnames & rangesstrand is omitted, the value * is addedGRanges objectseqnames(gr) strand(gr) ranges(gr) seqinfo(gr) length(gr) gr[1] width(gr) start(gr)
seqinfo() returns an object with a formal classSeqinfo objects contain metadata about each sequencenames(gr) <- paste0("Rng", LETTERS[1:length(gr)])
We can assign names to the ranges:
Now look at the object again
We can also add some key information about the sequences
seqlengths(gr) <- c(5e6, 1.5e5)
isCircular(gr) <- c(FALSE, TRUE)
genome(gr) <- c("madeUp.v1")
seqinfo(gr)
GRanges objects also have columns for metadata
Let's add:
mcols(gr) <- data.frame(score = 10^(-rexp(6)),
altChr = rep(c("G001", "G002"),
times=c(2, 4)))
GRanges objectsgr[1:3] gr[1:2, 1] subset(gr, score < 0.05) subset(gr, width==1) subset(gr, start > 18) subset(gr, start > 18 | width ==5) table(gr$altChr) summary(mcols(gr)[,"score"])
GRangesListGRanges objects can also be extended to GRangesList objects
exByGn <- exonsBy(txdb, "gene") length(exByGn)
exByGn
GRangesListAs well as the exonsBy() methods, other methods include
transcriptsBy(), cdsBy(), threeUTRsByTranscript() + moreIn the current example exons are listed by gene, but can also be listed by exon, cds or tx
GRangesListThese behave like normal list objects in R
exByGn[[1]]
exByGn$`1`
exByGn[1:2]
sapply(exByGn[1:10],
function(x){length(subset(x, width<100))})
unlist(exByGn[1:5])
Ask if you're unsure about what any of the above commands do
TxDb Object as a DatabaseAs well as extracting GRanges from these objects, they share methods with OrgDb objects
keytypes(txdb) columns(txdb)
GenomicFeaturesLoading a TxDb object will also load dependencies such as GenomicFeatures
Contains many useful functions
makeTxDbFromBiomart(), makeTxDbFromGFF(), makeTxDbFromGRanges(), makeTxDbFromUCSC()GenomicFeaturesTxDb objects not currently accesible from AnnotationHub
Gives the possibility to use methods for non-model organisms using our own annotations, genomes etc
GenomicFeaturesAlso contains other useful methods
promoters(txdb, upstream=100, downstream=50,
columns = c("tx_name", "gene_id"))
library(mirbase.db) microRNAs(txdb)[1:3]