304: TxDb Objects

22 July 2016

AnnotationData

Four common classes of annotation

Object type	Contents
OrgDb	gene-based information
BSgenome	genome sequence
TxDb	transcriptome ranges
OrganismDb	composite information

`TxDb` Objects

These are the objects with the transcriptome information

Saved using GRanges classes
Derived heavily from the GenomicRanges & IRanges packages
The key idea is to refer to the genome using ranges to define locations

Workspace Setup

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

txdb

This will load all the package dependencies as well

`GRanges` objects

Let's look at a GRanges object

Note that our txdb object uses Entrez Gene IDs as its gene identifiers

ids <- c(BRCA1="672", PTEN="5728")
genes(txdb, filter=list(gene_id=ids))

`Rle` vectors

Run Length Encoding (RLE) format vectors

A more memory-efficient way to store positional information
Highly efficient for long regions of "no information"
Also efficient for data with long stretches of repeated values

`Rle` vectors

rle(): the base R version

Rle(): the S4Vectors version

Rle is used extensively in GenomicRanges

x <- c(1, 0, 0, 0, 1, 1, 2, 0, 0)
Rle(x)
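As a minimal sketch of what the encoding stores, the runs in x can be inspected directly. This assumes the S4Vectors package is installed; the object names r and b are only illustrative.

```r
library(S4Vectors)

x <- c(1, 0, 0, 0, 1, 1, 2, 0, 0)

# S4Vectors version: an Rle object that stores each run once
r <- Rle(x)
runValue(r)    # the value of each run
runLength(r)   # how many times each value is repeated

# Base R version: a plain list with the same two components
b <- rle(x)
b$values
b$lengths

# Decoding the Rle gives back the original vector
identical(as.vector(r), x)
```

Nine values are stored as five runs here; the saving grows with the length of each run.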

Creating a `GRanges` object

gr <- GRanges(seqnames=Rle(c("chr1", "chrMT"), c(2, 4)),
              ranges=IRanges(15:20, 20),
              strand=rep(c("+", "-", "*"), 2))

Print the object by typing gr

The essential components are:

seqnames & ranges
If strand is omitted, the value * is added
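The default strand can be checked with a small sketch. The object gr_nostrand and its coordinates are made up for illustration, and the GenomicRanges package is assumed to be installed.

```r
library(GenomicRanges)

# Only seqnames and ranges are supplied; strand is left out
gr_nostrand <- GRanges(seqnames = c("chr1", "chr1"),
                       ranges = IRanges(start = c(10, 50), end = c(20, 60)))

# Every range is given the unspecified strand "*"
as.character(strand(gr_nostrand))
```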

Working with a `GRanges` object

Try these commands:

seqnames(gr)
strand(gr)
ranges(gr)
seqinfo(gr)
length(gr)
gr[1]
width(gr)
start(gr)

seqinfo() returns an object with a formal class
Seqinfo objects contain metadata about each sequence

Adding more information

We can assign names to the ranges:

names(gr) <- paste0("Rng", LETTERS[1:length(gr)])

Could be exons, genes, SNPs, CDS or any other feature

Now look at the object again

Adding more information

We can also add some key information about the sequences

seqlengths(gr) <- c(5e6, 1.5e5)
isCircular(gr) <- c(FALSE, TRUE)
genome(gr) <- c("madeUp.v1")
seqinfo(gr)
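The replacement values above are matched to the sequences by position. One possible alternative, sketched here on a separate illustrative object gr2, is to name each value so that the pairing with a sequence is explicit; the names must match the sequence names in order. The values are the same made-up ones used above.

```r
library(GenomicRanges)

# A copy of the example ranges, without strand
gr2 <- GRanges(seqnames = Rle(c("chr1", "chrMT"), c(2, 4)),
               ranges = IRanges(15:20, 20))

# Named values make it explicit which sequence each value belongs to
seqlengths(gr2) <- c(chr1 = 5e6, chrMT = 1.5e5)
isCircular(gr2) <- c(chr1 = FALSE, chrMT = TRUE)
genome(gr2) <- "madeUp.v1"

# The accessors return vectors named by sequence
seqlengths(gr2)
isCircular(gr2)
seqinfo(gr2)
```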

Adding more information

GRanges objects also have columns for metadata

Let's add:

Some \(p\)-values from a hypothesis test
Alternative names for the Chromosomes

mcols(gr) <- data.frame(score = 10^(-rexp(6)),
                        altChr = rep(c("G001", "G002"), 
                                     times=c(2, 4)))

Subsetting `GRanges` objects

Try these commands:

gr[1:3]
gr[1:2, 1]
subset(gr, score < 0.05)
subset(gr, width==1)
subset(gr, start > 18)
subset(gr, start > 18 | width == 5)
table(gr$altChr)
summary(mcols(gr)[,"score"])
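As a rough sketch of how subset() relates to ordinary bracket indexing, the example below uses fixed scores rather than random ones so the result is predictable. The object gr3 and its scores are invented for illustration.

```r
library(GenomicRanges)

# Six ranges on one chromosome with a fixed metadata column
gr3 <- GRanges("chr1", IRanges(15:20, 20),
               score = c(0.5, 0.01, 0.2, 0.03, 0.9, 0.04))

# subset() evaluates the condition using the metadata columns
a <- subset(gr3, score < 0.05)

# The same selection written with a logical index
b <- gr3[gr3$score < 0.05]

start(a)
start(b)

# Range properties can be used in the same way
gr3[width(gr3) == 1]
```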

`GRangesList`

GRanges objects can also be grouped into GRangesList objects, e.g. the exons belonging to each gene

exByGn <- exonsBy(txdb, "gene")
length(exByGn)

exByGn

`GRangesList`

As well as exonsBy(), other methods include

transcriptsBy(), cdsBy(), threeUTRsByTranscript() + more

In the current example exons are grouped by gene, but exonsBy() can also group them by transcript ("tx"), and transcriptsBy() can group by "gene", "exon" or "cds"

`GRangesList`

These behave like normal list objects in R

Try these commands

exByGn[[1]]
exByGn$`1`
exByGn[1:2]
sapply(exByGn[1:10], 
       function(x){length(subset(x, width<100))}) 
unlist(exByGn[1:5])

Ask if you're unsure about what any of the above commands do
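To see the list behaviour without loading the full annotation, a small GRangesList can be built by hand. This is only a sketch: the names geneA and geneB and all coordinates are hypothetical.

```r
library(GenomicRanges)

# Two made-up genes, with two exons and one exon respectively
grl <- GRangesList(
  geneA = GRanges("chr1", IRanges(start = c(1, 200), width = c(50, 150))),
  geneB = GRanges("chr2", IRanges(start = 10, width = 80))
)

# Number of ranges held in each list element
elementNROWS(grl)

# Apply a function to each element, as with a normal list
sapply(grl, function(x) sum(width(x) < 100))

# Flatten back to a single GRanges object
unlist(grl)
```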

The `TxDb` Object as a Database

As well as providing GRanges, these objects share database-style methods with OrgDb objects

keytypes(txdb)
columns(txdb)
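Once the key types and columns are known, select() can be used as with an OrgDb object. The sketch below reuses the two Entrez Gene IDs from earlier; the choice of columns is only an example, and the TxDb.Hsapiens.UCSC.hg19.knownGene package is assumed to be installed.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Look up transcript names and chromosomes for two genes
res <- AnnotationDbi::select(txdb,
                             keys = c("672", "5728"),
                             columns = c("TXNAME", "TXCHROM"),
                             keytype = "GENEID")
head(res)
```

Calling select() through the AnnotationDbi:: prefix avoids clashes with other packages that define a function of the same name.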

`GenomicFeatures`

Loading a TxDb object will also load dependencies such as GenomicFeatures

Contains many useful functions

makeTxDbFromBiomart(), makeTxDbFromGFF(), makeTxDbFromGRanges(), makeTxDbFromUCSC()

`GenomicFeatures`

TxDb objects are not currently accessible from AnnotationHub

Other source files are, e.g. GTF files

This makes it possible to use these methods for non-model organisms, using our own annotations, genomes etc

`GenomicFeatures`

Also contains other useful methods

promoters(txdb, upstream=100, downstream=50,
          columns = c("tx_name", "gene_id"))

library(mirbase.db)
microRNAs(txdb)[1:3]
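promoters() also works on a plain GRanges object and is strand-aware, which a small sketch can make concrete. The two ranges below are made up; only the GenomicRanges package is assumed.

```r
library(GenomicRanges)

# The same made-up feature placed on each strand
g_plus  <- GRanges("chr1", IRanges(1000, 2000), strand = "+")
g_minus <- GRanges("chr1", IRanges(1000, 2000), strand = "-")

# On the + strand the promoter is measured from the start (1000)
p_plus <- promoters(g_plus, upstream = 100, downstream = 50)

# On the - strand it is measured from the end (2000)
p_minus <- promoters(g_minus, upstream = 100, downstream = 50)

c(start(p_plus), end(p_plus))
c(start(p_minus), end(p_minus))
```

Both results are 150 bases wide, covering the upstream and downstream distances combined.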
