BIOINF3005/7160: Transcriptomics Applications

18th March 2020

`R` Objects

Before we start

Create a new R Project practical_3 (~/transcriptomics)
Create a new R Markdown:
- File > New File > R Markdown
- Save as DataTypes.Rmd

library(tidyverse)

Recap Of Weeks 1 & 2

We learned how to:

Write reports with rmarkdown
Import tabular data
Look through & summarise a tibble/data.frame
Use the magrittr pipe (%>%)
Generate plots with ggplot2

`R` Objects

Main data type so far has been a data.frame
- Very much like a spreadsheet
- tibble = a data.frame with pretty wrapping paper
Within a column, every value has the same type, e.g. numeric, character etc.
Different columns can hold different types
In R each column is a vector

Vectors

The key building blocks for R objects: Vectors

There is no such thing as a scalar in R
Everything is based around the concept of a vector

What is a vector?

Vectors

Definition

A vector is one or more values of the same type

Vectors

Examples

A simple vector would be

##  [1]  1  2  3  4  5  6  7  8  9 10

What type of values are in this vector?

Vectors

Examples

Another vector might be

## [1] "a"     "cat"   "video"

What type of values are in this vector?

Vectors

Examples

What type of values are in this vector?

## [1] "742"       "Evergreen" "Tce"

The 4 Atomic Vector Types

Atomic Vectors are the building blocks for everything in R
There are four main types
Plus two we can ignore

(Please start running these examples in your own R Markdown)

The 4 Atomic Vector Types

Logical Vectors

logical: Can only hold the values TRUE or FALSE

logi_vec <- c(TRUE, TRUE, FALSE)
logi_vec

## [1]  TRUE  TRUE FALSE

The 4 Atomic Vector Types

Integer Vectors

logical
integer: Counts, ranks or indexing positions

int_vec <- 1:5
int_vec

## [1] 1 2 3 4 5

The 4 Atomic Vector Types

Double (i.e. Double Precision) Vectors

logical
integer
double: Often (& lazily) referred to as numeric

dbl_vec <- c(0.618, 1.414, 2)
dbl_vec

Why are these called doubles?
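
One way to explore this question: a double is a double-precision (64-bit) floating point number. The lines below are a minimal sketch of how doubles differ from integers and why their precision is finite.

```r
# Numbers typed without a suffix are stored as doubles
typeof(2)       # "double"
typeof(2L)      # "integer" (the L suffix requests an integer)

# Both integers and doubles count as "numeric"
is.numeric(2)
is.numeric(2L)

# Doubles have finite precision, so tiny rounding errors can appear
exact_match <- sqrt(2)^2 == 2                     # FALSE
close_match <- isTRUE(all.equal(sqrt(2)^2, 2))    # TRUE, compares with a tolerance
exact_match
close_match
```

When comparing doubles, all.equal() is usually a safer choice than ==.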

The 4 Atomic Vector Types

Character Vectors

logical
integer
double
character

char_vec <- c("blue", "red", "green")
char_vec

The 4 Atomic Vector Types

These are the basic building blocks for all R objects

logical
integer
double
character

The 4 Atomic Vector Types

There are two more rare types we’ll ignore:
- complex & raw
All R data structures are built on these

Properties of a vector

What properties might a vector have?

The actual values
Length, accessed by the function length()
The type, accessed by the function typeof()
- Similar but preferable to class()

Any optional & additional attributes: attributes()
- Holds data such as names etc.
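
A short sketch of inspecting these properties. The vector expr_vec and its gene names are made up for illustration.

```r
# A small named vector (values and names are invented)
expr_vec <- c(geneA = 5.2, geneB = 0.4, geneC = 12.1)

length(expr_vec)       # number of elements
typeof(expr_vec)       # how the values are stored
class(expr_vec)        # what kind of object R treats it as
attributes(expr_vec)   # a list of attributes, here just the names
names(expr_vec)        # shortcut to the names attribute
```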

Working with Vectors

We can combine two vectors in R, using the function c()

c(1, 2)

## [1] 1 2

The numbers 1 & 2 were both vectors with length() == 1
We have combined two vectors of length 1, to make a vector of length 2

Working with Vectors

What would happen if we combined two vectors of different types?

Let’s try & see what happens:

new_vec <- c(logi_vec, int_vec)
print(new_vec)
typeof(new_vec)

Working with Vectors

Q: What happened to the logical values?

Answer: R coerced them into a common type (i.e. integers).
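
Coercion follows a fixed order: logical, then integer, then double, then character. The sketch below uses made-up vectors to show each step.

```r
# Combining two types always gives the more general of the two
lgl_int <- c(TRUE, 2L)     # logical + integer
int_dbl <- c(2L, 0.5)      # integer + double
dbl_chr <- c(0.5, "a")     # double + character

typeof(lgl_int)   # "integer"
typeof(int_dbl)   # "double"
typeof(dbl_chr)   # "character"

# A handy consequence: TRUE counts as 1, so sum() counts the TRUE values
n_true <- sum(c(TRUE, TRUE, FALSE))
n_true
```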

Coercion

Discussion Questions

What other types could logical vectors be coerced into?

Try using the functions: as.integer(), as.double() & as.character() on logi_vec

Coercion

Can character vectors be coerced into numeric vectors?

simp_vec <- c("742", "Evergreen", "Terrace")
as.numeric(simp_vec)

## [1] 742  NA  NA

Warning message:
NAs introduced by coercion
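
If you need to know which values failed to convert, one possible approach is to compare the missing values before and after coercion. This is a sketch only; the variable names are made up.

```r
address <- c("742", "Evergreen", "Terrace")

# Silence the warning because we check for failures ourselves
nums <- suppressWarnings(as.numeric(address))

# A failure is a value that is NA after coercion but was not NA before
failed <- is.na(nums) & !is.na(address)
address[failed]
```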

Subsetting Vectors

One or more elements of a vector can be called using []

char_vec
char_vec[2]
char_vec[2:3]

Subsetting Vectors

Double brackets ([[]]) can be used to return single elements only

char_vec[[2]]

## [1] "red"

If you tried char_vec[[2:3]] you would receive an error message

Error in char_vec[[2:3]] : 
  attempt to select more than one element in vectorIndex

Subsetting Vectors

If a vector has name attributes, we can call values by name.

Here we’ll use the built-in vector euro

head(euro)

##       ATS       BEF       DEM       ESP       FIM       FRF 
##  13.76030  40.33990   1.95583 166.38600   5.94573   6.55957

euro["ESP"]

##     ESP 
## 166.386

Subsetting Vectors

Try repeating the call-by-name approach using double brackets

euro["ESP"]
euro[["ESP"]]

What was the difference in the output?

Using [] returned the vector with the identical structure
Using [[]] removed the attributes & just gave the R Object at that position (i.e. a numeric vector of length 1)
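
The same behaviour can be checked on any named vector. Here is a minimal sketch with a made-up vector of rates:

```r
# Invented values, for illustration only
rates <- c(AUD = 1.00, USD = 0.65, EUR = 0.60)

kept <- rates["USD"]     # single brackets keep the names attribute
bare <- rates[["USD"]]   # double brackets return just the value

names(kept)   # "USD"
names(bare)   # NULL
```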

Subsetting Vectors

Discussion Question

Is it better to call by position, or by name?

Things to consider:

Which is easier to type on the fly?
Which is easier to read?
Which is more robust to undocumented changes in an object?
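
To make the last point concrete, here is a small sketch with a made-up vector. Reordering the vector changes what position 2 returns, but not what the name returns.

```r
# Invented values, for illustration only
rates <- c(AUD = 1.00, USD = 0.65, EUR = 0.60)

# Someone sorts the vector alphabetically by name
reordered <- rates[order(names(rates))]

rates[2]           # USD
reordered[2]       # now EUR: the position means something different

rates["USD"]       # USD
reordered["USD"]   # still USD
```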

Subsetting Vectors

Extracting Multiple Values

What is really happening in this line?

euro[1:3]

##      ATS      BEF      DEM 
## 13.76030 40.33990  1.95583

We are using the integer vector 1:3 to extract values from the euro vector

Subsetting Vectors

Extracting Multiple Values

int_vec

## [1] 1 2 3 4 5

euro[int_vec]

##       ATS       BEF       DEM       ESP       FIM 
##  13.76030  40.33990   1.95583 166.38600   5.94573

Vector Operations

We can also combine a logical test with subsetting

dbl_vec

## [1] 0.618 1.414 2.000

dbl_vec > 1

## [1] FALSE  TRUE  TRUE

dbl_vec[dbl_vec > 1]
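
The same pattern works for any test that returns a logical vector. A sketch with made-up counts:

```r
# Invented read counts, for illustration only
counts <- c(0, 15, 3, 120, 0, 48)

# The test gives one TRUE/FALSE per element
is_expressed <- counts > 10

counts[is_expressed]    # keep only the values where the test is TRUE
which(is_expressed)     # the positions of those values
sum(is_expressed)       # how many values passed

# Tests can be combined with & before subsetting
mid_range <- counts[counts > 10 & counts < 100]
mid_range
```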

Vector Operations

An additional logical test: %in% (read as: “is in”)

dbl_vec

## [1] 0.618 1.414 2.000

int_vec

## [1] 1 2 3 4 5

dbl_vec %in% int_vec

Vector Operations

dbl_vec %in% int_vec

## [1] FALSE FALSE  TRUE

Returns TRUE/FALSE for each value in dbl_vec if it is in int_vec

NB: int_vec was coerced silently to a double vector
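
A common use of %in% is to keep only the values that appear in another set. The gene names below are hypothetical.

```r
# Hypothetical gene identifiers
all_genes <- c("geneA", "geneB", "geneC", "geneD")
genes_of_interest <- c("geneD", "geneB", "geneX")

# One TRUE/FALSE per element of the vector on the left
keep <- all_genes %in% genes_of_interest
all_genes[keep]

# Swapping the sides asks a different question
found <- genes_of_interest %in% all_genes
found
```

Note that the result always has the same length as the vector on the left-hand side.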

Matrices

Vectors are strictly one dimensional and have a length attribute.
A matrix is the two dimensional equivalent

int_mat <- matrix(1:6, ncol = 2)
print(int_mat)

Matrices

Matrices can only hold one type of value
- i.e. logical, integer, double, character
Have additional attributes such as dim(), nrow() & ncol()
Can have optional rownames() & colnames()

Matrices

Some commands to try:

dim(int_mat)
nrow(int_mat)
typeof(int_mat)
class(int_mat)
attributes(int_mat)
colnames(int_mat)
length(int_mat)

Please ask questions if anything is confusing

Matrices

Use square brackets to extract values by row & column
The form is x[row, col]
Leaving either row or col blank selects the entire row/column

int_mat[2, 2]
int_mat[1,]

How would we just get the first column?

Matrices

NB: Forgetting the comma will treat the matrix as a single vector running down the columns

int_mat

##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

int_mat[5]

## [1] 5

length(int_mat)

## [1] 6
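
A brief sketch of how the single index maps onto rows and columns. The objects m and by_row are made up for this example.

```r
m <- matrix(1:6, ncol = 2)

# Dropping the dimensions shows the underlying vector, filled column by column
as.vector(m)

# Position 5 of that vector sits at row 2, column 2
pos <- arrayInd(5, dim(m))
pos
m[5] == m[2, 2]

# Filling by row changes which value sits at each position
by_row <- matrix(1:6, ncol = 2, byrow = TRUE)
by_row[5]
```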

Matrices

Requesting a row or column that doesn’t exist is the source of a very common error message

dim(int_mat)

## [1] 3 2

int_mat[5,]

Error in int_mat[5, ] : subscript out of bounds

Matrices

If row/colnames are assigned:

Can also extract values using these instead of by position
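
For example (the row and column names here are made up):

```r
m <- matrix(1:6, ncol = 2)
rownames(m) <- c("r1", "r2", "r3")
colnames(m) <- c("x", "y")

m["r2", "y"]          # one value, by row and column name
first_col <- m[, "x"] # a whole column, returned as a named vector
first_col
m[c("r1", "r3"), ]    # several rows at once
```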

Arrays

Arrays extend matrices to 3 or more dimensions

Beyond the scope of this course, but we just have more commas in the square brackets, e.g.

dim(iris3)

## [1] 50  4  3

dimnames(iris3)
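
A small sketch of subsetting with the extra commas, using a made-up array rather than iris3:

```r
# A 2 x 3 x 4 array holding the numbers 1 to 24
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)

# One index per dimension: row, column, then "layer"
arr[2, 3, 4]

# Leaving positions blank keeps everything in that dimension
slice <- arr[, , 1]
dim(slice)
```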

Summary

Homogeneous Data Types

Vectors, Matrices & Arrays are the basic homogeneous data types of R
All are essentially just vectors

Heterogeneous Data Types

Summary of main data types in R

Dimension	Homogeneous	Heterogeneous
1d	`vector`	`list`
2d	`matrix`	`data.frame`
3d+	`array`

Lists

A list is a heterogeneous vector.

Each element is an R object
Can be a vector, or matrix
Could be another list
Any other R object type we haven’t seen yet

These are incredibly common in R
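
A minimal sketch of building a list by hand. All names and values are made up for illustration.

```r
# Each element can be a different kind of R object
sample_info <- list(
  id        = "sample_1",               # character vector
  counts    = c(10L, 0L, 250L),         # integer vector
  passed_qc = TRUE,                     # logical vector
  design    = matrix(1:4, ncol = 2),    # matrix
  nested    = list(a = 1, b = "two")    # another list
)

typeof(sample_info)
length(sample_info)

# The type of each element
sapply(sample_info, typeof)
```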

Lists

Many R functions provide output as a list

testResults <- t.test(dbl_vec)
class(testResults)
typeof(testResults)
testResults

NB: There is a function (print.htest()) that tells R how to print the results to the Console

Lists

Explore the various attributes of the object testResults

attributes(testResults)

Compare this with the results from:

attributes(euro)

Lists

length(testResults)
names(testResults)

Lists

We can call the individual components of a list using the $ symbol followed by the name

testResults$statistic
testResults$conf.int
testResults$method

Note that each component is quite different to the others.

Subsetting Lists

A list is a vector so we can also subset using the [] method

testResults[1]
typeof(testResults[1])

Using single square brackets returns a list
- i.e. an object which is a subset of the larger object, but of the same type

Subsetting Lists

Double brackets again retrieve a single element of the vector

Returns the actual element as the underlying R object

testResults[[1]]
typeof(testResults[[1]])

Subsetting Lists

We can also use names instead of positions

testResults[c("statistic", "p.value")]
testResults[["statistic"]]

testResults[["statistic"]] is identical to testResults$statistic
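
The three approaches side by side, as a sketch on a small made-up list:

```r
# Invented values standing in for the output of a test
res <- list(statistic = 2.5, p.value = 0.03, method = "made-up test")

sub_list <- res["p.value"]     # single brackets: a list of length 1
element  <- res[["p.value"]]   # double brackets: the element itself

typeof(sub_list)
typeof(element)
identical(res[["p.value"]], res$p.value)

# [[ ]] also accepts a name stored in a variable, which $ does not
wanted <- "method"
res[[wanted]]
```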

Lists

Note also the Environment Tab in the top right of RStudio
Click the arrow next to testResults to expand the entry
This is the output of str(testResults)

Data Frames

Finally!

These are the most common type of data you will work with
Each column is a vector
Columns can be different types of vectors
Column vectors MUST be the same length

Data Frames

Analogous to matrices, but are specifically for heterogeneous data
Have many of the same attributes as matrices
- dim(), nrow(), ncol(), rownames(), colnames()
colnames() & rownames() are NOT optional & are assigned by default

Data Frames

Let’s use band_members again

Try these commands

colnames(band_members)
rownames(band_members)
dim(band_members)
nrow(band_members)

Data Frames

Individual entries can also be extracted using the square brackets

band_members[1:2, 1]

We can also refer to columns by name (same as matrices)

band_members[1:2, "name"]

Data Frames

Thinking of columns being vectors is quite useful

We can call each column vector of a data.frame using the $ operator

band_members$name[1:2]

There is no equivalent for rows!!!

Data Frames & Matrices

Many matrix objects look exactly like a data.frame
- If both rownames and colnames are set
tibble objects are still clearly distinct
- There is no tibble equivalent for matrices
The easiest way to check is class(object)
If you try dplyr functions on a matrix, you will get errors
- Just use as.data.frame() to coerce a matrix to a data.frame
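
A short sketch of checking and converting, using a made-up matrix with hypothetical gene and sample names:

```r
# Invented expression values
m <- matrix(
  c(5.1, 0.2, 3.3, 7.8), ncol = 2,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2"))
)
class(m)
is.matrix(m)

# Coerce to a data.frame; row and column names are carried across
df <- as.data.frame(m)
class(df)
df$s1
rownames(df)
```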

Data Frames & Lists

Data frames are actually special cases of lists

Each column of a data.frame is an element of a list
The elements must all be vectors of the same length
Data frames can be treated identically to a list
Have additional subsetting operations and attributes
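
These points can be checked directly. The data frame below is made up for illustration.

```r
df <- data.frame(
  name  = c("a", "b", "c"),
  score = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE   # keep name as character in older versions of R
)

is.list(df)               # a data.frame is a list underneath
typeof(df)
length(df) == ncol(df)    # one list element per column

# List-style subsetting returns the column vector
df[["score"]]

# The type of each column
sapply(df, typeof)
```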

Data Frames & Lists

Forgetting the comma now gives a completely different result to a matrix!

band_members[1]

Was that what you expected?

Try using the double bracket method

Data Frames & Lists

More Errors

What do you think will happen if we type:

band_members[5]

Error: Positive column indexes in [ must match number of columns:
* .data has 2 columns
* Position 1 equals 5

Working With `R` Objects

Vectors

Name Attributes

How do we assign names?

named_vec <- c(a = 1, b = 2, c = 3)

OR we can name an existing vector

names(int_vec) <- c("a", "b", "c", "d", "e")

Vectors

Name Attributes

Can we remove names?

The NULL, or empty, vector in R is created using c()

null_vec <- c()
length(null_vec)

Vectors

Name Attributes

We can use this to remove names

names(int_vec) <- c()

Lists

Lists can have names, but not row/colnames

my_list <- list(int_vec, dbl_vec)
names(my_list) <- c("integers", "doubles")

my_list <- list(integers = int_vec, doubles = dbl_vec)

Lists

What happens if we try this?

my_list$logical <- logi_vec

Data Frames

This is exactly the same as creating lists, but

The names attribute will also be the colnames()

my_df <- data.frame(doubles = dbl_vec, logical = logi_vec)
names(my_df) == colnames(my_df)

## [1] TRUE TRUE

Data Frames

What happens if we try to add components that aren’t the same length?

my_df <- data.frame(
  integers = int_vec, doubles = dbl_vec, logical = logi_vec
  )

Error in data.frame(integers = int_vec, doubles = dbl_vec, logical = logi_vec) : arguments imply differing number of rows: 5, 3
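
One caveat worth knowing: data.frame() will recycle a shorter vector when the longer length is an exact multiple of the shorter one. The sketch below uses made-up vectors and captures the error message instead of stopping.

```r
# Lengths 5 and 3: not a multiple, so this fails
result <- tryCatch(
  data.frame(a = 1:5, b = c(0.1, 0.2, 0.3)),
  error = function(e) conditionMessage(e)
)
result

# Lengths 4 and 2: the shorter vector is silently repeated
recycled <- data.frame(a = 1:4, b = c("x", "y"))
recycled
```

Silent recycling can hide mistakes, so it is worth checking lengths before building a data frame.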

Summary

These are all of the basic R objects you will come across
Every R object is based on these!
All are S3 objects
In transcriptomics S4 objects are very common
- The same basic structures still underlie S4 objects
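
As a rough sketch of that last point, the toy S4 class below (ToyExperiment is a hypothetical name, not a class from any package) simply holds a matrix and a data.frame in named slots.

```r
library(methods)

# A made-up S4 class with two slots
setClass(
  "ToyExperiment",
  representation(counts = "matrix", samples = "data.frame")
)

toy <- new(
  "ToyExperiment",
  counts  = matrix(1:4, ncol = 2),
  samples = data.frame(id = c("s1", "s2"))
)

isS4(toy)

# Slots are accessed with @, and contain the familiar basic objects
toy@counts
is.matrix(toy@counts)
is.data.frame(toy@samples)
```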

Logical Tests

Is Equal To: ==
Not equal: !=
And: &
Or: |
Less than: <; Greater than: >
Less than or equal to: <=; Greater than or equal to: >=

Logical Tests

NA represents a missing value

x <- c(1:5, NA)
x == 5
x != 5
x > 3
x > 3 | x == 2
is.na(x)

Logical Tests

With logical vectors, ! will invert all values

logi_vec
!logi_vec
!is.na(x)
!x == 5

Note the precedence of operations in the last one!
Is there a more transparent way of writing this?
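
One way to check the precedence for yourself: in R, == is evaluated before !. The object names below are made up for the comparison.

```r
x <- c(1:5, NA)

implicit <- !x == 5      # parsed as !(x == 5)
explicit <- !(x == 5)    # the same thing, with the order made visible
not_equal <- x != 5      # arguably the clearest way to write it

identical(implicit, explicit)
identical(implicit, not_equal)

# Forcing the other order gives a different answer
negate_first <- (!x) == 5
negate_first
```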

Logical Tests

A few more challenging tests which may give unexpected results

is.integer(x)
x == int_vec
x[!is.na(x)] == int_vec
x[5:1] == int_vec

Did you understand all of these results?
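
Some of these results come from recycling: when two vectors differ in length, R repeats the shorter one. A minimal sketch with made-up vectors:

```r
long_vec  <- c(1, 2, 3, 4)
short_vec <- c(1, 2)

# short_vec is silently repeated as 1, 2, 1, 2
recycled_test <- long_vec == short_vec
recycled_test

# If the lengths are not multiples, R still recycles but gives a warning
msg <- tryCatch(
  c(1, 2, 3) == c(1, 2),
  warning = function(w) conditionMessage(w)
)
msg
```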

Logical Tests

One final and important test in R

%in% can be read as “is in”

"red" %in% char_vec

## [1] TRUE

char_vec %in% "red"

## [1] FALSE  TRUE FALSE
