18th March 2020
R Objects
Setup:
- Use the project practical_3 (~/transcriptomics)
- Create a new file: File > New File > R Markdown
- Save it as DataTypes.Rmd
- Load the packages: library(tidyverse)
We learned how to:
- use rmarkdown
- work with a tibble/data.frame
- use the pipe (%>%)
- use ggplot2

R Objects
- data.frame
- tibble = a data.frame with pretty wrapping paper
- Column types: numeric, character etc.
- In R, each column is a vector

The key building blocks for R objects: Vectors

What is a vector?
A vector is one or more values of the same type
A simple vector would be
## [1] 1 2 3 4 5 6 7 8 9 10
What type of values are in this vector?
Another vector might be
## [1] "a" "cat" "video"
What type of values are in this vector?
What type of values are in this vector?
## [1] "742" "Evergreen" "Tce"
(Please start running these examples in your own R Markdown)
Logical vectors: TRUE or FALSE
logi_vec <- c(TRUE, TRUE, FALSE)
logi_vec
## [1] TRUE TRUE FALSE
int_vec <- 1:5
int_vec
## [1] 1 2 3 4 5
Double vectors (numeric):
dbl_vec <- c(0.618, 1.414, 2)
dbl_vec
Why are these called doubles?
char_vec <- c("blue", "red", "green")
char_vec
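The four vectors created above are logical, integer, double and character. A minimal sketch for checking this, assuming the same four vectors; the name types is only for this example:

```r
logi_vec <- c(TRUE, TRUE, FALSE)
int_vec  <- 1:5
dbl_vec  <- c(0.618, 1.414, 2)
char_vec <- c("blue", "red", "green")

# typeof() reports how the values in each vector are stored
types <- c(
  logi_vec = typeof(logi_vec),
  int_vec  = typeof(int_vec),
  dbl_vec  = typeof(dbl_vec),
  char_vec = typeof(char_vec)
)
types

# A bare number is stored as a double; the L suffix requests an integer
typeof(2)
typeof(2L)
```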
These are the basic building blocks for all R objects
- The other atomic types are complex & raw
- R data structures are built on these

What properties might a vector have?
- length()
- typeof()
- class()
- attributes(): names etc.

We can combine two vectors in R, using the function c()
c(1, 2)
## [1] 1 2
The numbers 1 & 2 were both vectors with length() == 1
We have combined two vectors of length 1, to make a vector of length 2
What would happen if we combined two vectors of different types?
Let’s try & see what happens:
new_vec <- c(logi_vec, int_vec)
print(new_vec)
typeof(new_vec)
Q: What happened to the logical values?
Answer: R coerced them into a common type (i.e. integers).
What other types could logical vectors be coerced into?
Try using the functions: as.integer(), as.double() & as.character() on logi_vec
Can character vectors be coerced into numeric vectors?
simp_vec <- c("742", "Evergreen", "Terrace")
as.numeric(simp_vec)
## [1] 742 NA NA
Warning message: NAs introduced by coercion
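The coercion steps above can be sketched in one place. This is an illustrative example only; num_vec is a name introduced here, and suppressWarnings() is used just to keep the output quiet:

```r
logi_vec <- c(TRUE, TRUE, FALSE)

# Explicit coercion of a logical vector
as.integer(logi_vec)     # 1 1 0
as.double(logi_vec)      # 1 1 0
as.character(logi_vec)   # "TRUE" "TRUE" "FALSE"

# Implicit coercion when combining: the more general type wins
typeof(c(logi_vec, 1L))  # "integer"
typeof(c(1L, 0.5))       # "double"
typeof(c(0.5, "a"))      # "character"

# Character to numeric only works for values that look like numbers
simp_vec <- c("742", "Evergreen", "Terrace")
num_vec <- suppressWarnings(as.numeric(simp_vec))
is.na(num_vec)           # FALSE TRUE TRUE
```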
One or more elements of a vector can be called using []
char_vec
char_vec[2]
char_vec[2:3]
Double brackets ([[]]) can be used to return single elements only
char_vec[[2]]
## [1] "red"
If you tried char_vec[[2:3]] you would receive an error message
Error in char_vec[[2:3]] : attempt to select more than one element in vectorIndex
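To see the error without stopping a script, it can be trapped with tryCatch(). A minimal sketch; res is a name introduced for this example, and the exact wording of the message may vary between R versions:

```r
char_vec <- c("blue", "red", "green")

# Single brackets accept an index vector of any length
char_vec[2:3]

# Double brackets accept exactly one index; trap the error to inspect it
res <- tryCatch(char_vec[[2:3]], error = function(e) conditionMessage(e))
res
```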
If a vector has name attributes, we can call values by name.
Here we’ll use the built-in vector euro
head(euro)
##       ATS       BEF       DEM       ESP       FIM       FRF
##  13.76030  40.33990   1.95583 166.38600   5.94573   6.55957
euro["ESP"]
##     ESP
## 166.386
Try repeating the call-by-name approach using double brackets
euro["ESP"]
euro[["ESP"]]
What was the difference in the output?
- [] returned the vector with the identical structure
- [[]] removed the attributes & just gave the R object at that position (i.e. a numeric vector of length 1)

Is it better to call by position, or by name?
Things to consider:
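One consideration can be sketched as follows: positions change when a vector is reordered, while names do not. The vector rates below is an illustrative copy of three values from the euro output shown earlier, and sorted_rates is a name introduced here:

```r
# Three values copied from the euro vector, for illustration only
rates <- c(ATS = 13.7603, BEF = 40.3399, DEM = 1.95583)

# Sorting changes which value sits at position 1
sorted_rates <- sort(rates)
names(sorted_rates)[1]
rates[[1]] == sorted_rates[[1]]

# A name retrieves the same value before and after sorting
rates[["DEM"]] == sorted_rates[["DEM"]]
```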
What is really happening in this line?
euro[1:3]
##      ATS      BEF      DEM
## 13.76030 40.33990  1.95583
We are using the integer vector 1:3 to extract values from the euro vector
int_vec
## [1] 1 2 3 4 5
euro[int_vec]
##       ATS       BEF       DEM       ESP       FIM
##  13.76030  40.33990   1.95583 166.38600   5.94573
We can also combine the above logical test and subsetting
dbl_vec
## [1] 0.618 1.414 2.000
dbl_vec > 1
## [1] FALSE TRUE TRUE
dbl_vec[dbl_vec > 1]
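Breaking that last line into steps may help. A short sketch, where keep is a name introduced only for this example:

```r
dbl_vec <- c(0.618, 1.414, 2)

keep <- dbl_vec > 1   # logical vector, one value per element
dbl_vec[keep]         # the elements where keep is TRUE
which(keep)           # the positions of those elements
sum(keep)             # TRUE counts as 1, so this counts the matches
```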
An additional logical test: %in% (read as: “is in”)
dbl_vec
## [1] 0.618 1.414 2.000
int_vec
## [1] 1 2 3 4 5
dbl_vec %in% int_vec
## [1] FALSE FALSE TRUE
Returns TRUE/FALSE for each value in dbl_vec if it is in int_vec
NB: int_vec was coerced silently to a double vector
- Vectors are one dimensional, with a length attribute.
- A matrix is the two dimensional equivalent
int_mat <- matrix(1:6, ncol = 2)
print(int_mat)
- Matrix properties: dim(), nrow(), ncol()
- Optional names: rownames() & colnames()
Some commands to try:
dim(int_mat)
nrow(int_mat)
typeof(int_mat)
class(int_mat)
attributes(int_mat)
colnames(int_mat)
length(int_mat)
Please ask questions if anything is confusing
- A matrix is subset using x[row, col]
- Leaving row or col blank selects the entire row/column
int_mat[2, 2]
int_mat[1,]
How would we just get the first column?
NB: Forgetting the comma will treat the matrix as a single vector running down the columns
int_mat
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
int_mat[5]
## [1] 5
length(int_mat)
## [1] 6
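A short sketch of both points: selecting the first column, and how a single index counts down the columns. The drop = FALSE argument is an extra not covered above; it keeps the result as a matrix:

```r
int_mat <- matrix(1:6, ncol = 2)

int_mat[, 1]                # first column, returned as a plain vector
int_mat[, 1, drop = FALSE]  # first column, kept as a 3 x 1 matrix

# With no comma, the matrix is indexed as one vector, filled column by column
as.vector(int_mat)
int_mat[5] == int_mat[2, 2]
```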
Requesting a row or column that doesn’t exist is the source of a very common error message
dim(int_mat)
## [1] 3 2
int_mat[5,]
Error in int_mat[5, ] : subscript out of bounds
If row/colnames are assigned, we can also extract values using these instead of by position
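For example, assuming we assign some names to int_mat (the names used here are hypothetical, chosen only for this sketch):

```r
int_mat <- matrix(1:6, ncol = 2)

# Hypothetical names, for illustration only
rownames(int_mat) <- c("r1", "r2", "r3")
colnames(int_mat) <- c("first", "second")

int_mat["r2", "second"]   # the same cell as int_mat[2, 2]
int_mat[, "first"]        # a vector that carries the rownames
```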
Arrays extend matrices to 3 or more dimensions
Beyond the scope of this course, but we just have more commas in the square brackets, e.g.
dim(iris3)
## [1] 50 4 3
dimnames(iris3)
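A minimal sketch of the extra commas, using a small hypothetical array (arr) rather than iris3 so the values are easy to follow:

```r
# A hypothetical 2 x 3 x 4 array, filled column by column
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)

arr[1, 2, 3]   # one value: row 1, column 2, slice 3
arr[, , 1]     # the whole first slice, a 2 x 3 matrix
```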
Summary of main data types in R
| Dimension | Homogeneous | Heterogeneous |
|---|---|---|
| 1d | vector | list |
| 2d | matrix | data.frame |
| 3d+ | array | |
A list is a heterogeneous vector.
Each component of a list can be any R object:
- a vector, or matrix
- another list
- an R object type we haven’t seen yet
These are incredibly common in R
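A minimal sketch of a heterogeneous list; my_items and its contents are hypothetical, chosen so that the components differ in type and length:

```r
# Hypothetical components that differ in type and length
my_items <- list(1:3, "a", c(TRUE, FALSE), matrix(1:4, ncol = 2))

typeof(my_items)           # the container itself is a list
length(my_items)           # number of components
sapply(my_items, typeof)   # type of each component
sapply(my_items, length)   # length of each component
```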
Many R functions provide output as a list
testResults <- t.test(dbl_vec)
class(testResults)
typeof(testResults)
testResults
NB: There is a function (print.htest()) that tells R how to print the results to the Console
Explore the various attributes of the object testResults
attributes(testResults)
Compare this with the results from:
attributes(euro)
length(testResults)
names(testResults)
We can call the individual components of a list using the $ symbol followed by the name
testResults$statistic
testResults$conf.int
testResults$method
Note that each component is quite different to the others.
A list is a vector so we can also subset using the [] method
testResults[1]
typeof(testResults[1])
- Single brackets return a list
Double brackets again retrieve a single element of the vector
- This returns the component itself, as an R object
testResults[[1]]
typeof(testResults[[1]])
We can also use names instead of positions
testResults[c("statistic", "p.value")]
testResults[["statistic"]]
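The three ways of subsetting a list can be compared side by side. A sketch, assuming dbl_vec as defined earlier; sub_list and t_stat are names introduced for this example:

```r
dbl_vec <- c(0.618, 1.414, 2)
testResults <- t.test(dbl_vec)

sub_list <- testResults[1]     # single brackets: still a list, of length 1
t_stat   <- testResults[[1]]   # double brackets: the component itself

typeof(sub_list)
typeof(t_stat)

# Calling by name with [[ ]] or $ gives the same object
identical(testResults[["statistic"]], testResults$statistic)
```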
testResults[["statistic"]] is identical to testResults$statistic
To see the structure, click on testResults to expand the entry, or use:
str(testResults)

Finally!
Data frames:
- Each column is a vector
- Data frames have dim(), nrow(), ncol(), rownames(), colnames()
- colnames() & rownames() are NOT optional & are assigned by default
Let’s use band_members again
Try these commands
colnames(band_members)
rownames(band_members)
dim(band_members)
nrow(band_members)
Individual entries can also be extracted using the square brackets
band_members[1:2, 1]
We can also refer to columns by name (same as matrices)
band_members[1:2, "name"]
Thinking of columns being vectors is quite useful
- We can call each column vector of a data.frame using the $ operator
band_members$name[1:2]
There is no equivalent for rows!!!
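A sketch of the difference between columns and rows. To stay self-contained it uses bm, a base R stand-in for band_members; the column names and values are assumed:

```r
# Stand-in for band_members (values assumed for illustration)
bm <- data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles"),
  stringsAsFactors = FALSE
)

bm$name[1:2]      # the column is a vector, so [] works on it
is.vector(bm$name)

class(bm[1, ])    # a row is still a data.frame, not a vector
```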
- Printed matrix objects can look exactly like a data.frame
- tibble objects are still clearly distinct
- There is no tibble equivalent for matrices
- Check using class(object)
- If you use dplyr functions on a matrix, you will get errors
- Use as.data.frame() to coerce a matrix to a data.frame

Data frames are actually special cases of lists
- Each column of a data.frame is an element of a list
Forgetting the comma now gives a completely different result to a matrix!
band_members[1]
Was that what you expected?
Try using the double bracket method
What do you think will happen if we type:
band_members[5]
Error: Positive column indexes in [ must match number of columns:
* .data has 2 columns
* Position 1 equals 5
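The list behaviour can be sketched with bm, a base R stand-in for band_members (values assumed). Note that a base data.frame reports this error with different wording from the tibble message above:

```r
# Stand-in for band_members (values assumed for illustration)
bm <- data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles"),
  stringsAsFactors = FALSE
)

is.list(bm)      # a data.frame is a list of columns
length(bm)       # so its length is the number of columns
class(bm[1])     # single brackets: a one-column data.frame
class(bm[[1]])   # double brackets: the column vector itself

# Asking for a fifth column fails; trap the error to read the message
msg <- tryCatch(bm[5], error = function(e) conditionMessage(e))
msg
```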
R Objects
How do we assign names?
named_vec <- c(a = 1, b = 2, c = 3)
OR we can name an existing vector
names(int_vec) <- c("a", "b", "c", "d", "e")
Can we remove names?
The NULL, or empty, vector in R is created using c()
null_vec <- c()
length(null_vec)
We can use this to remove names
names(int_vec) <- c()
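A short sketch of the full round trip, assigning names and then removing them. Assigning NULL directly is equivalent to assigning c():

```r
int_vec <- 1:5
names(int_vec) <- c("a", "b", "c", "d", "e")
int_vec["c"]              # call by name while the names exist

is.null(c())              # c() with no arguments is NULL

names(int_vec) <- NULL    # same effect as names(int_vec) <- c()
is.null(names(int_vec))
attributes(int_vec)       # no attributes remain
```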
Lists can have names, but not row/colnames
my_list <- list(int_vec, dbl_vec)
names(my_list) <- c("integers", "doubles")
OR
my_list <- list(integers = int_vec, doubles = dbl_vec)
What happens if we try this?
my_list$logical <- logi_vec
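A sketch of what to expect, assuming the three vectors defined earlier: assigning to a name that does not yet exist appends a new component.

```r
int_vec  <- 1:5
dbl_vec  <- c(0.618, 1.414, 2)
logi_vec <- c(TRUE, TRUE, FALSE)

my_list <- list(integers = int_vec, doubles = dbl_vec)
my_list$logical <- logi_vec   # a new name appends a new component

names(my_list)
sapply(my_list, length)       # components may differ in length
```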
Creating a data.frame is exactly the same as creating a list, but the names attribute will also be the colnames()
my_df <- data.frame(doubles = dbl_vec, logical = logi_vec)
names(my_df) == colnames(my_df)
## [1] TRUE TRUE
What happens if we try to add components that aren’t the same length?
my_df <- data.frame(
  integers = int_vec,
  doubles = dbl_vec,
  logical = logi_vec
)
Error in data.frame(integers = int_vec, doubles = dbl_vec, logical = logi_vec) : arguments imply differing number of rows: 5, 3
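The error can be trapped and inspected, as sketched below. One related behaviour worth knowing: when the lengths divide evenly, data.frame() recycles the shorter column instead of failing. The names msg and recycled are introduced only for this example:

```r
int_vec <- 1:5
dbl_vec <- c(0.618, 1.414, 2)

# Lengths 5 and 3 cannot be reconciled, so this is an error
msg <- tryCatch(
  data.frame(integers = int_vec, doubles = dbl_vec),
  error = function(e) conditionMessage(e)
)
msg

# Lengths 4 and 2 divide evenly, so the shorter column is recycled
recycled <- data.frame(id = 1:4, flag = c(TRUE, FALSE))
nrow(recycled)
recycled$flag
```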
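Other object types:
- S3 objects
- S4 objects are very common

Logical tests:
- Is equal to: ==
- Not equal to: !=
- And: &
- Or: |
- Less than: <; Greater than: >
- Less than or equal to: <=; Greater than or equal to: >=

NA represents a missing value
x <- c(1:5, NA)
x == 5
x != 5
x > 3
x > 3 | x == 2
is.na(x)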
With logical vectors, ! will invert all values
logi_vec
!logi_vec
!is.na(x)
!x == 5
A few more challenging tests which may give unexpected results
is.integer(x)
x == int_vec
x[!is.na(x)] == int_vec
x[5:1] == int_vec
Did you understand all of these results?
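An annotated sketch of those tests, assuming int_vec is 1:5 with no names. suppressWarnings() is used only to silence the recycling warning that the second comparison would otherwise raise:

```r
int_vec <- 1:5
x <- c(1:5, NA)

is.integer(x)             # TRUE: NA takes on the integer type here
x == 5                    # the last value is NA, not FALSE

# Lengths 6 and 5: int_vec is recycled, so x[6] is compared to int_vec[1]
suppressWarnings(x == int_vec)

x[!is.na(x)] == int_vec   # all TRUE once the NA is dropped
x[5:1] == int_vec         # only the middle value (3) matches
```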
One final and important test in R
%in% can be read as "is in"
"red" %in% char_vec
## [1] TRUE
char_vec %in% "red"
## [1] FALSE TRUE FALSE
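The two results differ because %in% returns one value for each element on its left-hand side. A short sketch; wanted is a hypothetical reference set introduced for this example:

```r
char_vec <- c("blue", "red", "green")

length("red" %in% char_vec)   # 1: one answer per value on the left
length(char_vec %in% "red")   # 3

# A common use: keep only the values found in a reference set
wanted <- c("red", "green", "purple")   # hypothetical reference set
char_vec[char_vec %in% wanted]
```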