18th March 2020
R Objects
practical_3 (~/transcriptomics)
File > New File > R Markdown
DataTypes.Rmd
library(tidyverse)
We learned how to:
rmarkdown
tibble/data.frame
%>%
ggplot2
R Objects
data.frame
tibble = a data.frame with pretty wrapping paper
numeric, character etc.
each column is a vector
The key building blocks for R objects: Vectors
What is a vector?
A vector is one or more values of the same type
A simple vector would be
## [1] 1 2 3 4 5 6 7 8 9 10
What type of values are in this vector?
Another vector might be
## [1] "a" "cat" "video"
What type of values are in this vector?
What type of values are in this vector?
## [1] "742" "Evergreen" "Tce"
(Please start running these examples in your own R Markdown)
logical: TRUE or FALSE
logi_vec <- c(TRUE, TRUE, FALSE)
logi_vec
## [1] TRUE TRUE FALSE
integer
int_vec <- 1:5
int_vec
## [1] 1 2 3 4 5
double (numeric)
dbl_vec <- c(0.618, 1.414, 2)
dbl_vec
Why are these called doubles?
character
char_vec <- c("blue", "red", "green")
char_vec
These are the basic building blocks for all R objects
There are two further atomic types: complex & raw
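The difference between integer and double values can be checked directly with typeof(). The sketch below uses illustrative values that are not part of the practical. Doubles are stored as double-precision floating point numbers, which is where the name comes from.

```r
# Illustrative values only, not taken from the practical
type_plain   <- typeof(2)      # "double": a bare number is stored as a double
type_suffix  <- typeof(2L)     # "integer": the L suffix requests an integer
type_colon   <- typeof(1:5)    # "integer": the colon operator returns integers here
type_quoted  <- typeof("2")    # "character": quotes make it text
type_logical <- typeof(TRUE)   # "logical"

# Integers and doubles both count as numeric
both_numeric <- is.numeric(2L) && is.numeric(2)
both_numeric                   # TRUE
```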
R data structures are built on these
What properties might a vector have?
length()
typeof()
class()
attributes(), e.g. names etc.
We can combine two vectors in R, using the function c()
c(1, 2)
## [1] 1 2
The numbers 1 & 2 were both vectors with length() == 1
We have combined two vectors of length 1, to make a vector of length 2
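The same idea extends to longer vectors. A minimal sketch, using made-up values, of how c() joins vectors end to end:

```r
# Illustrative values only
single_len <- length(1)          # 1: a single number is a vector of length 1

first  <- c(1, 2)
second <- c(3, 4, 5)
both   <- c(first, second)       # c() joins the vectors end to end
both                             # 1 2 3 4 5
length(both)                     # 5
```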
What would happen if we combined two vectors of different types?
Let’s try & see what happens:
new_vec <- c(logi_vec, int_vec)
print(new_vec)
typeof(new_vec)
Q: What happened to the logical values?
Answer: R coerced them into a common type (i.e. integers).
What other types could logical vectors be coerced into?
Try using the functions: as.integer(), as.double() & as.character() on logi_vec
Can character vectors be coerced into numeric vectors?
simp_vec <- c("742", "Evergreen", "Terrace")
as.numeric(simp_vec)
## [1] 742 NA NA
Warning message: NAs introduced by coercion
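The coercion behaviour described above can be sketched as follows. The vector logi_vec matches the one defined earlier; the mixed vector is a hypothetical example added only for illustration.

```r
logi_vec <- c(TRUE, TRUE, FALSE)      # same vector as in the text

as_int <- as.integer(logi_vec)        # 1 1 0
as_dbl <- as.double(logi_vec)         # 1 1 0
as_chr <- as.character(logi_vec)      # "TRUE" "TRUE" "FALSE"

# Mixing types in c() coerces everything to the most flexible type present
mixed <- c(logi_vec, 2L, 0.5, "a")    # hypothetical mixed vector
typeof(mixed)                         # "character"

# Because TRUE becomes 1 and FALSE becomes 0, sum() counts the TRUE values
n_true <- sum(logi_vec)               # 2
```

The ordering logical, integer, double, character runs from least to most flexible, so a value can always be moved to the right but not always back to the left.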
One or more elements of a vector can be called using []
char_vec
char_vec[2]
char_vec[2:3]
Double brackets ([[]]) can be used to return single elements only
char_vec[[2]]
## [1] "red"
If you tried char_vec[[2:3]] you would receive an error message
Error in char_vec[[2:3]] : attempt to select more than one element in vectorIndex
If a vector has name attributes, we can call values by name.
Here we’ll use the built-in vector euro
head(euro)
##       ATS       BEF       DEM       ESP       FIM       FRF
##  13.76030  40.33990   1.95583 166.38600   5.94573   6.55957
euro["ESP"]
##     ESP
## 166.386
Try repeating the call-by-name approach using double brackets
euro["ESP"]
euro[["ESP"]]
What was the difference in the output?
[] returned the vector with the identical structure
[[]] removed the attributes & just gave the R object at that position (i.e. a numeric vector of length 1)
Is it better to call by position, or by name?
Things to consider:
What is really happening in this line?
euro[1:3]
##      ATS      BEF      DEM
## 13.76030 40.33990  1.95583
We are using the integer vector 1:3 to extract values from the euro vector
int_vec
## [1] 1 2 3 4 5
euro[int_vec]
##       ATS       BEF       DEM       ESP       FIM
##  13.76030  40.33990   1.95583 166.38600   5.94573
We can also combine a logical test with subsetting
dbl_vec
## [1] 0.618 1.414 2.000
dbl_vec > 1
## [1] FALSE TRUE TRUE
dbl_vec[dbl_vec > 1]
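Splitting the last line into two steps may make the mechanism clearer. In this sketch the names keep and big are hypothetical, introduced only for illustration.

```r
dbl_vec <- c(0.618, 1.414, 2)    # same vector as in the text

keep <- dbl_vec > 1              # step 1: a logical vector, FALSE TRUE TRUE
big  <- dbl_vec[keep]            # step 2: keep the positions where keep is TRUE
big                              # 1.414 2.000

passed   <- which(keep)          # 2 3: the positions that passed the test
n_passed <- sum(keep)            # 2: how many values passed
```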
An additional logical test: %in% (read as: “is in”)
dbl_vec
## [1] 0.618 1.414 2.000
int_vec
## [1] 1 2 3 4 5
dbl_vec %in% int_vec
## [1] FALSE FALSE TRUE
Returns TRUE/FALSE for each value in dbl_vec if it is in int_vec
NB: int_vec was coerced silently to a double vector
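The result of %in% always has the length of the vector on the left-hand side, so swapping the two sides asks a different question. A minimal sketch using the same two vectors:

```r
dbl_vec <- c(0.618, 1.414, 2)
int_vec <- 1:5

in_int <- dbl_vec %in% int_vec   # one result per element of dbl_vec
in_int                           # FALSE FALSE TRUE

in_dbl <- int_vec %in% dbl_vec   # one result per element of int_vec
in_dbl                           # FALSE TRUE FALSE FALSE FALSE

shared <- dbl_vec[in_int]        # values of dbl_vec also found in int_vec
shared                           # 2
```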
Vectors have a length attribute
A matrix is the two dimensional equivalent
int_mat <- matrix(1:6, ncol = 2)
print(int_mat)
Matrices also have dim(), nrow(), ncol(), rownames() & colnames()
Some commands to try:
dim(int_mat)
nrow(int_mat)
typeof(int_mat)
class(int_mat)
attributes(int_mat)
colnames(int_mat)
length(int_mat)
Please ask questions if anything is confusing
Subset a matrix using x[row, col]
Leaving row blank selects the entire column; leaving col blank selects the entire row
int_mat[2, 2]
int_mat[1,]
How would we just get the first column?
NB: Forgetting the comma will treat the matrix as a single vector running down the columns
int_mat
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
int_mat[5]
## [1] 5
length(int_mat)
## [1] 6
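The single index used when the comma is forgotten can be related back to a row and column. The sketch below assumes the same int_mat as above; the names row, col and idx are hypothetical.

```r
int_mat <- matrix(1:6, ncol = 2)   # same matrix as in the text

# Values are stored column by column
stored <- as.vector(int_mat)       # 1 2 3 4 5 6

# The position in that vector is row + (col - 1) * nrow
row <- 2
col <- 2
idx <- row + (col - 1) * nrow(int_mat)
idx                                # 5

# Both forms therefore return the same value
by_position <- int_mat[idx]        # 5
by_row_col  <- int_mat[row, col]   # 5
```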
Requesting a row or column that doesn’t exist is the source of a very common error message
dim(int_mat)
## [1] 3 2
int_mat[5,]
Error in int_mat[5, ] : subscript out of bounds
If row/colnames are assigned, we can also extract values using these instead of by position
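A minimal sketch of extraction by name. The row and column names here are made up for illustration and are not part of the practical.

```r
int_mat <- matrix(1:6, ncol = 2)

# Hypothetical names, chosen only for illustration
rownames(int_mat) <- c("r1", "r2", "r3")
colnames(int_mat) <- c("first", "second")

one_value  <- int_mat["r2", "second"]   # 5
second_col <- int_mat[, "second"]       # a vector that keeps the row names
names(second_col)                       # "r1" "r2" "r3"
```

Names keep working if rows or columns are later reordered, which is one reason to prefer them over positions.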
Arrays extend matrices to 3 or more dimensions
Beyond the scope of this course, but we just have more commas in the square brackets, e.g.
dim(iris3)
## [1] 50 4 3
dimnames(iris3)
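As a rough sketch of the extra commas, the example below builds a small array rather than using iris3. The object arr and its dimensions are hypothetical.

```r
# A hypothetical 2 x 3 x 4 array, not part of the practical
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)                    # 2 3 4

# One index per dimension: row 1, column 2, third layer
one_value <- arr[1, 2, 3]   # 15

# Leaving a position blank keeps everything in that dimension
layer <- arr[, , 1]
dim(layer)                  # 2 3: a single layer is an ordinary matrix
```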
Summary of main data types in R
Dimension | Homogeneous | Heterogeneous |
---|---|---|
1d | vector | list |
2d | matrix | data.frame |
3d+ | array | |
A list is a heterogeneous vector.
Each component can be a different R object: a vector, or matrix, another list, or an R object type we haven’t seen yet
These are incredibly common in R
Many R functions provide output as a list
testResults <- t.test(dbl_vec)
class(testResults)
typeof(testResults)
testResults
NB: There is a function (print.htest()) that tells R how to print the results to the Console
Explore the various attributes of the object testResults
attributes(testResults)
Compare this with the results from:
attributes(euro)
length(testResults)
names(testResults)
We can call the individual components of a list using the $ symbol followed by the name
testResults$statistic
testResults$conf.int
testResults$method
Note that each component is quite different to the others.
A list is a vector so we can also subset using the [] method
testResults[1]
typeof(testResults[1])
Using this method, the output is still a list
Double brackets again retrieve a single element of the vector, returned as the underlying R object
testResults[[1]]
typeof(testResults[[1]])
We can also use names instead of positions
testResults[c("statistic", "p.value")]
testResults[["statistic"]]
testResults[["statistic"]] is identical to testResults$statistic
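The three ways of subsetting a list can be compared side by side. This sketch uses a small hypothetical list called res in place of testResults, so it runs without any earlier code.

```r
# A small hypothetical list standing in for testResults
res <- list(statistic = 2.5, method = "example", conf.int = c(0.1, 0.9))

one_item <- res["statistic"]      # single brackets: still a list, of length 1
typeof(one_item)                  # "list"

value <- res[["statistic"]]       # double brackets: the component itself
typeof(value)                     # "double"

same <- identical(res[["statistic"]], res$statistic)
same                              # TRUE
```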
testResults: expand the entry to inspect its structure
str(testResults)
Finally!
Each column of a data.frame is a vector
Data frames have dim(), nrow(), ncol(), rownames(), colnames()
colnames() & rownames() are NOT optional & are assigned by default
Let’s use band_members again
Try these commands
colnames(band_members)
rownames(band_members)
dim(band_members)
nrow(band_members)
Individual entries can also be extracted using the square brackets
band_members[1:2, 1]
We can also refer to columns by name (same as matrices)
band_members[1:2, "name"]
Thinking of columns being vectors is quite useful
We can call each column vector of a data.frame using the $ operator
band_members$name[1:2]
There is no equivalent for rows!!!
matrix objects can look exactly like a data.frame
tibble objects are still clearly distinct
There is no tibble equivalent for matrices
Check using class(object)
If you use dplyr functions on a matrix, you will get errors
Use as.data.frame() to coerce a matrix to a data.frame
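A minimal sketch of checking the class and coercing, using the int_mat matrix from earlier. The name mat_df is hypothetical.

```r
int_mat <- matrix(1:6, ncol = 2)
is.matrix(int_mat)                 # TRUE
is.data.frame(int_mat)             # FALSE

mat_df <- as.data.frame(int_mat)   # coerce the matrix to a data.frame
class(mat_df)                      # "data.frame"
colnames(mat_df)                   # "V1" "V2": default names are assigned
dim(mat_df)                        # 3 2: the shape is unchanged
```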
Data frames are actually special cases of lists
Each column of a data.frame is an element of a list
A data.frame can therefore be subset like a list
Forgetting the comma, now gives a completely different result to a matrix!
band_members[1]
Was that what you expected?
Try using the double bracket method
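The list-like behaviour can be checked on any data frame. The sketch below uses a small made-up data.frame called df rather than band_members, so it runs without loading the tidyverse.

```r
# Hypothetical data, standing in for band_members
df <- data.frame(name = c("Mick", "John"), band = c("Stones", "Beatles"))

is.list(df)                   # TRUE
typeof(df)                    # "list"
n_elements <- length(df)      # 2: the number of columns, not rows

one_col <- df[1]              # single brackets: still a data.frame
class(one_col)                # "data.frame"

col_vec <- df[[1]]            # double brackets: the column vector itself
same <- identical(col_vec, df$name)
same                          # TRUE
```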
What do you think will happen if we type:
band_members[5]
Error: Positive column indexes in `[` must match number of columns:
* `.data` has 2 columns
* Position 1 equals 5
R Objects
How do we assign names?
named_vec <- c(a = 1, b = 2, c = 3)
OR we can name an existing vector
names(int_vec) <- c("a", "b", "c", "d", "e")
Can we remove names?
The NULL, or empty, vector in R is created using c()
null_vec <- c()
length(null_vec)
We can use this to remove names
names(int_vec) <- c()
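Since c() with no arguments returns NULL, assigning NULL directly has the same effect and may read more clearly. A minimal sketch; the name no_names is hypothetical.

```r
int_vec <- 1:5
names(int_vec) <- c("a", "b", "c", "d", "e")
named_value <- int_vec["c"]       # the element named "c", which is 3

is.null(c())                      # TRUE: c() with no arguments is NULL

names(int_vec) <- NULL            # same effect as names(int_vec) <- c()
is.null(names(int_vec))           # TRUE: the names are gone

no_names <- unname(int_vec)       # unname() returns a copy without names
```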
Lists can have names, but not row/colnames
my_list <- list(int_vec, dbl_vec)
names(my_list) <- c("integers", "doubles")
OR
my_list <- list(integers = int_vec, doubles = dbl_vec)
What happens if we try this?
my_list$logical <- logi_vec
Creating a data.frame is exactly the same as creating a list, but the names attribute will also be the colnames()
my_df <- data.frame(doubles = dbl_vec, logical = logi_vec)
names(my_df) == colnames(my_df)
## [1] TRUE TRUE
What happens if we try to add components that aren’t the same length?
my_df <- data.frame(
  integers = int_vec,
  doubles = dbl_vec,
  logical = logi_vec
)
Error in data.frame(integers = int_vec, doubles = dbl_vec, logical = logi_vec) : arguments imply differing number of rows: 5, 3
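The contrast with lists can be sketched as follows. A list accepts components of any length, while every column of a data.frame must have the same number of rows. Taking the first three integers is just one hypothetical way to make the lengths agree.

```r
int_vec <- 1:5
dbl_vec <- c(0.618, 1.414, 2)

# A list accepts components of different lengths
ok_list <- list(integers = int_vec, doubles = dbl_vec)
lengths(ok_list)                  # integers = 5, doubles = 3

# data.frame() stops with an error; try() lets the script carry on
res <- try(data.frame(integers = int_vec, doubles = dbl_vec), silent = TRUE)
failed <- inherits(res, "try-error")
failed                            # TRUE

# Subsetting to a common length makes the columns match
my_df <- data.frame(integers = int_vec[1:3], doubles = dbl_vec)
dim(my_df)                        # 3 2
```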
S3 objects
S4 objects are very common
Logical tests:
Equal to: ==
Not equal to: !=
And: &
Or: |
Less than: <; Greater than: >
Less than or equal to: <=; Greater than or equal to: >=
NA represents a missing value
x <- c(1:5, NA)
x == 5
x != 5
x > 3
x > 3 | x == 2
is.na(x)
With logical vectors, ! will invert all values
logi_vec
!logi_vec
!is.na(x)
!x == 5
A few more challenging tests which may give unexpected results
is.integer(x)
x == int_vec
x[!is.na(x)] == int_vec
x[5:1] == int_vec
Did you understand all of these results?
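Two behaviours explain most of the surprises here: any comparison with NA returns NA, and comparisons between vectors are made position by position, recycling the shorter vector when the lengths differ. A sketch using the same vectors, with warnings suppressed only to keep the output short:

```r
x <- c(1:5, NA)
int_vec <- 1:5

# The NA is coerced to an integer NA, so x is still an integer vector
x_is_int <- is.integer(x)             # TRUE

# x has 6 values and int_vec has 5, so int_vec is recycled (with a warning)
cmp <- suppressWarnings(x == int_vec)
cmp                                   # TRUE TRUE TRUE TRUE TRUE NA

# After dropping the NA the lengths match
same <- x[!is.na(x)] == int_vec
all(same)                             # TRUE

# Position matters: the reversed values only match in the middle
rev_cmp <- x[5:1] == int_vec
rev_cmp                               # FALSE FALSE TRUE FALSE FALSE
```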
One final and important test in R
%in% can be read as “is in”
"red" %in% char_vec
## [1] TRUE
char_vec %in% "red"
## [1] FALSE TRUE FALSE