Today's Topics:
- R Objects & Data Types
- Writing Functions in R
- Digging Deeper in R
- Running RStudio In The Cloud
21 July 2016
R Objects & Data Types
R Objects
Open a new script using File > New File > R Script, then run:
wd <- "Day_2"
setwd(wd)
In the Global Environment there is now an R object called wd
Where is the R object wd?
What does the function setwd() do?
R Objects
The main data type so far has been a data.frame
In a data.frame, each column is a vector
The key building blocks for R objects: Vectors
A vector is one or more values of the same type
A simple vector would be
## [1] 1 2 3 4 5 6 7 8 9 10
What type of values are in this vector?
Another vector might be
## [1] "a" "cat" "video"
What type of values are in this vector?
What about this vector?
## [1] "742" "Evergreen" "Tce"
What type of values are in this vector?
Logical vectors can only hold the values TRUE or FALSE
logi_vec <- c(TRUE, TRUE, FALSE)
print(logi_vec)
## [1] TRUE TRUE FALSE
Integer vectors are useful for counts, ranks or indexing positions (e.g. column 3; nucleotide 254731)
int_vec <- 1:5
print(int_vec)
## [1] 1 2 3 4 5
Double vectors are often (& lazily) referred to as numeric
dbl_vec <- c(0.618, 1.414, 2)
print(dbl_vec)
## [1] 0.618 1.414 2.000
char_vec <- c("blue", "red", "green")
print(char_vec)
## [1] "blue" "red" "green"
These are the basic building blocks for all R objects
Two further types also exist: complex & raw
What defining properties might a vector have?
There are four…
length()
typeof()
class()
attributes()
Let's try them on our vectors
typeof(char_vec)
length(int_vec)
attributes(logi_vec)
class(dbl_vec)
typeof(dbl_vec)
Were you surprised by the results?
We can combine two vectors in R, using the function c()
c(1, 2)
## [1] 1 2
The numbers 1 & 2 were both vectors with length() = 1
We have combined two vectors of length 1, to make a vector of length 2
What would happen if we combined two vectors of different types?
Let's try & see what happens:
new_vec <- c(logi_vec, int_vec)
print(new_vec)
typeof(new_vec)
Q: What happened to the logical values?
Answer: R will coerce them into a common type (i.e. integers).
Try using the functions: as.integer(), as.logical(), as.double() & as.character()
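The coercion rules can be sketched as below. This is a minimal example that recreates the vectors from the earlier slides so it runs on its own; the order logical < integer < double < character is the usual coercion hierarchy.

```r
# Recreate the vectors from the earlier slides
logi_vec <- c(TRUE, TRUE, FALSE)
int_vec <- 1:5
dbl_vec <- c(0.618, 1.414, 2)

# logical + integer gives integer (TRUE becomes 1, FALSE becomes 0)
typeof(c(logi_vec, int_vec))

# Adding doubles moves everything up to double
typeof(c(logi_vec, int_vec, dbl_vec))

# Adding a single character value moves everything up to character
typeof(c(logi_vec, int_vec, "a"))

# Explicit coercion with the as.*() functions
as.integer(logi_vec)
as.logical(c(0, 1, 2))   # zero is FALSE, any other number is TRUE
as.character(int_vec)
```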
What about character vectors?
simp_vec <- c(742, "Evergreen", "Terrace")
as.numeric(simp_vec)
## [1] 742 NA NA
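Values that cannot be converted become NA, and R gives a warning. A small sketch of how we might find & drop them; the object name num_vec is just for illustration.

```r
simp_vec <- c(742, "Evergreen", "Terrace")

# The number 742 was already coerced to the character "742" by c()
typeof(simp_vec)

# as.numeric() warns about the NAs; suppressWarnings() hides the warning here
num_vec <- suppressWarnings(as.numeric(simp_vec))

# Find the values that failed to convert, then keep only the rest
is.na(num_vec)
num_vec[!is.na(num_vec)]
```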
The elements of a vector can be called using []
y <- c("A", "B", "C", "D", "E")
y[2]
## [1] "B"
y[c(1, 3)]
## [1] "A" "C"
Double brackets ([[]]) can be used to return single elements only
y[[2]]
## [1] "B"
If you tried y[[c(1,3)]] you would receive an error message
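If you want to see the error without stopping a script, one option is tryCatch(). This is only a sketch; the object name res is an assumption for illustration.

```r
y <- c("A", "B", "C", "D", "E")

# Single brackets accept several positions
y[c(1, 3)]

# Double brackets accept exactly one position for a vector like this
y[[2]]

# Capture the error message instead of halting
res <- tryCatch(
  y[[c(1, 3)]],
  error = function(e) conditionMessage(e)
)
print(res)
```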
If a vector has name attributes, we can call values by name
euro[1:5]
##       ATS       BEF       DEM       ESP       FIM
##  13.76030  40.33990   1.95583 166.38600   5.94573
euro[c("ATS", "BEF")]
##     ATS     BEF
## 13.7603 40.3399
Try repeating the call-by-name approach using double brackets
euro["ATS"]
euro[["ATS"]]
What was the difference in the output?
[] returned the vector with the identical structure
[[]] removed the attributes & just gave the value
Is it better to call by position, or by name?
Things to consider:
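One consideration is what happens if the order of the values changes. A minimal sketch using the built-in euro vector; shuffled is a hypothetical name used only here.

```r
# euro is a named numeric vector from the datasets package
by_single <- euro["ATS"]
by_double <- euro[["ATS"]]
names(by_single)   # the name is kept
names(by_double)   # NULL: only the value is returned

# Reverse the order of the values
shuffled <- rev(euro)

# Calling by name still finds the same value
identical(shuffled[["ATS"]], euro[["ATS"]])

# Calling by position now finds a different value
identical(shuffled[[1]], euro[[1]])
```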
R Functions are designed to work on vectors
dbl_vec - 1
dbl_vec > 1
dbl_vec^2
mean(dbl_vec)
sd(dbl_vec)
sqrt(int_vec)
This is one of the real strengths of R
We can combine the above logical test and subsetting
dbl_vec
## [1] 0.618 1.414 2.000
dbl_vec > 1
## [1] FALSE TRUE TRUE
dbl_vec[dbl_vec > 1]
## [1] 1.414 2.000
An additional logical test: %in%
(read as: "is in")
dbl_vec %in% int_vec
## [1] FALSE FALSE TRUE
Returns TRUE/FALSE for each value in dbl_vec if it is in int_vec
NB: int_vec was coerced silently to a double vector
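The same subsetting trick works with %in%. A small sketch, assuming the vectors from the earlier slides; keep is a hypothetical name.

```r
dbl_vec <- c(0.618, 1.414, 2)
int_vec <- 1:5

# Store the logical test, then use it to subset
keep <- dbl_vec %in% int_vec
dbl_vec[keep]

# %in% is built on match(), which gives the position of each match (or NA)
match(dbl_vec, int_vec)

# The test can be run in either direction
int_vec[int_vec %in% dbl_vec]
```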
Vectors are one dimensional & have a length attribute.
A matrix is the two dimensional equivalent
int_mat <- matrix(1:6, ncol=2)
print(int_mat)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
Matrices have dim(), nrow() & ncol(), and optionally rownames() & colnames()
Some commands to try:
dim(int_mat)
nrow(int_mat)
typeof(int_mat)
class(int_mat)
attributes(int_mat)
colnames(int_mat)
length(int_mat)
Ask questions if anything is confusing
Subset a matrix using x[row, col]
Leaving row or col blank selects the entire row/column
int_mat[2, 2]
int_mat[1,]
How would we just get the first column?
NB: Forgetting the comma when subsetting will treat the matrix as a single vector spread down the columns
int_mat
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
int_mat[5]
## [1] 5
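A short sketch of why int_mat[5] gives 5: the matrix is stored as one vector, column by column. The drop = FALSE argument shown at the end is a standard option of [ that keeps the matrix structure.

```r
int_mat <- matrix(1:6, ncol = 2)

# The underlying vector, read down the columns
as.vector(int_mat)

# Position 5 of that vector is row 2, column 2
int_mat[5]
int_mat[2, 2]

# With the comma, a blank row index selects the whole column as a vector
int_mat[, 1]

# drop = FALSE keeps the result as a 3 x 1 matrix
dim(int_mat[, 1, drop = FALSE])
```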
Arrays extend matrices to 3 or more dimensions
Beyond the scope of today, but we just have more commas in the square brackets, e.g.
dim(iris3)
## [1] 50 4 3
dimnames(iris3)
Vectors, Matrices & Arrays are the basic homogeneous data types of R
Summary of main data types in R
| Dimension | Homogeneous | Heterogeneous |
|---|---|---|
| 1d | vector | list |
| 2d | matrix | data.frame |
| 3d+ | array | |
A list is a heterogeneous vector.
Each component of a list can be a different R object type: a vector or matrix, another list, or an R object type we haven't seen yet
Many R functions provide output as a list
testResults <- t.test(dbl_vec)
typeof(testResults)
testResults
NB: There is a function (print.htest()) that tells R how to print the results to the Console
Explore the various attributes of the object testResults
attributes(testResults)
length(testResults)
names(testResults)
typeof(testResults)
We can call the individual components of a list using the $ symbol followed by the name
testResults$statistic
testResults$conf.int
testResults$method
Note that each component is quite different to the others.
A list is a vector so we can also subset using the [] method
testResults[1]
typeof(testResults[1])
Using single square brackets returns a list with the structure intact
Double brackets again retrieve a single element of the vector, as the underlying R object
testResults[[1]]
typeof(testResults[[1]])
When would we use either method?
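One way to think about it: use [[]] or $ when you need the component itself, and [] when you want a smaller list. A minimal sketch; sub_list & t_stat are hypothetical names.

```r
dbl_vec <- c(0.618, 1.414, 2)
testResults <- t.test(dbl_vec)

# Single brackets: still a list, now of length 1
sub_list <- testResults[1]
typeof(sub_list)
length(sub_list)

# Double brackets: the component itself, here a double
t_stat <- testResults[[1]]
typeof(t_stat)

# $ and [[ ]] with the full name give the same object
identical(testResults[["statistic"]], testResults$statistic)

# Arithmetic needs the component, not the list
t_stat^2
```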
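In the Environment tab, click the arrow next to testResults to expand the entry
This shows the same information as str(testResults)
Finally!
Data frames have the same attributes as matrices: dim(), nrow(), ncol(), rownames(), colnames()
colnames() & rownames() are NOT optional & are assigned by default
Here's an example data.frame from the package datasets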
Use ?ToothGrowth to find out what's in the object
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
Try these commands
colnames(ToothGrowth)
dim(ToothGrowth)
nrow(ToothGrowth)
Individual entries can also be extracted using the square brackets, as for matrices
ToothGrowth[1:2, 1]
## [1] 4.2 11.5
We can also refer to columns by name (same as matrices)
ToothGrowth[1:2, "len"]
## [1] 4.2 11.5
The concept of columns being distinct vectors is quite important & useful
We can call each column vector of a data.frame using the $ operator
ToothGrowth$len[1:2]
This does NOT work for rows!!!
R is column major by default (as is FORTRAN & Matlab)
R was designed for statistical analysis, but has developed capabilities far beyond this
We will see this advantage this afternoon
Data frames are actually special cases of lists
Each column of a data.frame is a component of a list
Forgetting the comma now gives a completely different result to a matrix!
ToothGrowth[1]
Was that what you expected?
Try using the double bracket method
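The difference between the two bracket methods can be sketched as below, using the built-in ToothGrowth data. The names col_df & col_vec are only for illustration.

```r
# Single brackets without a comma: list-style subsetting, still a data.frame
col_df <- ToothGrowth[1]
class(col_df)
dim(col_df)

# Double brackets: the column itself, as a vector
col_vec <- ToothGrowth[[1]]
class(col_vec)
length(col_vec)

# These all return the same vector
identical(col_vec, ToothGrowth$len)
identical(col_vec, ToothGrowth[, 1])
```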
R Objects
How do we assign names?
named_vec <- c(a = 1, b = 2, c = 3)
OR we can name an existing vector
names(int_vec) <- c("a", "b", "c", "d", "e")
Can we remove names?
The NULL, or empty, vector in R is created using c()
null_vec <- c()
length(null_vec)
We can use this to remove names
names(named_vec) <- c()
Don't forget to put the names back…
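A small sketch of removing names & then putting them back. The object saved_names is an assumption added here so the names are not lost.

```r
named_vec <- c(a = 1, b = 2, c = 3)

# Keep a copy of the names before removing them
saved_names <- names(named_vec)

# c() returns NULL, so this is the same as assigning NULL
is.null(c())
names(named_vec) <- c()

# The vector now has no attributes at all
attributes(named_vec)

# Put the names back
names(named_vec) <- saved_names
print(named_vec)
```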
We can convert vectors to matrices, as earlier
int_mat <- matrix(1:6, ncol=2)
R is column major so fills columns by default
row_mat <- matrix(1:6, ncol=2, byrow=TRUE)
We can assign row names & column names after creation
colnames(row_mat) <- c("odds", "evens")
Or using dimnames()
dimnames(row_mat)
This is a list of length 2 with rownames then colnames as the components.
rec_mat <- matrix(int_vec, ncol = 2)
What has happened here?
This is a major criticism made of R
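What happened is recycling: five values do not fill a 3 x 2 matrix, so R goes back to the start of the vector for the last cell & gives a warning. A minimal sketch, with suppressWarnings() used only to keep the output quiet:

```r
int_vec <- 1:5

# Five values into six cells: R warns, then reuses the first value
rec_mat <- suppressWarnings(matrix(int_vec, ncol = 2))
dim(rec_mat)

# The last cell holds the recycled first value
rec_mat[3, 2]

# Recycling also happens silently in arithmetic when lengths divide evenly
1:6 + c(10, 20)
```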
my_list <- list(int_vec, dbl_vec)
names(my_list) <- c("integers", "doubles")
OR
my_list <- list(integers = int_vec, doubles = dbl_vec)
What happens if we try this?
my_list$logical <- logi_vec
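Assigning to a name that doesn't exist yet adds a new component. A short sketch, recreating the list so it runs on its own:

```r
my_list <- list(integers = 1:5, doubles = c(0.618, 1.414, 2))

# Assigning with $ adds a third component
my_list$logical <- c(TRUE, TRUE, FALSE)
names(my_list)
length(my_list)

# The components do not need to be the same length
sapply(my_list, length)

# Assigning NULL removes a component again
my_list$logical <- NULL
names(my_list)
```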
We can coerce vectors to lists as well
int_list <- as.list(named_vec)
Creating a data.frame is exactly the same as creating a list, but
the names attribute will also be the colnames()
my_df <- data.frame(doubles = dbl_vec, logical = logi_vec)
names(my_df) == colnames(my_df)
## [1] TRUE TRUE
What happens if we try to add components that aren't the same length?
my_df <- data.frame(integers = int_vec,
doubles = dbl_vec, logical = logi_vec)
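The call above fails because the columns of a data.frame must have the same number of rows. A sketch that captures the error rather than stopping; res & ok_df are hypothetical names.

```r
int_vec <- 1:5
dbl_vec <- c(0.618, 1.414, 2)
logi_vec <- c(TRUE, TRUE, FALSE)

# Lengths 5, 3 & 3 cannot be lined up, so data.frame() gives an error
res <- tryCatch(
  data.frame(integers = int_vec, doubles = dbl_vec, logical = logi_vec),
  error = function(e) conditionMessage(e)
)
print(res)

# Shorter vectors are recycled only if they fit a whole number of times
ok_df <- data.frame(six = 1:6, three = 1:3)
dim(ok_df)
```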