- R User for >10 years
- Co-ordinator, Bioinformatics Hub
- Level 4, Santos Petroleum Engineering Building
Also helping today:
- Alastair Ludington
- Jimmy Breen (morning) & Hien To (afternoon)
27 November 2016
Introduction to R and RStudio
Loading Data Into R
What have we really done?
The Genomics Era
Excel is notorious for silently converting values from one type to another.
Gene names are often converted to dates (e.g. SEPT9)
Genotypes can be converted into numeric values (e.g. the homozygote "1/1")
In R we generally work with plain text files.
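As a minimal sketch of why plain text helps, the snippet below reads a small comma-separated table from a string. The `text =` argument of `read.csv()` stands in for a file on disk, and the column names and values are hypothetical example data. The gene names and genotypes come back unchanged as text.

```r
# A tiny CSV held in a string instead of a file (hypothetical example data)
csv_text <- "gene,genotype
SEPT9,1/1
MARCH1,0/1"

# stringsAsFactors = FALSE keeps text columns as character
# (this was not the default before R 4.0.0)
genes <- read.csv(text = csv_text, stringsAsFactors = FALSE)

genes$gene      # "SEPT9" "MARCH1": not converted to dates
genes$genotype  # "1/1" "0/1": not converted to numbers
```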
With great power comes great responsibility - Uncle Ben
With the extra capability R offers, we need to understand a little about:
We'll get to that later…
First, we'll just explore the R Console
1 + 2
## [1] 3
The basic arithmetic operators are +, -, *, / and ^
2^3
## [1] 8
1 + 2 - 3 * 4 / 5
## [1] 0.6
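The previous line gives 0.6 because R follows the usual order of operations: `^` first, then `*` and `/`, then `+` and `-`. A short sketch showing how parentheses change the grouping:

```r
# Default precedence: 3 * 4 / 5 is evaluated before the + and -
1 + 2 - 3 * 4 / 5     # 0.6

# Parentheses force the addition and subtraction to happen first
(1 + 2 - 3) * 4 / 5   # 0

# ^ binds tighter than the unary minus, so this is -(2^2)
-2^2                  # -4
```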
Functions such as log() or \(\sqrt{~}\) are also available:
sqrt(2)
## [1] 1.414214
log(10)
## [1] 2.302585
log2(0.5) \(\equiv\) log(0.5, base = 2)
log2(0.5)
## [1] -1
log10(0.001)
## [1] -3
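The `base` argument generalises `log2()` and `log10()`, and `exp()` undoes the natural log. A small sketch checking these identities with `all.equal()`, which compares doubles up to a small numerical tolerance rather than requiring exact equality:

```r
# log2() and log10() are shortcuts for log() with a base argument
all.equal(log2(0.5), log(0.5, base = 2))        # TRUE
all.equal(log10(0.001), log(0.001, base = 10))  # TRUE

# Change of base: log_b(x) = log(x) / log(b)
all.equal(log(8, base = 2), log(8) / log(2))    # TRUE

# exp() is the inverse of the natural log
all.equal(exp(log(10)), 10)                     # TRUE
```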
Trigonometric functions: sin(), cos(), tan()
abs() for the absolute value
abs(-1)
## [1] 1
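R's trigonometric functions work in radians, not degrees. A short sketch using the built-in constant `pi`; the object name `deg` is just for illustration:

```r
sin(pi / 2)          # 1
cos(pi)              # -1

# Convert 90 degrees to radians before calling sin()
deg <- 90
sin(deg * pi / 180)  # 1

abs(-2.5)            # 2.5
```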
inverse() function.
In R we can save objects and give them a name
We write the name, then <-, then the value
x <- 5
The assignment operator (<-) acts like an arrow, placing the value in the object x
Yes, we could have written:
x = 5
<- … R object.
… = sign have?
We could also have written:
5 -> x
But no-one ever does…
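Objects can be overwritten, and the right-hand side of `<-` is evaluated before the value is stored. A sketch using two illustrative objects, `y` and `z`, so that `x` keeps its value of 5:

```r
y <- 5
y <- y + 1   # uses the old value of y, then stores 6 in y
y            # 6

z <- y       # z gets a copy of the value in y
y <- 100     # changing y afterwards does not change z
z            # 6
```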
To see what x contains, we just type its name:
x
## [1] 5
Or we can use the print() command:
print(x)
## [1] 5
sqrt(x)
x^2
x + 1
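These commands return new values but leave x itself unchanged; to keep a result we have to assign it. A sketch, where `x_sq` is an illustrative name:

```r
x <- 5
sqrt(x)    # 2.236068
x^2        # 25
x + 1      # 6
x          # still 5: x only changes if we reassign it

x_sq <- x^2  # store the result in a new object
x_sq         # 25
```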
In R we can combine many numbers together into a vector
We use the function c() (for combine)
x <- c(1, 2, 4)
x
## [1] 1 2 4
min(x)
mean(x)
sd(x)
range(x)
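For x <- c(1, 2, 4) these summaries can be checked by hand. A sketch that also uses `max()` and `sum()`, two further summary functions from base R:

```r
x <- c(1, 2, 4)
min(x)    # 1
max(x)    # 4
sum(x)    # 7
mean(x)   # 2.333333 (7 / 3)
sd(x)     # 1.527525 (sample standard deviation, divides by n - 1)
range(x)  # 1 4      (min and max together)
```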
We can apply an operation to every element of a vector with one command
x + 1
sqrt(x)
log2(x)
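Arithmetic between two vectors also works element by element. When the lengths differ, R recycles the shorter vector, which is worth knowing because it can hide mistakes. A sketch:

```r
x <- c(1, 2, 4)
x + c(10, 20, 30)         # 11 22 34 (first + first, second + second, ...)
x * 2                     # 2 4 8    (the 2 is recycled to length 3)

# A length-2 vector is recycled along a length-4 vector
c(1, 2, 3, 4) + c(0, 10)  # 1 12 3 14
```

If the longer length is not a multiple of the shorter one, R still recycles but also prints a warning.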
… R!
In R, everything is a vector
The length() function tells us how many elements a vector has
length(x)
Even a single value in R is considered a vector of length 1
Logical vectors: can only hold the values TRUE or FALSE
logi_vec <- c(TRUE, TRUE, FALSE)
print(logi_vec)
Integer vectors: useful for counts, ranks or indexing positions (e.g. column 3; nucleotide 254731)
int_vec <- 1:5
print(int_vec)
Double vectors: often (& lazily) referred to as numeric
dbl_vec <- c(0.618, 1.414, 2)
print(dbl_vec)
Character vectors: hold text, written inside quotation marks
char_vec <- c("blue", "red", "green")
print(char_vec)
## [1] "blue" "red" "green"
These are the basic building blocks for all R objects
There are two other types: complex & raw
To find what type of vector we have:
typeof(char_vec)
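`typeof()` can be run on each of the example vectors above. Note that a bare number such as `2` is stored as a double; the `L` suffix (e.g. `2L`) or the `:` operator creates an integer. A sketch:

```r
logi_vec <- c(TRUE, TRUE, FALSE)
int_vec  <- 1:5
dbl_vec  <- c(0.618, 1.414, 2)
char_vec <- c("blue", "red", "green")

typeof(logi_vec)  # "logical"
typeof(int_vec)   # "integer"
typeof(dbl_vec)   # "double"
typeof(char_vec)  # "character"

typeof(2)         # "double"
typeof(2L)        # "integer"
```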
A vector can only hold one type of value, so R silently coerces mixed values to a single type
For example, if there is one number amongst some character values, it will be coerced to a character
simp_vec <- c(742, "Evergreen", "Terrace")
simp_vec
Notice how the 742 now has quotation marks
typeof(simp_vec)
as.numeric(simp_vec)
You will see this warning ("NAs introduced by coercion") many times…
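Values that cannot be read as numbers become `NA`, while the ones that can are converted. A sketch that wraps the call in `suppressWarnings()` so the result can be inspected quietly; `nums` is an illustrative name:

```r
simp_vec <- c(742, "Evergreen", "Terrace")

# Only "742" can be converted; the other two become NA
nums <- suppressWarnings(as.numeric(simp_vec))
nums         # 742 NA NA
is.na(nums)  # FALSE TRUE TRUE
```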
Vectors can be coerced up the hierarchy with no information loss
logical \(\rightarrow\) integer \(\rightarrow\) numeric \(\rightarrow\) character
as.integer(logi_vec)
as.character(logi_vec)
Information will be lost going the other way
as.integer(dbl_vec)
as.logical(c(2, 1, 0))
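Both directions of the hierarchy can be seen in a few lines. Mixing types inside `c()` moves everything up to the highest type present, while converting down discards detail. A sketch:

```r
# Mixing types in c() moves everything up the hierarchy
typeof(c(TRUE, 2L))   # "integer"
typeof(c(2L, 0.5))    # "double"
typeof(c(0.5, "a"))   # "character"

# Going down loses information
as.integer(c(0.618, 1.414, 2))  # 0 1 2 (decimals are truncated, not rounded)
as.logical(c(2, 1, 0))          # TRUE TRUE FALSE (any non-zero value is TRUE)
```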
names(x) <- c("a", "b", "c")
x
… character vector
We can subset a vector by position using single square brackets []
x[2]
x[c(1, 3)]
Double square brackets ([[]]) can be used to return single elements only
x[[2]]
If we tried x[[c(1,3)]] we would receive an error message
If a vector has a names attribute, we can call values by name
x[c("a", "c")]
x[["b"]]
[] returned a vector with the same structure, names included
[[]] removed the attributes & just gave the value
Is it better to call by position, or by name?
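The difference between [] and [[]] shows up when we ask for the names of the result. A sketch using the same named vector:

```r
x <- c(1, 2, 4)
names(x) <- c("a", "b", "c")

x["b"]           # a named vector of length 1 (name "b", value 2)
x[["b"]]         # just the value: 2

names(x["b"])    # "b"
names(x[["b"]])  # NULL: the names attribute was dropped
```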
Things to consider:
We can combine the above logical test and subsetting
dbl_vec
dbl_vec > 1
dbl_vec[dbl_vec > 1]
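Because TRUE is coerced to 1 and FALSE to 0, a logical test can also be summed to count matches. A sketch using the same dbl_vec; `which()` (base R) returns the positions of the TRUE values:

```r
dbl_vec <- c(0.618, 1.414, 2)

dbl_vec > 1           # FALSE TRUE TRUE
dbl_vec[dbl_vec > 1]  # 1.414 2.000
sum(dbl_vec > 1)      # 2 values are greater than 1
which(dbl_vec > 1)    # 2 3 (their positions)
```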
An additional logical test: %in%
(read as: "is in")
dbl_vec %in% int_vec
Returns TRUE or FALSE for each value in dbl_vec, depending on whether it is found in int_vec
NB: int_vec was coerced silently to a double vector
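%in% is not symmetric: the result always has the length of the left-hand vector. It also combines naturally with [] to keep only the matching values. A sketch:

```r
dbl_vec <- c(0.618, 1.414, 2)
int_vec <- 1:5

dbl_vec %in% int_vec  # FALSE FALSE TRUE (only 2 is in 1:5)
int_vec %in% dbl_vec  # FALSE TRUE FALSE FALSE FALSE (order matters)

# Keep only the values of dbl_vec that appear in int_vec
dbl_vec[dbl_vec %in% int_vec]  # 2
```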