An object type called a data.frame is probably the most common structure in R. These are very similar to a spreadsheet in Excel, and have columns with information in them. Each column must be the same length!
mouseLife <- data.frame(lifeSpans, Groups)
mouseLife
## lifeSpans Groups
## 1 2.1 treated
## 2 1.9 control
## 3 1.7 control
## 4 1.8 control
## 5 2.3 treated
## 6 2.5 treated
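The vectors lifeSpans and Groups are assumed to have been created earlier in the session. As a minimal sketch, they can be rebuilt from the values printed above; storing Groups as a factor is an assumption, chosen to match the "Levels" line R prints for that column later on.

```r
# Values copied from the printed output; storing Groups as a factor is an assumption
lifeSpans <- c(2.1, 1.9, 1.7, 1.8, 2.3, 2.5)
Groups <- factor(c("treated", "control", "control",
                   "control", "treated", "treated"))

# Both vectors hold six values, so they can sit side by side as columns
length(lifeSpans) == length(Groups)

mouseLife <- data.frame(lifeSpans, Groups)
dim(mouseLife)    # number of rows, then number of columns
names(mouseLife)  # column names taken from the vector names
```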
Note how R automatically named the columns after the vectors we supplied. We could also specify names manually, as we do here for a new column called sex.
mouseLife <- data.frame(lifeSpans, Groups,
sex = c("M", "M", "M", "F", "F", "F"))
mouseLife
## lifeSpans Groups sex
## 1 2.1 treated M
## 2 1.9 control M
## 3 1.7 control M
## 4 1.8 control F
## 5 2.3 treated F
## 6 2.5 treated F
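Every column can be named in the same way, not just a new one. The sketch below is illustrative only: the names lifespan, group, span and arm are hypothetical, and the vectors are rebuilt from the printed values.

```r
# Vectors rebuilt from the printed output (assumed to match the originals)
lifeSpans <- c(2.1, 1.9, 1.7, 1.8, 2.3, 2.5)
Groups <- factor(c("treated", "control", "control",
                   "control", "treated", "treated"))

# Name each column explicitly using name = value pairs
# (lifespan and group are hypothetical names)
renamed <- data.frame(lifespan = lifeSpans, group = Groups)
names(renamed)

# Column names can also be replaced after the data.frame exists
# (span and arm are hypothetical names)
names(renamed) <- c("span", "arm")
names(renamed)
```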
We can easily subset a data.frame based on values in a column.
subset(mouseLife, sex == "M")
## lifeSpans Groups sex
## 1 2.1 treated M
## 2 1.9 control M
## 3 1.7 control M
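The same rows can also be selected with square-bracket indexing, which is what subset() does for us behind the scenes. A minimal sketch, with the data.frame rebuilt from the printed values and hypothetical object names:

```r
# Data rebuilt from the printed output (assumed to match the original object)
mouseLife <- data.frame(
  lifeSpans = c(2.1, 1.9, 1.7, 1.8, 2.3, 2.5),
  Groups = factor(c("treated", "control", "control",
                    "control", "treated", "treated")),
  sex = c("M", "M", "M", "F", "F", "F")
)

# A logical test before the comma picks rows; leaving the part after
# the comma empty keeps every column
males_subset <- subset(mouseLife, sex == "M")
males_index <- mouseLife[mouseLife$sex == "M", ]
all.equal(males_subset, males_index)

# Conditions can be combined with & (and) or | (or)
control_males <- subset(mouseLife, sex == "M" & Groups == "control")
control_males
```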
We can also call a column individually, using the $ symbol followed by the column name.
mouseLife$Groups
## [1] treated control control control treated treated
## Levels: control treated
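Once extracted, a column behaves like any other vector. The short sketch below shows a few things that could be done with it, assuming Groups is stored as a factor (as the "Levels" line suggests) and rebuilding the data from the printed values.

```r
# Data rebuilt from the printed output (assumed to match the original object)
mouseLife <- data.frame(
  lifeSpans = c(2.1, 1.9, 1.7, 1.8, 2.3, 2.5),
  Groups = factor(c("treated", "control", "control",
                    "control", "treated", "treated"))
)

# Double square brackets with a quoted name return the same column as $
same_column <- identical(mouseLife[["Groups"]], mouseLife$Groups)
same_column

levels(mouseLife$Groups)                 # the categories of the factor
group_counts <- table(mouseLife$Groups)  # number of rows in each group
group_counts

mean(mouseLife$lifeSpans)                # numeric columns work with numeric functions
```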
With our data.frame we can conduct some \(t\)-tests.
First we could test the null hypothesis:
\[ H_0: \mu = 2 \]
with alternative hypothesis
\[ H_A: \mu \neq 2 \]
where \(\mu\) is the true mean lifespan.
Here we just need the column of mouseLife called lifeSpans, which we can extract using the $ symbol.
t.test(mouseLife$lifeSpans, mu = 2)
Note that we have now given the function t.test an extra parameter called mu. If we didn’t specify mu = 2, the function defaults to mu = 0.
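To see what mu does, the test statistic can be computed by hand as \(t = (\bar{x} - \mu_0) / (s / \sqrt{n})\) and compared with the result from t.test(). This is a sketch using values rebuilt from the printed data; the names mu0, t_manual, p_manual and res are hypothetical.

```r
# Values rebuilt from the printed output (assumed to match the original column)
lifeSpans <- c(2.1, 1.9, 1.7, 1.8, 2.3, 2.5)

mu0 <- 2                   # the mean stated in the null hypothesis
n <- length(lifeSpans)

# One-sample t statistic: distance of the sample mean from mu0,
# measured in standard errors
t_manual <- (mean(lifeSpans) - mu0) / (sd(lifeSpans) / sqrt(n))

# Two-sided p-value from a t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = n - 1, lower.tail = FALSE)

# t.test() returns a list, so individual results can be pulled out with $
res <- t.test(lifeSpans, mu = mu0)
all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_manual)
```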
control vs treated
To test whether the mean lifespan differs between the control and treated groups, we can use the R formula syntax, where the symbol ~ can be interpreted as:
- depends on, or
- as a function of
t.test(lifeSpans~Groups, mouseLife)
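The formula call is a shortcut for splitting the lifespans by group and passing the two vectors to t.test(). A minimal sketch of that equivalence follows, with data rebuilt from the printed values and hypothetical object names. By default t.test() does not assume equal variances in the two groups (Welch's test); setting var.equal = TRUE requests the classical pooled test instead.

```r
# Data rebuilt from the printed output (assumed to match the original object)
mouseLife <- data.frame(
  lifeSpans = c(2.1, 1.9, 1.7, 1.8, 2.3, 2.5),
  Groups = factor(c("treated", "control", "control",
                    "control", "treated", "treated"))
)

# Split the lifespans into one vector per group
control <- mouseLife$lifeSpans[mouseLife$Groups == "control"]
treated <- mouseLife$lifeSpans[mouseLife$Groups == "treated"]

by_formula <- t.test(lifeSpans ~ Groups, data = mouseLife)
by_vectors <- t.test(control, treated)  # same order as the factor levels

all.equal(by_formula$p.value, by_vectors$p.value)

# The pooled-variance version, for comparison
pooled <- t.test(lifeSpans ~ Groups, data = mouseLife, var.equal = TRUE)
pooled$p.value
```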
To call the help page on any function, we just preface it with a ?
?t.test
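A few other ways of inspecting a function without reading the whole help page are sketched below; the object names are hypothetical. Calling help("t.test") opens the same page as ?t.test.

```r
# help("t.test") is equivalent to ?t.test

args(t.test)     # arguments of the general function
methods(t.test)  # the versions R can choose between

# Look at the version used when the first argument is a vector
default_version <- getS3method("t.test", "default")
default_mu <- formals(default_version)$mu
default_mu       # the value used for mu when none is supplied
```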
Sometimes R help pages can be tricky to understand.
In our code above we have used the second and third versions of the function described on this page. R has automatically detected which version to call by our placement of either a vector in the first position (mouseLife$lifeSpans), or by the placement of a formula (lifeSpans~Groups).
Note also that we specified:
- mu by name.
- mouseLife as our data object by placing it after the formula.
We can also perform linear regression very simply using the function lm().
Here we’ll create a new R object with the results from the linear regression where lifeSpans is dependent on Groups (as with the \(t\)-test).
mouseLife_lm <- lm(lifeSpans~Groups, mouseLife)
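With a two-level factor as the only predictor, the fitted coefficients relate directly to the group means. The sketch below assumes R's default coding for factors (treatment contrasts) and data rebuilt from the printed values; group_means and fitted_coefs are hypothetical names.

```r
# Data rebuilt from the printed output (assumed to match the original object)
mouseLife <- data.frame(
  lifeSpans = c(2.1, 1.9, 1.7, 1.8, 2.3, 2.5),
  Groups = factor(c("treated", "control", "control",
                    "control", "treated", "treated"))
)

mouseLife_lm <- lm(lifeSpans ~ Groups, data = mouseLife)
fitted_coefs <- coef(mouseLife_lm)
fitted_coefs

# Mean lifespan within each group
group_means <- tapply(mouseLife$lifeSpans, mouseLife$Groups, mean)
group_means

# Under the default coding, the intercept is the mean of the first level
# (control) and the second coefficient is treated minus control
unname(group_means["treated"] - group_means["control"])
```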
Now we have the linear model saved, we can find all the important information by passing this object to the functions anova() and summary().
anova(mouseLife_lm)
summary(mouseLife_lm)
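Both functions return objects, so individual numbers can be pulled out rather than read off the screen. A minimal sketch with data rebuilt from the printed values and hypothetical object names; it also checks that, for this two-group model, the p-value in the anova table matches a \(t\)-test run with var.equal = TRUE.

```r
# Data rebuilt from the printed output (assumed to match the original object)
mouseLife <- data.frame(
  lifeSpans = c(2.1, 1.9, 1.7, 1.8, 2.3, 2.5),
  Groups = factor(c("treated", "control", "control",
                    "control", "treated", "treated"))
)
mouseLife_lm <- lm(lifeSpans ~ Groups, data = mouseLife)

# Coefficient table: estimates, standard errors, t values and p-values
coef_table <- coef(summary(mouseLife_lm))
coef_table

# p-value for the Groups term in the anova table
anova_p <- anova(mouseLife_lm)[["Pr(>F)"]][1]

# The same p-value from a pooled-variance t-test
pooled_p <- t.test(lifeSpans ~ Groups, data = mouseLife,
                   var.equal = TRUE)$p.value
all.equal(anova_p, pooled_p)
```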
The interpretation of these is beyond the scope of today’s session.