rmarkdown
rmarkdown is a cohesive way to combine our analysis and our write-up in a single document. knitr is the engine behind this. All practical sessions were written this way.
We can output our analysis directly in several formats, including HTML, MS Word documents, PDF and ioslides presentations. We never need to use MS Word, Excel or Powerpoint again!
The file suffix is .Rmd, and these files allow us to include normal text alongside embedded R code, so an rmarkdown file can hold both the narrative and the analysis.
All output will be generated as a well-formatted, complete file, which can be simply re-generated at any time.
Let’s create our first rmarkdown document. Go to the File drop-down menu in RStudio, select New File > R Markdown..., and save the file as RMarkdownTutorial.Rmd
There are quite a few features of note here. Starting at the top, a header section is contained between the two sets of --- lines on line 1 and line 6.
Editing the YAML header is beyond the scope of this course; however, using this we can set custom .css files, load LaTeX packages and control numerous output parameters.
Immediately following the header (lines 8 to 10) is a code chunk. Chunk names can be placed after the r (before any commas). These are highly advisable and can make it far easier to navigate through your document.
Line 12 is a Section Heading, starting with ##
(There is an option under Tools > Global Options > RMarkdown that needs to be set.)
Check the help for a guide to the syntax: Help > Markdown Quick Reference
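The default format is an html_document, and we can change this later. Generate the default document by clicking Knit HTML. A preview window will appear with the compiled report.
In the report, the output of summary(cars) has been printed, and a plot of temperature Vs. pressure has been embedded. The code for the plot is hidden because its chunk sets echo = FALSE.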
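As a rough sketch of what the two chunks in the default document do when run in the console (this assumes the standard RStudio template, which uses the built-in cars and pressure datasets):

```r
# First chunk of the template: prints a numeric summary of the built-in cars data
summary(cars)

# Second chunk: plots temperature vs. pressure from the built-in pressure data.
# In the template this chunk sets echo = FALSE, so only the plot appears in the report.
plot(pressure)
```

Running these lines interactively is a quick way to confirm that the report contains exactly what the code produces.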
We could also export this as an MS Word document by clicking the small ‘down’ arrow next to the word Knit. By default, this will be Read-Only, but can be helpful for sharing with collaborators.
Saving as a .PDF usually requires an installation of LaTeX, so we’ll ignore that for now.
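Knitting can also be triggered from the console rather than the Knit button. A minimal sketch, assuming the rmarkdown package is installed and the document was saved as RMarkdownTutorial.Rmd in the working directory; the guard simply skips rendering if either assumption fails:

```r
# Assumed file name, taken from the step where the document was first saved
rmd_file <- "RMarkdownTutorial.Rmd"

if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(rmd_file)) {
  # Compile to HTML (the default format for this document)
  rmarkdown::render(rmd_file, output_format = "html_document")
  # Compile the same source to an MS Word document
  rmarkdown::render(rmd_file, output_format = "word_document")
}
```

The same source file produces both outputs, which is why switching formats later costs nothing.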
Now we can modify the code to create our own analysis.
We’ll use the PlantGrowth dataset which comes with R. For more information, see ?PlantGrowth
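Before writing anything, it can help to look at the data in the console. A short sketch using only base R:

```r
# PlantGrowth ships with R in the datasets package, so nothing needs to be imported
str(PlantGrowth)    # column types: weight is numeric, group is a factor
head(PlantGrowth)   # the first six rows
```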
First we should change the title of the report to something suitable, e.g. The Effects of Two Herbicide Treatments on Plant Growth
Now let’s add a section header for our analysis to start the report. Type # Data Description after the header and after leaving a blank line, then describe the data in your own words.
My example text was:
Plants were treated with two different herbicides and the effects on growth were compared using the dried weight of the plants after one month. Both treatments were compared to a control group of plants which were not treated with any herbicide. Each group contained 10 plants, giving a total of 30 plants.
Hopefully you mentioned that there were 10 plants in each group, with a total of 30.
Can we get that information from the data itself?
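We know that the code nrow(PlantGrowth) would give the total number of samples. We can embed this in our data description!
Replace the number 30 in the description with `r nrow(PlantGrowth)` and recompile.
Next, add a section called Required Packages, containing a code chunk with the line library(tidyverse).
Hint: You can create an empty code chunk using Ctrl+Alt+I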
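To preview the values that inline code would insert into the data description, the same expressions can be run in the console. This is only a sketch; the names n_total, n_per_group and sentence are introduced here for illustration:

```r
n_total <- nrow(PlantGrowth)              # total number of plants
n_per_group <- table(PlantGrowth$group)   # number of plants in each group

# Assemble the sentence from the data instead of typing the numbers by hand
sentence <- sprintf(
  "Each group contained %d plants, giving a total of %d plants.",
  as.integer(n_per_group[["ctrl"]]), n_total
)
print(sentence)
```

If the dataset ever changed, a description built this way would update itself on the next recompile.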
This has loaded all of the core tidyverse packages for all code chunks within the whole document. All subsequent code chunks can now use any functions in these packages.
Notice that loading the tidyverse packages gave us an overly informative message. We can turn this off by editing the chunk header.
After the r at the start of the code chunk, add a comma, start typing message and use the auto-complete feature to set message = FALSE
After our description, we could also have a look at the data in a summary. Add the following in a code chunk.
PlantGrowth %>%
  group_by(group) %>%
  summarise(n = n(), Mean = mean(weight))
(Recompile…)
To change this table into a nicely formatted one, we can use the package pander. Load pander into the workspace by placing library(pander) below library(tidyverse).
Then head back to the code chunk and add the following.
PlantGrowth %>%
  group_by(group) %>%
  summarise(n = n(), Mean = mean(weight)) %>%
  pander(caption = "Sample Sizes and average weights for each group")
(Recompile…)
The package pander is great for formatting R output as well, including vectors. Add the following line to your data description:
“The three groups are classified as `r pander(levels(PlantGrowth$group))`”
The group variable is considered by R to be a categorical variable. R stores these as a data-type called a factor, and the different values it can take are known as levels. Here we have three different categories, which represent the control and the two treatments.
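The factor structure can be checked directly in the console. A short base R sketch:

```r
class(PlantGrowth$group)     # the data-type used for categorical variables
levels(PlantGrowth$group)    # the distinct categories the variable can take
nlevels(PlantGrowth$group)   # how many categories there are
```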
We can use ggplot2 for this: it is loaded by library(tidyverse), the boxes are drawn with geom_boxplot(), and the fill colour is set by the group variable.
ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = "Treatment Group", y = "Dried Weight (g)")
Here we can fit a simple linear regression using weight as the response variable and group as the predictor variable.
To fit a linear model in R, we use lm()
In the following line of code, we are saying that the weight measurements are dependent on the group each plant belongs to (weight ~ group). In R we can usually read the ~ symbol as depends on (or is a function of).
model_fit <- lm(weight ~ group, data = PlantGrowth)
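One way to see what weight ~ group means in practice is to compare the fitted coefficients with the group means. This is only an illustrative check in base R, assuming R's default treatment contrasts; the names fit_check and group_means are introduced here and are not part of the tutorial:

```r
fit_check <- lm(weight ~ group, data = PlantGrowth)
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

# The intercept is the mean weight of the control group
coef(fit_check)[["(Intercept)"]]
group_means[["ctrl"]]

# Each remaining coefficient is that treatment's mean minus the control mean
coef(fit_check)[["grouptrt1"]]
group_means[["trt1"]] - group_means[["ctrl"]]
```

So with a categorical predictor, the model is effectively comparing each treatment group against the control.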
We can view the summary() or anova() for a given model by using these functions on the saved model.
summary(model_fit)
anova(model_fit)
To place these as formatted tables in the text we can also use pander(), which has a default format setting for these types of objects.
model_fit %>% summary() %>% pander()
model_fit %>% anova() %>% pander()
You can change the default captions if you like using the argument caption = "my caption" inside the pander() function.
After you’re happy with the way your analysis looks, add a final section called Session Info. The R command sessionInfo() reports the R version and the packages in use; place sessionInfo() inside the pander() function to provide yet another nicely formatted section in our report.
So far we’ve been compiling everything as HTML, but let’s switch to an MS Word document. We could email this to our supervisors, or upload to Google docs for collaborators…
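For the Session Info section, the contents of sessionInfo() can be explored in the console first. A small base R sketch; the name si is introduced here for illustration:

```r
si <- sessionInfo()

si$R.version$version.string   # the version of R used for the analysis
names(si$otherPkgs)           # attached add-on packages (NULL if none are attached)
```

Recording this in the report makes it easier to reproduce the analysis later with the same package versions.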
This basic process is incredibly useful: nothing has to be copied between R and other documents, and every result in the report comes straight from our R analysis.