15 April 2019
rmarkdown
rmarkdown is a cohesive way to combine R code, its results and the surrounding text in a single report. knitr is the engine behind this.
We can output our analysis directly in a range of formats, including ioslides presentations.
We never need to use MS Word, Excel or PowerPoint again!
An rmarkdown document is a plain-text .Rmd file in which normal text sits alongside R code.
Let's create our first rmarkdown document using the File drop-down menu in RStudio, and save it as RMarkdownTutorial.Rmd
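A header section is contained between the --- lines at the top. This is where we can set custom .css files, load LaTeX packages, set parameters etc.
Lines 8 to 10 are a code chunk. R code goes between the two delineators: an opening line of three backticks followed by {r}, and a closing line of three backticks.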
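To see how the header and a code chunk fit together, the sketch below writes a tiny .Rmd file from R. The file name minimal_example.Rmd, the title and the chunk contents are placeholders chosen for illustration, not part of the RStudio template.

```r
# Three backticks, built with strrep() so they do not clash with this code block
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                            # start of the header section
  "title: \"A Minimal Example\"",   # hypothetical title
  "output: html_document",          # the default output format
  "---",                            # end of the header section
  "",
  "## A Subsection Heading",
  "",
  paste0(fence, "{r}"),             # a code chunk begins here
  "summary(cars)",                  # R code goes between the delineators
  fence                             # and the chunk ends here
)

# Write to a temporary directory (hypothetical file name)
rmd_file <- file.path(tempdir(), "minimal_example.Rmd")
writeLines(rmd_lines, rmd_file)
```

Opening the resulting file in RStudio should show the same three parts as the template: header, heading and chunk.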
Line 12 is a Subsection Heading, starting with ##
Check the help for a guide to the syntax.
Help > Markdown Quick Reference
Increasing numbers of # give Section -> Subsection -> Subsubsection etc.
Typewriter font is set using single backticks: `Typewriter`
The default format is an html_document & we can change this later. Generate the default document by clicking Knit.
The Viewer Pane will appear with the compiled report (probably)
The code and output for summary(cars) are shown, and the plot of temperature Vs. pressure has been embedded. The code for the plot is hidden by echo = FALSE.
Saving as a .PDF may require an installation of LaTeX, so we'll ignore that for now.
Now we can modify the code to create our own analysis.
We'll use the PlantGrowth dataset which comes with R. See ?PlantGrowth for details.
First we should change the title of the report to something suitable, e.g. The Effects of Two Herbicide Treatments on Plant Growth
Now let's add a section header for our analysis to start the report.
Type # Data Description after the header, leaving a blank line before it. Then describe the data in your own words.
My example text:
Plants were treated with two different herbicides and the effects on growth were compared using the dried weight of the plants after one month. Both treatments were compared to a control group of plants which were not treated with any herbicide. Each group contained 10 plants, giving a total of 30 plants.
Hopefully you mentioned that there were 10 plants in each group, with a total of 30.
Can we get that information from the data itself?
We know that the code nrow(PlantGrowth) would give the total number of samples. We can embed this in our data description! In place of the typed total, enter the inline code `r nrow(PlantGrowth)`
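The values that inline code would insert can be checked in the console first. This is a minimal sketch using only base R and the built-in PlantGrowth data; the variable names are my own.

```r
# Total number of plants, as inserted by the inline code
n_total <- nrow(PlantGrowth)
n_total
## [1] 30

# Number of plants in each group
n_per_group <- table(PlantGrowth$group)
n_per_group
```

Both values agree with the example description: 10 plants in each of three groups, 30 in total.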
Setup
Create a code chunk containing the line library(tidyverse).
Hint: You can create an empty code chunk using Ctrl+Alt+I
This has loaded the tidyverse packages for the whole document. All subsequent code chunks can use any functions in these packages.
Notice that this gave us an overly informative message. We can turn this off:
After the r at the start of the code chunk, add a comma, then start typing message and use the auto-complete feature to set message = FALSE
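If every chunk should stay quiet, the same option can be set once for the whole document rather than chunk by chunk. This is a minimal sketch, assuming the knitr package is installed; it would normally sit in the first chunk of the .Rmd file.

```r
library(knitr)

# Make message = FALSE the default for all subsequent chunks
opts_chunk$set(message = FALSE)

# Check the current default; individual chunks can still override it
opts_chunk$get("message")
```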
After our description, we could also have a look at the data in a summary. Add the following in a code chunk.
PlantGrowth %>% group_by(group) %>% summarise(n = n(), Mean = mean(weight))
(Recompile…)
To change this table into a nicely formatted one, load the package pander into the workspace and pass the table to pander().
In the Setup section, on the line after loading tidyverse, enter:
library(pander)
Then head back to the code chunk and add the following.
PlantGrowth %>% group_by(group) %>% summarise(n = n(), Mean = mean(weight)) %>% pander(caption = "Sample Sizes and average weights for each group")
(Recompile…)
pander
The package pander is great for formatting R output.
Add the following line to your data description:
"The three groups are classified as `r pander(levels(PlantGrowth$group))`"
(We'll explain this bit of code tomorrow)
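For the boxplot that comes next, one possible version is sketched below. It assumes ggplot2 is available (it is loaded with the tidyverse); the axis labels and the choice to fill the boxes by group are my own assumptions rather than code recovered from the slides.

```r
library(ggplot2)

# Boxplot of dried weight for each treatment group, filled by group
weight_boxplot <- ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot() +
  labs(x = "Treatment group", y = "Dried weight")

weight_boxplot
```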
Next, create a boxplot of the data using geom_boxplot() and the group variable.
Here we can fit a simple linear regression using:
weight as the response variable
group as the predictor variable
To fit a linear model in R, we use lm():
model_fit <- lm(weight ~ group, data = PlantGrowth)
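Because group is a categorical variable, the fitted coefficients describe group means: the intercept is the mean of the control group and the other two coefficients are differences from it. A short check of this using only base R (the name group_means is my own):

```r
model_fit <- lm(weight ~ group, data = PlantGrowth)

# Mean weight in each group
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
group_means
##  ctrl  trt1  trt2 
## 5.032 4.661 5.526

# The intercept matches the control mean ...
coef(model_fit)

# ... and the remaining coefficients are offsets from that mean
group_means[c("trt1", "trt2")] - group_means["ctrl"]
```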
We can view the summary() or anova() for a given model using:
anova(model_fit)
## Analysis of Variance Table
## 
## Response: weight
##           Df  Sum Sq Mean Sq F value  Pr(>F)  
## group      2  3.7663  1.8832  4.8461 0.01591 *
## Residuals 27 10.4921  0.3886                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model_fit)
## 
## Call:
## lm(formula = weight ~ group, data = PlantGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0710 -0.4180 -0.0060  0.2627  1.3690 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.0320     0.1971  25.527   <2e-16 ***
## grouptrt1    -0.3710     0.2788  -1.331   0.1944    
## grouptrt2     0.4940     0.2788   1.772   0.0877 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6234 on 27 degrees of freedom
## Multiple R-squared:  0.2641, Adjusted R-squared:  0.2096 
## F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591
To place these as formatted tables in the text we can use pander()
model_fit %>% anova() %>% pander()
model_fit %>% summary() %>% pander()
You can change the default captions if you like
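As a sketch of changing a caption, pander() accepts a caption argument, as in the summary table earlier. The caption text below is only an example, and the model is refitted so the block runs on its own.

```r
library(pander)

model_fit <- lm(weight ~ group, data = PlantGrowth)

# Replace the default caption on the ANOVA table
pander(anova(model_fit), caption = "ANOVA table for weight by group")

# The same argument can be used for the model summary
pander(summary(model_fit), caption = "Coefficients for weight by group")
```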
plot(model_fit, which = 1)
plot(model_fit, which = 2)
In the chunk header add: echo = FALSE, fig.show = 'hold', fig.width = 6, fig.cap = "Diagnostic plots for model fit"
(Try using tab auto-complete to speed that up)
After you're happy with the way your analysis looks, finish with a section called Session Info and, in a code chunk, run:
sessionInfo()
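sessionInfo() returns an object as well as printing it, so individual details can be pulled out if only part of the record is needed. A small sketch; the variable name is my own.

```r
# Capture the session information instead of only printing it
session_details <- sessionInfo()

# R version string for the session that compiled the report
session_details$R.version$version.string

# Names of the attached add-on packages (NULL if none are attached)
names(session_details$otherPkgs)
```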
So far we've been compiling everything as HTML, but let's switch to an MS Word document. We could email this to our supervisors, or upload to Google docs for collaborators…
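The output format can also be chosen from code instead of the Knit menu. This is a minimal sketch, assuming the rmarkdown package and pandoc are installed (pandoc ships with RStudio) and that RMarkdownTutorial.Rmd is in the working directory; the helper name render_as_word is hypothetical.

```r
library(rmarkdown)

# Compile an .Rmd file to an MS Word document
render_as_word <- function(input) {
  stopifnot(file.exists(input))
  render(input, output_format = "word_document")
}

# Only attempt the render when the tutorial file is present
if (file.exists("RMarkdownTutorial.Rmd")) {
  render_as_word("RMarkdownTutorial.Rmd")
}
```

Passing output_format explicitly leaves the header of the .Rmd file untouched, so clicking Knit still produces the HTML version.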
This basic process is incredibly useful