15 April 2019
rmarkdown
rmarkdown is a cohesive way to
knitr is the engine behind this
rmarkdown
We can output our analysis directly as:
ioslides presentations
We never need to use MS Word, Excel or PowerPoint again!
rmarkdown
.Rmd
R code.
Let's create our first rmarkdown document
File drop-down menu in RStudio
RMarkdownTutorial.Rmd
A header section is contained between the --- lines at the top
.css files, load LaTeX packages, set parameters etc.
Lines 8 to 10 are a code chunk
R code goes between these two delimiters
r
Line 12 is a Subsection Heading, starting with ##
Check the help for a guide to the syntax.
Help > Markdown Quick Reference
# gives Section -> Subsection -> Subsubsection etc.
Typewriter font is set using a single backtick `Typewriter`
The default format is an html_document & we can change this later. Generate the default document by clicking Knit
The Viewer Pane will appear with the compiled report (probably)
summary(cars)
temperature Vs. pressure has been embedded
echo = FALSE.
PDF may require an installation of LaTeX, so we'll ignore that for now.
Now we can modify the code to create our own analysis.
PlantGrowth dataset which comes with R
?PlantGrowth
First we should change the title of the report to something suitable, e.g. The Effects of Two Herbicide Treatments on Plant Growth
Now let's add a section header for our analysis to start the report
# Data Description after the header and after leaving a blank line
My example text:
Plants were treated with two different herbicides and the effects on growth were compared using the dried weight of the plants after one month. Both treatments were compared to a control group of plants which were not treated with any herbicide. Each group contained 10 plants, giving a total of 30 plants.
Hopefully you mentioned that there were 10 plants in each group, with a total of 30.
Can we get that information from the data itself?
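One way to answer this is to check the counts in the console first. The following is a minimal sketch using base R only; the object names n_total and n_per_group are made up for illustration.

```r
# PlantGrowth ships with R, so no packages are needed here

# Total number of plants (rows) in the dataset
n_total <- nrow(PlantGrowth)

# Number of plants in each treatment group
n_per_group <- table(PlantGrowth$group)

n_total
n_per_group
```

Either value could then be placed in the text as inline code instead of being typed by hand.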
We know that the code nrow(PlantGrowth) would give the total number of samples. We can embed this in our data description!
`r nrow(PlantGrowth)`
Setup
library(tidyverse).
Hint: You can create an empty code chunk using Ctrl+Alt+I
This has loaded the tidyverse packages for the whole document. All subsequent code chunks can use any functions in the package.
Notice that this gave us an overly informative message. We can turn this off:
r at the start of the code chunk, add a comma
message and use the auto-complete feature to set message = FALSE
After our description, we could also have a look at the data in a summary. Add the following in a code chunk.
PlantGrowth %>%
  group_by(group) %>%
  summarise(n = n(),
            Mean = mean(weight))
(Recompile…)
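As a cross-check on the summary above, the same group sizes and means can be obtained without any packages. This is a sketch that swaps in base R's tapply() for the dplyr pipeline; it is not part of the tutorial document itself, and the object names are illustrative.

```r
# Group sizes: one count per level of the grouping factor
group_n <- tapply(PlantGrowth$weight, PlantGrowth$group, length)

# Group means of dried weight
group_mean <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

group_n
group_mean
# group_mean is 5.032 (ctrl), 4.661 (trt1) and 5.526 (trt2)
```

If the numbers in the compiled report differ from these, the chunk is probably not using the built-in PlantGrowth data.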
To change this table into a nicely formatted one:
pander into the workspace
pander
In the Setup section, on the line after loading tidyverse enter:
library(pander)
Then head back to the code chunk and add the following.
PlantGrowth %>%
  group_by(group) %>%
  summarise(n = n(),
            Mean = mean(weight)) %>%
  pander(caption = "Sample Sizes and average weights for each group")
(Recompile…)
pander
The package pander is great for formatting R output.
Add the following line to your data description:
"The three groups are classified as `r pander(levels(PlantGrowth$group))`"
(We'll explain this bit of code tomorrow)
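In the meantime, the inner part of that inline code can be tried on its own in the console. This small sketch shows only what levels() returns; how pander() then formats the vector for the sentence is left to the compiled report.

```r
# group is a factor; levels() returns its category names as a character vector
group_levels <- levels(PlantGrowth$group)

group_levels
# "ctrl" "trt1" "trt2"
```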
geom_boxplot()
group variable
Here we can fit a simple linear regression using:
weight as the response variable
group as the predictor variable
To fit a linear model in R:
lm()
model_fit <- lm(weight ~ group, data = PlantGrowth)
We can view the summary() or anova() for a given model using
anova(model_fit)
## Analysis of Variance Table
##
## Response: weight
##           Df  Sum Sq Mean Sq F value  Pr(>F)
## group      2  3.7663  1.8832  4.8461 0.01591 *
## Residuals 27 10.4921  0.3886
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model_fit)
##
## Call:
## lm(formula = weight ~ group, data = PlantGrowth)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.0710 -0.4180 -0.0060  0.2627  1.3690
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5.0320     0.1971  25.527   <2e-16 ***
## grouptrt1    -0.3710     0.2788  -1.331   0.1944
## grouptrt2     0.4940     0.2788   1.772   0.0877 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6234 on 27 degrees of freedom
## Multiple R-squared:  0.2641, Adjusted R-squared:  0.2096
## F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591
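The coefficients in this summary are easier to read once they are related back to the group means. The sketch below assumes R's default treatment contrasts, where the first factor level (ctrl) is the reference; the object names group_means and coefs are illustrative.

```r
model_fit <- lm(weight ~ group, data = PlantGrowth)

# Observed mean weight in each group
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

coefs <- coef(model_fit)

# The intercept is the mean of the reference group (ctrl)
coefs["(Intercept)"]

# Each treatment coefficient is that group's difference from ctrl,
# so adding it to the intercept recovers the group's own mean
coefs["(Intercept)"] + coefs[c("grouptrt1", "grouptrt2")]

group_means
```

So the estimates of -0.371 and 0.494 are the differences between each treatment mean and the control mean of 5.032.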
To place these as formatted tables in the text we can use pander()
model_fit %>% anova() %>% pander()
model_fit %>% summary() %>% pander()
You can change the default captions if you like
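A caption can be supplied through the caption argument of pander(), as was done for the summary table earlier. The following is a sketch only; the caption wording is made up for illustration, and it assumes the pander package is installed.

```r
library(pander)

model_fit <- lm(weight ~ group, data = PlantGrowth)

# The caption text is illustrative; use whatever suits the report
pander(anova(model_fit),
       caption = "Analysis of variance for weight by treatment group")

pander(summary(model_fit),
       caption = "Model summary for weight by treatment group")
```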
plot(model_fit, which = 1)
plot(model_fit, which = 2)
In the chunk header add: echo = FALSE, fig.show = 'hold', fig.width = 6, fig.cap = "Diagnostic plots for model fit"
(Try using tab auto-complete to speed that up)
After you're happy with the way your analysis looks
Session Info
sessionInfo()
So far we've been compiling everything as HTML, but let's switch to an MS Word document. We could email this to our supervisors, or upload to Google Docs for collaborators…
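The switch to Word can also be made from the console rather than through RStudio. This is a hedged sketch of that alternative using rmarkdown::render(); it assumes the file saved earlier as RMarkdownTutorial.Rmd is in the current working directory and that the rmarkdown package is installed.

```r
# File name taken from earlier in the session; adjust if yours differs
rmd_file <- "RMarkdownTutorial.Rmd"

# Only attempt the render if the file is actually present
if (file.exists(rmd_file)) {
  # word_document is the built-in rmarkdown format for .docx output
  rmarkdown::render(rmd_file, output_format = "word_document")
}
```

By default the resulting .docx file is written next to the .Rmd file.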
This basic process is incredibly useful