20 July 2016
rmarkdown
rmarkdown is a cohesive way to
knitr is the engine behind this
All sessions for RAdelaide were written this way
We can output our analysis directly as
ioslides presentations
We never need to use MS Word, Excel or PowerPoint again!
.Rmd files allow us to include normal text alongside embedded R code.
Let's create our first rmarkdown document
File drop-down menu in RStudio
RMarkdownTutorial.Rmd
--- lines
.css files, load LaTeX packages etc.
chunk
r
Help > Markdown Quick Reference
html_document & we can change this later.
Knit HTML
A preview window will appear with the compiled report
summary(cars)
temperature vs. pressure has been embedded
echo = FALSE
We could also export this as an MS Word document
By default, this will be Read-Only
Saving as a .PDF may require an installation of LaTeX.
Now we can modify the code to create our own analysis.
PlantGrowth dataset which comes with R
?PlantGrowth
First we should change the title of the report to something suitable, e.g. The Effects of Two Herbicide Treatments on Plant Growth
Now let's add a section header for our analysis to start the report
# Data Description after the header and after leaving a blank line
Plants were treated with two different herbicides and the effects on growth were compared using the dried weight of the plants after one month. Both treatments were compared to a control group of plants which were not treated with any herbicide. Each group contained 10 plants, giving a total of 30 plants.
Hopefully you mentioned that there were 10 plants in each group, with a total of 30.
Can we get that information from the data itself?
The code nrow(PlantGrowth) would give the total number of samples.
We can embed this in our data description!
`r nrow(PlantGrowth)`
Two possible approaches using dplyr
filter(PlantGrowth, group == "ctrl") %>% nrow()
OR
nrow(filter(PlantGrowth, group == "ctrl"))
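As a minimal sketch of the same idea, the total and all three group sizes can be obtained in one go instead of filtering one group at a time. This assumes dplyr is installed; the names n_total and group_sizes are illustrative and do not come from the tutorial.

```r
library(dplyr)

# Total number of plants in the built-in PlantGrowth data
n_total <- nrow(PlantGrowth)

# One row per group, with the number of plants in column n
group_sizes <- count(PlantGrowth, group)

n_total
group_sizes
```

Any of these values can then be embedded in the text with inline code, in the same way as the total.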
Required Packages
library(dplyr).
Hint: You can create an empty code chunk using Ctrl+Alt+I
This has loaded the package dplyr for the whole document. All subsequent code chunks can use any functions in the package
Notice that loading dplyr gave us an overly informative message.
We can turn this off:
r at the start of the code chunk, add a comma
message and use the auto-complete feature to set message = FALSE
After our description, we could also have a look at the data in a summary
PlantGrowth %>%
  group_by(group) %>%
  summarise(n = n(),
            Mean = mean(weight))
(Recompile…)
To change this table into a nicely formatted one:
pander into the workspace
pander
On the line after loading dplyr, enter:
library(pander)
PlantGrowth %>%
  group_by(group) %>%
  summarise(n = n(),
            Mean = mean(weight)) %>%
  pander(caption = "Sample sizes and average weights for each group")
| group | n | Mean |
|---|---|---|
| ctrl | 10 | 5.032 |
| trt1 | 10 | 4.661 |
| trt2 | 10 | 5.526 |
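As a quick cross-check, the group means in this table can be reproduced with base R alone. This is only a sketch; the name group_means is illustrative.

```r
# Mean dried weight per group, rounded to three decimals
group_means <- round(tapply(PlantGrowth$weight, PlantGrowth$group, mean), 3)
group_means
```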
pander
The package pander is great for formatting R output.
Add the following line to your data description:
"The three groups are classified as `r pander(levels(PlantGrowth$group))`"
(We'll understand this code better after tomorrow…)
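For anyone curious before then, a small sketch of what the inner call returns: group is stored as a factor, and levels() lists its categories as a character vector. The paste() line is only a hand-rolled approximation of turning that vector into a phrase, not pander's actual implementation.

```r
# The categories of the group factor
group_levels <- levels(PlantGrowth$group)
group_levels

# Collapse the vector into a single comma-separated string
paste(group_levels, collapse = ", ")
```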
We can use ggplot2 for this
geom_boxplot()
group variable
Here we can fit a simple linear regression using:
weight as the response variable
group as the predictor variable
We can describe the model in words or mathematically
Data will be fit using the model
\[ y_{ij} = \mu + \alpha_i + \epsilon_{ij} \]
The text creating this is:
y_{ij} = \mu + \alpha_i + \epsilon_{ij}
Add double dollar signs ($$) on the lines immediately before and after the equation
\[ y_{ij} = \mu + \alpha_i + \epsilon_{ij} \]
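In the .Rmd source, the equation with its surrounding dollar signs would look roughly like the block below. One common reading of the symbols, stated here as an assumption rather than something the slides define: y_{ij} is the weight of plant j in group i, \mu is a baseline mean, \alpha_i is the effect of group i, and \epsilon_{ij} is the residual error.

```latex
$$
y_{ij} = \mu + \alpha_i + \epsilon_{ij}
$$
```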
To fit a linear model in R:
lm()
model_fit <- lm(weight ~ group, data = PlantGrowth)
We can view the summary() or anova() for a given model using
summary(model_fit)
anova(model_fit)
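To connect the fitted coefficients back to the model equation, here is a sketch that assumes R's default treatment contrasts. Under that assumption the intercept is the control group mean, and each remaining coefficient is the difference between a treatment mean and the control mean. The name group_means is illustrative.

```r
model_fit <- lm(weight ~ group, data = PlantGrowth)
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

# The intercept matches the mean of the control group
coef(model_fit)[["(Intercept)"]]
group_means[["ctrl"]]

# The trt1 coefficient matches the trt1 mean minus the control mean
coef(model_fit)[["grouptrt1"]]
group_means[["trt1"]] - group_means[["ctrl"]]
```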
To place these as tables in the text, use pander()
summary(model_fit) %>% pander()
anova(model_fit) %>% pander()
You can change the default captions if you like
How could we make a barchart with error bars from this data?
PlantGrowth %>%
  group_by(group) %>%
  summarise(mean = mean(weight), sd = sd(weight)) %>%
  ggplot(aes(x = group, y = mean, fill = group)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                width = 0.6) +
  theme_bw() +
  labs(x = "Treatment Group", y = "Mean Weight (g)") +
  guides(fill = "none") +
  ggtitle("Mean Growth For All Treatment Groups.")
After you're happy with the way your analysis looks
Session Info
sessionInfo()
So far we've been compiling everything as HTML, but let's switch to an MS Word document
We could email this to our supervisors, or upload it to Google Docs for collaborators…
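The same switch can also be made from the R console rather than the Knit button. This is a sketch only: it assumes the rmarkdown package and pandoc are available, and it writes a tiny stand-in document to a temporary file instead of using the tutorial's own .Rmd.

```r
library(rmarkdown)

# A hypothetical stand-in document, written to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c("---",
             "title: \"Demo\"",
             "---",
             "",
             "Total plants: `r nrow(PlantGrowth)`"),
           rmd_file)

# Render to MS Word only if pandoc can be found on this machine
if (pandoc_available()) {
  out_file <- render(rmd_file, output_format = "word_document", quiet = TRUE)
}
```

Swapping "word_document" for "html_document" compiles the same source to HTML.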
This basic process is incredibly useful