1.4: Using R Markdown

15 April 2019

R Markdown

Writing Reports Using `rmarkdown`

rmarkdown is a cohesive way to
- Load & tidy data
- Analyse data, including figures & tables
- Publish everything in a complete report/analysis
Everything is one document, with our analysis code embedded alongside our results
The package knitr is the engine behind this

Writing Reports Using `rmarkdown`

We can output our analysis directly as:

HTML
MS Word Documents
PDF Documents (If you have $\LaTeX$ installed)
Slidy or ioslides presentations

We never need to use MS Word, Excel or PowerPoint again!

Writing Reports Using `rmarkdown`

The file suffix is .Rmd
Include normal text alongside embedded R code.
Create all of our figures & tables directly from the data
Data, experimental and analytic descriptions
Mathematical/Statistical equations
Nicely Formatted Results
Any other information

Creating an R Markdown document

Let's create our first rmarkdown document

Go to the File drop-down menu in RStudio
New File -> R Markdown…

Creating an R Markdown document

Change the Title to: My First Report
Change the Author to your own name
Leave everything else as it is & hit OK
Save the file as RMarkdownTutorial.Rmd

Looking at the file automatically created

A header section is contained between the --- lines at the top

Nothing can be placed before this!
Uses YAML (YAML Ain't Markup Language)
Detailed editing of the header is beyond the scope of this course
Can set custom .css files, load LaTeX packages, set parameters etc.

Looking at the file automatically created

Lines 8 to 10 are a code chunk

Chunks always begin with ```{r}
Chunks always end with ```
The R code to be executed goes between these two delimiters
Chunk names are optional and directly follow the r
Other chunk options are set here, e.g. whether we show or hide the code
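Options can also be set once for every chunk in the document, which is what the setup chunk in the default template does. A minimal sketch, assuming the knitr package is installed; the choice of message = FALSE as a document-wide default is only an illustration, not something the template sets:

```r
# Global chunk defaults, usually placed in the first ("setup") chunk.
# Any individual chunk can still override these in its own header.
knitr::opts_chunk$set(
  echo = TRUE,     # show the R code in the compiled report
  message = FALSE  # hide messages, e.g. those printed when loading packages
)
```

Options given in a chunk's own header take priority over these defaults.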

Looking at the file automatically created

Line 12 is a Subsection Heading, starting with ##

Click the staggered text symbol in the top-right of the Script Window to open the document outline

Chunk names are shown in italics
Section Names in plain text

Getting Help

Check the help for a guide to the syntax.

Help > Markdown Quick Reference

Increasing numbers of # give Section -> Subsection -> Subsubsection etc.
Bold is set using a double asterisk/underscore: **Knit** or __Knit__
Italics are set using a single asterisk/underscore: *Italics* or _Italics_
Typewriter font is set using single backticks: `Typewriter`

Compiling The Report

The default format is an html_document & we can change this later. Generate the default document by clicking Knit

Compiling The Report

The Viewer Pane will appear with the compiled report (probably)

Note the hyperlink to the R Markdown website & the bold typeface for the word Knit
The R code and the results are printed for summary(cars)
The plot of temperature vs. pressure has been embedded
The code for the plot was hidden using echo = FALSE

Compiling The Report

We could also export this as an MS Word document by clicking the small 'down' arrow next to the word Knit.
By default, this will be Read-Only, but can be helpful for sharing with collaborators.
Saving as a .PDF may require an installation of $\LaTeX$, so we'll ignore that for now.
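The Knit button is a shortcut for a function call, so the same report can be compiled from the R Console. A minimal sketch, assuming the rmarkdown package is installed and that RMarkdownTutorial.Rmd sits in the current working directory; the file.exists() guard is only there so the example does nothing if the file is missing:

```r
# File name taken from the earlier "Save the file as" step
rmd_file <- "RMarkdownTutorial.Rmd"

if (file.exists(rmd_file)) {
  # Compile to HTML (the default format in the template)
  rmarkdown::render(rmd_file, output_format = "html_document")
  # Compile the same source to an MS Word document
  rmarkdown::render(rmd_file, output_format = "word_document")
}
```

The output files are written next to the .Rmd file unless told otherwise.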

Making our own report

Now we can modify the code to create our own analysis.

Delete everything in your R Markdown file EXCEPT the header
We'll analyse the PlantGrowth dataset which comes with R
First we'll need to describe the data

?PlantGrowth

Rename the report

First we should change the title of the report to something suitable, e.g. The Effects of Two Herbicide Treatments on Plant Growth

Create a "Data Description" Section

Now let's add a section header for our analysis to start the report

Leave a blank line after the header, then type # Data Description
Use your own words to describe the data

Create a "Data Description" Section

My example text:

Plants were treated with two different herbicides and the effects on growth were compared using the dried weight of the plants after one month. Both treatments were compared to a control group of plants which were not treated with any herbicide. Each group contained 10 plants, giving a total of 30 plants.

Create a "Data Description" Section

Hopefully you mentioned that there were 10 plants in each group, with a total of 30.

Can we get that information from the data itself?

Create a "Data Description" Section

We know that the code nrow(PlantGrowth) would give the total number of samples. We can embed this in our data description!

Instead of the number 30 in your description, enter `r nrow(PlantGrowth)`
Compile the HTML document.
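The other numbers in the description can be taken from the data in the same way. A small sketch using only base R; the object names n_total, n_per_group and n_groups are made up for this example:

```r
# PlantGrowth ships with R, so no packages are needed here
n_total     <- nrow(PlantGrowth)           # total number of plants
n_per_group <- table(PlantGrowth$group)    # number of plants in each group
n_groups    <- nlevels(PlantGrowth$group)  # number of groups, including the control

n_total
n_per_group
n_groups
```

Any of these expressions could be embedded in the text using the same inline syntax as `r nrow(PlantGrowth)`.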

Loading R packages

Before the Data Description heading, add a new section heading called Setup
Create a code chunk with the contents library(tidyverse).
Recompile the HTML

Hint: You can create an empty code chunk using Ctrl+Alt+I

This has loaded the tidyverse packages for the whole document. All subsequent code chunks can use any functions from these packages.

Loading R packages

Notice that this gave us an overly informative message. We can turn this off:

After the r at the start of the code chunk, add a comma
Start typing the word message and use the auto-complete feature to set message = FALSE
Recompile

Writing the Report

After our description, we could also have a look at the data in a summary. Add the following in a code chunk.

PlantGrowth %>% 
  group_by(group) %>% 
  summarise(n = n(),
            Mean = mean(weight))

(Recompile…)
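If you would like to check the numbers in that table, the same group sizes and means can be computed in base R. This is only a cross-check under the assumption that nothing has been loaded, not a replacement for the dplyr code; the object names are made up for this example:

```r
# Group sizes and mean weights without any packages
group_sizes <- tapply(PlantGrowth$weight, PlantGrowth$group, length)
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

# Collect the results in a data frame with the same columns as the summary
data.frame(
  group = names(group_means),
  n     = as.vector(group_sizes),
  Mean  = as.vector(group_means)
)
```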

Formatting Tables

To change this table into a nicely formatted one:

Load pander into the workspace
Use the function pander

In the Setup section, on the line after loading tidyverse enter:

library(pander)

Formatting Tables

Then head back to the code chunk and add the following.

PlantGrowth %>% 
  group_by(group) %>% 
  summarise(n = n(),
            Mean = mean(weight)) %>%
  pander(caption = "Sample Sizes and average weights for each group")

(Recompile…)

Using `pander`

The package pander is great for formatting R output.

Add the following line to your data description:

"The three groups are classified as `r pander(levels(PlantGrowth$group))`"

(We'll explain this bit of code tomorrow)

Add a plot of the data

Create a plot using geom_boxplot()
Fill the boxes based on the group variable
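One possible solution is sketched below, assuming ggplot2 is available (it is loaded as part of the tidyverse). The axis labels and the object name boxplot_by_group are my own choices:

```r
library(ggplot2)

# One box per group, with the fill colour also mapped to group
boxplot_by_group <- ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot() +
  labs(x = "Treatment group", y = "Dried weight")

# Printing the object draws the plot
boxplot_by_group
```

In the report this would go in its own code chunk, optionally with echo = FALSE to hide the code.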

Analyse the data

Here we can fit a simple linear model (equivalent to a one-way ANOVA, as group is categorical) using:

weight as the response variable
group as the predictor variable

To fit a linear model in R:

Use the function lm()
Save the results as a new object

Analyse the data

model_fit <- lm(weight ~ group, data = PlantGrowth)

We can view the summary() or anova() for a given model using

anova(model_fit)

## Analysis of Variance Table
## 
## Response: weight
##           Df  Sum Sq Mean Sq F value  Pr(>F)  
## group      2  3.7663  1.8832  4.8461 0.01591 *
## Residuals 27 10.4921  0.3886                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(model_fit)

## 
## Call:
## lm(formula = weight ~ group, data = PlantGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0710 -0.4180 -0.0060  0.2627  1.3690 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.0320     0.1971  25.527   <2e-16 ***
## grouptrt1    -0.3710     0.2788  -1.331   0.1944    
## grouptrt2     0.4940     0.2788   1.772   0.0877 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6234 on 27 degrees of freedom
## Multiple R-squared:  0.2641, Adjusted R-squared:  0.2096 
## F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591
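Individual values from these tables can be pulled out as numbers, which makes them available for inline reporting. A minimal sketch using base R only; the object names anova_table, f_stat and p_value are made up for this example:

```r
model_fit <- lm(weight ~ group, data = PlantGrowth)

# anova() returns a data frame, so columns can be extracted by name
anova_table <- anova(model_fit)
f_stat  <- anova_table[["F value"]][1]  # F statistic for the group term
p_value <- anova_table[["Pr(>F)"]][1]   # corresponding p-value

round(f_stat, 3)
round(p_value, 4)
```

These should match the F value and Pr(>F) entries in the group row of the table above.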

Analyse the data

To place these as formatted tables in the text we can use pander()

model_fit %>% anova() %>% pander()

model_fit %>% summary() %>% pander()

You can change the default captions if you like
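For example, a caption can be supplied through the caption argument, just as we did for the summary table earlier. A sketch assuming the pander package is installed; the caption wording is only a suggestion:

```r
library(pander)

model_fit <- lm(weight ~ group, data = PlantGrowth)

# Replace the default captions with our own text
pander(anova(model_fit),
       caption = "Analysis of variance for the model weight ~ group")
pander(summary(model_fit),
       caption = "Fitted coefficients for the model weight ~ group")
```

The piped form used on the slide, e.g. model_fit %>% anova() %>% pander(caption = "..."), should give the same result.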

Add Some Diagnostic Plots

plot(model_fit, which = 1)
plot(model_fit, which = 2)

In the chunk header add:
echo=FALSE, fig.show='hold', fig.width = 6, fig.cap = "Diagnostic plots for model fit"

(Try using tab auto-complete to speed that up)

Finishing the analysis

After you're happy with the way your analysis looks

A good habit is to finish with a section called Session Info
Add a code chunk which calls the R command sessionInfo()

So far we've been compiling everything as HTML, but let's switch to an MS Word document. We could email this to our supervisors, or upload it to Google Docs for collaborators…

Summary

This basic process is incredibly useful

We never need to cut & paste anything between R and other documents
Every piece of information comes directly from our R analysis
We can very easily incorporate new data as it arrives
Creates reproducible research
Highly compatible with collaborative analysis & version control (Git)
