15 April 2019

Data Visualisation

Using the Package ggplot2

The package ggplot2

R has numerous plotting functions in the base package graphics

?plot
?boxplot
?hist

Go to the Examples at the bottom of each help page and copy a few lines into the Console to see what they produce
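As a rough sketch of what those help-page examples look like, the lines below call the three base functions on the built-in mtcars data. That dataset is an assumption for illustration only; it is not the data used in the rest of these slides.

```r
# Base graphics on the built-in mtcars data (illustrative only)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")  # scatter plot
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")       # one box per group
h <- hist(mtcars$mpg, main = "Fuel economy",
          xlab = "Miles per gallon")                         # histogram
```

Each function draws a complete plot in one call, which is the main contrast with the layered approach of ggplot2.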

The package ggplot2

  • ggplot2 gives much more flexibility and power
    • Is part of the core tidyverse
  • Has unique syntax and approach
  • We build a plot by adding layers of information such as geometry, colours and themes

The package ggplot2: aesthetics

The main function is ggplot()

  • In this first stage we set the plotting aesthetics using aes()
  • This defines which variable is plotted on each axis and which variables control colour, shape etc.
ggplot(transport, aes(x = weight, y = height))

No data will be plotted. We get the plot area only…
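One way to see why nothing is drawn is to inspect the plot object: it has no layers until a geom is added. The sketch below uses a small made-up stand-in for transport (the values are invented for illustration).

```r
library(ggplot2)

# Made-up stand-in for the transport data (values are illustrative only)
transport <- data.frame(
  weight = c(60, 72, 85, 90),
  height = c(160, 171, 180, 185)
)

p <- ggplot(transport, aes(x = weight, y = height))
length(p$layers)  # number of layers added so far
p                 # draws the axes and background only
```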

The package ggplot2: geometry

  • After defining the plot aesthetics, we:
    • Tell R that "more is to come" by adding a + symbol at the end of the line
    • Add the geometry using various geom_...() functions
ggplot(transport, aes(x = weight, y = height)) +
  geom_point()

The package ggplot2: aesthetics

There are numerous aesthetics available for geom_point()

?geom_point
ggplot(transport, aes(x = weight, y = height, colour = method)) +
  geom_point()
ggplot(transport, aes(x = weight, y = height, 
                      colour = method, shape = gender)) +
  geom_point()

The package ggplot2: aesthetics

We can put the general aesthetics in ggplot() and the ones specific to geom_point() inside that layer's own aes()

ggplot(transport, aes(x = weight, y = height)) +
  geom_point(aes(colour = method, shape = gender))
  • Aesthetics set in ggplot() are passed to all geoms

The package ggplot2: aesthetics

Values set outside of aes() are fixed: they apply to every point instead of varying with the data

ggplot(transport, aes(x = weight, y = height)) +
  geom_point(aes(colour = method, shape = gender), size = 4)
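To make the difference between mapping and setting concrete, here is a minimal sketch using a made-up stand-in for transport (the values are invented). The third plot shows a common slip, where a constant placed inside aes() is treated as data.

```r
library(ggplot2)

# Made-up stand-in for the transport data (values are illustrative only)
transport <- data.frame(
  weight = c(60, 72, 85, 90),
  height = c(160, 171, 180, 185),
  method = c("bike", "car", "bike", "walk")
)

# Mapped: colour varies with the method column and a legend is drawn
mapped <- ggplot(transport, aes(x = weight, y = height)) +
  geom_point(aes(colour = method), size = 4)

# Set: every point is blue and no legend is drawn
fixed <- ggplot(transport, aes(x = weight, y = height)) +
  geom_point(colour = "blue", size = 4)

# Slip: a constant inside aes() is treated as data, so ggplot2
# picks its own colour and adds a legend entry labelled "blue"
mistake <- ggplot(transport, aes(x = weight, y = height)) +
  geom_point(aes(colour = "blue"), size = 4)
```

Calling layer_data() on any of these plots shows the colour that was actually assigned to each point.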

The package ggplot2: adding multiple geoms

ggplot(transport, aes(x = weight, y = height)) +
  geom_point(aes(colour = method, shape = gender)) +
  geom_smooth()

With fewer than 1,000 observations, geom_smooth() defaults to a loess fit. To fit a straight line instead:

ggplot(transport, aes(x = weight, y = height)) +
  geom_point(aes(colour = method, shape = gender)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
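With method = "lm" and formula = y ~ x, the smoother draws an ordinary least-squares line. As a sketch, the same model can be fitted directly with lm() to read off the intercept and slope; the data frame here is a made-up stand-in for transport.

```r
# Made-up stand-in for the transport data (values are illustrative only)
transport <- data.frame(
  weight = c(60, 72, 85, 90, 66),
  height = c(160, 171, 180, 185, 168)
)

# Same model as the plotted line: height regressed on weight
fit <- lm(height ~ weight, data = transport)
coef(fit)  # intercept and slope of the straight line
```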

The package ggplot2: labels

Point labels can be added using geom_text()

ggplot(transport, aes(x = weight, y = height)) +
  geom_point(aes(colour = method, shape = gender)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text(aes(label = name)) +
  labs(x = "Weight (kg)", y = "Height (cm)", 
       shape = "Gender", colour = "Transport")

The package ggplot2: labels

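Labels from geom_text() tend to overlap the points and each other, so we can use geom_text_repel() from the package ggrepel instead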

library(ggrepel)
ggplot(transport, aes(x = weight, y = height)) +
  geom_point(aes(colour = method, shape = gender)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text_repel(aes(label = name)) +
  labs(x = "Weight (kg)", y = "Height (cm)", 
       shape = "Gender", colour = "Transport")

The package ggplot2: labels

Axis and legend labels can be added using labs()

ggplot(transport, aes(x = weight, y = height)) +
  geom_point(aes(colour = method, shape = gender)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text_repel(aes(label = name)) +
  labs(x = "Weight (kg)", y = "Height (cm)", 
       shape = "Gender", colour = "Transport")

The package ggplot2: facets

(This is my favourite feature)

ggplot(transport, aes(x = weight, y = height)) +
  geom_point(aes(colour = method, shape = gender)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_text_repel(aes(label = name)) +
  labs(x = "Weight (kg)", y = "Height (cm)", 
       shape = "Gender", colour = "Transport") +
  facet_wrap(~gender) 

The package ggplot2: Different geoms

Type geom_ in the Console and press the Tab key to list the available geoms

ggplot(transport, aes(x = height, fill = gender)) +
  geom_density(alpha = 0.5)
ggplot(transport, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot()

The package ggplot2: geom_bar()

We can summarise our data before plotting

transport %>%
  filter(!is.na(height)) %>%
  group_by(method, gender) %>%
  summarise(mn_height = mean(height), sd_height = sd(height)) %>%
  ggplot(aes(x = method, y = mn_height, fill = method)) +
  geom_bar(stat = "identity") +
  facet_wrap(~gender) +
  guides(fill = "none")

NB: geom_bar() counts rows by default, so plotting values we have already summarised requires stat = "identity"
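As a small sketch of that point, geom_col() is a shortcut that draws the supplied values directly, while the default geom_bar() counts rows. The summary table below is made up for illustration.

```r
library(ggplot2)

# Made-up summary table (values are illustrative only)
mean_heights <- data.frame(
  method    = c("bike", "car", "walk"),
  mn_height = c(172, 168, 175)
)

# These two draw the same bars
with_identity <- ggplot(mean_heights, aes(x = method, y = mn_height)) +
  geom_bar(stat = "identity")
with_col <- ggplot(mean_heights, aes(x = method, y = mn_height)) +
  geom_col()

# Default geom_bar() counts rows instead, so it takes no y aesthetic
counts <- ggplot(mean_heights, aes(x = method)) +
  geom_bar()
```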

The package ggplot2: geom_errorbar()

transport %>%
  filter(!is.na(height)) %>%
  group_by(method, gender) %>%
  summarise(mn_height = mean(height), sd_height = sd(height)) %>%
  ggplot(aes(x = method, y = mn_height, fill = method)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = mn_height - sd_height,
                    ymax = mn_height + sd_height),
                width = 0.6) +
  facet_wrap(~gender) +
  guides(fill = "none")

Making pie charts

Pie charts are not intuitive to build in ggplot2: we draw a single stacked bar, then switch to polar coordinates

transport %>%
  filter(!is.na(height)) %>%
  group_by(method) %>%
  summarise(n = n()) %>%
  ggplot(aes(x = 1, y = n, fill = method)) +
  geom_bar(stat = "identity", colour = "black") +
  coord_polar("y") +
  theme_void()

The package ggplot2: facets

How could we get histograms for both weight and height using facets?

  • The geom to use is geom_histogram()

The package ggplot2: facets

How could we get histograms for both weight and height using facets?

transport %>%
  gather(key = "measurement", value = "value",
         ends_with("ght")) %>%
  ggplot(aes(x = value, fill = measurement)) +
  geom_histogram(bins = 10, colour = "black") +
  facet_wrap(~measurement, scales = "free_x") +
  guides(fill = "none")
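The key step is reshaping the data so that weight and height share one column, which facet_wrap() can then split on. The sketch below shows the reshaping on a made-up two-row stand-in for transport. It also shows pivot_longer(), the newer equivalent, which assumes tidyr 1.0.0 or later is installed.

```r
library(tidyr)

# Made-up stand-in for the transport data (values are illustrative only)
transport <- data.frame(
  name   = c("Ann", "Bob"),
  weight = c(60, 85),
  height = c(160, 180)
)

# Stack the columns whose names end in "ght" into key/value pairs
long <- gather(transport, key = "measurement", value = "value",
               ends_with("ght"))
long  # one row per person per measurement

# Assumption: tidyr >= 1.0.0. pivot_longer() does the same reshaping
long2 <- pivot_longer(transport, cols = ends_with("ght"),
                      names_to = "measurement", values_to = "value")
```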

The package ggplot2: facets

transport %>%
  gather(key = "measurement", value = "value",
         ends_with("ght")) %>%
  ggplot(aes(x = gender, y = value, fill = gender)) +
  geom_boxplot() +
  facet_wrap(~measurement, scales = "free_y")