Themes + axes + annotation

Lecture 7

Dr. Mine Çetinkaya-Rundel

Duke University
STA 313 - Spring 2024

Warm up

Announcements

  • HW 2 is due today at 5 pm

  • Lab tomorrow: Rubenstein Library visit to see Duke’s collection of historical data visualizations

  • Thursday: Du Bois visualizations + reading

Setup

# load packages
library(countdown)
library(tidyverse)
library(janitor)
library(colorspace)
library(fs)
library(palmerpenguins)
library(ThemePark)
library(ggthemes)
library(duke)
library(tidykids)
library(openintro)
library(glue)

# set theme for ggplot2
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 18))

# set figure parameters for knitr
knitr::opts_chunk$set(
  fig.width = 7, # 7" width
  fig.asp = 0.618, # the golden ratio
  fig.retina = 3, # dpi multiplier for displaying HTML output on retina
  fig.align = "center", # center align figures
  dpi = 300 # higher dpi, sharper image
)

Themes

Complete themes

p <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

p + theme_gray() + labs(title = "Gray")
p + theme_void() + labs(title = "Void")
p + theme_dark() + labs(title = "Dark")

Themes from ggthemes

p + theme_fivethirtyeight() + labs(title = "FiveThirtyEight")
p + theme_economist() + labs(title = "Economist")
p + theme_wsj() + labs(title = "Wall Street Journal")

Themes and color scales from ggthemes

p + 
  aes(color = species) +
  scale_color_wsj() +
  theme_wsj() + 
  labs(title = "Wall Street Journal")

Themes from ThemePark

p + 
  geom_point(color = barbie_theme_colors["medium"]) +
  theme_barbie()
p + 
  geom_point(color = gameofthrones_theme_colors["medium"]) +
  theme_gameofthrones(gameofthrones_font = TRUE)

Duke theme!

p + 
  aes(color = species) +
  theme_duke() +
  scale_duke_color_discrete()
Warning: Removed 2 rows containing missing values (`geom_point()`).

Modifying theme elements

p + 
  labs(title = "Palmer Penguins") +
  theme(
    plot.title = element_text(color = "red", face = "bold"),
    plot.background = element_rect(color = "red", fill = "mistyrose")
  )

Axes

Axis breaks

How, if at all, can the following figure be improved with custom axis breaks?

kids_plot <- tidykids |>
  mutate(year = as.numeric(year)) |>
  filter(
    state %in% c("North Carolina", "California", "Florida"),
    variable == "pubhealth"
  ) |>
  ggplot(aes(x = year, y = inf_adj_perchild, color = state, linetype = state)) +
  geom_smooth(se = FALSE) +
  scale_color_colorblind() +
  theme(legend.position = c(0.15, 0.8))

kids_plot
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Context matters

kids_plot +
  scale_x_continuous(breaks = seq(from = 1996, to = 2016, by = 2))

Conciseness matters

kids_plot +
  scale_x_continuous(breaks = seq(from = 1996, to = 2016, by = 4))

Precision matters

kids_plot +
  scale_x_continuous(breaks = seq(2000, 2022, 4)) +
  labs(x = "Election year")

Annotation

Why annotate?

Video recap

geom_text()

Text labels are useful when individual observations are identifiable, but they quickly become overwhelming when many points are labeled…

ggplot(state_stats, aes(x = homeownership, y = pop2010)) + 
  geom_point()

ggplot(state_stats, aes(x = homeownership, y = pop2010)) + 
  geom_text(aes(label = abbr))

geom_text()

Now go ahead and improve it!

Open ae-06 and implement the improvements you came up with in Part 1.
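
One possible improvement, sketched here under assumptions rather than given as the intended solution, is to keep the points for all states and label only a few of them. The 15 million cutoff (`pop_cutoff`) and the names `big_states` and `p_labeled` are hypothetical choices for illustration.

```r
library(ggplot2)
library(dplyr)
library(openintro)

# Hypothetical cutoff: only label states with more than 15 million residents
pop_cutoff <- 15e6
big_states <- state_stats |>
  filter(pop2010 > pop_cutoff)

p_labeled <- ggplot(state_stats, aes(x = homeownership, y = pop2010)) +
  # every state is still drawn as a point
  geom_point(color = "gray60") +
  # the text layer gets its own, smaller data frame; x and y are inherited
  geom_text(data = big_states, aes(label = abbr), vjust = -0.7)

p_labeled
```

Giving the text layer its own `data` argument keeps the full scatterplot as context while limiting the labels to the observations worth naming.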

Revisit Durham AQI

Recreate the following visualization in Part 2 of ae-06.

Showing all of the data doesn’t tell a story

Highlighting in ggplot2

We have (at least) two options:

  1. Native ggplot2 – use layers

  2. gghighlight: https://yutannihilation.github.io/gghighlight/articles/gghighlight.html

Data: SF AQI

sf_files <- fs::dir_ls(here::here("data/san-francisco"))
sf <- read_csv(sf_files, na = c(".", ""))

sf <- sf |>
  janitor::clean_names() |>
  mutate(date = mdy(date)) |>
  arrange(date) |>
  select(date, aqi_value)

sf
# A tibble: 2,557 × 2
   date       aqi_value
   <date>         <dbl>
 1 2016-01-01        32
 2 2016-01-02        37
 3 2016-01-03        45
 4 2016-01-04        33
 5 2016-01-05        27
 6 2016-01-06        39
 7 2016-01-07        39
 8 2016-01-08        31
 9 2016-01-09        20
10 2016-01-10        20
# ℹ 2,547 more rows

Data prep

sf <- sf |>
  mutate(
    year = year(date),
    day_of_year = yday(date)
  )
# check: the first two days of each year should have day_of_year 1 and 2
sf |>
  filter(day_of_year < 3)
# A tibble: 14 × 4
   date       aqi_value  year day_of_year
   <date>         <dbl> <dbl>       <dbl>
 1 2016-01-01        32  2016           1
 2 2016-01-02        37  2016           2
 3 2017-01-01        55  2017           1
 4 2017-01-02        36  2017           2
 5 2018-01-01        87  2018           1
 6 2018-01-02        95  2018           2
 7 2019-01-01        33  2019           1
 8 2019-01-02        50  2019           2
 9 2020-01-01        53  2020           1
10 2020-01-02        43  2020           2
11 2021-01-01        79  2021           1
12 2021-01-02        57  2021           2
13 2022-01-01        53  2022           1
14 2022-01-02        55  2022           2

Plot AQI over years

ggplot(sf, aes(x = day_of_year, y = aqi_value, group = year)) +
  geom_line()

Plot AQI over years

ggplot(sf, aes(x = day_of_year, y = aqi_value, group = year, color = year)) +
  geom_line()

Plot AQI over years

ggplot(sf, aes(x = day_of_year, y = aqi_value, group = year, color = factor(year))) +
  geom_line()

Highlight 2016

ggplot(sf, aes(x = day_of_year, y = aqi_value, group = year)) +
  # all years in gray, as context
  geom_line(color = "gray") +
  # the highlighted year is drawn on top, in red
  geom_line(data = sf |> filter(year == 2016), color = "red") +
  labs(
    title = "AQI levels in SF in 2016",
    subtitle = "Versus all years 2016 - 2022",
    x = "Day of year", y = "AQI value"
  )

Highlight 2017

ggplot(sf, aes(x = day_of_year, y = aqi_value, group = year)) +
  geom_line(color = "gray") +
  geom_line(data = sf |> filter(year == 2017), color = "red") +
  labs(
    title = "AQI levels in SF in 2017",
    subtitle = "Versus all years 2016 - 2022",
    x = "Day of year", y = "AQI value"
  )

Highlight 2018

ggplot(sf, aes(x = day_of_year, y = aqi_value, group = year)) +
  geom_line(color = "gray") +
  geom_line(data = sf |> filter(year == 2018), color = "red") +
  labs(
    title = "AQI levels in SF in 2018",
    subtitle = "Versus all years 2016 - 2022",
    x = "Day of year", y = "AQI value"
  )

Highlight any year

year_to_highlight <- 2018

ggplot(sf, aes(x = day_of_year, y = aqi_value, group = year)) +
  geom_line(color = "gray") +
  geom_line(data = sf |> filter(year == year_to_highlight), color = "red") +
  labs(
    title = glue("AQI levels in SF in {year_to_highlight}"),
    subtitle = "Versus all years 2016 - 2022",
    x = "Day of year", y = "AQI value"
  )
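
Since only `year_to_highlight` changes from one plot to the next, the same code could also be wrapped in a function. The following is a minimal sketch, not part of the original slides: `plot_highlighted_year()` and the simulated `toy_aqi` data frame are hypothetical stand-ins (the real `sf` data is read from local files), and the function assumes its input has `year`, `day_of_year`, and `aqi_value` columns.

```r
library(ggplot2)
library(dplyr)
library(glue)

# Simulated stand-in for `sf` (hypothetical values, three years of daily data)
set.seed(313)
toy_aqi <- data.frame(
  year = rep(2016:2018, each = 365),
  day_of_year = rep(1:365, times = 3),
  aqi_value = round(runif(3 * 365, min = 20, max = 100))
)

# Hypothetical helper: all years in gray, one year in red
plot_highlighted_year <- function(data, year_to_highlight) {
  ggplot(data, aes(x = day_of_year, y = aqi_value, group = year)) +
    geom_line(color = "gray") +
    geom_line(data = data |> filter(year == year_to_highlight), color = "red") +
    labs(
      title = glue("AQI levels in SF in {year_to_highlight}"),
      # the year range is computed from the data instead of being typed in
      subtitle = glue("Versus all years {min(data$year)} - {max(data$year)}"),
      x = "Day of year", y = "AQI value"
    )
}

p_2017 <- plot_highlighted_year(toy_aqi, 2017)
p_2017
```

With the real data, the call would be `plot_highlighted_year(sf, 2018)`, assuming `sf` has been prepared as in the data prep step.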

Highlight with gghighlight

Highlight years where maximum observed AQI value exceeds 200 using gghighlight.
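
A minimal sketch of how this could look, offered as one possible approach rather than the intended solution. It assumes the gghighlight package is installed and uses a simulated data frame, `toy_aqi`, as a hypothetical stand-in for `sf`; one extreme value is planted in 2018 so that exactly one year meets the condition.

```r
library(ggplot2)
library(gghighlight)

# Simulated stand-in for `sf` (hypothetical values between 20 and 100)
set.seed(313)
toy_aqi <- data.frame(
  year = rep(2016:2018, each = 365),
  day_of_year = rep(1:365, times = 3),
  aqi_value = round(runif(3 * 365, min = 20, max = 100))
)
# Plant a single extreme day in 2018 so that its maximum exceeds 200
toy_aqi$aqi_value[toy_aqi$year == 2018 & toy_aqi$day_of_year == 320] <- 250

p_high <- ggplot(
  toy_aqi,
  aes(x = day_of_year, y = aqi_value, group = year, color = factor(year))
) +
  geom_line() +
  # the predicate is evaluated once per group (here: per year)
  gghighlight(max(aqi_value) > 200) +
  labs(x = "Day of year", y = "AQI value")

p_high

# The same condition computed by hand, to see which years qualify
max_by_year <- tapply(toy_aqi$aqi_value, toy_aqi$year, max)
names(max_by_year)[max_by_year > 200]
```

Unlike the layered approach, the condition here is stated once and gghighlight decides which lines stay colored, so no year has to be hard-coded.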