Lecture 10
Duke University
STA 313 - Spring 2024
Sequential plots: Motivation, then resolution
A single plot: Resolution, and hidden in it motivation
Project note: you’re asked to create two plots per question. One possible approach: Start with a plot showing the raw data, and show derived quantities (e.g. percent increases, averages, coefficients of fitted models) in the subsequent plot.
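As a rough sketch of this two-plot approach, the example below uses the built-in `mtcars` data, which is an assumption for illustration only; your project data will differ. The first plot shows the raw observations and the second shows a derived quantity (mean mpg per cylinder count). The names `p_raw`, `mpg_by_cyl`, and `p_derived` are hypothetical.

```r
library(ggplot2)
library(dplyr)

# Plot 1: the raw data, one point per car
p_raw <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.1, height = 0) +
  labs(x = "Number of cylinders", y = "Miles per gallon")

# Derived quantity: average mpg within each cylinder group
mpg_by_cyl <- mtcars |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg))

# Plot 2: the derived averages only
p_derived <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Number of cylinders", y = "Average miles per gallon")
```

Showing `p_raw` first lets the audience see the spread before `p_derived` reduces it to a single number per group.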
When you’re trying to show too much data at once, you may end up not showing anything.
Never assume your audience can rapidly process complex visual displays
Don’t add variables to your plot that are tangential to your story
Don’t jump straight to a highly complex figure; first show an easily digestible subset (e.g., show one facet first)
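One way to show an easily digestible subset first, sketched here with the built-in `mtcars` data (an assumption for illustration): build a single-panel plot from a filtered subset, then follow it with the full faceted figure. The names `four_cyl`, `p_one`, and `p_all` are hypothetical.

```r
library(ggplot2)
library(dplyr)

# Step 1: one panel only (4-cylinder cars), easy to read at a glance
four_cyl <- mtcars |>
  filter(cyl == 4)

p_one <- ggplot(four_cyl, aes(x = disp, y = mpg)) +
  geom_point() +
  labs(title = "4-cylinder cars")

# Step 2: the full faceted figure, shown once the audience
# already knows how to read a single panel
p_all <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl)
```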
Aim for memorable, but clear
Project note: Make sure to leave time to iterate on your plots after you practice your presentation. If certain plots are getting too wordy to explain, take time to simplify them!
Be consistent but don’t be repetitive.
Use consistent features throughout plots (e.g., same color represents same level on all plots)
Aim to use a different type of visualization for each distinct analysis
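A minimal sketch of keeping colors consistent across plots: define one named palette and pass it to every manual scale. The palette name `vs_colors` and the plot names are hypothetical, and `mtcars` is used only for illustration.

```r
library(ggplot2)

# One shared, named palette: each level of `vs` always gets the same color
vs_colors <- c("0" = "#4d4009", "1" = "#FF4B91")

# Box plot filled with the shared palette
p_box_col <- ggplot(mtcars, aes(x = factor(vs), y = mpg, fill = factor(vs))) +
  geom_boxplot(show.legend = FALSE) +
  scale_fill_manual(values = vs_colors)

# A different plot type, same palette, so the colors mean the same thing
p_scatter_col <- ggplot(mtcars, aes(x = disp, y = mpg, color = factor(vs))) +
  geom_point() +
  scale_color_manual(values = vs_colors)
```

Because the vector is named, the mapping from level to color does not depend on which levels happen to appear in a given plot.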
project-1
https://vizdata-s24.github.io/project-1-YOUR_TEAM_NAME/
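library(tidyverse) # ggplot2, plus tibble for rownames_to_column()
library(ggrepel)   # geom_text_repel()

# Histogram of fuel efficiency
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Box plots of fuel efficiency by engine shape
p_box <- ggplot(mtcars, aes(x = factor(vs), y = mpg)) +
  geom_boxplot()

# Scatterplot of fuel efficiency against displacement
p_scatter <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

# The same relationship drawn with repelled car-name labels
p_text <- mtcars |>
  rownames_to_column() |>
  ggplot(aes(x = disp, y = mpg)) +
  geom_text_repel(aes(label = rowname)) +
  coord_cartesian(clip = "off")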
The plot will fill the empty space in the slide.
If there is more text on the slide
The plot will shrink
To make room for the text
fig-width
For a zoomed-in look
fig-width
For a zoomed-out look
fig-width affects text size
First, ask yourself, must you include multiple plots on a slide? For example, is your narrative about comparing results from two plots?
If no, then don’t! Move the second plot to the next slide!
If yes,
Insert columns using the Insert anything tool
Use the layout-ncol chunk option
Use the patchwork package
Possibly, use pivoting to reshape your data and then use facets
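The pivot-then-facet option can be sketched as follows, assuming the built-in `mtcars` data and two hypothetical variables of interest (`disp` and `hp`). Instead of drawing two separate plots, the two columns are stacked into one long column and each becomes a facet.

```r
library(ggplot2)
library(tidyr)

# Reshape: one row per car per measure instead of one column per measure
mtcars_long <- mtcars |>
  pivot_longer(
    cols = c(disp, hp),
    names_to = "measure",
    values_to = "value"
  )

# One plot, two panels; free x scales because the measures have different units
p_facets <- ggplot(mtcars_long, aes(x = value, y = mpg)) +
  geom_point() +
  facet_wrap(~measure, scales = "free_x")
```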
Insert > Slide Columns
Quarto will automatically resize your plots to fit side-by-side.
layout-ncol
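A minimal sketch of a chunk that uses this option, assuming two plots printed from the same chunk. The plot definitions repeat `p_hist` and `p_box` from earlier so the chunk stands on its own.

```{r}
#| layout-ncol: 2
library(ggplot2)

# Two plots printed from one chunk; layout-ncol: 2 asks Quarto
# to arrange the outputs in two columns
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)
p_box <- ggplot(mtcars, aes(x = factor(vs), y = mpg)) +
  geom_boxplot()

p_hist
p_box
```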
Learn more at https://patchwork.data-imaginist.com.
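As a short sketch of the patchwork approach, the example below combines three of the earlier plots into one figure. The layout and the title text are illustrative choices, and `p_combined` is a hypothetical name.

```r
library(ggplot2)
library(patchwork)

p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)
p_box <- ggplot(mtcars, aes(x = factor(vs), y = mpg)) +
  geom_boxplot()
p_scatter <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

# `|` places plots side by side, `/` stacks them vertically
p_combined <- (p_hist | p_box) / p_scatter +
  plot_annotation(title = "Fuel efficiency in mtcars")
```

Unlike layout-ncol, the result is a single plot object, so it gets one set of chunk options and can carry a shared title.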
Look into the source code at https://github.com/vizdata-s24/vizdata-s24/tree/main/slides.
# A tibble: 500 × 6
title author date abstract column url
<chr> <chr> <date> <chr> <chr> <chr>
1 All the world’s a stage Anna … 2024-02-22 If we a… STUDE… http…
2 Words that matter: For Alexei Navalny Carol… 2024-02-22 In some… STUDE… http…
3 Which would you save: Friend or roma… Jess … 2024-02-22 Love sh… STUDE… http…
4 Happiness is not what you’re looking… Paul … 2024-02-21 We hing… STUDE… http…
5 Closing Duke's Herbarium: A fear of … Matth… 2024-02-21 Without… LETTE… http…
6 CS Majors launch 'ambiguous and labe… Monda… 2024-02-20 Unlike … STUDE… http…
7 The fear of being single Heidi… 2024-02-20 But it … STUDE… http…
8 Save the Duke Herbarium Henry… 2024-02-17 The Duk… LETTE… http…
9 What Duke can learn from retiring ex… Rober… 2024-02-17 In Duke… GUEST… http…
10 Love, love Gabri… 2024-02-16 Somehow… STUDE… http…
# ℹ 490 more rows
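chronicle_to_plot <- chronicle |>
  # One row per word in each abstract
  tidytext::unnest_tokens(word, abstract) |>
  # Drop common stop words, then attach AFINN scores (unmatched words get NA)
  anti_join(tidytext::stop_words) |>
  left_join(tidytext::get_sentiments("afinn")) |>
  # Total sentiment per article
  group_by(author, title) |>
  summarize(total_sentiment = sum(value, na.rm = TRUE), .groups = "drop") |>
  # Article count and average sentiment per author
  group_by(author) |>
  summarize(
    n_articles = n(),
    avg_sentiment = mean(total_sentiment, na.rm = TRUE)
  ) |>
  # Keep named authors with more than one article
  filter(n_articles > 1 & !is.na(author)) |>
  arrange(desc(avg_sentiment)) |>
  # Rows 1-10 are the highest averages; rows 49-58 are hard-coded
  # positions intended to pick out the 10 lowest
  slice(c(1:10, 49:58)) |>
  mutate(
    author = fct_reorder(author, avg_sentiment),
    neg_pos = if_else(avg_sentiment < 0, "neg", "pos"),
    # Author labels sit just across the zero line from their bar
    label_position = if_else(neg_pos == "neg", 0.25, -0.25)
  )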
Joining with `by = join_by(word)`
Joining with `by = join_by(word)`
# A tibble: 20 × 5
author n_articles avg_sentiment neg_pos label_position
<fct> <int> <dbl> <chr> <dbl>
1 Alex Berkman 2 5.5 pos -0.25
2 Amy Unell 2 4 pos -0.25
3 Gabrielle Mollin 2 2.5 pos -0.25
4 Miranda Straubel 2 2.5 pos -0.25
5 Anna Sorensen 4 2.25 pos -0.25
6 Monday Monday 17 1.53 pos -0.25
7 Duke Climate Coalition 2 1.5 pos -0.25
8 Susan Chemmanoor 6 1.5 pos -0.25
9 Jess Jiang 5 1.4 pos -0.25
10 Angikar Ghosal 9 1.33 pos -0.25
11 Viktoria Wulff-Andersen 7 -1 neg 0.25
12 Pilar Kelly 9 -1.22 neg 0.25
13 Billy Cao 5 -1.4 neg 0.25
14 Valerie Tan 11 -1.45 neg 0.25
15 Dan Reznichenko 3 -1.67 neg 0.25
16 Matthew Arakaky 3 -1.67 neg 0.25
17 Sydney Brown 2 -2 neg 0.25
18 Spencer Chang 3 -2.33 neg 0.25
19 Ayesham Khan 2 -4 neg 0.25
20 Carol Apollonio 3 -4 neg 0.25
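To trace what the scoring steps in the pipeline above do, here is a toy version that can be checked by hand. Everything in it is made up for illustration: `toy_articles` stands in for `chronicle`, and `toy_lexicon` is a hypothetical three-word stand-in for the AFINN lexicon (it is not loaded from AFINN).

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the AFINN lexicon: a word and its score
toy_lexicon <- tibble(
  word = c("love", "fear", "happy"),
  value = c(3, -2, 3)
)

# Hypothetical articles: author A wrote two, author B wrote one
toy_articles <- tibble(
  author = c("A", "A", "B"),
  title = c("t1", "t2", "t3"),
  abstract = c(
    "Love makes us happy",
    "The fear of being single",
    "Save the herbarium"
  )
)

toy_scores <- toy_articles |>
  unnest_tokens(word, abstract) |>        # one lowercase word per row
  anti_join(stop_words, by = "word") |>   # drop common stop words
  left_join(toy_lexicon, by = "word") |>  # unmatched words get NA
  group_by(author, title) |>
  summarize(total_sentiment = sum(value, na.rm = TRUE), .groups = "drop") |>
  group_by(author) |>
  summarize(
    n_articles = n(),
    avg_sentiment = mean(total_sentiment)
  )
```

An article with no scored words, like "t3" here, contributes a total of 0 rather than NA because of `na.rm = TRUE`, which pulls an author's average toward zero.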
chronicle_to_plot |>
  ggplot(aes(y = author, x = avg_sentiment)) +
  geom_col(aes(fill = neg_pos), show.legend = FALSE) +
  # Author names next to the zero line, on the side opposite each bar
  geom_text(
    aes(x = label_position, label = author, color = neg_pos),
    hjust = c(rep(1, 10), rep(0, 10)),
    show.legend = FALSE,
    fontface = "bold"
  ) +
  scale_fill_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91")) +
  scale_color_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91"))
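The label code above sets `hjust` by row position (first 10 rows, last 10 rows), which relies on the data staying in that order. One possible alternative, sketched here on a hypothetical four-row miniature of `chronicle_to_plot`, is to store the justification in a column and map it as an aesthetic. The column name `label_hjust` and the toy data are assumptions for illustration.

```r
library(ggplot2)
library(dplyr)
library(forcats)

# Hypothetical miniature with the same columns as chronicle_to_plot
toy_to_plot <- tibble(
  author = c("Author A", "Author B", "Author C", "Author D"),
  avg_sentiment = c(2.5, 1.5, -1, -4)
) |>
  mutate(
    author = fct_reorder(author, avg_sentiment),
    neg_pos = if_else(avg_sentiment < 0, "neg", "pos"),
    label_position = if_else(neg_pos == "neg", 0.25, -0.25),
    # Right-align labels of positive bars, left-align labels of negative bars
    label_hjust = if_else(neg_pos == "neg", 0, 1)
  )

p_labels <- ggplot(toy_to_plot, aes(y = author, x = avg_sentiment)) +
  geom_col(aes(fill = neg_pos), show.legend = FALSE) +
  geom_text(
    aes(
      x = label_position, label = author,
      color = neg_pos, hjust = label_hjust
    ),
    show.legend = FALSE,
    fontface = "bold"
  )
```

With the justification tied to `neg_pos`, the labels stay correct if the number of positive and negative authors changes.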
chronicle_to_plot |>
  ggplot(aes(y = author, x = avg_sentiment)) +
  geom_col(aes(fill = neg_pos), show.legend = FALSE) +
  geom_text(
    aes(x = label_position, label = author, color = neg_pos),
    hjust = c(rep(1, 10), rep(0, 10)),
    show.legend = FALSE,
    fontface = "bold"
  ) +
  # New in this step: rounded scores printed inside the end of each bar
  geom_text(
    aes(label = round(avg_sentiment, 1)),
    hjust = c(rep(1.25, 10), rep(-0.25, 10)),
    color = "white",
    fontface = "bold"
  ) +
  scale_fill_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91")) +
  scale_color_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91"))
chronicle_to_plot |>
  ggplot(aes(y = author, x = avg_sentiment)) +
  geom_col(aes(fill = neg_pos), show.legend = FALSE) +
  geom_text(
    aes(x = label_position, label = author, color = neg_pos),
    hjust = c(rep(1, 10), rep(0, 10)),
    show.legend = FALSE,
    fontface = "bold"
  ) +
  geom_text(
    aes(label = round(avg_sentiment, 1)),
    hjust = c(rep(1.25, 10), rep(-0.25, 10)),
    color = "white",
    fontface = "bold"
  ) +
  scale_fill_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91")) +
  scale_color_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91")) +
  # New in this step: symmetric x-axis, integer breaks, no y-axis breaks
  scale_x_continuous(breaks = -5:5, minor_breaks = NULL) +
  scale_y_discrete(breaks = NULL) +
  coord_cartesian(xlim = c(-5, 5))
chronicle_to_plot |>
  ggplot(aes(y = author, x = avg_sentiment)) +
  geom_col(aes(fill = neg_pos), show.legend = FALSE) +
  geom_text(
    aes(x = label_position, label = author, color = neg_pos),
    hjust = c(rep(1, 10), rep(0, 10)),
    show.legend = FALSE,
    fontface = "bold"
  ) +
  geom_text(
    aes(label = round(avg_sentiment, 1)),
    hjust = c(rep(1.25, 10), rep(-0.25, 10)),
    color = "white",
    fontface = "bold"
  ) +
  scale_fill_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91")) +
  scale_color_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91")) +
  scale_x_continuous(breaks = -5:5, minor_breaks = NULL) +
  scale_y_discrete(breaks = NULL) +
  coord_cartesian(xlim = c(-5, 5)) +
  # New in this step: axis label, title, subtitle, and caption
  labs(
    x = "negative ← Average sentiment score (AFINN) → positive",
    y = NULL,
    title = "The Chronicle - Opinion pieces\nAverage sentiment scores of abstracts by author",
    subtitle = "Top 10 average positive and negative scores",
    caption = "Source: Data scraped from The Chronicle on Feb 21, 2024"
  )
chronicle_to_plot |>
  ggplot(aes(y = author, x = avg_sentiment)) +
  geom_col(aes(fill = neg_pos), show.legend = FALSE) +
  geom_text(
    aes(x = label_position, label = author, color = neg_pos),
    hjust = c(rep(1, 10), rep(0, 10)),
    show.legend = FALSE,
    fontface = "bold"
  ) +
  geom_text(
    aes(label = round(avg_sentiment, 1)),
    hjust = c(rep(1.25, 10), rep(-0.25, 10)),
    color = "white",
    fontface = "bold"
  ) +
  scale_fill_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91")) +
  scale_color_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91")) +
  scale_x_continuous(breaks = -5:5, minor_breaks = NULL) +
  scale_y_discrete(breaks = NULL) +
  coord_cartesian(xlim = c(-5, 5)) +
  labs(
    x = "negative ← Average sentiment score (AFINN) → positive",
    y = NULL,
    title = "The Chronicle - Opinion pieces\nAverage sentiment scores of abstracts by author",
    subtitle = "Top 10 average positive and negative scores",
    caption = "Source: Data scraped from The Chronicle on Feb 21, 2024"
  ) +
  # New in this step: minimal theme with centered title and subtitle
  theme_void(base_size = 16) +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5, margin = unit(c(0.5, 0, 1, 0), "lines")),
    axis.text.y = element_blank(),
    plot.caption = element_text(color = "gray30")
  )
```{r}
#| output-location: slide
#| code-line-numbers: "|4-6"
#| fig-width: 8
#| fig-asp: 0.75
#| fig-align: center
chronicle_to_plot |>
  ggplot(aes(y = author, x = avg_sentiment)) +
  geom_col(aes(fill = neg_pos), show.legend = FALSE) +
  geom_text(
    aes(x = label_position, label = author, color = neg_pos),
    hjust = c(rep(1, 10), rep(0, 10)),
    show.legend = FALSE,
    fontface = "bold"
  ) +
  geom_text(
    aes(label = round(avg_sentiment, 1)),
    hjust = c(rep(1.25, 10), rep(-0.25, 10)),
    color = "white",
    fontface = "bold"
  ) +
  scale_fill_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91")) +
  scale_color_manual(values = c("neg" = "#4d4009", "pos" = "#FF4B91")) +
  scale_x_continuous(breaks = -5:5, minor_breaks = NULL) +
  scale_y_discrete(breaks = NULL) +
  coord_cartesian(xlim = c(-5, 5)) +
  labs(
    x = "negative ← Average sentiment score (AFINN) → positive",
    y = NULL,
    title = "The Chronicle - Opinion pieces\nAverage sentiment scores of abstracts by author",
    subtitle = "Top 10 average positive and negative scores",
    caption = "Source: Data scraped from The Chronicle on Feb 21, 2024"
  ) +
  theme_void(base_size = 16) +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5, margin = unit(c(0.5, 0, 1, 0), "lines")),
    axis.text.y = element_blank(),
    plot.caption = element_text(color = "gray30")
  )
```