Data visualization

This page covers data visualization using different programming languages.

Related pages: Data description, Diagramming

Resources

Documentation

Cheatsheets

Style guides

Templates

Style

theme_minimal() +
theme(
  panel.grid.major.y = element_line(color = "gray90", linewidth = 0.3),
  panel.grid.minor.y = element_blank(),
  panel.grid.major.x = element_blank(),
  panel.grid.minor.x = element_blank(),
  axis.title.x = element_text(margin = margin(t = 8)),
  axis.title.y = element_text(margin = margin(r = 8)),
  axis.title = element_text(size = 10, color = "black"),
  axis.text = element_text(size = 9, color = "black"),
  axis.line = element_line(color = "black", linewidth = 0.3),
  axis.ticks.x = element_line(color = "black", linewidth = 0.3),
  axis.ticks.y = element_line(color = "black", linewidth = 0.3),
  legend.position = "top",
  legend.title = element_blank(),
  legend.text = element_text(size = 9, color = "black"),
  legend.key.height = unit(10, "pt"),
  legend.key.width = unit(10, "pt"),
  legend.margin = margin(b = 0) # or -10
)
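
One way to reuse these settings across plots is to wrap them in a function. The sketch below assumes a hypothetical helper name, `theme_notes()`, and uses only a shortened subset of the settings above; the built-in `mtcars` data is there purely for illustration.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): theme_minimal() plus a few overrides.
theme_notes <- function() {
  theme_minimal() +
    theme(
      panel.grid.major.x = element_blank(),
      panel.grid.minor = element_blank(),
      axis.line = element_line(color = "black", linewidth = 0.3),
      legend.position = "top",
      legend.title = element_blank()
    )
}

# Add the helper like any other theme; print(p) draws the plot.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  theme_notes()
```

Keeping the overrides in one function means a later change to the style applies to every plot that calls it.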

Graphs made from one dataset

ggplot(data = DATA) +

  labs(
    x = "[X_AXIS_LABEL]",
    y = "[Y_AXIS_LABEL]",
    # title = "[PLOT_TITLE]",
  ) +
  theme_minimal()

Languages and packages

ggplot2

R
ggplot(data = DATA) +
    GEOM_FUNCTION() +
    COORDINATE_FUNCTION() +
    FACET_FUNCTION() +
    SCALE_FUNCTION() +
    ANNOTATION_FUNCTION() +
    THEME_FUNCTION()
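
As an illustration, the template can be filled in slot by slot. The sketch below uses the built-in `mtcars` data; the particular functions chosen for each slot are assumptions for the example, not recommendations.

```r
library(ggplot2)

# Each line fills one slot of the template; print(p) draws the plot.
p <- ggplot(data = mtcars) +
  geom_point(aes(x = wt, y = mpg)) +                       # GEOM_FUNCTION
  coord_cartesian(xlim = c(1, 6)) +                        # COORDINATE_FUNCTION
  facet_wrap(~ cyl) +                                      # FACET_FUNCTION
  scale_y_continuous(breaks = seq(10, 35, 5)) +            # SCALE_FUNCTION
  annotate("text", x = 5.5, y = 34, label = "mtcars",
           hjust = 1, size = 3) +                          # ANNOTATION_FUNCTION
  theme_minimal()                                          # THEME_FUNCTION
```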

Summary

§1. Types of graphs

Bar plots

geom_col(
  aes(x = VAR_X, y = VAR_Y), 
  fill = "grey30"
) +

TBD

TBD

Line plots

TBD

TBD

TBD

Plots with intervals

Plots with intervals are useful for showing uncertainty around estimates, such as confidence intervals.

A rule of thumb: error cap width \(\approx\) 20–25% of the bar width.
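
A minimal sketch of the rule of thumb, assuming made-up summary data (`est`) and hypothetical variable names `bar_width` and `cap_width`. In `geom_errorbar()`, `width` sets the total width of the cap in the same units as the bar width, so the cap can be derived from the bar.

```r
library(ggplot2)

# Made-up summary data: group means with lower and upper interval bounds.
est <- data.frame(
  group = c("A", "B", "C"),
  mean  = c(4.2, 5.1, 3.6),
  lower = c(3.7, 4.4, 3.1),
  upper = c(4.7, 5.8, 4.1)
)

bar_width <- 0.7
cap_width <- 0.25 * bar_width  # cap set to 25% of the bar width

p <- ggplot(est, aes(x = group, y = mean)) +
  geom_col(width = bar_width, fill = "grey30") +
  geom_errorbar(
    aes(ymin = lower, ymax = upper),
    width = cap_width,
    linewidth = 0.3
  ) +
  theme_minimal()
```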

TBD

TBD

Stacked area plots

Scatter plots

TBD

TBD

TBD

Binned scatter plots

TBD

TBD

TBD

Coefficient plots

TBD

TBD

TBD

§2. Scales

Percentages

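# Variable already in percent units (0-100): append a "%" suffix.
scale_y_continuous(
  labels = function(x) paste0(x, "%"),
  limits = c(0, 100),
  breaks = seq(0, 100, 25),
  expand = c(0, 0)
) +
# Variable stored as a proportion (0-1): use this scale instead, not both.
scale_y_continuous(
  labels = scales::percent_format(accuracy = 1),
  limits = c(0, 1),
  breaks = seq(0, 1, 0.25),
  expand = c(0, 0)
) +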
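
Which of the two scales applies depends on how the variable is stored. The sketch below uses made-up data (`shares`) and hypothetical labeller names (`label_pct`, `label_prop`) to show that both produce the same axis labels when each is given data in the units it expects.

```r
library(ggplot2)
library(scales)

# Made-up data: the same shares as proportions (0-1) and as percentages (0-100).
shares <- data.frame(
  group = c("A", "B", "C"),
  prop  = c(0.10, 0.25, 0.60)
)
shares$pct <- shares$prop * 100

# Values already in percent units: only a "%" suffix is appended.
label_pct <- function(x) paste0(x, "%")
# Values are proportions: the labeller multiplies by 100 before adding "%".
label_prop <- percent_format(accuracy = 1)

labels_from_pct  <- label_pct(seq(0, 100, 25))
labels_from_prop <- label_prop(seq(0, 1, 0.25))

# Plot the proportion column with the matching scale.
p <- ggplot(shares, aes(x = group, y = prop)) +
  geom_col(fill = "grey30") +
  scale_y_continuous(
    labels = label_prop,
    limits = c(0, 1),
    breaks = seq(0, 1, 0.25),
    expand = c(0, 0)
  )
```

Passing a proportion to the first labeller, or a percentage to the second, would give labels that are off by a factor of 100.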

Log scale

TBD

§3. Plot elements

Vertical/horizontal lines

geom_hline(yintercept = VALUE, linetype = "dashed", color = "black") +

TBD

TBD

Grid

Put inside `theme()`
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),

Axis title

Put inside `theme()`
axis.title.x = element_text(margin = margin(t = 10)),
axis.title.y = element_text(margin = margin(r = 10)),

Legend

Put inside `theme()`
legend.position = "bottom", # place the legend below the plot
legend.position = "none",   # or hide the legend (use only one of the two)

§4. Data Aesthetics

Color (color / fill)

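ggplot2 has two color-related aesthetics: color applies to points, lines, and outlines, while fill applies to the interior of bars and other areas. "Color" in this section refers to both.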

scale_fill_manual(
  values = c("Democrat" = "#2E5EAA", "Republican" = "#C93135"),
  name = "Participant\nParty"
) +
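
A minimal sketch of the manual fill scale in a complete plot, assuming made-up data (`votes`). The names in `values` are matched against the levels of the variable mapped to `fill`, so the labels in the data need to match them exactly.

```r
library(ggplot2)

# Made-up data; the party labels match the names used in `values` below.
votes <- data.frame(
  party   = c("Democrat", "Republican"),
  support = c(52, 48)
)

p <- ggplot(votes, aes(x = party, y = support, fill = party)) +
  geom_col() +
  scale_fill_manual(
    values = c("Democrat" = "#2E5EAA", "Republican" = "#C93135"),
    name = "Participant\nParty"
  ) +
  theme_minimal()
```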

Transparency

TBD

Linetype

TBD

Linewidth

TBD

Shape

TBD

Size

TBD

§5. Manipulating plots

Faceting

facet_wrap(~ VAR)

Combining plots

In R, we can use the patchwork package to combine several plots into a single figure.
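
A minimal sketch, assuming the patchwork package is installed and using the built-in `mtcars` data; the two plots are placeholders.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_minimal()

p2 <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(fill = "grey30") +
  theme_minimal()

# `|` places plots side by side and `/` stacks them vertically;
# plot_annotation(tag_levels = "A") labels the panels A, B, ...
combined <- (p1 | p2) + plot_annotation(tag_levels = "A")
```

The combined object can be printed or passed to `ggsave()` like a single plot.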

TBD

Exporting plots

For reproducibility, explicitly set the parameters that would otherwise fall back to defaults or depend on the current graphics device: width, height, units, and background.

ggsave("xxx.png", p, width = 6, height = 3.5, units = "in", bg = "white")
ggsave("xxx.pdf", p, width = 6, height = 3.5, units = "in", bg = "white")

Figure size recommendations:

Full page width:

  • 6.5 \(\times\) 3.5 in

Middle ground:

  • 6 \(\times\) 3.5 in

  • 6 \(\times\) 3.8 in

Compact:

  • 4.8 \(\times\) 3.2 in
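
The recommended sizes can be kept in one place and looked up by name when exporting. This is only a sketch: the list name `fig_sizes`, its labels, and the output file name are hypothetical, and the file is written to a temporary directory.

```r
library(ggplot2)

# Sizes in inches (width, height), taken from the recommendations above.
fig_sizes <- list(
  full    = c(width = 6.5, height = 3.5),
  middle  = c(width = 6.0, height = 3.5),
  compact = c(width = 4.8, height = 3.2)
)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_minimal()

size <- fig_sizes[["compact"]]
out  <- file.path(tempdir(), "example_compact.pdf")

ggsave(
  out, p,
  width  = size[["width"]],
  height = size[["height"]],
  units  = "in",
  bg     = "white"
)
```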