Visualizing Uncertainty

Lecture 15

Dr. Greg Chism

University of Arizona
INFO 526 - Fall 2024

Announcements

  • Project 1 feedback is posted

  • Scores for all assignments so far will be on D2L soon

  • Project 2 groups have been announced, repos should be assigned

  • Take a note of the deadlines for the rest of the semester

Setup

# load packages
if(!require(pacman))
  install.packages("pacman")

pacman::p_load(countdown,
               tidymodels,
               tidyverse,
               colorspace,
               cowplot,
               waffle,
               distributional,
               emmeans,
               gapminder,
               ggdist,
               margins,
               ggtext,
               ggpubr)

# for github packages (make sure to use #| output: false)
pacman::p_load_gh("wilkelab/ungeviz")

# set theme for ggplot2
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 14, base_family = "Myriad Pro"))

# set width of code output
options(width = 65)

# set figure parameters for knitr
knitr::opts_chunk$set(
  fig.width = 7, # 7" width
  fig.asp = 0.618, # the golden ratio
  fig.retina = 3, # dpi multiplier for displaying HTML output on retina
  fig.align = "center", # center align figures
  dpi = 300 # higher dpi, sharper image
)

Uncertainty

Playing

Image by Wikimedia user Jahobr, released into the public domain.

https://commons.wikimedia.org/wiki/File:Disappearing_dots.gif

[Sorry, you lost.] 🙂

[How does that make you feel?]

We are bad at judging uncertainty

  • You had a 10% chance of losing
  • One in ten playing this game will lost
  • 90% chance of winning is nowhere near a certain win

Uncertainty in probability

Probability distributions

Whats the probability that the blue party wins the election?

Probability distributions

Uncertainty of point estimates

Uncertainty of point estimates

Uncertainty of point estimates

Frequentist interpretation of a confidence interval

Everest

everest <- read_csv("data/everest.csv")
everest
# A tibble: 21,813 × 21
   expedition_id member_id   peak_id peak_name  year season sex  
   <chr>         <chr>       <chr>   <chr>     <dbl> <chr>  <chr>
 1 EVER63101     EVER63101-… EVER    Everest    1963 Spring M    
 2 EVER63101     EVER63101-… EVER    Everest    1963 Spring M    
 3 EVER63101     EVER63101-… EVER    Everest    1963 Spring M    
 4 EVER63101     EVER63101-… EVER    Everest    1963 Spring M    
 5 EVER63101     EVER63101-… EVER    Everest    1963 Spring M    
 6 EVER63101     EVER63101-… EVER    Everest    1963 Spring M    
 7 EVER63101     EVER63101-… EVER    Everest    1963 Spring M    
 8 EVER63101     EVER63101-… EVER    Everest    1963 Spring M    
 9 EVER63101     EVER63101-… EVER    Everest    1963 Spring M    
10 EVER63101     EVER63101-… EVER    Everest    1963 Spring M    
# ℹ 21,803 more rows
# ℹ 14 more variables: age <dbl>, citizenship <chr>,
#   expedition_role <chr>, hired <lgl>, highpoint_metres <dbl>,
#   success <lgl>, solo <lgl>, oxygen_used <lgl>, died <lgl>,
#   death_cause <chr>, death_height_metres <dbl>, injured <lgl>,
#   injury_type <chr>, injury_height_metres <dbl>

Highest point reached on Everest in 2019

Includes only climbers and expedition members who did not summit

Marginal effects: Height reached on Everest

Average height reached relative to:
a male climber who climbed with oxygen, summited, and survived

Marginal effects: Height reached on Everest

Other visualization options: half-eye

Marginal effects: Height reached on Everest

Other visualization options: gradient interval

Marginal effects: Height reached on Everest

Other visualization options: quantile dotplot

Marginal effects: Height reached on Everest

Other visualization options: quantile dotplot

Marginal effects: Height reached on Everest

Other visualization options: quantile dotplot

Making a plot with error bars in R

Example: Relationship between life expectancy and GDP per capita

Making a plot with error bars in R

Example: Relationship between life expectancy and GDP per capita

Gapminder

See gapminder.org for fantastic visualizations and up-to-date data

gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

Making a plot with error bars in R

lm_data <- gapminder |>
  nest(data = -c(continent, year))

lm_data
# A tibble: 60 × 3
   continent  year data             
   <fct>     <int> <list>           
 1 Asia       1952 <tibble [33 × 4]>
 2 Asia       1957 <tibble [33 × 4]>
 3 Asia       1962 <tibble [33 × 4]>
 4 Asia       1967 <tibble [33 × 4]>
 5 Asia       1972 <tibble [33 × 4]>
 6 Asia       1977 <tibble [33 × 4]>
 7 Asia       1982 <tibble [33 × 4]>
 8 Asia       1987 <tibble [33 × 4]>
 9 Asia       1992 <tibble [33 × 4]>
10 Asia       1997 <tibble [33 × 4]>
# ℹ 50 more rows

Making a plot with error bars in R

lm_data <- gapminder |>
  nest(data = -c(continent, year)) |>
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x))
  )

lm_data
# A tibble: 60 × 4
   continent  year data              fit   
   <fct>     <int> <list>            <list>
 1 Asia       1952 <tibble [33 × 4]> <lm>  
 2 Asia       1957 <tibble [33 × 4]> <lm>  
 3 Asia       1962 <tibble [33 × 4]> <lm>  
 4 Asia       1967 <tibble [33 × 4]> <lm>  
 5 Asia       1972 <tibble [33 × 4]> <lm>  
 6 Asia       1977 <tibble [33 × 4]> <lm>  
 7 Asia       1982 <tibble [33 × 4]> <lm>  
 8 Asia       1987 <tibble [33 × 4]> <lm>  
 9 Asia       1992 <tibble [33 × 4]> <lm>  
10 Asia       1997 <tibble [33 × 4]> <lm>  
# ℹ 50 more rows

Making a plot with error bars in R

lm_data <- gapminder |>
  nest(data = -c(continent, year)) |>
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy)
  )

lm_data
# A tibble: 60 × 5
   continent  year data              fit    tidy_out        
   <fct>     <int> <list>            <list> <list>          
 1 Asia       1952 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 2 Asia       1957 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 3 Asia       1962 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 4 Asia       1967 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 5 Asia       1972 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 6 Asia       1977 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 7 Asia       1982 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 8 Asia       1987 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 9 Asia       1992 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
10 Asia       1997 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
# ℹ 50 more rows

Making a plot with error bars in R

lm_data <- gapminder |>
  nest(data = -c(continent, year)) |>
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy)
  ) |>
  unnest(cols = tidy_out)

lm_data
# A tibble: 120 × 9
   continent  year data     fit    term        estimate std.error
   <fct>     <int> <list>   <list> <chr>          <dbl>     <dbl>
 1 Asia       1952 <tibble> <lm>   (Intercept)    15.8       9.27
 2 Asia       1952 <tibble> <lm>   log(gdpPer…     4.16      1.25
 3 Asia       1957 <tibble> <lm>   (Intercept)    18.1       9.70
 4 Asia       1957 <tibble> <lm>   log(gdpPer…     4.17      1.28
 5 Asia       1962 <tibble> <lm>   (Intercept)    16.6       9.52
 6 Asia       1962 <tibble> <lm>   log(gdpPer…     4.59      1.24
 7 Asia       1967 <tibble> <lm>   (Intercept)    19.8       9.05
 8 Asia       1967 <tibble> <lm>   log(gdpPer…     4.50      1.15
 9 Asia       1972 <tibble> <lm>   (Intercept)    21.9       8.14
10 Asia       1972 <tibble> <lm>   log(gdpPer…     4.44      1.01
# ℹ 110 more rows
# ℹ 2 more variables: statistic <dbl>, p.value <dbl>

Making a plot with error bars in R

lm_data <- gapminder |>
  nest(data = -c(continent, year)) |>
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy)
  ) |>
  unnest(cols = tidy_out) |>
  select(-fit, -data)

lm_data
# A tibble: 120 × 7
   continent  year term      estimate std.error statistic p.value
   <fct>     <int> <chr>        <dbl>     <dbl>     <dbl>   <dbl>
 1 Asia       1952 (Interce…    15.8       9.27      1.71 9.78e-2
 2 Asia       1952 log(gdpP…     4.16      1.25      3.33 2.28e-3
 3 Asia       1957 (Interce…    18.1       9.70      1.86 7.20e-2
 4 Asia       1957 log(gdpP…     4.17      1.28      3.26 2.71e-3
 5 Asia       1962 (Interce…    16.6       9.52      1.74 9.11e-2
 6 Asia       1962 log(gdpP…     4.59      1.24      3.72 7.94e-4
 7 Asia       1967 (Interce…    19.8       9.05      2.19 3.64e-2
 8 Asia       1967 log(gdpP…     4.50      1.15      3.90 4.77e-4
 9 Asia       1972 (Interce…    21.9       8.14      2.69 1.13e-2
10 Asia       1972 log(gdpP…     4.44      1.01      4.41 1.16e-4
# ℹ 110 more rows

Making a plot with error bars in R

lm_data <- gapminder |>
  nest(data = -c(continent, year)) |>
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy)
  ) |>
  unnest(cols = tidy_out) |>
  select(-fit, -data) |>
  filter(term != "(Intercept)", continent != "Oceania")

lm_data
# A tibble: 48 × 7
   continent  year term      estimate std.error statistic p.value
   <fct>     <int> <chr>        <dbl>     <dbl>     <dbl>   <dbl>
 1 Asia       1952 log(gdpP…     4.16     1.25       3.33 2.28e-3
 2 Asia       1957 log(gdpP…     4.17     1.28       3.26 2.71e-3
 3 Asia       1962 log(gdpP…     4.59     1.24       3.72 7.94e-4
 4 Asia       1967 log(gdpP…     4.50     1.15       3.90 4.77e-4
 5 Asia       1972 log(gdpP…     4.44     1.01       4.41 1.16e-4
 6 Asia       1977 log(gdpP…     4.87     1.03       4.75 4.42e-5
 7 Asia       1982 log(gdpP…     4.78     0.852      5.61 3.77e-6
 8 Asia       1987 log(gdpP…     5.17     0.727      7.12 5.31e-8
 9 Asia       1992 log(gdpP…     5.09     0.649      7.84 7.60e-9
10 Asia       1997 log(gdpP…     5.11     0.628      8.15 3.35e-9
# ℹ 38 more rows

Making a plot with error bars in R

ggplot(lm_data) +
  aes(
    x = year, y = estimate,
    ymin = estimate - 1.96*std.error,
    ymax = estimate + 1.96*std.error,
    color = continent
  ) +
  geom_pointrange(
    position = position_dodge(width = 1)
  ) +
  scale_x_continuous(
    breaks = gapminder |> distinct(year) |> pull(year)
  ) + 
  theme(legend.position = "top")

Data prep

For 1952 only:

lm_data_1952 <- lm_data |>
  filter(year == 1952) |>
  mutate(
    continent = fct_reorder(continent, estimate) 
  )

Half-eye

ggdist::stat_dist_halfeye():

lm_data_1952 |>
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_halfeye(
    aes(
      dist = dist_normal(
        mu = estimate, sigma = std.error
      )
    ),
    point_size = 4
  )

Gradient interval

ggdist::stat_dist_gradientinterval():

lm_data_1952 |>
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_gradientinterval(
    aes(
      dist = dist_normal(
        mu = estimate, sigma = std.error
      )
    ),
    point_size = 4,
    fill = "skyblue"
  )

Dots interval

ggdist::stat_dist_dotsinterval():

lm_data_1952 |>
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_dotsinterval(
    aes(
      dist = dist_normal(
        mu = estimate, sigma = std.error
      )
    ),
    point_size = 4,
    fill = "skyblue",
    quantiles = 20
  )

Dots interval

ggdist::stat_dist_dotsinterval():

lm_data_1952 |>
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_dotsinterval(
    aes(
      dist = dist_normal(
        mu = estimate, sigma = std.error
      )
    ),
    point_size = 4,
    fill = "skyblue",
    quantiles = 10
  )

Further reading and acknowledgements