Data visualization with Python

Lecture 22

Dr. Mine Çetinkaya-Rundel

Duke University
STA 313 - Spring 2026

Warm up

Announcements

From last time: Generative art resources

Setup

  • R:
library(tidyverse)
  • Python:
import polars as pl
from plotnine import ggplot, aes, geom_point, geom_segment, labs, theme, theme_minimal

Data visualization with Python

Spot the difference

Plot A:

Plot B:

Overview

  • Goal:
    • Get a taste of data visualization with Python with plotnine – a Python data visualization package based on the grammar of graphics and inspired by ggplot2.
    • Not a comprehensive introduction to Python or even to data visualization with Python.
  • Approach:
    • We will cover the basics of plotnine and how to create different types of plots.
    • We will use polars to read in data from CSV files, but we will not cover how to prepare your data for visualization in Python.

But…

there are some Python details we can’t avoid…

Package management in Python with uv

Python packaging landscape

Python’s packaging ecosystem has historically been fragmented:

  • Multiple tools: pip, virtualenv, venv, conda, poetry, pipenv, etc.
  • Multiple config files: requirements.txt, setup.py, pyproject.toml, etc.
  • Version management often handled separately (pyenv)

uv is a modern tool that aims to unify these concerns with a fast, Rust-based implementation.

What is uv?

uv is a Python package and project manager developed by Astral.


Key features:

  • Extremely fast (10-100x faster than pip)
  • Manages Python versions
  • Creates and manages virtual environments
  • Installs packages
  • Handles project dependencies via pyproject.toml
  • Drop-in replacement for pip and virtualenv
  • Directly supported by Positron and Reticulate

Installing uv

uv is already installed on the departmental servers. For local installs:

On macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

or with Homebrew:

brew install uv

or with pip / pipx:

pipx install uv
pip install uv

Verify installation

Once installed, you should be able to run the following:

uv --version
uv 0.11.3 (Homebrew 2026-04-01 aarch64-apple-darwin)


As long as you have something higher than 0.9.* you should be fine.
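
The version check can also be scripted. The following is a minimal sketch, assuming the output always starts with "uv X.Y.Z" as in the sample above. The helper name parse_uv_version is hypothetical (not part of uv), and reading "higher than 0.9.*" as "0.9.0 or newer" is an assumption.

```python
import re


def parse_uv_version(output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a line like 'uv 0.11.3 (...)'."""
    match = re.match(r"uv (\d+)\.(\d+)\.(\d+)", output.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    return tuple(int(part) for part in match.groups())


# Sample output copied from the slide above
sample = "uv 0.11.3 (Homebrew 2026-04-01 aarch64-apple-darwin)"
version = parse_uv_version(sample)

# Tuples compare element by element, so (0, 11, 3) >= (0, 9, 0) is True.
# Assumption: "higher than 0.9.*" is read here as 0.9.0 or newer.
print(version, version >= (0, 9, 0))
```

Comparing integer tuples rather than strings matters here: as strings, "0.11" sorts before "0.9", which would give the wrong answer.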

Python versions

uv python list
cpython-3.15.0a7-macos-aarch64-none                 <download available>
cpython-3.15.0a7+freethreaded-macos-aarch64-none    <download available>
cpython-3.14.3-macos-aarch64-none                   /opt/homebrew/bin/python3.14 -> ../Cellar/python@3.14/3.14.3_1/bin/python3.14
cpython-3.14.3-macos-aarch64-none                   /opt/homebrew/bin/python3 -> ../Cellar/python@3.14/3.14.3_1/bin/python3
cpython-3.14.3-macos-aarch64-none                   /Users/mine/.local/bin/python3.14 -> /Users/mine/.local/share/uv/python/cpython-3.14-macos-aarch64-none/bin/python3.14
cpython-3.14.3-macos-aarch64-none                   /Users/mine/.local/share/uv/python/cpython-3.14-macos-aarch64-none/bin/python3.14
cpython-3.14.3+freethreaded-macos-aarch64-none      <download available>
cpython-3.13.12-macos-aarch64-none                  <download available>
cpython-3.13.12+freethreaded-macos-aarch64-none     <download available>
cpython-3.13.11-macos-aarch64-none                  /opt/homebrew/bin/python3.13 -> ../Cellar/python@3.13/3.13.11/bin/python3.13
cpython-3.12.13-macos-aarch64-none                  /Users/mine/.local/share/uv/python/cpython-3.12-macos-aarch64-none/bin/python3.12
cpython-3.12.6-macos-aarch64-none                   /opt/homebrew/bin/python3.12 -> ../Cellar/python@3.12/3.12.6/bin/python3.12
cpython-3.12.4-macos-aarch64-none                   /usr/local/bin/python3.12 -> ../../../Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12
cpython-3.12.4-macos-aarch64-none                   /usr/local/bin/python3 -> ../../../Library/Frameworks/Python.framework/Versions/3.12/bin/python3
cpython-3.11.15-macos-aarch64-none                  <download available>
cpython-3.10.20-macos-aarch64-none                  <download available>
cpython-3.9.25-macos-aarch64-none                   <download available>
cpython-3.9.6-macos-aarch64-none                    /usr/bin/python3
cpython-3.8.20-macos-aarch64-none                   <download available>
pypy-3.11.15-macos-aarch64-none                     <download available>
pypy-3.10.16-macos-aarch64-none                     <download available>
pypy-3.9.19-macos-aarch64-none                      <download available>
pypy-3.8.16-macos-aarch64-none                      <download available>
graalpy-3.12.0-macos-aarch64-none                   <download available>
graalpy-3.11.0-macos-aarch64-none                   <download available>
graalpy-3.10.0-macos-aarch64-none                   <download available>
graalpy-3.8.5-macos-aarch64-none                    <download available>

Managing Python versions

uv can install and manage multiple Python versions:

uv python install 3.14
Python 3.14 is already installed
uv python pin 3.14
Pinned `.python-version` to `3.14`

The pinned version is stored in a .python-version file in the current directory and will be used automatically.

Initializing a project

Use uv init to create a new project:

mkdir my-project
cd my-project
uv init
Initialized project `my-project`
ls -la
total 32
drwxr-xr-x   8 mine  staff   256 Apr  6 00:22 .
drwx------@ 55 mine  staff  1760 Apr  6 00:22 ..
drwxr-xr-x@  9 mine  staff   288 Apr  6 00:22 .git
-rw-r--r--@  1 mine  staff   109 Apr  6 00:22 .gitignore
-rw-r--r--@  1 mine  staff     5 Apr  6 00:22 .python-version
-rw-r--r--@  1 mine  staff    88 Apr  6 00:22 main.py
-rw-r--r--@  1 mine  staff   156 Apr  6 00:22 pyproject.toml
-rw-r--r--@  1 mine  staff     0 Apr  6 00:22 README.md

This creates a pyproject.toml, a sample main.py script, and basic git infrastructure. Generally, we only care about pyproject.toml; to generate just that file, use uv init --bare.

pyproject.toml

Modern project metadata file that tracks the Python version and package dependencies, among other details.

[project]
name = "my-project"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.14"
dependencies = []

Adding dependencies

Once we have our project set up, we can add (and install) dependencies directly via uv. uv add updates pyproject.toml and installs the package (creating a venv if needed).

uv add plotnine
Using CPython 3.14.3
Creating virtual environment at: .venv
Resolved 19 packages in 188ms
Installed 17 packages in 134ms
 + contourpy==1.3.3
 + cycler==0.12.1
 + fonttools==4.62.1
 + kiwisolver==1.5.0
 + matplotlib==3.10.8
 + mizani==0.14.4
 + numpy==2.4.4
 + packaging==26.0
 + pandas==3.0.2
 + patsy==1.0.2
 + pillow==12.2.0
 + plotnine==0.15.3
 + pyparsing==3.3.2
 + python-dateutil==2.9.0.post0
 + scipy==1.17.1
 + six==1.17.0
 + statsmodels==0.14.6
uv add polars
Resolved 21 packages in 142ms
Installed 2 packages in 6ms
 + polars==1.39.3
 + polars-runtime-32==1.39.3

Updated pyproject.toml

[project]
name = "my-project"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.14"
dependencies = [
    "plotnine>=0.15.3",
    "polars>=1.39.3",
]

Virtual environments

Virtual environments isolate project dependencies from the system Python and other projects. Packages are installed in a local folder in your project.

As we just saw, using uv add will create a new virtual environment in .venv by default if there is not an existing venv.
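
To confirm that a script is really running inside the project's virtual environment, you can ask the interpreter. The following is a minimal sketch using only the standard library; the helper name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("In a virtual environment:", in_virtual_environment())
```

When run with the project's interpreter, the path printed first should sit inside the project's .venv folder.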

Activating environments in Positron with uv

Positron automatically detects virtual environments in your project directory. When you open a folder containing a .venv directory (created by uv), Positron will:

  • Detect the environment and offer to use it
  • Show the active Python interpreter in the status bar
  • Use the environment for the Python console and when running scripts

If not automatically detected, you can manually select the interpreter via the Command Palette (Cmd+Shift+P / Ctrl+Shift+P) and searching for “Python: Select Interpreter”.

uv sync

Since the .venv folder is system-specific (and large), it is not typically committed to git. Instead, you will likely clone a repository that just has a pyproject.toml file.

Use uv sync to construct the venv and install all dependencies for the project.

Listing packages

uv pip list
Package           Version
----------------- -----------
contourpy         1.3.3
cycler            0.12.1
fonttools         4.62.1
kiwisolver        1.5.0
matplotlib        3.10.8
mizani            0.14.4
numpy             2.4.4
packaging         26.0
pandas            3.0.2
patsy             1.0.2
pillow            12.2.0
plotnine          0.15.3
polars            1.39.3
polars-runtime-32 1.39.3
pyparsing         3.3.2
python-dateutil   2.9.0.post0
scipy             1.17.1
six               1.17.0
statsmodels       0.14.6
uv pip show plotnine
Name: plotnine
Version: 0.15.3
Location: /Users/mine/Desktop/my-project/.venv/lib/python3.14/site-packages
Requires: matplotlib, mizani, numpy, pandas, scipy, statsmodels
Required-by:

uv pip install and uv pip freeze are also available as lower-level commands for working outside of a project context or with requirements.txt files.

Dependency tree

uv tree
Resolved 21 packages in 18ms
my-project v0.1.0
├── plotnine v0.15.3
│   ├── matplotlib v3.10.8
│   │   ├── contourpy v1.3.3
│   │   │   └── numpy v2.4.4
│   │   ├── cycler v0.12.1
│   │   ├── fonttools v4.62.1
│   │   ├── kiwisolver v1.5.0
│   │   ├── numpy v2.4.4
│   │   ├── packaging v26.0
│   │   ├── pillow v12.2.0
│   │   ├── pyparsing v3.3.2
│   │   └── python-dateutil v2.9.0.post0
│   │       └── six v1.17.0
│   ├── mizani v0.14.4
│   │   ├── numpy v2.4.4
│   │   ├── pandas v3.0.2
│   │   │   ├── numpy v2.4.4
│   │   │   └── python-dateutil v2.9.0.post0 (*)
│   │   └── scipy v1.17.1
│   │       └── numpy v2.4.4
│   ├── numpy v2.4.4
│   ├── pandas v3.0.2 (*)
│   ├── scipy v1.17.1 (*)
│   └── statsmodels v0.14.6
│       ├── numpy v2.4.4
│       ├── packaging v26.0
│       ├── pandas v3.0.2 (*)
│       ├── patsy v1.0.2
│       │   └── numpy v2.4.4
│       └── scipy v1.17.1 (*)
└── polars v1.39.3
    └── polars-runtime-32 v1.39.3
(*) Package tree already displayed

Common workflows

New project setup:

mkdir my-project # or use Positron's new project wizard
cd my-project    # and navigate to the project folder
uv init --bare
uv add plotnine polars


Clone existing project:

git clone <repo-url> # or use Positron's git integration
cd <repo>            # and navigate to the project folder
uv sync

Let’s give it a try!

Go to ae-16 and let’s make a simple plot with plotnine!

Introduction to plotnine

What is plotnine?

plotnine is a Python visualization library that implements the grammar of graphics.

  • Based on the same principles as ggplot2 in R
  • Enables data visualization through composable, layered components
  • Maps data to visual properties systematically
  • Works with Pandas and Polars DataFrames

The grammar of graphics

A plot is built from layers of components:

Component   Description
----------  --------------------------------------------
Data        The dataset to visualize
Aesthetics  Mappings from data to visual properties
Geoms       Geometric objects that represent data
Scales      Control how data values map to visual values
Facets      Split data into multiple subplots
Coords      Coordinate system for the plot
Themes      Control non-data visual elements

Data

All plots begin with passing data to ggplot():

from plotnine import ggplot

(
    ggplot(data=my_dataframe)
)

Tip

Plotnine works best with tidy data:

  • Each variable is a column
  • Each observation is a row
  • Each type of observational unit is a table

Aesthetic mappings

The aes() function maps data columns to visual properties:

from plotnine import ggplot, aes

ggplot(df, aes(x="column1", y="column2"))


Common aesthetic mappings:

Aesthetic  Description
---------  ---------------------
x, y       Position on axes
color      Color of points/lines
fill       Fill color of shapes
size       Size of points
shape      Shape of points
alpha      Transparency

Geometric objects (geoms)

from plotnine import ggplot, aes, geom_point

(
    ggplot(df, aes(x="x_var", y="y_var")) 
    + geom_point()
)


Note

In Python, wrap the entire plot expression in parentheses () to allow line breaks.

Geometric objects (geoms)

Geoms determine how data is visually represented:

Geom              Description
----------------  -------------
geom_point()      Scatter plot
geom_line()       Line plot
geom_bar()        Bar chart
geom_histogram()  Histogram
geom_boxplot()    Box plot
geom_smooth()     Smoothed line
geom_text()       Text labels
geom_segment()    Line segments
geom_area()       Area plot
geom_density()    Density plot

Layering with +

Components are combined using the + operator:

from plotnine import ggplot, aes, geom_point, geom_smooth

(
    ggplot(df, aes(x="x_var", y="y_var"))
    + geom_point()
    + geom_smooth(method="lm")
)


Note

In Python, put the + at the start of each line: add the line break before the +, not after it.

Scales

Scales customize how data values map to visual values.

Naming pattern: scale_<aesthetic>_<type>

from plotnine import ggplot, aes, geom_point, scale_color_continuous

(
    ggplot(df, aes(x="x_var", y="y_var", color="z_var"))
    + geom_point()
    + scale_color_continuous(cmap_name="viridis")
)

Common scale functions:

  • scale_x_continuous(), scale_y_continuous() - continuous axes
  • scale_x_log10(), scale_y_log10() - log-transformed axes
  • scale_color_manual(), scale_fill_manual() - custom colors
  • scale_color_brewer(), scale_fill_brewer() - ColorBrewer palettes

Facets

Facets split data into multiple subplots:

facet_wrap() - wraps panels into rows

from plotnine import facet_wrap

(
    ggplot(df, aes(x="x", y="y"))
    + geom_point()
    + facet_wrap("~category")
)

facet_grid() - creates a grid of panels

from plotnine import facet_grid

(
    ggplot(df, aes(x="x", y="y"))
    + geom_point()
    + facet_grid("row_var ~ col_var")
)

Coordinate systems

Coordinate functions specify the plot’s coordinate system:

from plotnine import coord_fixed, coord_cartesian, coord_trans

# Equal aspect ratio
ggplot(...) + coord_fixed()

# Zoom without clipping data
ggplot(...) + coord_cartesian(xlim=(0, 10))

# Transformed cartesian coordinate system
ggplot(...) + coord_trans()

Themes

Themes control non-data visual elements like fonts, colors, and grid lines.

Pre-built themes:

from plotnine import theme_minimal, theme_bw, theme_classic

ggplot(...) + theme_minimal()
ggplot(...) + theme_bw()
ggplot(...) + theme_classic()

Custom theme adjustments:

from plotnine import theme, element_text, element_line

(
    ggplot(...)
    + theme_minimal()
    + theme(
        axis_text=element_text(size=12),
        legend_position="bottom",
        figure_size=(10, 6)
    )
)

Labels

Use labs() to add titles and axis labels:

from plotnine import labs

(
    ggplot(df, aes(x="x_var", y="y_var"))
    + geom_point()
    + labs(
        title="My Plot Title",
        subtitle="A descriptive subtitle",
        x="X Axis Label",
        y="Y Axis Label",
        color="Legend Title"
    )
)

Putting it all together

from plotnine import *
from plotnine.data import mpg

(
    ggplot(mpg, aes(x="cty", y="hwy"))
    + geom_point(aes(color="displ"), alpha=0.7)
    + geom_smooth(method="lm", color="blue")
    + scale_color_continuous(cmap_name="viridis")
    + facet_wrap("~drv", ncol=1)
    + labs(
        title="City vs Highway MPG",
        x="City MPG",
        y="Highway MPG",
        color="Engine\nDisplacement"
    )
    + theme_bw()
    + theme(
        figure_size=(3, 6), 
        legend_position="bottom"
    )
)

Key differences from ggplot2

ggplot2 (R)                   plotnine (Python)
----------------------------  ----------------------------------------
aes(x = var)                  aes(x="var") (quoted strings)
+ at end of line              + at start of line (inside parens)
theme(legend.position = ...)  theme(legend_position=...) (underscores)
No parens needed              Wrap in () for multi-line plots
ggsave()                      .save() method on plot object


# Saving a plot
p = ggplot(...) + geom_point()
p.save("my_plot.png", width=10, height=6, dpi=300)