Chapter 1: Understanding Docker Image Layers


Before you can optimise a Docker image, you need a precise mental model of what an image actually is and how its size accumulates. This chapter builds that model. Understanding it makes every subsequent technique feel obvious rather than arbitrary.


What Is a Docker Image?

A Docker image is not a single file. It is an ordered sequence of read-only layers, each representing a filesystem change. When you run a container, Docker adds a thin writable layer on top — the container layer — but the image layers beneath it are never modified.

The filesystem technology that makes this work is called a union filesystem. On Linux, Docker’s default storage driver, overlay2, is built on OverlayFS, which presents the stacked layers as a single unified view. Files in higher layers shadow files in lower layers; if a file exists in layer 3 and layer 7, the container sees layer 7’s version.

Container (writable)          ← ephemeral, discarded on rm
─────────────────────
Layer 4: COPY . /app          ← your source code
Layer 3: RUN pip install ...  ← your dependencies
Layer 2: RUN apt-get install  ← system packages
Layer 1: Base image (Ubuntu)  ← OS foundation

Each layer stores only the diff relative to the layer below it — the files added, modified, or marked for deletion.


How Layers Are Created

Every Dockerfile instruction that modifies the filesystem (chiefly FROM, RUN, COPY, and ADD) creates a new layer.

Instructions that do not modify the filesystem — ENV, ARG, EXPOSE, LABEL, WORKDIR, CMD, ENTRYPOINT — create metadata but do not add layer size.
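To make the distinction concrete, here is a short hypothetical Dockerfile (the image tag, file names, and app are illustrative) annotated with which instructions produce filesystem layers and which only record metadata:

```dockerfile
# Filesystem layers: the base image's own layer stack
FROM python:3.12-slim

# Metadata only: no layer size added
ENV PYTHONUNBUFFERED=1
WORKDIR /app

# New layer: a single small file
COPY requirements.txt /app/

# New layer: the installed packages
RUN pip install -r /app/requirements.txt

# Metadata only
EXPOSE 8000
CMD ["python", "app.py"]
```

Running docker history on the resulting image shows nonzero sizes only for the base image, the COPY, and the RUN.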


The Whiteout Problem

This is the single most important concept for understanding why naive cleanup commands do not reduce image size.

Suppose you write this Dockerfile:

FROM ubuntu:22.04
RUN apt-get update && apt-get install -y build-essential
RUN apt-get purge -y build-essential && apt-get autoremove -y

You might expect the final image to be roughly the size of the base Ubuntu image, since you installed and then removed the same package. It is not. The build-essential files are baked into layer 2. Layer 3 does not delete them — it adds whiteout entries that hide them from the union filesystem view, but the layer data is still present in the image and still downloaded when the image is pulled.
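The mechanics can be illustrated with a toy model in plain Python (a sketch of the lookup rule, not of OverlayFS itself): each layer is a dict mapping paths to content, a whiteout is a special marker, and the merged view is resolved bottom-up so later layers win. Note that the hidden file still occupies space in the layer that created it.

```python
WHITEOUT = object()  # marker meaning "this path is deleted from the merged view"

def merged_view(layers):
    """Resolve a stack of layers (listed bottom to top) into the visible filesystem."""
    view = {}
    for layer in layers:                 # apply bottom-up; higher layers shadow lower
        for path, content in layer.items():
            if content is WHITEOUT:
                view.pop(path, None)     # hidden from view, but still stored below
            else:
                view[path] = content
    return view

layer1 = {"/etc/os-release": "ubuntu"}
layer2 = {"/usr/bin/gcc": "<binary>"}    # apt-get install build-essential
layer3 = {"/usr/bin/gcc": WHITEOUT}      # apt-get purge: adds a whiteout entry

print(merged_view([layer1, layer2, layer3]))  # gcc is invisible to the container
print(sum(len(l) for l in [layer1, layer2, layer3]))  # but all 3 entries still exist
```

The purge layer makes the image larger, not smaller: it adds a whiteout entry on top of data that is still shipped with every pull.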

The only way to prevent a file from being in the final image is to never create it in a layer that is committed to the image. This is why cleanup must happen in the same RUN instruction that creates the temporary files: once an instruction’s layer is committed, its contents are permanent.
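Applied to the apt-get example above, the standard remedy is to install, use, and remove within a single RUN instruction, so the compiler never reaches a committed layer (the make step is a hypothetical stand-in for whatever actually needs build-essential):

```dockerfile
FROM ubuntu:22.04
# Install, build, and clean up in one instruction: one layer, no leftover files
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential \
 && make -C /src install \
 && apt-get purge -y build-essential \
 && apt-get autoremove -y \
 && rm -rf /var/lib/apt/lists/*
```

Because the removal happens before the layer is committed, build-essential contributes nothing to the final image.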


Inspecting Layers

Docker provides several commands for examining an image’s layer structure.

docker history

docker history --no-trunc myapp:latest

This prints each layer, the instruction that created it, and its size. The --no-trunc flag shows the full command without truncation. This is your first diagnostic tool when an image is larger than expected.

docker image inspect

docker image inspect myapp:latest

Returns JSON with RootFS.Layers (the list of layer digests) and Size (total uncompressed size in bytes). Useful for scripting.
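A minimal scripting sketch, using a canned JSON snippet in place of live docker output (the digests and size are made up; the field names match the real inspect format):

```python
import json

# Stand-in for the output of: docker image inspect myapp:latest
inspect_output = json.loads("""
[{
  "RootFS": {"Layers": ["sha256:aaa", "sha256:bbb", "sha256:ccc"]},
  "Size": 187654321
}]
""")

image = inspect_output[0]                    # inspect returns a JSON array
layer_count = len(image["RootFS"]["Layers"])
size_mb = image["Size"] / 1_000_000          # Size is uncompressed bytes

print(f"{layer_count} layers, {size_mb:.1f} MB")
```

In a real script you would feed this the output of `docker image inspect`, then track layer_count and size_mb across builds to catch regressions.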

dive (Chapter 8)

The dive tool provides an interactive terminal UI that shows which files are in each layer, which files were modified across layers (wasted space), and an efficiency score. It is the most practical tool for identifying where size is coming from.


Layer Caching

Layer caching is Docker’s mechanism for avoiding redundant work during repeated builds. It is also the reason that layer ordering matters for build speed.

Docker computes a cache key for each layer from the parent layer’s key, the instruction text itself, and, for COPY and ADD, a checksum of the copied files’ contents.

If a layer’s key matches a cached layer, Docker reuses the cache and skips execution. If it does not match, because the instruction changed or a file referenced by COPY or ADD changed, Docker executes that layer and invalidates the cache for all subsequent layers.
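A simplified model of this chaining (illustrative only; Docker’s real implementation differs in detail): each key mixes in the parent key, so a change anywhere propagates to every layer after it.

```python
import hashlib

def cache_key(parent_key: str, instruction: str, file_bytes: bytes = b"") -> str:
    """Toy cache key: chains the parent key, instruction text, and copied content."""
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(instruction.encode())
    h.update(file_bytes)   # for COPY/ADD the file contents matter, not just the path
    return h.hexdigest()

base = cache_key("", "FROM python:3.12-slim")
deps = cache_key(base, "COPY requirements.txt /app/", b"flask==3.0\n")
inst = cache_key(deps, "RUN pip install -r /app/requirements.txt")

# Editing requirements.txt changes the COPY key, and because the pip install
# key includes its parent, that layer is invalidated too:
deps2 = cache_key(base, "COPY requirements.txt /app/", b"flask==3.1\n")
inst2 = cache_key(deps2, "RUN pip install -r /app/requirements.txt")
print(inst != inst2)
```

The RUN instruction text never changed, yet its cache entry is lost, purely because its parent key changed.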

This has a direct implication for how you order your Dockerfile instructions:

# BAD: source code copied before dependencies
# Any source change invalidates the pip install cache
COPY . /app
RUN pip install -r /app/requirements.txt

# GOOD: dependency manifest copied first
# pip install cache is only invalidated when requirements.txt changes
COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt
COPY . /app

In the “good” version, pip install only re-runs when requirements.txt changes. In the “bad” version, it re-runs on every single source code change.


Key Takeaways

- A Docker image is an ordered stack of read-only layers; each layer stores only the diff against the layer below it.
- Deleting a file in a later layer adds a whiteout entry; it never removes data from earlier layers or shrinks the image.
- To keep a file out of the final image, create and remove it within the same instruction.
- Order Dockerfile instructions from least to most frequently changing to preserve the build cache.
- Use docker history, docker image inspect, and dive to find where an image’s size actually comes from.