Generated using AI. Be aware that everything might not be accurate.

Chapter 4: Dockerfile Instruction Optimization

Even with an optimal base image and a multi-stage build, a poorly written Dockerfile can add hundreds of megabytes of unnecessary weight and make every build slower than it needs to be. This chapter covers the instruction-level patterns that eliminate bloat and improve cache efficiency.

Chain `RUN` Instructions

Each RUN instruction creates a new layer. As established in Chapter 1, files created in one layer and deleted in a subsequent layer are still present in the image — only hidden by whiteout entries.

Wrong:

```dockerfile
RUN apt-get update
RUN apt-get install -y build-essential
RUN rm -rf /var/lib/apt/lists/*
```

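This creates three layers. apt-get update writes the package lists to layer 1, and build-essential is installed in layer 2. The rm -rf in layer 3 only adds whiteout entries that hide the lists; layer 1 still carries them in the image. Splitting update from install also invites stale packages: if you later edit the install line, Docker can reuse the cached update layer and install from outdated lists.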

Correct:

```dockerfile
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*
```

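One layer. The package lists are downloaded, used, and deleted within the same RUN, so they never reach a committed layer. --no-install-recommends additionally skips packages that are merely recommended rather than required.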

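This pattern is the single most common fix for unnecessarily large images, and it applies to every package manager (apt, apk, yum, dnf). Only the cleanup step changes: for example, apk add --no-cache avoids keeping a local cache at all, and dnf clean all removes dnf's cache.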

Order Instructions by Change Frequency

Cache invalidation cascades downward: the moment a layer’s hash changes, every subsequent layer must be rebuilt. Instructions that change frequently should be last; instructions that change rarely should be first.

The typical order for a Python application (Dockerfile comments must sit on their own line; a # after an instruction is not treated as a comment):

```dockerfile
# 1. Base image: changes rarely
FROM python:3.12-slim-bookworm
# 2. Working directory: never changes
WORKDIR /app
# 3. System dependencies: change rarely
RUN apt-get update && apt-get install …
# 4. Dependency manifest: changes occasionally
COPY requirements.txt .
# 5. Dependencies: re-run when step 4 (or anything above it) changes
RUN pip install --no-cache-dir -r …
# 6. Source code: changes on every commit
COPY . .
# 7. Start command: changes rarely
CMD ["python", "main.py"]
```

If you COPY . . before pip install, every source code change forces pip to reinstall all packages, even if requirements.txt has not changed. This is one of the most common build performance mistakes.
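The cascade can be sketched with a simplified cache-key model in Python. This is an illustration under stated assumptions, not BuildKit's actual algorithm: each step's key hashes its parent's key, the instruction text, and the contents of any files it copies. The file contents and the helper names (digest, cache_keys, rebuilt) are hypothetical.

```python
import hashlib


def digest(*parts: str) -> str:
    """Short SHA-256 over the parts, with a separator so ("ab", "c") != ("a", "bc")."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")
    return h.hexdigest()[:12]


def cache_keys(steps: list[tuple[str, list[str]]], files: dict[str, str]) -> list[str]:
    """Simplified model: key = hash(parent key, instruction text, contents of copied files)."""
    keys: list[str] = []
    parent = ""
    for instruction, copied in steps:
        content = digest(*(f"{name}={files[name]}" for name in sorted(copied)))
        parent = digest(parent, instruction, content)
        keys.append(parent)
    return keys


def rebuilt(old: list[str], new: list[str]) -> list[int]:
    """Indices of steps whose key changed, i.e. steps that must run again."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


# Hypothetical project contents before and after a source-only edit.
before = {"requirements.txt": "flask==3.0.0", "main.py": "print('v1')"}
after = {**before, "main.py": "print('v2')"}

# (instruction, files it copies from the context)
good = [
    ("FROM python:3.12-slim-bookworm", []),
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "main.py"]),
]
bad = [
    ("FROM python:3.12-slim-bookworm", []),
    ("COPY . .", ["requirements.txt", "main.py"]),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
]

print(rebuilt(cache_keys(good, before), cache_keys(good, after)))  # [3]
print(rebuilt(cache_keys(bad, before), cache_keys(bad, after)))    # [1, 2]
```

With the manifest copied first, a source-only change invalidates only the final COPY; with COPY . . first, the pip install step is invalidated too because its parent key changed.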

Use `.dockerignore`

The build context is the set of files at the path (or URL) you pass to docker build. Any of those files can end up in the image, and a broad COPY . . copies all of them. Without a .dockerignore file, that includes:

.git/ (often hundreds of megabytes of history)
__pycache__/ and *.pyc (Python bytecode caches)
node_modules/ (if present from a local dev build)
tests/, docs/ (not needed in production)
.env (secrets you absolutely do not want in an image)

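A minimal .dockerignore for a Python project. Patterns are matched against paths relative to the root of the context, so prefix a name with **/ to exclude it at any depth:

```
.git
.gitignore
**/__pycache__
**/*.pyc
**/*.pyo
.pytest_cache
.mypy_cache
tests/
docs/
.env
*.md
```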

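A large build context also slows builds before the first instruction runs. The legacy builder uploads the whole context to the daemon on every build; BuildKit, the default builder in current Docker releases, transfers only the files that instructions reference and reuses earlier transfers, but a COPY . . still has to read everything that is not ignored. Either way, a 200 MB .git directory adds avoidable time to every developer's build.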
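Before writing a .dockerignore, it helps to see what is actually in the context. The following Python sketch lists the top-level entries of a directory by total file size, largest first. It does not apply .dockerignore rules, and tree_size and context_report are hypothetical helper names.

```python
import os
from pathlib import Path


def tree_size(path: Path) -> int:
    """Total bytes of regular files under path; symlinks are skipped, not followed."""
    if path.is_symlink():
        return 0
    if path.is_file():
        return path.stat().st_size
    total = 0
    # os.walk does not descend into symlinked directories by default
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root, name)
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def context_report(context: Path) -> list[tuple[str, int]]:
    """Top-level entries of a build context with their sizes, largest first."""
    entries = [(entry.name, tree_size(entry)) for entry in context.iterdir()]
    return sorted(entries, key=lambda item: item[1], reverse=True)
```

Running context_report on the project root and comparing the largest entries with your .dockerignore quickly shows whether .git, a stray node_modules, or a data directory dominates the context.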

`COPY` vs `ADD`

ADD is a superset of COPY that also:

Decompresses local tar archives into the destination
Fetches files from URLs

Unless you specifically need these features, always use COPY. ADD with a URL introduces a network dependency into the build, and because the remote content can change while the URL stays the same, the builder has to contact the server on each build to decide whether the cached layer is still valid. COPY is explicit, predictable, and cacheable.

Use ADD only for decompressing a local archive that you want to unpack inline, and document why.

Drop Root with `USER`

By default, the process inside a container runs as root. This is unnecessary for most applications and creates risk: if an attacker achieves remote code execution in your container, they have root access to the container’s filesystem and to any mounted volumes.

Create a non-root user and switch to it near the end of your Dockerfile, after the steps that need root (such as installing system packages):

```dockerfile
RUN addgroup --system app && adduser --system --ingroup app app
USER app
```

On Alpine, the syntax is slightly different:

```dockerfile
RUN addgroup -S app && adduser -S app -G app
USER app
```

Debian-based and Alpine images already include an unprivileged nobody user, which is enough when the application does not need to own or write any files:

```dockerfile
USER nobody
```

Do this in every Dockerfile. It takes one or two lines and has a negligible effect on image size.

Use `WORKDIR` Instead of `RUN mkdir && cd`

```dockerfile
# Bad
RUN mkdir -p /app && cd /app

# Good
WORKDIR /app
```

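WORKDIR creates the directory if it does not exist, sets it as the working directory for all subsequent RUN, CMD, ENTRYPOINT, COPY, and ADD instructions, and is more readable. The cd in the RUN version has no lasting effect: each RUN executes in a new shell, so the next instruction starts in the previous working directory again. Use absolute paths, because relative paths are resolved against the previous WORKDIR, which can cause confusion.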
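The reason RUN cd has no lasting effect is that every RUN starts a new shell process. The Python sketch below imitates that with two separate sh -c invocations in a temporary directory; run_step is a hypothetical helper, and a POSIX sh on PATH is assumed.

```python
import os
import subprocess
import tempfile


def run_step(script: str, cwd: str) -> str:
    """Run one command the way each RUN does: in a fresh `sh -c` process."""
    result = subprocess.run(
        ["sh", "-c", script], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as build_root:
    expected = os.path.realpath(build_root)
    # Like `RUN mkdir -p app && cd app`: the cd only lasts for this one process.
    first = os.path.realpath(run_step("mkdir -p app && cd app && pwd", cwd=build_root))
    # Like the next RUN: a new process, back in the original directory.
    second = os.path.realpath(run_step("pwd", cwd=build_root))

print(first == os.path.join(expected, "app"))  # True
print(second == expected)                      # True
```

WORKDIR avoids the problem because it changes the builder's state rather than a short-lived shell's.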

`ENV` vs `ARG`

ENV sets environment variables that persist into the runtime image. Use ENV for configuration the application needs at runtime (PORT, LOG_LEVEL, etc.).
ARG sets build-time variables that do not persist into the runtime image. Use ARG for values needed only during the build (BUILD_VERSION, TARGETARCH, etc.).

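Never store secrets in ENV or ARG. ENV values are visible in docker inspect, and ARG values used during the build appear in docker history. For secrets needed at build time, use BuildKit secret mounts (RUN --mount=type=secret); for secrets needed at runtime, inject them when the container starts, for example with Docker secrets or environment variables passed at run time.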
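For runtime secrets, a common pattern is to read them from files rather than baking them into the image. Docker Compose and Swarm mount secrets as files under /run/secrets/<name> by default; the sketch below reads from there and falls back to an upper-cased environment variable. The fallback naming and the read_secret helper are assumptions for illustration.

```python
import os
from pathlib import Path

# Default mount point for Docker (Compose/Swarm) secrets.
SECRETS_DIR = Path("/run/secrets")


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> str:
    """Prefer a mounted secret file; fall back to an env var named NAME (hypothetical convention)."""
    path = secrets_dir / name
    if path.is_file():
        return path.read_text().strip()
    value = os.environ.get(name.upper())
    if value is None:
        raise KeyError(f"secret {name!r} not found in {secrets_dir} or environment")
    return value
```

Reading from a file keeps the value out of docker inspect output and out of every image layer.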

`LABEL` for Metadata

Labels add no filesystem layer; they are key-value metadata stored in the image configuration. They are useful for tooling, automation, and auditing:

```dockerfile
LABEL org.opencontainers.image.source="https://github.com/org/repo"
LABEL org.opencontainers.image.version="1.2.3"
LABEL org.opencontainers.image.licenses="MIT"
```

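The org.opencontainers.image.* keys are the pre-defined annotation keys from the OCI image specification, and using them as labels is a widely adopted convention. Tools like Docker Scout and Trivy use them to correlate images with their source repositories.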
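To check the labels on a built image, docker image inspect prints a JSON array whose objects carry the labels under Config.Labels. A small Python sketch that runs the CLI and keeps only the OCI keys; inspect_image and oci_labels are hypothetical helpers, and the Docker CLI is assumed to be on PATH.

```python
import json
import subprocess

OCI_PREFIX = "org.opencontainers.image."


def inspect_image(ref: str) -> str:
    """Raw JSON from `docker image inspect` (assumes the Docker CLI is on PATH)."""
    result = subprocess.run(
        ["docker", "image", "inspect", ref],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def oci_labels(inspect_json: str) -> dict[str, str]:
    """OCI labels of the first inspected image, with the common prefix stripped."""
    images = json.loads(inspect_json)  # docker image inspect prints a JSON array
    labels = (images[0].get("Config") or {}).get("Labels") or {}
    return {
        key.removeprefix(OCI_PREFIX): value
        for key, value in labels.items()
        if key.startswith(OCI_PREFIX)
    }
```

For an image built with the three LABEL lines above, oci_labels(inspect_image(ref)) should yield the source, version, and licenses entries; ref is whatever tag you built.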

Key Takeaways

Chain related RUN commands with && and do the cleanup in the same RUN, so deleted files never persist in an earlier layer.
Order Dockerfile instructions from least-changing to most-changing for maximum cache efficiency.
Add a .dockerignore file to exclude .git, caches, secrets, and test files from the build context.
Prefer COPY over ADD unless you specifically need archive extraction or URL fetching.
Drop root with USER in every Dockerfile — it is free and meaningfully reduces risk.
Use WORKDIR with absolute paths instead of RUN mkdir && cd.
Keep secrets out of ENV and ARG; they appear in image metadata.


Gaëlle Candel
