Chapter 4: Dockerfile Instruction Optimization


Even with an optimal base image and a multi-stage build, a poorly written Dockerfile can add hundreds of megabytes of unnecessary weight and make every build slower than it needs to be. This chapter covers the instruction-level patterns that eliminate bloat and improve cache efficiency.


Chain RUN Instructions

Each RUN instruction creates a new layer. As established in Chapter 1, files created in one layer and deleted in a subsequent layer are still present in the image — only hidden by whiteout entries.

Wrong:

RUN apt-get update
RUN apt-get install -y build-essential
RUN rm -rf /var/lib/apt/lists/*

This creates three layers. The package index that apt-get update downloads is stored in layer 1, and build-essential is in layer 2. The rm -rf in layer 3 only hides the index files; layer 1 still ships with the image at full size.

Correct:

RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*

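One layer. The package index is downloaded, used, and removed within the same instruction, so it never lands in any layer. The --no-install-recommends flag also skips the optional packages that apt would otherwise pull in.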

This pattern is the single most common fix for unnecessarily large images, and it applies to every package manager (apt, apk, yum, dnf).
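As a rough sketch of how the same single-layer idea carries over to other package managers, the fragments below show an apk and a dnf variant. The package names (build-base, gcc, make) are illustrative assumptions and are not taken from the example above; each RUN belongs in a Dockerfile whose base image ships that package manager.

```dockerfile
# Alpine (apk): --no-cache fetches the index on the fly and stores no local cache.
RUN apk add --no-cache build-base

# Fedora/RHEL family (dnf): install and clear the metadata cache in the same layer.
RUN dnf install -y gcc make && \
    dnf clean all
```

In both cases the cleanup happens inside the instruction that created the files, which is what keeps them out of the layer.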


Order Instructions by Change Frequency

Cache invalidation cascades downward: the moment a layer’s hash changes, every subsequent layer must be rebuilt. Instructions that change frequently should be last; instructions that change rarely should be first.

The typical order for a Python application:

# 1. Base: changes rarely
FROM python:3.12-slim-bookworm
# 2. Config: never changes
WORKDIR /app
# 3. System deps: changes rarely
RUN apt-get update && apt-get install ...
# 4. Dep manifest: changes occasionally
COPY requirements.txt .
# 5. Deps: invalidated when #4 changes
RUN pip install --no-cache-dir -r ...
# 6. Source code: changes on every commit
COPY . .
# 7. Config: changes rarely
CMD ["python", "main.py"]

If you COPY . . before pip install, every source code change forces pip to reinstall all packages — even if requirements.txt has not changed. This is the most common build performance mistake.
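For contrast, here is a minimal sketch of the ordering that the paragraph above warns against. The file names follow the example in this section; treat it as an illustration of the mistake, not as a template.

```dockerfile
# Anti-pattern: the whole source tree is copied before dependencies are installed.
FROM python:3.12-slim-bookworm
WORKDIR /app
# Any edit to any source file changes this layer...
COPY . .
# ...so this layer is rebuilt on every commit, even when requirements.txt is unchanged.
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "main.py"]
```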


Use .dockerignore

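The Docker build context is the directory (or URL) you pass to docker build. Docker sends the entire context to the build daemon before processing the Dockerfile. Without a .dockerignore file, this includes everything in that directory: version-control history, bytecode and tool caches, tests, documentation, and local files such as .env.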

A minimal .dockerignore for a Python project:

.git
.gitignore
__pycache__
*.pyc
*.pyo
.pytest_cache
.mypy_cache
tests/
docs/
.env
*.md

A large build context increases the docker build time even before the first layer is processed, because the daemon must receive and unpack the entire context. A 200 MB .git directory sent on every build adds seconds to every developer’s workflow.


COPY vs ADD

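ADD is a superset of COPY that can also fetch a source from a remote URL and automatically extract a local tar archive into the destination.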

Unless you specifically need these features, always use COPY. ADD with a URL bypasses the layer cache (the remote content can change without the URL changing) and introduces a network dependency into the build. COPY is explicit, predictable, and cacheable.

Use ADD only for decompressing a local archive that you want to unpack inline, and document why.
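A short sketch of the two cases side by side may help. The archive name vendor.tar.gz and the target path /opt/vendor/ are hypothetical and stand in for whatever local archive you actually ship in the build context.

```dockerfile
# Default choice: a plain copy with no hidden behavior.
COPY requirements.txt .

# ADD is used here only because the local archive should be unpacked inline.
# vendor.tar.gz is a hypothetical archive in the build context.
ADD vendor.tar.gz /opt/vendor/
```

The comment above the ADD line doubles as the "document why" note for the next person who reads the Dockerfile.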


Drop Root with USER

By default, the process inside a container runs as root. This is unnecessary for most applications and creates risk: if an attacker achieves remote code execution in your container, they have root access to the container’s filesystem and to any mounted volumes.

Create a non-root user and switch to it near the end of your Dockerfile, after the steps that need root (such as package installation):

RUN addgroup --system app && adduser --system --ingroup app app
USER app

On Alpine, the syntax is slightly different:

RUN addgroup -S app && adduser -S app -G app
USER app

Many slim and Alpine images include a nobody user for convenience:

USER nobody

Do this in every Dockerfile. It is a one-line change with no size impact.
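To show where these lines might sit in a complete file, here is a minimal sketch for a Debian-based Python image. The app user and group come from the example above; using COPY --chown so that the application files belong to that user is an assumption about what the application needs, not a requirement.

```dockerfile
FROM python:3.12-slim-bookworm
WORKDIR /app

# Still root here: create the unprivileged user and group.
RUN addgroup --system app && adduser --system --ingroup app app

# Copy the source and hand ownership to the unprivileged user.
COPY --chown=app:app . .

# Everything from this point on, including the container process, runs as app.
USER app
CMD ["python", "main.py"]
```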


Use WORKDIR Instead of RUN mkdir && cd

# Bad
RUN mkdir -p /app && cd /app

# Good
WORKDIR /app

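WORKDIR creates the directory if it does not exist, sets it as the working directory for all subsequent instructions, and is more readable. The cd in the bad example has no lasting effect: each RUN starts a fresh shell, so the directory change is lost when that instruction finishes. Use absolute paths, because relative paths are resolved against the previous WORKDIR, which can cause confusion.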
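The difference is easy to observe with a throwaway build. The sketch below assumes the base image does not set its own WORKDIR; the pwd calls exist only to print the current directory in the build log.

```dockerfile
FROM python:3.12-slim-bookworm

# The cd applies only to the shell that runs this one instruction.
RUN mkdir -p /app && cd /app
# This runs in a new shell, so the directory printed here is not /app.
RUN pwd

# WORKDIR persists for every instruction that follows.
WORKDIR /app
# This prints /app.
RUN pwd
```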


ENV vs ARG

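ARG defines a build-time variable: it can be overridden with --build-arg and is not set in the running container. ENV sets an environment variable that is stored in the image and is present both in later build instructions and at runtime.

Never store secrets in ENV or ARG. ENV values are visible in docker inspect; ARG values appear in docker history. Use BuildKit secret mounts (RUN --mount=type=secret) during the build, or environment variables injected at runtime instead.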
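The following sketch puts the three mechanisms next to each other. It assumes a BuildKit-enabled builder; the names APP_VERSION, APP_ENV, and the secret id api_token are hypothetical and chosen only for illustration.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim-bookworm

# Build-time only: override with --build-arg APP_VERSION=...; not set at runtime.
ARG APP_VERSION=dev

# Stored in the image: visible to later instructions and to the running container.
ENV APP_ENV=production

# The secret is mounted at /run/secrets/api_token for this instruction only
# and is not written into a layer. Here we merely check that it is non-empty.
RUN --mount=type=secret,id=api_token test -s /run/secrets/api_token
```

A build along the lines of docker build --secret id=api_token,src=./api_token.txt . would supply the secret, where api_token.txt is a hypothetical local file.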


LABEL for Metadata

Labels are zero-size metadata attached to the image. They are useful for tooling, automation, and auditing:

LABEL org.opencontainers.image.source="https://github.com/org/repo"
LABEL org.opencontainers.image.version="1.2.3"
LABEL org.opencontainers.image.licenses="MIT"

The OCI image annotations are a widely adopted convention. Tools like Docker Scout and Trivy use them to correlate images with their source repositories.
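One possible variation, sketched below, sets all three labels in a single instruction and takes the version from a build argument so that it does not have to be edited by hand. The VERSION argument name is an assumption; the label keys and values are the ones shown above.

```dockerfile
# Hypothetical build argument; pass the real value with --build-arg VERSION=...
ARG VERSION=1.2.3

# A single LABEL instruction can carry several key/value pairs.
LABEL org.opencontainers.image.source="https://github.com/org/repo" \
      org.opencontainers.image.version="${VERSION}" \
      org.opencontainers.image.licenses="MIT"
```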


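Key Takeaways

Chain install and cleanup commands in a single RUN so temporary files never reach a layer. Order instructions from least to most frequently changed to keep the cache useful. Use .dockerignore to keep the build context small. Prefer COPY over ADD. Switch to a non-root user with USER, set the working directory with WORKDIR, keep secrets out of ENV and ARG, and describe the image with LABEL.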

