Even with an optimal base image and a multi-stage build, a poorly written Dockerfile can add hundreds of megabytes of unnecessary weight and make every build slower than it needs to be. This chapter covers the instruction-level patterns that eliminate bloat and improve cache efficiency.
RUN Instructions
Each RUN instruction creates a new layer. As established in Chapter 1, files created in one layer and deleted in a subsequent layer are still present in the image; they are only hidden by whiteout entries.
Wrong:
RUN apt-get update
RUN apt-get install -y build-essential
RUN rm -rf /var/lib/apt/lists/*
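This creates three layers. The package index downloaded by apt-get update is stored in layer 1, and build-essential is in layer 2. The rm -rf in layer 3 only adds whiteout entries that hide the index files; their bytes are still stored in layer 1 and ship with the image. Splitting apt-get update from apt-get install also lets Docker reuse a stale cached index when the install line changes later.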
Correct:
RUN apt-get update && \
apt-get install -y --no-install-recommends build-essential && \
rm -rf /var/lib/apt/lists/*
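One layer. The package index is downloaded, used to install build-essential, and deleted within the same instruction, so it never becomes part of any layer. The --no-install-recommends flag additionally skips optional packages that apt would otherwise pull in.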
This pattern is the single most common fix for unnecessarily large images, and it applies to every package manager (apt, apk, yum, dnf).
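As a rough sketch of how the same idea might look with other package managers: the two instructions below are alternatives for different base images, not a sequence, and the package names are illustrative assumptions rather than anything the chapter prescribes.

```dockerfile
# Alpine (apk): --no-cache uses a fresh index for this command only and
# does not keep it under /var/cache/apk, so no separate cleanup is needed.
RUN apk add --no-cache build-base

# Fedora / RHEL-family (dnf): install and clean the package cache in the
# same RUN so the cache never lands in a layer. yum accepts the same shape.
RUN dnf install -y gcc make && \
    dnf clean all
```

In both cases the point is the same as with apt: whatever the package manager downloads as scratch data is removed (or never stored) before the layer is committed.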
Instruction order matters just as much for build speed. Docker caches every layer, and cache invalidation cascades downward: the moment a layer’s hash changes, every subsequent layer must be rebuilt. Instructions that change frequently should be last; instructions that change rarely should be first.
The typical order for a Python application:
# 1. Base: changes rarely
FROM python:3.12-slim-bookworm
# 2. Config: never changes
WORKDIR /app
# 3. System deps: change rarely
RUN apt-get update && apt-get install …
# 4. Dep manifest: changes occasionally
COPY requirements.txt .
# 5. Deps: invalidated when #4 changes
RUN pip install --no-cache-dir -r …
# 6. Source code: changes on every commit
COPY . .
# 7. Config: changes rarely
CMD ["python", "main.py"]
If you COPY . . before pip install, every source code change forces pip to reinstall all packages — even if requirements.txt has not changed. This is the most common build performance mistake.
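For contrast, here is a sketch of the ordering that the paragraph above warns against. It assumes the same hypothetical application (requirements.txt and main.py in the project root); the comments mark where the cache breaks.

```dockerfile
FROM python:3.12-slim-bookworm
WORKDIR /app

# Anti-pattern: the whole source tree is copied before dependencies are
# installed, so any edited file changes this layer's checksum...
COPY . .

# ...and this layer is rebuilt on every commit, even when
# requirements.txt is identical to the previous build.
RUN pip install --no-cache-dir -r requirements.txt

CMD ["python", "main.py"]
```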
.dockerignore
The Docker build context is the directory (or URL) you pass to docker build. Docker sends the entire context to the build daemon before processing the Dockerfile. Without a .dockerignore file, this includes:
- .git/: often hundreds of megabytes of history
- __pycache__/ and *.pyc: Python bytecode cache
- node_modules/: if present from a local dev build
- tests/, docs/: not needed in production
- .env: secrets you absolutely do not want in an image
A minimal .dockerignore for a Python project:
.git
.gitignore
__pycache__
*.pyc
*.pyo
.pytest_cache
.mypy_cache
tests/
docs/
.env
*.md
A large build context increases the docker build time even before the first layer is processed, because the daemon must receive and unpack the entire context. A 200 MB .git directory sent on every build adds seconds to every developer’s workflow.
COPY vs ADD
ADD is a superset of COPY that also:
- automatically extracts local tar archives into the destination
- fetches files from remote URLs
Unless you specifically need these features, always use COPY. ADD with a URL makes cache behavior harder to predict (the remote content can change without the URL changing) and introduces a network dependency into the build. COPY is explicit, predictable, and cacheable.
Use ADD only for decompressing a local archive that you want to unpack inline, and document why.
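A short sketch of the distinction, assuming a hypothetical config file app.conf and a hypothetical archive vendor.tar.gz in the build context; the destination paths are illustrative as well.

```dockerfile
FROM debian:bookworm-slim

# COPY transfers files from the build context unchanged; an archive
# copied this way would stay an archive.
COPY app.conf /etc/app/app.conf

# ADD is used here only because it auto-extracts a local tar archive
# into the destination directory. The comment documents the reason.
ADD vendor.tar.gz /opt/vendor/
```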
USER
By default, the process inside a container runs as root. This is unnecessary for most applications and creates risk: if an attacker achieves remote code execution in your container, they have root access to the container’s filesystem and to any mounted volumes.
Add a non-root user at the end of your Dockerfile:
RUN addgroup --system app && adduser --system --ingroup app app
USER app
On Alpine, the syntax is slightly different:
RUN addgroup -S app && adduser -S app -G app
USER app
Many slim and Alpine images include a nobody user for convenience:
USER nobody
Do this in every Dockerfile. It is a one-line change with no size impact.
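The placement matters because package installs and user creation still need root. A minimal sketch for a Debian-based Python image, assuming the application lives in /app and starts from a hypothetical main.py:

```dockerfile
FROM python:3.12-slim-bookworm
WORKDIR /app

# Steps that need root (package installs, user creation) come first.
RUN addgroup --system app && adduser --system --ingroup app app

# --chown makes the copied files owned by the unprivileged user, so the
# application can also write to them if it needs to.
COPY --chown=app:app . .

# Everything from here on, including the container's main process,
# runs as "app" instead of root.
USER app
CMD ["python", "main.py"]
```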
WORKDIR Instead of RUN mkdir && cd
# Bad
RUN mkdir -p /app && cd /app
# Good
WORKDIR /app
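WORKDIR creates the directory if it does not exist, sets it as the working directory for all subsequent instructions, and is more readable. The cd in the bad example affects only that single RUN instruction, because each RUN starts a fresh shell in the current WORKDIR. Use absolute paths: relative paths are resolved against the previous WORKDIR, which can cause confusion.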
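Both behaviors can be observed with a small throwaway Dockerfile. This is only a sketch; the alpine base image and the src directory name are arbitrary choices for illustration.

```dockerfile
FROM alpine:3.20

# cd only lasts for this one RUN; the next instruction starts in / again.
RUN mkdir -p /app && cd /app
# Prints "/" in the build output.
RUN pwd

WORKDIR /app
# A relative path is resolved against the previous WORKDIR: this is /app/src.
WORKDIR src
# Prints "/app/src" in the build output.
RUN pwd
```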
ENV vs ARG
ENV sets environment variables that persist into the runtime image. Use ENV for configuration the application needs at runtime (PORT, LOG_LEVEL, etc.).
ARG sets build-time variables that do not persist into the runtime image. Use ARG for values needed only during the build (BUILD_VERSION, TARGETARCH, etc.).
Never store secrets in ENV or ARG. ENV values are visible in docker inspect; ARG values appear in docker history. Use Docker secrets or environment variables injected at runtime instead.
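The difference can be sketched as follows. The default values and the APP_VERSION variable name are assumptions made for illustration; only PORT, LOG_LEVEL, and BUILD_VERSION come from the text above.

```dockerfile
FROM python:3.12-slim-bookworm

# Build-time only. Override with:
#   docker build --build-arg BUILD_VERSION=1.2.3 .
ARG BUILD_VERSION=dev

# Runtime configuration: present in the environment of every container
# started from this image, and overridable with `docker run -e`.
ENV PORT=8000 \
    LOG_LEVEL=info

# ARG values can be used by later build instructions in the same stage...
RUN echo "Building version ${BUILD_VERSION}"

# ...but they reach the running container only if copied into ENV explicitly.
ENV APP_VERSION=${BUILD_VERSION}
```

Copying an ARG into ENV like this is fine for a version string, and for exactly that reason it is the wrong mechanism for anything secret.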
LABEL for Metadata
Labels are zero-size metadata attached to the image. They are useful for tooling, automation, and auditing:
LABEL org.opencontainers.image.source="https://github.com/org/repo"
LABEL org.opencontainers.image.version="1.2.3"
LABEL org.opencontainers.image.licenses="MIT"
The org.opencontainers.image.* keys follow the OCI image annotation conventions, which are widely adopted. Tools like Docker Scout and Trivy use them to correlate images with their source repositories.
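Hard-coded versions go stale quickly, so one possible approach is to feed labels from build arguments. In this sketch the VERSION and GIT_SHA argument names and their defaults are hypothetical; the label keys are standard OCI annotation keys.

```dockerfile
FROM python:3.12-slim-bookworm

# Hypothetical build arguments, supplied for example with:
#   docker build --build-arg VERSION=1.2.3 \
#                --build-arg GIT_SHA=$(git rev-parse HEAD) .
ARG VERSION=0.0.0
ARG GIT_SHA=unknown

# One LABEL instruction can carry several key/value pairs.
LABEL org.opencontainers.image.source="https://github.com/org/repo" \
      org.opencontainers.image.version="${VERSION}" \
      org.opencontainers.image.revision="${GIT_SHA}" \
      org.opencontainers.image.licenses="MIT"
```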
Key takeaways:
- Chain RUN commands with && to prevent deleted files from persisting in earlier layers.
- Add a .dockerignore file to exclude .git, caches, secrets, and test files from the build context.
- Prefer COPY over ADD unless you specifically need archive extraction or URL fetching.
- Set USER in every Dockerfile: it is free and meaningfully reduces risk.
- Use WORKDIR with absolute paths instead of RUN mkdir && cd.
- Never store secrets in ENV or ARG; they appear in image metadata.