Multi-Stage Dockerfile with CPU-Only PyTorch

What

A three-stage Docker build (base → dependencies → application) that separates system packages, Python dependencies, and application code into distinct stages, forces the lightweight CPU-only build of PyTorch before installing the rest of requirements.txt, and bakes the spaCy NLP model required by Microsoft Presidio into the image.

Project Context

In full_project_context_updated.txt -> ./Dockerfile, the second stage installs torch from https://download.pytorch.org/whl/cpu BEFORE the main requirements.txt so the much heavier CUDA-enabled Torch never sneaks in. Presidio's en_core_web_lg model is installed as a wheel from GitHub in the same stage, so it is baked into the image rather than fetched at runtime. The third stage copies the application code in with appuser ownership and then switches from root to the unprivileged appuser. A HEALTHCHECK pings /api/v1/health every 30 seconds.

How

Stages in order


FROM python:3.11-slim AS base
ENV PYTHONUNBUFFERED=1 PYTHONDONTWRITEBYTECODE=1 PIP_NO_CACHE_DIR=1
# Install and clean the apt lists in one RUN so the lists never land in a layer.
RUN apt-get update && apt-get install -y gcc postgresql-client && rm -rf /var/lib/apt/lists/*
RUN useradd -m -u 1000 appuser
WORKDIR /app

FROM base AS dependencies
COPY requirements.txt .
RUN pip install --upgrade pip
# CPU-only torch first, so requirements.txt finds it already installed.
RUN pip install torch --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir --default-timeout=1000 -r requirements.txt
RUN pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl

FROM dependencies AS application
COPY --chown=appuser:appuser . .
USER appuser
EXPOSE 8000
# raise_for_status() makes a 4xx/5xx response fail the probe; requests.get alone would not.
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8000/api/v1/health', timeout=5).raise_for_status()" || exit 1
# Shell form so ${PORT:-8000} is expanded; exec lets uvicorn receive SIGTERM directly.
CMD exec uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8000}

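Each stage starts FROM the previous one (base → dependencies → application), so every layer is inherited rather than copied selectively. The final image therefore contains the app, the Python dependencies, the spaCy model, and also gcc and postgresql-client from base; nothing is discarded. Dropping the build tools would require a final stage that starts from a fresh python:3.11-slim and pulls in only the installed packages with COPY --from=dependencies.
The forced CPU-only Torch install avoids pulling a multi-gigabyte CUDA bundle when the FCA app never uses a GPU; --index-url makes pip resolve torch from PyTorch's CPU wheel index instead of PyPI. This only holds if the torch requirement in requirements.txt (direct or transitive) is satisfied by the version already installed; otherwise pip replaces it with the default PyPI build.
--default-timeout=1000 defends against sporadic mirror timeouts when requirements.txt has many large wheels; pip's default of 15 seconds is too tight on slow CI.
USER appuser forces non-root execution at runtime: uvicorn and the health check run as UID 1000. The pip install steps come before the USER switch, so they run as root at build time.
--start-period=5s gives uvicorn a 5-second grace period in which failed probes do not count toward the 3 retries, so the container is not marked unhealthy while the port is still being bound.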
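
The HEALTHCHECK one-liner above can also be written as a small standalone probe. The sketch below is one possible version, not the project's code: it uses only the standard library (so it does not depend on requests being listed in requirements.txt) and treats any non-2xx answer as a failure. The names check_health and main are hypothetical; the URL mirrors the Dockerfile and assumes the default port 8000.

```python
import http.client
import urllib.request


def check_health(url: str, timeout: float = 5.0) -> bool:
    """Return True only when the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # urllib raises HTTPError for 4xx/5xx responses; HTTPError and
        # URLError are OSError subclasses, as are refused connections
        # and socket timeouts.
        return False


def main() -> int:
    """Exit code for a HEALTHCHECK: 0 is healthy, 1 is unhealthy."""
    # Assumption: the app listens on the default port 8000.
    return 0 if check_health("http://localhost:8000/api/v1/health") else 1
```

Saved as a hypothetical healthcheck.py that ends with raise SystemExit(main()), this could stand in for the python -c command in the HEALTHCHECK instruction.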

Common Pitfalls

Forgetting rm -rf /var/lib/apt/lists/* leaves tens of MB of apt package lists in the image. Always purge in the same RUN that calls apt-get install; a cleanup in a later RUN does not shrink the earlier layer.

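Building on a GPU host does not change what gets installed: the image contains whichever wheel pip resolved. With the CPU index, torch.cuda.is_available() returns False at runtime even when the host has a GPU, and tests that never touch CUDA pass silently either way. That is the intended outcome for this app. Always pin --index-url https://download.pytorch.org/whl/cpu so the build is identical regardless of host GPU presence.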
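
Because a wrong torch build passes silently, an explicit check can help. The sketch below is a heuristic under stated assumptions, not an official PyTorch API: it relies on torch.version.cuda being None on CPU-only builds and on CPU-index wheels usually carrying a "+cpu" local version tag. The function names and the version strings in the usage note are illustrative.

```python
from typing import Optional


def is_cpu_only_build(version: str, cuda_version: Optional[str]) -> bool:
    """Heuristic check for a CPU-only torch build.

    `version` corresponds to torch.__version__ and `cuda_version` to
    torch.version.cuda, which is None when torch was built without CUDA.
    """
    if cuda_version is not None:
        return False
    # Text after "+" is the local version tag, e.g. "cpu" or "cu121".
    # Some platforms publish CPU wheels without any tag, so an empty
    # tag is accepted as well.
    local_tag = version.partition("+")[2]
    return local_tag in ("", "cpu")


def check_installed_torch() -> bool:
    """Inspect the torch installed in this environment (requires torch)."""
    import torch  # imported lazily so the helper above works without it

    return is_cpu_only_build(str(torch.__version__), torch.version.cuda)
```

For example, is_cpu_only_build("2.3.0+cpu", None) is True, while is_cpu_only_build("2.3.0+cu121", "12.1") is False. Calling check_installed_torch() inside the built image, for instance from a test, would catch a CUDA build that slipped in through requirements.txt.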

Real-World Interview Prep

Q1: When do you need a multi-stage Dockerfile vs a single `FROM python:3.12` build?

A: Multi-stage when (a) the runtime image has a different surface area than the build image, e.g. you compile C extensions in stage 1, then copy just the .so files to a slim image; (b) you want to bake models/assets into the image (spaCy, HuggingFace) without polluting the source layer; (c) you want the root-only build steps clearly separated from a non-root runtime stage. Single stage is fine when (a) your dependency tree is simple Python wheels, (b) you don't mind an extra 50 MB or so in the image, (c) you ship to a tightly controlled environment where reproducibility matters more than size. Industry trend: multi-stage is the default because image size affects cold-start time on FaaS and egress costs on Cloud Run.

Q2: Your image is 2GB. How do you shrink it to under 500MB?

A: Five-step ladder (apply in order). (1) Switch base from python:3.12 to python:3.12-slim (typically several hundred MB saved). (2) Use multi-stage with COPY --from: discard build tools and copy only the installed packages (.so files and .dist-info metadata), roughly 300 MB saved for libs with C extensions. (3) Install CPU-only PyTorch if you don't use a GPU (roughly 1.5 GB saved). (4) Use --no-cache-dir or PIP_NO_CACHE_DIR=1 to avoid keeping wheel caches; either one is enough. (5) For pure-Python libs, build wheels in an earlier stage, copy them across, and install them with pip install --no-deps. The FCA build is roughly 1.3 to 1.5 GB; ~80% is spaCy + PyTorch. If you don't need both in the same process, you can split into sidecar containers (PyTorch worker, spaCy-only web).

Q3: How do you debug “spacy model not found” at runtime despite the wheel install?

A: Verify in three places. (1) Container: docker run -it image /bin/bash, then pip list | grep en_core_web confirms the wheel is registered with pip. (2) spaCy's load path: spacy.load('en_core_web_lg') resolves the name as an installed Python package, so the model must be pip-installed into the same environment the app runs in; a model unpacked by hand into a custom directory is only found when that path is passed to spacy.load. The wheel URL https://github.com/explosion/spacy-models/releases/download/... is the canonical install method. (3) python -c "import spacy; spacy.load('en_core_web_lg')", run as appuser, is the explicit load test. Common mistake: switching to USER appuser before the pip install. The install into the system site-packages needs root, while the load runs as appuser, so the correct order is install as root, then switch user, then run. Also check that the model's major.minor version (3.7 here) matches the installed spaCy version.
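
The first two checks can be scripted with the standard library alone, which is useful when spaCy itself fails to import. This is a minimal sketch, not part of the project: describe_package and same_minor are hypothetical helper names, and the version comparison assumes that spaCy model releases track spaCy's major.minor version.

```python
import importlib.util
from importlib import metadata
from typing import Dict, Optional


def describe_package(dist_name: str, module_name: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Report whether a pip-installed package is visible to this interpreter."""
    module_name = module_name or dist_name.replace("-", "_")
    try:
        # What pip registered (the same information `pip list` shows).
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # Where `import <module_name>` would load from, or None if not importable.
    spec = importlib.util.find_spec(module_name)
    return {
        "distribution": dist_name,
        "version": version,
        "module_path": spec.origin if spec else None,
    }


def same_minor(version_a: str, version_b: str) -> bool:
    """Compare the major.minor parts of two version strings."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]
```

Run as appuser inside the container, describe_package("en_core_web_lg") and describe_package("spacy") show whether both packages are registered and importable for that user; if both report a version, same_minor on the two values flags a model built for a different spaCy release.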

Top-to-Bottom Code Walkthrough (`Dockerfile`)

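Multi-stage Dockerfiles split “I need gcc to compile stuff” from “I need to ship a small image”. Each FROM line begins a new stage; artefacts can be copied into later stages with COPY --from=<stage>. This Dockerfile does not use COPY --from: each stage builds directly on the previous one, so the stages organise the build and its cache rather than shrink the final image.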

Stage 1: `base`


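FROM python:3.11-slim AS base

# One ENV instruction keeps all four settings in a single layer.
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Install system packages and remove the apt lists in the same layer.
RUN apt-get update && apt-get install -y \
    gcc \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Unprivileged runtime user; the USER switch happens in the final stage.
RUN useradd -m -u 1000 appuser

WORKDIR /app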

PYTHONUNBUFFERED=1: Python flushes stdout immediately; logs appear in real time.
PYTHONDONTWRITEBYTECODE=1: Python does not write .pyc files into the image.
PIP_NO_CACHE_DIR=1: don't keep ~/.cache/pip (saves ~200 MB).
PIP_DISABLE_PIP_VERSION_CHECK=1: pip skips its "new version available" check on every run.
gcc: needed to compile packages that ship no prebuilt wheel for this platform (occasionally a transitive dependency).
postgresql-client: psql and pg_isready available inside the container for debugging.
useradd -m -u 1000 appuser: non-root user; never run as root in production.
rm -rf /var/lib/apt/lists/*: clean apt lists in the same layer (saves ~30 MB).

Stage 2: `dependencies`


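FROM base AS dependencies

# Copy only requirements.txt so code changes do not invalidate the installs below.
COPY requirements.txt .

RUN pip install --upgrade pip
# CPU-only torch first, so requirements.txt finds it already installed.
RUN pip install torch --index-url https://download.pytorch.org/whl/cpu

RUN pip install --no-cache-dir --default-timeout=1000 -r requirements.txt

# spaCy model required by Presidio, installed as a regular wheel.
RUN pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl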

Why the split: a pip install step takes minutes. If you only changed application code, requirements.txt is unchanged and Docker reuses every cached layer of this stage. Installing torch with --index-url https://download.pytorch.org/whl/cpu pulls the CPU-only build (roughly 200 MB) instead of the CUDA build (roughly 4 GB with its NVIDIA libraries).

--default-timeout=1000 — pip’s default 15-second timeout fails on slow CI networks. Bumping to 1000 seconds prevents spurious install failures.

en_core_web_lg-3.7.1: Presidio's required spaCy English model. Pre-baked into the image so the app never has to download it at startup.

Stage 3: `application`


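FROM dependencies AS application

COPY --chown=appuser:appuser . .

USER appuser

EXPOSE 8000

# raise_for_status() makes a 4xx/5xx response fail the probe; requests.get alone would not.
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8000/api/v1/health', timeout=5).raise_for_status()" || exit 1

# Shell form so ${PORT:-8000} is expanded; exec lets uvicorn receive SIGTERM directly.
CMD exec uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8000}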

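COPY --chown=appuser:appuser . .: copy code; chown to the non-root user.
USER appuser: switch identity; everything after this line, including the health check and uvicorn, runs as UID 1000.
HEALTHCHECK: Docker probes /api/v1/health every 30 seconds; 3 consecutive failures mark the container unhealthy. The probe must fail on an error status, not only on a refused connection, and requests must be present in requirements.txt for it to run at all.
${PORT:-8000}: platform-overridable port with a default of 8000. The HEALTHCHECK URL hardcodes 8000, so the probe fails if PORT is set to anything else.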
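
The shell default ${PORT:-8000} applies both when PORT is unset and when it is set to an empty string. A Python equivalent is sketched below for the case where the server is started from a launcher script instead of the shell-form CMD; resolve_port, health_url, and main are hypothetical names, and only uvicorn.run and the app.main:app import string come from the Dockerfile's own command.

```python
import os
from typing import Mapping


def resolve_port(env: Mapping[str, str], default: int = 8000) -> int:
    """Mimic ${PORT:-8000}: use the default when PORT is unset or empty."""
    raw = env.get("PORT", "")
    return int(raw) if raw else default


def health_url(env: Mapping[str, str]) -> str:
    """Build the probe URL from the same port the server binds to."""
    return f"http://localhost:{resolve_port(env)}/api/v1/health"


def main() -> None:
    # Imported lazily so the helpers above work without uvicorn installed.
    import uvicorn

    uvicorn.run("app.main:app", host="0.0.0.0", port=resolve_port(os.environ))
```

Deriving the health URL from the same helper keeps the probe and the server on one port, whatever the platform sets PORT to.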

Image size

This Dockerfile produces an image of about 1.5 GB (mostly PyTorch, the spaCy model, and langgraph). A slimmer alternative drops torch entirely if Presidio's NLP is replaced with regex-only detection.

Common Pitfalls

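Not cleaning apt lists: adds ~30 MB to the image.
Running everything as root: security risk; a compromised app or a container escape then starts with root privileges.
Forgetting EXPOSE: the port is not recorded in the image metadata, so docker ps shows nothing for an unpublished container and docker run -P has nothing to publish (cosmetic, but useful; explicit -p still works).
Building a :latest image in production: pinned tags (:v1.2.3) make rollback trivial.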

Real-World Interview Prep

Q1: Why three stages vs one?

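A: Layer caching and clear boundaries. If you change app/main.py, Docker rebuilds only the application stage and reuses everything up to dependencies. Strictly, the caching comes from instruction order (requirements.txt is copied and installed before COPY . .), so a single-stage file with the same order would cache the same way; the named stages make the boundaries explicit and allow docker build --target dependencies. A single-stage file that runs COPY . . before pip install re-installs 200+ packages on every code change.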
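
The caching argument can be made concrete with a toy model. The sketch below is a deliberate simplification, not Docker's actual algorithm: each step's cache key is a hash of its parent's key, the instruction text, and the content of any copied input, so a change invalidates that step and everything after it. The step list is an abbreviated version of the Dockerfile, and the input values in the usage note are made up for illustration.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# (instruction text, name of the copied input that feeds the key, if any)
Step = Tuple[str, Optional[str]]

STEPS: List[Step] = [
    ("FROM python:3.11-slim", None),
    ("RUN apt-get install gcc postgresql-client", None),
    ("COPY requirements.txt .", "requirements.txt"),
    ("RUN pip install -r requirements.txt", None),
    ("COPY . .", "source_tree"),
    ("CMD uvicorn app.main:app", None),
]


def layer_keys(steps: List[Step], inputs: Dict[str, str]) -> List[str]:
    """Chain one key per step from parent key, instruction, and input content."""
    keys: List[str] = []
    parent = ""
    for instruction, input_name in steps:
        content = inputs[input_name] if input_name else ""
        payload = f"{parent}|{instruction}|{content}".encode()
        parent = hashlib.sha256(payload).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps: List[Step], before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Instructions whose key differs between two builds (cache misses)."""
    old_keys = layer_keys(steps, before)
    new_keys = layer_keys(steps, after)
    return [step[0] for step, old, new in zip(steps, old_keys, new_keys) if old != new]
```

With before = {"requirements.txt": "fastapi", "source_tree": "v1"}, changing only source_tree to "v2" reports cache misses for COPY . . and the CMD step alone, while changing requirements.txt invalidates everything from COPY requirements.txt . onward, including the pip install.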

Q2: Why `--index-url https://download.pytorch.org/whl/cpu`?

A: The GPU-enabled PyTorch install is roughly 4 GB; CPU-only is roughly 200 MB. For CPU inference workloads (the typical FCA scenario), CPU-only is sufficient and cuts torch's footprint by about 20x, with a correspondingly shorter download during the build.

Q3: When would you switch from Debian-slim to Alpine?

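A: Alpine is smaller, but it uses musl libc, so the manylinux wheels most packages publish do not install there and pip falls back to compiling from source unless a musllinux wheel exists. cryptography (used by JWT) has historically been a pain point. Stick with python:3.11-slim unless size is critical.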
