Kubernetes Probes — `/live`, `/ready`, `/health`

What? (Concept Overview)

The service exposes three orthogonal endpoints, two of which back Kubernetes probes. /live (liveness) answers “restart the container if it has hung”. /ready (readiness) answers “send traffic to this pod”. /health is a diagnostic summary used by humans and uptime checkers, not by k8s. Conflating them produces pods that either thrash (over-eager liveness) or receive traffic before they can serve (premature readiness).

Project Context

The FCA Support Agent’s app/routers/health.py exposes three endpoints, each tuned for a different consumer:

/api/v1/live — Never touches the DB; instant 200 OK. The Kubernetes liveness probe hits this every 10s (periodSeconds: 10 in the pod spec).
/api/v1/ready — Calls check_db_connection(); returns 503 if Postgres is unreachable. The Kubernetes readiness probe gates the Service.
/api/v1/health — Full diagnostic with all subsystems; humans and external uptime checkers (Pingdom/UptimeRobot) consume this.

How? (Quick Reference Blocks)

3.1 The Health Router


# app/routers/health.py
from fastapi import APIRouter, status
from fastapi.responses import JSONResponse
from pydantic import BaseModel
from datetime import datetime, timezone
 
from app.config import settings
from app.database import check_db_connection
 
router = APIRouter()
 
class HealthResponse(BaseModel):
    status: str
    timestamp: str
    version: str
    environment: str
    checks: dict

3.2 `/live` — Liveness Probe


# app/routers/health.py
@router.get("/live", tags=["Health"])
async def liveness() -> dict:
    """Used by Kubernetes to determine if pod is alive."""
    return {
        "status": "alive",
        "timestamp": datetime.now(timezone.utc)
            .isoformat().replace("+00:00", "Z"),
    }

3.3 `/ready` — Readiness Probe


# app/routers/health.py
@router.get("/ready", tags=["Health"])
async def readiness() -> dict:
    db_healthy = await check_db_connection()
    if db_healthy:
        return {
            "status": "ready",
            "timestamp": datetime.now(timezone.utc)
                .isoformat().replace("+00:00", "Z"),
        }
    return JSONResponse(
        status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
        content={
            "status": "not_ready",
            "reason": "database_unhealthy",
            "timestamp": datetime.now(timezone.utc)
                .isoformat().replace("+00:00", "Z"),
        },
    )

3.4 `/health` — Full Diagnostic


# app/routers/health.py
@router.get("/health", response_model=HealthResponse, tags=["Health"])
async def health_check() -> HealthResponse:
    db_healthy = await check_db_connection()
    return HealthResponse(
        status="healthy" if db_healthy else "unhealthy",
        timestamp=datetime.now(timezone.utc)
            .isoformat().replace("+00:00", "Z"),
        version=settings.app_version,
        environment=settings.environment,
        checks={
            "database": {"status": "healthy" if db_healthy else "unhealthy",
                         "type": "postgresql",
                         "pool_size": settings.database_pool_size},
            "redis": {"status": "healthy" if settings.redis_enabled
                       else "disabled", "enabled": settings.redis_enabled},
            "groq_ai": {"status": "configured" if settings.groq_api_key
                        else "not_configured",
                        "model": settings.groq_model},
        },
    )

Why? (Parameter Breakdown)

/live MUST NOT touch external dependencies — If liveness hits Postgres and Postgres is slow, all pods get killed simultaneously → cluster-wide outage. Liveness failure should imply the process itself is broken (deadlock, OOM, leaked file descriptors).
/ready SHOULD touch critical upstream dependencies — A pod is “ready” iff it can serve a real request. Without DB checks, you’ll route traffic to a pod that fails every query → end-user errors.
/health as the human-facing endpoint — Uptime monitors want one endpoint to alarm on; that endpoint should be the most diagnostic. Don’t ask uptime monitors to poll /ready — the 503s will be noisy.
Three separate endpoints instead of one with a mode flag — Avoids accidental k8s misconfiguration. If you have /health?mode=live, an ops engineer can fat-finger the livenessProbe config (for example, dropping or mistyping the mode) and accidentally restart pods on DB blips. Separation by URL is unambiguous.
pydantic.BaseModel response model for /health — Self-documents the schema and validates the response shape at runtime. Useful for uptime-monitor onboarding.
UTC timestamps with explicit Z — Parses unambiguously in any SIEM/observability stack (see Structured Logging page).
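The handlers above repeat the same timestamp expression in every response. A minimal sketch of a shared helper follows; the name `utc_now_z` and the injectable `now` parameter are assumptions for illustration and do not appear in the project.

```python
from datetime import datetime, timezone
from typing import Optional


def utc_now_z(now: Optional[datetime] = None) -> str:
    """Return an ISO-8601 UTC timestamp ending in 'Z' (hypothetical helper).

    `now` is injectable so the function can be tested deterministically.
    """
    moment = now if now is not None else datetime.now(timezone.utc)
    # Normalise to UTC first so the "+00:00" suffix is guaranteed to be present.
    return moment.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


print(utc_now_z(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)))
# prints 2024-01-02T03:04:05Z
```

Each handler could then call the helper instead of spelling out the `.isoformat().replace(...)` chain, which keeps the format identical across all three endpoints.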

Common Pitfalls

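Liveness probe hitting /health — Cascading failure. A DB hiccup that fails the readiness probe is fine: traffic shifts away. If the liveness probe depends on the DB too, the same hiccup fails it on every pod at once → Kubernetes restarts them all → total outage. Always point liveness at the bare-minimum /live.
Readiness probe timeout shorter than connect timeout — If readinessProbe has timeoutSeconds: 1 but the Postgres connect timeout is 30s, every stalled connection attempt makes the probe time out long before the handler can answer → the pod is marked unready by timeout, never by an explicit 503, and stays out of the Service for as long as connections stall. Keep the handler's own DB timeout below timeoutSeconds so the probe gets a real answer.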

Real-World Interview Prep

Q1: Your pods are flapping — `kubectl get pods` shows `CrashLoopBackOff`. Walk through your diagnostic ladder.

A: (1) kubectl describe pod <name>: read the Events section and the container's Last State. Did the liveness probe fail (readiness failures never restart a container), or did the process exit on its own (non-zero exit code, OOMKilled)? (2) kubectl logs <pod> --previous: read the logs of the last crashed container. Look for uncaught exceptions in lifespan startup. (3) If probe-related, increase initialDelaySeconds from the default 0 to 30s in the deployment manifest. (4) Compare liveness vs readiness: if liveness is firing, the Python process is likely wedged (infinite LLM loop, deadlock on a SQLAlchemy session). (5) Reproduce locally: simulate the same DB-down state and confirm that /live still returns 200.
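Step (5) can be rehearsed without a cluster. The block below is a framework-free stand-in, not the project's router: `liveness`, `readiness`, `db_down` and `simulate_outage` are hypothetical functions that return `(status_code, body)` tuples so the two contracts can be compared under a simulated database outage.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

DbCheck = Callable[[], Awaitable[bool]]
Reply = Tuple[int, Dict[str, str]]


async def liveness() -> Reply:
    # Stand-in for /live: no dependency calls, so it cannot fail with the DB.
    return 200, {"status": "alive"}


async def readiness(check_db: DbCheck) -> Reply:
    # Stand-in for /ready: the status code follows the database check.
    if await check_db():
        return 200, {"status": "ready"}
    return 503, {"status": "not_ready", "reason": "database_unhealthy"}


async def db_down() -> bool:
    # Simulated outage: the check reports the database as unreachable.
    return False


async def simulate_outage() -> Tuple[Reply, Reply]:
    return await liveness(), await readiness(db_down)


live_reply, ready_reply = asyncio.run(simulate_outage())
print(live_reply[0], ready_reply[0])  # prints: 200 503
```

If the liveness stand-in ever returned anything but 200 here, the same outage in production would restart every pod at once.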

Q2: Why not use a single `/ping` endpoint for everything?

A: Probe-purpose mismatch. Liveness needs a dependency-free yes/no. Readiness needs a yes/no that includes critical dependencies. Health needs every check reported, without failing fast on the first bad one. A single /ping either over-checks (killing pods on DB blips) or under-checks (routing traffic to broken pods). Three routes, three contracts.

Q3: How do you differentiate “DB slow” from “DB down” with the readiness probe?

A: Add a latency budget. Wrap check_db_connection() in asyncio.wait_for(..., timeout=2.0) and treat the resulting TimeoutError as a failed check. If the DB responds within 2s → ready; if the timeout fires → 503 not ready. This way a 30s DB hang produces a 2s readiness fail (and the pod is removed from the Service) without waiting for the full TCP connect timeout. Keep the budget below the probe's timeoutSeconds, otherwise the kubelet gives up before the handler answers. Liveness needs no such wrapper: /live makes no dependency call, so its latency ceiling is simply the liveness probe's timeoutSeconds.
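The latency budget can be sketched with the standard library alone. `check_with_budget`, `slow_db` and `fast_db` are hypothetical names; in the real handler the callable passed in would be `check_db_connection`, and the budgets here are shortened so the demo finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def check_with_budget(
    check: Callable[[], Awaitable[bool]], budget_s: float = 2.0
) -> bool:
    """Run a dependency check, treating 'slower than budget_s' as unhealthy."""
    try:
        return await asyncio.wait_for(check(), timeout=budget_s)
    except asyncio.TimeoutError:
        # wait_for cancels the pending check before raising.
        return False


async def slow_db() -> bool:
    # Hypothetical stand-in for a check_db_connection() that hangs.
    await asyncio.sleep(1.0)
    return True


async def fast_db() -> bool:
    # Hypothetical stand-in for a healthy, responsive database.
    return True


async def demo() -> Tuple[bool, bool]:
    return (
        await check_with_budget(fast_db, budget_s=0.05),
        await check_with_budget(slow_db, budget_s=0.05),
    )


fast_result, slow_result = asyncio.run(demo())
print(fast_result, slow_result)  # prints: True False
```

A slow database and an unreachable one both end up as `False`, which is the intended behaviour for readiness: either way the pod cannot serve a request within the budget.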

Top-to-Bottom Code Walkthrough (`app/routers/health.py`)

The router gives each endpoint a distinct purpose. Mixing them is a common ops mistake: the app restarts unnecessarily because a “live” probe checks the database.

`/health` — comprehensive


@router.get("/health")
async def health_check() -> HealthResponse:
    db_healthy = await check_db_connection()
    overall_status = "healthy" if db_healthy else "unhealthy"
    ...
    return JSONResponse(status_code=503, ...)

Purpose: humans and uptime monitors. Reports on every subsystem (DB, Redis, Groq config). Returns 503 when the overall status is unhealthy, which in this excerpt means the database check failed.

`/ping` — immediate response


@router.get("/ping")
async def ping() -> PingResponse:
    return PingResponse(status="ok", timestamp=datetime.now(timezone.utc))

Purpose: cheap liveness check. Returns instantly without dependency probing. Used for extremely-fast probe intervals (1s).

`/ready` — readiness probe


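@router.get("/ready")
async def readiness() -> dict:
    db_healthy = await check_db_connection()
    is_ready = db_healthy
    if not is_ready:
        # A non-2xx status, not the body text, is what marks the pod unready.
        return JSONResponse(status_code=503, content={"status": "not_ready"})
    return {"status": "ready"}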

Purpose: tells Kubernetes “should I send traffic here?”. 503 = pod is alive but not yet ready. Stops traffic until the pod can serve. Critical for graceful startup — K8s waits for /ready 200 before adding the pod to service load balancers.

`/live` — liveness probe


@router.get("/live")
async def liveness() -> dict:
    return {"status": "alive", "timestamp": ...}

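Purpose: tells Kubernetes “is the container dead?”. Any status from 200 to 399 counts as alive. Anything else, or a timeout, repeated failureThreshold times in a row, makes the kubelet restart the container. If this probe depended on external services, a flaky Redis or DB would cause restarts you don’t want.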

The critical distinction

/live should never touch dependencies. /ready should touch only critical dependencies (DB). /health checks everything.

Mixing them is a common k8s misconfiguration.

Kubernetes pod spec example


livenessProbe:
  httpGet:
    path: /api/v1/live
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /api/v1/ready
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5

30s initial delay on liveness to let the app boot. 5s on readiness to detect quickly when the pod is healthy.
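The manifest leaves failureThreshold and timeoutSeconds unset, so the Kubernetes defaults (3 and 1s) apply. The sketch below estimates how long a failure can go unnoticed under those settings. `ProbeTiming` and `worst_case_detection_s` are hypothetical names, and the formula is a rough upper bound that ignores kubelet scheduling jitter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeTiming:
    """Subset of probe fields; defaults mirror the Kubernetes defaults."""

    initial_delay_s: int = 0
    period_s: int = 10
    timeout_s: int = 1
    failure_threshold: int = 3

    def worst_case_detection_s(self) -> int:
        # A failure that begins just after a passing probe needs
        # failure_threshold further probes, the last of which may time out.
        return self.period_s * self.failure_threshold + self.timeout_s


liveness = ProbeTiming(initial_delay_s=30, period_s=10)
readiness = ProbeTiming(initial_delay_s=5, period_s=5)
print(liveness.worst_case_detection_s(), readiness.worst_case_detection_s())
# prints: 31 16
```

Under these assumptions a hung process is restarted after roughly half a minute, while an unready pod leaves the Service in about 16 seconds.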

`check_db_connection()` (from `app/database.py`)


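from sqlalchemy import text

async def check_db_connection() -> bool:
    try:
        # AsyncSessionLocal is the session factory defined in app/database.py.
        async with AsyncSessionLocal() as session:
            await session.execute(text("SELECT 1"))
            return True
    except Exception:
        # Any failure (connect, auth, timeout) is reported as unhealthy.
        return False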

The simplest possible DB ping — one SELECT 1. Does not touch tables, indexes, or row counts.

Common Pitfalls

Using /health as the liveness probe — DB outage causes all your pods to restart, compounding the outage (thundering-herd restart).

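Forgetting initialDelaySeconds — Kubernetes starts probing before the app has opened its TCP socket. Probes fail for the first 30-60s of boot, the pod is marked unhealthy, and a failing liveness probe restarts the container before it ever finishes starting. Always delay probes.

Mismatched timeouts on /ready — A readiness probe with timeoutSeconds: 1 (the Kubernetes default) against an endpoint that can take longer causes spurious removals from the Service. Keep the endpoint's worst-case latency below the probe timeout.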
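The three pitfalls above can be checked mechanically before a manifest ships. This is a minimal sketch, assuming the probe specs have already been loaded into plain dicts; `lint_probes` and `handler_timeout_s` (the longest time the /ready handler may take) are hypothetical names, not part of the project or of any Kubernetes tooling.

```python
from typing import Dict, List


def lint_probes(liveness: Dict, readiness: Dict, handler_timeout_s: float) -> List[str]:
    """Return warnings for the three pitfalls; an empty list means none found."""
    warnings = []
    if liveness["httpGet"]["path"].endswith("/health"):
        warnings.append("liveness probe targets /health")
    if liveness.get("initialDelaySeconds", 0) == 0:
        warnings.append("liveness probe has no initialDelaySeconds")
    # Kubernetes defaults timeoutSeconds to 1 when the field is omitted.
    if readiness.get("timeoutSeconds", 1) <= handler_timeout_s:
        warnings.append("readiness timeoutSeconds does not exceed the handler timeout")
    return warnings


liveness_probe = {"httpGet": {"path": "/api/v1/live", "port": 8000}, "initialDelaySeconds": 30}
readiness_probe = {"httpGet": {"path": "/api/v1/ready", "port": 8000}, "initialDelaySeconds": 5}

# With a 2s handler budget and the default 1s probe timeout, one warning fires.
print(lint_probes(liveness_probe, readiness_probe, handler_timeout_s=2.0))
```

A check like this could run in CI against the rendered deployment, though the thresholds it enforces are a judgment call for each service.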

Real-World Interview Prep

Q1: Why three endpoints instead of one?

A: Liveness and readiness probes have different semantics, and diagnostic health is a third concern that Kubernetes never reads. Mixing them leads to cascading outages when dependencies fail. The triple-endpoint design expresses “liveness ≠ readiness ≠ diagnostic health” loudly.

Q2: What if the DB has a slow query that makes `/ready` time out?

A: Add a timeout to check_db_connection (e.g., 1 second via asyncio.wait_for). If the DB is slow, mark not_ready — but never mark the pod dead (only /live can do that).

Q3: How do you expose these in a service mesh like Istio?

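A: With sidecar injection, Istio by default rewrites HTTP probes so the kubelet's request goes to the sidecar agent, which forwards it to the application's own path; the paths in your pod spec stay as they are. /healthz/ready is the sidecar's own readiness endpoint, not the application's. The pod's readiness still decides whether it is listed as a Service endpoint, and the mesh routes only to listed endpoints.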
