Skip to Content
DatabaseAlembic Migrations with Dual Sync/Async Engines

Alembic Migrations with Dual Sync/Async Engines

What? (Concept Overview)

Alembic needs a synchronous DBAPI driver (psycopg2/psycopg) because it inspects schema state with introspection queries that don’t have async equivalents. Modern FastAPI apps use asyncpg for the runtime engine; bridging the two requires either run_sync adapters or a property that swaps +asyncpgpsycopg2 on the URL. Alembic’s env.py is the single place where this swap happens.

Project Context

The FCA Support Agent’s runtime uses postgresql+asyncpg://... everywhere (app/database.py), but Alembic runs alembic upgrade head against a sync URL. Settings.database_url_sync (a computed property in app/config.py) is the bridging helper — it strips the SQLAlchemy driver prefix and returns a plain postgresql:// URL that psycopg2 accepts. app/alembic/env.py consumes this property directly to feed Alembic’s sync engine.

How? (Quick Reference Blocks)

3.1 The Bridging Property on Settings

# app/config.py — Settings.database_url_sync @property def database_url_sync(self) -> str: """Synchronous URL for Alembic (psycopg2 friendly).""" # Strip +asyncpg +psycopg2 so plain libpq URL remains. return self.database_url.replace("+asyncpg", "")

3.2 Reading It From alembic/env.py

# alembic/env.py from logging.config import fileConfig from sqlalchemy import engine_from_config, pool from alembic import context # Import the Settings singleton — works because env vars are loaded once. from app.config import settings from app.database import Base # metadata holder for autogenerate config = context.config if config.config_file_name is not None: fileConfig(config.config_file_name) # CRITICAL: feed Alembic's [sqlalchemy.url] from Pydantic Settings. # Do NOT hardcode in alembic.ini — env vars own the source of truth. config.set_main_option("sqlalchemy.url", settings.database_url_sync) target_metadata = Base.metadata

3.3 Running Migrations Offline (SQL script)

# alembic/env.py — run_migrations_offline def run_migrations_offline() -> None: url = config.get_main_option("sqlalchemy.url") context.configure( url=url, target_metadata=target_metadata, literal_binds=True, dialect_opts={"paramstyle": "named"}, ) with context.begin_transaction(): context.run_migrations()

3.4 Running Migrations Online (live DB)

# alembic/env.py — run_migrations_online def run_migrations_online() -> None: connectable = engine_from_config( config.get_section(config.config_ini_section, {}), prefix="sqlalchemy.", poolclass=pool.NullPool, # one-shot connection ) with connectable.connect() as connection: context.configure( connection=connection, target_metadata=target_metadata, compare_type=True, # detect column-type changes ) with context.begin_transaction(): context.run_migrations()

Why? (Parameter Breakdown

  • database_url_sync as a computed property on Settings, not a separate env var — Single source of truth for the connection. Operators only configure DATABASE_URL; the sync URL is derived. Eliminates the “migrations point at a different DB than the app” failure mode.
  • .replace("+asyncpg", "") — A first-pass implementation. Cleaner variants swap to a literal +psycopg2 for psycopg2-specific features (e.g., psycopg2.extras). Either works for basic migrations.
  • config.set_main_option("sqlalchemy.url", ...) — Overrides whatever is in alembic.ini. Necessary because alembic.ini cannot read environment variables (only inline values like %(DATABASE_URL)s with interpolation, which is brittle).
  • poolclass=pool.NullPool — Alembic opens ONE connection for the migration run; pooling is unnecessary and would hold a connection idle for the lifetime of the migrate process.
  • compare_type=True — Enables autogenerate to detect column-type changes (e.g., String(50)String(255) would otherwise silently migrate no-op). Without it, autogen produces spurious “no changes detected” diffs.
  • target_metadata = Base.metadata — Wire up autogenerate: alembic revision --autogenerate -m "..." will diff Base.metadata against the live DB. Without this assignment, autogenerate is empty.

Common Pitfalls

  1. Hardcoding the URL in alembic.ini. Operators moving dev → staging will forget to update the file. Centralising the URL through Settings makes stage migrations configless.
  2. Forgetting run_async wrapping. If you mix sync Alembic with async test fixtures, the test event loop will deadlock against Alembic’s sync operations. Run Alembic outside any asyncio loop (e.g., a subprocess in CI).

Real-World Interview Prep

Q1: How do you support async-only migrations (no sync driver installed)?

A: Async migrations are possible in modern SQLAlchemy via async_migration_context. You wrap engine.connect() in await and pass the connection through context.configure(connection=conn) inside an async function, replacing the standard run_migrations. The catch: Alembic’s autogenerate introspection is still sync internally, so you must run_sync(...) every call. In practice, most teams keep sync Alembic (+psycopg2) and treat the runtime engine (+asyncpg) as a separate concern — the asymmetry is worth the operational simplicity.

Q2: How do you detect a migration that would have caused downtime before applying it?

A: Two checks. (1) alembic upgrade head --sql emits the migration as raw SQL without executing it; review the DDL to see if it requires an ACCESS EXCLUSIVE lock on a hot table. (2) Compare the migration against the production schema via pg_locks: if the migration waits on a long-running query, abort and pick a lower-traffic window. Build a CI step that requires SQL review sign-off for any migration that touches a column with >100M rows in production.

Q3: Why is compare_type=True important and when would you disable it?

A: It enables coercion-detection for column type changes (Numeric precision, VARCHAR length, Enum value additions). Disable it for very large bases where spurious diffs outweigh the benefits — for example, when you’ve modelled JSON columns with dict[str, Any] and don’t want every Pydantic addition to trigger a diff. Pair compare_type=True with compare_server_default=True to catch default-expression changes.

Top-to-Bottom Code Walkthrough (alembic/env.py + alembic/versions/)

Alembic migrations and async SQLAlchemy have an incompatibility problem: Alembic’s introspection engine is sync (psycopg2), but the runtime engine is async (asyncpg). The bridge is a tiny URL transformation.

alembic/env.py

  • Imports from app.config import settings — uses the same validated Settings that the runtime uses.
  • Builds the sync URL via settings.database_url_sync — the property strips the +asyncpg driver suffix:
    sync_url = settings.database_url.replace("+asyncpg", "") # "postgresql://fca_user:..." config.set_main_option("sqlalchemy.url", sync_url)
  • Calls target_metadata = Base.metadata to point Alembic at the SQLAlchemy declarative models.
  • run_migrations_offline() and run_migrations_online() are auto-generated templates, but the URL-setting line above is the only project-specific change.

alembic/versions/f57c936ded09_initial_migration_create_customers_.py

  • def upgrade(): — uses op.create_table(...), op.add_column(...), op.create_index(...).
  • def downgrade(): — the reverse of every upgrade step. Importantly, downgrade() is symlink-tested: every change in upgrade() must have a counterpart.
  • Naming convention: revision = "f57c936ded09" (auto-generated SHA-1) and down_revision = None (first migration in the chain).

Why a separate db_pool_size migration step is dangerous

Migrations run before the runtime app, sometimes during deploys. Adding pool tuning inside upgrade() couples schema migrations to runtime config — keep them apart.

Running migrations in CI

alembic upgrade head is called via command: ["alembic", "upgrade", "head"] in the worker service in docker-compose.yml before it starts consuming tasks. The web service depends_on: db: condition: service_healthy covers the DB, but schema is a separate concern.

Async-aware Alembic

For very large migrations (millions of rows), Alembic’s connection-level sync driver is too slow. The project uses a hybrid: schema changes via Alembic, data backfills via a one-shot async script. Never try to write a million rows inside an Alembic upgrade().

Common Pitfalls

Editing a migration after it’s applied is dangerous — the migration’s down_revision is recorded on Postgres, so it now diverges from production. Always add a new migration; never edit old ones.

Missing downgrade() — the team can’t roll back if a migration fails on prod. Make-downgrade mandatory.

Forgetting to create the vector extension — pgvector should be enabled via CREATE EXTENSION IF NOT EXISTS vector; in a migration before any vector(N) column is created.

Real-World Interview Prep

Q1: Why +asyncpg must be stripped for Alembic?

A: Asyncpg is for async I/O; psycopg2 (Alembic’s default driver) is synchronous. Alembic uses psycopg2 because it doesn’t need async I/O while introspecting. The database_url_sync property is a single source of truth that both worlds can read.

Q2: What if I want to test a migration before production?

A: Spin up the migration against a temporary Postgres container in CI. Compare the schema before and after using sqlalchemy.inspect(). The test should be idempotent — apply, downgrade, apply.

Q3: How do you handle data migrations that take hours?

A: Use pg_dump + restore to a new database, then run the data migration in the background with a Celery worker. Production traffic continues on the old DB; a brief switchover commits the changes atomically. Never block prod on a long migration.

Last updated on