Description
Apache Airflow version
main (development)
If "Other Airflow 3 version" selected, which one?
No response
What happened?
The downgrade of migration `0101_3_2_0_ui_improvements_for_deadlines` creates a savepoint for every `serialized_dag` row via `conn.begin_nested()`, in order to isolate per-DAG errors while restoring deadline alert data:
```python
for serialized_dag_id, dag_id, data, data_compressed in batch_results:
    savepoint = conn.begin_nested()
    try:
        # ... process DAG ...
        savepoint.commit()
    except Exception as e:
        savepoint.rollback()
```
On PostgreSQL, locks acquired within a savepoint are not released when the savepoint is committed — they persist in the parent transaction until the outer COMMIT. Since the entire migration runs in a single Alembic transaction, lock entries accumulate across all DAGs.
PostgreSQL's shared lock table size is max_locks_per_transaction × max_connections (default: 64 × 100 = 6,400 entries). Each savepoint acquires multiple lock entries (relation-level locks on serialized_dag, deadline_alert, plus the savepoint virtual transaction ID). With thousands of DAGs the lock table overflows.
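The accumulation can be modelled in plain Python. This is a toy accounting model, not PostgreSQL itself, and the figure of 3 lock entries retained per savepoint is an illustrative assumption (real thresholds depend on the actual settings and lock counts, as noted below):

```python
class ToyLockTable:
    """Toy model of PostgreSQL's shared lock table, implementing only the
    accounting described above (not real PostgreSQL behaviour)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.held = 0  # lock entries owned by the open top-level transaction

    def run_savepoint(self, locks_needed):
        # Locks taken inside a savepoint survive its release: they are
        # transferred to the parent transaction, not returned to the table.
        self.held += locks_needed
        if self.held > self.capacity:
            raise MemoryError("out of shared memory")

    def commit(self):
        # Only the outer COMMIT returns the entries to the shared table.
        self.held = 0


# Default capacity: max_locks_per_transaction (64) * max_connections (100).
table = ToyLockTable(capacity=64 * 100)
processed = 0
try:
    for _ in range(100_000):  # one savepoint per serialized_dag row
        table.run_savepoint(locks_needed=3)  # assumed entries per savepoint
        processed += 1
except MemoryError:
    pass

print(processed)  # 2133 with these assumed numbers: far short of 100K rows
```

Under these assumptions the single migration transaction overflows the lock table after roughly `6400 / 3 ≈ 2133` rows; the observed failure point of ~20K DAGs suggests a larger effective capacity on that deployment, but the shape of the failure is the same.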
Observed behavior:
With 100K `serialized_dag` entries, the migration failed after processing ~20K DAGs (batch 2 of 11):

```
psycopg2.errors.OutOfMemory: out of shared memory
HINT:  You might need to increase max_locks_per_transaction.
CONTEXT:  parallel worker
```
This was followed by a secondary RecursionError as SQLAlchemy tried to unwind 970+ nested savepoint objects via recursive _cancel() calls during rollback:
```
RecursionError: maximum recursion depth exceeded
```
Deployments with more than ~1K–2K DAGs will likely hit this on default PostgreSQL settings. The exact threshold depends on how many lock entries each savepoint consumes and the configured max_locks_per_transaction and max_connections.
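One possible mitigation, sketched here as an illustration rather than a proposed fix (Alembic migrations normally run in a single transaction, so an actual fix might instead avoid per-row savepoints), is to commit the outer transaction periodically so that accumulated lock entries are released. With stdlib `sqlite3` standing in for the real backend (SQLite's locking differs from PostgreSQL's; only the control flow is the point), and with a hypothetical schema and `BATCH_SIZE`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
conn.execute("CREATE TABLE serialized_dag (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO serialized_dag (data) VALUES (?)",
    [(f"dag-{i}",) for i in range(1000)],
)

BATCH_SIZE = 250  # commit the outer transaction every N rows (illustrative)
rows = conn.execute("SELECT id, data FROM serialized_dag").fetchall()

conn.execute("BEGIN")
for n, (row_id, data) in enumerate(rows, start=1):
    # Per-row savepoint isolates per-DAG errors, as the migration intends.
    conn.execute("SAVEPOINT per_row")
    try:
        conn.execute(
            "UPDATE serialized_dag SET data = ? WHERE id = ?",
            (data.upper(), row_id),
        )
        conn.execute("RELEASE SAVEPOINT per_row")
    except sqlite3.Error:
        conn.execute("ROLLBACK TO SAVEPOINT per_row")
        conn.execute("RELEASE SAVEPOINT per_row")
    if n % BATCH_SIZE == 0:
        conn.execute("COMMIT")  # frees lock entries accumulated so far
        conn.execute("BEGIN")
conn.execute("COMMIT")
```

The periodic `COMMIT` bounds how many savepoint lock entries any one transaction can accumulate, at the cost of the downgrade no longer being all-or-nothing.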
What you think should happen instead?
No response
How to reproduce
Steps to reproduce:
- Start Airflow 3.1.8 with a PostgreSQL backend (default `max_locks_per_transaction = 64`)
- Create 2K+ DAGs with deadline alert data so they are written to `serialized_dag`
- Upgrade to 3.2.0 (`airflow db migrate`)
- Run `airflow db downgrade -n 3.1.8`
- Migration 0101 fails with an out of shared memory error
Operating System
macOS
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct