Skip to content

Fix crashes during interpreter shutdown on all Python versions #499

Closed
nbouvrette wants to merge 1 commit into python-greenlet:master from nbouvrette:fix/safe-getcurrent-during-finalization

@nbouvrette
Contributor

@nbouvrette nbouvrette commented Mar 11, 2026

Summary

Fix multiple SIGSEGV crash paths during Py_FinalizeEx on all Python versions (3.10–3.14). Observed in production on ARM64 (Python 3.11 + uWSGI with max-requests worker recycling) where greenlet was installed as a transitive dependency but never explicitly used by application code.

Relationship to PR #495

PR #495 partially addressed this class of crashes by adding murder_in_place() and _Py_IsFinalizing() guards, but only on Python < 3.11 (#if !GREENLET_PY311). Further investigation revealed that:

  1. The vulnerability exists on all Python versions (3.10–3.14), not just < 3.11 — Py_IsFinalizing() is set after atexit handlers complete on every version.
  2. Multiple crash paths were unguarded: getcurrent(), the type checkers (GreenletChecker, MainGreenletExactChecker, ContextExactChecker), and clear_deleteme_list() had no shutdown protection.
  3. The tests in PR #495 ("Fix SIGSEGV/SIGABRT during interpreter shutdown on Python < 3.11") were smoke tests, not regression tests: they pass on both pre-fix (greenlet 3.1.1) and post-fix (3.3.2) versions, so they cannot detect whether the fix has been reverted.

This PR supersedes PR #495 by making all guards unconditional, protecting the remaining crash paths, and adding TDD-certified tests that demonstrably fail on unpatched greenlet 3.3.2.

Design

Two independent guards now protect all shutdown phases:

  1. g_greenlet_shutting_down: an atexit handler registered at module init sets this flag. Because atexit runs handlers in LIFO order, this handler runs before any handler that was registered earlier. It covers the atexit phase of Py_FinalizeEx, where Py_IsFinalizing() is still False on all Python versions.

  2. Py_IsFinalizing() — covers the GC collection and later phases of Py_FinalizeEx. A compatibility shim is provided for Python < 3.13 (where only the private _Py_IsFinalizing() existed).

These guards are checked in mod_getcurrent, PyGreenlet_GetCurrent, GreenletChecker, MainGreenletExactChecker, ContextExactChecker, clear_deleteme_list(), ThreadState::~ThreadState(), _green_dealloc_kill_started_non_main_greenlet, and ThreadState_DestroyNoGIL::AddPendingCall.

Root cause

_Py_IsFinalizing() is only set after atexit handlers complete inside Py_FinalizeEx on all Python versions:

Py_FinalizeEx()
├── call_py_exitfuncs()              ← atexit phase (Py_IsFinalizing() == False)
│   └── g_greenlet_shutting_down     ← our flag covers this gap
├── _PyRuntimeState_SetFinalizing()  ← Py_IsFinalizing() becomes True
├── _PyGC_CollectIfEnabled()         ← GC phase (__del__ methods run here)
│   └── Py_IsFinalizing()           ← standard API covers this
├── finalize_interp_clear()          ← type objects freed here

Without the guards, code running in atexit handlers (e.g. uWSGI plugin cleanup calling Py_FinalizeEx) or __del__ methods could call greenlet.getcurrent(), reaching into partially-torn-down C++ state and crashing in PyType_IsSubtype via GreenletChecker.
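The ordering described above can be observed from pure Python. A minimal sketch using only the standard library: it starts a child interpreter that registers two atexit callbacks, and each callback prints its name together with `sys.is_finalizing()` (the Python-level view of the runtime's finalizing flag). `CHILD` and `run_child` are illustrative names, not part of greenlet's test suite.

```python
import subprocess
import sys
import textwrap

# Child script: registers two atexit callbacks; each one reports whether
# the interpreter already considers itself finalizing.
CHILD = textwrap.dedent("""
    import atexit, sys

    def report(name):
        print(name, sys.is_finalizing(), flush=True)

    atexit.register(report, "registered-first")
    atexit.register(report, "registered-second")
""")


def run_child():
    """Run CHILD in a fresh interpreter and return its stdout lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_child():
        print(line)
```

If the analysis above holds for your interpreter, the second-registered callback prints first and both lines report `False`, which is exactly the window that `g_greenlet_shutting_down` is meant to cover.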

What changed

C++ shutdown guards (8 files)

File | Change
PyModule.cpp | g_greenlet_shutting_down + atexit handler made unconditional (was #if !GREENLET_PY311)
CObjects.cpp | PyGreenlet_GetCurrent guard made unconditional
PyGreenlet.cpp | murder_in_place() guard made unconditional
TThreadState.hpp | clear_deleteme_list() + destructor guards made unconditional
TThreadStateDestroy.cpp | AddPendingCall guard extended with g_greenlet_shutting_down
greenlet.cpp | Atexit handler registration made unconditional
greenlet_refs.hpp | Added guards to GreenletChecker + ContextExactChecker
greenlet_internal.hpp | Added guard to MainGreenletExactChecker

Additional hardening

  • clear_deleteme_list() uses std::swap (zero-allocation) instead of copying the PythonAllocator-backed vector
  • deleteme vector uses std::allocator (system malloc) instead of PyMem_Malloc
  • ThreadState uses std::malloc/std::free instead of PyObject_Malloc
  • clear_deleteme_list() preserves pending Python exceptions around its cleanup loop
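The swap in the clear_deleteme_list() hardening above follows a common "detach, then drain" pattern. A minimal Python sketch, assuming a hypothetical `DeferredDeleter` class that stands in for the ThreadState's deleteme list: the pending items are exchanged for a fresh empty list before iterating, so anything scheduled during cleanup goes into the new list instead of mutating the one being walked. In C++, `std::swap` achieves the same effect without copying or allocating; preserving the pending exception is a C-API concern (saving and restoring the error indicator) and is not modeled here.

```python
class DeferredDeleter:
    """Hypothetical analogue of a per-thread 'deleteme' list."""

    def __init__(self):
        self.deleteme = []

    def schedule(self, obj):
        """Queue obj for later cleanup."""
        self.deleteme.append(obj)

    def clear_deleteme_list(self, cleanup):
        """Detach the pending items, then run cleanup on each of them.

        Returns how many items were processed in this pass.
        """
        # Swap out the pending list for an empty one before iterating.
        # Items scheduled by cleanup() land in the new list and are
        # handled by a later pass.
        pending, self.deleteme = self.deleteme, []
        for obj in pending:
            cleanup(obj)
        return len(pending)
```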

Tests (3 files)

  • 5 new TDD-certified regression tests in test_interpreter_shutdown.py, verified RED on greenlet 3.3.2 (UNGUARDED) and GREEN with the fix (GUARDED) across Python 3.10–3.14
  • 3 strengthened smoke tests — assert getcurrent() still returns valid objects when called before greenlet's cleanup (guards against over-blocking)
  • Updated file docstring and section headers — organized 21 tests into 4 documented groups
  • Fixed test_dealloc_catches_GreenletExit_throws_other — use sys.unraisablehook instead of stderr capture (pytest compatibility)
  • Fixed test_version — skip gracefully on old setuptools that can't parse PEP 639 SPDX license format
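The sys.unraisablehook change for test_dealloc_catches_GreenletExit_throws_other can be illustrated with a minimal sketch. It assumes CPython's immediate reference-counted finalization; `RaisesInDel`, `capture_unraisable` and `make_and_drop` are hypothetical helpers, not the actual test code. Unlike scraping stderr, the hook receives a structured record, so the check is unaffected by pytest's output capturing.

```python
import sys


class RaisesInDel:
    """Hypothetical object whose finalizer raises an exception."""

    def __del__(self):
        raise RuntimeError("raised from __del__")


def capture_unraisable(action):
    """Run action() and return (exc_type, message) pairs sent to the hook."""
    captured = []

    def hook(args):
        # Keep only plain data: storing args.object or args.exc_value could
        # keep the object being finalized alive.
        captured.append((args.exc_type, str(args.exc_value)))

    old_hook = sys.unraisablehook
    sys.unraisablehook = hook
    try:
        action()
    finally:
        sys.unraisablehook = old_hook
    return captured


def make_and_drop():
    obj = RaisesInDel()
    del obj  # on CPython, the last reference goes away and __del__ runs here
```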

TDD verification

Ran both test types against unpatched greenlet 3.3.2 and the patched code across 6 Python versions:

Python | greenlet 3.3.2 (RED) | Patched (GREEN)
3.9 | N/A (requires-python >= 3.10) | N/A
3.10 | UNGUARDED | GUARDED (None)
3.11 | UNGUARDED | GUARDED (None)
3.12 | UNGUARDED | GUARDED (None)
3.13 | UNGUARDED | GUARDED (None)
3.14 | UNGUARDED | GUARDED (None)

PR #495's tests were also re-evaluated as part of this work: all 9 original tests pass on both greenlet 3.1.1 (pre-#495) and 3.3.2 (post-#495), confirming they are smoke tests that cannot detect regressions. The 5 Group D tests added here are the true regression safety net.
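What separates a regression test from a smoke test here is running the shutdown scenario in a separate interpreter and checking how that interpreter exits. A minimal harness sketch using only the standard library; `run_shutdown_script` and `assert_clean_shutdown` are hypothetical helpers, not the functions in test_interpreter_shutdown.py.

```python
import signal
import subprocess
import sys
import textwrap


def run_shutdown_script(body):
    """Run body in a fresh interpreter and return (returncode, stdout).

    A crash during Py_FinalizeEx kills only the child, so the test runner
    survives and can report it. On POSIX, death by a signal shows up as a
    negative return code (for example -signal.SIGSEGV).
    """
    proc = subprocess.run(
        [sys.executable, "-c", textwrap.dedent(body)],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return proc.returncode, proc.stdout


def assert_clean_shutdown(body, expected_stdout):
    """Fail if the child crashed, exited non-zero, or printed something else."""
    code, out = run_shutdown_script(body)
    if sys.platform != "win32" and code < 0:
        raise AssertionError(f"child killed by {signal.Signals(-code).name}")
    assert code == 0, f"unexpected exit code {code}"
    assert out == expected_stdout, out
```

A greenlet-specific check would pass a script whose probe must run after greenlet's own atexit handler. Given LIFO ordering, that presumably means registering the probe before importing greenlet; this is an inference from the Design section, not a description of the actual Group D tests.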

Additionally, the crash reproducer (uWSGI + Flask on ARM64 Python 3.11) ran 45,000 requests with 0 crashes (15 worker recycling cycles) with the patched greenlet.

Test plan

  • Full local test suite: 158 passed, 3 skipped, 0 failed (pytest)
  • TDD RED/GREEN verification across Python 3.10–3.14 via Docker
  • Crash reproducer: 45,000 requests, 0 segfaults on ARM64 Python 3.11
  • Behavioral review: murder_in_place() guard only fires during shutdown, not normal thread exit (Group B tests verify GreenletExit/finally still work)
  • Full CI on all supported Python versions

Backport note

These fixes have already been backported to the maint/3.2 branch in PR #500 (targeting 3.2.6), since the previous backport (3.2.5 / PR #495) did not fully stabilize shutdown behavior.

nbouvrette added a commit to nbouvrette/greenlet that referenced this pull request Mar 11, 2026
@nbouvrette nbouvrette force-pushed the fix/safe-getcurrent-during-finalization branch 2 times, most recently from 17d17e3 to 733a419 March 11, 2026 05:37
nbouvrette added a commit to nbouvrette/greenlet that referenced this pull request Mar 12, 2026
Ports all crash fixes from the main branch (PR python-greenlet#499) to maint/3.2 for
a 3.2.6 release targeting Python 3.9 stability.

Three root causes of SIGSEGV during Py_FinalizeEx on Python < 3.11:

1. clear_deleteme_list() vector allocation crash: replaced copy with
   std::swap and switched deleteme_t to std::allocator (system malloc).

2. ThreadState memory corruption: switched from PythonAllocator
   (PyObject_Malloc) to std::malloc/std::free.

3. getcurrent() crash on invalidated type objects: added atexit handler
   that sets g_greenlet_shutting_down before _Py_IsFinalizing() is set.

Also fixes exception preservation in clear_deleteme_list(), adds
Py_IsFinalizing() compat shim for Python < 3.13, Windows USS tolerance
for flaky memory test, and additional shutdown tests.

Made-with: Cursor
@nbouvrette nbouvrette force-pushed the fix/safe-getcurrent-during-finalization branch from 25a4dfa to e1fdf27 March 24, 2026 11:20
@nbouvrette nbouvrette changed the title Fix crash in getcurrent()/greenlet construction during early Py_FinalizeEx Fix crashes during interpreter shutdown on all Python versions Mar 24, 2026
During Py_FinalizeEx, multiple greenlet code paths accessed
partially-destroyed Python state, causing SIGSEGV in production
(uWSGI worker recycling on ARM64 and x86_64, Python 3.11).

Root cause: _Py_IsFinalizing() is set AFTER atexit handlers complete
on ALL Python versions, leaving a window where getcurrent() and type
validators reach into torn-down C++ state.

Fix: Two independent guards now protect all shutdown phases:

1. g_greenlet_shutting_down — atexit handler registered at module init
   (LIFO = runs first). Covers the atexit phase where
   Py_IsFinalizing() is still False.

2. Py_IsFinalizing() — covers the GC collection and later phases.
   A compatibility shim is provided for Python < 3.13.

These guards are checked in mod_getcurrent, PyGreenlet_GetCurrent,
GreenletChecker, MainGreenletExactChecker, ContextExactChecker,
clear_deleteme_list, ThreadState destructor,
_green_dealloc_kill_started_non_main_greenlet, and AddPendingCall.

Additional hardening:
- clear_deleteme_list() uses std::swap (zero-allocation)
- deleteme vector uses std::allocator (system malloc)
- ThreadState uses std::malloc/std::free
- clear_deleteme_list() preserves pending Python exceptions

TDD-certified: tests fail on greenlet 3.3.2 and pass with the fix
across Python 3.10-3.14. Test suite: 21 shutdown tests (5 TDD
regression, 2 behavioral, 14 smoke with 3 strengthened).

Also fixes:
- test_dealloc_catches_GreenletExit_throws_other: use
  sys.unraisablehook for pytest compatibility
- test_version: skip gracefully on old setuptools (PEP 639)
- test_no_gil_on_free_threaded: use getattr for pylint compatibility
- Flaky USS memory test on Windows

Made-with: Cursor
@nbouvrette nbouvrette force-pushed the fix/safe-getcurrent-during-finalization branch from a4a6510 to 5745a6c March 24, 2026 11:47
nbouvrette added a commit to nbouvrette/greenlet that referenced this pull request Mar 24, 2026
Backport of PR python-greenlet#499 (master) to maint/3.2 for greenlet 3.2.6, with all
shutdown guards made unconditional across Python 3.9-3.13.

The previous backport (3.2.5 / PR python-greenlet#495) only guarded Python < 3.11,
but the vulnerability exists on ALL Python versions: Py_IsFinalizing()
is set AFTER atexit handlers complete inside Py_FinalizeEx.

Two independent guards now protect all shutdown phases:

1. g_greenlet_shutting_down — atexit handler registered at module init
   (LIFO = runs first). Covers the atexit phase where
   Py_IsFinalizing() is still False.

2. Py_IsFinalizing() — covers the GC collection and later phases.
   A compatibility shim maps to _Py_IsFinalizing() on Python < 3.13.

These guards are checked in mod_getcurrent, PyGreenlet_GetCurrent,
GreenletChecker, MainGreenletExactChecker, ContextExactChecker,
clear_deleteme_list, ThreadState destructor,
_green_dealloc_kill_started_non_main_greenlet, and AddPendingCall.

Additional hardening:
- clear_deleteme_list() uses std::swap (zero-allocation)
- deleteme vector uses std::allocator (system malloc)
- ThreadState uses std::malloc/std::free
- clear_deleteme_list() preserves pending Python exceptions

TDD-certified: tests fail on greenlet 3.3.2 and pass with the fix
across Python 3.10-3.14. Docker verification on Python 3.9 and 3.10
confirms GUARDED on the maint/3.2 branch.

Also fixes:
- SPDX license identifier: Python-2.0 -> PSF-2.0
- test_dealloc_catches_GreenletExit_throws_other: use
  sys.unraisablehook for pytest compatibility
- test_version: skip gracefully on old setuptools
- Flaky USS memory test on Windows

Made-with: Cursor
@jamadden
Contributor

Thanks for this PR, it looks sound to me.

I've made some minor changes (be more idiomatic, add some more comments, resolve the conflict) which I'll merge after the tests pass.

@jamadden
Contributor

Apparently I handled the merge wrong as far as the UI is concerned, but this has been merged.

@jamadden jamadden closed this Mar 31, 2026
@nbouvrette nbouvrette deleted the fix/safe-getcurrent-during-finalization branch March 31, 2026 19:01
@nbouvrette
Contributor Author

Thanks @jamadden! Are you considering #500 as well? It would really be great if this fix could be backported.

@nbouvrette
Contributor Author

I can bring all your other commits into #500 if that helps.

nbouvrette added a commit to nbouvrette/greenlet that referenced this pull request Mar 31, 2026
Port the maintainer's (jamadden) follow-up refinements from master
to the maint/3.2 backport branch:

- Refactor atexit registration to use NewReference/Require framework
  instead of nested ifs and manual decrefs. Crash safety is not
  optional. (38bf3d7)
- Namespace g_greenlet_shutting_down as static in namespace greenlet
  instead of extern across multiple files. (c545379)
- Encapsulate the dual-guard pattern in IsShuttingDown() function,
  replacing all g_greenlet_shutting_down || Py_IsFinalizing() checks.
  (879a868, fbb4bcd)
- Add comment explaining why g_greenlet_shutting_down does not need
  std::atomic<int>. (c79fb07)
- Add comments on deliberate leaking and exception safety in
  clear_deleteme_list. (6c517cb)
- Comment cleanup and formatting. (fcf6f72, 98e8fb0 partial)

Skipped (not applicable to maint/3.2):
- CI: bump docker/setup-qemu-action (7b56329) — different CI config
- test_greenlet: revert getattr for _is_gil_enabled (98e8fb0) — test
  does not exist on maint/3.2

All 21 shutdown tests pass, full suite 159 passed / 1 skipped.

Made-with: Cursor
@nbouvrette
Contributor Author

nbouvrette commented Mar 31, 2026

@jamadden I've ported all your latest changes back to #500.

Ported (7 of 8 commits):

Change | Files | Why port
Atexit refactor: use NewReference/Require framework | greenlet.cpp | Cleaner code, crash safety non-optional (no PyErr_Clear fallback)
Namespace the flag as static int in namespace greenlet | 7 C++ files | Better scoping, removes extern declarations from 3 files
IsShuttingDown() function | greenlet_refs.hpp + 6 callers | Encapsulates the dual-guard pattern, reduces code duplication
Atomicity comment | greenlet_refs.hpp | Documents why std::atomic isn't needed
clear_deleteme_list comments | TThreadState.hpp | Documents deliberate leaking and exception safety
Comment cleanup | 4 files | Formatting parity with master
test_leaks.py comment reformat | test_leaks.py | Formatting parity

Skipped (2 items):

  • CI bump (docker/setup-qemu-action v3→v4) — different CI config on maint/3.2
  • test_greenlet.py getattr revert — test_no_gil_on_free_threaded doesn't exist on maint/3.2

Verification: Build succeeded, all 21 shutdown tests passed, full suite 159 passed / 1 skipped.
