fix: crash in background with ExpiringActivity - WPB-23839#4402

Open
netbe wants to merge 8 commits into develop from
fix/crash-background-expiringactivity

Conversation

@netbe
Collaborator

@netbe netbe commented Mar 5, 2026

Bug WPB-23839: [iOS] Crash CoreFoundation: __CFRunLoopServiceMachPort + 160

Issue

The app was being killed by the iOS watchdog with termination reason FRONTBOARD 0xBAADCA11 after running
for ~6 hours in the background. The crash was observed in TestFlight (Wire 4.16.0 / build 17993, iPhone
18,1, iOS 26.3).

  Exception Type:  EXC_CRASH (SIGKILL)
  Termination Reason: FRONTBOARD 0xbaadca11

  Thread 14 (crashing):
    semaphore_wait_trap
    _dispatch_semaphore_wait_slow
    closure #1 in closure #1 in ExpiringActivityManager.withExpiringActivity(reason:block:)
    ExpiringActivity.swift:79

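0xBAADCA11 ("bad call") is a FrontBoard termination code. Apple's documentation associates it with an
app failing to report a CallKit call in response to a PushKit notification; in this report the kill
coincided with a system-managed thread that was blocked indefinitely.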

Root cause in ExpiringActivity.swift:

performExpiringActivity's callback must block its thread until the work is done (Apple API requirement).
The implementation used a DispatchSemaphore for this:

// expiring = false branch
Task { try await self.startWork(block: block, semaphore: semaphore).value ... }
semaphore.wait() // ← blocks the callback thread

// expiring = true branch
Task { try await self.stopWork() } // cancels the inner task
// ← semaphore is never signaled if block() ignores cancellation

When expiring = true fires, stopWork() calls task.cancel(). If block() is a cooperative caller that
checks Task.checkCancellation(), the inner task exits and its defer { semaphore.signal() } fires
normally. But if block() does not cooperate with cancellation — which is a valid real-world scenario —
the inner task keeps running, semaphore.signal() is never called, and semaphore.wait() blocks the
callback thread indefinitely until the watchdog intervenes.
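
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the code from ExpiringActivity.swift: `waitAfterCancel` and its parameters are hypothetical names, and a bounded `wait(timeout:)` stands in for the unbounded `semaphore.wait()` so the example terminates instead of hanging.

```swift
import Foundation

/// Hypothetical helper: starts work that ignores cancellation, cancels it
/// immediately, then waits a bounded time for the deferred signal.
func waitAfterCancel(workMilliseconds: Int, waitMilliseconds: Int) -> DispatchTimeoutResult {
    let semaphore = DispatchSemaphore(value: 0)

    let task = Task.detached {
        // Mirrors the original design: the signal only fires when the work ends.
        defer { semaphore.signal() }
        let deadline = Date().addingTimeInterval(Double(workMilliseconds) / 1000)
        // Uncooperative work: yields, but never checks for cancellation.
        while Date() < deadline {
            await Task.yield()
        }
    }

    // cancel() only sets a flag; it does not stop work that never checks it.
    task.cancel()

    // Bounded stand-in for the unbounded semaphore.wait() on the callback thread.
    return semaphore.wait(timeout: .now() + .milliseconds(waitMilliseconds))
}

let result = waitAfterCancel(workMilliseconds: 500, waitMilliseconds: 100)
print(result == .timedOut ? "still blocked after cancel()" : "unblocked")
```

With an unbounded wait, the thread would stay blocked for as long as the work keeps running, which is the situation the crash report shows.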

A secondary bug was also present: if expiring = true fires before the actor executes startWork (a race
condition), self.task is nil, stopWork() throws, and continuation.resume(throwing:) is called. When
startWork later runs and block() completes, continuation.resume() is called a second time. Resuming a
CheckedContinuation more than once is a programmer error that traps at runtime (with UnsafeContinuation
it would be undefined behaviour).

Solution

Two changes:

  1. The expiring = true branch now always signals the semaphore and resumes the continuation directly,
    without relying on the inner task to do so:
  Task {
      do {
          try await self.stopWork()
          finish(.failure(CancellationError()))   // resume caller immediately
      } catch {
          finish(.failure(error))
      }
      semaphore.signal()   // always unblock callback thread
  }

This guarantees the callback thread is unblocked and withExpiringActivity returns to the caller
regardless of whether block() cooperates with cancellation.

  2. All continuation.resume() calls are routed through OnceAction, a simple NSLock-guarded wrapper that
    ensures the continuation is resumed at most once. This eliminates the double-resume race and makes the
    fix safe.
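
For readers unfamiliar with the pattern, an NSLock-guarded once-only wrapper might look roughly like the sketch below. This is an assumption about the shape of OnceAction, not the implementation in this PR: the generic parameter, the `callAsFunction` entry point, and the Bool return value are all illustrative choices.

```swift
import Foundation

/// Hypothetical once-only wrapper; the PR's real OnceAction may differ.
final class OnceAction<Value>: @unchecked Sendable {
    private let lock = NSLock()
    private var action: ((Value) -> Void)?  // nil once it has been consumed

    init(_ action: @escaping (Value) -> Void) {
        self.action = action
    }

    /// Runs the wrapped action on the first call; later calls are no-ops.
    /// Returns true if this call ran the action.
    @discardableResult
    func callAsFunction(_ value: Value) -> Bool {
        lock.lock()
        let pending = action
        action = nil
        lock.unlock()

        guard let run = pending else { return false }
        run(value)  // invoked outside the lock so the action cannot deadlock on it
        return true
    }
}

// Usage: two competing paths both try to finish; only the first one wins.
var deliveries = 0
let finish = OnceAction<Result<Void, Error>>({ _ in deliveries += 1 })
let first = finish(.success(()))
let second = finish(.failure(CancellationError()))
print(first, second, deliveries)  // prints "true false 1"
```

In the real code the wrapped action would presumably resume the continuation with the given result, so whichever branch loses the race becomes a harmless no-op.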

Testing

  • testBug1_ActivityCompletesAfterExpiry_WhenBlockIgnoresCancellation — verifies withExpiringActivity
    returns promptly on expiry even when block() loops with Task.yield() and never calls
    Task.checkCancellation().
  • testBug2_NoContinuationDoubleResume_WhenExpiryRacesBeforeStartWork — verifies no double-resume occurs
    when the expiring = true callback is processed by the actor before startWork.
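
The property checked by the first test can be illustrated without the real ExpiringActivityManager. The sketch below is a simplified stand-in under stated assumptions: `waitWithExpirySignal` is a hypothetical name, a delayed dispatch block simulates the expiring = true callback, and the work loop deliberately ignores cancellation.

```swift
import Foundation

/// Hypothetical helper: returns how long the waiting thread stayed blocked
/// when the simulated expiry callback signals the semaphore itself.
func waitWithExpirySignal(workMilliseconds: Int, expireAfterMilliseconds: Int) -> TimeInterval {
    let semaphore = DispatchSemaphore(value: 0)
    let start = Date()

    let task = Task.detached {
        defer { semaphore.signal() }  // late second signal; harmless here
        let deadline = Date().addingTimeInterval(Double(workMilliseconds) / 1000)
        // Uncooperative work: yields, but never checks for cancellation.
        while Date() < deadline {
            await Task.yield()
        }
    }

    // Simulated "expiring = true" callback.
    DispatchQueue.global().asyncAfter(deadline: .now() + .milliseconds(expireAfterMilliseconds)) {
        task.cancel()
        semaphore.signal()  // unblock the waiter without relying on the task
    }

    semaphore.wait()
    return Date().timeIntervalSince(start)
}

let elapsed = waitWithExpirySignal(workMilliseconds: 1000, expireAfterMilliseconds: 100)
print(elapsed < 0.9 ? "returned before the work finished" : "waited for the work")
```

The wait should end shortly after the simulated expiry rather than after the full second of work, although the exact timing depends on scheduling.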

Checklist

  • Title contains a reference JIRA issue number like [WPB-XXX].
  • Description is filled and free of optional paragraphs.
  • Adds/updates automated tests.

UI accessibility checklist

If your PR includes UI changes, please utilize this checklist:

  • Make sure you use the API for UI elements that support large fonts.
  • All colors are taken from WireDesign.ColorTheme or constructed using WireDesign.BaseColorPalette.
  • New UI elements have Accessibility strings for VoiceOver.

@netbe netbe added the WIP label Mar 9, 2026
@netbe netbe requested a review from typfel March 11, 2026 15:26
@netbe netbe removed the WIP label Mar 11, 2026
@netbe netbe requested a review from caldrian March 11, 2026 15:28
@netbe netbe marked this pull request as ready for review March 11, 2026 15:28
@github-actions
Contributor

github-actions bot commented Mar 11, 2026

Test Results

86 tests: 83 ✅ passed, 3 💤 skipped, 0 ❌ failed
13 suites, 1 file, 2s ⏱️ duration

Results for commit a281549.

♻️ This comment has been updated with latest results.

Summary: workflow run #23857699663
Allure report (download zip): html-report-28990-fix_crash-background-expiringactivity

@caldrian
Contributor

Could/should this go into release/cycle-4.17? @netbe

continuation.resume(throwing: error)
finish(.failure(error))
}
// Unblock the callback thread regardless. A double-signal (if the
Member

@typfel typfel Mar 11, 2026

Doesn't this mean we no longer wait until the task has finished "cancelling", and instead exit the function immediately and let the app suspend?

Couldn't this happen:

  1. We start an expiring activity that decrypts messages.
  2. The activity expires and signals the decryption task to cancel.
  3. We exit the activity.
  4. The app crashes because the decryption task didn't finish cancelling and therefore didn't have time to release the transaction lock.

@netbe netbe requested a review from David-Henner April 1, 2026 15:51
func withExpiringActivity(reason: String, block: @escaping () async throws -> Void) async throws {
try await withTaskCancellationHandler {
try await withCheckedThrowingContinuation { continuation in
// Shared between both callback branches.
Collaborator Author

@David-Henner since you added withTaskCancellationHandler, I wonder whether this PR still makes sense.

@sonarqubecloud

sonarqubecloud bot commented Apr 1, 2026

3 participants