
fix(batch): retry R2 upload on transient failure in BatchPayloadProcessor #3331

Merged
matt-aitken merged 1 commit into main from batch-trigger-payload-fix
Apr 7, 2026

Conversation

@matt-aitken
Member

@matt-aitken matt-aitken commented Apr 7, 2026

A single "fetch failed" from the object store was aborting the entire batch stream with no retry. Added p-retry (3 attempts, 500ms-2s backoff) around uploadPacketToObjectStore so transient network errors self-heal server-side instead of propagating to the SDK.
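
For a sense of what a "3 attempts, 500ms-2s backoff" schedule can look like, here is a minimal sketch. It assumes a doubling factor and no jitter, neither of which is stated in this PR, and `backoffDelayMs` is a hypothetical helper, not a function from p-retry or from the codebase.

```typescript
// Hypothetical helper (not part of p-retry or the PR): the wait before
// retry number `retry` (1-based), growing from minMs and capped at maxMs.
// The doubling factor is an assumption; the PR only states the 500ms-2s range.
function backoffDelayMs(
  retry: number,
  minMs: number = 500,
  maxMs: number = 2000,
  factor: number = 2
): number {
  return Math.min(maxMs, minMs * Math.pow(factor, retry - 1));
}

// Three attempts means at most two waits between them.
const waits: number[] = [1, 2].map((n) => backoffDelayMs(n));
console.log(waits);
```

Under these assumptions the worst case adds roughly 1.5 seconds of waiting before the failure is surfaced, and the 2s cap only matters if the attempt count is raised.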

@changeset-bot

changeset-bot bot commented Apr 7, 2026

⚠️ No Changeset found

Latest commit: 8046caa

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@coderabbitai
Contributor

coderabbitai bot commented Apr 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 39171e2f-ec62-4557-a41d-fbe5e5943032

📥 Commits

Reviewing files that changed from the base of the PR and between 48bd11c and 8046caa.

📒 Files selected for processing (4)
  • .server-changes/batch-r2-upload-retry.md
  • apps/webapp/app/routes/api.v3.batches.$batchId.items.ts
  • apps/webapp/app/runEngine/concerns/batchPayloads.server.ts
  • apps/webapp/test/engine/batchPayloads.test.ts
✅ Files skipped from review due to trivial changes (3)
  • apps/webapp/app/routes/api.v3.batches.$batchId.items.ts
  • apps/webapp/test/engine/batchPayloads.test.ts
  • .server-changes/batch-r2-upload-retry.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/webapp/app/runEngine/concerns/batchPayloads.server.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: typecheck / typecheck

Walkthrough

This pull request adds retry handling for transient object-store upload failures in batch processing: BatchPayloadProcessor.process() now retries uploadPacketToObjectStore using p-retry with 3 attempts and exponential backoff (500ms–2s). The batch items route no longer sets an x-should-retry: false header on 500 responses. A new Vitest test suite verifies offload success, retry behavior, terminal failure after retries, and the no-offload case. Documentation describing the change was also added.
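
The retry behavior and the two retry-related test scenarios described in this walkthrough can be sketched without any dependencies. This is an illustration only: `withRetry` is a hand-rolled loop standing in for p-retry, `makeFlakyUpload` is a hypothetical fake standing in for uploadPacketToObjectStore, and the doubling backoff is an assumption. None of this is the code from the PR.

```typescript
// Hypothetical fake for the upload call: fails `failures` times with
// "fetch failed", then succeeds. Tracks how often it was called.
function makeFlakyUpload(failures: number) {
  let calls = 0;
  const upload = async (): Promise<string> => {
    calls += 1;
    if (calls <= failures) {
      throw new Error("fetch failed");
    }
    return "stored";
  };
  return { upload, callCount: (): number => calls };
}

// Dependency-free retry loop (a stand-in for p-retry, not its API).
// `sleep` is injectable so tests can skip the real waits.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts: number = 3,
  minMs: number = 500,
  maxMs: number = 2000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === attempts) {
        break; // out of attempts: surface the last error
      }
      // Assumed doubling backoff, capped at maxMs.
      await sleep(Math.min(maxMs, minMs * Math.pow(2, attempt - 1)));
    }
  }
  throw lastError;
}
```

With a no-op `sleep`, `withRetry(makeFlakyUpload(2).upload, 3, 500, 2000, noSleep)` resolves on the third call, while a fake that fails more than twice rejects with the last "fetch failed" error after exactly three calls. These correspond to the "retry behavior" and "terminal failure after retries" cases the walkthrough mentions.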

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Explanation

Changes touch four areas: a small route handler header removal, the batch payload processor (adds retried upload with logging and backoff), a comprehensive test file with four scenarios, and a documentation file. The edits are focused on a single feature but include moderate new logic and test coverage requiring separate verification of retry semantics, logging, and test mocks.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is minimal but clearly explains the core issue and solution. However, it does not follow the repository's required template structure, which includes sections for Closes #, Checklist, Testing, Changelog, and Screenshots. | Provide a description that follows the template structure: include issue reference, completed checklist items, testing details, and a changelog entry. The technical details about p-retry configuration should be included in the Changelog section. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix(batch): retry R2 upload on transient failure in BatchPayloadProcessor' directly describes the main change: adding retry logic to R2 uploads in the BatchPayloadProcessor, which aligns with the changeset's primary objective. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

…ssor

A single "fetch failed" from the object store was aborting the entire
batch stream with no retry. Added p-retry (3 attempts, 500ms-2s backoff)
around uploadPacketToObjectStore so transient network errors self-heal
server-side instead of propagating to the SDK.
@matt-aitken matt-aitken force-pushed the batch-trigger-payload-fix branch from 48bd11c to 8046caa April 7, 2026 13:49
@matt-aitken matt-aitken marked this pull request as ready for review April 7, 2026 14:12
@matt-aitken matt-aitken merged commit def21b2 into main Apr 7, 2026
52 checks passed
@matt-aitken matt-aitken deleted the batch-trigger-payload-fix branch April 7, 2026 14:29
2 participants