Add Sourcepoint first-party CMP integration #625
ChristianPavilonis wants to merge 11 commits into main
… through first-party proxy
… URLs to first-party paths
…visibility, improve docs
aram356 left a comment
Summary
Adds a well-structured Sourcepoint CMP first-party proxy with four rewriting layers (HTML attributes, JS body, runtime config trap, client-side DOM guard). The implementation follows existing integration patterns closely, has 14 Rust + 6 JS tests, and ships disabled by default. A few items need attention before merge.
Blocking
🔧 wrench
- take_body_str() panics on non-UTF-8 upstream responses: response.take_body_str() at line 400 will panic if the upstream CDN returns invalid UTF-8 with a JS Content-Type. Read bytes and attempt conversion with a fallback instead. (sourcepoint.rs:400)
- CI failure, format-typescript: Prettier check is failing on test/integrations/sourcepoint/script_guard.test.ts. Must be fixed before merge.
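A minimal sketch of the bytes-then-convert approach suggested above, using only the standard library. The helper name body_as_text is hypothetical, and how the bytes are obtained from the response is left to the caller; on invalid UTF-8 the original bytes are handed back so they can be forwarded unmodified instead of panicking.

```rust
/// Converts an upstream body to text without panicking.
///
/// Returns `Ok(text)` when the bytes are valid UTF-8. Otherwise returns the
/// original bytes in `Err` so the caller can pass them through untouched.
/// (Hypothetical helper; not taken from sourcepoint.rs.)
fn body_as_text(bytes: Vec<u8>) -> Result<String, Vec<u8>> {
    // `FromUtf8Error::into_bytes` gives the input back without copying.
    String::from_utf8(bytes).map_err(|err| err.into_bytes())
}
```

With this shape the rewrite step only runs on the Ok branch, and the Err branch becomes the pass-through fallback.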
❓ question
- Redirect Location headers not rewritten: The JS body rewrite is gated on status == 200. If the CDN returns a 3xx redirect with a Location pointing to cdn.privacy-mgmt.com, the browser follows it back to the third-party CDN, defeating the proxy. (sourcepoint.rs:394)
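One way the Location rewrite could look, as a rough sketch. The first-party prefix /integrations/sourcepoint/cdn and the function name rewrite_location are assumptions, since the real proxy route is not shown in this thread. The sketch is also case-sensitive and rejects explicit ports, which a real implementation built on a URL parser would handle properly.

```rust
/// Third-party CDN origin named in the review.
const CDN_ORIGIN: &str = "https://cdn.privacy-mgmt.com";
/// Hypothetical first-party prefix; the actual route may differ.
const FIRST_PARTY_PREFIX: &str = "/integrations/sourcepoint/cdn";

/// Maps a redirect target on the Sourcepoint CDN to a first-party path.
/// Returns `None` when the Location header points anywhere else.
fn rewrite_location(location: &str) -> Option<String> {
    // Accept both absolute and protocol-relative forms.
    let rest = location
        .strip_prefix(CDN_ORIGIN)
        .or_else(|| location.strip_prefix("//cdn.privacy-mgmt.com"))?;

    // Require a boundary right after the host so that
    // "https://cdn.privacy-mgmt.com.evil.test/" is not mistaken for the CDN.
    if rest.is_empty() {
        Some(format!("{FIRST_PARTY_PREFIX}/"))
    } else if rest.starts_with('/') {
        Some(format!("{FIRST_PARTY_PREFIX}{rest}"))
    } else if rest.starts_with('?') || rest.starts_with('#') {
        Some(format!("{FIRST_PARTY_PREFIX}/{rest}"))
    } else {
        None
    }
}
```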
Non-blocking
🤔 thinking
- Upstream response headers forwarded verbatim: Set-Cookie and other headers from cdn.privacy-mgmt.com pass through to the client on the non-rewritten path. Same as Didomi, but worth confirming this is intended. (sourcepoint.rs:418)
- Regex matches mismatched quotes: The CDN URL pattern can match 'url" (single open, double close). Extremely unlikely in minified JS but could produce malformed output. (sourcepoint.rs:65)
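To illustrate the mismatched-quote case: the Rust regex crate does not support backreferences, so a pattern-level fix would likely need one alternative per quote style. The sketch below avoids regex entirely and shows the intended behaviour with a hand-written scan. It is not the pattern at sourcepoint.rs:65, and the function name quoted_cdn_urls is hypothetical.

```rust
const CDN_PREFIX: &str = "https://cdn.privacy-mgmt.com";

/// Returns every CDN URL literal whose opening and closing quotes match.
/// Literals such as 'url" (single open, double close) are skipped.
fn quoted_cdn_urls(js: &str) -> Vec<&str> {
    let mut found = Vec::new();
    let mut search_from = 0;

    while let Some(offset) = js[search_from..].find(CDN_PREFIX) {
        let url_start = search_from + offset;
        search_from = url_start + CDN_PREFIX.len();

        // The character right before the URL must be the opening quote.
        let open = match js[..url_start].chars().next_back() {
            Some(q @ ('"' | '\'')) => q,
            _ => continue,
        };

        // The literal ends at the first quote of either kind.
        let url_len = match js[url_start..].find(|c: char| c == '"' || c == '\'') {
            Some(len) => len,
            None => break,
        };
        let url_end = url_start + url_len;

        // Only accept the literal when the closing quote equals the opener.
        if js[url_end..].starts_with(open) {
            found.push(&js[url_start..url_end]);
        }
        search_from = url_end + 1;
    }
    found
}
```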
♻️ refactor
- Redundant rewrite_sdk guard: Checked in both handles_attribute() and rewrite(); the second is unreachable. (sourcepoint.rs:437)
- Head injector config property names: Hardcoded Sourcepoint config keys could go stale. A test asserting known property names appear in the generated script would catch omissions. (sourcepoint.rs:467)
🌱 seedling
- No body size limit: The body is read entirely into memory with no upper bound. A Content-Length check would prevent OOM from unexpectedly large responses. (sourcepoint.rs:400)
- Missing validation test for invalid cdn_origin: No test proves that register() fails when cdn_origin is not a valid URL. (sourcepoint.rs:153)
⛏ nitpick
- Manual [sourcepoint] log prefix: Other integrations rely on the log crate's module path rather than a manual bracketed prefix. (sourcepoint.rs:355)
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- format-typescript: FAIL (Prettier issue in script_guard.test.ts)
- format-docs: PASS
- browser integration tests: PASS
- integration tests: PASS
- CodeQL: FAIL (likely infra)
prk-Jr left a comment
Reviewed commit a211eb0. Three blocking issues require fixes before merge; the rest are non-blocking.
Blocking (must fix before merge)
B1 — CI: format-typescript still failing on current HEAD
The latest commit (a211eb02) re-introduced a Prettier violation in crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts. CI's format-typescript check will fail on this commit. Fix with:
cd crates/js/lib && npm run format -- --write
then commit the result.
B2 — SSRF via cdn_origin (inline comment on line 96)
See inline comment. A syntactically valid but host-unrestricted cdn_origin allows proxying to any IP including 169.254.169.254 (cloud metadata). Needs a custom host validator.
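A rough idea of what the host check could look like, written against the standard library only. The function name is_allowed_cdn_origin is hypothetical. Requiring https and rejecting ports and userinfo are assumptions, not requirements stated in this review. Production code would more likely parse the value with a proper URL parser and then apply the same host comparison.

```rust
/// Returns true when `origin` is an https URL whose host is
/// `privacy-mgmt.com` or one of its subdomains.
/// (Hypothetical helper; the https-only rule is an assumption.)
fn is_allowed_cdn_origin(origin: &str) -> bool {
    let rest = match origin.strip_prefix("https://") {
        Some(rest) => rest,
        None => return false,
    };

    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");

    // Allow only plain hostname characters. This rejects userinfo ('@'),
    // ports (':'), backslashes, and other parser-confusing input outright.
    let plain_host = !authority.is_empty()
        && authority
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '.');
    if !plain_host {
        return false;
    }

    let host = authority.to_ascii_lowercase();
    // The leading dot in the suffix check keeps "evilprivacy-mgmt.com" out.
    host == "privacy-mgmt.com" || host.ends_with(".privacy-mgmt.com")
}
```

An IP literal such as 169.254.169.254 fails the suffix comparison, which is the cloud-metadata case B2 is concerned with.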
B3 — PUT/PATCH unreachable (inline comment on line 337)
routes() only registers GET and POST; the PUT/PATCH arms in handle() are dead code. Either register the methods or remove the arms.
Important (non-blocking)
I1 — Single-quoted '/unified/' missed by SP_ORIGIN_UNIFIED_PATTERN (inline on line 81) — fix is a one-character change to the regex character class.
I2 — BackendConfig::from_url per request (inline on line 357) — pre-compute in new() following the pattern in aps.rs / prebid.rs.
I3 — PR checklist says "Uses tracing macros" but codebase uses log crate
The checklist item reads "Uses tracing macros (not println!)" but CLAUDE.md specifies log. The source correctly uses log::info! throughout. The checklist entry should read "Uses log macros (not println!)" to avoid confusion for future contributors.
I4 — JS-rewrite path drops all upstream headers (inline on line 403) — Response::new() discards Vary, CORS, and security headers. Start from response.clone_without_body() instead.
I5 — No test for build_target_url with a path-bearing cdn_origin (inline on line 337) — set_path silently drops any path prefix in cdn_origin. Should be tested and either fixed or documented as an explicit constraint.
Suggestions (non-blocking)
S1 — URL normalisation logic duplicated across Rust and TypeScript
parse_sourcepoint_url in sourcepoint.rs and normalizeSourcepointUrl in script_guard.ts implement identical protocol-relative URL normalisation in parallel. Not a defect, but a cross-reference comment in each file would prevent future fixes being applied to only one side.
S2 — Redundant rewrite_sdk guard in rewrite() (inline on line 437) — unreachable because handles_attribute already gates on it. Either remove with a comment or document the intentional belt-and-suspenders.
S3 — No test for single-quoted CDN URLs (inline on line 81)
S4 — Unchecked WASM build in PR checklist
The test plan has an unchecked - [ ] WASM build item, but CI's WASM build passes. The checkbox should be checked to accurately reflect build status.
S5 — register missing # Examples doc section (inline on line 314) — CLAUDE.md requires this for all public API functions.
- Replace #[validate(url)] with a custom validator that restricts cdn_origin to *.privacy-mgmt.com hosts, preventing SSRF via arbitrary origins (e.g. cloud metadata endpoints).
- Remove unreachable PUT/PATCH arms from the request body match since routes() only registers GET and POST.
- Fix Prettier formatting in script_guard.test.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace take_body_str() with take_body_bytes() + String::from_utf8() to avoid panicking on non-UTF-8 upstream responses.
- Rewrite Location headers on 3xx redirects that point to cdn.privacy-mgmt.com so browsers stay on the first-party proxy.
- Preserve upstream CORS headers on the JS-rewrite path instead of discarding them when building a fresh Response.
- Extend SP_ORIGIN_UNIFIED_PATTERN to match both single- and double-quoted "/unified/" chunk paths, preserving the original quote character in the replacement.
- Normalise log prefixes from [sourcepoint] to Sourcepoint: for consistency with APS/Prebid style.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove redundant rewrite_sdk check from rewrite() since handles_attribute() already gates on it; update test to verify the guard at the handles_attribute level.
- Add # Examples section to register() per documentation standards.
- Add tests for cdn_origin validation (rejects non-privacy-mgmt.com hosts, accepts valid origins).
- Add test for single-quoted origin+'/unified/' rewrite pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review feedback addressed
All blocking and important findings have been fixed across three commits. Replies posted on each inline thread.
Blocking (fixed)
Important non-blocking (fixed)
Suggestions (fixed)
Intentionally skipped (with rationale in thread replies)
All CI checks pass locally.
Refactor normalizeSourcepointUrl to remove the bare-domain startsWith check that triggered CodeQL "Incomplete URL substring sanitization" alerts. The host === exact match was already the security boundary; now the normalization layer no longer references the CDN hostname at all, eliminating the static analysis finding.

Add a Content-Length guard (5 MB) before reading upstream response bodies into memory for JavaScript rewriting, preventing unbounded memory consumption from unexpectedly large responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
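The Content-Length guard described in this commit might be sketched as below. The names are hypothetical. Reading 5 MB as 5 * 1024 * 1024 bytes is an assumption, and so is refusing to buffer when the header is missing or unparsable; the commit message does not say how that case is handled.

```rust
/// Upper bound for bodies buffered for rewriting.
/// Assumes "5 MB" means 5 * 1024 * 1024 bytes.
const MAX_REWRITE_BODY_BYTES: u64 = 5 * 1024 * 1024;

/// Decides from the raw Content-Length header value whether the body
/// may be read fully into memory for JavaScript rewriting.
fn may_buffer_for_rewrite(content_length: Option<&str>) -> bool {
    match content_length.and_then(|value| value.trim().parse::<u64>().ok()) {
        Some(len) => len <= MAX_REWRITE_BODY_BYTES,
        // Missing or malformed header: this sketch chooses not to buffer.
        None => false,
    }
}
```

When the check returns false, the response would be streamed through without rewriting rather than loaded into memory.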
Summary
… cdn.privacy-mgmt.com and geo.privacymanager.io traffic through Trusted Server, eliminating third-party requests for consent management assets.
Changes
| File | Change |
| --- | --- |
| crates/trusted-server-core/src/integrations/sourcepoint.rs | … rewrite_script_content), head injector (window._sp_ property trap), Accept-Encoding scoping, backend selection, and 14 unit tests |
| crates/trusted-server-core/src/integrations/mod.rs | … sourcepoint module and wire it into the integration registry |
| crates/js/lib/src/integrations/sourcepoint/index.ts | … |
| crates/js/lib/src/integrations/sourcepoint/script_guard.ts | … <script>/<link> elements pointing at Sourcepoint CDN and rewrites them to first-party paths |
| crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts | … |
| docs/guide/integrations/sourcepoint.md | … |
| docs/guide/integrations-overview.md | … |
| trusted-server.toml | … [integrations.sourcepoint] config stanza (disabled by default) |
Closes
Closes #145
Closes #344
Closes #345
Test plan
- cargo test --workspace
- cargo clippy --workspace --all-targets --all-features -- -D warnings
- cargo fmt --all -- --check
- cd crates/js/lib && npx vitest run
- cd crates/js/lib && npm run format
- cd docs && npm run format
- … cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
- … fastly compute serve
Checklist
- … unwrap() in production code; use expect("should ...")
- Uses log macros (not println!)