
feat(webui): Add S3 compression job submission support with browsable S3 key selection.#2169

Draft
junhaoliao wants to merge 3 commits into y-scope:main from junhaoliao:webui-s3-form

Conversation

@junhaoliao (Member)

Description

Add S3 compression job submission support to the WebUI. When logs_input.type is s3 in the CLP
deployment config, the WebUI now shows an S3 compression form (instead of the filesystem form) that
allows users to browse S3 bucket contents via a TreeSelect and submit compression jobs targeting
specific S3 objects or prefixes.

Design:

  • AWS authentication (aws_authentication) comes from the CLP deployment config
    (S3IngestionConfig) and is propagated to the WebUI server at startup via flat settings keys
    (e.g., LogsInputS3AwsAuthType, LogsInputS3AwsProfile). Credentials are passed through
    environment variables (CLP_LOGS_INPUT_AWS_ACCESS_KEY_ID /
    CLP_LOGS_INPUT_AWS_SECRET_ACCESS_KEY), matching the existing CLP_STREAM_OUTPUT pattern.
  • Per-job details (bucket, regionCode, keyPrefix/keys) come from the user's form
    submission. The region is per-request since different buckets may reside in different regions.
  • Custom S3 endpoints are not supported for logs input at this time.
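For illustration, the flat settings keys might land in the WebUI server settings roughly as follows. The key names come from the PR description; the values shown here are hypothetical, and the surrounding file shape (`webui-server-settings.json`) may contain other keys.

```json
{
    "LogsInputS3AwsAuthType": "profile",
    "LogsInputS3AwsProfile": "clp-ingest"
}
```

With `aws_authentication.type: "credentials"`, the profile key would instead be absent and the credentials would arrive via the `CLP_LOGS_INPUT_AWS_*` environment variables.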

Key changes:

  • Common schemas (compression.ts, s3.ts):

    • Refactor CompressionJobCreationSchema into a Type.Union discriminated on inputType
      ("fs" | "s3"). Extract shared CLP-S fields (dataset, timestampKey, unstructured) into
      a clpSFields spread object to eliminate duplication.
    • Add S3CompressionJobCreationSchema with bucket (required), regionCode (required),
      keyPrefix (optional), and keys (optional) fields.
    • Add region (required) to S3ListRequestSchema for per-request region support.
    • New s3.ts with AwsAuthType enum, AwsAuthenticationSchema, S3ListRequestSchema,
      S3ListResponseSchema, and S3EntrySchema.
  • S3Manager plugin (S3Manager/):

    • Extract resolveAwsCredentials() helper to deduplicate credential resolution across logs input
      and stream files configs. Priority: env var credentials > named profile > SDK default chain.
    • Add S3ListManager (separate file) with per-request region support — creates an S3Client for
      each listing call with the user-provided region.
    • Rename container-side env vars from bare AWS_ACCESS_KEY_ID /
      AWS_SECRET_ACCESS_KEY to CLP_STREAM_OUTPUT_AWS_ACCESS_KEY_ID /
      CLP_STREAM_OUTPUT_AWS_SECRET_ACCESS_KEY in both docker-compose and Helm chart.
    • Add CLP_LOGS_INPUT_AWS_ACCESS_KEY_ID / CLP_LOGS_INPUT_AWS_SECRET_ACCESS_KEY env vars to
      docker-compose and Helm webui-deployment (conditional on logs_input.type: "s3" with
      aws_authentication.type: "credentials").
    • Add LogsInputS3AwsAuthType and LogsInputS3AwsProfile to Helm configmap's
      webui-server-settings.json (conditional on logs_input.type: "s3").
  • S3 listing API (routes/api/s3/index.ts): New GET /api/s3/ls endpoint that delegates to
    S3ListManager.listObjects() with bucket, prefix, and region from query params.

  • Compress route (routes/api/compress/index.ts): Refactor into buildFsJobConfig() and
    buildS3JobConfig() builders. The S3 builder populates aws_authentication from server settings
    and env vars, while bucket, regionCode, keyPrefix, and keys come from the request body.

  • Client S3 form (S3InputFormItems.tsx, S3KeySelectFormItem/):

    • Bucket and AWS Region inputs plus a TreeSelect for browsing/selecting S3 keys/prefixes.
    • TreeSelect features: lazy loading of prefixes on expand, pagination via "Load more..." nodes,
      automatic reset on bucket/region change, SHOW_PARENT display strategy.
    • Selection maps to keyPrefix (prefix selections) or keys (object selections) on the payload.
  • Python config (clp_config.py): Remove endpoint_url and region_code from
    S3IngestionConfig — region is per-job (from the form), and custom endpoints are not supported.

  • Controller (controller.py): Propagate only LogsInputS3AwsAuthType and
    LogsInputS3AwsProfile (no LogsInputS3EndpointUrl or LogsInputS3Region) from
    S3IngestionConfig to WebUI server settings.

  • Stream-files route (stream-files/index.ts): Update S3Manager reference to renamed
    StreamFilesS3Manager decorator.
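The credential-resolution priority described above (env-var credentials > named profile > SDK default chain) can be sketched as a pure function. The helper name `resolveAwsCredentials` comes from the PR; the types, signature, and prefix parameter are assumptions for illustration.

```typescript
interface AwsCredentials {
    accessKeyId: string;
    secretAccessKey: string;
}

// The outcome is either explicit credentials, a named-profile marker, or
// "default", which means: fall through to the SDK's default provider chain.
type Resolution =
    | {kind: "credentials"; credentials: AwsCredentials}
    | {kind: "profile"; profile: string}
    | {kind: "default"};

/**
 * Resolves AWS credentials for a given env-var prefix (e.g.,
 * "CLP_LOGS_INPUT" or "CLP_STREAM_OUTPUT"), mirroring the priority order
 * described in the PR: env vars first, then a named profile, then the
 * SDK default chain.
 */
const resolveAwsCredentials = (
    env: Record<string, string | undefined>,
    profile: string | undefined,
    prefix: string
): Resolution => {
    const accessKeyId = env[`${prefix}_AWS_ACCESS_KEY_ID`];
    const secretAccessKey = env[`${prefix}_AWS_SECRET_ACCESS_KEY`];
    if (accessKeyId && secretAccessKey) {
        return {kind: "credentials", credentials: {accessKeyId, secretAccessKey}};
    }
    if (profile) {
        return {kind: "profile", profile};
    }
    return {kind: "default"};
};
```

Keeping this logic in one helper is what lets the logs-input and stream-files configs share it without duplicating the priority rules.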

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and it has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

1. Build and lint checks pass

Task: Verify all WebUI workspaces build and pass lint checks with zero warnings.

Command:

cd components/webui
npm run build --workspace=common && npm run build --workspace=server
npm run lint:check

Output:

> @webui/common@0.1.0 build
> tsc

> @webui/server@0.1.0 build
> tsc

> @webui/common@0.1.0 lint:check
> eslint . --max-warnings 0

> @webui/client@0.1.0 lint:check
> eslint . --max-warnings 0

> @webui/server@0.1.0 lint:check
> eslint . --max-warnings 0

2. Package build succeeds

Task: Verify the full CLP package builds without errors.

Command:

task package

Output:

...
#31 [stage-1 19/19] COPY --link --chown=1000 ./build/webui/ var/www/webui/
#31 DONE 1.8s
...
task: [package] echo '0.10.1-dev' > '/home/junhao/workspace/8-clp/build/clp-package/VERSION'

3. CLP starts with S3 input configured

Task: Configure logs_input.type: "s3" with AWS credentials and start the CLP package.

Command:

cd build/clp-package
./sbin/start-clp.sh

Output:

...
Container clp-package-5654-webui-1 Healthy
Container clp-package-5654-database-1 Healthy
2026-04-01T23:08:45.325 INFO [controller] Started CLP.

4. S3 listing API returns bucket contents with per-request region

Task: Verify the /api/s3/ls endpoint accepts region and correctly lists S3 objects.

Command:

curl -s 'http://localhost:4000/api/s3/ls?bucket=yscope-junhao-test&region=us-east-2' | python3 -m json.tool

Output:

{
    "entries": [
        {"isPrefix": true, "key": "/"},
        {"isPrefix": true, "key": "archives/"},
        {"isPrefix": true, "key": "clp-archives/"},
        {"isPrefix": true, "key": "streams/"},
        {"isPrefix": false, "key": "postgresql-simple.jsonl"},
        {"isPrefix": false, "key": "yarn.log"}
    ],
    "isTruncated": false,
    "nextContinuationToken": null
}

Explanation: The listing endpoint creates an S3Client with the provided region and returns both
common prefixes and objects. The region parameter enables listing buckets in different regions
without restarting CLP.
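As a sketch of the shape involved: a delimiter-based S3 listing yields `CommonPrefixes` and `Contents`, which the endpoint flattens into the `entries` array shown above. The field names mirror the sample response; the mapping function itself is an assumption, not the PR's actual code.

```typescript
interface S3Entry {
    isPrefix: boolean;
    key: string;
}

// Structural subset of an AWS SDK ListObjectsV2 response.
interface ListObjectsV2Like {
    CommonPrefixes?: {Prefix?: string}[];
    Contents?: {Key?: string}[];
    IsTruncated?: boolean;
    NextContinuationToken?: string;
}

/**
 * Flattens a delimiter-grouped listing into the response shape shown in
 * the sample output: prefixes first, then plain objects, plus pagination
 * metadata.
 */
const toListResponse = (output: ListObjectsV2Like) => ({
    entries: [
        ...(output.CommonPrefixes ?? []).flatMap(({Prefix}) =>
            Prefix ? [{isPrefix: true, key: Prefix}] : []),
        ...(output.Contents ?? []).flatMap(({Key}) =>
            Key ? [{isPrefix: false, key: Key}] : []),
    ] as S3Entry[],
    isTruncated: output.IsTruncated ?? false,
    nextContinuationToken: output.NextContinuationToken ?? null,
});
```

The `nextContinuationToken` is what the client's "Load more..." tree nodes would feed back for pagination.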

5. End-to-end S3 compression job completes with per-job region

Task: Submit an S3 compression job with regionCode in the request body.

Command:

curl -s -X POST http://localhost:4000/api/compress/ \
  -H 'Content-Type: application/json' \
  -d '{
    "inputType": "s3",
    "bucket": "yscope-junhao-test",
    "regionCode": "us-east-2",
    "keys": ["postgresql-simple.jsonl"],
    "dataset": "test_dataset",
    "timestampKey": "timestamp"
  }'

Output:

{"jobId":1}

Command:

docker logs clp-package-5654-compression-scheduler-1 2>&1 | tail -5

Output:

2026-04-01 23:08:42,643 compression_scheduler [INFO] Starting compression_scheduler
2026-04-01 23:09:42,041 compression_scheduler [INFO] Dispatched job 1 with 1 tasks (0 remaining).
2026-04-01 23:09:42,542 compression_scheduler [INFO] Compression task job-1-task-1 completed in 0.25391 second(s).
2026-04-01 23:09:42,542 compression_scheduler [INFO] Job 1 succeeded (1 tasks completed).

Explanation: The compression job completed successfully. The aws_authentication came from
server settings, while bucket, regionCode, and keys came from the request body.
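The split between server-provided and request-provided fields can be sketched as follows. The builder name `buildS3JobConfig` and the field names come from the PR description; the exact job-config shape and the settings interface are assumptions.

```typescript
interface ServerS3Settings {
    LogsInputS3AwsAuthType: string;
    LogsInputS3AwsProfile?: string;
}

interface S3CompressionBody {
    bucket: string;
    regionCode: string;
    keyPrefix?: string;
    keys?: string[];
}

/**
 * Builds an S3 compression job config: aws_authentication comes from
 * server settings (and, for "credentials", env vars not shown here),
 * while bucket/region/keys come from the validated request body.
 */
const buildS3JobConfig = (
    settings: ServerS3Settings,
    body: S3CompressionBody
) => ({
    aws_authentication: {
        type: settings.LogsInputS3AwsAuthType,
        profile: settings.LogsInputS3AwsProfile ?? null,
    },
    bucket: body.bucket,
    region_code: body.regionCode,
    key_prefix: body.keyPrefix ?? null,
    keys: body.keys ?? null,
});
```

This keeps deployment-level authentication out of the request body entirely, so a client can never override the configured auth method per job.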

6. Stop CLP

Command:

./sbin/stop-clp.sh

Output:

2026-04-01T23:10:17.848 INFO [controller] Stopped CLP.

@junhaoliao junhaoliao requested a review from a team as a code owner April 2, 2026 10:11
@coderabbitai

coderabbitai bot commented Apr 2, 2026

Review skipped: draft detected. To trigger a single review, invoke the @coderabbitai review command.

@junhaoliao junhaoliao marked this pull request as draft April 2, 2026 10:12