refactor: split pd/server/store image stage & bump to V2 version #13

Merged
imbajin merged 11 commits into master from publish-gate on Mar 21, 2026

imbajin (Collaborator) commented Mar 21, 2026

Related PR:

Refer:

TODO: (After Test)

Summary by CodeRabbit

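Release notes

  • New features

    • Added an optional strict-mode input (enabled by default) and support for a custom wait-timeout parameter.
  • Improvements

    • Introduced source resolution to pin the build source; split the flow into a precheck and a per-module publish matrix, with per-module self-checks and multi-arch image pushes.
    • Added concurrency control and input validation; optimized the build-cache strategy and disk-usage reporting.
  • Bug fixes

    • Improved log collection, resource cleanup, and rollback on failure, so the environment is stopped and released correctly after an error.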

- replace gate_mode choice with strict_mode boolean input (default true)

- run integration precheck only when strict_mode is enabled

- keep matrix publish path available when precheck is skipped

- preserve per-module self-check and multi-arch push behavior
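
The gating described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual file: the input description and the placeholder step are invented, and only the `strict_mode` input name, its default, and the `integration_precheck` job name come from this conversation.

```yaml
# Sketch only: the job body is a placeholder, not the PR's real steps.
on:
  workflow_dispatch:
    inputs:
      strict_mode:
        description: "Run the integration precheck before publishing"  # assumed wording
        type: boolean
        default: true

jobs:
  integration_precheck:
    # Values under github.event.inputs are strings, so the boolean input
    # is compared against the string 'true' here.
    if: ${{ github.event.inputs.strict_mode == 'true' }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "precheck placeholder"
```

With `strict_mode` unchecked at dispatch time, the job above is reported as skipped rather than failed, which is what lets a downstream job distinguish "skipped on purpose" from "broken".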
Copilot AI review requested due to automatic review settings March 21, 2026 12:10
@gemini-code-assist

Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

dosubot bot added the size:L label (this PR changes 100-499 lines, ignoring generated files) Mar 21, 2026
@github-actions

@codecov-ai-reviewer review

coderabbitai bot commented Mar 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

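Adds an optional strict mode and concurrency control; splits the workflow into resolve_source (pins the upstream SHA), an optional integration_precheck (local x86 build + Compose health check), and a matrix-based publish_matrix (builds and pushes multi-arch images after per-module self-checks); introduces module-scoped BuildKit caches and a cleanup flow.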

Changes

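| Cohort / File(s) | Summary |
| --- | --- |
| Main workflow file: .github/workflows/publish_latest_pd_store_server_image.yml | Refactored the workflow: added the workflow_dispatch inputs strict_mode and wait_timeout_sec; added top-level concurrency; split the single job into multiple stages and changed the Maven argument reference to `${{ github.event.inputs.mvn_args …` |
| Upstream source resolution: .github/workflows/.../publish_latest_pd_store_server_image.yml (resolve_source job) | Added the resolve_source job, which resolves and outputs SOURCE_SHA for later jobs to check out. |
| Integration precheck (optional): .github/workflows/.../publish_latest_pd_store_server_image.yml (integration_precheck job/steps) | Added a precheck gated by strict_mode: builds x86 images locally with buildx and loads them, generates a compose override that forces pull_policy: never, starts compose and polls /v1/health/versions, exports logs on failure, and always tears down and cleans up Docker state; adds integer validation for WAIT_TIMEOUT_SEC. |
| Matrix publish: .github/workflows/.../publish_latest_pd_store_server_image.yml (publish_matrix job/steps) | Replaced the single direct multi-arch step with a module matrix: each module first builds and loads an x86 self-check image and optionally starts a compose-based module self-check (polling the module's probe_path); once that passes, buildx + QEMU build and push multi-arch images (amd64, arm64); every step uses a module-scoped cache, and the job ends with cleanup and a disk-usage report. |
| Toolchain and action versions: .github/workflows/.../publish_latest_pd_store_server_image.yml | Unified and upgraded actions: actions/checkout@v4, docker/setup-buildx-action@v4, docker/login-action@v4, docker/build-push-action@v7, docker/setup-qemu-action@v4; scoped the build-push cache-from/cache-to per module and added logging and cleanup steps. |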

Sequence Diagram(s)

sequenceDiagram
    actor GitHub as GitHub Actions
    participant Dispatch as workflow_dispatch
    participant Resolve as resolve_source
    participant Precheck as integration_precheck
    participant BuilderLocal as buildx_local_x86
    participant Compose as DockerCompose
    participant Services as PD/Store/Server
    participant Health as HealthEndpoints
    participant Matrix as publish_matrix
    participant SelfCheck as SelfCheckContainer
    participant QEMU as setup-qemu
    participant Push as build-push-action

    GitHub->>Dispatch: manual trigger (inputs: strict_mode, mvn_args, wait_timeout_sec)
    Dispatch->>Resolve: resolve upstream -> SOURCE_SHA
    alt strict_mode == true
        Dispatch->>Precheck: run integration_precheck (checkout SOURCE_SHA)
        Precheck->>BuilderLocal: build x86 images locally (load: true, module cache scopes)
        Precheck->>Compose: generate override & docker compose up (pull_policy: never)
        Compose->>Services: start services
        Precheck->>Health: poll /v1/health and /versions
        Health-->>Precheck: ready
        Precheck->>Compose: export logs (on failure) and teardown
        Precheck-->>Matrix: continue on pass or skip
    else strict_mode == false
        Dispatch-->>Matrix: skip precheck and continue
    end

    Matrix->>BuilderLocal: per-module build/load of x86 self-check image
    Matrix->>SelfCheck: start module self-check container and poll probe_path
    SelfCheck->>Health: poll module probe
    Health-->>SelfCheck: ready
    SelfCheck-->>Matrix: remove self-check container
    Matrix->>QEMU: set up QEMU for multi-arch support
    Matrix->>Push: build and push multi-arch images (amd64, arm64) using module cache scopes
    Push-->>Matrix: push complete and cleaned up

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

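🐰 I'm a little rabbit who hopped to the pipeline's edge to look around,

First pin the upstream, then stand up the stack for a quick taste,
Modules hop through the self-check gate one by one before the images dare travel far,
Caches stowed, logs kept clear, the road to release is steady and busy.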

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main changes: introduction of strict_mode as a publish gate and version updates, which are the core modifications to the workflow. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the publish_latest_pd_store_server_image GitHub Actions workflow to introduce a strict_mode publish gate, making the integration precheck optional while still allowing the publish matrix path to run when the precheck is skipped.

Changes:

  • Replaces the previous gating approach with a strict_mode boolean workflow input (default true).
  • Adds an integration_precheck job that runs only when strict_mode is enabled.
  • Refactors publishing into a publish_matrix job that can proceed when precheck succeeds or is skipped, while preserving per-module self-check and multi-arch push behavior.

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/publish_latest_pd_store_server_image.yml:
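- Around line 29-34: The problem is that the integration_precheck and
publish_matrix jobs each check out master directly, so publish may pick up
commits that landed after the precheck. Change the workflow so that the
integration_precheck job (job name integration_precheck) exports the current
commit SHA as a job output after checkout (for example, a step that reads
$GITHUB_SHA or steps.checkout.outputs.sha and runs echo
"::set-output name=commit_sha::$SHA"), then change the ref of actions/checkout
in publish_matrix (and any other similar publish job or location) to use that
output (ref: ${{ needs.integration_precheck.outputs.commit_sha }}), so that all
matrix shards and later publish steps check out the same commit SHA (also
change the other checkout at lines 193-198 of the file to use the same output).
- Around line 160-165: The workflow pushes three moving tags in parallel
without a concurrency lock, so different runs can interleave and overwrite each
other's images (the relevant job is publish_matrix). Add a top-level
concurrency configuration to serialize publishes of the same name (for example
with a fixed group name or a group string based on the ref or inputs), and set
cancel-in-progress: false (or true if needed) to prevent parallel runs. Make
sure the concurrency block sits at the top level (as a sibling of jobs) so it
covers the whole workflow rather than a single job, which avoids the
parallel-overwrite problem in publish_matrix.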
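
A top-level concurrency block of the kind the second finding asks for might look like the fragment below. The group name is a made-up placeholder; the conversation does not show the value the PR actually uses.

```yaml
# Top level of the workflow file, as a sibling of `jobs:`.
concurrency:
  # Hypothetical group name; any stable string shared by all runs of
  # this workflow serializes them.
  group: publish-latest-pd-store-server
  # Let an in-flight publish finish instead of cancelling it; a newer
  # run waits as pending until the group is free.
  cancel-in-progress: false
```

Setting `cancel-in-progress: false` favors finishing a half-pushed set of tags over starting fresh, which seems the safer default when three moving tags are involved.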
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d2837a33-8a3e-49d8-b131-67582b66844d

📥 Commits

Reviewing files that changed from the base of the PR and between 8a871e4 and 53ca07e.

📒 Files selected for processing (1)
  • .github/workflows/publish_latest_pd_store_server_image.yml

imbajin added 2 commits March 21, 2026 20:20
Add concurrency group and a new resolve_source job that fetches the master commit SHA from apache/hugegraph and exposes it as SOURCE_SHA. Use SOURCE_SHA for actions/checkout in integration_precheck and publish_matrix (replacing the previous hardcoded branch), and make publish_matrix depend on resolve_source with an updated success condition. Replace hardcoded local image names in the compose override with PD/STORE/SERVER image variables and remove quoted heredoc to enable substitution. Add curl connection and max timeouts for readiness probes to avoid hanging.
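
A minimal sketch of the resolve-then-pin pattern this commit describes is given below. The `git ls-remote` lookup, the step id `resolve`, and the SHA sanity check are assumptions made for illustration; the commit message only states that the job fetches the master SHA from apache/hugegraph and exposes it as SOURCE_SHA.

```yaml
jobs:
  resolve_source:
    runs-on: ubuntu-latest
    outputs:
      SOURCE_SHA: ${{ steps.resolve.outputs.SOURCE_SHA }}
    steps:
      - id: resolve  # hypothetical step id
        run: |
          set -euo pipefail
          # Assumed lookup method: ask the remote for the tip of master.
          sha="$(git ls-remote https://github.com/apache/hugegraph.git refs/heads/master | cut -f1)"
          # Fail early if the lookup did not return a full 40-hex SHA.
          [[ "$sha" =~ ^[0-9a-f]{40}$ ]] || { echo "could not resolve master SHA" >&2; exit 1; }
          echo "SOURCE_SHA=$sha" >> "$GITHUB_OUTPUT"

  integration_precheck:
    needs: resolve_source
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          repository: apache/hugegraph
          # Every consumer checks out the same pinned commit.
          ref: ${{ needs.resolve_source.outputs.SOURCE_SHA }}
```

Writing to `$GITHUB_OUTPUT` is the currently documented way to set step outputs; resolving the SHA once in a dedicated job means the precheck and every matrix shard build the same commit even if master moves mid-run.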
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

imbajin (Collaborator, Author) commented Mar 21, 2026

@coderabbitai full review

coderabbitai bot commented Mar 21, 2026

✅ Actions performed

Full review triggered.

- add always() to publish_matrix job condition

- keep explicit needs result checks for resolve_source and precheck

- ensure strict_mode=false path can still run matrix publish
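
The job condition described in these bullets can be sketched as follows. The job names come from this conversation; the placeholder step is an assumption, and the real job of course carries the matrix and build steps.

```yaml
jobs:
  publish_matrix:
    needs: [resolve_source, integration_precheck]
    # Without always(), a skipped dependency would skip this job too,
    # so the strict_mode=false path could never publish.
    if: >-
      always() &&
      needs.resolve_source.result == 'success' &&
      (needs.integration_precheck.result == 'success' ||
      needs.integration_precheck.result == 'skipped')
    runs-on: ubuntu-latest
    steps:
      - run: echo "publish placeholder"
```

Because `always()` also lets the job run after failures, the explicit `result` checks are what keep a failed precheck or a failed source resolution from reaching the publish step.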
imbajin requested a review from Copilot March 21, 2026 13:07
imbajin changed the title from "chore(workflows): add strict_mode publish gate" to "refactor: add strict_mode publish gate & bump V2 version" Mar 21, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- switch matrix self-check from raw docker run to compose-based startup

- override target module image with local build and keep pull_policy never

- use compose --wait with bounded 300s timeout for precheck and self-check

- simplify integration probes to direct endpoint assertions after wait
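
As a rough sketch of the compose-based startup these bullets describe: the step below writes an override that points a service at a locally built image with `pull_policy: never`, then starts the stack with a bounded wait. The service name `store`, the image tag, and the compose file paths are hypothetical; port 8520 and `/v1/health` are the store values quoted later in this review, and the curl timeout values are arbitrary.

```yaml
      - name: Start stack from locally built image
        env:
          STORE_IMAGE: hugegraph/store:selfcheck  # hypothetical local tag
        run: |
          set -euo pipefail
          # Unquoted heredoc so ${STORE_IMAGE} is substituted by the shell.
          cat > compose.override.yml <<EOF
          services:
            store:
              image: ${STORE_IMAGE}
              pull_policy: never
          EOF
          # --wait blocks until services are running/healthy or the timeout hits.
          docker compose -f docker-compose.yml -f compose.override.yml \
            up -d --wait --wait-timeout 300
          # Direct endpoint assertion after the wait, with bounded curl timeouts.
          curl --fail --silent --connect-timeout 5 --max-time 10 \
            http://127.0.0.1:8520/v1/health
```

`pull_policy: never` makes compose fail fast if the local image is missing rather than silently pulling a published one, so the check really exercises the freshly built image.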
imbajin added 2 commits March 21, 2026 21:29
- add workflow_dispatch input wait_timeout_sec with default 300

- wire WAIT_TIMEOUT_SEC into precheck and matrix self-check compose waits

- validate wait_timeout_sec as integer within 30-1800 before builds
- add explicit docker-compose path validation before compose up in precheck

- add the same compose path validation in matrix self-check

- remove redundant cache-to from x86 self-check build step
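
One way to express the `wait_timeout_sec` validation mentioned above is the step below. It is a sketch: the step layout and error messages are assumptions, while the input name, the default of 300, and the 30-1800 bounds come from the commit messages.

```yaml
      - name: Validate wait timeout
        env:
          WAIT_TIMEOUT_SEC: ${{ github.event.inputs.wait_timeout_sec || '300' }}
        run: |
          # Reject anything that is not a plain non-negative integer.
          if ! [[ "$WAIT_TIMEOUT_SEC" =~ ^[0-9]+$ ]]; then
            echo "wait_timeout_sec must be an integer, got '$WAIT_TIMEOUT_SEC'" >&2
            exit 1
          fi
          # 10# forces base 10 so a value like 0300 is not read as octal.
          if (( 10#$WAIT_TIMEOUT_SEC < 30 || 10#$WAIT_TIMEOUT_SEC > 1800 )); then
            echo "wait_timeout_sec must be within 30-1800" >&2
            exit 1
          fi
```

Passing the input through `env` rather than interpolating it into the script keeps a malformed value from being executed as shell.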
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.

- align matrix host ports with upstream compose port mappings

- skip standalone self-check for store and server in module mode

- keep pd self-check enabled to retain basic startup validation

- move wait timeout validation to the first step in each job

chore(workflows): remove unused matrix port field

- drop container_port from pd/store/server matrix entries

- keep host_port and probe_path as the actual self-check inputs

- reduce config ambiguity in publish_matrix
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

coderabbitai bot left a comment

🧹 Nitpick comments (1)
.github/workflows/publish_latest_pd_store_server_image.yml (1)

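54-59: Consider deduplicating the wait_timeout_sec validation script so the two copies cannot drift apart.

The validation logic at Line 54 and Line 217 is identical. If the bounds (e.g. 30~1800) change later, both places must be edited, which is easy to miss. Consider extracting it into a reusable step (for example a reusable workflow or a composite action) that is maintained in one place.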

Also applies to: 217-222

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/publish_latest_pd_store_server_image.yml around lines 54 -
59, The duplicate validation for WAIT_TIMEOUT_SEC (step named "Validate wait
timeout") should be extracted into a single reusable step to avoid drift; create
a composite action or a reusable workflow that encapsulates the numeric check
(regex integer and bounds 30–1800) and replace both inline blocks with a call to
that reusable step, ensuring the composite accepts WAIT_TIMEOUT_SEC as an input
and returns non-zero on invalid values so both occurrences (the current
"Validate wait timeout" step and the similar step later) call the same
centralized validation logic.
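
A composite action along the lines suggested here could look like the sketch below. The path `.github/actions/validate-wait-timeout/action.yml` and the action's name are hypothetical; nothing in this conversation indicates the PR adopted this suggestion.

```yaml
# Hypothetical file: .github/actions/validate-wait-timeout/action.yml
name: Validate wait timeout
description: Fail unless the value is an integer between 30 and 1800.
inputs:
  wait_timeout_sec:
    description: Timeout in seconds to validate
    required: true
runs:
  using: composite
  steps:
    - shell: bash  # composite run steps must declare their shell
      env:
        WAIT_TIMEOUT_SEC: ${{ inputs.wait_timeout_sec }}
      run: |
        if ! [[ "$WAIT_TIMEOUT_SEC" =~ ^[0-9]+$ ]] ||
           (( 10#$WAIT_TIMEOUT_SEC < 30 || 10#$WAIT_TIMEOUT_SEC > 1800 )); then
          echo "wait_timeout_sec must be an integer within 30-1800" >&2
          exit 1
        fi
```

Each job would then call it with `uses: ./.github/actions/validate-wait-timeout` and a `with: wait_timeout_sec: ...` entry, after checking out the repository that holds the action, since local actions are resolved from the workspace.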
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 275763a1-3fd1-4749-97da-e609a5ab969d

📥 Commits

Reviewing files that changed from the base of the PR and between de4c958 and 9b2347d.

📒 Files selected for processing (1)
  • .github/workflows/publish_latest_pd_store_server_image.yml

dockerfile: ./hugegraph-store/Dockerfile
host_port: 8520
probe_path: /v1/health
skip_selfcheck: true
imbajin (Collaborator, Author) commented:

‼️ strict_mode=false now bypasses the only runtime validation for store and server: both matrix entries hard-code skip_selfcheck: true, so a manual publish can ship those images without any compose or healthcheck run. If the intent is just to make the integration precheck optional, keep the module self-checks enabled here or gate the skip behind a separate opt-out; otherwise the false path becomes a silent publish-without-verification mode.

# cache-from: type=gha
# cache-to: type=gha,mode=max
cache-from: type=gha,scope=latest-pd
cache-to: type=gha,scope=latest-pd,mode=min
imbajin (Collaborator, Author) commented:

⚠️ mode=min will not export the build-stage cache graph from these multi-stage Dockerfiles. The expensive mvn package work lives in the throwaway build stage, so the later publish jobs will still end up rebuilding most of it instead of reusing the precheck output. If the point of integration_precheck is to amortize build time for publish_matrix, switch these exports to mode=max on all three modules.
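
The change this comment asks for amounts to one word in the cache export line. A hedged sketch of a precheck build step with `mode=max` is shown below; the `latest-pd` scope comes from the snippet quoted above, while the Dockerfile path and image tag are assumed by analogy with the store entry.

```yaml
      - name: Build pd image for precheck
        uses: docker/build-push-action@v7
        with:
          context: .
          file: ./hugegraph-pd/Dockerfile  # assumed path, by analogy with the store entry
          platforms: linux/amd64
          load: true
          tags: hugegraph/pd:precheck      # hypothetical local tag
          cache-from: type=gha,scope=latest-pd
          # mode=max also exports layers from intermediate stages, such as
          # the Maven build stage, not only the final image's layers.
          cache-to: type=gha,scope=latest-pd,mode=max
```

The trade-off is cache size: `mode=max` stores more layers per scope, which counts against the repository's Actions cache quota.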

- gate x86 self-check build with matrix.skip_selfcheck

- avoid building and loading amd64 images for store/server when self-check is disabled

- reduce runner disk usage and unnecessary build time
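
Gating the self-check build on the matrix flag might be written as below. The store entry's values are the ones quoted earlier in this review; the `module` key, the image tag, and the step name are hypothetical.

```yaml
    strategy:
      matrix:
        include:
          - module: store  # hypothetical key name
            dockerfile: ./hugegraph-store/Dockerfile
            host_port: 8520
            probe_path: /v1/health
            skip_selfcheck: true
    steps:
      - name: Build x86 self-check image
        # The ${{ }} wrapper is required because a bare leading '!' is
        # YAML tag syntax. Entries without skip_selfcheck still run.
        if: ${{ !matrix.skip_selfcheck }}
        uses: docker/build-push-action@v7
        with:
          context: .
          file: ${{ matrix.dockerfile }}
          platforms: linux/amd64
          load: true
          tags: selfcheck/${{ matrix.module }}:local  # hypothetical tag
```

Skipping the build as well as the check avoids loading an amd64 image that nothing will start, which is where the disk and time savings in the bullets come from.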
imbajin (Collaborator, Author) left a comment

TODO: Other CI files need to be updated as well (refer to the current PR & try with https://docs.docker.com/build/ci/github-actions/github-builder )

Updated in: #14

imbajin merged commit 6eb290f into master Mar 21, 2026
1 check passed
imbajin deleted the publish-gate branch March 21, 2026 14:05
imbajin changed the title from "refactor: add strict_mode publish gate & bump V2 version" to "refactor: split pd/server/store image stage & bump to V2 version" Mar 21, 2026