feat: integrate RL training with vLLM_ascend inference backend#1641

Open
NengXu001 wants to merge 2 commits into InternLM:main from NengXu001:rl_vllm

Conversation

@NengXu001
Contributor

This PR integrates the vLLM inference backend into the RL training pipeline and adds support for the Ascend NPU environment.

Key Features:

  1. Enabled Tensor Parallelism (TP) and Data Parallelism (DP).
  2. Weight Synchronization: Implemented sync_weight logic to ensure consistency between the actor and the inference engine.
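The weight-synchronization contract can be sketched roughly as follows. This is a toy illustration only, with hypothetical names and plain dicts standing in for tensors; the actual sync_weight presumably transfers weights over IPC/HCCL rather than copying Python objects:

```python
# Toy sketch of the sync_weight consistency contract (hypothetical names,
# no vLLM or Ascend dependency). Real code would move tensors via IPC/HCCL.
def sync_weight(actor_state: dict, engine_state: dict) -> dict:
    """Copy every actor parameter into the inference engine's state, in place."""
    missing = [k for k in actor_state if k not in engine_state]
    if missing:
        # Fail loudly instead of silently serving stale or absent weights.
        raise KeyError(f"engine is missing parameters: {missing}")
    for name, tensor in actor_state.items():
        engine_state[name] = tensor  # placeholder for the actual device transfer
    return engine_state
```

After a training step, the actor would call this once so the inference engine generates rollouts with the freshest policy weights.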

Out of Scope:

  1. Ref model support
  2. Evaluation step
  3. Router replay

@YanhuiDua
Collaborator

@claude review

@claude
Contributor

claude bot commented Mar 30, 2026

Summary

Integrates vLLM inference backend for Ascend NPU into the RL training pipeline with TP+DP support and weight sync via IPC.

Issues

Critical

  • [vllm.py:30] Hard dependency on Ascend NPU (vllm_ascend, PyHcclCommunicator, torch.npu.*) — makes the module unusable on CUDA/GPU environments. Needs device abstraction or conditional imports.
  • [vllm.py:424] last_token_ids and last_logprobs are unbound when "logprobs" is not in the response, causing UnboundLocalError at runtime.
  • [vllm.py:147] tensor_parallel_size // data_parallel_size — no validation for zero or non-divisible values.
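Hedged sketches of possible fixes for the two runtime bugs above (function and key names are illustrative, not taken from the PR):

```python
def tp_per_dp(tensor_parallel_size: int, data_parallel_size: int) -> int:
    """Validate the TP/DP split before dividing (fix for vllm.py:147)."""
    if data_parallel_size <= 0:
        raise ValueError("data_parallel_size must be positive")
    if tensor_parallel_size % data_parallel_size != 0:
        raise ValueError(
            f"tensor_parallel_size ({tensor_parallel_size}) must be divisible "
            f"by data_parallel_size ({data_parallel_size})"
        )
    return tensor_parallel_size // data_parallel_size

def parse_logprobs(response: dict):
    """Bind both names before the conditional (fix for vllm.py:424)."""
    last_token_ids, last_logprobs = [], []
    if "logprobs" in response:  # response schema here is assumed, not verified
        last_token_ids = response["logprobs"].get("token_ids", [])
        last_logprobs = response["logprobs"].get("values", [])
    return last_token_ids, last_logprobs
```

With the defaults initialized up front, a response without a "logprobs" field yields empty lists instead of an UnboundLocalError.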

Warning

  • [vllm.py:213] Missing TimeoutError raise after ray.cancel on timeout — silently returns None, caller assumes server started.
  • [vllm.py:109] except Exception in vllm_server_wrapper.__init__ swallows startup failures; actor_health() still returns "healthy".
  • [vllm.py:324] reset_prefix_cache uses self.endpoints['reset_prefix_cache'] but endpoint key is never registered — KeyError at runtime.
  • [vllm.py:141] Redundant AutoTokenizer.from_pretrained — already initialized in parent RolloutWorker.__init__.
  • [vllm.py:385,367] allowed_local_media_path = "/" set twice; hardcoding / is a security concern (unrestricted filesystem read).
  • [vllm.py:396] Hardcoded Ascend-specific env vars (ASCEND_RT_VISIBLE_DEVICES for 16 devices, HCCL_*, VLLM_ASCEND_*) — not portable.
  • [vllm.py:432] Chinese assertion message — should be English for consistency.
  • [worker.py:1228] vLLM path gated by XTUNER_USE_VLLM env var instead of self.rollout_cfg_info["backend"] — inconsistent with existing pattern.
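For the silent-timeout issue, the startup wait could raise after cancellation instead of returning None. A minimal sketch (the polling callback and timeout values are stand-ins for the actual Ray-based startup code):

```python
import time

def wait_for_server(poll, timeout_s: float, interval_s: float = 0.05) -> bool:
    """Poll until the server reports ready; raise instead of returning None
    on timeout, so callers cannot mistake a failed startup for success."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if poll():
            return True
        time.sleep(interval_s)
    # Real code would ray.cancel(server_task) here before raising.
    raise TimeoutError(f"vLLM server did not start within {timeout_s}s")
```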

Nit

  • [vllm.py:98] Class vllm_server_wrapper should use CamelCase (VllmServerWrapper).
  • [vllm.py:352,389] uvicorn_log_level set twice.
  • [vllm.py:86] Redundant local import os — already imported at module level.

Verdict

REQUEST_CHANGES
