feat: integrate RL training with vLLM_ascend inference backend#1641

Open
NengXu001 wants to merge 2 commits into InternLM:main from NengXu001:rl_vllm

Conversation

@NengXu001
Contributor

This PR integrates the vLLM inference backend into the RL training pipeline and adds support for the Ascend NPU environment.

Key Features:

  1. Enabled Tensor Parallelism (TP) and Data Parallelism (DP).
  2. Weight Synchronization: Implemented sync_weight logic to ensure consistency between the actor and the inference engine.
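The weight-synchronization contract can be sketched roughly as follows. This is a toy illustration only, with hypothetical names and plain dicts standing in for tensors; the actual sync_weight presumably transfers weights over IPC/HCCL rather than copying Python objects:

```python
# Toy sketch of the sync_weight consistency contract (hypothetical names,
# no vLLM or Ascend dependency). Real code would move tensors via IPC/HCCL.
def sync_weight(actor_state: dict, engine_state: dict) -> dict:
    """Copy every actor parameter into the inference engine's state, in place."""
    missing = [k for k in actor_state if k not in engine_state]
    if missing:
        # Fail loudly instead of silently serving stale or absent weights.
        raise KeyError(f"engine is missing parameters: {missing}")
    for name, tensor in actor_state.items():
        engine_state[name] = tensor  # placeholder for the actual device transfer
    return engine_state
```

After a training step, the actor would call this once so the inference engine generates rollouts with the freshest policy weights.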

Out of Scope:

  1. Ref model support
  2. Evaluation step
  3. Router replay

@YanhuiDua
Collaborator

@claude review

@claude
Contributor

claude bot commented Mar 30, 2026

Summary

Integrates vLLM inference backend for Ascend NPU into the RL training pipeline with TP+DP support and weight sync via IPC.

Issues

Critical

  • [vllm.py:30] Hard dependency on Ascend NPU (vllm_ascend, PyHcclCommunicator, torch.npu.*) — makes the module unusable on CUDA/GPU environments. Needs device abstraction or conditional imports.
  • [vllm.py:424] last_token_ids and last_logprobs are unbound when "logprobs" is not in the response, causing UnboundLocalError at runtime.
  • [vllm.py:147] tensor_parallel_size // data_parallel_size — no validation for zero or non-divisible values.
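Hedged sketches of possible fixes for the two runtime bugs above (function and key names are illustrative, not taken from the PR):

```python
def tp_per_dp(tensor_parallel_size: int, data_parallel_size: int) -> int:
    """Validate the TP/DP split before dividing (fix for vllm.py:147)."""
    if data_parallel_size <= 0:
        raise ValueError("data_parallel_size must be positive")
    if tensor_parallel_size % data_parallel_size != 0:
        raise ValueError(
            f"tensor_parallel_size ({tensor_parallel_size}) must be divisible "
            f"by data_parallel_size ({data_parallel_size})"
        )
    return tensor_parallel_size // data_parallel_size

def parse_logprobs(response: dict):
    """Bind both names before the conditional (fix for vllm.py:424)."""
    last_token_ids, last_logprobs = [], []
    if "logprobs" in response:  # response schema here is assumed, not verified
        last_token_ids = response["logprobs"].get("token_ids", [])
        last_logprobs = response["logprobs"].get("values", [])
    return last_token_ids, last_logprobs
```

With the defaults initialized up front, a response without a "logprobs" field yields empty lists instead of an UnboundLocalError.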

Warning

  • [vllm.py:213] Missing TimeoutError raise after ray.cancel on timeout — silently returns None, caller assumes server started.
  • [vllm.py:109] except Exception in vllm_server_wrapper.__init__ swallows startup failures; actor_health() still returns "healthy".
  • [vllm.py:324] reset_prefix_cache uses self.endpoints['reset_prefix_cache'] but endpoint key is never registered — KeyError at runtime.
  • [vllm.py:141] Redundant AutoTokenizer.from_pretrained — already initialized in parent RolloutWorker.__init__.
  • [vllm.py:385,367] allowed_local_media_path = "/" set twice; hardcoding / is a security concern (unrestricted filesystem read).
  • [vllm.py:396] Hardcoded Ascend-specific env vars (ASCEND_RT_VISIBLE_DEVICES for 16 devices, HCCL_*, VLLM_ASCEND_*) — not portable.
  • [vllm.py:432] Chinese assertion message — should be English for consistency.
  • [worker.py:1228] vLLM path gated by XTUNER_USE_VLLM env var instead of self.rollout_cfg_info["backend"] — inconsistent with existing pattern.
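For the silent-timeout issue, the startup wait could raise after cancellation instead of returning None. A minimal sketch (the polling callback and timeout values are stand-ins for the actual Ray-based startup code):

```python
import time

def wait_for_server(poll, timeout_s: float, interval_s: float = 0.05) -> bool:
    """Poll until the server reports ready; raise instead of returning None
    on timeout, so callers cannot mistake a failed startup for success."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if poll():
            return True
        time.sleep(interval_s)
    # Real code would ray.cancel(server_task) here before raising.
    raise TimeoutError(f"vLLM server did not start within {timeout_s}s")
```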

Nit

  • [vllm.py:98] Class vllm_server_wrapper should use CamelCase (VllmServerWrapper).
  • [vllm.py:352,389] uvicorn_log_level set twice.
  • [vllm.py:86] Redundant local import os — already imported at module level.

Verdict

REQUEST_CHANGES
