add qwen35 rl config and fix mix bug #1640
@claude review
| # it's helpful to add labels to the images and videos for better reference. | ||
| add_vision_id: bool = True | ||
| template_name: str = "qwen3_vl" |
Claude: Critical: Default value mismatch — the default here is "qwen3_vl" (underscore) but the CHAT_TEMPLATE_MAP keys use hyphens: "qwen3-vl" and "qwen3.5-vl-rl". This will cause a KeyError at runtime for anyone using the default value.
| template_name: str = "qwen3_vl" | |
| template_name: str = "qwen3-vl" |
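The mismatch described above can be caught at construction time rather than at the later lookup. The following is a minimal sketch, not the PR's code: `TemplateConfig` and `resolve_template` are hypothetical names, and the `CHAT_TEMPLATE_MAP` values are placeholders. Only the two key names come from the review comment.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the real CHAT_TEMPLATE_MAP. Only the two key
# names are taken from the review comment; the values are placeholders.
CHAT_TEMPLATE_MAP = {
    "qwen3-vl": "<qwen3-vl template>",
    "qwen3.5-vl-rl": "<qwen3.5-vl-rl template>",
}


@dataclass
class TemplateConfig:  # hypothetical config class, for illustration only
    template_name: str = "qwen3-vl"  # hyphenated, so it matches a map key

    def __post_init__(self) -> None:
        # Fail early with a readable message instead of a bare KeyError
        # at the later dictionary lookup.
        if self.template_name not in CHAT_TEMPLATE_MAP:
            valid = ", ".join(sorted(CHAT_TEMPLATE_MAP))
            raise ValueError(
                f"Unknown template_name {self.template_name!r}; expected one of: {valid}"
            )


def resolve_template(config: TemplateConfig) -> str:
    # Plain lookup; safe because __post_init__ already validated the key.
    return CHAT_TEMPLATE_MAP[config.template_name]
```

With a guard like this, passing the underscore spelling `"qwen3_vl"` fails immediately with a message that lists the valid keys.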
| tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True) | ||
Claude: Nit: open() without a context manager — the file handle is never closed. Consider:
| ds_collections = json.loads(Path(meta_data_path).read_text()) |
(Per CLAUDE.md, prefer pathlib.Path over os.path for filesystem operations.)
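For reference, here is a small sketch of the two ways to read the metadata file without leaking a handle. The function names are hypothetical, and the explicit `encoding="utf-8"` is an assumption that does not appear in the suggestion above.

```python
import json
from pathlib import Path


def load_meta(meta_data_path):
    """Read a JSON metadata file; read_text() opens and closes the file itself."""
    # encoding="utf-8" is an assumption, not part of the reviewed code.
    return json.loads(Path(meta_data_path).read_text(encoding="utf-8"))


def load_meta_with_open(meta_data_path):
    """Equivalent version using open() inside a context manager."""
    with open(meta_data_path, encoding="utf-8") as f:
        return json.load(f)
```

Both versions return the same parsed object. The `pathlib` form is shorter and matches the stated project preference.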
| eos_token_id = get_eos_token(model_path) | ||
| eos_token_str = tokenizer.convert_ids_to_tokens(eos_token_id) |
Claude: Nit: Mid-file import. Move from xtuner.v1.utils.rl_test_utils import get_eos_token to the top-level imports block for consistency.
| prompt_repeat_k=prompt_repeat_k, | ||
| global_batch_size=global_batch_size, | ||
| sample_params=training_sample_params, | ||
| # max_concurrent=64, # optional, will be determined automatically if not set |
Claude: Nit: Chinese comment (不需要修改, "no need to modify"). Use English for consistency across the codebase.
| expert_parallel_size=rollout_ep_size, | ||
| gpu_memory_utilization=0.8, |
Claude: Nit: Extraneous spaces around = in keyword arguments (PEP 8):
| expert_parallel_size=rollout_ep_size, | |
| gpu_memory_utilization=0.8, | |
| context_length=max_response_length + max_prompt_length, | |
| enable_return_routed_experts=True, |
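As a quick illustration of the PEP 8 rule behind this nit: no spaces around `=` in keyword arguments, but spaces around binary operators such as `+` inside the argument value. `build_rollout_kwargs` is a hypothetical helper that only collects the arguments into a dict, and the commented-out "flagged" line is an assumed reconstruction of the style being criticised.

```python
def build_rollout_kwargs(rollout_ep_size, max_response_length, max_prompt_length):
    """Hypothetical helper: collect rollout keyword arguments into a dict."""
    return dict(
        # Flagged style (assumed): expert_parallel_size = rollout_ep_size,
        expert_parallel_size=rollout_ep_size,
        gpu_memory_utilization=0.8,
        # Binary operators inside the value still get surrounding spaces.
        context_length=max_response_length + max_prompt_length,
        enable_return_routed_experts=True,
    )
```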
Claude: ## Summary
This PR adds a Qwen3.5-VL 35B GRPO mixed-data RL config, a new
## Issues
Critical
Nit
## Bug Fix Review
The
## Verdict
REQUEST_CHANGES: the
No description provided.