
add qwen35 rl config and fix mix bug #1640

Open
hhaAndroid wants to merge 3 commits into InternLM:main from hhaAndroid:add_qwen35_rl_config

Conversation

@hhaAndroid
Collaborator

No description provided.

@hhaAndroid
Collaborator Author

@claude review

# it's helpful to add labels to the images and videos for better reference.
add_vision_id: bool = True

template_name: str = "qwen3_vl"
Contributor


Claude: Critical: Default value mismatch — the default here is "qwen3_vl" (underscore) but the CHAT_TEMPLATE_MAP keys use hyphens: "qwen3-vl" and "qwen3.5-vl-rl". This will cause a KeyError at runtime for anyone using the default value.

Suggested change
template_name: str = "qwen3_vl"
template_name: str = "qwen3-vl"
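A minimal sketch of why the underscore default fails, assuming CHAT_TEMPLATE_MAP is a plain dict keyed by the hyphenated names quoted in this review; the map values here are placeholders, not the real templates:

```python
# Hypothetical stand-in for the real CHAT_TEMPLATE_MAP; only the keys
# ("qwen3-vl", "qwen3.5-vl-rl") come from the review above.
CHAT_TEMPLATE_MAP = {
    "qwen3-vl": "<template for qwen3-vl>",
    "qwen3.5-vl-rl": "<template for qwen3.5-vl-rl>",
}

def resolve_template(template_name: str) -> str:
    # A plain dict lookup: the old underscore default "qwen3_vl" is not a
    # key, so it raises KeyError before any tokenization happens.
    return CHAT_TEMPLATE_MAP[template_name]

try:
    resolve_template("qwen3_vl")  # old default: KeyError at runtime
    raised = False
except KeyError:
    raised = True
assert raised
assert resolve_template("qwen3-vl") == "<template for qwen3-vl>"
```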



tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

Contributor


Claude: Nit: open() without a context manager — the file handle is never closed. Consider:

Suggested change
ds_collections = json.loads(Path(meta_data_path).read_text())

(Per CLAUDE.md, prefer pathlib.Path over os.path for filesystem operations.)
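As a self-contained illustration (the file name below is invented for the demo), both the Path.read_text() one-liner and a with-block close the handle, unlike a bare open():

```python
import json
from pathlib import Path

meta_data_path = "meta_demo.json"  # hypothetical file for this demo
Path(meta_data_path).write_text(json.dumps({"datasets": ["coco"]}))

# Leaky form: json.loads(open(meta_data_path).read()) never closes the handle.
# Path.read_text() opens, reads, and closes in one call:
ds_collections = json.loads(Path(meta_data_path).read_text())

# Equivalent context-manager form, if pathlib is not preferred:
with open(meta_data_path) as f:
    same = json.load(f)

assert ds_collections == same == {"datasets": ["coco"]}
Path(meta_data_path).unlink()  # clean up the demo file
```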

Comment on lines +97 to +98
eos_token_id = get_eos_token(model_path)
eos_token_str = tokenizer.convert_ids_to_tokens(eos_token_id)
Contributor


Claude: Nit: Mid-file import. Move from xtuner.v1.utils.rl_test_utils import get_eos_token to the top-level imports block for consistency.

prompt_repeat_k=prompt_repeat_k,
global_batch_size=global_batch_size,
sample_params=training_sample_params,
# max_concurrent=64, # optional, will be determined automatically if not set
Contributor


Claude: Nit: Chinese comment (不需要修改, "no modification needed"). Use English for consistency across the codebase.

Comment on lines +55 to +56
expert_parallel_size=rollout_ep_size,
gpu_memory_utilization=0.8,
Contributor


Claude: Nit: Extraneous spaces around = in keyword arguments (PEP 8):

Suggested change
expert_parallel_size=rollout_ep_size,
gpu_memory_utilization=0.8,
context_length=max_response_length + max_prompt_length,
enable_return_routed_experts=True,

@claude
Contributor

claude bot commented Mar 30, 2026

Claude: ## Summary

This PR adds a Qwen3.5-VL 35B GRPO mixed-data RL config, a new "qwen3.5-vl-rl" chat template (with <think> tag for chain-of-thought), configurable template_name in Qwen3VLTokenizeFnConfig, and a bug fix in replay_buffer.py.

Issues

Critical

  • qwen3_vl_tokenize_fn.py:908 — Default template_name is "qwen3_vl" (underscore) but CHAT_TEMPLATE_MAP uses "qwen3-vl" (hyphen). This will cause a KeyError at runtime for anyone using the default.

Nit

  • Config file: unclosed file handle (open() without context manager), mid-file import, Chinese comment, PEP 8 spacing.

Bug Fix Review

The replay_buffer.py fix is correct — multimodal_train_info must be assigned back to data regardless of whether pixel_values is present, otherwise mixed text+image data loses position IDs.
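A sketch of the control-flow bug described above; the field names (pixel_values, multimodal_train_info, position IDs) follow the review text, but the function shape is illustrative, not the actual replay_buffer.py code:

```python
# Hypothetical helper illustrating the fix: the assignment back to `data`
# must happen unconditionally, not only when pixel_values is present.
def attach_train_info(data: dict, multimodal_train_info: dict) -> dict:
    if "pixel_values" in data:
        # Image samples carry their pixel tensors into the train info.
        multimodal_train_info["pixel_values"] = data["pixel_values"]
    # Bug: this line used to sit inside the `if` above, so pure-text
    # samples in a mixed batch silently lost their position IDs.
    data["multimodal_train_info"] = multimodal_train_info
    return data

# A text-only sample in a mixed text+image batch keeps its position IDs:
text_sample = attach_train_info({"input_ids": [1, 2]}, {"position_ids": [0, 1]})
assert text_sample["multimodal_train_info"]["position_ids"] == [0, 1]
assert "pixel_values" not in text_sample["multimodal_train_info"]
```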

Verdict

REQUEST_CHANGES — the template_name default value mismatch is a runtime bug that needs fixing.

