
add qwen35 rl config and fix mix bug #1640

Open
hhaAndroid wants to merge 3 commits into InternLM:main from hhaAndroid:add_qwen35_rl_config

Conversation

@hhaAndroid
Collaborator

No description provided.

@hhaAndroid
Collaborator Author

@claude review

# it's helpful to add labels to the images and videos for better reference.
add_vision_id: bool = True

template_name: str = "qwen3_vl"
Contributor


Claude: Critical: Default value mismatch — the default here is "qwen3_vl" (underscore) but the CHAT_TEMPLATE_MAP keys use hyphens: "qwen3-vl" and "qwen3.5-vl-rl". This will cause a KeyError at runtime for anyone using the default value.

Suggested change
template_name: str = "qwen3_vl"
template_name: str = "qwen3-vl"
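A minimal sketch of why the underscore default fails, assuming CHAT_TEMPLATE_MAP is a plain dict keyed by the hyphenated names quoted in this review; the map values here are placeholders, not the real templates:

```python
# Hypothetical stand-in for the real CHAT_TEMPLATE_MAP; only the keys
# ("qwen3-vl", "qwen3.5-vl-rl") come from the review above.
CHAT_TEMPLATE_MAP = {
    "qwen3-vl": "<template for qwen3-vl>",
    "qwen3.5-vl-rl": "<template for qwen3.5-vl-rl>",
}

def resolve_template(template_name: str) -> str:
    # A plain dict lookup: the old underscore default "qwen3_vl" is not a
    # key, so it raises KeyError before any tokenization happens.
    return CHAT_TEMPLATE_MAP[template_name]

try:
    resolve_template("qwen3_vl")  # old default: KeyError at runtime
    raised = False
except KeyError:
    raised = True
assert raised
assert resolve_template("qwen3-vl") == "<template for qwen3-vl>"
```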



tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

Contributor


Claude: Nit: open() without a context manager — the file handle is never closed. Consider:

Suggested change
ds_collections = json.loads(Path(meta_data_path).read_text())

(Per CLAUDE.md, prefer pathlib.Path over os.path for filesystem operations.)
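As a self-contained illustration (the file name below is invented for the demo), both the Path.read_text() one-liner and a with-block close the handle, unlike a bare open():

```python
import json
from pathlib import Path

meta_data_path = "meta_demo.json"  # hypothetical file for this demo
Path(meta_data_path).write_text(json.dumps({"datasets": ["coco"]}))

# Leaky form: json.loads(open(meta_data_path).read()) never closes the handle.
# Path.read_text() opens, reads, and closes in one call:
ds_collections = json.loads(Path(meta_data_path).read_text())

# Equivalent context-manager form, if pathlib is not preferred:
with open(meta_data_path) as f:
    same = json.load(f)

assert ds_collections == same == {"datasets": ["coco"]}
Path(meta_data_path).unlink()  # clean up the demo file
```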

Comment on lines +97 to +98
eos_token_id = get_eos_token(model_path)
eos_token_str = tokenizer.convert_ids_to_tokens(eos_token_id)
Contributor


Claude: Nit: Mid-file import. Move from xtuner.v1.utils.rl_test_utils import get_eos_token to the top-level imports block for consistency.

prompt_repeat_k=prompt_repeat_k,
global_batch_size=global_batch_size,
sample_params=training_sample_params,
# max_concurrent=64, # optional, will be determined automatically if not set
Contributor


Claude: Nit: Chinese comment (不需要修改, "no modification needed"). Use English for consistency across the codebase.

Comment on lines +55 to +56
expert_parallel_size=rollout_ep_size,
gpu_memory_utilization=0.8,
Contributor


Claude: Nit: Extraneous spaces around = in keyword arguments (PEP 8):

Suggested change
expert_parallel_size=rollout_ep_size,
gpu_memory_utilization=0.8,
context_length=max_response_length + max_prompt_length,
enable_return_routed_experts=True,

@claude
Contributor

claude bot commented Mar 30, 2026

Claude: ## Summary

This PR adds a Qwen3.5-VL 35B GRPO mixed-data RL config, a new "qwen3.5-vl-rl" chat template (with <think> tag for chain-of-thought), configurable template_name in Qwen3VLTokenizeFnConfig, and a bug fix in replay_buffer.py.

Issues

Critical

  • qwen3_vl_tokenize_fn.py:908 — Default template_name is "qwen3_vl" (underscore) but CHAT_TEMPLATE_MAP uses "qwen3-vl" (hyphen). This will cause a KeyError at runtime for anyone using the default.

Nit

  • Config file: unclosed file handle (open() without context manager), mid-file import, Chinese comment, PEP 8 spacing.

Bug Fix Review

The replay_buffer.py fix is correct — multimodal_train_info must be assigned back to data regardless of whether pixel_values is present, otherwise mixed text+image data loses position IDs.
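A sketch of the control-flow bug described above; the field names (pixel_values, multimodal_train_info, position IDs) follow the review text, but the function shape is illustrative, not the actual replay_buffer.py code:

```python
# Hypothetical helper illustrating the fix: the assignment back to `data`
# must happen unconditionally, not only when pixel_values is present.
def attach_train_info(data: dict, multimodal_train_info: dict) -> dict:
    if "pixel_values" in data:
        # Image samples carry their pixel tensors into the train info.
        multimodal_train_info["pixel_values"] = data["pixel_values"]
    # Bug: this line used to sit inside the `if` above, so pure-text
    # samples in a mixed batch silently lost their position IDs.
    data["multimodal_train_info"] = multimodal_train_info
    return data

# A text-only sample in a mixed text+image batch keeps its position IDs:
text_sample = attach_train_info({"input_ids": [1, 2]}, {"position_ids": [0, 1]})
assert text_sample["multimodal_train_info"]["position_ids"] == [0, 1]
assert "pixel_values" not in text_sample["multimodal_train_info"]
```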

Verdict

REQUEST_CHANGES — the template_name default value mismatch is a runtime bug that needs fixing.

