Add HuggingFace tp_plan support for AutoTP #7901
delock wants to merge 24 commits into deepspeedai:master
Conversation
Hi @inkcherry @tohtana @PKUWZP @tjruwase, this PR provides HuggingFace tp_plan support for AutoTP. Hoping to see your comments, thanks!
Force-pushed from 9f24ace to 8870c98
This PR adds support for HuggingFace's native tensor parallel plan (tp_plan)
to DeepSpeed's AutoTP feature, enabling automatic tensor parallelism configuration
without manual specification.
Key changes:
- Add tp_plan_converter.py: Convert HF tp_plan format to DeepSpeed TPLayerSpec
- Extend tensor_parallel/config.py: Add resolve_tp_config() and _get_hf_tp_plan()
- Support priority: custom config > HF tp_plan > DeepSpeed preset
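The priority order above can be sketched as a simple resolver. This is an illustrative sketch, not DeepSpeed's actual API; the function and argument names are hypothetical:

```python
def resolve_tp_plan(custom_config, hf_tp_plan, preset_plan):
    """Pick the first available plan: custom config > HF tp_plan > preset.

    Each argument is a plan dict (or None/empty); truthiness decides
    availability, so an empty dict falls through to the next source.
    """
    if custom_config:
        return custom_config
    if hf_tp_plan:
        return hf_tp_plan
    return preset_plan
```

With this ordering, a user-supplied partition config always wins, and the HF tp_plan is only consulted when no custom config is given.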
Test results:
- 28/28 unit tests passed (no GPU required)
- Covers format conversion, extraction, priority, and integration
- E2E tests require multi-GPU environment
Example usage:
ds_config = {'tensor_parallel': {'autotp_size': 4}}
# Auto-detects and uses model's tp_plan from HuggingFace config
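A slightly fuller sketch of the intended flow, assuming a multi-GPU DeepSpeed environment; the `train_batch_size` value and the commented `deepspeed.initialize` call are illustrative, only the `tensor_parallel` key is specific to this PR:

```python
# Sketch of enabling AutoTP via the model's HF tp_plan. No preset or
# partition config is needed; the plan is auto-detected from the model.
ds_config = {
    "train_batch_size": 8,                  # illustrative value
    "tensor_parallel": {"autotp_size": 4},  # tensor-parallel degree
}
# model_engine, optimizer, _, _ = deepspeed.initialize(model=hf_model,
#                                                      config=ds_config)
```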
Signed-off-by: Guokai Ma <guokai.ma@intel.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
- Delete resolve_tp_config from config.py (dead code, never used at runtime)
- Delete test_tp_plan_priority.py and test_tp_plan_integration.py (tested dead function)
- Move test_alternate_prefixes and test_alternate_projection_names to converter tests
- Replace duplicated _get_hf_tp_plan in extraction tests with proper import
- Remove resolve_tp_config usage from test_tp_plan_real_models.py

Reduces from 6 test files / 34 tests to 4 files / 23 tests with no loss of runtime coverage.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Apply yapf formatting to new files and fix flake8 F401 (unused pytest import). Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Rename parameter model_or_config to model and remove the third fallback that checked base_model_tp_plan directly on the input. This path was unreachable since engine.py always passes a model object. Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Document that colwise_rep, local_colwise, local_rowwise, local_packed_rowwise, gather, and sequence_parallel are not yet handled. Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
… and API docs Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f24acece6
Address review feedback: hasattr check returned None/_tp_plan={} without
falling back to model.config.base_model_tp_plan. Use getattr with
truthiness check so that falsy _tp_plan values correctly fall through
to the config-based plan.
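A minimal sketch of the behavior this commit describes, using mock model objects; the helper name mirrors `_get_hf_tp_plan` but this is not DeepSpeed's exact code (a later commit in this PR flips the preference order):

```python
from types import SimpleNamespace

def get_hf_tp_plan(model):
    """Extract an HF tensor-parallel plan from a model.

    getattr plus a truthiness check: a falsy _tp_plan ({} or None) falls
    through to config.base_model_tp_plan instead of being returned as-is,
    which is the fix described above for the hasattr-based version.
    """
    plan = getattr(model, "_tp_plan", None)
    if plan:
        return plan
    config = getattr(model, "config", None)
    return getattr(config, "base_model_tp_plan", None)
```

For example, a model whose `_tp_plan` is an empty dict still yields the config-based plan:

```python
model = SimpleNamespace(
    _tp_plan={},
    config=SimpleNamespace(base_model_tp_plan={"layers.*.q_proj": "colwise"}),
)
get_hf_tp_plan(model)  # falls through to the config plan
```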
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
…e_model_tp_plan

Fix two bugs in the HuggingFace tp_plan AutoTP path:

1. _replace_module() passed only the immediate child name to recursive calls instead of the accumulated full_name. This meant pattern matching in _replace_with_config() never matched patterns like 'layers.*.self_attn.q_proj' because the name was only 2 levels deep (e.g. 'self_attn.q_proj'). Zero modules were being replaced, causing a 32% performance regression vs the master AutoTP path.

2. _get_hf_tp_plan() now prefers config.base_model_tp_plan over model._tp_plan because HuggingFace's _tp_plan contains duplicate entries (both 'layers.*' and 'model.layers.*' prefixed versions), causing spurious duplicate-match warnings during conversion.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
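The path-accumulation fix in (1) can be illustrated with a toy module walk; the function and the dict-based module tree are illustrative stand-ins, not DeepSpeed's `_replace_module`:

```python
from fnmatch import fnmatch

def walk_modules(module_tree, patterns, prefix=""):
    """Yield full dotted names of submodules matching any glob pattern.

    The key point is that recursion receives the accumulated dotted path
    (full_name), not just the immediate child name. Passing only the child
    name makes patterns like 'layers.*.self_attn.q_proj' silently match
    nothing on deep models, which was the bug described above.
    """
    for name, child in module_tree.items():
        full_name = f"{prefix}.{name}" if prefix else name  # accumulate path
        if any(fnmatch(full_name, p) for p in patterns):
            yield full_name
        if isinstance(child, dict):
            yield from walk_modules(child, patterns, full_name)
```

On a 4-level-deep tree like `{"layers": {"0": {"self_attn": {"q_proj": None}}}}`, the pattern `layers.*.self_attn.q_proj` only matches because the full path is threaded through every recursive call.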
Add test_deep_model_full_path_propagation that uses a 4-level-deep model
hierarchy (layers.N.self_attn.{q,o}_proj) with patterns requiring
intermediate path components. This catches regressions where _replace_module
passes immediate child names instead of accumulated full paths to recursive
calls, which causes pattern matching to silently fail on deep models.
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Hi @delock, one gap to handle is the mismatch between HuggingFace TP-plan styles and what AutoTP can consume directly. The new converter currently supports only `colwise`/`rowwise`. AutoTP does not handle the other HF styles directly, but some models are already supported through existing AutoTP specs/presets. Phi-3 is a good example: its HF TP plan uses styles the converter cannot map. So I think the safer behavior would be: when a plan contains unsupported styles, fall back to the existing AutoTP preset path instead of raising.
This would avoid regressing models like Phi-3 while still enabling the new HF-plan path for models whose TP plans map cleanly to AutoTP.
@tohtana Thanks for the comments, this is a good suggestion. Let me add this bypass in the PR and test Phi-3 accordingly. In the long run, I think it would help if AutoTP could consume these additional styles, which can be done in a separate PR.
HF tp_plan may use partition styles beyond colwise/rowwise (e.g. colwise_rep, rowwise_rep). Instead of raising ValueError, detect unsupported styles and fall back to the existing AutoTP preset path. Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
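A hypothetical gate mirroring the fallback this commit describes; `hf_plan_is_convertible` is an illustrative name, not the actual function in the PR:

```python
# Only colwise and rowwise map cleanly to AutoTP today. Any other style
# (e.g. colwise_rep, rowwise_rep) means the HF plan is skipped and the
# existing AutoTP preset path is used, instead of raising ValueError.
SUPPORTED_STYLES = {"colwise", "rowwise"}

def hf_plan_is_convertible(tp_plan):
    """Return True only if every partition style maps cleanly to AutoTP."""
    return all(style in SUPPORTED_STYLES for style in tp_plan.values())
```

The caller checks this before conversion; a single unsupported entry disables the HF-plan path for the whole model rather than partially converting it.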
…/autotp_improvement
Hi @tohtana, a fallback path for unsupported tp_plan layer specs has been added. Before the change, Phi-3 reported unsupported layer specs; after the change, it no longer does. Note my test on Phi3-mini raised a separate failure with a shape mismatch, which is also raised on master. I haven't looked into it yet; it can be addressed in a separate investigation and probably a fix.
Summary
Adds automatic detection and use of HuggingFace's built-in `base_model_tp_plan` for AutoTP, addressing the HuggingFace tp_plan support item from #7861. Models that ship with a `tp_plan` (e.g. Llama, Qwen, Gemma2) now work with AutoTP out of the box: no `preset_model` or `partition_config` needed, just set `autotp_size`.

Changes

Runtime
- `engine.py`: Added tp_plan fallback in `_apply_autotp_partitioning`. Priority order: `partition_config` > HF `tp_plan` > AutoTP heuristics.
- `config.py`: Added `_get_hf_tp_plan(model)` to extract the tp_plan from `model._tp_plan` or `model.config.base_model_tp_plan`.
- `tp_plan_converter.py`: New file. `TPPlanConverter` converts HF tp_plan entries (colwise/rowwise) to DeepSpeed `TPLayerSpec`. Other HF partition types (`colwise_rep`, `local_colwise`, etc.) are not yet supported (documented with TODO).

Tests (11 files, 17 CPU + 5 GPU tests)
- `test_tp_plan_converter.py`: Unit tests for the converter (alternate prefixes, projection names, unsupported types, etc.)
- `test_tp_plan_extraction.py`: Unit tests for `_get_hf_tp_plan` with mock models.
- `test_tp_plan_e2e.py`: GPU e2e tests with ZeRO 0/1/2 (requires 2 GPUs).
- `test_tp_plan_real_models.py`: GPU tests with Qwen2 and custom models (requires 2 GPUs).

Documentation
- `autotp-training.md`
- `config-json.md`
- `training.rst`
- `blogs/huggingface-tp/README.md`

Limitations

Only `colwise` and `rowwise` partition types are supported. Extended types (`colwise_rep`, `local_colwise`, `local_rowwise`, `local_packed_rowwise`, `gather`, `sequence_parallel`) are deferred.
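The conversion step can be sketched as follows; `LayerSpec` is a minimal hypothetical stand-in, since `TPLayerSpec`'s real fields live in DeepSpeed's tensor_parallel config and may differ:

```python
from dataclasses import dataclass

@dataclass
class LayerSpec:          # minimal stand-in for DeepSpeed's TPLayerSpec
    name_pattern: str     # module-path glob, e.g. "layers.*.self_attn.q_proj"
    partition: str        # "colwise" or "rowwise"

SUPPORTED = {"colwise", "rowwise"}

def convert_tp_plan(tp_plan):
    """Map HF tp_plan entries to layer specs, rejecting deferred styles."""
    specs = []
    for pattern, style in tp_plan.items():
        if style not in SUPPORTED:
            # In the PR, unsupported styles trigger a fallback to the
            # AutoTP preset path rather than failing the whole run.
            raise ValueError(f"unsupported partition style: {style}")
        specs.append(LayerSpec(pattern, style))
    return specs
```

A plan like `{"layers.*.self_attn.q_proj": "colwise", "layers.*.mlp.down_proj": "rowwise"}` thus becomes two specs, one per matched module pattern.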