
Add HuggingFace tp_plan support for AutoTP#7901

Open
delock wants to merge 24 commits into deepspeedai:master from delock:gma/autotp_improvement

Conversation

@delock
Collaborator

@delock delock commented Mar 13, 2026

Summary

Adds automatic detection and use of HuggingFace's built-in base_model_tp_plan for AutoTP, addressing the HuggingFace tp_plan support item from #7861.

Models that ship with a tp_plan (e.g. Llama, Qwen, Gemma2) now work with AutoTP out of the box — no preset_model or partition_config needed, just set autotp_size.

Changes

Runtime

  • engine.py: Added tp_plan fallback in _apply_autotp_partitioning. Priority order: partition_config > HF tp_plan > AutoTP heuristics.
  • config.py: Added _get_hf_tp_plan(model) to extract tp_plan from model._tp_plan or model.config.base_model_tp_plan.
  • tp_plan_converter.py: New file. TPPlanConverter converts HF tp_plan entries (colwise/rowwise) to DeepSpeed TPLayerSpec.
    Other HF partition types (colwise_rep, local_colwise, etc.) are not yet supported (documented with TODO).
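
A rough sketch of the colwise/rowwise mapping described above (the function and field names here are illustrative, not the actual `TPPlanConverter`/`TPLayerSpec` API):

```python
# Hypothetical sketch of the colwise/rowwise conversion; the real
# TPPlanConverter in tp_plan_converter.py may differ in shape and naming.

SUPPORTED_STYLES = {"colwise", "rowwise"}

def convert_tp_plan(tp_plan):
    """Map HF tp_plan entries to (pattern, partition_dim) pairs.

    colwise -> shard the output dimension (dim 0 of an nn.Linear weight)
    rowwise -> shard the input dimension (dim 1 of an nn.Linear weight)
    """
    specs = []
    for pattern, style in tp_plan.items():
        if style not in SUPPORTED_STYLES:
            # Extended styles (colwise_rep, local_colwise, ...) unsupported
            raise ValueError(f"Unsupported partition style: {style}")
        dim = 0 if style == "colwise" else 1
        specs.append((pattern, dim))
    return specs

plan = {"layers.*.self_attn.q_proj": "colwise",
        "layers.*.self_attn.o_proj": "rowwise"}
print(convert_tp_plan(plan))
# [('layers.*.self_attn.q_proj', 0), ('layers.*.self_attn.o_proj', 1)]
```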

Tests (11 files, 17 CPU + 5 GPU tests)

  • test_tp_plan_converter.py: Unit tests for the converter (alternate prefixes, projection names, unsupported types, etc.)
  • test_tp_plan_extraction.py: Unit tests for _get_hf_tp_plan with mock models.
  • test_tp_plan_e2e.py: GPU e2e tests with ZeRO 0/1/2 (requires 2 GPUs).
  • test_tp_plan_real_models.py: GPU tests with Qwen2 and custom models (requires 2 GPUs).

Documentation

  • Tutorial: New "HuggingFace tp_plan Support" section in autotp-training.md.
  • Config reference: Added tp_plan paragraph in config-json.md.
  • API docs: Added tp_plan subsection in training.rst.
  • Blog: Updated ongoing work in blogs/huggingface-tp/README.md.

Limitations

  • Only colwise and rowwise partition types are supported. Extended types (colwise_rep, local_colwise, local_rowwise,
    local_packed_rowwise, gather, sequence_parallel) are deferred.

@delock
Collaborator Author

delock commented Mar 13, 2026

Hi @inkcherry @tohtana @PKUWZP @tjruwase, this is the PR providing HuggingFace tp_plan support for AutoTP. Hoping to see your comments, thanks!

@delock delock force-pushed the gma/autotp_improvement branch from 9f24ace to 8870c98 on March 13, 2026 08:05
delock and others added 15 commits March 13, 2026 01:05
This PR adds support for HuggingFace's native tensor parallel plan (tp_plan)
to DeepSpeed's AutoTP feature, enabling automatic tensor parallelism configuration
without manual specification.

Key changes:
- Add tp_plan_converter.py: Convert HF tp_plan format to DeepSpeed TPLayerSpec
- Extend tensor_parallel/config.py: Add resolve_tp_config() and _get_hf_tp_plan()
- Support priority: custom config > HF tp_plan > DeepSpeed preset

Test results:
- 28/28 unit tests passed (no GPU required)
- Covers format conversion, extraction, priority, and integration
- E2E tests require multi-GPU environment

Example usage:
  ds_config = {'tensor_parallel': {'autotp_size': 4}}
  # Auto-detects and uses model's tp_plan from HuggingFace config

Signed-off-by: Guokai Ma <guokai.ma@intel.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
- Delete resolve_tp_config from config.py (dead code, never used at runtime)
- Delete test_tp_plan_priority.py and test_tp_plan_integration.py (tested dead function)
- Move test_alternate_prefixes and test_alternate_projection_names to converter tests
- Replace duplicated _get_hf_tp_plan in extraction tests with proper import
- Remove resolve_tp_config usage from test_tp_plan_real_models.py

Reduces from 6 test files / 34 tests to 4 files / 23 tests with no loss of
runtime coverage.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Apply yapf formatting to new files and fix flake8 F401 (unused pytest import).

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Rename parameter model_or_config to model and remove the third fallback
that checked base_model_tp_plan directly on the input. This path was
unreachable since engine.py always passes a model object.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Document that colwise_rep, local_colwise, local_rowwise,
local_packed_rowwise, gather, and sequence_parallel are not yet handled.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
… and API docs

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f24acece6


delock added 4 commits March 13, 2026 01:13
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Address review feedback: the previous hasattr check returned None (or _tp_plan={}) without falling back to model.config.base_model_tp_plan. Use getattr with a truthiness check so that falsy _tp_plan values correctly fall through to the config-based plan.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
@PKUWZP PKUWZP self-requested a review March 13, 2026 13:58
delock added 2 commits March 15, 2026 19:32
…e_model_tp_plan

Fix two bugs in the HuggingFace tp_plan AutoTP path:

1. _replace_module() passed only the immediate child name to recursive
   calls instead of the accumulated full_name. This meant pattern
   matching in _replace_with_config() never matched patterns like
   'layers.*.self_attn.q_proj' because the name was only 2 levels deep
   (e.g. 'self_attn.q_proj'). Zero modules were being replaced, causing
   a 32% performance regression vs the master AutoTP path.

2. _get_hf_tp_plan() now prefers config.base_model_tp_plan over
   model._tp_plan because HuggingFace's _tp_plan contains duplicate
   entries (both 'layers.*' and 'model.layers.*' prefixed versions),
   causing spurious duplicate-match warnings during conversion.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
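
The full-path accumulation fix in bug 1 can be sketched with a plain nested mapping standing in for the module tree (`fnmatch` stands in for whatever pattern matching the real _replace_with_config uses):

```python
import fnmatch

def match_modules(tree, patterns, prefix=""):
    """Walk a nested name->child mapping, accumulating the full dotted
    path so patterns like 'layers.*.self_attn.q_proj' can match."""
    matched = []
    for name, child in tree.items():
        full_name = f"{prefix}.{name}" if prefix else name
        if any(fnmatch.fnmatch(full_name, p) for p in patterns):
            matched.append(full_name)
        if isinstance(child, dict):
            # The fix: recurse with full_name, not just the child name.
            matched += match_modules(child, patterns, full_name)
    return matched

model = {"layers": {"0": {"self_attn": {"q_proj": None, "o_proj": None}}}}
print(match_modules(model, ["layers.*.self_attn.q_proj"]))
# ['layers.0.self_attn.q_proj'] -- passing only the immediate child
# name (e.g. 'self_attn.q_proj') would match nothing.
```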
Add test_deep_model_full_path_propagation that uses a 4-level-deep model
hierarchy (layers.N.self_attn.{q,o}_proj) with patterns requiring
intermediate path components. This catches regressions where _replace_module
passes immediate child names instead of accumulated full paths to recursive
calls, which causes pattern matching to silently fail on deep models.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
@tohtana
Collaborator

tohtana commented Mar 16, 2026

Hi @delock,
This is an amazing enhancement!

One gap to handle is the mismatch between HuggingFace TP-plan styles and what AutoTP can consume directly. The new converter currently supports only colwise and rowwise, while HuggingFace TP plans use additional styles such as colwise_gather_output and rowwise_split_input.

AutoTP does not handle those HF styles directly, but some models are already supported through existing AutoTP specs/presets. Phi-3 is a good example: its HF TP plan uses colwise_gather_output / rowwise_split_input, but DeepSpeed already supports Phi-3 through the existing AutoTP configuration with sub-parameter partitioning.

So I think the safer behavior would be:

  1. Inspect the HF TP plan styles.
  2. If all styles are supported by the converter, use the HF TP plan.
  3. Otherwise, skip the HF TP plan path and fall back to the existing AutoTP path.

This would avoid regressing models like Phi-3 while still enabling the new HF-plan path for models whose TP plans map cleanly to AutoTP.
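
The three-step behavior suggested above could be sketched as follows (the style set and function name are illustrative, not the actual implementation):

```python
# Sketch of the "all styles supported, else fall back" gate.

CONVERTER_SUPPORTED = {"colwise", "rowwise"}

def choose_tp_path(tp_plan):
    """Use the HF tp_plan only when every style is convertible;
    otherwise fall back to the existing AutoTP path."""
    if tp_plan and all(s in CONVERTER_SUPPORTED for s in tp_plan.values()):
        return "hf_tp_plan"
    return "autotp_fallback"

# A Phi-3-style plan with extended styles falls back:
print(choose_tp_path({"layers.*.mlp.gate_up_proj": "colwise_gather_output"}))
# autotp_fallback
```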

@sfc-gh-truwase sfc-gh-truwase enabled auto-merge (squash) March 16, 2026 13:13
@sfc-gh-truwase sfc-gh-truwase disabled auto-merge March 16, 2026 14:06
@delock
Collaborator Author

delock commented Mar 17, 2026

@tohtana Thanks for the comments, this is a good suggestion. Let me add this bypass in the PR and test Phi-3 accordingly.

In the long run, I think it would help if AutoTP can consume these additional styles, which can be done in a separate PR.


delock added 2 commits March 17, 2026 19:28
HF tp_plan may use partition styles beyond colwise/rowwise (e.g.
colwise_rep, rowwise_rep). Instead of raising ValueError, detect
unsupported styles and fall back to the existing AutoTP preset path.

Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
@delock
Collaborator Author

delock commented Mar 18, 2026

Hi @tohtana, the fallback path for unsupported tp_plan layer specs has been added. Before the change, Phi-3 reported unsupported layer specs; after the change it no longer does.

Note that my test on Phi3-mini raises a separate failure with a shape mismatch, which also occurs on master. I haven't looked into it yet; it can be addressed in a separate investigation and probably a fix.


