fix: TP config for nemotron-flash-1b + super-49B vllm_deploy cascade #2593
Draft
adil-a wants to merge 1 commit into
Both *_vllm_deploy tests (jobs 337980668 nemotron-flash-1b PEFT, 337980592 llama-3.3-nemotron-super-49B SFT) cascade from "No checkpoint found": the upstream finetune/robustness job dies because a custom-code (trust_remote_code) architecture has no registered TP plan and now hard-errors at tp_size>1 (torch DTensor shard_order assert; #2244 fail-fast in parallelizer.py). These used to ride AutoModel's default base plan on older torch.

- nemotron-flash-1b: NemotronFlash (hybrid mamba2/deltanet) has no TP plan in any transformers version (5.5.0/5.12.1), in the model's Hub code, or in AutoModel; its hybrid layers aren't expressible with the standard TP styles. The robustness cross-TP phase ran at tp_size=2 and aborted before the checkpoint was saved. It's a 1B model that doesn't need TP -> run the robustness reload at tp_size=1.
- super-49B (DeciLM/nemotron-nas): AutoModel already ships a TP plan (get_decilm_nemotron_tp_plan, named "llama_nemotron_super_tp_plan", since #1487) but the recipe never selected it, so the finetune fell through to the broken default plan at tp_size=4. Wire distributed.tp_plan: llama_nemotron_super_tp_plan. All 49 real-attention blocks have 8 KV heads, divisible by tp 4 (finetune) and 8 (robustness).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Adil Asif <adasif@nvidia.com>
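The failure mode described in the commit message can be illustrated with a minimal sketch. This is not the code in parallelizer.py: `resolve_tp_plan`, `MissingTPPlanError`, the registry layout, the model-type keys, and the plan contents are all hypothetical names chosen for illustration. It only shows the general idea that a run at tp_size=1 needs no plan, a run at tp_size>1 with an explicitly selected plan succeeds, and a run at tp_size>1 with no resolvable plan fails fast.

```python
from typing import Dict, Optional


class MissingTPPlanError(RuntimeError):
    """Raised when tensor parallelism is requested but no plan can be resolved."""


def resolve_tp_plan(
    model_type: str,
    tp_size: int,
    registered_plans: Dict[str, Dict[str, str]],
    requested_plan: Optional[str] = None,
) -> Optional[Dict[str, str]]:
    """Hypothetical resolver: return a TP plan, None for tp_size<=1, or raise."""
    # Without tensor parallelism there is nothing to shard, so no plan is needed.
    if tp_size <= 1:
        return None
    # An explicitly requested plan name takes priority over the model-type lookup.
    key = requested_plan if requested_plan is not None else model_type
    plan = registered_plans.get(key)
    if plan is None:
        # Fail fast instead of silently falling back to a default plan.
        raise MissingTPPlanError(
            f"no TP plan registered for {key!r} at tp_size={tp_size}"
        )
    return plan


# Toy registry. The plan name comes from the PR; the entries inside it are
# placeholders, not the real sharding rules.
PLANS = {
    "llama_nemotron_super_tp_plan": {
        "attention_in": "colwise",
        "attention_out": "rowwise",
    },
}

# Flash-style fix: drop to tp_size=1, so no plan is required.
flash_plan = resolve_tp_plan("nemotron_flash", 1, PLANS)

# Super-49B-style fix: select the existing plan by name at tp_size=4.
super_plan = resolve_tp_plan(
    "nemotron-nas", 4, PLANS, requested_plan="llama_nemotron_super_tp_plan"
)
```

In this sketch, calling `resolve_tp_plan("nemotron_flash", 2, PLANS)` raises `MissingTPPlanError`, which corresponds to the pre-fix behavior where the producer job aborts before a checkpoint is written.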
What
Fixes two *_vllm_deploy CI tests that cascade from a missing checkpoint.

Root cause
Both deploys error with "No checkpoint found under .../robustness_checkpoint" because the upstream finetune/robustness producer dies first: a custom-code (trust_remote_code) architecture with no registered TP plan now hard-errors at tp_size>1 (newer torch DTensor shard_order assert; the #2244 fail-fast in parallelizer.py). These previously rode AutoModel's default base plan on older torch.

Fixes
- nemotron_flash_1b_squad_peft.yaml: NemotronFlash (hybrid mamba2/deltanet) has no TP plan in any transformers version (verified 5.5.0 and latest 5.12.1), in the model's Hub code, or in AutoModel, and its hybrid SSM/conv layers aren't expressible with the standard colwise/rowwise TP styles. The robustness cross-TP phase ran at tp_size=2 and aborted at setup, before the checkpoint was saved. It's a 1B model that doesn't need TP → set the robustness reload to tp_size: 1. (Train→save→AutoModel-reload→HF-reload still validate; only cross-TP-at-2, which never had a real plan, is dropped.)
- llama3_3_nemotron_super_49B_squad.yaml: AutoModel already ships get_decilm_nemotron_tp_plan (named llama_nemotron_super_tp_plan, since #1487, "fix: tp plan for nemotron super"); the recipe never selected it, so the finetune fell through to the broken default plan at tp_size=4. Wire distributed.tp_plan: llama_nemotron_super_tp_plan. (All 49 real-attention blocks have 8 KV heads → divisible by tp=4 finetune and tp=8 robustness; the 31 no-op attention blocks stay replicated.)

Validation
_vllm_deploy, on this commit): https://gitlab-master.nvidia.com/dl/JoC/nemo-ci/-/pipelines/54962684 (running)

Pre-checks
🤖 Generated with Claude Code
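
The divisibility argument for the super-49B fix (8 KV heads at tp=4 and tp=8) can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the repository: `kv_heads_shardable` and `kv_heads_per_rank` are hypothetical helper names, and it assumes the usual constraint that KV heads are split evenly across tensor-parallel ranks.

```python
def kv_heads_shardable(num_kv_heads: int, tp_size: int) -> bool:
    """True if the KV heads split evenly across tp_size ranks."""
    return tp_size >= 1 and num_kv_heads % tp_size == 0


def kv_heads_per_rank(num_kv_heads: int, tp_size: int) -> int:
    """Number of KV heads each rank holds; raises if the split is uneven."""
    if not kv_heads_shardable(num_kv_heads, tp_size):
        raise ValueError(
            f"{num_kv_heads} KV heads cannot be split evenly over tp_size={tp_size}"
        )
    return num_kv_heads // tp_size


NUM_KV_HEADS = 8  # per real-attention block, as stated in the PR description
PHASES = {"finetune": 4, "robustness": 8}  # tp sizes stated in the PR description

# Map each phase to the number of KV heads held by a single rank.
per_rank = {
    phase: kv_heads_per_rank(NUM_KV_HEADS, tp) for phase, tp in PHASES.items()
}
print(per_rank)
```

Under these assumptions each rank holds 2 KV heads during the finetune and 1 during the robustness phase; a tp size such as 3 would be rejected by the helper.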