Fix Nemotron-H PTQ failure on Transformers 5.x with --trust_remote_code (moe_latent_size AttributeError)#1839
Conversation
|
/claude review |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds ChangesRemote-code model init reconfiguration
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Small, well-scoped bug fix (+13/-2, 1 file) for Nemotron-H PTQ on Transformers 5.x with --trust_remote_code. The fix re-derives a built-in config (via AutoConfig.from_pretrained without trust_remote_code) when a remote-code config (module under transformers_modules) is paired with a built-in concrete model class, then uses that config for the empty-weights device-map build (torch_dtype + from_config).
Correctness looks sound:
- The branch only fires for concrete transformers classes (
auto_model_module not in [AutoModelForCausalLM, AutoModel]), which are reached only whenhasattr(transformers, architecture)is true — so the built-inconfig_classexists and the re-derive won't fail for an unregistered arch. - Non-remote configs and the
Auto*fallback paths are untouched; defaultconfig_for_init = hf_configpreserves prior behavior. - The real model load downstream re-resolves config via the built-in
config_class, so the change is contained to the device-map build.
No licensing concerns; no design-review trigger; no prompt-injection in the untrusted blocks.
Why nudge rather than approve:
- No test. The author states a regression test would require loading a remote-code checkpoint, but the decision logic here (module-prefix check + config re-derive, and which config is passed to
from_config) is unit-testable by mockingAutoConfig.from_pretrained/from_config—tests/examples/llm_ptq/test_example_utils.pyalready testsget_modelhelpers with pure-Python stubs. A focused unit test asserting the built-in config is used fortransformers_modulesconfigs (and that non-remote configs are untouched) would lock in the fix. - The fix was validated only end-to-end on GPU with a specific checkpoint (Transformers 5.7.0); no automated coverage guards against regressions.
There was a problem hiding this comment.
Claude review passed — no blocking issues found. LGTM
Summary
Findings — CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 1
Reviewed the single changed file (examples/llm_ptq/example_utils.py, +13/-2). This is a narrow, well-reasoned bug fix for the Nemotron-H moe_latent_size AttributeError on Transformers 5.x with --trust_remote_code.
Assessment
The fix is correct and appropriately scoped:
- Root cause is accurately diagnosed: a remote-loaded (old)
NemotronHConfigpaired with a built-in (new) model class during the empty-weights device-map build. The new code re-derives the config with the built-in class so required fields get their defaults. - The guard is precise: re-derivation only happens when (a) the model class is a built-in architecture class (not the
AutoModelForCausalLM/AutoModelremote fallback branch) and (b) the loaded config actually came fromtransformers_modules(remote code). Non-remote configs and the fallback path are untouched — matching the stated backward-compat claim. config_kwargsfiltering is sound:config_kwargsonly containstrust_remote_codewhen truthy, so stripping it yields a clean built-in config load.- Consistency restored: the device-map build now uses the same built-in config that the real
from_pretrainedload at the bottom of the branch resolves via the built-inconfig_class.
Most impactful note
One non-blocking SUGGESTION: the new second AutoConfig.from_pretrained call could raise on edge-case checkpoints (built-in architecture name + non-built-in model_type), turning a path that previously progressed into a hard crash. Wrapping it with a fallback to the existing hf_config would harden the branch. Not required for merge.
Risk level: low — single-file example-script fix, behavior change confined to a guarded branch, validated end-to-end per the PR description.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1839 +/- ##
==========================================
- Coverage 77.40% 77.40% -0.01%
==========================================
Files 515 515
Lines 57118 57118
==========================================
- Hits 44214 44213 -1
- Misses 12904 12905 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
…it tests - Extract the remote-code vs built-in config selection into _resolve_init_config() in example_utils.get_model, and harden it: wrap the built-in AutoConfig reload in try/except so an edge-case checkpoint (built-in architecture name but non-built-in model_type) falls back to hf_config instead of hard-crashing. - Add focused unit tests in tests/examples/llm_ptq/test_example_utils.py for the decision logic (re-derive for remote config, keep non-remote config, fall back when the reload raises), mocking AutoConfig.from_pretrained. Addresses reviewer feedback on missing tests and the unguarded reload. Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
meenchen
left a comment
There was a problem hiding this comment.
Approve to unblock, but this looks like a general issue outside of ModelOpt, and maybe we should update/fix the configs of the model
…it tests - Extract the remote-code vs built-in config selection into _resolve_init_config() in example_utils.get_model, and harden it: wrap the built-in AutoConfig reload in try/except so an edge-case checkpoint (built-in architecture name but non-built-in model_type) falls back to hf_config instead of hard-crashing. - Add focused unit tests in tests/examples/llm_ptq/test_example_utils.py for the decision logic (re-derive for remote config, keep non-remote config, fall back when the reload raises), mocking AutoConfig.from_pretrained. Addresses reviewer feedback on missing tests and the unguarded reload. Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
3900252 to
da22e66
Compare
…te_code Bug: Quantizing nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 via hf_ptq.py with --trust_remote_code fails on Transformers 5.x with "AttributeError: 'NemotronHConfig' object has no attribute 'moe_latent_size'" (works on 4.57.x). In example_utils.get_model, AutoConfig.from_pretrained(..., trust_remote_code=True) loads the checkpoint's bundled *remote* NemotronHConfig (authored for Transformers 4.55.4, no moe_latent_size). But since NemotronHForCausalLM is a *built-in* class in Transformers 5.x, the empty-weights device-map build uses the built-in model class, whose modeling code reads config.moe_latent_size -> the remote (old) config and the built-in (new) model are a mismatched pair. Transformers 4.57.x only worked because its built-in model never accessed that field. Fix: When instantiating the built-in model class, feed it a config from the same version as the model definition: if the loaded config came from remote code (class module under "transformers_modules"), re-derive it with the built-in class (AutoConfig without trust_remote_code) so required fields get their defaults. Non-remote configs are untouched. The real model load already resolves config via the built-in config_class, so only the device-map build needed aligning. Validated end-to-end: Nemotron-3-Nano-30B nvfp4 PTQ + export succeeds on Transformers 5.7.0; llm_ptq example unit tests pass. Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
…it tests - Extract the remote-code vs built-in config selection into _resolve_init_config() in example_utils.get_model, and harden it: wrap the built-in AutoConfig reload in try/except so an edge-case checkpoint (built-in architecture name but non-built-in model_type) falls back to hf_config instead of hard-crashing. - Add focused unit tests in tests/examples/llm_ptq/test_example_utils.py for the decision logic (re-derive for remote config, keep non-remote config, fall back when the reload raises), mocking AutoConfig.from_pretrained. Addresses reviewer feedback on missing tests and the unguarded reload. Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
da22e66 to
5525048
Compare
I agree, the cleanest long-term fix is the model publisher updating the checkpoint's bundled config. |
Branch protection rule check failed
|
Transplant the combined get_model fix from PRs #1839, #1857 and #1869 onto release/0.45.0's examples/llm_ptq/example_utils.py. These PRs could not be cherry-picked directly because the file was renamed llm_ptq -> hf_ptq (#1759) and surrounding get_model code diverged on main, but the actual fix targets the init_empty_weights / from_config block that already exists on the release branch: - _resolve_init_config: re-derive a built-in config for remote-code checkpoints so device-map inference matches the model definition's version (fixes Nemotron-H moe_latent_size AttributeError on transformers 5.x, #1839). - _get_config_dtype / _apply_dtype_to_config: derive dtype from the resolved config and forward the DeciLM-supported dtype kwarg, dropping unsupported dtype forwarding on the real from_pretrained load (#1857, #1869). Ports the accompanying unit tests (path-adjusted to llm_ptq). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
#1858 #1839 #1857 #1869 (#1880) ## Cherry-picked PRs - #1801 - #1808 - #1629 - #1627 - #1824 - #1826 - #1830 - #1760 - #1831 - #1858 - #1839 - #1857 - #1869 #1839, #1857 and #1869 were back-ported (not a clean cherry-pick): the file was renamed `llm_ptq` -> `hf_ptq` (#1759) and surrounding `get_model` code diverged on `main`, but the actual fix targets the `init_empty_weights` / `from_config` block that already exists on the release branch. Accompanying unit tests were ported (15 passed). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added a new PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache calibration. * **Bug Fixes** * Improved ONNX mixed-precision/FP16 conversion reliability with stricter type handling and better stale output-shape reconciliation. * Fixed quantization/export edge cases: MoE router/gate handling, FP8 calibration/reduction failures, and additional FP8/INT8 robustness during export. * Standardized Puzzletron validation split naming to `validation`. * **Documentation** * Refreshed LM-Eval and TensorRT-Edge-LLM CLI instructions, including updated command names and examples. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Meng Xin <mxin@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com> Signed-off-by: dimapihtar <dpykhtar@nvidia.com> Signed-off-by: Chenjie Luo <chenjiel@nvidia.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com> Signed-off-by: Grzegorz Karch <gkarch@nvidia.com> Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Co-authored-by: mxinO <164952785+mxinO@users.noreply.github.com> Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com> Co-authored-by: Zhiyu <zhiyuc@nvidia.com> Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com> Co-authored-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
What does this PR do?
Type of change: Bug fix
Quantizing remote-code checkpoints such as
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16with--trust_remote_codefails on Transformers 5.x during model loading:(It works on Transformers 4.57.x.)
Root cause: In
examples/llm_ptq/example_utils.py::get_model,AutoConfig.from_pretrained(..., trust_remote_code=True)loads the checkpoint's bundled remoteNemotronHConfig(authored for Transformers 4.55.4, which has nomoe_latent_size). But becauseNemotronHForCausalLMis a built-in class in Transformers 5.x, the empty-weights device-map build instantiates the built-in model class, whose modeling code readsconfig.moe_latent_size. The remote (old) config and the built-in (new) model are a mismatched pair. Transformers 4.57.x only worked by luck — its built-in model never accessed that field.Fix: When instantiating the built-in model class, feed it a config from the same version as the model definition. If the loaded config came from remote code (its class module lives under
transformers_modules), re-derive it with the built-in class (AutoConfigwithouttrust_remote_code) so required fields get their defaults. Non-remote configs are untouched. The subsequent real model load already resolves the config via the built-inconfig_class, so only the device-map build needed aligning.Usage
No API change. The previously-failing command now works:
Testing
AttributeErroron Transformers 5.7.0, then confirmed the fix resolves it.nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16nvfp4 PTQ + export completes successfully on Transformers 5.7.0.tests/examples/llm_ptq/unit tests (test_example_utils.py,test_hf_ptq_args.py,test_cast_mxfp4_to_nvfp4.py) all pass.Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: N/AAdditional Information
The fix is general: re-deriving the config with the built-in class handles any field the built-in model adds in future Transformers releases, not just
moe_latent_size.Summary by CodeRabbit