Fix Nemotron-H PTQ failure on Transformers 5.x with --trust_remote_code (moe_latent_size AttributeError) by Fridah-nv · Pull Request #1839 · NVIDIA/Model-Optimizer

Fridah-nv · 2026-06-26T22:51:20Z

What does this PR do?

Type of change: Bug fix

Quantizing remote-code checkpoints such as nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 with --trust_remote_code fails on Transformers 5.x during model loading:

AttributeError: 'NemotronHConfig' object has no attribute 'moe_latent_size'

(It works on Transformers 4.57.x.)

Root cause: In examples/llm_ptq/example_utils.py::get_model, AutoConfig.from_pretrained(..., trust_remote_code=True) loads the checkpoint's bundled remote NemotronHConfig (authored for Transformers 4.55.4, which has no moe_latent_size). But because NemotronHForCausalLM is a built-in class in Transformers 5.x, the empty-weights device-map build instantiates the built-in model class, whose modeling code reads config.moe_latent_size. The remote (old) config and the built-in (new) model are a mismatched pair. Transformers 4.57.x only worked by luck — its built-in model never accessed that field.

Fix: When instantiating the built-in model class, feed it a config from the same version as the model definition. If the loaded config came from remote code (its class module lives under transformers_modules), re-derive it with the built-in class (AutoConfig without trust_remote_code) so required fields get their defaults. Non-remote configs are untouched. The subsequent real model load already resolves the config via the built-in config_class, so only the device-map build needed aligning.

Usage

No API change. The previously-failing command now works:

python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
    --qformat fp8,nvfp4_mse --calib_size 64 \
    --export_path ./output/nemotron-nano-fp8-nvfp4_mse \
    --trust_remote_code --dataset cnn_dailymail --auto_quantize_bits 4.75

Testing

Reproduced the original AttributeError on Transformers 5.7.0, then confirmed the fix resolves it.
End-to-end: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 nvfp4 PTQ + export completes successfully on Transformers 5.7.0.
tests/examples/llm_ptq/ unit tests (test_example_utils.py, test_hf_ptq_args.py, test_cast_mxfp4_to_nvfp4.py) all pass.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: N/A
Did you update Changelog?: N/A
Did you get Claude approval on this PR?: ❌

Additional Information

The fix is general: re-deriving the config with the built-in class handles any field the built-in model adds in future Transformers releases, not just moe_latent_size.

Summary by CodeRabbit

Bug Fixes
- Improved model initialization when using remote-code configurations by re-deriving the initialization config for built-in model classes when possible.
- Added a safe fallback to keep using the original configuration if re-derivation fails.
Tests
- Added unit tests covering remote-code config re-derivation, ensuring unchanged configs are not re-derived, and verifying fallback behavior on errors.

Fridah-nv · 2026-06-26T22:51:31Z

/claude review

coderabbitai · 2026-06-26T22:51:34Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c6e48874-6083-4a51-b854-4c8ed69ff6ff

📥 Commits

Reviewing files that changed from the base of the PR and between da22e66 and 5525048.

📒 Files selected for processing (2)

examples/hf_ptq/example_utils.py
tests/examples/hf_ptq/test_example_utils.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/examples/hf_ptq/test_example_utils.py
examples/hf_ptq/example_utils.py

📝 Walkthrough

Walkthrough

Adds _resolve_init_config() to example_utils.py, which re-derives a built-in Transformers config via AutoConfig.from_pretrained (without trust_remote_code) when a remote-code config is paired with a built-in model class, falling back to the original config on failure. get_model() now uses this resolved config for torch_dtype and from_config(). Three unit tests cover re-derivation, passthrough, and fallback.

Changes

Remote-code model init reconfiguration

Layer / File(s)	Summary
`_resolve_init_config` helper and `get_model` wiring` <br>` examples/hf_ptq/example_utils.py`,` tests/examples/hf_ptq/test_example_utils.py`	New `_resolve_init_config()` conditionally calls `AutoConfig.from_pretrained` without `trust_remote_code` for remote-code configs and falls back to the original on failure. `get_model()` derives `config_for_init` and `torch_dtype` from the resolved config, and unit tests cover re-derivation, passthrough, and exception fallback.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly matches the PR’s main fix: Nemotron-H PTQ loading with trust_remote_code on Transformers 5.x.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	Touched files only add _resolve_init_config/tests; no hardcoded trust_remote_code, weights_only=False, allow_pickle=True, exec/eval, or # nosec appeared in the changes.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fridah/fix-nemotron-h-trust-remote-code-config

_{Comment @coderabbitai help to get the list of available commands.}

cjluo-nv

Bot review — DM the bot to share feedback.

Small, well-scoped bug fix (+13/-2, 1 file) for Nemotron-H PTQ on Transformers 5.x with --trust_remote_code. The fix re-derives a built-in config (via AutoConfig.from_pretrained without trust_remote_code) when a remote-code config (module under transformers_modules) is paired with a built-in concrete model class, then uses that config for the empty-weights device-map build (torch_dtype + from_config).

Correctness looks sound:

The branch only fires for concrete transformers classes (auto_model_module not in [AutoModelForCausalLM, AutoModel]), which are reached only when hasattr(transformers, architecture) is true — so the built-in config_class exists and the re-derive won't fail for an unregistered arch.
Non-remote configs and the Auto* fallback paths are untouched; default config_for_init = hf_config preserves prior behavior.
The real model load downstream re-resolves config via the built-in config_class, so the change is contained to the device-map build.

No licensing concerns; no design-review trigger; no prompt-injection in the untrusted blocks.

Why nudge rather than approve:

No test. The author states a regression test would require loading a remote-code checkpoint, but the decision logic here (module-prefix check + config re-derive, and which config is passed to from_config) is unit-testable by mocking AutoConfig.from_pretrained/from_config — tests/examples/llm_ptq/test_example_utils.py already tests get_model helpers with pure-Python stubs. A focused unit test asserting the built-in config is used for transformers_modules configs (and that non-remote configs are untouched) would lock in the fix.
The fix was validated only end-to-end on GPU with a specific checkpoint (Transformers 5.7.0); no automated coverage guards against regressions.

claude

Claude review passed — no blocking issues found. LGTM

Summary

Findings — CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 1

Reviewed the single changed file (examples/llm_ptq/example_utils.py, +13/-2). This is a narrow, well-reasoned bug fix for the Nemotron-H moe_latent_size AttributeError on Transformers 5.x with --trust_remote_code.

Assessment

The fix is correct and appropriately scoped:

Root cause is accurately diagnosed: a remote-loaded (old) NemotronHConfig paired with a built-in (new) model class during the empty-weights device-map build. The new code re-derives the config with the built-in class so required fields get their defaults.
The guard is precise: re-derivation only happens when (a) the model class is a built-in architecture class (not the AutoModelForCausalLM/AutoModel remote fallback branch) and (b) the loaded config actually came from transformers_modules (remote code). Non-remote configs and the fallback path are untouched — matching the stated backward-compat claim.
config_kwargs filtering is sound: config_kwargs only contains trust_remote_code when truthy, so stripping it yields a clean built-in config load.
Consistency restored: the device-map build now uses the same built-in config that the real from_pretrained load at the bottom of the branch resolves via the built-in config_class.

Most impactful note

One non-blocking SUGGESTION: the new second AutoConfig.from_pretrained call could raise on edge-case checkpoints (built-in architecture name + non-built-in model_type), turning a path that previously progressed into a hard crash. Wrapping it with a fallback to the existing hf_config would harden the branch. Not required for merge.

Risk level: low — single-file example-script fix, behavior change confined to a guarded branch, validated end-to-end per the PR description.

codecov · 2026-06-26T23:01:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.40%. Comparing base (248cbf2) to head (5525048).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1839      +/-   ##
==========================================
- Coverage   77.40%   77.40%   -0.01%     
==========================================
  Files         515      515              
  Lines       57118    57118              
==========================================
- Hits        44214    44213       -1     
- Misses      12904    12905       +1

Flag	Coverage Δ
examples	`42.00% <ø> (-0.14%)`	⬇️
unit	`54.92% <ø> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…it tests - Extract the remote-code vs built-in config selection into _resolve_init_config() in example_utils.get_model, and harden it: wrap the built-in AutoConfig reload in try/except so an edge-case checkpoint (built-in architecture name but non-built-in model_type) falls back to hf_config instead of hard-crashing. - Add focused unit tests in tests/examples/llm_ptq/test_example_utils.py for the decision logic (re-derive for remote config, keep non-remote config, fall back when the reload raises), mocking AutoConfig.from_pretrained. Addresses reviewer feedback on missing tests and the unguarded reload. Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

meenchen

Approve to unblock, but this looks like a general issue outside of ModelOpt, and maybe we should update/fix the configs of the model

…it tests - Extract the remote-code vs built-in config selection into _resolve_init_config() in example_utils.get_model, and harden it: wrap the built-in AutoConfig reload in try/except so an edge-case checkpoint (built-in architecture name but non-built-in model_type) falls back to hf_config instead of hard-crashing. - Add focused unit tests in tests/examples/llm_ptq/test_example_utils.py for the decision logic (re-derive for remote config, keep non-remote config, fall back when the reload raises), mocking AutoConfig.from_pretrained. Addresses reviewer feedback on missing tests and the unguarded reload. Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

copy-pr-bot · 2026-06-29T20:44:32Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…te_code Bug: Quantizing nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 via hf_ptq.py with --trust_remote_code fails on Transformers 5.x with "AttributeError: 'NemotronHConfig' object has no attribute 'moe_latent_size'" (works on 4.57.x). In example_utils.get_model, AutoConfig.from_pretrained(..., trust_remote_code=True) loads the checkpoint's bundled *remote* NemotronHConfig (authored for Transformers 4.55.4, no moe_latent_size). But since NemotronHForCausalLM is a *built-in* class in Transformers 5.x, the empty-weights device-map build uses the built-in model class, whose modeling code reads config.moe_latent_size -> the remote (old) config and the built-in (new) model are a mismatched pair. Transformers 4.57.x only worked because its built-in model never accessed that field. Fix: When instantiating the built-in model class, feed it a config from the same version as the model definition: if the loaded config came from remote code (class module under "transformers_modules"), re-derive it with the built-in class (AutoConfig without trust_remote_code) so required fields get their defaults. Non-remote configs are untouched. The real model load already resolves config via the built-in config_class, so only the device-map build needed aligning. Validated end-to-end: Nemotron-3-Nano-30B nvfp4 PTQ + export succeeds on Transformers 5.7.0; llm_ptq example unit tests pass. Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

…it tests - Extract the remote-code vs built-in config selection into _resolve_init_config() in example_utils.get_model, and harden it: wrap the built-in AutoConfig reload in try/except so an edge-case checkpoint (built-in architecture name but non-built-in model_type) falls back to hf_config instead of hard-crashing. - Add focused unit tests in tests/examples/llm_ptq/test_example_utils.py for the decision logic (re-derive for remote config, keep non-remote config, fall back when the reload raises), mocking AutoConfig.from_pretrained. Addresses reviewer feedback on missing tests and the unguarded reload. Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

Fridah-nv · 2026-06-29T20:48:55Z

Approve to unblock, but this looks like a general issue outside of ModelOpt, and maybe we should update/fix the configs of the model

I agree, the cleanest long-term fix is the model publisher updating the checkpoint's bundled config.

github-actions · 2026-06-29T23:48:58Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-29 23:48 UTC

Transplant the combined get_model fix from PRs #1839, #1857 and #1869 onto release/0.45.0's examples/llm_ptq/example_utils.py. These PRs could not be cherry-picked directly because the file was renamed llm_ptq -> hf_ptq (#1759) and surrounding get_model code diverged on main, but the actual fix targets the init_empty_weights / from_config block that already exists on the release branch: - _resolve_init_config: re-derive a built-in config for remote-code checkpoints so device-map inference matches the model definition's version (fixes Nemotron-H moe_latent_size AttributeError on transformers 5.x, #1839). - _get_config_dtype / _apply_dtype_to_config: derive dtype from the resolved config and forward the DeciLM-supported dtype kwarg, dropping unsupported dtype forwarding on the real from_pretrained load (#1857, #1869). Ports the accompanying unit tests (path-adjusted to llm_ptq). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

#1858 #1839 #1857 #1869 (#1880) ## Cherry-picked PRs - #1801 - #1808 - #1629 - #1627 - #1824 - #1826 - #1830 - #1760 - #1831 - #1858 - #1839 - #1857 - #1869 #1839, #1857 and #1869 were back-ported (not a clean cherry-pick): the file was renamed `llm_ptq` -> `hf_ptq` (#1759) and surrounding `get_model` code diverged on `main`, but the actual fix targets the `init_empty_weights` / `from_config` block that already exists on the release branch. Accompanying unit tests were ported (15 passed).  ## Summary by CodeRabbit * **New Features** * Added a new PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache calibration. * **Bug Fixes** * Improved ONNX mixed-precision/FP16 conversion reliability with stricter type handling and better stale output-shape reconciliation. * Fixed quantization/export edge cases: MoE router/gate handling, FP8 calibration/reduction failures, and additional FP8/INT8 robustness during export. * Standardized Puzzletron validation split naming to `validation`. * **Documentation** * Refreshed LM-Eval and TensorRT-Edge-LLM CLI instructions, including updated command names and examples.  --------- Signed-off-by: Meng Xin <mxin@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com> Signed-off-by: dimapihtar <dpykhtar@nvidia.com> Signed-off-by: Chenjie Luo <chenjiel@nvidia.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com> Signed-off-by: Grzegorz Karch <gkarch@nvidia.com> Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Co-authored-by: mxinO <164952785+mxinO@users.noreply.github.com> Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com> Co-authored-by: Zhiyu <zhiyuc@nvidia.com> Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com> Co-authored-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>

Fridah-nv requested a review from a team as a code owner June 26, 2026 22:51

Fridah-nv requested a review from meenchen June 26, 2026 22:51

cjluo-nv reviewed Jun 26, 2026

View reviewed changes

claude Bot reviewed Jun 26, 2026

View reviewed changes

Comment thread examples/llm_ptq/example_utils.py Outdated

claude Bot approved these changes Jun 26, 2026

View reviewed changes

coderabbitai Bot approved these changes Jun 26, 2026

View reviewed changes

Fridah-nv added the cherry-pick-0.45.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Jun 26, 2026

Fridah-nv requested a review from a team as a code owner June 26, 2026 23:17

meenchen approved these changes Jun 27, 2026

View reviewed changes

Fridah-nv force-pushed the fridah/fix-nemotron-h-trust-remote-code-config branch from 3900252 to da22e66 Compare June 29, 2026 20:44

Fridah-nv added 2 commits June 29, 2026 20:48

Fridah-nv force-pushed the fridah/fix-nemotron-h-trust-remote-code-config branch from da22e66 to 5525048 Compare June 29, 2026 20:48

Fridah-nv enabled auto-merge (squash) June 29, 2026 20:55

auto-merge was automatically disabled June 29, 2026 23:40
Branch protection rule check failed

Fridah-nv merged commit 72651b2 into main Jun 29, 2026
63 of 68 checks passed

Fridah-nv deleted the fridah/fix-nemotron-h-trust-remote-code-config branch June 29, 2026 23:48

This was referenced Jun 30, 2026

Fix HF PTQ empty-init dtype kwargs #1857

Merged

Refine DeciLM dtype handling in HF PTQ #1869

Merged

kevalmorabia97 mentioned this pull request Jul 1, 2026

[Cherry-pick] PRs #1801 #1808 #1629 #1627 #1824 #1826 #1830 #1760 #1831 #1858 #1839 #1857 #1869 #1880

Merged

kevalmorabia97 added the cherry-pick-done Added by bot once PR is cherry-picked to the release branch label Jul 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix Nemotron-H PTQ failure on Transformers 5.x with --trust_remote_code (moe_latent_size AttributeError)#1839

Fix Nemotron-H PTQ failure on Transformers 5.x with --trust_remote_code (moe_latent_size AttributeError)#1839
Fridah-nv merged 2 commits into
mainfrom
fridah/fix-nemotron-h-trust-remote-code-config

Fridah-nv commented Jun 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

Fridah-nv commented Jun 26, 2026

Uh oh!

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

cjluo-nv left a comment

Uh oh!

Uh oh!

claude Bot left a comment

Uh oh!

codecov Bot commented Jun 26, 2026 •

edited

Loading

Uh oh!

meenchen left a comment

Uh oh!

copy-pr-bot Bot commented Jun 29, 2026

Uh oh!

Fridah-nv commented Jun 29, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

Fridah-nv commented Jun 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

Fridah-nv commented Jun 26, 2026

Uh oh!

coderabbitai Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Summary

Assessment

Most impactful note

Uh oh!

codecov Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

meenchen left a comment

Choose a reason for hiding this comment

Uh oh!

copy-pr-bot Bot commented Jun 29, 2026

Uh oh!

Fridah-nv commented Jun 29, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Fridah-nv commented Jun 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading

codecov Bot commented Jun 26, 2026 •

edited

Loading