
Fix Nemotron-H PTQ failure on Transformers 5.x with --trust_remote_code (moe_latent_size AttributeError) #1839

Merged: Fridah-nv merged 2 commits into main from fridah/fix-nemotron-h-trust-remote-code-config on Jun 29, 2026

Conversation

@Fridah-nv

@Fridah-nv Fridah-nv commented Jun 26, 2026

Contributor

What does this PR do?

Type of change: Bug fix

Quantizing remote-code checkpoints such as nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 with --trust_remote_code fails on Transformers 5.x during model loading:

AttributeError: 'NemotronHConfig' object has no attribute 'moe_latent_size'

(It works on Transformers 4.57.x.)

Root cause: In examples/llm_ptq/example_utils.py::get_model, AutoConfig.from_pretrained(..., trust_remote_code=True) loads the checkpoint's bundled remote NemotronHConfig (authored for Transformers 4.55.4, which has no moe_latent_size). But because NemotronHForCausalLM is a built-in class in Transformers 5.x, the empty-weights device-map build instantiates the built-in model class, whose modeling code reads config.moe_latent_size. The remote (old) config and the built-in (new) model are a mismatched pair. Transformers 4.57.x only worked by luck — its built-in model never accessed that field.
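
The mismatch described above can be reproduced in miniature without Transformers. The sketch below uses hypothetical stand-in classes (`RemoteNemotronHConfig`, `BuiltinNemotronHConfig`, `BuiltinModel`) rather than the real ones, and the `None` default for `moe_latent_size` is an illustrative assumption, not the library's actual default.

```python
class RemoteNemotronHConfig:
    """Stand-in for the checkpoint's bundled (older) config: no moe_latent_size."""

    def __init__(self):
        self.hidden_size = 64  # illustrative value only


class BuiltinNemotronHConfig:
    """Stand-in for the built-in (newer) config, which supplies a default."""

    def __init__(self, moe_latent_size=None):  # default value is an assumption
        self.hidden_size = 64
        self.moe_latent_size = moe_latent_size


class BuiltinModel:
    """Stand-in for the built-in model class whose code reads the new field."""

    def __init__(self, config):
        self.latent = config.moe_latent_size


def try_build(config):
    """Instantiate the model and report whether the config/model pair matches."""
    try:
        BuiltinModel(config)
        return "ok"
    except AttributeError as exc:
        return f"AttributeError: {exc}"


if __name__ == "__main__":
    print(try_build(RemoteNemotronHConfig()))   # mismatched pair
    print(try_build(BuiltinNemotronHConfig()))  # matched pair
```

Pairing the old config with the new model fails on the first attribute read, while a config produced by the same code generation as the model does not.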

Fix: When instantiating the built-in model class, feed it a config from the same version as the model definition. If the loaded config came from remote code (its class module lives under transformers_modules), re-derive it with the built-in class (AutoConfig without trust_remote_code) so required fields get their defaults. Non-remote configs are untouched. The subsequent real model load already resolves the config via the built-in config_class, so only the device-map build needed aligning.
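
A minimal sketch of that selection logic follows. It is not the code merged in this PR: `resolve_init_config`, `is_remote_code_config`, and the injectable `loader` parameter are hypothetical names, and checking the module with `startswith` is one assumed way of testing whether the class "lives under" `transformers_modules`.

```python
from typing import Any, Callable, Dict

# Module prefix the PR description associates with remote-code classes.
REMOTE_MODULE_PREFIX = "transformers_modules"


def is_remote_code_config(config: Any) -> bool:
    """Return True when the config class was loaded from checkpoint-bundled code."""
    return type(config).__module__.startswith(REMOTE_MODULE_PREFIX)


def load_builtin_config(ckpt_path: str, **kwargs: Any) -> Any:
    """Default loader: resolve the config with the installed Transformers classes."""
    from transformers import AutoConfig  # lazy import so the sketch loads without it

    return AutoConfig.from_pretrained(ckpt_path, **kwargs)


def resolve_init_config(
    hf_config: Any,
    ckpt_path: str,
    config_kwargs: Dict[str, Any],
    loader: Callable[..., Any] = load_builtin_config,
) -> Any:
    """Pick the config used to instantiate a built-in model class (hypothetical helper)."""
    if not is_remote_code_config(hf_config):
        return hf_config  # non-remote configs are left untouched
    # Drop trust_remote_code so the built-in config class is resolved instead.
    builtin_kwargs = {k: v for k, v in config_kwargs.items() if k != "trust_remote_code"}
    return loader(ckpt_path, **builtin_kwargs)
```

Passing the loader as a parameter is a design choice made for this sketch only, so the decision logic can be exercised without downloading a checkpoint.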

Usage

No API change. The previously-failing command now works:

python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
    --qformat fp8,nvfp4_mse --calib_size 64 \
    --export_path ./output/nemotron-nano-fp8-nvfp4_mse \
    --trust_remote_code --dataset cnn_dailymail --auto_quantize_bits 4.75

Testing

  • Reproduced the original AttributeError on Transformers 5.7.0, then confirmed the fix resolves it.
  • End-to-end: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 nvfp4 PTQ + export completes successfully on Transformers 5.7.0.
  • tests/examples/llm_ptq/ unit tests (test_example_utils.py, test_hf_ptq_args.py, test_cast_mxfp4_to_nvfp4.py) all pass.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: N/A
  • Did you get Claude approval on this PR?: ❌

Additional Information

The fix is general: re-deriving the config with the built-in class handles any field the built-in model adds in future Transformers releases, not just moe_latent_size.

Summary by CodeRabbit

  • Bug Fixes
    • Improved model initialization when using remote-code configurations by re-deriving the initialization config for built-in model classes when possible.
    • Added a safe fallback to keep using the original configuration if re-derivation fails.
  • Tests
    • Added unit tests covering remote-code config re-derivation, ensuring unchanged configs are not re-derived, and verifying fallback behavior on errors.

@Fridah-nv Fridah-nv requested a review from a team as a code owner June 26, 2026 22:51
@Fridah-nv Fridah-nv requested a review from meenchen June 26, 2026 22:51
@Fridah-nv

Contributor Author

/claude review

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c6e48874-6083-4a51-b854-4c8ed69ff6ff

📥 Commits

Reviewing files that changed from the base of the PR and between da22e66 and 5525048.

📒 Files selected for processing (2)
  • examples/hf_ptq/example_utils.py
  • tests/examples/hf_ptq/test_example_utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/examples/hf_ptq/test_example_utils.py
  • examples/hf_ptq/example_utils.py

📝 Walkthrough

Walkthrough

Adds _resolve_init_config() to example_utils.py, which re-derives a built-in Transformers config via AutoConfig.from_pretrained (without trust_remote_code) when a remote-code config is paired with a built-in model class, falling back to the original config on failure. get_model() now uses this resolved config for torch_dtype and from_config(). Three unit tests cover re-derivation, passthrough, and fallback.

Changes

Remote-code model init reconfiguration

| Layer / File(s) | Summary |
|---|---|
| **_resolve_init_config helper and get_model wiring** <br> `examples/hf_ptq/example_utils.py`, `tests/examples/hf_ptq/test_example_utils.py` | New `_resolve_init_config()` conditionally calls `AutoConfig.from_pretrained` without `trust_remote_code` for remote-code configs and falls back to the original on failure. `get_model()` derives `config_for_init` and `torch_dtype` from the resolved config, and unit tests cover re-derivation, passthrough, and exception fallback. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the PR’s main fix: Nemotron-H PTQ loading with trust_remote_code on Transformers 5.x. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | Touched files only add _resolve_init_config/tests; no hardcoded trust_remote_code, weights_only=False, allow_pickle=True, exec/eval, or # nosec appeared in the changes. |

@cjluo-nv cjluo-nv left a comment

Collaborator

Bot review — DM the bot to share feedback.

Small, well-scoped bug fix (+13/-2, 1 file) for Nemotron-H PTQ on Transformers 5.x with --trust_remote_code. The fix re-derives a built-in config (via AutoConfig.from_pretrained without trust_remote_code) when a remote-code config (module under transformers_modules) is paired with a built-in concrete model class, then uses that config for the empty-weights device-map build (torch_dtype + from_config).

Correctness looks sound:

  • The branch only fires for concrete transformers classes (auto_model_module not in [AutoModelForCausalLM, AutoModel]), which are reached only when hasattr(transformers, architecture) is true — so the built-in config_class exists and the re-derive won't fail for an unregistered arch.
  • Non-remote configs and the Auto* fallback paths are untouched; default config_for_init = hf_config preserves prior behavior.
  • The real model load downstream re-resolves config via the built-in config_class, so the change is contained to the device-map build.

No licensing concerns; no design-review trigger; no prompt-injection in the untrusted blocks.

Why nudge rather than approve:

  • No test. The author states a regression test would require loading a remote-code checkpoint, but the decision logic here (module-prefix check + config re-derive, and which config is passed to from_config) is unit-testable by mocking AutoConfig.from_pretrained/from_config; tests/examples/llm_ptq/test_example_utils.py already tests get_model helpers with pure-Python stubs. A focused unit test asserting the built-in config is used for transformers_modules configs (and that non-remote configs are untouched) would lock in the fix.
  • The fix was validated only end-to-end on GPU with a specific checkpoint (Transformers 5.7.0); no automated coverage guards against regressions.
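
The kind of focused unit test suggested above might look roughly like the following sketch. It does not reproduce the tests later added to the repository: `resolve_init_config` here is a hypothetical stand-in for the decision logic, `make_config` is an illustrative stub factory, and the reload is mocked with `unittest.mock.Mock` instead of patching `AutoConfig.from_pretrained` directly.

```python
from unittest import mock


def resolve_init_config(hf_config, ckpt_path, load_builtin):
    """Hypothetical stand-in for the decision logic under test."""
    if type(hf_config).__module__.startswith("transformers_modules"):
        return load_builtin(ckpt_path)
    return hf_config


def make_config(module_name):
    """Build a stub config object whose class reports the given module."""
    cls = type("StubConfig", (), {"__module__": module_name})
    return cls()


def test_remote_config_is_rederived():
    builtin_cfg = object()
    loader = mock.Mock(return_value=builtin_cfg)
    remote_cfg = make_config("transformers_modules.some_repo.configuration_stub")

    assert resolve_init_config(remote_cfg, "ckpt", loader) is builtin_cfg
    loader.assert_called_once_with("ckpt")


def test_builtin_config_is_untouched():
    loader = mock.Mock()
    local_cfg = make_config("transformers.models.stub.configuration_stub")

    assert resolve_init_config(local_cfg, "ckpt", loader) is local_cfg
    loader.assert_not_called()
```

Both tests run on CPU with no checkpoint download, which is the property the review asks for.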

Comment thread examples/llm_ptq/example_utils.py Outdated

@claude claude Bot left a comment

Claude review passed — no blocking issues found. LGTM

Summary

Findings — CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 1

Reviewed the single changed file (examples/llm_ptq/example_utils.py, +13/-2). This is a narrow, well-reasoned bug fix for the Nemotron-H moe_latent_size AttributeError on Transformers 5.x with --trust_remote_code.

Assessment

The fix is correct and appropriately scoped:

  • Root cause is accurately diagnosed: a remote-loaded (old) NemotronHConfig paired with a built-in (new) model class during the empty-weights device-map build. The new code re-derives the config with the built-in class so required fields get their defaults.
  • The guard is precise: re-derivation only happens when (a) the model class is a built-in architecture class (not the AutoModelForCausalLM/AutoModel remote fallback branch) and (b) the loaded config actually came from transformers_modules (remote code). Non-remote configs and the fallback path are untouched — matching the stated backward-compat claim.
  • config_kwargs filtering is sound: config_kwargs only contains trust_remote_code when truthy, so stripping it yields a clean built-in config load.
  • Consistency restored: the device-map build now uses the same built-in config that the real from_pretrained load at the bottom of the branch resolves via the built-in config_class.

Most impactful note

One non-blocking SUGGESTION: the new second AutoConfig.from_pretrained call could raise on edge-case checkpoints (built-in architecture name + non-built-in model_type), turning a path that previously progressed into a hard crash. Wrapping it with a fallback to the existing hf_config would harden the branch. Not required for merge.
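
One way to read this suggestion is sketched below. The helper name `rederive_or_fallback`, the zero-argument `reload_builtin` callable, and the broad `except Exception` are assumptions made for illustration; the exception type a real reload would raise on such a checkpoint is not stated in this thread.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def rederive_or_fallback(hf_config: Any, reload_builtin: Callable[[], Any]) -> Any:
    """Try the built-in config reload; keep the original config if it raises.

    `reload_builtin` is a hypothetical zero-argument callable wrapping the second
    AutoConfig.from_pretrained call (without trust_remote_code).
    """
    try:
        return reload_builtin()
    except Exception as exc:  # broad catch is an assumption of this sketch
        logger.warning("Built-in config reload failed (%s); using original config.", exc)
        return hf_config
```

With this shape, a checkpoint whose reload fails keeps following the path it took before the change instead of crashing earlier.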

Risk level: low — single-file example-script fix, behavior change confined to a guarded branch, validated end-to-end per the PR description.

@codecov

codecov Bot commented Jun 26, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.40%. Comparing base (248cbf2) to head (5525048).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1839      +/-   ##
==========================================
- Coverage   77.40%   77.40%   -0.01%     
==========================================
  Files         515      515              
  Lines       57118    57118              
==========================================
- Hits        44214    44213       -1     
- Misses      12904    12905       +1     

| Flag | Coverage Δ |
|---|---|
| examples | 42.00% <ø> (-0.14%) ⬇️ |
| unit | 54.92% <ø> (+0.01%) ⬆️ |


@Fridah-nv Fridah-nv added the cherry-pick-0.45.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Jun 26, 2026
Fridah-nv added a commit that referenced this pull request Jun 26, 2026
…it tests

- Extract the remote-code vs built-in config selection into _resolve_init_config()
  in example_utils.get_model, and harden it: wrap the built-in AutoConfig reload in
  try/except so an edge-case checkpoint (built-in architecture name but non-built-in
  model_type) falls back to hf_config instead of hard-crashing.
- Add focused unit tests in tests/examples/llm_ptq/test_example_utils.py for the
  decision logic (re-derive for remote config, keep non-remote config, fall back when
  the reload raises), mocking AutoConfig.from_pretrained.

Addresses reviewer feedback on missing tests and the unguarded reload.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv Fridah-nv requested a review from a team as a code owner June 26, 2026 23:17

@meenchen meenchen left a comment

Contributor

Approve to unblock, but this looks like a general issue outside of ModelOpt, and maybe we should update/fix the configs of the model

Fridah-nv added a commit that referenced this pull request Jun 29, 2026
@Fridah-nv Fridah-nv force-pushed the fridah/fix-nemotron-h-trust-remote-code-config branch from 3900252 to da22e66 Compare June 29, 2026 20:44
@copy-pr-bot

copy-pr-bot Bot commented Jun 29, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…te_code

Bug:
  Quantizing nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 via hf_ptq.py with
  --trust_remote_code fails on Transformers 5.x with
  "AttributeError: 'NemotronHConfig' object has no attribute 'moe_latent_size'"
  (works on 4.57.x).

  In example_utils.get_model, AutoConfig.from_pretrained(..., trust_remote_code=True)
  loads the checkpoint's bundled *remote* NemotronHConfig (authored for Transformers
  4.55.4, no moe_latent_size). But since NemotronHForCausalLM is a *built-in* class in
  Transformers 5.x, the empty-weights device-map build uses the built-in model class,
  whose modeling code reads config.moe_latent_size -> the remote (old) config and the
  built-in (new) model are a mismatched pair. Transformers 4.57.x only worked because
  its built-in model never accessed that field.

Fix:
  When instantiating the built-in model class, feed it a config from the same version
  as the model definition: if the loaded config came from remote code (class module
  under "transformers_modules"), re-derive it with the built-in class (AutoConfig
  without trust_remote_code) so required fields get their defaults. Non-remote configs
  are untouched. The real model load already resolves config via the built-in
  config_class, so only the device-map build needed aligning.

Validated end-to-end: Nemotron-3-Nano-30B nvfp4 PTQ + export succeeds on Transformers
5.7.0; llm_ptq example unit tests pass.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
…it tests

- Extract the remote-code vs built-in config selection into _resolve_init_config()
  in example_utils.get_model, and harden it: wrap the built-in AutoConfig reload in
  try/except so an edge-case checkpoint (built-in architecture name but non-built-in
  model_type) falls back to hf_config instead of hard-crashing.
- Add focused unit tests in tests/examples/llm_ptq/test_example_utils.py for the
  decision logic (re-derive for remote config, keep non-remote config, fall back when
  the reload raises), mocking AutoConfig.from_pretrained.

Addresses reviewer feedback on missing tests and the unguarded reload.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv Fridah-nv force-pushed the fridah/fix-nemotron-h-trust-remote-code-config branch from da22e66 to 5525048 Compare June 29, 2026 20:48
@Fridah-nv

Contributor Author

> Approve to unblock, but this looks like a general issue outside of ModelOpt, and maybe we should update/fix the configs of the model

I agree, the cleanest long-term fix is the model publisher updating the checkpoint's bundled config.

@Fridah-nv Fridah-nv enabled auto-merge (squash) June 29, 2026 20:55
auto-merge was automatically disabled June 29, 2026 23:40

Branch protection rule check failed

@Fridah-nv Fridah-nv merged commit 72651b2 into main Jun 29, 2026
63 of 68 checks passed
@Fridah-nv Fridah-nv deleted the fridah/fix-nemotron-h-trust-remote-code-config branch June 29, 2026 23:48
@github-actions

Contributor
PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-29 23:48 UTC

kevalmorabia97 added a commit that referenced this pull request Jul 1, 2026
Transplant the combined get_model fix from PRs #1839, #1857 and #1869
onto release/0.45.0's examples/llm_ptq/example_utils.py. These PRs could
not be cherry-picked directly because the file was renamed
llm_ptq -> hf_ptq (#1759) and surrounding get_model code diverged on main,
but the actual fix targets the init_empty_weights / from_config block that
already exists on the release branch:

- _resolve_init_config: re-derive a built-in config for remote-code
  checkpoints so device-map inference matches the model definition's
  version (fixes Nemotron-H moe_latent_size AttributeError on transformers
  5.x, #1839).
- _get_config_dtype / _apply_dtype_to_config: derive dtype from the
  resolved config and forward the DeciLM-supported dtype kwarg, dropping
  unsupported dtype forwarding on the real from_pretrained load
  (#1857, #1869).

Ports the accompanying unit tests (path-adjusted to llm_ptq).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 added the cherry-pick-done Added by bot once PR is cherry-picked to the release branch label Jul 1, 2026
kevalmorabia97 added a commit that referenced this pull request Jul 2, 2026
#1858 #1839 #1857 #1869 (#1880)

## Cherry-picked PRs

- #1801
- #1808
- #1629
- #1627
- #1824
- #1826
- #1830
- #1760
- #1831
- #1858
- #1839
- #1857
- #1869

#1839, #1857 and #1869 were back-ported (not a clean cherry-pick): the file was renamed `llm_ptq` -> `hf_ptq` (#1759) and surrounding `get_model` code diverged on `main`, but the actual fix targets the `init_empty_weights` / `from_config` block that already exists on the release branch. Accompanying unit tests were ported (15 passed).


## Summary by CodeRabbit

* **New Features**
  * Added a new PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache calibration.
* **Bug Fixes**
  * Improved ONNX mixed-precision/FP16 conversion reliability with stricter type handling and better stale output-shape reconciliation.
  * Fixed quantization/export edge cases: MoE router/gate handling, FP8 calibration/reduction failures, and additional FP8/INT8 robustness during export.
  * Standardized Puzzletron validation split naming to `validation`.
* **Documentation**
  * Refreshed LM-Eval and TensorRT-Edge-LLM CLI instructions, including updated command names and examples.


---------

Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Co-authored-by: mxinO <164952785+mxinO@users.noreply.github.com>
Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Co-authored-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>

Labels

cherry-pick-0.45.0: After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc
cherry-pick-done: Added by bot once PR is cherry-picked to the release branch


4 participants