Fix HF PTQ empty-init dtype kwargs by realAsma · Pull Request #1857 · NVIDIA/Model-Optimizer

realAsma · 2026-06-29T22:55:44Z

Summary

Fixes NVBug 6359821: hf_ptq.py can fail for remote/custom architectures like DeciLMForCausalLM when dtype-related kwargs are forwarded into model construction paths that do not accept them.

This change keeps the fix scoped to the observed DeciLM/Llama Nemotron path. It resolves the init config used for empty-weight construction, derives dtype consistently from the resolved config, forwards the supported dtype kwarg for the DeciLM empty-weight probe, and drops unsupported dtype forwarding from the DeciLM real from_pretrained() load.
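The kwarg handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code. `build_load_kwargs` is a hypothetical helper. Three details come from the PR discussion: the `"DeciLM" in architecture` check, the `torch_dtype` vs. `dtype` kwarg names for the empty-weight probe, and popping `dtype` from a copied kwargs dict for the real load.

```python
def build_load_kwargs(architecture, config_dtype, model_kwargs):
    """Hypothetical helper returning (empty_init_kwargs, pretrained_kwargs).

    Mirrors the DeciLM special case discussed in this PR; not the actual implementation.
    """
    is_decilm = "DeciLM" in architecture

    # Empty-weight from_config() probe: avoid leaking the newer `dtype` kwarg
    # into the DeciLM remote-code constructor; forward `torch_dtype` instead.
    if is_decilm:
        empty_init_kwargs = {"torch_dtype": config_dtype}
    else:
        empty_init_kwargs = {"dtype": config_dtype}

    # Real from_pretrained() load: work on a copy so the caller's dict is untouched,
    # and drop `dtype` for DeciLM instead of forwarding an unsupported kwarg.
    pretrained_kwargs = model_kwargs.copy()
    if is_decilm:
        pretrained_kwargs.pop("dtype", None)
    return empty_init_kwargs, pretrained_kwargs


empty, real = build_load_kwargs(
    "DeciLMForCausalLM", "bfloat16", {"dtype": "bfloat16", "device_map": "auto"}
)
```

Note the asymmetry for DeciLM. The probe receives `torch_dtype`, while the real load receives no dtype at all. The follow-up #1869 later changed this so that DeciLM stays on `torch_dtype` for the final load as well.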

NVBug: https://nvbugspro.nvidia.com/bug/6359821

Validation

pre-commit run --files examples/hf_ptq/example_utils.py tests/examples/hf_ptq/test_example_utils.py
pytest_pwd tests/examples/hf_ptq/test_example_utils.py -q -x (15 passed)
End-to-end hf_ptq.py export of Llama-3_3-Nemotron-Super-49B-v1 on one node with 6 GPUs, Transformers 4.48.3 (see the validation comment on #1857).

Summary by CodeRabbit

Bug Fixes
- Improved model loading for Hugging Face remote-code scenarios by safely re-deriving the initialization configuration when needed, with a warning-based fallback.
- Ensured precision is derived consistently from the resolved config (including dtype name handling) with a safe default when unspecified.
- Tightened forwarding of precision-related kwargs and trust_remote_code, and avoided passing max_memory during config loading.
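The dtype derivation mentioned above follows this precedence, per the review summaries: `config.dtype`, then `config.torch_dtype`, then a bf16 default, with string values resolved to torch dtypes. It can be sketched as shown below. This is an assumption-labeled sketch with a hypothetical function name. It stops at a canonical dtype *name* so that it runs without PyTorch. The real helper returns a `torch.dtype`, which from a name could be obtained with, for example, `getattr(torch, name)`.

```python
from types import SimpleNamespace


def resolve_config_dtype_name(config, default="bfloat16"):
    """Return a canonical dtype name (e.g. "bfloat16") from an HF-style config.

    Hypothetical sketch. Precedence: config.dtype, then config.torch_dtype, then `default`.
    Values may be torch.dtype objects or strings such as "float16".
    """
    for attr in ("dtype", "torch_dtype"):
        value = getattr(config, attr, None)
        if value is None:
            continue
        # str(torch.bfloat16) is "torch.bfloat16"; plain strings pass through unchanged.
        return str(value).removeprefix("torch.")
    return default


# Stand-in configs; real ones would be transformers.PretrainedConfig instances.
print(resolve_config_dtype_name(SimpleNamespace(torch_dtype="float16")))  # float16
print(resolve_config_dtype_name(SimpleNamespace()))  # bfloat16
```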
Tests
- Added unit coverage for initialization config resolution (including failure fallback).
- Extended integration-style coverage to validate dtype/kwarg forwarding, trust_remote_code behavior, and eval-mode initialization.

copy-pr-bot · 2026-06-29T22:55:48Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


coderabbitai · 2026-06-29T22:55:50Z

📝 Walkthrough

get_model now re-derives init configs for remote-code cases, computes dtype kwargs from the resolved config, adjusts architecture-specific loading kwargs, and updates tests for the new config and loading paths.

Changes

dtype kwarg fix in get_model

Layer	File(s)	Summary
Init config resolution	`examples/hf_ptq/example_utils.py`	`_resolve_init_config(...)` re-derives a built-in `AutoConfig` for remote-code configs used with built-in model classes, and falls back to the original config on failure.
Architecture init and pretrained kwargs	`examples/hf_ptq/example_utils.py`	`get_model` uses `config_for_init`, derives `config_dtype` from `dtype` or `torch_dtype`, strips `trust_remote_code` for non-`AutoModelForCausalLM`/`AutoModel` loaders, applies DeciLM-specific `torch_dtype` handling, and passes a copied kwargs set to `from_pretrained`.
Tests for config resolution and kwargs	`tests/examples/hf_ptq/test_example_utils.py`	Adds tests for `_resolve_init_config` and a parametrized `get_model` test covering dtype kwarg forwarding, `trust_remote_code`, `max_memory`, `eval()`, and model-class-specific loading behavior.
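The `_resolve_init_config` behavior in the first row (re-derive for remote-code configs used with built-in model classes, and fall back with a warning on failure) can be sketched as shown below. Several details here are assumptions and not the PR's code. The signature is hypothetical, and so is the injected `load_builtin_config` callable. The sketch also treats a config as remote-code when it carries an `auto_map` attribute. In the real helper, the re-derivation presumably goes through `transformers.AutoConfig`.

```python
import warnings
from types import SimpleNamespace


def resolve_init_config(config, uses_builtin_model_class, load_builtin_config):
    """Hypothetical sketch of `_resolve_init_config`-style behavior.

    Only remote-code configs (assumed: those with an `auto_map` attribute) paired
    with a built-in model class are re-derived; any failure keeps the original.
    """
    if not (getattr(config, "auto_map", None) and uses_builtin_model_class):
        return config
    try:
        return load_builtin_config(config)
    except Exception as exc:  # broad on purpose: every failure takes the fallback
        warnings.warn(f"Could not re-derive a built-in init config ({exc}); using the original config")
        return config


# Stand-in objects instead of real transformers configs.
remote_cfg = SimpleNamespace(auto_map={"AutoConfig": "configuration_custom.CustomConfig"})
builtin_cfg = SimpleNamespace(auto_map=None)
resolved = resolve_init_config(remote_cfg, True, lambda cfg: builtin_cfg)
```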

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

NVIDIA/Model-Optimizer#1839: Shares the same get_model loading-path changes around config resolution, dtype selection, and trust_remote_code handling.

Suggested reviewers

claude
meenchen

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 15.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly matches the main change: fixing HF PTQ dtype kwargs handling during empty-init and model loading.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	Touched HF PTQ code adds no hardcoded trust_remote_code=True, no bare eval/exec, no unsafe torch.load/numpy.load, and no # nosec; the lone True is only in a test call.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


realAsma

BB: Did you do an end-to-end export test for the DeciLM model?

realAsma · 2026-06-29T23:41:18Z

BB: Did you see this? #1857 (review)

realAsma · 2026-06-29T23:43:59Z

🤖 Bot comment.

Replying to #1857 (review)

I saw it. Short answer: not a clean end-to-end export pass on the current PR head yet.

What I verified so far:

The actual DeciLM model run got past the original empty-init dtype constructor failure.
The current PR head still hit the same unsupported dtype kwarg later in the real from_pretrained path.
With a local DeciLM dtype-strip patch for that real-load path, the run loaded the model, selected general/ptq/nvfp4_default-kv_fp8, inserted quantizers, completed the calibration loop, and wrote the quant summary.
It then failed later during HF export/save with a separate DeciLM/Transformers tied-weight issue: _get_tied_weight_keys saw a list where it expected .keys().

So the dtype fix is not complete as currently pushed. I have local evidence for the extra real-load dtype patch, but export still has a separate follow-up failure after calibration.

realAsma · 2026-06-30T00:00:20Z

BB: Okay, ping me on Slack when this is done.

Signed-off-by: realAsma <akuriparambi@nvidia.com>

realAsma · 2026-06-30T02:49:32Z

🤖 Bot comment.

Pushed the follow-up fix as 2641497a19.

This keeps the empty-init device-map path unchanged, but for the final real from_pretrained load it now drops dtype without synthesizing deprecated torch_dtype="auto". I also updated the focused test expectation.

Validation already completed on omniml-a9.nvidia.com with Transformers 4.48.3 against the actual Llama-3_3-Nemotron-Super-49B-v1 hf_ptq flow: all 21 shards loaded, 1461 quantizers inserted, calibration completed, .quant_summary.txt written, custom DeciLM files copied, and export succeeded. DCO is green on the updated PR head.

Signed-off-by: realAsma <akuriparambi@nvidia.com>

realAsma

BB: Approve. Make this a regular PR.

realAsma · 2026-06-30T16:27:15Z

Llama Nemotron3 end-to-end export validation

Fresh validation passed on the current PR head.

PR head: acb6e702e0877f08bbb4c76564be219601d3092c (origin/asma/nvbug-6359821)
Hardware: one node with 6 GPUs
Container: nvcr.io/nvidia/pytorch:26.03-py3
Python: 3.12.3
Torch: 2.11.0a0+a6c236b9fd.nv26.03.46836102
Transformers: 4.48.3
Model: nvidia/Llama-3_3-Nemotron-Super-49B-v1
Recipe: general/ptq/nvfp4_default-kv_fp8

hf_ptq.py command:

python hf_ptq.py \
  --pyt_ckpt_path nvidia/Llama-3_3-Nemotron-Super-49B-v1 \
  --recipe general/ptq/nvfp4_default-kv_fp8 \
  --export_path <export_dir> \
  --trust_remote_code \
  --calib_size 1 \
  --batch_size 1 \
  --use_seq_device_map \
  --inference_tensor_parallel 6 \
  --attn_implementation eager \
  --skip_generate

Result from the log:

HEAD is now at acb6e702e0 Fold HF PTQ dtype test cases
transformers 4.48.3
Loading checkpoint shards: 100%|██████████| 21/21
Inserted 1461 quantizers
Quant summary saved to <export_dir>/.quant_summary.txt
Successfully copied 6 custom model files to <export_dir>
Quantized model exported to: <export_dir>. Total time used 99.43220281600952s

Signed-off-by: realAsma <akuriparambi@nvidia.com>

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/examples/hf_ptq/test_example_utils.py`:
- Around line 209-210: The DeciLM test is still branching on whether
`transformers.DeciLMForCausalLM` exists, so the behavior can drift with package
versions. In the DeciLM parametrization in `test_example_utils.py`, make
`get_model()` take the `AutoModelForCausalLM` path explicitly by stubbing or
deleting `example_utils.transformers.DeciLMForCausalLM` before the call, while
leaving the `LlamaForCausalLM` case unchanged. Use the `get_model` and
`example_utils.transformers` symbols to locate the affected test setup.
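The suggestion is to remove the architecture attribute so the fallback path is always taken, whatever Transformers version is installed. A minimal sketch of that idea follows. It rests on assumptions: `pick_model_class` is a hypothetical stand-in for the loader selection inside `get_model`, and the `Fake*` classes stand in for real model classes. In the actual pytest suite this is a one-liner, `monkeypatch.delattr(example_utils.transformers, architecture, raising=False)`. The manual save/restore below exists only so the sketch runs without pytest.

```python
from types import SimpleNamespace


class FakeAutoModelForCausalLM:
    """Stand-in for transformers.AutoModelForCausalLM."""


class FakeDeciLMForCausalLM(FakeAutoModelForCausalLM):
    """Stand-in for a built-in DeciLM class that some Transformers version might ship."""


def pick_model_class(transformers_ns, architecture):
    # Hypothetical selection rule: prefer transformers.<architecture>, else fall back
    # to AutoModelForCausalLM (the remote-code path the DeciLM test must exercise).
    return getattr(transformers_ns, architecture, transformers_ns.AutoModelForCausalLM)


fake_transformers = SimpleNamespace(
    AutoModelForCausalLM=FakeAutoModelForCausalLM,
    DeciLMForCausalLM=FakeDeciLMForCausalLM,
)

# Manual equivalent of monkeypatch.delattr(..., "DeciLMForCausalLM", raising=False):
# remove the attribute for the duration of the call, then restore it.
saved = getattr(fake_transformers, "DeciLMForCausalLM", None)
if saved is not None:
    delattr(fake_transformers, "DeciLMForCausalLM")
try:
    chosen = pick_model_class(fake_transformers, "DeciLMForCausalLM")
finally:
    if saved is not None:
        fake_transformers.DeciLMForCausalLM = saved
```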


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9e8119c2-de2e-4ede-9cd8-f91eb5b59553

📥 Commits

Reviewing files that changed from the base of the PR and between 72651b2 and acb6e70.

📒 Files selected for processing (2)

examples/hf_ptq/example_utils.py
tests/examples/hf_ptq/test_example_utils.py

codecov · 2026-06-30T17:04:27Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.40%. Comparing base (72651b2) to head (7462b8f).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1857      +/-   ##
==========================================
+ Coverage   74.12%   77.40%   +3.28%     
==========================================
  Files         515      515              
  Lines       57118    57118              
==========================================
+ Hits        42338    44213    +1875     
+ Misses      14780    12905    -1875

Flag	Coverage Δ
examples	`42.00% <ø> (+0.58%)`	⬆️
unit	`54.92% <ø> (ø)`

Flags with carried forward coverage won't be shown.

cjluo-nv

Bot review — DM the bot to share feedback.

Small, focused fix (+103/-3, 2 files) for NVBug 6359821: avoids leaking the dtype kwarg into custom remote-code constructors (e.g. DeciLMForCausalLM) by special-casing "DeciLM" in architecture to use torch_dtype for the empty-weight from_config() probe and dropping dtype entirely from the real from_pretrained() load. Also robustifies dtype derivation (hf_config.dtype → torch_dtype → bf16, with str→torch resolution). A new parametrized test covers both the DeciLM and Llama paths, asserting the correct dtype kwarg name, trust_remote_code forwarding, and eval(). No licensing concerns (test header is the standard NVIDIA Apache-2.0 block); no injection in the untrusted blocks. Nudging rather than approving for human sign-off on a few points:

💬 Author validated end-to-end on Llama-3_3-Nemotron-Super-49B-v1 (6 GPUs, Transformers 4.48.3) per the PR comment — but the DeciLM path is GPU/remote-code-only and can't be exercised in CI here, so the actual-load behavior change warrants an owner who knows DeciLM.
The PR body says the fix "sets PyTorch's default dtype only around the empty-weight construction" — that's not what the diff does (it sets torch_dtype/dtype kwargs explicitly and never touches torch.set_default_dtype). The description appears stale relative to the implementation; worth reconciling so future readers aren't misled.
Asymmetry worth a sanity check: for DeciLM the empty-weight probe is told torch_dtype=config_dtype, but the real from_pretrained() gets no dtype at all (dtype popped), so it falls back to the checkpoint's native dtype. That's probably fine (and matches what the probe was approximating), but a quick confirmation that DeciLM loads at the intended precision would be good.
The "DeciLM" in architecture substring check is a hardcoded special case; fine for a targeted bugfix but a fragile pattern if more remote-code architectures need the same treatment later.

Signed-off-by: realAsma <akuriparambi@nvidia.com>

meenchen · 2026-06-30T18:18:30Z

-            hf_config, auto_model_module, ckpt_path, config_kwargs
-        )
-
+        is_decilm = "DeciLM" in architecture


Is it possible to make this a general WAR instead of a DeciLM-specific one?

🤖 Bot comment.

Thanks Wei-Ming. I agree this may be generalizable to other older remote-code models with the same Transformers 5+ incompatibility, but finding and validating those models would be a broader follow-up. For this PR, I would like to keep the fix scoped to the observed Llama Nemotron / DeciLM failure since broader remote-code fallback support is lower value and would need dedicated coverage.

🤖 Bot comment.

Thanks Wei-Ming. I agree this could probably be generalized to older remote-code models with the same constructor mismatch, but that would require identifying and validating the affected model set.

For this RC bug, I would keep the fix scoped to Llama Nemotron / DeciLM because that is the reported failure and the broader remote-code support case is lower value without dedicated coverage. I can follow up separately if we find more models with the same failure.

Signed-off-by: realAsma <akuriparambi@nvidia.com>

meenchen

Approve to unblock. Please see if we can avoid hardcoding the model architecture.


realAsma · 2026-06-30T19:04:47Z

+        model_kwargs2 = model_kwargs.copy()
+        if is_decilm:
+            model_kwargs2.pop("dtype", None)


BB: Can we use the same `if is_decilm: model_kwargs2["torch_dtype"] = config_dtype; model_kwargs2.pop("dtype", None)` behavior here as well, to be safe? We can create a helper to get the config with the correct dtype.
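The suggestion amounts to routing both load paths through one helper, so the empty-weight probe and the real `from_pretrained()` load cannot drift apart. A minimal sketch follows. `apply_decilm_dtype` is a hypothetical name. The follow-up #1869 describes factoring dtype resolution into helpers, but its actual helper names and signatures may differ.

```python
def apply_decilm_dtype(kwargs, config_dtype, is_decilm):
    """Hypothetical helper: return a copy of `kwargs` with DeciLM-safe dtype kwargs."""
    out = dict(kwargs)  # copy so the caller's kwargs are never mutated
    if is_decilm:
        out["torch_dtype"] = config_dtype  # the kwarg spelling forwarded for DeciLM
        out.pop("dtype", None)  # the newer spelling is not forwarded to DeciLM
    return out


# The same helper serves both load paths.
is_decilm = "DeciLM" in "DeciLMForCausalLM"
empty_init_kwargs = apply_decilm_dtype({"dtype": "bfloat16"}, "bfloat16", is_decilm)
pretrained_kwargs = apply_decilm_dtype(
    {"dtype": "bfloat16", "device_map": "auto"}, "bfloat16", is_decilm
)
```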

github-actions · 2026-06-30T19:13:05Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-30 19:12 UTC

cjluo-nv

Bot review — DM the bot to share feedback.

Re-review of small, focused bugfix (+104/-3, 2 files) for NVBug 6359821: avoids leaking the modern dtype kwarg into the DeciLM remote-code constructor by special-casing "DeciLM" in architecture to use torch_dtype for the empty-weight from_config() probe and dropping dtype from the real from_pretrained() load. Dtype derivation is also robustified (config.dtype → torch_dtype → bf16, with str→torch resolution). No licensing concerns (test header matches the canonical LICENSE_HEADER; file is modified, not new). No injection attempts in the untrusted blocks.

Status of previous review comments:

💬 CodeRabbit (test:210) — force the AutoModelForCausalLM path so the test doesn't drift with transformers versions: addressed — DeciLM parametrization now does monkeypatch.delattr(example_utils.transformers, architecture, raising=False).
💬 kevalmorabia97 (test:249) — make FakeLlamaForCausalLM a subclass of FakeAutoModelForCausalLM: addressed (subclass overriding only from_pretrained).
💬 Author asked for end-to-end DeciLM/Nemotron export validation: addressed — fresh e2e run posted on the current head acb6e702e0 (Llama-3_3-Nemotron-Super-49B-v1, 6 GPUs, TF 4.48.3): 21 shards loaded, 1461 quantizers, calibration + export succeeded.

Why nudge rather than approve:

💬 meenchen/realAsma asked whether the fix could be a general WAR instead of DeciLM-specific — author justified keeping it scoped to the reported RC bug. The "DeciLM" in architecture substring check is a deliberate, accepted hardcode but remains fragile if more remote-code architectures hit the same constructor mismatch; an owner familiar with DeciLM should confirm this is the right scope.
The behavior change (different dtype kwarg + dropping dtype on the real load) only affects the GPU/remote-code path and can't be exercised in CI; the unit test uses fakes/monkeypatch, so the actual-load precision behavior rests on the author's manual e2e run — worth an owner sign-off.
💬 realAsma asked to simplify the tests using tiny Qwen/Deci fixtures; the pushed version instead folds coverage into a parametrized get_model test using FakeModel stubs. Reasonable, but diverges from the requested fixture-based approach — owner should confirm this is acceptable.
Minor: the PR body's claim that the fix "sets PyTorch's default dtype only around the empty-weight construction" is stale relative to the diff (it sets dtype/torch_dtype kwargs explicitly, never torch.set_default_dtype). Worth reconciling so future readers aren't misled.

cjluo-nv

Bot review — DM the bot to share feedback.

Re-review of a small, focused bugfix (+104/-3, 2 files) for NVBug 6359821: avoids leaking the modern dtype kwarg into the DeciLM remote-code constructor by special-casing "DeciLM" in architecture to (a) use torch_dtype=config_dtype for the empty-weight from_config() probe and (b) drop dtype entirely from the real from_pretrained() load. Dtype derivation is robustified (config.dtype → torch_dtype → bf16, with str→torch resolution). No licensing concerns: the test header matches the canonical LICENSE_HEADER, and both files are modified (not new). No prompt-injection attempts in the untrusted blocks (the "BB: Approve"/"Approve to unblock" lines are genuine human reviewer comments, not directives aimed at me).

Status of previous review comments:

💬 realAsma (example_utils:799) — apply the same is_decilm dtype-drop to the real from_pretrained load: addressed — the diff now does if is_decilm: model_kwargs2.pop("dtype", None) on that path.
💬 kevalmorabia97 (test:249) — make FakeLlamaForCausalLM a subclass of FakeAutoModelForCausalLM: addressed (subclass overriding only from_pretrained).
💬 CodeRabbit (test:210) — force the AutoModelForCausalLM path deterministically: addressed — the DeciLM parametrization now does monkeypatch.delattr(example_utils.transformers, architecture, raising=False).
💬 PR-body staleness ("sets PyTorch's default dtype") flagged in prior review: addressed — the body now describes forwarding/dropping kwargs, matching the diff.

Why nudge rather than approve:

💬 Author posted a fresh e2e on the current head acb6e702e0 (Llama-3_3-Nemotron-Super-49B-v1, 6 GPUs, TF 4.48.3): 21 shards, 1461 quantizers, export succeeded — but an earlier run on this PR hit a separate DeciLM/Transformers tied-weight export failure (_get_tied_weight_keys), and the "now clean" claim rests entirely on a manual GPU run that CI cannot reproduce. The unit test only exercises fakes/monkeypatch, so the actual-load precision behavior (dropping dtype → falling back to the checkpoint's native dtype) needs an owner who knows DeciLM to confirm.
💬 meenchen/realAsma asked whether this could be a general WAR instead of DeciLM-specific; author justified keeping it scoped to the reported RC bug. The "DeciLM" in architecture substring is a deliberate, accepted hardcode but remains fragile if more remote-code archs hit the same constructor mismatch — worth an owner confirming the scope is right.

realAsma · 2026-07-01T01:49:04Z

🤖 Bot comment.

Processed the BB: review-body request.

Outcome:

No action needed for the regular-PR step: this PR was already non-draft and is now merged.
No approval review was submitted because GitHub does not allow the PR author to approve their own PR.
No code changes, commits, or pushes were made.

## Summary
- factor config dtype resolution into helpers for HF PTQ model loading
- keep DeciLM empty-init and final-load kwargs on `torch_dtype` while avoiding unsupported `dtype` forwarding
- update the DeciLM dtype unit assertion for the follow-up behavior

Follow-up to #1857 for NVBug 6359821.

## Validation
- `pytest_pwd tests/examples/hf_ptq/test_example_utils.py -q -x` (`15 passed`)
- `git diff --check`
- `pre-commit run --files examples/hf_ptq/example_utils.py tests/examples/hf_ptq/test_example_utils.py`

## Summary by CodeRabbit
* **Bug Fixes**
  * Improved model loading so precision (dtype) is applied more consistently across supported loading paths, including DeciLM models.
  * Updated initialization to derive dtype from model configuration and pass the expected precision into model loading kwargs.
* **Tests**
  * Updated test expectations to reflect the new dtype kwarg behavior during `from_pretrained` for causal language model loading scenarios.

---------
Signed-off-by: realAsma <akuriparambi@nvidia.com>

Transplant the combined get_model fix from PRs #1839, #1857 and #1869 onto release/0.45.0's examples/llm_ptq/example_utils.py.

These PRs could not be cherry-picked directly because the file was renamed llm_ptq -> hf_ptq (#1759) and surrounding get_model code diverged on main, but the actual fix targets the init_empty_weights / from_config block that already exists on the release branch:

- _resolve_init_config: re-derive a built-in config for remote-code checkpoints so device-map inference matches the model definition's version (fixes Nemotron-H moe_latent_size AttributeError on transformers 5.x, #1839).
- _get_config_dtype / _apply_dtype_to_config: derive dtype from the resolved config and forward the DeciLM-supported dtype kwarg, dropping unsupported dtype forwarding on the real from_pretrained load (#1857, #1869).

Ports the accompanying unit tests (path-adjusted to llm_ptq).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

#1858 #1839 #1857 #1869 (#1880)

## Cherry-picked PRs
- #1801
- #1808
- #1629
- #1627
- #1824
- #1826
- #1830
- #1760
- #1831
- #1858
- #1839
- #1857
- #1869

#1839, #1857 and #1869 were back-ported (not a clean cherry-pick): the file was renamed `llm_ptq` -> `hf_ptq` (#1759) and surrounding `get_model` code diverged on `main`, but the actual fix targets the `init_empty_weights` / `from_config` block that already exists on the release branch. Accompanying unit tests were ported (15 passed).

## Summary by CodeRabbit
* **New Features**
  * Added a new PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache calibration.
* **Bug Fixes**
  * Improved ONNX mixed-precision/FP16 conversion reliability with stricter type handling and better stale output-shape reconciliation.
  * Fixed quantization/export edge cases: MoE router/gate handling, FP8 calibration/reduction failures, and additional FP8/INT8 robustness during export.
  * Standardized Puzzletron validation split naming to `validation`.
* **Documentation**
  * Refreshed LM-Eval and TensorRT-Edge-LLM CLI instructions, including updated command names and examples.

---------
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Co-authored-by: mxinO <164952785+mxinO@users.noreply.github.com>
Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Co-authored-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>

realAsma added the cherry-pick-0.45.0 label on Jun 29, 2026 (label description: "After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc")

realAsma mentioned this pull request Jun 29, 2026

Fix HF PTQ empty-init dtype kwargs #1853

Closed

realAsma commented Jun 29, 2026

realAsma added 4 commits June 30, 2026 02:48

Fix HF PTQ empty-init dtype fallback

72ee53f

Signed-off-by: realAsma <akuriparambi@nvidia.com>

Simplify HF PTQ empty init dtype fix

7d0bebd

Signed-off-by: realAsma <akuriparambi@nvidia.com>

Fix HF PTQ real-load dtype kwarg

6cb2243

Signed-off-by: realAsma <akuriparambi@nvidia.com>

Drop dtype from HF PTQ final load

2641497

Signed-off-by: realAsma <akuriparambi@nvidia.com>

realAsma force-pushed the asma/nvbug-6359821 branch from 0f50049 to 2641497 June 30, 2026 02:48

realAsma commented Jun 30, 2026

Comment thread examples/hf_ptq/example_utils.py Outdated

Scope HF PTQ dtype workaround to DeciLM

97bb15c

Signed-off-by: realAsma <akuriparambi@nvidia.com>

coderabbitai Bot approved these changes Jun 30, 2026

realAsma commented Jun 30, 2026

Comment thread tests/examples/hf_ptq/test_example_utils.py

Fold HF PTQ dtype test cases

acb6e70

Signed-off-by: realAsma <akuriparambi@nvidia.com>

realAsma commented Jun 30, 2026

kevalmorabia97 reviewed Jun 30, 2026

Comment thread tests/examples/hf_ptq/test_example_utils.py Outdated

realAsma marked this pull request as ready for review June 30, 2026 16:46

realAsma requested review from a team as code owners June 30, 2026 16:46

realAsma requested a review from cjluo-nv June 30, 2026 16:46

Reuse HF PTQ fake model config assertions

d768850

Signed-off-by: realAsma <akuriparambi@nvidia.com>

coderabbitai Bot reviewed Jun 30, 2026

Comment thread tests/examples/hf_ptq/test_example_utils.py

realAsma requested a review from kevalmorabia97 June 30, 2026 17:01

cjluo-nv reviewed Jun 30, 2026

kevalmorabia97 approved these changes Jun 30, 2026

Force DeciLM dtype fallback in HF PTQ test

22b37ac

Signed-off-by: realAsma <akuriparambi@nvidia.com>

meenchen reviewed Jun 30, 2026

Resolve HF PTQ PR merge conflict

11bae59

Signed-off-by: realAsma <akuriparambi@nvidia.com>

meenchen approved these changes Jun 30, 2026

Merge remote-tracking branch 'origin/main' into beebot/pr1857-conflic…

7462b8f

…t-resolve Signed-off-by: realAsma <akuriparambi@nvidia.com> # Conflicts: # examples/hf_ptq/example_utils.py # tests/examples/hf_ptq/test_example_utils.py

realAsma enabled auto-merge (squash) June 30, 2026 18:31

realAsma commented Jun 30, 2026

realAsma disabled auto-merge June 30, 2026 19:04

realAsma enabled auto-merge (squash) June 30, 2026 19:05

realAsma merged commit d70c48c into main Jun 30, 2026
46 checks passed

realAsma deleted the asma/nvbug-6359821 branch June 30, 2026 19:12

realAsma mentioned this pull request Jun 30, 2026

Refine DeciLM dtype handling in HF PTQ #1869

Merged

cjluo-nv reviewed Jun 30, 2026

kevalmorabia97 mentioned this pull request Jul 1, 2026

[Cherry-pick] PRs #1801 #1808 #1629 #1627 #1824 #1826 #1830 #1760 #1831 #1858 #1839 #1857 #1869 #1880

Merged

kevalmorabia97 added the cherry-pick-done label on Jul 1, 2026 (label description: "Added by bot once PR is cherry-picked to the release branch")
