Fix weight-only prequant layernorm export by meenchen · Pull Request #1825 · NVIDIA/Model-Optimizer

meenchen · 2026-06-25T17:57:55Z

What does this PR do?

Type of change: Bug fix

Fixes HF export for INT4 blockwise weight-only checkpoints. Previously, the export path relied on the coarse `int4_awq` format label to decide whether to enter AWQ layernorm pre-quant fusion, so it attempted fusion even when the recipe carried no input pre-quant scale state. The fusion gate now checks that every module in the fused group actually carries `_pre_quant_scale` before folding layernorm weights.
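The gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in `unified_export_hf.py`: the names `has_pre_quant_scale` and `should_fuse_layernorm` are hypothetical, the modules are plain stand-in objects rather than real quantized layers, and placing `_pre_quant_scale` on an `input_quantizer` attribute is an assumption made for the example.

```python
from types import SimpleNamespace


def has_pre_quant_scale(module) -> bool:
    """Return True if the module carries a usable pre-quant scale.

    Assumption for this sketch: the scale is stored as `_pre_quant_scale`
    on the module's `input_quantizer`.
    """
    quantizer = getattr(module, "input_quantizer", None)
    return getattr(quantizer, "_pre_quant_scale", None) is not None


def should_fuse_layernorm(modules) -> bool:
    """Fuse only when every module sharing the layernorm output has a scale."""
    modules = list(modules)
    return bool(modules) and all(has_pre_quant_scale(m) for m in modules)


# Stand-ins: an AWQ-style layer with a scale, and a weight-only layer without one.
awq_linear = SimpleNamespace(
    input_quantizer=SimpleNamespace(_pre_quant_scale=[0.5, 2.0])
)
weight_only_linear = SimpleNamespace(input_quantizer=SimpleNamespace())

fuse_awq = should_fuse_layernorm([awq_linear, awq_linear])
fuse_mixed = should_fuse_layernorm([awq_linear, weight_only_linear])
fuse_weight_only = should_fuse_layernorm([weight_only_linear])
```

Checking the state on each module, rather than the format label, means a group is left alone as soon as one member lacks the scale.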

Usage

# Existing recipe path; no usage change.
python examples/llm_ptq/hf_ptq.py \
  --pyt_ckpt_path <hf-model> \
  --recipe general/ptq/int4_blockwise_weight_only \
  --export_path <output>

Testing

/Users/weimingc/miniconda3/envs/modelopt/bin/python -m pytest tests/unit/torch/export/test_unified_export_hf.py -q
/Users/weimingc/miniconda3/envs/modelopt/bin/python -m pytest tests/unit/torch/export -q
Remote GPU e2e smoke: hf_ptq.py with meta-llama/Llama-3.1-8B-Instruct, general/ptq/int4_blockwise_weight_only, --calib_size 64, and --skip_generate completed export and artifact validation.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ✅
Did you update Changelog?: N/A
Did you get Claude approval on this PR?: N/A

Additional Information

The regression test covers the export preprocessing path for the weight-only recipe and verifies layernorm pre-quant fusion is skipped when _pre_quant_scale is absent.
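The shape of such a regression check can be sketched with toy objects. Everything here is hypothetical: `fold_pre_quant_scale` is a stand-in for the export preprocessing step, not a ModelOpt function, and the elementwise multiply into the layernorm weight is an assumed simplification of the fusion.

```python
from types import SimpleNamespace


def fold_pre_quant_scale(layernorm, linears) -> bool:
    """Toy stand-in for the fusion step; returns True if it folded anything."""
    scales = [
        getattr(lin.input_quantizer, "_pre_quant_scale", None) for lin in linears
    ]
    if not scales or any(s is None for s in scales):
        return False  # no scale state: leave the layernorm untouched
    # Assumption: modules sharing an input share one scale, so use the first.
    layernorm.weight = [w * s for w, s in zip(layernorm.weight, scales[0])]
    for lin in linears:
        del lin.input_quantizer._pre_quant_scale  # scale now lives in the layernorm
    return True


def make_linear(scale=None):
    quantizer = SimpleNamespace()
    if scale is not None:
        quantizer._pre_quant_scale = scale
    return SimpleNamespace(input_quantizer=quantizer)


# Weight-only case: no scale anywhere, so the weights must stay as they were.
ln_weight_only = SimpleNamespace(weight=[1.0, 2.0])
before = list(ln_weight_only.weight)
fused_weight_only = fold_pre_quant_scale(
    ln_weight_only, [make_linear(), make_linear(), make_linear()]
)

# AWQ-style case: every module has the scale, so folding applies.
ln_awq = SimpleNamespace(weight=[1.0, 2.0])
fused_awq = fold_pre_quant_scale(
    ln_awq, [make_linear([0.5, 2.0]) for _ in range(3)]
)
```

The key assertion for the weight-only case is that the snapshot taken before the call still equals the layernorm weight afterwards.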

Summary by CodeRabbit

Bug Fixes
- Improved quantized model handling so layernorm fusion only happens when all relevant inputs support the needed pre-quantization scale.
- Prevents unintended layernorm changes in INT4 weight-only models that do not include this scale, helping preserve model behavior during export.

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

coderabbitai · 2026-06-25T17:58:16Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4386b19c-4d7d-452c-b9dd-18c19b6fd7a1

📥 Commits

Reviewing files that changed from the base of the PR and between 64f355e and e08b913.

📒 Files selected for processing (2)

modelopt/torch/export/unified_export_hf.py
tests/unit/torch/export/test_unified_export_hf.py

📝 Walkthrough

Adds a pre-quant-scale presence check to AWQ layernorm fusion in unified HF export and adds a unit test covering the path where that attribute is absent.

Changes

AWQ pre-quant-scale gating

| Layer / File(s) | Summary |
| --- | --- |
| Fusion guard update `modelopt/torch/export/unified_export_hf.py` | Adds `_has_pre_quant_scale(module)` and uses it in `_fuse_shared_input_modules` so shared-input layernorm fusion only runs when every module in the group exposes `_pre_quant_scale`. |
| Resmooth skip test `tests/unit/torch/export/test_unified_export_hf.py` | Imports `SmallQKVModel` and `requantize_resmooth_fused_llm_layers`, then adds a test that quantizes a small QKV model, confirms `_pre_quant_scale` is missing, runs requantize-resmooth, and checks layernorm weights and fusion flags remain unchanged. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

vishalpandya1990
ynankani
kevalmorabia97

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly reflects the main change: fixing pre-quant layernorm export behavior for weight-only checkpoints. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | Changed modelopt/example Python files only add a pre_quant_scale guard and test; no unsafe torch/numpy loads, trust_remote_code, eval/exec, nosec, or dependency changes. |

github-actions · 2026-06-25T18:02:12Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1825/
Built to branch `gh-pages` at 2026-06-25 18:01 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-06-25T18:07:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.37%. Comparing base (e19f793) to head (e08b913).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1825      +/-   ##
==========================================
- Coverage   77.36%   74.37%   -3.00%     
==========================================
  Files         513      513              
  Lines       56891    56894       +3     
==========================================
- Hits        44013    42313    -1700     
- Misses      12878    14581    +1703
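The percentages in the diff above can be reproduced from the Hits and Misses rows. This is a small sketch that assumes coverage is computed as hits divided by hits plus misses, which is consistent with the Lines row (44013 + 12878 = 56891 and 42313 + 14581 = 56894); `coverage_percent` is a hypothetical helper, not part of Codecov.

```python
def coverage_percent(hits: int, misses: int) -> float:
    """Line coverage as a percentage, rounded to two decimals."""
    return round(100.0 * hits / (hits + misses), 2)


# Base (main) and head (#1825) figures taken from the report above.
base_lines = 44013 + 12878
head_lines = 42313 + 14581
base_coverage = coverage_percent(44013, 12878)
head_coverage = coverage_percent(42313, 14581)
added_lines = head_lines - base_lines
```

The three extra lines on the head side match the `+3` in the Lines row.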

| Flag | Coverage Δ | |
| --- | --- | --- |
| examples | `35.62% <33.33%> (-6.60%)` | ⬇️ |
| gpu | `57.95% <100.00%> (-0.65%)` | ⬇️ |
| regression | `14.83% <33.33%> (+0.06%)` | ⬆️ |
| unit | `54.78% <100.00%> (+0.14%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

copy-pr-bot · 2026-06-25T18:12:04Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Fix weight-only prequant layernorm export

e08b913

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

meenchen requested review from a team as code owners June 25, 2026 17:57

meenchen requested a review from cjluo-nv June 25, 2026 17:57

meenchen requested a review from sychen52 June 25, 2026 17:58

meenchen self-assigned this Jun 25, 2026

coderabbitai Bot approved these changes Jun 25, 2026

meenchen marked this pull request as draft June 25, 2026 18:12

meenchen wants to merge 1 commit into main from weimingc/omniml-5271
