Fix weight-only prequant layernorm export#1825

Draft
meenchen wants to merge 1 commit into main from weimingc/omniml-5271
@meenchen meenchen commented Jun 25, 2026

What does this PR do?

Type of change: Bug fix

Fixes HF export for INT4 blockwise weight-only checkpoints. The export path previously used the coarse int4_awq format label to enter AWQ layernorm pre-quant fusion, even when the recipe had no input pre-quant scale state. The fusion gate now checks that all fused modules actually carry _pre_quant_scale before folding layernorm weights.
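As a rough illustration of the gate described above, the sketch below decides whether to fuse from the state the modules actually carry rather than from a format label. It is not the code from this PR: the names has_pre_quant_scale and can_fuse_layernorm are hypothetical, the modules are plain stand-in objects, and the assumption that the scale is stored at module.input_quantizer._pre_quant_scale is ours.

```python
from types import SimpleNamespace


def has_pre_quant_scale(module) -> bool:
    """Return True if the module carries a pre-quant scale.

    Assumption (not confirmed by the PR page): the scale is stored as
    `module.input_quantizer._pre_quant_scale`.
    """
    quantizer = getattr(module, "input_quantizer", None)
    return getattr(quantizer, "_pre_quant_scale", None) is not None


def can_fuse_layernorm(modules) -> bool:
    """Allow fusion only when every module sharing the input has a scale."""
    return len(modules) > 0 and all(has_pre_quant_scale(m) for m in modules)


# Stand-ins for quantized linear layers that share one layernorm output.
awq_linear = SimpleNamespace(
    input_quantizer=SimpleNamespace(_pre_quant_scale=[1.0, 0.5])
)
weight_only_linear = SimpleNamespace(input_quantizer=SimpleNamespace())

fuse_awq = can_fuse_layernorm([awq_linear, awq_linear])
fuse_weight_only = can_fuse_layernorm([weight_only_linear, weight_only_linear])
fuse_mixed = can_fuse_layernorm([awq_linear, weight_only_linear])
```

In this sketch a group with even one scale-less module is left unfused, which matches the "all fused modules" wording in the description.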

Usage

# Existing recipe path; no usage change.
python examples/llm_ptq/hf_ptq.py \
  --pyt_ckpt_path <hf-model> \
  --recipe general/ptq/int4_blockwise_weight_only \
  --export_path <output>

Testing

  • /Users/weimingc/miniconda3/envs/modelopt/bin/python -m pytest tests/unit/torch/export/test_unified_export_hf.py -q
  • /Users/weimingc/miniconda3/envs/modelopt/bin/python -m pytest tests/unit/torch/export -q
  • Remote GPU e2e smoke: hf_ptq.py with meta-llama/Llama-3.1-8B-Instruct, general/ptq/int4_blockwise_weight_only, --calib_size 64, and --skip_generate completed export and artifact validation.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A
  • Did you get Claude approval on this PR?: N/A

Additional Information

The regression test covers the export preprocessing path for the weight-only recipe and verifies layernorm pre-quant fusion is skipped when _pre_quant_scale is absent.

Summary by CodeRabbit

  • Bug Fixes
    • Improved quantized model handling so layernorm fusion only happens when all relevant inputs support the needed pre-quantization scale.
    • Prevents unintended layernorm changes in INT4 weight-only models that do not include this scale, helping preserve model behavior during export.

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
@meenchen meenchen requested review from a team as code owners June 25, 2026 17:57
@meenchen meenchen requested a review from cjluo-nv June 25, 2026 17:57
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4386b19c-4d7d-452c-b9dd-18c19b6fd7a1

📥 Commits

Reviewing files that changed from the base of the PR and between 64f355e and e08b913.

📒 Files selected for processing (2)
  • modelopt/torch/export/unified_export_hf.py
  • tests/unit/torch/export/test_unified_export_hf.py

📝 Walkthrough

Walkthrough

Adds a pre-quant-scale presence check to AWQ layernorm fusion in unified HF export and adds a unit test covering the path where that attribute is absent.

Changes

AWQ pre-quant-scale gating

| Layer / File(s) | Summary |
| --- | --- |
| Fusion guard update: modelopt/torch/export/unified_export_hf.py | Adds _has_pre_quant_scale(module) and uses it in _fuse_shared_input_modules so shared-input layernorm fusion only runs when every module in the group exposes _pre_quant_scale. |
| Resmooth skip test: tests/unit/torch/export/test_unified_export_hf.py | Imports SmallQKVModel and requantize_resmooth_fused_llm_layers, then adds a test that quantizes a small QKV model, confirms _pre_quant_scale is missing, runs requantize-resmooth, and checks layernorm weights and fusion flags remain unchanged. |
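The shape of such a regression check can be sketched with toy objects: snapshot the layernorm weight, run the fusion step, and compare. This is only an illustration under stated assumptions. It does not use SmallQKVModel or requantize_resmooth_fused_llm_layers, the helper fold_pre_quant_scale is hypothetical, the scale location (input_quantizer._pre_quant_scale) is assumed, and the fusion is simplified to folding the first module's scale instead of resmoothing across modules.

```python
from types import SimpleNamespace


def fold_pre_quant_scale(layernorm, linears) -> bool:
    """Toy fusion: fold a shared pre-quant scale into the layernorm weight.

    Returns True if fusion ran and False if it was skipped.
    """
    scales = [
        getattr(lin.input_quantizer, "_pre_quant_scale", None) for lin in linears
    ]
    if not scales or any(s is None for s in scales):
        return False  # guard: no scale to fold, leave the layernorm untouched
    shared = scales[0]  # simplification: the real path resmooths the scales
    layernorm.weight = [w * s for w, s in zip(layernorm.weight, shared)]
    for lin in linears:
        lin.input_quantizer._pre_quant_scale = None  # scale now lives in layernorm
    return True


def run_case(scale):
    """Build a toy q/k/v group, run the fusion, and report what happened."""
    layernorm = SimpleNamespace(weight=[1.0, 2.0, 3.0])
    linears = [
        SimpleNamespace(input_quantizer=SimpleNamespace(_pre_quant_scale=scale))
        for _ in range(3)
    ]
    fused = fold_pre_quant_scale(layernorm, linears)
    return fused, layernorm.weight


# Weight-only case: no scale, so the layernorm weight must be unchanged.
weight_only_result = run_case(None)
# AWQ-like case: the scale is folded into the layernorm weight.
awq_result = run_case([2.0, 0.5, 1.0])
```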

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • vishalpandya1990
  • ynankani
  • kevalmorabia97
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly reflects the main change: fixing pre-quant layernorm export behavior for weight-only checkpoints. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | Changed modelopt/example Python files only add a pre_quant_scale guard and test; no unsafe torch/numpy loads, trust_remote_code, eval/exec, nosec, or dependency changes. |

@meenchen meenchen requested a review from sychen52 June 25, 2026 17:58
@meenchen meenchen self-assigned this Jun 25, 2026
@github-actions

PR Preview Action v1.8.1

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1825/

Built to branch gh-pages at 2026-06-25 18:01 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov

codecov Bot commented Jun 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.37%. Comparing base (e19f793) to head (e08b913).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1825      +/-   ##
==========================================
- Coverage   77.36%   74.37%   -3.00%     
==========================================
  Files         513      513              
  Lines       56891    56894       +3     
==========================================
- Hits        44013    42313    -1700     
- Misses      12878    14581    +1703     

| Flag | Coverage Δ | |
| --- | --- | --- |
| examples | 35.62% <33.33%> (-6.60%) | ⬇️ |
| gpu | 57.95% <100.00%> (-0.65%) | ⬇️ |
| regression | 14.83% <33.33%> (+0.06%) | ⬆️ |
| unit | 54.78% <100.00%> (+0.14%) | ⬆️ |

@meenchen meenchen marked this pull request as draft June 25, 2026 18:12
@copy-pr-bot

copy-pr-bot Bot commented Jun 25, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.
