Fix AutoQuantize causal LM score scaling by realAsma · Pull Request #1810 · NVIDIA/Model-Optimizer

realAsma · 2026-06-23T22:06:07Z

Summary

Use a sum-reduced shifted-token causal LM loss for HF AutoQuantize gradient scoring when logits and labels are available.
Preserve the existing output.loss fallback for non-causal or non-dict outputs.
Apply the same correction to the HF PTQ, eval, and autodeploy AutoQuantize entrypoints.

Motivation

Running AutoQuantize on Qwen3 8B with bs=1 and with bs=8 over the same 128 scoring samples produced nearly identical normalized layer-sensitivity shapes but very different absolute totals:

NVFP4/W4A4 raw score total: bs1 0.5883269825, bs8 0.008391587796, bs1/bs8 70.109x
FP8 raw score total: bs1 0.05273269751, bs8 0.0007778984065, bs1/bs8 67.789x
Layer Pearson correlation after per-run normalization: NVFP4/W4A4 0.9619, FP8 0.9864

The existing HF callbacks used Transformers output.loss, which is mean-reduced for causal LM labels. AutoQuantize squares gradients and accumulates scores, so grouping samples into larger batches changes the absolute score scale. A sum-reduced next-token loss makes the score additive over valid labels and stable across batch grouping.
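
The scaling argument can be sketched in plain Python. This is a toy model, not the AutoQuantize implementation: it assumes each valid token contributes the square of (loss gradient x perturbation) to the score, and that mean reduction divides every token gradient by the number of valid labels in its batch. The names `score`, `samples`, `bs1`, and `bs8` are hypothetical.

```python
def score(batches, reduction):
    """Accumulate a toy squared-gradient score over a list of batches.

    Each batch is a list of (grad, perturb) pairs, one per valid token, where
    `grad` is the token's gradient under a sum-reduced loss.
    """
    total = 0.0
    for batch in batches:
        # Mean reduction rescales every token gradient by 1 / (tokens in batch).
        scale = 1.0 / len(batch) if reduction == "mean" else 1.0
        total += sum((scale * g * d) ** 2 for g, d in batch)
    return total


# Eight hypothetical samples with four valid tokens each (made-up values).
samples = [[(0.1 * (i + 1), 0.5 + 0.01 * t) for t in range(4)] for i in range(8)]

bs1 = [list(s) for s in samples]              # eight single-sample batches
bs8 = [[tok for s in samples for tok in s]]   # one batch holding all samples

mean_ratio = score(bs1, "mean") / score(bs8, "mean")
sum_ratio = score(bs1, "sum") / score(bs8, "sum")
print(f"mean-reduced bs1/bs8 ratio: {mean_ratio:.3f}")
print(f"sum-reduced  bs1/bs8 ratio: {sum_ratio:.3f}")
```

With equal-length samples, the mean-reduced ratio in this toy model is the square of the batch-size ratio (64 for bs=1 vs bs=8), the same order of magnitude as the 68x to 70x measured above, while the sum-reduced ratio is 1.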

Validation

python_pwd -m py_compile examples/llm_ptq/hf_ptq.py examples/llm_eval/quantization_utils.py examples/llm_autodeploy/run_auto_quantize.py
A targeted CPU probe extracted all three _causal_lm_sum_loss helpers and verified that the summed loss is differentiable and identical for one combined batch versus split single-sample batches, including labels ignored via -100.
git diff --check
pre-commit run --files examples/llm_ptq/hf_ptq.py examples/llm_eval/quantization_utils.py examples/llm_autodeploy/run_auto_quantize.py passes ruff/format/bandit/license checks; mypy currently fails on existing accelerate attribute errors in modelopt/torch/quantization/plugins/{accelerate,huggingface}.py outside this diff
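
The combined-versus-split check can be illustrated with a dependency-free sketch. This is not the PR's `_causal_lm_sum_loss` helper, which operates on model outputs; the function `causal_lm_sum_loss` below and the toy logits and labels are assumptions made for illustration, and the sketch does not exercise differentiability.

```python
import math

IGNORE_INDEX = -100  # label value skipped by the loss


def causal_lm_sum_loss(logits, labels):
    """Sum of next-token cross-entropy terms over all valid labels.

    logits: [batch][seq][vocab] nested lists of floats
    labels: [batch][seq] nested lists of ints
    """
    total = 0.0
    for seq_logits, seq_labels in zip(logits, labels):
        # Position t predicts token t + 1, so drop the last logit and first label.
        for step_logits, target in zip(seq_logits[:-1], seq_labels[1:]):
            if target == IGNORE_INDEX:
                continue
            # Numerically stable log-sum-exp for the softmax normalizer.
            peak = max(step_logits)
            log_z = peak + math.log(sum(math.exp(x - peak) for x in step_logits))
            total += log_z - step_logits[target]
    return total


# Two hypothetical sequences over a 3-token vocabulary; the second ends in padding.
logits = [
    [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [1.5, -0.5, 0.0]],
    [[0.0, 1.0, 2.0], [0.3, 0.3, 0.3], [9.0, 9.0, 9.0]],
]
labels = [[0, 1, 2], [2, 0, IGNORE_INDEX]]

combined = causal_lm_sum_loss(logits, labels)
split = sum(causal_lm_sum_loss([lg], [lb]) for lg, lb in zip(logits, labels))
print(math.isclose(combined, split, rel_tol=1e-12))
```

Because no per-batch divisor appears anywhere in the sum, regrouping samples only changes the order in which the same terms are added.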

copy-pr-bot · 2026-06-23T22:06:11Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

coderabbitai · 2026-06-23T22:06:16Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

realAsma

Can you add a unit test for the tinyqwen transformers model to make sure that, after your fix, bs = 1 and, say, bs = 2 produce similar scores?

Create a 2-sample calibration dataset for this: sample 1 with length 16 and sample 2 with length 32 (so bs = 2 means padding is activated).

Signed-off-by: realAsma <akuriparambi@nvidia.com>

realAsma · 2026-06-23T22:33:24Z

🤖 Bot comment.

Addressed the TinyQwen/Tiny Qwen3 regression test request in 71257f9ffa.

What changed:

Added test_autoquantize_huggingface_scores_are_batch_size_invariant_with_padding in tests/unit/torch/quantization/plugins/test_huggingface.py.
The test uses two calibration samples with lengths 16 and 32.
It compares the same samples as bs=1 singleton batches versus one bs=2 padded batch and asserts AutoQuantize candidate scores match within tight tolerance.
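
A small sketch of why the 16/32 length pair exercises the bug. It assumes right padding with padded labels set to -100; `PAD_ID`, `pad_batch`, and `valid_label_count` are hypothetical names, and the actual test may build its batches differently.

```python
IGNORE_INDEX = -100
PAD_ID = 0  # hypothetical pad token id


def pad_batch(samples):
    """Right-pad token-id lists to a common length and mask padded labels."""
    width = max(len(s) for s in samples)
    input_ids = [s + [PAD_ID] * (width - len(s)) for s in samples]
    labels = [s + [IGNORE_INDEX] * (width - len(s)) for s in samples]
    return input_ids, labels


def valid_label_count(labels):
    """Count shifted labels (positions 1..end) that are not ignored."""
    return sum(1 for row in labels for tok in row[1:] if tok != IGNORE_INDEX)


short_ids = list(range(1, 17))  # 16 tokens
long_ids = list(range(1, 33))   # 32 tokens

_, labels_bs2 = pad_batch([short_ids, long_ids])
counts_bs1 = [valid_label_count(pad_batch([s])[1]) for s in (short_ids, long_ids)]
count_bs2 = valid_label_count(labels_bs2)
print(counts_bs1, count_bs2)
```

Under these assumptions, a mean-reduced loss would divide by 15 and 31 for the two bs=1 batches but by 46 for the padded bs=2 batch, so the per-token gradient scale differs between groupings; a sum-reduced loss applies no divisor in either case.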

Validation run locally:

git diff --check
python_pwd PYTHONPATH=tests -m py_compile tests/unit/torch/quantization/plugins/test_huggingface.py
ruff format --check tests/unit/torch/quantization/plugins/test_huggingface.py
ruff check tests/unit/torch/quantization/plugins/test_huggingface.py
pytest_pwd PYTHONPATH=tests tests/unit/torch/quantization/plugins/test_huggingface.py::test_autoquantize_huggingface_scores_are_batch_size_invariant_with_padding -q

Signed-off-by: realAsma <akuriparambi@nvidia.com>

codecov · 2026-06-23T22:47:54Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.77%. Comparing base (54fb87e) to head (2a2163e).
⚠️ Report is 130 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1810      +/-   ##
==========================================
- Coverage   73.22%   68.77%   -4.46%     
==========================================
  Files         478      511      +33     
  Lines       52421    64138   +11717     
==========================================
+ Hits        38387    44112    +5725     
- Misses      14034    20026    +5992

Flag	Coverage Δ
unit	`54.66% <ø> (+1.04%)`	⬆️

Signed-off-by: realAsma <akuriparambi@nvidia.com>

realAsma commented Jun 23, 2026

Comment thread examples/llm_ptq/hf_ptq.py

Fix AutoQuantize causal LM score scaling

71257f9

Signed-off-by: realAsma <akuriparambi@nvidia.com>

realAsma force-pushed the akuriparambi/fix-aq-score-scaling-20260623 branch from ccdc827 to 71257f9 on June 23, 2026 22:33

coderabbitai Bot approved these changes Jun 23, 2026

Refine AutoQuantize batch-size regression test

ae34e30

Signed-off-by: realAsma <akuriparambi@nvidia.com>

Use HF num_items_in_batch for AQ sum loss

2a2163e

Signed-off-by: realAsma <akuriparambi@nvidia.com>

realAsma wants to merge 3 commits into main from akuriparambi/fix-aq-score-scaling-20260623
