Fix AutoQuantize causal LM score scaling #1810
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
realAsma left a comment:
Can you add a unit test for a tiny Qwen Transformers model to make sure that, after your fix, bs = 1 and, say, bs = 2 produce similar scores?
Create a two-sample calibration dataset for this: sample 1 of length 16 and sample 2 of length 32, so that bs = 2 activates padding.
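A minimal sketch of the property such a test checks, written in plain Python so it runs without downloading a model. `toy_causal_logits`, `collate`, `causal_lm_loss`, and `score_dataset` are hypothetical helpers, not ModelOpt or Transformers APIs. `toy_causal_logits` stands in for the tiny Qwen model, and the check compares summed next-token losses rather than full AutoQuantize scores. When the two samples (16 and 32 tokens) are batched with bs = 2, the shorter one is right-padded and its padded positions get label -100, so the loss skips them.

```python
import math

IGNORE_INDEX = -100  # label value the loss skips (Hugging Face convention)
PAD_ID = 0
VOCAB = 8

def toy_causal_logits(input_ids):
    """Hypothetical stand-in for a causal LM: logits at t depend only on input_ids[: t + 1]."""
    logits, running = [], 0
    for tok in input_ids:
        running += tok
        logits.append([((running * (k + 1)) % 7) / 3.0 for k in range(VOCAB)])
    return logits

def collate(samples):
    """Right-pad a batch to its longest sample; padded label positions become IGNORE_INDEX."""
    width = max(len(s) for s in samples)
    input_ids = [s + [PAD_ID] * (width - len(s)) for s in samples]
    labels = [s + [IGNORE_INDEX] * (width - len(s)) for s in samples]
    return input_ids, labels

def causal_lm_loss(input_ids, labels, reduction="sum"):
    """Next-token cross-entropy: logits at position t are scored against labels[t + 1]."""
    total, n_valid = 0.0, 0
    for row_ids, row_labels in zip(input_ids, labels):
        logits = toy_causal_logits(row_ids)
        for t in range(len(row_ids) - 1):
            target = row_labels[t + 1]
            if target == IGNORE_INDEX:
                continue  # padded (or explicitly masked) label
            row = logits[t]
            m = max(row)  # log-sum-exp with max subtraction for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[target]
            n_valid += 1
    return total if reduction == "sum" else total / n_valid

def score_dataset(samples, batch_size, reduction):
    """Accumulate per-batch losses, as a scorer looping over calibration batches would."""
    return sum(
        causal_lm_loss(*collate(samples[i : i + batch_size]), reduction=reduction)
        for i in range(0, len(samples), batch_size)
    )

# Two calibration samples: 16 and 32 tokens, so bs = 2 forces 16 padded positions.
calib = [
    [(i % (VOCAB - 1)) + 1 for i in range(16)],
    [(3 * i % (VOCAB - 1)) + 1 for i in range(32)],
]

bs1_sum = score_dataset(calib, 1, "sum")
bs2_sum = score_dataset(calib, 2, "sum")
bs1_mean = score_dataset(calib, 1, "mean")
bs2_mean = score_dataset(calib, 2, "mean")
print(bs1_sum, bs2_sum)    # equal up to float rounding
print(bs1_mean, bs2_mean)  # differ: the mean depends on how samples are grouped
```

A real regression test would replace the toy model with the tiny Qwen3 checkpoint and compare AutoQuantize scores with a tolerance, since floating-point summation order differs between bs = 1 and bs = 2.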
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Branch updated from ccdc827 to 71257f9.
Addressed the TinyQwen/Tiny Qwen3 regression test request in …
What changed:
Validation run locally:
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #1810      +/-   ##
==========================================
- Coverage   73.22%   68.77%    -4.46%
==========================================
  Files         478      511       +33
  Lines       52421    64138    +11717
==========================================
+ Hits        38387    44112     +5725
- Misses      14034    20026     +5992

Flags with carried forward coverage won't be shown.
Summary
… output.loss fallback for non-causal or non-dict outputs

Motivation
A Qwen3 8B bs=1 vs bs=8 AutoQuantize run over the same 128 scoring samples showed nearly identical normalized layer sensitivity shapes but very different absolute totals:
- … bs1 0.5883269825, bs8 0.008391587796, bs1/bs8 70.109x
- … bs1 0.05273269751, bs8 0.0007778984065, bs1/bs8 67.789x
- … 0.9619, FP8 0.9864

The existing HF callbacks used the Transformers output.loss, which is mean-reduced over the valid labels of a causal LM batch. AutoQuantize squares gradients and accumulates the scores. With a mean-reduced loss, every gradient in a batch is divided by that batch's number of valid labels, and squaring carries that factor through squared, so grouping samples into larger batches shrinks the absolute score scale. A sum-reduced next-token loss makes the score additive over valid labels and stable across batch grouping.

Validation
- python_pwd -m py_compile examples/llm_ptq/hf_ptq.py examples/llm_eval/quantization_utils.py examples/llm_autodeploy/run_auto_quantize.py
- … _causal_lm_sum_loss helpers and verified that the summed loss is differentiable and identical for one combined batch vs. split single-sample batches, including -100 (ignored) labels
- git diff --check
- pre-commit run --files examples/llm_ptq/hf_ptq.py examples/llm_eval/quantization_utils.py examples/llm_autodeploy/run_auto_quantize.py passes the ruff/format/bandit/license checks; mypy currently fails on existing accelerate attribute errors in modelopt/torch/quantization/plugins/{accelerate,huggingface}.py, outside this diff
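The scaling argument in the Motivation section can be illustrated with a deliberately simplified scorer. This sketch assumes, hypothetically, that the sensitivity score sums squared per-element products of a quantization error and the loss gradient for that output element; `accumulate_score` is an illustrative helper, not ModelOpt's AutoQuantize code. With uniform samples, mean reduction makes the bs=1 total exactly 8² = 64 times the bs=8 total, the same order as the roughly 68-70x ratios measured on Qwen3 8B, while sum reduction leaves the total unchanged. This is suggestive only; the real score may differ in detail.

```python
# Hypothetical, simplified squared-gradient sensitivity score (not ModelOpt's implementation).
# Assumption: score = sum over batches and output elements of (quant_error * d_loss/d_output)^2.

def accumulate_score(samples, batch_size, reduction):
    """samples: per-sample lists of (quant_error, grad_under_sum_loss), one pair per valid label."""
    total = 0.0
    for start in range(0, len(samples), batch_size):
        batch = samples[start : start + batch_size]
        n_valid = sum(len(s) for s in batch)
        # A mean-reduced loss divides every output gradient in the batch by n_valid;
        # a sum-reduced loss leaves each element's gradient independent of batch grouping.
        scale = 1.0 / n_valid if reduction == "mean" else 1.0
        for sample in batch:
            for err, grad in sample:
                total += (err * grad * scale) ** 2
    return total

# 8 samples with 4 valid labels each; unit error and unit gradient per element.
samples = [[(1.0, 1.0)] * 4 for _ in range(8)]

for reduction in ("mean", "sum"):
    bs1 = accumulate_score(samples, 1, reduction)
    bs8 = accumulate_score(samples, 8, reduction)
    print(reduction, bs1, bs8, bs1 / bs8)
# mean 2.0 0.03125 64.0
# sum 32.0 32.0 1.0
```

With real data the mean-reduced ratio also depends on how sequence lengths vary across batches, so it need not be exactly 64.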