Support INT block scale learning #1795

Draft
realAsma wants to merge 3 commits into asma/laq-algorithm from asma/laq-int3-int2-scale-learning

@realAsma realAsma commented Jun 22, 2026

Contributor

What does this PR do?

Type of change: new feature, new tests

Adds minimal INT block scale-learning support for the LAQ investigation branch:

  • Converts eligible static INT block quantizers to StaticBlockScaleQuantizer after max, mse, or local_hessian scale initialization so LAQ can start from non-max initializers.
  • Adds a fake dynamic INT block quantization path for weight-only dynamic max scale baselines with integer num_bits and block_sizes: {"type": "dynamic"}.
  • Extends LAQ CPU unit coverage for INT3/INT2 static block max/mse initialization, frozen/original/dual/tied amax variants, and dynamic max weight-only forward.

Usage

quant_cfg = {
    "quant_cfg": [
        {"quantizer_name": "*", "enable": False},
        {
            "quantizer_name": "*weight_quantizer",
            "enable": True,
            "cfg": {"num_bits": 3, "block_sizes": {-1: 16, "type": "static"}},
        },
    ],
    "algorithm": {
        "method": "laq",
        "learnable_amax": ["pre", "post"],
        "tied_amax": False,
        "scale_algorithm": {"method": "mse"},
    },
}
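
For reviewers unfamiliar with MSE-based scale initialization, here is a minimal pure-Python sketch of the idea for a single block: try several candidate amax values at or below the block's max and keep the one with the lowest quantize-dequantize squared error. This is not the code in this PR. The function names, the candidate grid, and the signed clamp range `[-qmax - 1, qmax]` are all assumptions made for illustration.

```python
def quant_dequant(block, amax, num_bits):
    """Fake-quantize one block with a symmetric signed grid (assumed convention)."""
    qmax = 2 ** (num_bits - 1) - 1   # 3 for INT3, 1 for INT2; assumes num_bits >= 2
    qmin = -qmax - 1                 # assumption: full signed range, not narrow range
    step = amax / qmax
    return [min(max(round(v / step), qmin), qmax) * step for v in block]


def squared_error(block, amax, num_bits):
    """Sum of squared differences between the block and its fake-quantized copy."""
    deq = quant_dequant(block, amax, num_bits)
    return sum((v - d) ** 2 for v, d in zip(block, deq))


def mse_init_amax(block, num_bits=3, num_candidates=20):
    """Pick the candidate amax with the lowest squared error (hypothetical helper)."""
    max_amax = max(abs(v) for v in block)
    if max_amax == 0.0:
        return 0.0  # all-zero block: nothing to search
    # Candidates are evenly spaced fractions of the max; the last one is the max itself.
    candidates = [max_amax * (i / num_candidates) for i in range(1, num_candidates + 1)]
    return min(candidates, key=lambda a: squared_error(block, a, num_bits))


block = [1.0, 0.5, 0.5, 0.5]
best = mse_init_amax(block, num_bits=2)
print(best, squared_error(block, best, 2), squared_error(block, 1.0, 2))
```

Because the max is always among the candidates, the selected amax can never have a higher squared error than max initialization on that block. At INT2, clipping an outlier often lowers the error noticeably, which is one reason a non-max initializer can be a better starting point for learned scales.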

Dynamic max scale baseline:

quant_cfg = {
    "quant_cfg": [
        {"quantizer_name": "*", "enable": False},
        {
            "quantizer_name": "*weight_quantizer",
            "enable": True,
            "cfg": {"num_bits": 2, "block_sizes": {-1: 16, "type": "dynamic"}},
        },
    ],
    "algorithm": "max",
}
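
As a rough illustration of what a fake dynamic INT block quantizer with a max scale computes, the sketch below quantizes and dequantizes a flat list of weights block by block, deriving each block's amax from the data at call time. It is a pure-Python approximation written for this description, not the implementation in `tensor_quant.py`. The function name and the signed clamp range `[-qmax - 1, qmax]` are assumptions.

```python
def fake_quant_dynamic_block(values, num_bits=2, block_size=16):
    """Quantize-dequantize `values` in contiguous blocks using a per-block max scale."""
    if len(values) % block_size != 0:
        raise ValueError("len(values) must be a multiple of block_size")
    qmax = 2 ** (num_bits - 1) - 1   # 1 for INT2, 3 for INT3; assumes num_bits >= 2
    qmin = -qmax - 1                 # assumption: full signed range
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)   # "dynamic": computed from the data itself
        if amax == 0.0:
            out.extend(0.0 for _ in block)  # avoid dividing by a zero step
            continue
        step = amax / qmax
        for v in block:
            q = min(max(round(v / step), qmin), qmax)  # integer code
            out.append(q * step)                       # dequantized value
    return out


print(fake_quant_dynamic_block([0.9, -0.3, 0.0, 0.0, 4.0, -4.0], num_bits=2, block_size=2))
```

With a max scale, the largest-magnitude element of each block is always representable exactly, but at INT2 every other element collapses onto one of at most four levels per block.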

QAD experiment plan

The QAD experiment plan is intentionally kept out of the PR description and
shared separately with the owner for review before any QAD jobs are launched.

Testing

  • python_pwd -m ruff check modelopt/torch/quantization/tensor_quant.py modelopt/torch/quantization/nn/modules/tensor_quantizer.py tests/unit/torch/quantization/test_laq.py
  • pytest_pwd tests/unit/torch/quantization/test_laq.py -q
  • pre-commit run --files modelopt/torch/quantization/tensor_quant.py modelopt/torch/quantization/nn/modules/tensor_quantizer.py tests/unit/torch/quantization/test_laq.py
  • pytest_pwd tests/unit/torch/quantization/test_laq.py -q (re-run after pre-commit)

Before your PR is "Ready for review"

  • Is this change backward compatible?: Yes
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: Yes
  • Did you update Changelog?: N/A
  • Did you get Claude approval on this PR?: N/A, draft PR for experiment-plan review

Additional Information

Draft PR targeting the LAQ branch for owner review before launching QAD jobs.

Signed-off-by: realAsma <akuriparambi@nvidia.com>

copy-pr-bot Bot commented Jun 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

coderabbitai Bot commented Jun 22, 2026

Review skipped: draft detected.

@realAsma

Contributor Author

"Keep tied Dual LSQ as a second-pass reducer if independent Dual LSQ is unstable or too expensive."

What does this mean? It is not clear to me.

@realAsma

Contributor Author

"Train CE loss.
Eval CE loss.
Eval KL/KD loss when teacher logits are available."

Please fix: QAD uses KL divergence as the training loss.

@realAsma

Contributor Author

"Export compatibility status for static INT block variants."

Please ignore this.

Comment thread modelopt/torch/quantization/nn/modules/tensor_quantizer.py Outdated
@realAsma

Contributor Author

🤖 Bot comment.

Addressed the QAD-plan comments in the PR body:

  • Removed the unclear tied-Dual-LSQ fallback wording from the first grid.
  • Updated metrics so QAD training KL/KD loss is the primary training objective, with train CE only called out if logged separately.
  • Removed export compatibility from the reviewed metrics list as requested.
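
To make the distinction between the two metrics concrete, the sketch below computes a cross-entropy loss against a hard label and a KL divergence against teacher logits for a single token position. It is illustrative only: the function names are hypothetical, and the KL direction (teacher relative to student) is an assumption, since the plan does not state which direction the QAD loss uses.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cross_entropy(student_logits, target_index):
    """CE against a hard label: negative log-probability of the target class."""
    return -log_softmax(student_logits)[target_index]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) computed from logits (assumed distillation direction)."""
    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t, s))


teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.8]
print(cross_entropy(student, target_index=0), kl_divergence(teacher, student))
```

The KL term is zero only when the student distribution matches the teacher's, whereas CE depends on the label alone, so the two can move differently during distillation and are worth logging separately.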

Plan comments addressed: #1795 (comment), #1795 (comment), #1795 (comment)

Signed-off-by: realAsma <akuriparambi@nvidia.com>
@realAsma realAsma force-pushed the asma/laq-int3-int2-scale-learning branch from ec72e81 to 4f44252 Compare June 22, 2026 23:43
Signed-off-by: realAsma <akuriparambi@nvidia.com>