MiniMax-M3 mixed MXFP8-base + NVFP4-experts PTQ export by chadvoegele · Pull Request #1806 · NVIDIA/Model-Optimizer

chadvoegele · 2026-06-23T19:43:14Z

What does this PR do?

Type of change: New example + bug fix

Adds an hf_ptq.py-driven pipeline to export MiniMax-M3 as a mixed-precision checkpoint (vendor MXFP8 base + NVFP4 routed experts), plus the detection fix needed to quantize MiniMax-M3's fused experts at all.

Commits

hf_ptq: factor post-parse normalization into prepare_args() — extract the dataset/calib_size splitting and --cast_mxfp4_to_nvfp4 / --specdec_offline_dataset validation out of __main__ into a reusable prepare_args(), and let parse_args(argv) take an explicit argv. CLI behavior unchanged (main(prepare_args(parse_args()))); lets the wrapper drive hf_ptq in-process.
minimax_m3: hf_ptq-driven mixed MXFP8-base + NVFP4-experts exporter — new examples/minimax_m3/hf_ptq_mixed_mxfp8_nvfp4.py. Drives examples/llm_ptq/hf_ptq.py to quantize routed experts to NVFP4 from the BF16 source, then merges them onto the vendor MXFP8 base (non-expert tensors pass through unchanged) into a ModelOpt MIXED_PRECISION HF checkpoint. The vendor config.json is preserved with only its quantization_config replaced; routed-expert input_scale is forced to 1.0 by default. Unrecognized args are forwarded to hf_ptq.py; the 5 it controls are rejected.
fix(quantization): detect fused MoE experts without act_fn (MiniMax-M3) — _fused_experts_wrapper_class no longer requires an act_fn attribute. Modules like MiniMaxM3VLExperts apply a custom gated activation between the two F.linear calls instead of exposing act_fn, so they were silently skipped (routed experts left unquantized; HF export raised NotImplementedError). _QuantFusedExperts is activation-agnostic, so the requirement was unnecessary. Orthogonal to the non-gated NemotronH support added in [OMNIML-5003] Support non-gated fused MoE experts (NemotronH) in HF PTQ #1756. Original fix by @zhiyuc.
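The `parse_args(argv)` / `prepare_args()` split described in the first commit can be illustrated with a minimal sketch. The two flags below are a simplified, hypothetical subset chosen for illustration; the real `hf_ptq.py` defines many more options, and its normalization rules may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse CLI flags; argv=None makes argparse fall back to sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    # Hypothetical subset of flags, for illustration only.
    parser.add_argument("--dataset", default=None)
    parser.add_argument("--calib_size", default="512")
    return parser.parse_args(argv)


def prepare_args(args):
    """Post-parse normalization shared by the CLI and in-process callers."""
    # Split comma-separated values into lists (assumed CSV convention).
    args.dataset = args.dataset.split(",") if args.dataset else None
    args.calib_size = [int(n) for n in args.calib_size.split(",")]
    return args


def main(args):
    """Stand-in for the real entrypoint; just returns the normalized values."""
    return args.dataset, args.calib_size


# In-process call with an explicit argv, as a wrapper script might do.
result = main(prepare_args(parse_args(["--dataset", "a,b", "--calib_size", "8,16"])))
```

Because `parse_args()` with no argument still reads `sys.argv`, the command-line path stays the same while a wrapper can pass its own list.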

Testing

tests/unit/torch/quantization/plugins/test_fused_experts.py — 43 passed (incl. the flipped test_module_missing_act_fn_still_detected).

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added example script for exporting MiniMax-M3 models with mixed-precision quantization (MXFP8 base + NVFP4 routed experts).
Bug Fixes
- Fixed fused MoE expert auto-detection to work without requiring an act_fn attribute, enabling proper quantization and export of modules with internally-applied gated activations.

Extract the dataset/calib_size splitting and --cast_mxfp4_to_nvfp4 / --specdec_offline_dataset validation out of __main__ into a reusable prepare_args(), and let parse_args(argv) take an explicit argv. CLI behavior is unchanged (main(prepare_args(parse_args()))); this lets the MiniMax-M3 mixed-export wrapper drive hf_ptq in-process. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wrapper that drives examples/llm_ptq/hf_ptq.py to produce the NVFP4 routed-expert export, then merges it onto the vendor MXFP8 base into a ModelOpt MIXED_PRECISION checkpoint: non-expert tensors pass through unchanged from the vendor MXFP8 ckpt, routed experts come from the NVFP4 export (input_scale forced to 1.0 by default), and the vendor config.json is preserved with only its quantization_config replaced. Unrecognized args are forwarded to hf_ptq; the 5 it controls are rejected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
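The merge rule in this commit message (non-expert tensors from the vendor MXFP8 checkpoint, routed experts from the NVFP4 export) might be sketched as follows. The key pattern, function name, and shard names are hypothetical; the regex in the actual example script is not shown in this PR page and may differ.

```python
import re

# Hypothetical routed-expert key pattern; the real regex may differ.
EXPERT_KEY_RE = re.compile(r"^model\.layers\.(?P<L>\d+)\.mlp\.experts\.")


def merge_weight_maps(base_map, nvfp4_map):
    """Return tensor name -> (source, shard) for the mixed checkpoint."""
    merged = {}
    # Non-expert tensors pass through from the vendor MXFP8 base.
    for key, shard in base_map.items():
        if not EXPERT_KEY_RE.match(key):
            merged[key] = ("mxfp8_base", shard)
    # Routed-expert tensors come only from the NVFP4 export.
    for key, shard in nvfp4_map.items():
        if EXPERT_KEY_RE.match(key):
            merged[key] = ("nvfp4_export", shard)
    return merged


base_map = {
    "model.embed_tokens.weight": "base-1.safetensors",
    "model.layers.0.mlp.experts.gate_up_proj": "base-2.safetensors",
}
nvfp4_map = {
    "model.layers.0.mlp.experts.gate_up_proj": "nvfp4-1.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "nvfp4-1.safetensors",
}
merged = merge_weight_maps(base_map, nvfp4_map)
```

In this sketch the NVFP4 copy of a non-expert tensor is ignored, which mirrors the stated intent that only routed experts are taken from the NVFP4 export.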

Re-applies Zhiyu Cheng's act_fn fix (d9bdf292c) onto main's refactored _fused_experts_wrapper_class. main's OMNIML-5003 (#1756) added non-gated NemotronH support but kept the act_fn requirement, so act_fn-less fused experts (e.g. MiniMaxM3VLExperts, which apply a custom gated activation between the two F.linear calls) were still skipped -- routed experts stayed unquantized and HF export raised NotImplementedError. _QuantFusedExperts is activation-agnostic (it only intercepts the two F.linear calls), so drop the act_fn requirement from the detection guard. Enables NVFP4/FP8 PTQ + export for MiniMax-M2 / MiniMax-M3. Co-Authored-By: Zhiyu Cheng <zhiyuc@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
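A rough sketch of an activation-agnostic detection guard is shown below. It is not the ModelOpt implementation: the attribute names `gate_up_proj` and `down_proj`, the 3-D rank check, and the `FakeParam` stand-in are all assumptions made so the example runs without PyTorch.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Stand-in for a tensor; only the rank matters in this sketch."""

    ndim: int


def looks_like_fused_experts(module):
    """Duck-typed check that deliberately does not look for act_fn."""
    required = ("gate_up_proj", "down_proj")  # assumed attribute names
    # Each fused projection is assumed to be a 3-D (experts, out, in) tensor.
    return all(getattr(getattr(module, name, None), "ndim", None) == 3 for name in required)


class ExpertsWithoutActFn:
    """Mimics a module that applies its activation inline, without act_fn."""

    def __init__(self):
        self.gate_up_proj = FakeParam(ndim=3)
        self.down_proj = FakeParam(ndim=3)


class PlainLinear:
    """A non-expert module that must not be detected."""

    def __init__(self):
        self.weight = FakeParam(ndim=2)
```

Under these assumptions, a module is detected from its weight layout alone, so a custom gated activation between the two linear calls no longer affects detection.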

Adds a 0.46 New Features entry for examples/minimax_m3/hf_ptq_mixed_mxfp8_nvfp4.py (added in a047288). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-23T19:43:30Z

📝 Walkthrough


Removes the act_fn attribute requirement from fused MoE expert auto-detection in huggingface.py. Refactors hf_ptq.py to expose parse_args(argv) and a new prepare_args() for programmatic invocation. Adds examples/minimax_m3/hf_ptq_mixed_mxfp8_nvfp4.py, a new wrapper that produces a mixed MXFP8/NVFP4 MiniMax-M3 checkpoint.

Changes

Fused-expert detection fix

| Layer / File(s) | Summary |
|---|---|
| Remove `act_fn` guard from fused-expert detection: `modelopt/torch/quantization/plugins/huggingface.py`, `tests/unit/torch/quantization/plugins/test_fused_experts.py` | Drops `act_fn` from the `_fused_experts_wrapper_class` validation guard and updates its docstring; flips the corresponding unit test assertion from `False` to `True` for modules without `act_fn`. |

MiniMax-M3 mixed MXFP8/NVFP4 checkpoint export

| Layer / File(s) | Summary |
|---|---|
| `hf_ptq.py` programmatic API: `examples/llm_ptq/hf_ptq.py` | `parse_args` gains an optional `argv` parameter; post-parse normalization and validation (CSV splitting, `export_fmt` deprecation, `cast_mxfp4_to_nvfp4` constraints) is extracted into a new `prepare_args` function; `__main__` calls `main(prepare_args(parse_args()))`. |
| Wrapper CLI, passthrough args, and argv construction: `examples/minimax_m3/hf_ptq_mixed_mxfp8_nvfp4.py` | Module docstring, `sys.path` adjustment, `hf_ptq` import, regex expert-key constants, and `_has_option`/`_validate_passthrough_args`/`_hf_ptq_argv` helpers that build a controlled NVFP4 command line while forwarding unrecognized args; `parse_args()` returns `(args, passthrough)`. |
| Index loading, expert classification, and quant config builders: `examples/minimax_m3/hf_ptq_mixed_mxfp8_nvfp4.py` | `_load_index`, `_is_routed_expert_tensor`, `_selected_layers`, `_build_quantized_layers`, and `_build_quant_config` identify routed-expert tensors and generate the per-module MXFP8/NVFP4 quantization configuration map. |
| Shard copying, config writing, and orchestration: `examples/minimax_m3/hf_ptq_mixed_mxfp8_nvfp4.py` | `_copy_experts_from_nvfp4_export` writes NVFP4 expert shards and optionally forces `input_scale` to 1.0; `_copy_mxfp8_bf16_from_base` copies non-expert tensors from vendor MXFP8 shards; `_write_mixed_config` emits `config.json` and `hf_quant_config.json`; `_export_mixed_mxfp8_nvfp4` orchestrates all steps and copies ancillary files. |
| `main()` entrypoint and CHANGELOG: `examples/minimax_m3/hf_ptq_mixed_mxfp8_nvfp4.py`, `CHANGELOG.rst` | `main()` chains the full pipeline: computes intermediate NVFP4 path, guards against existing output, runs `hf_ptq`, then runs the mixed export. CHANGELOG documents both the new example and the `act_fn` bug fix. |
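The passthrough behavior summarized in the table (forward unrecognized arguments, reject the ones the wrapper controls) could look roughly like this. The option names in `CONTROLLED` and the `--output_dir` flag are illustrative placeholders, since this page does not list the five controlled options; `has_option` and `split_args` are hypothetical names, not the script's `_has_option` or `_validate_passthrough_args`.

```python
import argparse

# Illustrative placeholders; the PR text does not list the five controlled options.
CONTROLLED = ("--qformat", "--export_path")


def has_option(argv, option):
    """True if option appears as '--opt' or '--opt=value'."""
    return any(arg == option or arg.startswith(option + "=") for arg in argv)


def split_args(argv):
    """Parse wrapper flags and return (args, passthrough) for the inner script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_dir", required=True)  # hypothetical wrapper flag
    # parse_known_args leaves anything it does not recognize in the second value.
    args, passthrough = parser.parse_known_args(argv)
    for option in CONTROLLED:
        if has_option(passthrough, option):
            raise ValueError(f"{option} is set by the wrapper and cannot be overridden")
    return args, passthrough


args, passthrough = split_args(["--output_dir", "out", "--batch_size", "4"])
```

Rejecting controlled options up front keeps the wrapper's own settings from being silently overridden by a forwarded flag.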

Sequence Diagram

sequenceDiagram
  participant CLI as User / CLI
  participant wrapper as hf_ptq_mixed_mxfp8_nvfp4.main()
  participant hf_ptq as hf_ptq.main()
  participant nvfp4_copy as _copy_experts_from_nvfp4_export
  participant mxfp8_copy as _copy_mxfp8_bf16_from_base
  participant cfg_writer as _write_mixed_config

  CLI->>wrapper: parse_args() → (args, passthrough)
  wrapper->>hf_ptq: parse_args(argv) → prepare_args() → main()
  Note over hf_ptq: Writes intermediate NVFP4 safetensors export
  wrapper->>nvfp4_copy: read NVFP4 index, write per-layer expert shards, patch input_scale
  nvfp4_copy-->>wrapper: updated weight_map + expert module names
  wrapper->>mxfp8_copy: read vendor MXFP8 shards, write base shards
  mxfp8_copy-->>wrapper: updated weight_map
  wrapper->>cfg_writer: compute layer counts, write config.json + hf_quant_config.json
  wrapper->>wrapper: write model.safetensors.index.json, copy ancillary files

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

NVIDIA/Model-Optimizer#1756: Directly related — both PRs modify _fused_experts_wrapper_class in huggingface.py (and its tests) to expand fused-expert layout recognition; this PR removes the act_fn check while that PR added up_proj-only fused-expert support.

Suggested reviewers

ChenhanYu
sychen52
cjluo-nv
kevalmorabia97
Edwardf0t1

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.70%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: introducing MiniMax-M3 mixed-precision PTQ export with MXFP8 base and NVFP4 experts, which aligns with the primary purpose of this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns from SECURITY.md found: no unsafe torch.load/numpy.load, no hardcoded trust_remote_code, no eval/exec on external input, no # nosec comments, no new non-permissive depende... |


github-actions · 2026-06-23T19:46:59Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1806/
Built to branch `gh-pages` at 2026-06-23 19:46 UTC. Preview will be ready when the GitHub Pages deployment is complete.

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 1506-1508: The prepare_args() function warns that export_fmt is
being forced to "hf", but it does not actually modify args.export_fmt to enforce
this behavior. Add a statement after the warning that sets args.export_fmt =
"hf" to match the declared behavior in the warning message and prevent non-hf
values from propagating to main().

In `@examples/minimax_m3/hf_ptq_mixed_mxfp8_nvfp4.py`:
- Line 177: After the _selected_layers function call assigns the result to the
layers variable, add a validation check to ensure layers is not empty. If layers
is empty, raise an exception or exit the script with a clear error message
indicating that no routed-expert layers were found or matched. This prevents the
script from continuing with an incomplete checkpoint that would skip expert
weights during the _copy_mxfp8_bf16_from_base execution.
- Around line 167-170: The imports for torch, safe_open, and save_file are
currently placed inside a function (lines 167-170) instead of at the module
scope, which delays import failures until runtime and violates the repo's import
guidelines. Move these three imports to the top of the file with other
module-level imports. Additionally, address the same issue at lines 217-218
mentioned in the comment by moving any function-local imports there to the
module scope as well.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 19326279-46e4-44b2-9c40-34c1ca2e5631

📥 Commits

Reviewing files that changed from the base of the PR and between 37dbbda and ea1ea09.

📒 Files selected for processing (5)

CHANGELOG.rst
examples/llm_ptq/hf_ptq.py
examples/minimax_m3/hf_ptq_mixed_mxfp8_nvfp4.py
modelopt/torch/quantization/plugins/huggingface.py
tests/unit/torch/quantization/plugins/test_fused_experts.py

coderabbitai · 2026-06-23T19:49:46Z

+    if args.export_fmt != "hf":
+        warnings.warn("Deprecated. --export_fmt forced to hf.")
+


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

prepare_args() says --export_fmt is forced, but it never actually forces it

The warning text states forced behavior, but args.export_fmt is left unchanged. That can propagate a non-hf value into main().

Suggested fix

 def prepare_args(args: argparse.Namespace) -> argparse.Namespace:
     """Apply the same post-parse normalization used by the CLI entrypoint."""
     if args.export_fmt != "hf":
         warnings.warn("Deprecated. --export_fmt forced to hf.")
+        args.export_fmt = "hf"
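A small self-contained check of the behavior this comment asks for is sketched below. The function is a trimmed stand-in for the real `prepare_args()`, and the `"other"` value is a hypothetical non-`hf` format used only to trigger the branch.

```python
import argparse
import warnings


def prepare_args(args):
    """Trimmed stand-in that both warns and enforces the forced value."""
    if args.export_fmt != "hf":
        warnings.warn("Deprecated. --export_fmt forced to hf.")
        args.export_fmt = "hf"  # the assignment the review comment requests
    return args


# Capture the warning so the check does not depend on global warning filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = prepare_args(argparse.Namespace(export_fmt="other"))
```

With the assignment in place, callers of `main()` would only ever see `"hf"`, which matches what the warning text claims.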

coderabbitai · 2026-06-23T19:49:46Z

+    import torch
+    from safetensors import safe_open
+    from safetensors.torch import save_file
+


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Move these imports to module scope

These function-local imports delay import failures until runtime and violate the repo import rule for Python files.

Suggested fix

 import argparse
 import json
 import re
 import shutil
 import sys
 from collections import defaultdict
 from pathlib import Path
 from typing import Any
+import torch
+from safetensors import safe_open
+from safetensors.torch import save_file
+
 _THIS_DIR = Path(__file__).resolve().parent
 _LLM_PTQ_DIR = _THIS_DIR.parent / "llm_ptq"
@@
 def _copy_experts_from_nvfp4_export(
     nvfp4: Path,
     dst: Path,
     layers_arg: str | None,
     force_input_scale_one: bool,
 ) -> tuple[dict[str, str], list[str]]:
-    import torch
-    from safetensors import safe_open
-    from safetensors.torch import save_file
@@
 def _copy_mxfp8_bf16_from_base(
     mxfp8: Path, dst: Path, mxfp8_map: dict[str, str], new_index: dict[str, str]
 ) -> None:
-    from safetensors import safe_open
-    from safetensors.torch import save_file

As per coding guidelines, “Keep imports at the top of the file; place imports inside functions only when necessary … with explicit justification.”

Also applies to: 217-218

Source: Coding guidelines

coderabbitai · 2026-06-23T19:49:46Z

+        match = _EXPERT_TENSOR_RE.match(key)
+        if match:
+            layer_keys[int(match.group("L"))].append(key)
+    layers = _selected_layers(layer_keys, layers_arg)


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Fail fast when no routed-expert layers are selected/matched

If layers is empty, the script still continues, and _copy_mxfp8_bf16_from_base() skips all routed-expert tensors. That can produce an incomplete checkpoint missing expert weights.

Suggested fix

     layers = _selected_layers(layer_keys, layers_arg)
+    if not layers:
+        raise ValueError(
+            "No routed-expert layers were selected/matched; refusing to export a mixed checkpoint "
+            "without routed-expert tensors."
+        )
     print(f"[mixed] {len(layers)} MoE layers; experts NVFP4<-{nvfp4}, base MXFP8<-vendor")
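The fail-fast idea can be exercised end to end with a minimal sketch. The regex (apart from the named group `L` visible in the quoted diff), the comma-separated `layers_arg` format, and the helper names are assumptions for illustration and may not match the script.

```python
import re
from collections import defaultdict

# Hypothetical pattern; only the named group "L" is taken from the quoted diff.
EXPERT_TENSOR_RE = re.compile(r"^model\.layers\.(?P<L>\d+)\.mlp\.experts\.")


def group_expert_keys(keys):
    """Bucket routed-expert tensor names by their layer index."""
    layer_keys = defaultdict(list)
    for key in keys:
        match = EXPERT_TENSOR_RE.match(key)
        if match:
            layer_keys[int(match.group("L"))].append(key)
    return layer_keys


def select_layers(layer_keys, layers_arg=None):
    """Pick layers; layers_arg is assumed to be a comma-separated list like '0,2'."""
    if layers_arg is None:
        wanted = sorted(layer_keys)
    else:
        wanted = [int(item) for item in layers_arg.split(",")]
    layers = [layer for layer in wanted if layer in layer_keys]
    if not layers:
        # Refuse to continue rather than write a checkpoint without experts.
        raise ValueError("No routed-expert layers were selected/matched")
    return layers


grouped = group_expert_keys(
    [
        "model.layers.0.mlp.experts.gate_up_proj",
        "model.layers.2.mlp.experts.down_proj",
        "model.norm.weight",
    ]
)
```

Raising before any shard is written means a regex that matches nothing, or a layer selection that misses every MoE layer, surfaces as an error instead of an incomplete export.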

Also applies to: 225-226

codecov · 2026-06-23T19:56:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.73%. Comparing base (c81210f) to head (ea1ea09).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1806       +/-   ##
===========================================
+ Coverage   62.89%   76.73%   +13.84%     
===========================================
  Files         511      511               
  Lines       56683    56683               
===========================================
+ Hits        35651    43498     +7847     
+ Misses      21032    13185     -7847

| Flag | Coverage Δ | |
|---|---|---|
| examples | `42.11% <100.00%> (+4.09%)` | ⬆️ |
| gpu | `57.87% <100.00%> (+37.30%)` | ⬆️ |
| regression | `14.72% <0.00%> (+0.05%)` | ⬆️ |

Edwardf0t1 · 2026-06-23T21:24:00Z

Is it possible to make this more general, not for m3 only?

chadvoegele and others added 4 commits June 23, 2026 14:06

docs(changelog): note minimax_m3 mixed MXFP8/NVFP4 export script

ea1ea09

chadvoegele requested review from a team as code owners June 23, 2026 19:43

chadvoegele requested review from kevalmorabia97 and sugunav14 June 23, 2026 19:43

coderabbitai Bot reviewed Jun 23, 2026

Edwardf0t1 reviewed Jun 23, 2026

chadvoegele wants to merge 4 commits into main from cvoegele/minimax-m3-mixed-mxfp8-nvfp4
