fix(models): use bool sparse masks for sdpa by yuhezhang-ai · Pull Request #2624 · NVIDIA-NeMo/Automodel

yuhezhang-ai · 2026-06-17T16:05:20Z

Summary

Remove the MiniMax-M3 changes from this PR; MiniMax masking is handled in feat: CP support for MiniMax M3 #2551.
Use a boolean keep-mask for the DeepSeek-V3.2 SDPA sparse attention path instead of an additive sparse mask.
Keep the TE core_attention_bias path additive and unchanged.
Cover the shared DeepSeek-V3.2 path used by GLM-MoE-DSA.

Root Cause

DeepSeek-V3.2 built its additive sparse SDPA mask in fp32. When Q/K/V are bf16 and cuDNN SDPA is unavailable, that dtype mismatch can trigger the same class of backend-dependent SDPA failure reported for MiniMax-M3 in #2617. Casting the additive mask to bf16 avoids the mismatch, but #2551 shows that bf16 additive -inf masks can leak in fused SDPA kernels. A boolean SDPA mask avoids both concerns.
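
To make the trade-off concrete, here is a minimal sketch (not the code from this PR) of building a boolean keep-mask from top-k indices and passing it to `torch.nn.functional.scaled_dot_product_attention`. The shapes, the random `topk_idx` stand-in for the sparse indexer output, and all variable names are assumptions for illustration. It runs on CPU in fp32, where the boolean mask and the equivalent additive mask are expected to agree; it does not reproduce the bf16 backend behaviour described above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes, chosen only for illustration.
batch, heads, q_len, kv_len, head_dim, topk = 1, 2, 4, 8, 16, 3

q = torch.randn(batch, heads, q_len, head_dim)
k = torch.randn(batch, heads, kv_len, head_dim)
v = torch.randn(batch, heads, kv_len, head_dim)

# Stand-in for the sparse indexer: per query, the indices of the keys to keep.
indexer_scores = torch.randn(batch, q_len, kv_len)
topk_idx = indexer_scores.topk(topk, dim=-1).indices  # [batch, q_len, topk]

# Boolean keep-mask: True = attend, False = masked out.
keep = torch.zeros(batch, q_len, kv_len, dtype=torch.bool)
keep.scatter_(-1, topk_idx, True)
keep = keep.unsqueeze(1)  # [batch, 1, q_len, kv_len], broadcasts over heads

# Equivalent additive mask: 0 where kept, -inf where masked.
additive = torch.zeros(batch, 1, q_len, kv_len, dtype=q.dtype)
additive.masked_fill_(~keep, float("-inf"))

out_bool = F.scaled_dot_product_attention(q, k, v, attn_mask=keep)
out_add = F.scaled_dot_product_attention(q, k, v, attn_mask=additive)

# Largest element-wise difference between the two formulations.
max_diff = (out_bool - out_add).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
```

With a boolean `attn_mask`, `True` marks positions that take part in attention, so the mask carries no -inf values and its dtype does not have to match Q/K/V.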

MiniMax-M3 is intentionally left untouched here because #2551 removes its additive float sparse mask entirely and owns #2617.

Tests

source work/runs/_shared/env.sh && uv run --no-sync pytest tests/unit_tests/models/deepseek_v32/test_dsv32_layers.py::TestBuildSparseMaskWithAttentionMask::test_build_sparse_mask_combines_with_attention_mask -q
source work/runs/_shared/env.sh && uv run --no-sync pytest tests/unit_tests/models/deepseek_v32/test_dsv32_layers.py::TestDeepseekV32MLASparseMask tests/unit_tests/models/deepseek_v32/test_dsv32_layers.py::TestBuildSparseMaskWithAttentionMask -q
source work/runs/_shared/env.sh && uv run --no-sync ruff format --check nemo_automodel/components/models/deepseek_v32/layers.py
source work/runs/_shared/env.sh && uv run --no-sync ruff check nemo_automodel/components/models/deepseek_v32/layers.py

copy-pr-bot · 2026-06-17T16:05:24Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yuhezhang-ai · 2026-06-17T16:11:19Z

/ok to test caec37a

yuhezhang-ai · 2026-06-17T16:12:32Z

/ok to test 925e139

athitten · 2026-06-17T20:28:20Z

@yuhezhang-ai thank you for the PR! @jQizhang reported #2617 to me on Slack last night, and I already had a fix in the CP PR for MiniMax: #2551. Sorry I didn't get a chance to assign the issue to myself last night. Would you mind reverting the MiniMax fix? My PR removes the float additive bias from MiniMax M3 entirely, since it was causing mask leakage in BF16. So the problem reported in #2551 shouldn't happen now, and we need that fix for correct masking (without leakage).

Also, MiniMax is not part of the upcoming release, and we don't want MiniMax-related changes in the release branch (the model is not there at all). The other changes in this PR, everything except MiniMax, should be good to merge. Thank you!

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>

yuhezhang-ai · 2026-06-17T21:33:09Z

@athitten Thanks for the context! I reverted the MiniMax-M3 changes from this PR. I also checked the bf16 additive-mask leakage point and updated the remaining DeepSeek-V3.2 SDPA path here to use a similar boolean keep-mask instead of casting the additive mask to bf16. TE still uses the additive core_attention_bias path.
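
The split described here (a boolean mask for SDPA, an additive bias for TE) can be sketched as deriving both forms from a single keep-mask. This is an illustrative sketch only: `combine_keep_masks` and `keep_to_additive_bias` are hypothetical helper names, not functions from nemo_automodel or Transformer Engine, and the tiny masks below are made up.

```python
import torch


def combine_keep_masks(sparse_keep, padding_keep=None):
    """AND a sparse keep-mask with an optional padding keep-mask (True = attend)."""
    if padding_keep is None:
        return sparse_keep
    return sparse_keep & padding_keep


def keep_to_additive_bias(keep, dtype=torch.float32):
    """Convert a boolean keep-mask to an additive bias: 0 where kept, -inf where masked."""
    bias = torch.zeros(keep.shape, dtype=dtype, device=keep.device)
    return bias.masked_fill(~keep, float("-inf"))


# Made-up example: 2 queries, 4 keys; the last key is padding.
sparse_keep = torch.tensor([[True, True, False, True],
                            [False, True, True, True]])
padding_keep = torch.tensor([True, True, True, False])  # broadcasts over queries

keep = combine_keep_masks(sparse_keep, padding_keep)  # boolean form, for an SDPA-style path
bias = keep_to_additive_bias(keep)                    # additive form, for a bias-style path
print(keep)
print(bias)
```

Keeping the boolean mask as the single source of truth means the additive form is only materialized for the backend that needs it.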

yuhezhang-ai · 2026-06-18T02:32:32Z

/ok to test 670d303

yuhezhang-ai mentioned this pull request Jun 17, 2026

MiniMax-M3 sparse attention: SDPA crashes with "invalid dtype for bias" #2617

Open

yuhezhang-ai force-pushed the yuhez/fix/sparse-attn-mask-dtype branch from caec37a to 925e139 Compare June 17, 2026 16:11

yuhezhang-ai marked this pull request as ready for review June 17, 2026 16:21

yuhezhang-ai requested a review from a team as a code owner June 17, 2026 16:21

akoumpa added the r0.5.0 label (Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.) Jun 17, 2026

fix(models): use bool sparse masks for sdpa

670d303

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>

yuhezhang-ai force-pushed the yuhez/fix/sparse-attn-mask-dtype branch from 925e139 to 670d303 Compare June 17, 2026 21:17

yuhezhang-ai changed the title ~~fix(models): match sparse attention mask dtype~~ fix(models): use bool sparse masks for sdpa Jun 17, 2026

yuhezhang-ai wants to merge 1 commit into main from yuhez/fix/sparse-attn-mask-dtype
