Skip to content

fix(models): use bool sparse masks for sdpa#2624

Open
yuhezhang-ai wants to merge 1 commit into
mainfrom
yuhez/fix/sparse-attn-mask-dtype
Open

fix(models): use bool sparse masks for sdpa#2624
yuhezhang-ai wants to merge 1 commit into
mainfrom
yuhez/fix/sparse-attn-mask-dtype

Conversation

@yuhezhang-ai

@yuhezhang-ai yuhezhang-ai commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Remove the MiniMax-M3 changes from this PR; MiniMax masking is handled in feat: CP support for MiniMax M3 #2551.
  • Use a boolean keep-mask for the DeepSeek-V3.2 SDPA sparse attention path instead of an additive sparse mask.
  • Keep the TE core_attention_bias path additive and unchanged.
  • Cover the shared DeepSeek-V3.2 path used by GLM-MoE-DSA.

Root Cause

DeepSeek-V3.2 built an additive sparse SDPA mask as fp32. That can hit the same class of backend-dependent SDPA issue when Q/K/V are bf16 and cuDNN SDPA is unavailable. Casting the additive mask to bf16 avoids the dtype mismatch, but #2551 shows that bf16 additive -inf masks can leak in fused SDPA kernels. A boolean SDPA mask avoids both concerns.

MiniMax-M3 is intentionally left untouched here because #2551 removes its additive float sparse mask entirely and owns #2617.

Tests

  • source work/runs/_shared/env.sh && uv run --no-sync pytest tests/unit_tests/models/deepseek_v32/test_dsv32_layers.py::TestBuildSparseMaskWithAttentionMask::test_build_sparse_mask_combines_with_attention_mask -q
  • source work/runs/_shared/env.sh && uv run --no-sync pytest tests/unit_tests/models/deepseek_v32/test_dsv32_layers.py::TestDeepseekV32MLASparseMask tests/unit_tests/models/deepseek_v32/test_dsv32_layers.py::TestBuildSparseMaskWithAttentionMask -q
  • source work/runs/_shared/env.sh && uv run --no-sync ruff format --check nemo_automodel/components/models/deepseek_v32/layers.py
  • source work/runs/_shared/env.sh && uv run --no-sync ruff check nemo_automodel/components/models/deepseek_v32/layers.py

@copy-pr-bot

copy-pr-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yuhezhang-ai

Copy link
Copy Markdown
Contributor Author

/ok to test caec37a

@yuhezhang-ai yuhezhang-ai force-pushed the yuhez/fix/sparse-attn-mask-dtype branch from caec37a to 925e139 Compare June 17, 2026 16:11
@yuhezhang-ai

Copy link
Copy Markdown
Contributor Author

/ok to test 925e139

@yuhezhang-ai yuhezhang-ai marked this pull request as ready for review June 17, 2026 16:21
@yuhezhang-ai yuhezhang-ai requested a review from a team as a code owner June 17, 2026 16:21
@akoumpa akoumpa added the r0.5.0 Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge. label Jun 17, 2026
@athitten

athitten commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

@yuhezhang-ai thank you for the PR! @jQizhang had reported #2617 with me on slack last night and I already had a fix in the CP PR for minimax: #2551. Sorry I dint get a chance to assign the issue to myself last night, mind reverting the minimax fix ? Minimax M3 removes the float additive bias entirely in my PR, since it was causing mask leakage in BF16. So now the problem reported in #2551 shouldn't happen. And we need that fix to have correct masking (wo leak).

Also minimax is not part of the upcoming release and we dont want minimax related change in the release branch (the model is not there at all). Should be good to merge other changes you have in this PR, except minimax. Thank you!

Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>
@yuhezhang-ai yuhezhang-ai force-pushed the yuhez/fix/sparse-attn-mask-dtype branch from 925e139 to 670d303 Compare June 17, 2026 21:17
@yuhezhang-ai yuhezhang-ai changed the title fix(models): match sparse attention mask dtype fix(models): use bool sparse masks for sdpa Jun 17, 2026
@yuhezhang-ai

Copy link
Copy Markdown
Contributor Author

@yuhezhang-ai thank you for the PR! @jQizhang had reported #2617 with me on slack last night and I already had a fix in the CP PR for minimax: #2551. Sorry I dint get a chance to assign the issue to myself last night, mind reverting the minimax fix ? Minimax M3 removes the float additive bias entirely in my PR, since it was causing mask leakage in BF16. So now the problem reported in #2551 shouldn't happen. And we need that fix to have correct masking (wo leak).

Also minimax is not part of the upcoming release and we dont want minimax related change in the release branch (the model is not there at all). Should be good to merge other changes you have in this PR, except minimax. Thank you!

@athitten Thanks for the context! I reverted the MiniMax-M3 changes from this PR. I also checked the bf16 additive-mask leakage point and updated the remaining DeepSeek-V3.2 SDPA path here to use a similar boolean keep-mask instead of casting the additive mask to bf16. TE still uses the additive core_attention_bias path.

@yuhezhang-ai

Copy link
Copy Markdown
Contributor Author

/ok to test 670d303

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

r0.5.0 Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants