fix(models): use bool sparse masks for sdpa #2624
/ok to test caec37a
caec37a to 925e139
/ok to test 925e139
@yuhezhang-ai thank you for the PR! @jQizhang reported #2617 to me on Slack last night, and I already had a fix in the CP PR for MiniMax: #2551. Sorry I didn't get a chance to assign the issue to myself last night. Would you mind reverting the MiniMax fix? My PR removes the float additive bias from MiniMax M3 entirely, since it was causing mask leakage in BF16. So now the problem reported in #2551 shouldn't happen, and we need that fix to have correct masking (without leakage). Also, MiniMax is not part of the upcoming release and we don't want MiniMax-related changes in the release branch (the model is not there at all). The other changes in this PR should be good to merge, everything except MiniMax. Thank you!
Signed-off-by: Yuhe Zhang <yuhez@nvidia.com>
925e139 to 670d303
@athitten Thanks for the context! I reverted the MiniMax-M3 changes from this PR. I also checked the bf16 additive-mask leakage point and updated the remaining DeepSeek-V3.2 SDPA path here to use a similar boolean keep-mask instead of casting the additive mask to bf16. TE still uses the additive core_attention_bias path.
/ok to test 670d303
Summary
core_attention_bias path additive and unchanged.
Root Cause
DeepSeek-V3.2 built an additive sparse SDPA mask as fp32. That can hit the same class of backend-dependent SDPA issue when Q/K/V are bf16 and cuDNN SDPA is unavailable. Casting the additive mask to bf16 avoids the dtype mismatch, but #2551 shows that bf16 additive -inf masks can leak in fused SDPA kernels. A boolean SDPA mask avoids both concerns.
MiniMax-M3 is intentionally left untouched here because #2551 removes its additive float sparse mask entirely and owns #2617.
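For illustration only, here is a toy, pure-Python sketch of the idea behind a boolean keep-mask. It is not the code in this PR: the helper names (`additive_to_keep`, `masked_softmax_bool`, `masked_softmax_additive`) are hypothetical, and it uses plain lists instead of tensors. It assumes the additive mask holds 0.0 for kept positions and -inf for dropped ones, and follows the PyTorch `scaled_dot_product_attention` convention that `True` in a boolean mask means "take part in attention".

```python
import math

NEG_INF = float("-inf")


def additive_to_keep(bias):
    """Turn an additive mask (0.0 = keep, -inf = drop) into a boolean keep-mask."""
    return [b == 0.0 for b in bias]


def masked_softmax_additive(scores, bias):
    """Softmax over scores + bias; dropped positions rely on exp(-inf) == 0."""
    shifted = [s + b for s, b in zip(scores, bias)]
    peak = max(shifted)
    exps = [math.exp(x - peak) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def masked_softmax_bool(scores, keep):
    """Softmax restricted to kept positions; dropped positions are set to 0 directly."""
    kept = [s for s, k in zip(scores, keep) if k]
    if not kept:
        raise ValueError("at least one position must be kept")
    peak = max(kept)
    exps = [math.exp(s - peak) if k else 0.0 for s, k in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.5, 2.0, -1.0, 3.0]
bias = [0.0, NEG_INF, 0.0, NEG_INF]  # additive form of the sparse mask
keep = additive_to_keep(bias)        # boolean form of the same mask

w_add = masked_softmax_additive(scores, bias)
w_bool = masked_softmax_bool(scores, keep)
```

In exact arithmetic the two forms give the same attention weights. The difference is that the boolean form carries no floating-point mask values, so there is no mask dtype to match against Q/K/V and nothing for a kernel to round.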
Tests
source work/runs/_shared/env.sh && uv run --no-sync pytest tests/unit_tests/models/deepseek_v32/test_dsv32_layers.py::TestBuildSparseMaskWithAttentionMask::test_build_sparse_mask_combines_with_attention_mask -q
source work/runs/_shared/env.sh && uv run --no-sync pytest tests/unit_tests/models/deepseek_v32/test_dsv32_layers.py::TestDeepseekV32MLASparseMask tests/unit_tests/models/deepseek_v32/test_dsv32_layers.py::TestBuildSparseMaskWithAttentionMask -q
source work/runs/_shared/env.sh && uv run --no-sync ruff format --check nemo_automodel/components/models/deepseek_v32/layers.py
source work/runs/_shared/env.sh && uv run --no-sync ruff check nemo_automodel/components/models/deepseek_v32/layers.py