[docs/examples] Blackwell cute tutorials: narrow TMEM_LOAD atoms (32dp32b1x) carry a large per-load lowering cost — prefer wider atoms (32dp32b32x) in t2r epilogues #3313

Open
cfregly wants to merge 1 commit into NVIDIA:main from cfregly:docs/blackwell-tutorial-tmem-load-atom-width

docs(cute tutorials): note the per-load lowering cost of narrow TMEM_…

e6707d9

Workflow runs completed with no jobs