
[docs/examples] Blackwell cute tutorials: narrow TMEM_LOAD atoms (32dp32b1x) carry a large per-load lowering cost — prefer wider atoms (32dp32b32x) in t2r epilogues #3313

Open
cfregly wants to merge 1 commit into NVIDIA:main from cfregly:docs/blackwell-tutorial-tmem-load-atom-width
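The arithmetic behind "prefer wider atoms" can be sketched as an instruction count. This is a minimal sketch, assuming the `32dp32b<N>x` atoms map to the PTX `tcgen05.ld.32x32b.xN` shape, where each of a warp's 32 threads reads one datapath (lane) and receives `N` consecutive 32-bit columns per instruction. It also assumes a hypothetical 128-lane by 256-column fp32 accumulator tile drained by 4 warps. The helper names are illustrative, not part of CUTLASS or CuTe. The sketch shows only why the number of loads scales with `1/N`. It does not measure the lowering cost the PR title refers to.

```python
# Illustrative sketch (hypothetical helpers, not a CUTLASS/CuTe API).
# Assumes the tcgen05.ld .32x32b shape: one warp covers 32 TMEM lanes
# (datapaths), and each thread reads `repetition` consecutive 32-bit
# columns per load instruction.

LANES_PER_WARP = 32


def tmem_loads_per_warp(tile_columns: int, repetition: int) -> int:
    """Loads one warp issues to cover `tile_columns` 32-bit columns."""
    if repetition <= 0 or tile_columns % repetition != 0:
        raise ValueError("tile_columns must be a positive multiple of repetition")
    return tile_columns // repetition


def tmem_loads_per_tile(tile_lanes: int, tile_columns: int, repetition: int) -> int:
    """Total loads across all warps that drain a tile_lanes x tile_columns tile."""
    if tile_lanes % LANES_PER_WARP != 0:
        raise ValueError("tile_lanes must be a multiple of 32")
    warps = tile_lanes // LANES_PER_WARP
    return warps * tmem_loads_per_warp(tile_columns, repetition)


if __name__ == "__main__":
    # Assumed example tile: 128 lanes x 256 fp32 columns (4 warps).
    for rep in (1, 32):
        per_warp = tmem_loads_per_warp(256, rep)
        total = tmem_loads_per_tile(128, 256, rep)
        # Each load also hands every thread `rep` 32-bit registers.
        print(f"32dp32b{rep}x: {per_warp} loads/warp, {total} loads/tile, "
              f"{rep} regs/thread/load")
```

Under these assumptions, the 1x atom issues 256 loads per warp and the 32x atom issues 8. The data moved is the same in both cases. The trade-off is that the wider atom lands 32 registers per thread per instruction, so register pressure in the epilogue bounds how wide an atom is practical.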
