perf(ppo): gather response/loss-mask rows before log-prob+entropy CE (supersedes #2011) #2076
Open
Mantissagithub wants to merge 3 commits into