Respect GC for GRPO by danielhanchen · Pull Request #69 · unslothai/unsloth-staging-1

danielhanchen · 2026-05-06T12:18:15Z

Staging mirror of unslothai#5269

Original PR: unslothai#5269
Author: Datta0

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.

Original description

danielhanchen · 2026-05-06T12:25:25Z

/gemini review

gemini-code-assist

Code Review

This pull request updates the grpo_trainer__generate_and_score_completions function in unsloth/models/rl_replacements.py to ensure that the use_gradient_checkpointing parameter is correctly passed to the model's training state based on the trainer's configuration. There are no review comments to address, and I have no additional feedback to provide.

danielhanchen · 2026-05-06T12:55:47Z

/gemini review

gemini-code-assist

Code Review

This pull request ensures that gradient checkpointing settings are correctly preserved when switching models between inference and training modes during generation. It implements logic to snapshot the gradient checkpointing state across modules before inference and restores that state when returning to training mode in Llama and RL model implementations. I have no feedback to provide.

…stores

Two sibling generation paths put the model into inference mode and then unconditionally restored training with the for_training default, which re-enabled gradient checkpointing even when the caller had it disabled:

- unsloth/models/rl.py: unsloth_unwrap_model_for_generation, installed onto every TRL *_trainer module that exposes unwrap_model_for_generation.
- unsloth/models/llama.py: unsloth_fast_generate, bound onto model.generate.

Snapshot the active gradient_checkpointing state from the model modules before for_inference clears it, then thread the snapshot through the matching for_training call. Same one-line restore semantics already used by prepare_for_training_mode and the GRPO replacement at rl_replacements.py.

The for_training(...) call on each line is preserved; only the kwarg is added. The pre-existing post-generate guards (the conditional restore in unsloth_fast_generate and the finally restore in unsloth_unwrap_model_for_generation) continue to run unchanged.
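The snapshot-then-restore pattern described in this commit message can be sketched roughly as follows. This is an illustration only, not the code in the PR: ToyModule, ToyModel, snapshot_gc, generate_naive, and generate_with_restore are hypothetical names, and the toy for_inference / for_training methods only mimic the behavior the message describes (inference clears the flag, training defaults to re-enabling it).

```python
class ToyModule:
    """Hypothetical stand-in for a submodule carrying a gradient_checkpointing flag."""
    def __init__(self, gradient_checkpointing=False):
        self.gradient_checkpointing = gradient_checkpointing


class ToyModel:
    """Hypothetical stand-in for a model with inference/training toggles."""
    def __init__(self, gradient_checkpointing):
        self._modules = [ToyModule(gradient_checkpointing) for _ in range(2)]

    def modules(self):
        return iter(self._modules)

    def for_inference(self):
        # Assumption: switching to inference clears the flag on every module.
        for m in self._modules:
            m.gradient_checkpointing = False

    def for_training(self, use_gradient_checkpointing=True):
        # Assumption: the default re-enables checkpointing.
        for m in self._modules:
            m.gradient_checkpointing = use_gradient_checkpointing


def snapshot_gc(model):
    """Return the first truthy gradient_checkpointing value, or False."""
    for m in model.modules():
        value = getattr(m, "gradient_checkpointing", False)
        if value:
            return value
    return False


def generate_naive(model):
    """Restores training with the default, ignoring the caller's setting."""
    model.for_inference()
    try:
        return "generated"
    finally:
        model.for_training()


def generate_with_restore(model):
    """Snapshots the state before inference and threads it into the restore."""
    gc_state = snapshot_gc(model)  # must happen before for_inference clears it
    model.for_inference()
    try:
        return "generated"
    finally:
        model.for_training(use_gradient_checkpointing=gc_state)


naive_model = ToyModel(gradient_checkpointing=False)
generate_naive(naive_model)
naive_state = snapshot_gc(naive_model)      # checkpointing was switched on

fixed_model = ToyModel(gradient_checkpointing=False)
generate_with_restore(fixed_model)
fixed_state = snapshot_gc(fixed_model)      # caller's setting survives
```

In this toy setup the naive path leaves checkpointing enabled after generation even though the caller started with it off, while the snapshot path returns the model to the state it had before generation.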

…n restores

Two follow-ups to the post-generate gradient_checkpointing restore:

1. unsloth/models/rl.py: TRL's _unwrap_model_for_generation calls unwrapped_model.gradient_checkpointing_disable() before yielding (trl/models/utils.py:124-127 in 0.22.2, 0.27.1, and 1.3.0). The previous snapshot was taken inside the with-block and therefore read the post-disable state, restoring for_training with use_gradient_checkpointing=False even when the caller had it on. Move the snapshot above the with-block so it observes the caller's pre-disable configuration.

2. unsloth/models/{rl.py,llama.py}: any(getattr(m, "gradient_checkpointing")) collapses Unsloth's smart-GC mode value "unsloth" (a documented loader default at unsloth/models/_utils.py:212 and unsloth/models/llama.py 2824/3314, loader.py:248/854) into a plain True. After generation, the restore would silently downgrade "unsloth" smart GC to standard HF GC. Replace any() with a value-preserving next((v for ... if v), False) so the actual mode value survives the round-trip.

The for_training(...) calls on each line are preserved; only the snapshot expression and its position change. The pre-existing post-generate restore guards continue to run unchanged.
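Both follow-ups can be illustrated with a small self-contained sketch. The names ToyModule, toy_unwrap, and snapshot are hypothetical; toy_unwrap only imitates a context manager that disables checkpointing before yielding, as the message says TRL's helper does, and is not TRL's implementation.

```python
from contextlib import contextmanager


class ToyModule:
    """Hypothetical module carrying a gradient_checkpointing value."""
    def __init__(self, gradient_checkpointing=False):
        self.gradient_checkpointing = gradient_checkpointing


def snapshot(modules):
    """Value-preserving snapshot: first truthy value, or False if none."""
    return next(
        (v for m in modules if (v := getattr(m, "gradient_checkpointing", False))),
        False,
    )


@contextmanager
def toy_unwrap(modules):
    # Assumption: mimics a wrapper that disables checkpointing before yielding.
    for m in modules:
        m.gradient_checkpointing = False
    yield modules


# Follow-up 2: any() collapses the mode string into a plain bool.
modules = [ToyModule(False), ToyModule("unsloth")]
collapsed = any(getattr(m, "gradient_checkpointing", False) for m in modules)
preserved = snapshot(modules)

# Follow-up 1: the snapshot position relative to the with-block matters.
early = snapshot(modules)          # taken above the with-block
with toy_unwrap(modules) as unwrapped:
    late = snapshot(unwrapped)     # taken inside, after the disable
```

Here collapsed is True while preserved is "unsloth", and early is "unsloth" while late is False, which matches the two failure modes the commit message describes.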

Datta0 and others added 4 commits May 4, 2026 09:46

Respect GC for GRPO

5a4146d

Merge branch 'main' into fix_gc_grpo

4160a6e

Merge remote-tracking branch 'origin/main' into

0afec10

Scrub .github/workflows for staging push (matches staging base)

f45d5cf

gemini-code-assist Bot reviewed May 6, 2026

danielhanchen added 2 commits May 6, 2026 13:28

danielhanchen force-pushed the pr-5269-head branch from 324e30e to 89cedfe Compare May 6, 2026 13:29

danielhanchen force-pushed the main branch 3 times, most recently from e128c6f to 1555c15 Compare May 18, 2026 03:46

danielhanchen force-pushed the main branch from 9f47625 to b9dd7cf Compare June 7, 2026 10:11

danielhanchen wants to merge 6 commits into main from pr-5269-head

2 participants