Relax trl 0.x upper bound from <=0.24.0 to <0.30.0 #636
The <=0.24.0 cap blocks resolution against trl 0.25–0.29 (current latest 0.29.1), even though unsloth-zoo's runtime code is compatible. Bumping the ceiling to <0.30.0 unblocks 0.25–0.29.x while still guarding against trl 1.x.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a5e0b66303
Code Review
This pull request updates the upper bound for the trl dependency in pyproject.toml from 0.24.0 to 0.30.0. The reviewer identified a discrepancy where the code uses an inclusive bound (<=0.30.0) despite the PR's stated intent to use an exclusive bound (<0.30.0). Suggestions were provided to correct this in both the main dependency list and the core extras section.
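The two bounds the reviewer contrasts differ only at the boundary release itself. A minimal sketch of that difference, assuming plain `X.Y.Z` release strings; `parse_release` is a hypothetical helper written for illustration, not code from this PR and not a full PEP 440 implementation:

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints so releases order correctly."""
    return tuple(int(part) for part in version.split("."))


CEILING = parse_release("0.30.0")

for candidate in ("0.29.1", "0.30.0"):
    release = parse_release(candidate)
    inclusive = release <= CEILING  # what `<=0.30.0` admits
    exclusive = release < CEILING   # what `<0.30.0` admits
    print(f"trl=={candidate}: <=0.30.0 -> {inclusive}, <0.30.0 -> {exclusive}")
```

Only `0.30.0` itself is treated differently by the two operators, which is why the inclusive form does not match the stated intent of stopping short of 0.30.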
Relax `trl` 0.x upper bound from `<=0.24.0` to `<0.30.0`
Problem
`unsloth-zoo==2026.5.1` and current `main` pin:
`trl>=0.18.2,!=0.19.0,<=0.24.0`
This blocks users who intentionally need TRL 0.25-0.29, even though these versions remain in the pre-1.0 line. For example, post-0.24 GRPO work includes newer configuration and trainer surface such as `multi_objective_aggregation`, SAPO-related loss knobs, `off_policy_mask_threshold`, vLLM configuration updates, and optional tool/environment rollout support.
With the current cap, dependency resolution fails:
Compatibility notes
The `<=0.24.0` ceiling appears stricter than the current package metadata needs:
- `GRPOTrainer.__init__` in `trl==0.29.1` keeps the existing required arguments (`model`, `reward_funcs`) and only adds optional parameters (`tools`, `rollout_func`, `environment_factory`).
- `GRPOConfig` in both `trl==0.24.0` and `trl==0.29.1` exposes `num_iterations` and `importance_sampling_level`.
- `num_mini_batches` is not used by `unsloth-zoo`.
- `unsloth-zoo` already runtime-detects newer TRL GRPO loss features (`get_sapo_token_loss`, `get_off_policy_mask`, `sapo_temperature_*`, `off_policy_mask_threshold`) via `hasattr` guards in `rl_replacements.py`.
This PR intentionally keeps an upper bound at `<0.30.0`. TRL 1.x is a larger version jump and should be evaluated separately.
Fix
Change `<=0.24.0` to `<0.30.0` in both `pyproject.toml` dependency lists: the main dependency list and `[project.optional-dependencies] core`.
Testing
Environment: Python 3.11.14, uv 0.9.16.
Current published metadata fails as expected:
Patched local checkout resolves:
Also checked `trl` versions 0.24.0, 0.27.2, 0.29.1, and 1.4.0 against `<0.30.0`.
The patched checkout was also resolver-tested with `--python-platform linux` for `trl==0.29.1`.
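The version checks above can be approximated without running a resolver. A rough sketch, assuming simple `X.Y.Z` versions: `satisfies` and `parse_release` are hypothetical helpers that handle only the four operators appearing in this pin, and `NEW_SPEC` is the existing pin with the ceiling swapped as this PR describes. Real resolvers apply the full PEP 440 rules (pre-releases, wildcards, and so on), which this does not.

```python
import operator

# Two-character operators come first so "<=" is never mistaken for "<".
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "<": operator.lt,
}


def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for ordering."""
    return tuple(int(part) for part in version.strip().split("."))


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` passes every comma-separated clause in `spec`."""
    release = parse_release(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS.items():
            if clause.startswith(symbol):
                if not compare(release, parse_release(clause[len(symbol):])):
                    return False
                break
        else:
            # No known operator matched this clause.
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


OLD_SPEC = ">=0.18.2,!=0.19.0,<=0.24.0"  # current pin quoted in the Problem section
NEW_SPEC = ">=0.18.2,!=0.19.0,<0.30.0"   # same pin with the relaxed ceiling

for candidate in ("0.24.0", "0.27.2", "0.29.1", "1.4.0"):
    print(candidate, satisfies(candidate, OLD_SPEC), satisfies(candidate, NEW_SPEC))
```

Under these simplified rules, the relaxed spec admits the 0.x versions listed and still rejects `1.4.0`, while the old spec admits only `0.24.0` from that list.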