relax trl 0.x upper bound from <=0.24.0 to <0.30.0 by shylane · Pull Request #636 · unslothai/unsloth-zoo

shylane · 2026-05-12T16:36:17Z

Relax `trl` 0.x upper bound from `<=0.24.0` to `<0.30.0`

Problem

unsloth-zoo==2026.5.1 and current main pin:

trl>=0.18.2,!=0.19.0,<=0.24.0

This blocks users who intentionally need TRL 0.25-0.29, even though these versions remain in the pre-1.0 line. For example, post-0.24 GRPO work includes newer configuration and trainer surface such as multi_objective_aggregation, SAPO-related loss knobs, off_policy_mask_threshold, vLLM configuration updates, and optional tool/environment rollout support.

With the current cap, dependency resolution fails:

uv pip install "unsloth-zoo==2026.5.1" "trl==0.29.1"
# -> No solution found: unsloth-zoo requires trl<=0.24.0
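To make the failure concrete, here is a minimal sketch of how a resolver evaluates the old and new pins against a candidate version. It is a simplified stand-in, not how `uv` works internally: it handles only plain dotted numeric releases and does not implement full PEP 440. The helper names `parse_version` and `satisfies` are hypothetical; real tooling would typically rely on `packaging.specifiers.SpecifierSet`.

```python
import operator

# Comparison operators supported by this simplified sketch (not full PEP 440).
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text):
    """Turn '0.29.1' into (0, 29, 1); only plain numeric releases are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators first so '<=' is not read as '<'.
        for symbol in sorted(_OPS, key=len, reverse=True):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not _OPS[symbol](candidate, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


OLD_PIN = ">=0.18.2,!=0.19.0,<=0.24.0"  # current unsloth-zoo constraint
NEW_PIN = ">=0.18.2,!=0.19.0,<0.30.0"   # constraint proposed in this PR

for trl_version in ("0.24.0", "0.29.1", "0.30.0"):
    print(trl_version, satisfies(trl_version, OLD_PIN), satisfies(trl_version, NEW_PIN))
```

Under the old pin only `0.24.0` passes; under the new pin `0.29.1` passes as well, while `0.30.0` is still excluded because the bound is exclusive.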

Compatibility notes

The <=0.24.0 ceiling appears stricter than the current package metadata needs:

GRPOTrainer.__init__ in trl==0.29.1 keeps the existing required arguments (model, reward_funcs) and only adds optional parameters (tools, rollout_func, environment_factory).
GRPOConfig in both trl==0.24.0 and trl==0.29.1 exposes num_iterations and importance_sampling_level.
num_mini_batches is not used by unsloth-zoo.
unsloth-zoo already runtime-detects newer TRL GRPO loss features (get_sapo_token_loss, get_off_policy_mask, sapo_temperature_*, off_policy_mask_threshold) via hasattr guards in rl_replacements.py.
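The two kinds of check described above can be sketched as follows. This is an illustration only, not the code in rl_replacements.py: `OldTrainer`, `NewTrainer`, `required_params`, and `detect_features` are hypothetical names, and the stand-in classes only mimic the argument names listed in the notes rather than the real `GRPOTrainer`.

```python
import inspect
from types import SimpleNamespace


def required_params(func):
    """Names of parameters without defaults, ignoring self, *args and **kwargs."""
    named_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return [
        p.name
        for p in inspect.signature(func).parameters.values()
        if p.kind in named_kinds
        and p.default is inspect.Parameter.empty
        and p.name != "self"
    ]


class OldTrainer:
    """Hypothetical stand-in for an older trainer signature."""

    def __init__(self, model, reward_funcs):
        pass


class NewTrainer:
    """Hypothetical stand-in: same required arguments, new optional ones."""

    def __init__(self, model, reward_funcs, tools=None,
                 rollout_func=None, environment_factory=None):
        pass


def detect_features(obj, names):
    """Map each attribute name to whether `obj` exposes it (a hasattr guard)."""
    return {name: hasattr(obj, name) for name in names}


# Required arguments are unchanged, so existing call sites keep working.
print(required_params(OldTrainer.__init__) == required_params(NewTrainer.__init__))

# Stand-in objects: one exposes the newer loss helpers, one does not.
newer = SimpleNamespace(get_sapo_token_loss=lambda: None, get_off_policy_mask=lambda: None)
older = SimpleNamespace()
FEATURES = ("get_sapo_token_loss", "get_off_policy_mask")
print(detect_features(newer, FEATURES))
print(detect_features(older, FEATURES))
```

Because the guards only branch when an attribute exists, the same code path can run against either an older or a newer TRL release.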

This PR intentionally keeps an upper bound at <0.30.0. TRL 1.x is a larger version jump and should be evaluated separately.

Fix

Change <=0.24.0 to <0.30.0 in both pyproject.toml dependency lists:

-    "trl>=0.18.2,!=0.19.0,<=0.24.0 ; (sys_platform != 'darwin' or platform_machine != 'arm64')",
+    "trl>=0.18.2,!=0.19.0,<0.30.0 ; (sys_platform != 'darwin' or platform_machine != 'arm64')",

and in [project.optional-dependencies] core:

-    "trl>=0.18.2,!=0.19.0,<=0.24.0",
+    "trl>=0.18.2,!=0.19.0,<0.30.0",
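Since the same bound has to be edited in two places, a small consistency check can catch a mismatch such as `<=0.30.0` slipping into one list. The sketch below is an assumption-laden illustration: the sample file layout (a `[project]` `dependencies` list plus the `core` extra) is a guess at the relevant parts of pyproject.toml, and `trl_specifiers`, `upper_bounds`, and `bounds_are_consistent` are hypothetical helpers, not part of unsloth-zoo.

```python
import re

# Assumed, trimmed-down layout of the two dependency lists touched by this PR.
SAMPLE_PYPROJECT = '''
[project]
dependencies = [
    "trl>=0.18.2,!=0.19.0,<0.30.0 ; (sys_platform != 'darwin' or platform_machine != 'arm64')",
]

[project.optional-dependencies]
core = [
    "trl>=0.18.2,!=0.19.0,<0.30.0",
]
'''


def trl_specifiers(pyproject_text):
    """Return the version specifier of every double-quoted trl requirement."""
    specs = []
    # Require an operator right after 'trl' so other package names are skipped.
    for requirement in re.findall(r'"(trl[<>=!~][^"]*)"', pyproject_text):
        # Drop the environment marker after ';', then the package name.
        spec = requirement.split(";", 1)[0].strip()
        specs.append(spec[len("trl"):])
    return specs


def upper_bounds(spec):
    """Extract upper-bound clauses such as '<0.30.0' or '<=0.24.0'."""
    return [c.strip() for c in spec.split(",") if c.strip().startswith("<")]


def bounds_are_consistent(pyproject_text, expected="<0.30.0"):
    """True if at least one trl pin exists and all use exactly `expected`."""
    specs = trl_specifiers(pyproject_text)
    return bool(specs) and all(upper_bounds(s) == [expected] for s in specs)


print(trl_specifiers(SAMPLE_PYPROJECT))
print(bounds_are_consistent(SAMPLE_PYPROJECT))
```

A regex scan is used here instead of a TOML parser to keep the sketch dependency-free; a real check would more likely parse the file properly.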

Testing

Environment: Python 3.11.14, uv 0.9.16.

Current published metadata fails as expected:

uv pip install --dry-run "unsloth-zoo==2026.5.1" "trl==0.29.1"
# -> No solution found: unsloth-zoo requires trl<=0.24.0

Patched local checkout resolves:

uv pip install --dry-run "./unsloth-zoo" "trl==0.29.1"
# -> Resolved successfully

Also checked:

| `trl` version | Result with patched constraint |
| --- | --- |
| `0.24.0` | resolves |
| `0.27.2` | resolves |
| `0.29.1` | resolves |
| `1.4.0` | rejected by `<0.30.0` |

The patched checkout was also resolver-tested with --python-platform linux for trl==0.29.1.
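The resolver checks above could be scripted along these lines. This is a sketch under assumptions: `build_dry_run` and `resolves` are hypothetical helper names, the `./unsloth-zoo` path is the local checkout used in the commands above, and treating a zero exit status as "resolved" is an assumption about `uv` rather than something verified in this PR.

```python
import subprocess


def build_dry_run(target, trl_version, python_platform=None):
    """Build the `uv pip install --dry-run` command for one trl version."""
    cmd = ["uv", "pip", "install", "--dry-run", target, f"trl=={trl_version}"]
    if python_platform is not None:
        # Resolve as if installing on another platform, e.g. "linux".
        cmd += ["--python-platform", python_platform]
    return cmd


def resolves(target, trl_version, python_platform=None):
    """Run the dry-run resolve; a zero exit status is assumed to mean success."""
    result = subprocess.run(
        build_dry_run(target, trl_version, python_platform),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Print the commands only; nothing is executed unless resolves() is called.
for version in ("0.24.0", "0.27.2", "0.29.1", "1.4.0"):
    print(" ".join(build_dry_run("./unsloth-zoo", version)))
```

Calling `resolves("./unsloth-zoo", "0.29.1", python_platform="linux")` would then run the platform-specific check, provided `uv` is installed and the checkout exists at that path.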

The <=0.24.0 cap blocks resolution against trl 0.25–0.29 (current latest 0.29.1), even though unsloth-zoo's runtime code is compatible. Bumping the ceiling to <0.30.0 unblocks 0.25–0.29.x while still guarding against trl 1.x.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5e0b66303

gemini-code-assist

Code Review

This pull request updates the upper bound for the trl dependency in pyproject.toml from 0.24.0 to 0.30.0. The reviewer identified a discrepancy where the code uses an inclusive bound (<=0.30.0) despite the PR's stated intent to use an exclusive bound (<0.30.0). Suggestions were provided to correct this in both the main dependency list and the core extras section.

relax trl 0.x upper bound from <=0.24.0 to <0.30.0

a5e0b66

chatgpt-codex-connector Bot reviewed May 12, 2026

gemini-code-assist Bot reviewed May 12, 2026

fix: use strict <0.30.0 (not <=0.30.0)

8bbba66

relax trl 0.x upper bound from <=0.24.0 to <0.30.0 #636
shylane wants to merge 2 commits into unslothai:main from shylane:patch-1
