fix(rm_hub): guard deepscaler reward against a missing response #2115

Open: vjsai wants to merge 1 commit into THUDM:main from vjsai:fix/deepscaler-none-response-guard

vjsai commented Jun 21, 2026

Summary

get_deepscaler_rule_based_reward (slime/rollout/rm_hub/deepscaler.py) starts with if "</think>" in response: without first checking that response is not None. The dispatcher async_rm (slime/rollout/rm_hub/__init__.py) passes sample.response straight through, with no check of its own:

elif rm_type == "deepscaler":
    return get_deepscaler_rule_based_reward(response, label)

The sibling rule-based rewards in this same package already guard a missing response and return 0 — gpqa.py (if not response: return 0.0) and f1.py (if prediction is None: return ZERO_METRIC) — but deepscaler did not, so a None response raised:

TypeError: argument of type 'NoneType' is not iterable

instead of scoring 0 like its siblings.
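
The crash path can be reproduced in isolation. The sketch below is not the code from deepscaler.py: unguarded_reward is a hypothetical stand-in that keeps only the opening membership test described above, and its scoring rule (exact string match after the marker) is an assumption made for illustration.

```python
def unguarded_reward(response, label):
    # Hypothetical stand-in, NOT the real get_deepscaler_rule_based_reward.
    # It keeps only the unguarded membership test that the summary describes.
    if "</think>" in response:  # raises TypeError when response is None
        answer = response.split("</think>")[-1].strip()
        # Assumed scoring rule for illustration: exact match against the label.
        return 1 if answer == label else 0
    return 0


def raises_type_error(response):
    """Return True if scoring this response raises TypeError."""
    try:
        unguarded_reward(response, "42")
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(raises_type_error(None))  # None is not a container, so `in` fails
    print(raises_type_error(""))    # an empty string is safe to search
```

The exact wording of the TypeError message can differ between Python versions, so the sketch checks only the exception type.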

Fix

Return 0 for a falsy response at the top of the function, matching the gpqa/f1 contract. An empty-string response already returned 0 (no marker match), so only the crash path changes.
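
A minimal sketch of the guard, under the same assumptions: guarded_reward is a hypothetical stand-in rather than the patched function, and everything after the guard is simplified. Only the falsy check at the top mirrors the change described here.

```python
def guarded_reward(response, label):
    # Hypothetical stand-in, NOT the patched deepscaler function.
    # The guard: None and "" are both falsy, so both score 0 without
    # ever reaching the membership test below.
    if not response:
        return 0
    if "</think>" in response:
        answer = response.split("</think>")[-1].strip()
        # Assumed scoring rule for illustration: exact match against the label.
        return 1 if answer == label else 0
    return 0


if __name__ == "__main__":
    for candidate in (None, "", "no marker here", "reasoning</think>42"):
        print(repr(candidate), guarded_reward(candidate, "42"))
```

Using a falsy check rather than an explicit None comparison also short-circuits the empty string, which is consistent with the statement above that an empty response already scored 0.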

Test Plan

Added test_missing_response_returns_zero to tests/test_rm_deepscaler.py (the cpu-unittest suite):

  • Before: get_deepscaler_rule_based_reward(None, "42") → TypeError
  • After: None and "" → 0 ✅; all existing cases unchanged.

tests/test_rm_deepscaler.py ...........  [100%]
11 passed

ruff check passes on both files.
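
For readers without the repository checked out, the shape of such a regression test can be sketched with the standard library alone. This is not the test added in tests/test_rm_deepscaler.py: reward_under_test is a hypothetical stand-in for the patched function, and unittest is used here only to keep the example dependency-free.

```python
import unittest


def reward_under_test(response, label):
    # Hypothetical stand-in for the patched function; in the repository the
    # test would exercise get_deepscaler_rule_based_reward instead.
    if not response:
        return 0
    if "</think>" in response:
        return 1 if response.split("</think>")[-1].strip() == label else 0
    return 0


class MissingResponseTest(unittest.TestCase):
    def test_missing_response_returns_zero(self):
        # Cover both the former crash path (None) and the empty string.
        for response in (None, ""):
            with self.subTest(response=response):
                self.assertEqual(reward_under_test(response, "42"), 0)


def run_suite():
    """Run the test case programmatically and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MissingResponseTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```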


Prepared with AI assistance (Claude Code); the change was reviewed and tested by a human before submitting.

get_deepscaler_rule_based_reward did 'if "</think>" in response'
without checking response first. async_rm passes sample.response straight
through, and the sibling reward functions in this package (gpqa, f1)
already guard a missing response and return 0 — but deepscaler did not, so
a None response raised 'TypeError: argument of type NoneType is not
iterable' instead of scoring 0.

Return 0 for a falsy response, matching the gpqa/f1 contract. An empty
string already returned 0, so only the crash path changes.

Adds a regression test for None / empty response.