fix(rm_hub): guard deepscaler reward against a missing response by vjsai · Pull Request #2115 · THUDM/slime

vjsai · 2026-06-21T20:22:14Z

Summary

get_deepscaler_rule_based_reward (slime/rollout/rm_hub/deepscaler.py) starts with if "</think>" in response: without first checking whether response is None. The dispatcher async_rm (slime/rollout/rm_hub/__init__.py) passes sample.response straight through:

elif rm_type == "deepscaler":
    return get_deepscaler_rule_based_reward(response, label)

The sibling rule-based rewards in this same package already guard a missing response and return 0 — gpqa.py (if not response: return 0.0) and f1.py (if prediction is None: return ZERO_METRIC) — but deepscaler did not, so a None response raised:

TypeError: argument of type 'NoneType' is not iterable

instead of scoring 0 like its siblings.
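The crash can be reproduced in isolation. This is a minimal sketch: unguarded_check and raises_type_error are hypothetical helpers that mirror only the membership test described above, not the actual slime code.

```python
def unguarded_check(response):
    # Hypothetical stand-in: mirrors the membership test the reward
    # function performed before any check on `response`.
    return "</think>" in response


def raises_type_error(response):
    # Report whether the unguarded membership test raises TypeError.
    try:
        unguarded_check(response)
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(raises_type_error(None))  # None does not support `in`
    print(raises_type_error(""))    # an empty string does
```

The exact wording of the TypeError message can differ between Python versions, so the sketch checks only the exception type.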

Fix

Return 0 for a falsy response at the top of the function, matching the gpqa/f1 contract. An empty-string response already returned 0 (no marker match), so only the crash path changes.
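The shape of the guard might look like the sketch below. The function name reward_sketch and the simplified grading after the guard are assumptions for illustration; the real get_deepscaler_rule_based_reward does more than an exact string comparison.

```python
def reward_sketch(response, label):
    # Guard described in this PR: a None or empty response scores 0,
    # matching the gpqa/f1 behaviour.
    if not response:
        return 0

    # Simplified placeholder for the real grading logic (assumption):
    # take the text after the closing think marker and compare it
    # to the label verbatim.
    if "</think>" not in response:
        return 0
    answer = response.split("</think>")[-1].strip()
    return 1 if answer == label else 0
```

Placing the check first means every later line can assume response is a non-empty string.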

Test Plan

Added test_missing_response_returns_zero to tests/test_rm_deepscaler.py (the cpu-unittest suite):

Before: get_deepscaler_rule_based_reward(None, "42") → TypeError ❌
After: None and "" → 0 ✅; all existing cases unchanged.

tests/test_rm_deepscaler.py ...........  [100%]
11 passed

ruff check passes on both files.
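A regression check along these lines could be written roughly as follows. It is a sketch, not the test added in the PR: check_missing_response_returns_zero is a hypothetical helper that takes the reward function as a parameter so that it runs without the slime package installed.

```python
def check_missing_response_returns_zero(reward_fn, label="42"):
    # Both a missing (None) and an empty response should score 0
    # rather than raise.
    for missing in (None, ""):
        score = reward_fn(missing, label)
        assert score == 0, f"expected 0 for {missing!r}, got {score!r}"
```

With the repository installed, the helper could be called with get_deepscaler_rule_based_reward as reward_fn.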

Prepared with AI assistance (Claude Code); the change was reviewed and tested by a human before submitting.

get_deepscaler_rule_based_reward did 'if "</think>" in response' without checking response first. async_rm passes sample.response straight through, and the sibling reward functions in this package (gpqa, f1) already guard a missing response and return 0 — but deepscaler did not, so a None response raised 'TypeError: argument of type NoneType is not iterable' instead of scoring 0. Return 0 for a falsy response, matching the gpqa/f1 contract. An empty string already returned 0, so only the crash path changes. Adds a regression test for None / empty response.

SuperMarioYL mentioned this pull request Jun 22, 2026

fix(rm_hub): grade the final ###Response segment in deepscaler reward #2116

Open

vjsai wants to merge 1 commit into THUDM:main from vjsai:fix/deepscaler-none-response-guard
