
fix: execute tool calls emitted as plain text in interactive mode #673

Open
mvanhorn wants to merge 1 commit into usestrix:main from mvanhorn:fix/520-ollama-tool-calls-not-executed

mvanhorn commented Jul 4, 2026

Contributor

Summary

Interactive runs no longer park on "Send message to resume" when a model emits tool-call JSON as plain text. The run loop now injects a bounded corrective re-prompt (the same recovery the non-interactive path already has), so the tool call gets executed instead of displayed.

Why this matters

"the model does emit a valid native tool_calls response" - the reporter verified this with a direct Ollama /api/chat call in #520, so the failure is in Strix's run loop, not the model.

With LiteLLM-routed backends (reported on ollama/qwen3:8b, also seen on Gemma and larger models), a cycle sometimes ends with the raw {"action": ..., "params": ...} blob as final output. The interactive path (run_agent_loop -> _run_cycle -> _settle_run_result) settles the agent to waiting and blocks on coordinator.wait_for_message, so the scan stalls until a human types something. @AkikoOrenji diagnosed the same gap in the thread: some models intermittently fail to finalise the tool call and Strix has no mechanism to force a retry.

The non-interactive path already solves this: _run_noninteractive_until_lifecycle detects a non-lifecycle final output and injects a corrective message via _append_noninteractive_tool_required_message, bounded by invalid_final_output_limit. This PR gives the interactive path the same recovery.

Changes

  • The corrective-retry mechanism is shared rather than duplicated: the existing non-interactive helper is generalized, and the interactive loop re-prompts only when the final output looks like an unexecuted tool call (tool-call-shaped JSON or a non-lifecycle text output). Retries are capped by the same retry limit, so a persistently misbehaving model still falls back to the current park-and-wait behavior instead of looping forever.
  • A legitimate plain-text answer to the user still parks and waits as before - the retry only fires on tool-call-shaped output.
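
The bounded re-prompt described in the two bullets above can be sketched roughly as follows. This is a simplified illustration, not the code in `strix/core/execution.py`: `CycleResult`, `looks_like_tool_call`, `run_with_corrective_retries`, and `CORRECTIVE_PROMPT` are hypothetical names, and the shape check is reduced to a single `action` + `params` test.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical stand-in for the real corrective prompt text.
CORRECTIVE_PROMPT = "Your last reply was a tool call written as plain text. Emit it as a real tool call."


@dataclass
class CycleResult:
    """Hypothetical result of one agent cycle."""
    final_output: str
    executed_tool_call: bool = False


def looks_like_tool_call(text: str) -> bool:
    """Reduced stand-in for the PR's JSON-shape heuristic."""
    try:
        payload = json.loads(text)
    except (TypeError, ValueError):
        return False  # not JSON at all, so treat it as a plain-text answer
    return isinstance(payload, dict) and "action" in payload and "params" in payload


async def run_with_corrective_retries(
    run_cycle: Callable[[list], Awaitable[CycleResult]],
    messages: list,
    retry_limit: int,
) -> tuple:
    """Re-run the cycle with a corrective prompt, at most retry_limit times.

    Returns (last_result, retries_used); the caller then parks and waits as before.
    """
    retries = 0
    while True:
        result = await run_cycle(messages)
        needs_retry = not result.executed_tool_call and looks_like_tool_call(result.final_output)
        if not needs_retry or retries >= retry_limit:
            return result, retries
        messages.append(CORRECTIVE_PROMPT)  # inject the corrective re-prompt
        retries += 1
```

With a `retry_limit` of 2, a model that keeps emitting the blob gets two corrective prompts and then falls through to the normal wait, while a plain-text answer returns immediately with zero retries.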

Testing

pytest tests/test_execution.py - 8 passed, covering: corrective re-prompt on tool-call-shaped final output, successful execution on retry, bounded fallback to waiting after the retry limit, plain-text answers still parking normally, and the non-interactive path unchanged.

Fixes #520

greptile-apps Bot commented Jul 4, 2026

Contributor

Greptile Summary

This PR adds interactive recovery for tool calls returned as text. The main changes are:

  • Adds a retry wrapper around interactive agent cycles.
  • Detects tool-call-shaped JSON in final output.
  • Shares the corrective prompt helper with non-interactive recovery.
  • Adds tests for retries, retry limits, plain-text parking, and unchanged non-interactive behavior.

Confidence Score: 5/5

The changed flow looks safe to merge after a small cleanup to the JSON detection heuristic.

  • The retry loop is bounded and follows the existing recovery pattern.
  • Normal JSON answers can be misread as tool calls when they use generic fields like action, params, name, or input.
  • No security issue was identified in the changed code.

strix/core/execution.py

Important Files Changed

| Filename | Overview |
| --- | --- |
| strix/core/execution.py | Adds bounded interactive retries for final outputs that look like unexecuted tool calls. |
| tests/test_execution.py | Adds async tests covering the new interactive retry behavior and the existing non-interactive path. |

Reviews (1): Last reviewed commit: "fix: execute tool calls returned by Olla..."

Comment thread strix/core/execution.py
Comment on lines +571 to +576
    tool_name = payload.get("action") or payload.get("tool") or payload.get("name")
    return (
        isinstance(tool_name, str)
        and bool(tool_name.strip())
        and any(key in payload for key in _TOOL_ARGUMENT_KEYS)
    )

Contributor

P2 Generic JSON Triggers Retry

When an interactive answer is valid plain JSON such as {"action":"report","params":{...}} or {"name":"example","input":"..."}, this predicate treats it as an unexecuted tool call because the keys overlap the heuristic. The run then appends corrective tool-use prompts until the retry limit before parking, so a normal answer can consume extra turns and mutate the session just because its schema resembles a tool call.
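
To make the overlap concrete, here is a standalone copy of the predicate body run against three payloads. The contents of `_TOOL_ARGUMENT_KEYS` are not shown in this thread, so the tuple below is an assumption, and the tool name `scan` is made up for illustration.

```python
import json

# Assumption: the real _TOOL_ARGUMENT_KEYS is not visible in this thread.
_TOOL_ARGUMENT_KEYS = ("params", "parameters", "arguments", "args", "input")


def is_tool_call_payload(payload: dict) -> bool:
    """Standalone copy of the predicate body under review (lines 571-576)."""
    tool_name = payload.get("action") or payload.get("tool") or payload.get("name")
    return (
        isinstance(tool_name, str)
        and bool(tool_name.strip())
        and any(key in payload for key in _TOOL_ARGUMENT_KEYS)
    )


samples = {
    "genuine tool blob": '{"action": "scan", "params": {"target": "example.test"}}',
    "plain JSON answer": '{"name": "example", "input": "..."}',
    "unrelated JSON": '{"status": "ok", "count": 3}',
}

# Evaluate each sample; the first two match, the third does not.
verdicts = {label: is_tool_call_payload(json.loads(raw)) for label, raw in samples.items()}
for label, matched in verdicts.items():
    print(f"{label}: {matched}")
```

Under the assumed key list, the plain JSON answer is indistinguishable from the genuine blob, which is the false positive described above.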

Contributor Author

Fair concern. Two bounds keep this from mutating a normal session much: the predicate only runs when the cycle produced no executed tool call and no lifecycle settle, and retries are capped by the same invalid_final_output_limit the non-interactive path uses - after the cap it falls back to the existing park-and-wait, so worst case for a JSON-shaped plain answer is a couple of extra turns. Validating the extracted name against registered tools would be tighter, but the sandbox-bound tools are attached per-run in the agents factory and aren't visible from module scope here, so that check would need a larger refactor than this fix wants to carry. Happy to tighten the key heuristic if maintainers prefer.
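
For reference, the tighter check mentioned above might look something like this if the per-run tool names were available as an argument. This is only a sketch: `is_known_tool_call` and `known_tools` are hypothetical, the key tuple is assumed, and threading the names from the agents factory into this module is exactly the refactor this PR avoids.

```python
from typing import Collection, Mapping

# Assumption: the real _TOOL_ARGUMENT_KEYS is not shown in this thread.
_TOOL_ARGUMENT_KEYS = ("params", "parameters", "arguments", "args", "input")


def is_known_tool_call(payload: Mapping[str, object], known_tools: Collection[str]) -> bool:
    """Match only when the extracted name is one of the tools attached to this run."""
    tool_name = payload.get("action") or payload.get("tool") or payload.get("name")
    return (
        isinstance(tool_name, str)
        and tool_name.strip() in known_tools
        and any(key in payload for key in _TOOL_ARGUMENT_KEYS)
    )
```

With `known_tools = {"scan"}`, a payload like `{"action": "report", "params": {...}}` no longer triggers a retry, because `report` is not a registered tool in this hypothetical run.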

Contributor

The explanation is fair and the guard conditions do meaningfully reduce the blast radius. That said, the original concern still partially holds and there's a concrete way to tighten it without needing the per-run tool registry.

The "name" key in _is_tool_call_payload is the main culprit. It's very generic — {"name": "Alice", "input": "hello"}, {"name": "report", "parameters": {...}}, {"name": "success", "arguments": []} all match. The stricter keys (tool_calls, function) have structural nesting that makes false positives much less likely, and "action" + "tool" are more domain-specific. Dropping "name" from the tool-name candidates would cut the false-positive rate significantly at essentially no cost to the actual recovery path, since LiteLLM/Ollama plain-text tool blobs use action or tool, not name.

Suggested change

    - tool_name = payload.get("action") or payload.get("tool") or payload.get("name")
    + tool_name = payload.get("action") or payload.get("tool")
      return (
          isinstance(tool_name, str)
          and bool(tool_name.strip())
          and any(key in payload for key in _TOOL_ARGUMENT_KEYS)
      )

If name needs to stay for some backend that emits it, consider requiring type == "function" alongside it (matching the OpenAI tool-call schema), which filters out almost all natural-language JSON:

    tool_name = payload.get("action") or payload.get("tool") or (
        payload.get("name") if payload.get("type") == "function" else None
    )

The bounding arguments are reasonable — the concern is more about the quality of the in-session experience (spurious corrective prompts visible in the transcript) than catastrophic mutation. Either of those tightenings would address that without touching the architecture.
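
A self-contained sketch of the second tightening, where `name` counts only alongside `type == "function"`, could look like the following. The function name `is_tool_call_payload_strict` and the contents of `_TOOL_ARGUMENT_KEYS` are assumptions made for the example.

```python
# Assumption: the real _TOOL_ARGUMENT_KEYS is not shown in this thread.
_TOOL_ARGUMENT_KEYS = ("params", "parameters", "arguments", "args", "input")


def is_tool_call_payload_strict(payload: dict) -> bool:
    """Accept 'name' as the tool name only when the payload is typed as a function."""
    tool_name = payload.get("action") or payload.get("tool") or (
        payload.get("name") if payload.get("type") == "function" else None
    )
    return (
        isinstance(tool_name, str)
        and bool(tool_name.strip())
        and any(key in payload for key in _TOOL_ARGUMENT_KEYS)
    )


# Generic JSON keyed on "name" no longer matches.
print(is_tool_call_payload_strict({"name": "Alice", "input": "hello"}))
# An OpenAI-style function payload still matches.
print(is_tool_call_payload_strict({"type": "function", "name": "scan", "arguments": {}}))
# Payloads keyed on "action" are unaffected by this change and still match.
print(is_tool_call_payload_strict({"action": "report", "params": {}}))
```

Note that this narrows only the `name` branch. An answer shaped like `{"action": "report", "params": {...}}` from the original comment would still be treated as a tool call.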


Development

Successfully merging this pull request may close these issues.

[BUG] Tool calls returned as plain text instead of being executed when using Ollama backend
