fix: execute tool calls emitted as plain text in interactive mode by mvanhorn · Pull Request #673 · usestrix/strix

mvanhorn · 2026-07-04T06:33:23Z

Summary

Interactive runs no longer park on Send message to resume when a model emits tool-call JSON as plain text. The run loop now injects a bounded corrective re-prompt (same recovery the non-interactive path already has), so the tool call gets executed instead of displayed.

Why this matters

"the model does emit a valid native tool_calls response" - the reporter verified this with a direct Ollama /api/chat call in #520, so the failure is in Strix's run loop, not the model.

With LiteLLM-routed backends (reported on ollama/qwen3:8b, also seen on Gemma and larger models), a cycle sometimes ends with the raw {"action": ..., "params": ...} blob as final output. The interactive path (run_agent_loop -> _run_cycle -> _settle_run_result) settles the agent to waiting and blocks on coordinator.wait_for_message, so the scan stalls until a human types something. @AkikoOrenji diagnosed the same gap in the thread: some models intermittently fail to finalise the tool call and Strix has no mechanism to force a retry.

The non-interactive path already solves this: _run_noninteractive_until_lifecycle detects a non-lifecycle final output and injects a corrective message via _append_noninteractive_tool_required_message, bounded by invalid_final_output_limit. This PR gives the interactive path the same recovery.

Changes

The corrective-retry mechanism is shared rather than duplicated: the existing non-interactive helper is generalized, and the interactive loop re-prompts only when the final output looks like an unexecuted tool call (tool-call-shaped JSON or a non-lifecycle text output), capped by the same retry limit so a persistently misbehaving model still falls back to the current park-and-wait behavior instead of looping forever.
A legitimate plain-text answer to the user still parks and waits as before - the retry only fires on tool-call-shaped output.

Testing

pytest tests/test_execution.py - 8 passed, covering: corrective re-prompt on tool-call-shaped final output, successful execution on retry, bounded fallback to waiting after the retry limit, plain-text answers still parking normally, and the non-interactive path unchanged.

Fixes #520

greptile-apps · 2026-07-04T06:35:00Z

Greptile Summary

This PR adds interactive recovery for tool calls returned as text. The main changes are:

Adds a retry wrapper around interactive agent cycles.
Detects tool-call-shaped JSON in final output.
Shares the corrective prompt helper with non-interactive recovery.
Adds tests for retries, retry limits, plain-text parking, and unchanged non-interactive behavior.

Confidence Score: 5/5

The changed flow looks safe to merge after a small cleanup to the JSON detection heuristic.

The retry loop is bounded and follows the existing recovery pattern.
Normal JSON answers can be misread as tool calls when they use generic fields like action, params, name, or input.
No security issue was identified in the changed code.

strix/core/execution.py

Important Files Changed

Filename	Overview
strix/core/execution.py	Adds bounded interactive retries for final outputs that look like unexecuted tool calls.
tests/test_execution.py	Adds async tests covering the new interactive retry behavior and the existing non-interactive path.

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
strix/core/execution.py:571-576
**Generic JSON Triggers Retry**

When an interactive answer is valid plain JSON such as `{"action":"report","params":{...}}` or `{"name":"example","input":"..."}`, this predicate treats it as an unexecuted tool call because the keys overlap the heuristic. The run then appends corrective tool-use prompts until the retry limit before parking, so a normal answer can consume extra turns and mutate the session just because its schema resembles a tool call.

_{Reviews (1): Last reviewed commit: "fix: execute tool calls returned by Olla..." | Re-trigger Greptile}

greptile-apps · 2026-07-04T06:35:03Z

+    tool_name = payload.get("action") or payload.get("tool") or payload.get("name")
+    return (
+        isinstance(tool_name, str)
+        and bool(tool_name.strip())
+        and any(key in payload for key in _TOOL_ARGUMENT_KEYS)
+    )


Generic JSON Triggers Retry

When an interactive answer is valid plain JSON such as {"action":"report","params":{...}} or {"name":"example","input":"..."}, this predicate treats it as an unexecuted tool call because the keys overlap the heuristic. The run then appends corrective tool-use prompts until the retry limit before parking, so a normal answer can consume extra turns and mutate the session just because its schema resembles a tool call.

Prompt To Fix With AI

This is a comment left during a code review. Path: strix/core/execution.py Line: 571-576 Comment: **Generic JSON Triggers Retry** When an interactive answer is valid plain JSON such as `{"action":"report","params":{...}}` or `{"name":"example","input":"..."}`, this predicate treats it as an unexecuted tool call because the keys overlap the heuristic. The run then appends corrective tool-use prompts until the retry limit before parking, so a normal answer can consume extra turns and mutate the session just because its schema resembles a tool call. How can I resolve this? If you propose a fix, please make it concise.

Fair concern. Two bounds keep this from mutating a normal session much: the predicate only runs when the cycle produced no executed tool call and no lifecycle settle, and retries are capped by the same invalid_final_output_limit the non-interactive path uses - after the cap it falls back to the existing park-and-wait, so worst case for a JSON-shaped plain answer is a couple of extra turns. Validating the extracted name against registered tools would be tighter, but the sandbox-bound tools are attached per-run in the agents factory and aren't visible from module scope here, so that check would need a larger refactor than this fix wants to carry. Happy to tighten the key heuristic if maintainers prefer.

The explanation is fair and the guard conditions do meaningfully reduce the blast radius. That said, the original concern still partially holds and there's a concrete way to tighten it without needing the per-run tool registry.

The "name" key in _is_tool_call_payload is the main culprit. It's very generic — {"name": "Alice", "input": "hello"}, {"name": "report", "parameters": {...}}, {"name": "success", "arguments": []} all match. The stricter keys (tool_calls, function) have structural nesting that makes false positives much less likely, and "action" + "tool" are more domain-specific. Dropping "name" from the tool-name candidates would cut the false-positive rate significantly at essentially no cost to the actual recovery path, since LiteLLM/Ollama plain-text tool blobs use action or tool, not name.

Suggested change

tool_name = payload.get("action") or payload.get("tool") or payload.get("name")

return (

isinstance(tool_name, str)

and bool(tool_name.strip())

and any(key in payload for key in _TOOL_ARGUMENT_KEYS)

)

tool_name = payload.get("action") or payload.get("tool")

return (

isinstance(tool_name, str)

and bool(tool_name.strip())

and any(key in payload for key in _TOOL_ARGUMENT_KEYS)

)

If name needs to stay for some backend that emits it, consider requiring type == "function" alongside it (matching the OpenAI tool-call schema), which filters out almost all natural-language JSON:

name_key = payload.get("action") or payload.get("tool") or ( payload.get("name") if payload.get("type") == "function" else None )

The bounding arguments are reasonable — the concern is more about the quality of the in-session experience (spurious corrective prompts visible in the transcript) than catastrophic mutation. Either of those tightenings would address that without touching the architecture.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

fix: execute tool calls returned by Ollama models

833e37f

mvanhorn mentioned this pull request Jul 4, 2026

[BUG] Tool calls returned as plain text instead of being executed when using Ollama backend #520

Open

greptile-apps Bot reviewed Jul 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: execute tool calls emitted as plain text in interactive mode#673

fix: execute tool calls emitted as plain text in interactive mode#673
mvanhorn wants to merge 1 commit into
usestrix:mainfrom
mvanhorn:fix/520-ollama-tool-calls-not-executed

mvanhorn commented Jul 4, 2026

Uh oh!

greptile-apps Bot commented Jul 4, 2026

Uh oh!

greptile-apps Bot Jul 4, 2026

Uh oh!

mvanhorn Jul 4, 2026

Uh oh!

greptile-apps Bot Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

mvanhorn commented Jul 4, 2026

Summary

Why this matters

Changes

Testing

Uh oh!

greptile-apps Bot commented Jul 4, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

greptile-apps Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

mvanhorn Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant