-
Notifications
You must be signed in to change notification settings - Fork 3.7k
fix: execute tool calls emitted as plain text in interactive mode #673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
mvanhorn
wants to merge
1
commit into
usestrix:main
Choose a base branch
from
mvanhorn:fix/520-ollama-tool-calls-not-executed
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+342
−4
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When an interactive answer is valid plain JSON such as
{"action":"report","params":{...}}or{"name":"example","input":"..."}, this predicate treats it as an unexecuted tool call because the keys overlap the heuristic. The run then appends corrective tool-use prompts until the retry limit before parking, so a normal answer can consume extra turns and mutate the session just because its schema resembles a tool call.Prompt To Fix With AI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair concern. Two bounds keep this from mutating a normal session much: the predicate only runs when the cycle produced no executed tool call and no lifecycle settle, and retries are capped by the same invalid_final_output_limit the non-interactive path uses - after the cap it falls back to the existing park-and-wait, so worst case for a JSON-shaped plain answer is a couple of extra turns. Validating the extracted name against registered tools would be tighter, but the sandbox-bound tools are attached per-run in the agents factory and aren't visible from module scope here, so that check would need a larger refactor than this fix wants to carry. Happy to tighten the key heuristic if maintainers prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explanation is fair and the guard conditions do meaningfully reduce the blast radius. That said, the original concern still partially holds and there's a concrete way to tighten it without needing the per-run tool registry.
The
"name"key in_is_tool_call_payloadis the main culprit. It's very generic —{"name": "Alice", "input": "hello"},{"name": "report", "parameters": {...}},{"name": "success", "arguments": []}all match. The stricter keys (tool_calls,function) have structural nesting that makes false positives much less likely, and"action"+"tool"are more domain-specific. Dropping"name"from the tool-name candidates would cut the false-positive rate significantly at essentially no cost to the actual recovery path, since LiteLLM/Ollama plain-text tool blobs useactionortool, notname.If
nameneeds to stay for some backend that emits it, consider requiringtype == "function"alongside it (matching the OpenAI tool-call schema), which filters out almost all natural-language JSON:The bounding arguments are reasonable — the concern is more about the quality of the in-session experience (spurious corrective prompts visible in the transcript) than catastrophic mutation. Either of those tightenings would address that without touching the architecture.
Tip: You can customize Greptile's behavior for this repo with
.greptile/rules.mdand.greptile/config.json.