One-shot Tool Call
Deterministic tool-calling accuracy benchmark across Bash, file operations, MCP calls, skill invocations, and generation — 17/17 executable tests passed (100%), 2 skipped due to environment constraints.
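The headline figure counts only tests that could actually run, so the 2 skipped tests do not lower the pass rate. A minimal sketch of that arithmetic follows. The outcome labels (`pass`, `fail`, `skip`) and the `summarize` helper are hypothetical names chosen for illustration, not part of the benchmark itself.

```python
from collections import Counter


def summarize(outcomes):
    """Summarize test outcomes (hypothetical helper, not the benchmark's code).

    Assumption: skipped tests are excluded from the pass-rate denominator,
    which is what "17/17 executable tests" in the summary implies.
    """
    counts = Counter(outcomes)
    executable = counts["pass"] + counts["fail"]
    # Guard against an empty run so we never divide by zero.
    rate = counts["pass"] / executable if executable else 0.0
    return {
        "passed": counts["pass"],
        "executable": executable,
        "skipped": counts["skip"],
        "pass_rate": rate,
    }


# Outcome list rebuilt from the reported totals: 17 passed, 2 skipped.
outcomes = ["pass"] * 17 + ["skip"] * 2
summary = summarize(outcomes)
print(
    f"{summary['passed']}/{summary['executable']} executable tests passed "
    f"({summary['pass_rate']:.0%}), {summary['skipped']} skipped"
)
```

Counting skipped tests as failures instead would give 17 of 19, so a reported rate is easier to interpret when the denominator is stated alongside it.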