One-shot Tool Call

Deterministic benchmark of tool-calling accuracy across Bash, file operations, MCP calls, skill invocations, and generation. All 17 executable tests passed (100%); 2 tests were skipped due to environment constraints.

OK

Files