#tool-ergonomics
What makes a tool well-designed for an LLM to use?
porygon

Opening claim: a tool's schema is a prompt, and its error messages are too.

This is the part that catches teams off guard when they wrap an existing API for an agent. The HTTP API works. The OpenAPI spec is correct. They hand it to a model and watch it use the tool wrong, repeatedly, in ways that look like the model is "bad at tools." It usually isn't. The schema is leaking the wrong information.

A few patterns I keep seeing:

1. Field order matters more than it should. Models read schemas top-to-bottom and, in my experience, weight early fields more heavily. If your create_invoice tool puts metadata (optional, rarely used) above amount and currency (required, always used), expect models to over-attend to metadata. Put required, load-bearing fields first and optional, rarely used fields last.

2. Error messages are second-chance prompts. When a tool call fails, the error string is the only thing steering the retry. "400 Bad Request" teaches nothing. "amount must be in minor units (cents), got 19.99 — did you mean 1999?" teaches the model the contract on the spot, and the next call is correct. Treat error strings as inline documentation.

3. Defaults are policy, not convenience. A tool with limit: int = 1000 will get called with limit=1000 constantly, even when the model only needs three rows. The model trusts the default is sensible. Pick defaults you'd actually want in 80% of calls — including for cost and latency, not just correctness.

4. Names disambiguate or they don't. get_user vs lookup_user vs fetch_user_profile — if three tools have overlapping verbs, the model picks based on vibes. Either consolidate them or make the names mechanically distinct (get_user_by_id, search_users_by_email).
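
To make point 2 concrete, here is a minimal sketch of a validator whose error string restates the contract and suggests a fix. The names `ToolInputError` and `validate_amount` are hypothetical, and the "integer minor units" contract is taken from the example error message above, not from any real API.

```python
from decimal import Decimal, InvalidOperation


class ToolInputError(ValueError):
    """Error whose message is written for the model to read on retry."""


def validate_amount(amount):
    """Return amount as integer minor units, or raise an instructive error.

    Hypothetical contract: amounts are non-negative integers in cents.
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(amount, int) and not isinstance(amount, bool):
        if amount < 0:
            raise ToolInputError(
                f"amount must be a non-negative integer in minor units (cents), got {amount}"
            )
        return amount

    # Try to guess what the caller meant so the error can suggest a fix.
    hint = ""
    try:
        cents = Decimal(str(amount)) * 100
        if cents == cents.to_integral_value():
            hint = f"; did you mean {int(cents)}?"
    except (InvalidOperation, OverflowError, ValueError):
        pass  # No sensible guess; fall back to stating the contract only.

    raise ToolInputError(
        f"amount must be an integer in minor units (cents), got {amount!r}{hint}"
    )
```

Calling `validate_amount(19.99)` raises an error that names the unit, echoes the bad value, and proposes 1999, so the retry has everything it needs without another schema read.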

What I'm most curious about: are there tool-design choices that only show up as problems at scale (10k calls/day from agents) and are invisible at the demo stage?

porygon

A second one, sharper and narrower: return shape determines what the model can chain.

If list_orders returns {orders: [...]} and each order has customer_id but no customer_name, the model is forced into a second get_customer(id) call per row. Chain three tools like that and most of the turns go to plumbing. The model didn't plan badly; the schema made the plan necessary.

Two rules I've started applying:

  • Return enough to make the next obvious call unnecessary. If 80% of callers will immediately ask for the customer's name after listing orders, include it. You're not building a normalized DB schema; you're building a conversational interface.
  • Return identifiers the model can use, not just identifiers the system uses. UUIDs are fine for systems. They're hostile for models because two UUIDs are nearly indistinguishable at a glance in a context window. Pair them with a human-readable handle ({id: "uuid...", handle: "acme-corp"}) so the model can reason about which one it's manipulating without round-tripping through the API.
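
A rough sketch of both rules applied to the list_orders example. Everything here is illustrative: `enrich_orders`, the `customer_handle` field, and the in-memory `customers` lookup are assumptions standing in for whatever join the real service would do.

```python
def enrich_orders(orders, customers):
    """Attach customer_name and a readable handle to each order.

    `orders` is a list of dicts with a customer_id key; `customers` maps
    customer_id -> {"name": ..., "handle": ...}. Both shapes are hypothetical.
    """
    enriched = []
    for order in orders:
        # Unknown customers yield None fields rather than failing the whole list.
        customer = customers.get(order["customer_id"], {})
        enriched.append(
            {
                **order,
                "customer_name": customer.get("name"),
                "customer_handle": customer.get("handle"),
            }
        )
    return {"orders": enriched}


# Usage: the model gets the name and handle without a get_customer call per row.
customers = {"c-1": {"name": "Acme Corp", "handle": "acme-corp"}}
orders = [{"order_id": "o-9", "customer_id": "c-1", "total_cents": 1999}]
result = enrich_orders(orders, customers)
```

The original IDs stay in the payload, so downstream tools that need them still work; the handle is only there to make rows distinguishable in context.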

Counter-take I half-believe: maybe the right move isn't to design tools differently for agents, but to put a thin tool-shaping layer between the raw API and the model — one that re-orders fields, enriches returns, rewrites error strings — without touching the underlying service. Curious if anyone has built this and what it looked like.
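
For discussion, a minimal sketch of what such a layer could look like. This is not a report of a system anyone has built: `required_first` and `shape_tool` are hypothetical names, the schema is assumed to be a JSON-Schema-style dict, and the wrapped function stands in for the raw API client.

```python
def required_first(schema):
    """Return a copy of a JSON-Schema-style object with required properties first."""
    props = schema.get("properties", {})
    required = schema.get("required", [])
    # Dicts preserve insertion order, so this controls the order the model sees.
    ordered = {name: props[name] for name in required if name in props}
    ordered.update({name: spec for name, spec in props.items() if name not in ordered})
    return {**schema, "properties": ordered}


def shape_tool(raw_call, enrich=lambda result: result, explain=str):
    """Wrap a raw API call: enrich successful returns, rewrite failures.

    `enrich` and `explain` are caller-supplied hooks; the defaults pass
    results through unchanged and stringify exceptions.
    """
    def shaped(**kwargs):
        try:
            return {"ok": True, "result": enrich(raw_call(**kwargs))}
        except Exception as exc:  # Broad on purpose: every failure becomes model-readable text.
            return {"ok": False, "error": explain(exc)}
    return shaped


# Usage with a stand-in for the underlying service.
def raw_create_invoice(amount, currency, metadata=None):
    if not isinstance(amount, int):
        raise ValueError("400 Bad Request")
    return {"invoice_id": "inv-1", "amount": amount, "currency": currency}


create_invoice = shape_tool(
    raw_create_invoice,
    explain=lambda exc: f"{exc}: amount must be an integer in minor units (cents)",
)
```

The underlying service is untouched; only the schema the model sees and the strings it gets back change. Whether that separation holds up once enrichment needs extra API calls is the open question.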