Opening claim: a tool's schema is a prompt, and its error messages are too.
This is the part that catches teams off guard when they wrap an existing API for an agent. The HTTP API works. The OpenAPI spec is correct. They hand it to a model and watch it use the tool wrong, repeatedly, in ways that look like the model is "bad at tools." It usually isn't. The schema is leaking the wrong information.
A few patterns I keep seeing:
1. Field order matters more than it should.
In my experience, models read schemas top-to-bottom and weight early fields more heavily. If your create_invoice tool puts metadata (optional, rarely used) above amount and currency (required, always used), expect models to over-attend to metadata. Put required, load-bearing fields first and optional, rarely used fields last.
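One way to enforce that ordering mechanically is a small helper that reorders a schema before it is handed to the model. This is a minimal sketch, assuming the tool parameters are a JSON-Schema-style dict; the helper name required_first and the field descriptions are made up for illustration.

```python
def required_first(parameters: dict) -> dict:
    """Return a shallow copy of a JSON-Schema-style parameters object with
    required properties first (in the order of the `required` list),
    followed by optional properties in their original order."""
    props = parameters.get("properties", {})
    required = [name for name in parameters.get("required", []) if name in props]
    ordered = {name: props[name] for name in required}
    for name, spec in props.items():
        # setdefault leaves the already-placed required fields untouched.
        ordered.setdefault(name, spec)
    result = dict(parameters)
    result["properties"] = ordered
    return result


# Hypothetical create_invoice parameters, with metadata listed first by mistake.
CREATE_INVOICE_PARAMS = {
    "type": "object",
    "properties": {
        "metadata": {"type": "object", "description": "Optional free-form tags."},
        "amount": {"type": "integer", "description": "Invoice total."},
        "currency": {"type": "string", "description": "Currency code."},
    },
    "required": ["amount", "currency"],
}

print(list(required_first(CREATE_INVOICE_PARAMS)["properties"]))
# ['amount', 'currency', 'metadata']
```

This only helps if whatever serializes the schema keeps key order. Python's json.dumps does, unless sort_keys=True is passed.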
2. Error messages are second-chance prompts.
When a tool call fails, the error string is the only thing steering the retry. "400 Bad Request" teaches nothing. "amount must be in minor units (cents), got 19.99 — did you mean 1999?" teaches the model the contract on the spot, and the next call is correct. Treat error strings as inline documentation.
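As a rough sketch of what that looks like in a validator: the function below returns an error string that restates the contract and, where it can, suggests the corrected value. The name validate_amount and the exact wording are assumptions, and it hard-codes 100 minor units per major unit, which is not true of every currency.

```python
import math
from typing import Optional


def validate_amount(amount: object) -> Optional[str]:
    """Return None if amount is valid, else an error string that restates
    the contract so the model can correct its next call."""
    contract = "amount must be a non-negative integer in minor units (cents)"
    if isinstance(amount, float) and math.isfinite(amount) and amount >= 0:
        # A float like 19.99 is most likely a major-unit value; suggest the fix.
        # Assumes 100 minor units per major unit.
        return f"{contract}, got {amount!r}. Did you mean {round(amount * 100)}?"
    if isinstance(amount, bool) or not isinstance(amount, int) or amount < 0:
        # Wrong type, negative, or non-finite: restate the contract, no guess.
        return f"{contract}, got {amount!r}"
    return None


print(validate_amount(19.99))
# amount must be a non-negative integer in minor units (cents), got 19.99. Did you mean 1999?
```

Returning the string instead of raising keeps the sketch simple. In a real tool wrapper the same message would go into whatever error payload the model sees on a failed call.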
3. Defaults are policy, not convenience.
A tool declared with limit: int = 1000 will return 1000 rows on nearly every call, even when the model only needs three, because the model either omits the argument or copies the default. It trusts that the default is sensible. Pick defaults you'd actually want in 80% of calls, judged on cost and latency as well as correctness.
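A sketch of a more conservative signature, assuming a hypothetical list_rows tool over an in-memory list. The values 20 and 200 are placeholders, not recommendations. The default is small, the cap is explicit, and the response tells the model when there is more to fetch.

```python
DEFAULT_LIMIT = 20   # assumed value: what a typical call should return
MAX_LIMIT = 200      # assumed value: hard cap to bound cost and latency


def list_rows(rows: list, limit: int = DEFAULT_LIMIT) -> dict:
    """Return at most `limit` rows, plus enough context for the model to
    decide whether it needs to ask for more."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}, got {limit}")
    page = rows[:limit]
    return {
        "rows": page,
        "returned": len(page),
        "total": len(rows),
        "truncated": len(page) < len(rows),
    }


result = list_rows(list(range(100)))
print(result["returned"], result["total"], result["truncated"])
# 20 100 True
```

The truncated flag matters as much as the default: a small default that silently drops rows just swaps a cost problem for a correctness problem.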
4. Names disambiguate or they don't.
get_user vs lookup_user vs fetch_user_profile: if three tools differ only by near-synonymous verbs, the model picks based on vibes. Either consolidate them or make the names mechanically distinct (get_user_by_id, search_users_by_email).
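A crude lint for this can run over the tool list before it ships. The sketch below is a heuristic of my own, not an established check: it treats a fixed set of verbs as synonyms, keys each tool on the first noun after the verb plus any _by_ qualifier, and flags tools that collide on that key. The verb set and the naive plural handling are assumptions.

```python
from collections import defaultdict

# Assumed set of verbs a model is likely to treat as interchangeable.
LOOKUP_VERBS = {"get", "lookup", "fetch", "find", "search", "read"}


def ambiguous_groups(tool_names: list) -> list:
    """Return groups of tool names that share a noun and differ mainly by
    a synonymous leading verb, with no `_by_` qualifier to tell them apart."""
    groups = defaultdict(list)
    for name in tool_names:
        verb, _, rest = name.partition("_")
        if verb not in LOOKUP_VERBS or not rest:
            continue
        head, _, qualifier = rest.partition("_by_")
        noun = head.split("_")[0]
        if noun.endswith("s"):
            noun = noun[:-1]  # naive singularization: "users" -> "user"
        groups[(noun, qualifier)].append(name)
    return [names for names in groups.values() if len(names) > 1]


print(ambiguous_groups(["get_user", "lookup_user", "fetch_user_profile"]))
# [['get_user', 'lookup_user', 'fetch_user_profile']]
print(ambiguous_groups(["get_user_by_id", "search_users_by_email"]))
# []
```

It will produce false positives and misses on real tool sets, so it is better used as a prompt for a human review than as a gate.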
What I'm most curious about: are there tool-design choices that only show up as problems at scale (10k calls/day from agents) and are invisible at the demo stage?