February 15, 2025
Structuring tools for AI Agents
A short conceptual guide to organising tools in agentic workflows
Tools serve as the gateway for AI agents to perform actions beyond natural language generation. They are one of the essential components needed to transition from building novelty chatbots to developing real assistants.
A "tool" is a broad term but fundamentally refers to a function, usually parameterised, that allows an LLM to execute code. The most extreme form of this setup involves giving the LLM access to a code sandbox (like E2B), enabling it to write and execute its own tools at runtime. While this is certainly the most sci-fi approach, it’s not the most practical. Typically, you’ll define a set of tools with natural language descriptions that the LLM can choose from and use accordingly.
How to Structure Tools
Since tools are essentially functions, you might be tempted to write them in a purely functional manner—meaning each tool has a single responsibility. Take an agent that interacts with GitHub. We might want to allow the agent to:
Retrieve basic repository details (stars, contributors, etc.)
View open pull requests
List open issues
A natural way to structure these tools might look like this:
from langchain_core.tools import tool  # or whichever framework provides the @tool decorator
import github  # PyGithub

@tool
def get_open_prs(repo_name: str) -> str:
    """
    Returns a list of open pull requests for a given repository.
    Repo name is in the format: owner/repo_name
    """
    g = github.Github()
    pulls = g.get_repo(repo_name).get_pulls(state="open")
    formatted_prs = []

    for pr in pulls:
        pr_info = (
            f"PR #{pr.number}: {pr.title}\n"
            f"Author: {pr.user.login}\n"
            f"Created: {pr.created_at}\n"
            f"URL: {pr.html_url}\n"
            f"---"
        )
        formatted_prs.append(pr_info)

    if not formatted_prs:
        return "No open pull requests found."

    return "\n\n".join(formatted_prs)

@tool
def get_open_issues(repo_name: str) -> str:
    ...

@tool
def get_repo_details(repo_name: str) -> str:
    ...
This feels sensible, and is probably how you should write regular functions. But there’s a big issue: the agent must choose between three separate yet closely related tools just to retrieve basic information about a repository.
Imagine the following query to the agent: “Which contributor has the most open pull requests and fewest open issues in the Linux repo?”
To answer this, the agent would need to:
Call the get_open_prs tool
Call the get_open_issues tool
Relate the results of the two separate tool calls and respond (see the sketch of this loop below).
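To make that chain concrete, here is a rough sketch of the tool-calling loop it forces (hypothetical: it assumes LangChain-style tools, the llm from the earlier sketch, and filled-in implementations of the stubbed tools above):

from langchain_core.messages import HumanMessage, ToolMessage

tools_by_name = {"get_open_prs": get_open_prs, "get_open_issues": get_open_issues}
llm_with_tools = llm.bind_tools(list(tools_by_name.values()))

messages = [HumanMessage(
    "Which contributor has the most open pull requests and fewest open issues in the Linux repo?"
)]

# Keep looping until the model stops requesting tools and answers in natural language.
while True:
    ai_message = llm_with_tools.invoke(messages)
    messages.append(ai_message)
    if not ai_message.tool_calls:
        break  # the model must now have related both results and produced an answer
    for call in ai_message.tool_calls:
        result = tools_by_name[call["name"]].invoke(call["args"])
        messages.append(ToolMessage(content=result, tool_call_id=call["id"]))

print(messages[-1].content)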
This may seem like a very simple task, but even basic cases with multiple tool calls can easily fall apart with smaller models, or result in unnecessary back and forth with the user ("would you like me to get the open issues now?"). Worse, as the agent's capabilities expand we could end up with dozens or even hundreds of tools to choose from, increasing the likelihood of errors. This can be partially addressed with a multi-agent setup, where each agent has access to a subset of tools, but it's not a silver bullet.
Instead, I suggest structuring tools based on the types of questions they can answer and the parameters they require, rather than strictly on the specific data they provide. For example, instead of separate functions for pull requests, issues, and repository details, we could consolidate them into a single tool for all high-level repository questions:
@tool
def get_repo_overview(repo_name: str) -> str:
    """
    Provides all open pull requests, contributors and issues for a repository.
    Also includes the number of stars and programming languages used.

    Repo name is in the format: owner/repo_name
    """
    # Note: depending on the framework, decorated tools may need to be called
    # via .invoke(...) rather than directly.
    return get_repo_details(repo_name) + "\n" + get_open_prs(repo_name) + "\n" + get_open_issues(repo_name)
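Bound the same way as before (still assuming the LangChain-style setup from the earlier sketches), the contributor question now resolves with a single tool call; a hypothetical trace:

llm_with_overview = llm.bind_tools([get_repo_overview])
response = llm_with_overview.invoke(
    "Which contributor has the most open pull requests and fewest open issues in the Linux repo?"
)
print(response.tool_calls)
# e.g. [{"name": "get_repo_overview", "args": {"repo_name": "torvalds/linux"}, ...}]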
Now, whenever the agent receives only a repo_name from the user, there's a single way to get all the related information. This is technically more expensive than fetching exactly the information a specific query needs, but as LLMs' ability to reason over longer context improves, presenting all related information in a contiguous block will lead to more holistic comprehension. With broader context, the model also has a better chance of drawing nuanced conclusions and resolving ambiguities in abstract questions like “what does contributor X do the most in the Linux repo?”
In general, it feels convenient to maintain consistency between the way we write and think about code and the way LLMs should interact with tools. But it’s important to remember that LLMs operate under different constraints than human developers. While modular, single-responsibility functions are ideal for traditional programming, they can create unnecessary complexity for AI agents, leading to strange errors when reasoning across multiple tool calls.