Minimum AI Knowledge for Practical Use, Part 6: Tool Calling and Agents
An introduction to tool calling, agents, tool risk levels, permission boundaries, execution logs, and approval flows.
In the previous post, I covered RAG: letting AI retrieve documents before answering.
Now the next step is execution.
What happens when AI can do more than answer?
Examples:
- search files
- read files
- edit code
- run tests
- build
- deploy
- send an external notification
Once AI can do those things, it is no longer just a chatbot. It starts acting like a work agent.
Tool Calling Is Like a Function Call
A tool is an external capability the model can call.
For example:
{
  "name": "read_file",
  "description": "Read a file from the repository",
  "inputSchema": {
    "type": "object",
    "properties": {
      "path": { "type": "string" }
    },
    "required": ["path"]
  }
}
If the model decides it needs to inspect a file, it can call read_file with these arguments:
{
  "path": "src/index.js"
}
The tool runs, returns a result, and the model continues with that result in context.
The flow looks like this:
user request
-> model selects a tool
-> tool executes
-> result returns to model
-> model answers or calls another tool
The key difference from ordinary model output is that a tool call triggers a real action outside the model. It is not just text.
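The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the API of any particular framework: `FAKE_REPO`, `TOOLS`, and `call_tool` are hypothetical names, and the `model_output` dict stands in for what a real model would emit.

```python
import json

# Hypothetical in-memory "repository" so the sketch runs without touching disk.
FAKE_REPO = {"src/index.js": "console.log('hello');\n"}


def read_file(path):
    """Return the content of a file in the fake repository."""
    return FAKE_REPO[path]


# Registry (assumed shape): tool name -> (callable, required argument names).
TOOLS = {
    "read_file": (read_file, ["path"]),
}


def call_tool(tool_call):
    """Validate a tool call emitted by the model, then execute it."""
    name = tool_call["name"]
    args = tool_call.get("arguments", {})
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    func, required = TOOLS[name]
    missing = [key for key in required if key not in args]
    if missing:
        return {"ok": False, "error": f"missing arguments: {missing}"}
    try:
        return {"ok": True, "result": func(**args)}
    except Exception as exc:
        # Return the failure to the model as data instead of crashing the host.
        return {"ok": False, "error": str(exc)}


# The model only produces structured data; the host program does the execution.
model_output = {"name": "read_file", "arguments": {"path": "src/index.js"}}
tool_result = call_tool(model_output)
print(json.dumps(tool_result))
```

The point of the sketch is the division of labor: the model chooses the tool and arguments, while the host validates the call, runs it, and feeds the result back into the context.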
An Agent Uses Tools as a Workflow
A single tool call is useful.
An agent usually chains multiple tool calls.
For example, updating a page in a documentation site may require:
1. Read the target Markdown files
2. Update the required metadata
3. Check both KO and EN versions
4. Generate a review preview
5. Check that only intended files are exposed
6. Deploy
7. Verify production URLs
8. Report the result
This is not one function call. It is a workflow.
Agent behavior is about planning, executing, observing results, and choosing the next step.
Not All Tools Have the Same Risk
When you give an AI tools, separate them by risk.
| Tool Type | Risk | Example |
|---|---|---|
| Read | Low | Search files, read docs |
| Local write | Medium | Edit code, format files |
| Verification | Medium | Run tests, build |
| External effect | High | Deploy, run migration |
| Public/destructive | Very high | Publish, delete data |
Tool design is not only about what the model can do.
It is also about what the model should be allowed to do without additional approval.
For example, a private preview deployment may be allowed without review, while public publishing requires explicit approval.
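One way to encode the table is a risk level per tool plus an approval threshold. The sketch below is an assumption about how this could look: the tool names, the `APPROVAL_THRESHOLD` value, and the `PRE_APPROVED` exception set are all illustrative choices, not a prescribed policy.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1        # read
    MEDIUM = 2     # local write, verification
    HIGH = 3       # external effect
    VERY_HIGH = 4  # public or destructive


# Hypothetical tool names mapped to the risk levels from the table.
TOOL_RISK = {
    "search_files": Risk.LOW,
    "edit_code": Risk.MEDIUM,
    "run_tests": Risk.MEDIUM,
    "deploy_preview": Risk.HIGH,
    "deploy": Risk.HIGH,
    "run_migration": Risk.HIGH,
    "publish": Risk.VERY_HIGH,
    "delete_data": Risk.VERY_HIGH,
}

# Assumption: anything at HIGH or above needs a human approval by default.
APPROVAL_THRESHOLD = Risk.HIGH

# Assumption: a private preview deploy is explicitly exempted from approval.
PRE_APPROVED = {"deploy_preview"}


def needs_approval(tool_name):
    """Return True when the tool must not run without explicit approval."""
    if tool_name in PRE_APPROVED:
        return False
    # Unknown tools are treated as the highest risk rather than the lowest.
    risk = TOOL_RISK.get(tool_name, Risk.VERY_HIGH)
    return risk >= APPROVAL_THRESHOLD


for name in ("search_files", "deploy_preview", "publish"):
    print(name, "needs approval:", needs_approval(name))
```

Treating unknown tools as very high risk is a deliberate default: a tool that nobody has classified yet should ask before it acts.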
In Development Automation, Smaller Tools Are Easier to Control
At first, it is tempting to create one large tool:
{
  "name": "publish_page_update",
  "description": "Review, generate, deploy, and publish a page update"
}
That looks convenient, but it is not a great operational boundary.
If one tool performs file edits, validation, deployment, and public publishing, it becomes hard to understand where a failure happened. The bigger risk is intent mismatch: a user may ask for a review, while the tool internally performs a public action.
I prefer splitting the workflow into smaller tools:
read_page
update_draft
generate_preview
run_checks
deploy_preview
publish_page
notify_result
This makes the risk level visible.
read_page is mostly safe. publish_page changes public state. They should not be treated as the same kind of permission.
Smaller tools may look more verbose, but they are easier to operate as automation grows. You can see where the agent stopped, which step needs approval, and what should be written to the execution log.
Agents Need Boundaries, Not Unlimited Freedom
More permission is not always better.
Good agent workflows have clear boundaries:
Allowed:
- read files
- write draft posts
- generate review drafts
- deploy limited review environments
Approval required:
- publish public posts
- run production DB migrations
- push to main
- add public navigation links
Forbidden:
- print sensitive credentials
- revert unrelated user changes
- delete user data without approval
These boundaries make the system safer and easier to reason about.
They are not just restrictions. They are part of the design.
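These boundaries can live in code as data rather than in a prompt. The following is a minimal sketch under assumed names: the action identifiers, the `decide` function, and its `"allow"` / `"ask"` / `"deny"` results are illustrative, and anything not listed is denied by default.

```python
# Hypothetical action identifiers derived from the lists above.
ALLOWED = {
    "read_file",
    "write_draft_post",
    "generate_review_draft",
    "deploy_review_environment",
}
APPROVAL_REQUIRED = {
    "publish_public_post",
    "run_production_db_migration",
    "push_to_main",
    "add_public_navigation_link",
}
FORBIDDEN = {
    "print_sensitive_credentials",
    "revert_unrelated_user_changes",
}


def decide(action, approved_by=None):
    """Return 'allow', 'ask', or 'deny' for a requested action."""
    if action in FORBIDDEN:
        return "deny"  # an approval cannot unlock a forbidden action
    if action in ALLOWED:
        return "allow"
    if action in APPROVAL_REQUIRED:
        return "allow" if approved_by else "ask"
    return "deny"  # default deny: unlisted actions are not permitted


print(decide("read_file"))
print(decide("push_to_main"))
print(decide("push_to_main", approved_by="reviewer"))
print(decide("print_sensitive_credentials", approved_by="reviewer"))
```

Checking the forbidden set first means an approval cannot accidentally override it.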
Tool Results Still Need Verification
Tool calling gives the model real execution results. That does not mean the final judgment is automatically correct.
A test command may pass while the test coverage is too narrow. A deploy command may succeed while the production page still has a broken image or CSS issue. A search tool may return useful files while missing the file that actually matters.
So an agent workflow needs a verification step after tool execution:
execute tool
-> inspect result
-> compare with expected state
-> search or edit again if needed
-> stop and report if risk is high
Without that loop, the agent is just automation that fires off actions, not a workflow I can trust with real work.
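The verification loop can be sketched as a small helper. Everything here is illustrative: `run_with_verification`, its parameters, and the fake edit and check functions are assumptions made for the example, and a real check would inspect an actual page or test report.

```python
def run_with_verification(execute, inspect, expected, max_attempts=3, high_risk=False):
    """Execute, inspect the result, compare with the expected state, retry or stop."""
    observed = None
    for attempt in range(1, max_attempts + 1):
        execute()
        observed = inspect()
        if observed == expected:
            return {"status": "verified", "attempts": attempt}
        if high_risk:
            # High-risk actions are not retried automatically; hand off to a human.
            return {"status": "stopped_for_review", "attempts": attempt,
                    "observed": observed}
    return {"status": "failed", "attempts": max_attempts, "observed": observed}


# Fake tools: the page only looks right after the second edit.
state = {"edits": 0}


def apply_edit():
    state["edits"] += 1


def check_page():
    return "ok" if state["edits"] >= 2 else "broken_image"


outcome = run_with_verification(apply_edit, check_page, expected="ok")
print(outcome)
```

The exit status of the tool never appears in this loop. Only the inspected state decides whether the step counts as done.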
Execution Logs Matter
When an agent uses tools, "done" is not enough.
We need to know what happened.
At minimum:
{
  "time": "2026-05-10T12:05:00+09:00",
  "tool": "deploy_preview",
  "input": {
    "target": "preview-environment"
  },
  "result": {
    "status": "success",
    "versionId": "..."
  },
  "approval": "not_required"
}
This kind of log makes later debugging possible.
Deploys, deletions, and public changes especially need an audit trail.
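A log like this is cheap to produce if every tool call goes through one function. The sketch below assumes a JSON Lines format (one entry per line) and uses an in-memory buffer in place of a real log file; `make_log_entry` and `append_log` are hypothetical helper names.

```python
import io
import json
from datetime import datetime, timezone


def make_log_entry(tool, tool_input, result, approval, now=None):
    """Build one execution log record with an ISO 8601 timestamp."""
    now = now or datetime.now(timezone.utc)
    return {
        "time": now.isoformat(),
        "tool": tool,
        "input": tool_input,
        "result": result,
        "approval": approval,
    }


def append_log(stream, entry):
    """Write the entry as a single JSON line to any writable text stream."""
    stream.write(json.dumps(entry, ensure_ascii=False) + "\n")


# In-memory buffer for the sketch; a real agent would open a file in append mode.
log_buffer = io.StringIO()
entry = make_log_entry(
    tool="deploy_preview",
    tool_input={"target": "preview-environment"},
    result={"status": "success"},
    approval="not_required",
    now=datetime(2026, 5, 10, 3, 5, tzinfo=timezone.utc),
)
append_log(log_buffer, entry)
print(log_buffer.getvalue(), end="")
```

One line per call keeps the log appendable and easy to filter later, for example by tool name or approval status.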
The Basic Agent Loop
A development agent usually follows this loop:
1. Plan
2. Search files or docs
3. Read relevant files
4. Edit
5. Verify
6. Analyze failures and retry
7. Report results
Each step needs tool support.
But not every step needs full automation. High-risk steps should include approval or manual review.
Summary
Tool calling and agents move AI from answering to acting.
That makes them powerful, but it also makes design more important.
1. Tool calling lets the model invoke external functions.
2. An agent chains tools to complete a workflow.
3. Tools have different risk levels.
4. Agents need clear permission boundaries.
5. Execution logs and approvals are part of safe operation.
In the next post, I will cover how to verify agent work with evals and tests.