2026.05.22 AI AI Basics en

Minimum AI Knowledge for Practical Use, Part 6: Tool Calling and Agents

An introduction to tool calling, agents, tool risk levels, permission boundaries, execution logs, and approval flows.

Contents

In the previous post, I covered RAG: letting AI retrieve documents before answering.

Now the next step is execution.

What happens when AI can do more than answer?

Examples:

- search files
- read files
- edit code
- run tests
- build
- deploy
- send an external notification

Once AI can do those things, it is no longer just a chatbot. It starts acting like a work agent.

Tool calling and agent loop across planning, tool execution, and result checks

Tool Calling Is Like a Function Call

A tool is an external capability the model can call.

For example:

{
  "name": "read_file",
  "description": "Read a file from the repository",
  "inputSchema": {
    "type": "object",
    "properties": {
      "path": { "type": "string" }
    },
    "required": ["path"]
  }
}

If the model decides it needs to inspect a file, it can call:

{
  "path": "src/index.js"
}

The tool runs, returns a result, and the model continues with that result in context.

The flow looks like this:

user request
-> model selects a tool
-> tool executes
-> result returns to model
-> model answers or calls another tool

The key difference is that tool calls are real external actions. They are not just text.

An Agent Uses Tools as a Workflow

A single tool call is useful.

An agent usually chains multiple tool calls.

For example, updating a page in a documentation site may require:

1. Read the target Markdown files
2. Update the required metadata
3. Check both KO and EN versions
4. Generate a review preview
5. Check that only intended files are exposed
6. Deploy
7. Verify production URLs
8. Report the result

This is not one function call. It is a workflow.

Agent behavior is about planning, executing, observing results, and choosing the next step.

Not All Tools Have the Same Risk

When giving AI tools, risk separation matters.

Tool TypeRiskExample
ReadLowSearch files, read docs
Local writeMediumEdit code, format files
VerificationMediumRun tests, build
External effectHighDeploy, run migration
Public/destructiveVery highPublish, delete data

Tool design is not only about what the model can do.

It is also about what the model should be allowed to do without additional approval.

Private preview deployment may be allowed. Public publishing may require explicit approval.

A permission ladder that separates read-only access, local edits, verification, limited external effects, and production changes

In Development Automation, Smaller Tools Are Easier to Control

At first, it is tempting to create one large tool:

{
  "name": "publish_page_update",
  "description": "Review, generate, deploy, and publish a page update"
}

That looks convenient, but it is not a great operational boundary.

If one tool performs file edits, validation, deployment, and public publishing, it becomes hard to understand where a failure happened. The bigger risk is intent mismatch: a user may ask for a review, while the tool internally performs a public action.

I prefer splitting the workflow into smaller tools:

read_page
update_draft
generate_preview
run_checks
deploy_preview
publish_page
notify_result

This makes the risk level visible.

read_page is mostly safe. publish_page changes public state. They should not be treated as the same kind of permission.

Smaller tools may look more verbose, but they are easier to operate as automation grows. You can see where the agent stopped, which step needs approval, and what should be written to the execution log.

Agents Need Boundaries, Not Unlimited Freedom

More permission is not always better.

Good agent workflows have clear boundaries:

Allowed:
- read files
- write draft posts
- generate review drafts
- deploy limited review environments

Approval required:
- publish public posts
- run production DB migrations
- push to main
- add public navigation links

Forbidden:
- print sensitive credentials
- revert unrelated user changes
- delete user data without approval

These boundaries make the system safer and easier to reason about.

They are not just restrictions. They are part of the design.

Tool Results Still Need Verification

Tool calling gives the model real execution results. That does not mean the final judgment is automatically correct.

A test command may pass while the test coverage is too narrow. A deploy command may succeed while the production page still has a broken image or CSS issue. A search tool may return useful files while missing the file that actually matters.

So an agent workflow needs a verification step after tool execution:

execute tool
-> inspect result
-> compare with expected state
-> search or edit again if needed
-> stop and report if risk is high

Without that loop, the agent becomes an automation that runs actions, not a workflow I can trust with real work.

Execution Logs Matter

When an agent uses tools, "done" is not enough.

We need to know what happened.

At minimum:

{
  "time": "2026-05-10T12:05:00+09:00",
  "tool": "deploy.preview",
  "input": {
    "target": "preview-environment"
  },
  "result": {
    "status": "success",
    "versionId": "..."
  },
  "approval": "not_required"
}

This kind of log makes later debugging possible.

Deploys, deletions, and public changes especially need an audit trail.

The Basic Agent Loop

A development agent usually follows this loop:

1. Plan
2. Search files or docs
3. Read relevant files
4. Edit
5. Verify
6. Analyze failures and retry
7. Report results

Each step needs tool support.

But not every step needs full automation. High-risk steps should include approval or manual review.

Summary

Tool calling and agents move AI from answering to acting.

That makes them powerful, but it also makes design more important.

1. Tool calling lets the model invoke external functions.
2. An agent chains tools to complete a workflow.
3. Tools have different risk levels.
4. Agents need clear permission boundaries.
5. Execution logs and approvals are part of safe operation.

In the next post, I will cover how to verify agent work with evals and tests.

Comments

0

Write a Comment

Comments are public by default. Private comments are visible to the admin only.