AI Subscriptions Have Become a Usage Management Game
A practical comparison of ChatGPT Plus/Pro, Claude Pro/Max, Google AI Pro/Ultra, and the usage models behind Codex, Claude Code, and Gemini CLI.
When I first started paying for AI tools, I thought about them like normal software subscriptions.
"Pay monthly, use it a lot."
That still works for light chat. But once tools like Codex and Claude Code enter the workflow, the feeling changes. A few prompts can feel like they burn through usage faster than expected, especially when the task involves a large codebase, long conversations, many file reads, tool calls, or multiple agents.
Under the hood, all of these systems are token-based. The model reads input tokens and produces output tokens, and API pricing is usually expressed in input and output tokens. The difference is that consumer subscription products often wrap those tokens in different surfaces: credits, usage limits, session limits, or compute-based limits.
OpenAI describes Codex usage most directly, in token-based credits. Claude uses token-based API pricing, but Claude Pro and Max subscriptions are explained through usage limits, conversation length limits, and Claude Code workload. Google describes compute-based usage limits for Gemini Apps and quota rules for Gemini CLI and Code Assist.
From the user's side, though, the result is similar: heavier work uses up a limited allowance faster, whatever the unit is called.
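To make the "same tokens, different surface" idea concrete, here is a minimal Python sketch. Every rate and conversion factor in it (`INPUT_USD_PER_M`, `USD_PER_CREDIT`, `WINDOW_BUDGET_USD`, and so on) is a hypothetical placeholder, not any provider's published number. The point is only that one token count can be shown as dollars, as credits, or as a percentage of a usage window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    input_tokens: int
    output_tokens: int


# Hypothetical numbers for illustration only; no provider publishes these exact values.
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0
USD_PER_CREDIT = 0.04
WINDOW_BUDGET_USD = 5.0


def api_cost_usd(u: Usage) -> float:
    """API-style surface: pay per input and output token."""
    return (u.input_tokens * INPUT_USD_PER_M + u.output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def as_credits(u: Usage) -> float:
    """Credit-style surface: the same tokens converted into a credit balance."""
    return api_cost_usd(u) / USD_PER_CREDIT


def as_limit_percent(u: Usage) -> float:
    """Usage-limit surface: the same tokens shown as a share of a time-window allowance."""
    return 100 * api_cost_usd(u) / WINDOW_BUDGET_USD


session = Usage(input_tokens=200_000, output_tokens=20_000)
print(f"${api_cost_usd(session):.2f} | {as_credits(session):.1f} credits | "
      f"{as_limit_percent(session):.0f}% of window")  # $0.90 | 22.5 credits | 18% of window
```

Only the presentation changes between the three functions; the underlying input and output token counts stay the same.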
The $20 Plan Is Becoming the Entry Tier
On the providers' official pricing pages, the major AI subscriptions now sit in similar price bands.
| Service | Base paid tier | Higher work tier | What stood out |
|---|---|---|---|
| ChatGPT / Codex | Plus $20 | Pro $100 / $200 | Codex is available through eligible plans, and its usage counts toward agentic usage limits. |
| Claude / Claude Code | Pro $20 | Max 5x $100 / Max 20x $200 | The API is token-priced, while Pro/Max subscriptions expose usage limits. Claude Code is also affected by codebase size and task complexity. |
| Google Gemini | AI Pro $19.99 | AI Ultra $99.99 / $199.99 | Gemini and developer-tool access expand by plan. |
The interesting part is not only the price list. It is why the higher tiers exist.
A few years ago, $20 felt like "the AI subscription." Now coding agents, deep research, long context, file analysis, browser actions, and tool execution can consume much more than a normal chat flow.
For the providers, a flat monthly price that feels unlimited is hard to sustain. So the market is splitting: $20 for lighter use, $100-$200 for people who use AI as a work surface throughout the week.
Codex Is Moving Toward Direct Token-Based Accounting
OpenAI's documentation says Codex is included with plans such as ChatGPT Plus and Pro, and Codex usage counts toward agentic usage limits. The Codex rate card also explains that pricing moved in April 2026 from message-based averages toward token-based credits.
The current Codex rate card separates input tokens, cached input tokens, and output tokens.
That is a meaningful shift.
It means message count is no longer the best mental model. These factors matter more:
- how much code or text the agent reads
- how long the conversation context stays alive
- how large the generated output becomes
- whether a faster or stronger model mode is used
- whether multiple agents or automations run in parallel
One prompt against a tiny code snippet and one prompt against a full repository are not the same kind of request.
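A rough way to see why scope matters is to price two requests under a rate card that separates input, cached input, and output tokens. The sketch below assumes hypothetical credit rates (the `Rates` class and its defaults are illustrative, not the real Codex rate card) and made-up token counts for a snippet versus a repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Hypothetical credits per million tokens; NOT the real Codex rate card values.
    input_per_m: float = 1.0
    cached_input_per_m: float = 0.1  # assumed: cached input is much cheaper than fresh input
    output_per_m: float = 8.0


def turn_credits(fresh_input: int, cached_input: int, output: int, r: Rates = Rates()) -> float:
    """Credits for one turn, pricing each token category separately."""
    return (fresh_input * r.input_per_m
            + cached_input * r.cached_input_per_m
            + output * r.output_per_m) / 1_000_000


snippet = turn_credits(fresh_input=2_000, cached_input=0, output=800)
repo = turn_credits(fresh_input=150_000, cached_input=50_000, output=4_000)
print(f"snippet: {snippet:.4f}  repo: {repo:.4f}  ratio: {repo / snippet:.0f}x")
# snippet: 0.0084  repo: 0.1870  ratio: 22x
```

With these invented numbers, the repository-wide prompt costs over twenty times as much as the snippet prompt, even though both count as "one prompt".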
Claude Is Token-Based Too, but Subscriptions Expose Limits
Claude points in a similar direction.
Claude Pro is listed at $20 per month in the US. Claude Max has two tiers: Max 5x at $100 and Max 20x at $200. Claude Code can be used through Pro and Max, and Claude's help docs explain that usage across Claude, Claude Desktop, and Claude Code counts toward the same usage limit.
The important distinction is this:
- Claude API: priced by input and output tokens.
- Claude Pro/Max subscriptions: presented to users as time-window usage limits and conversation length limits rather than a raw visible token counter.
- Claude Code CLI: usage depends on more than conversation length. Project complexity, codebase size, file reads, auto-accept settings, parallel instances, and model choice all affect it.
So if the question is "is Claude token-based too?", the answer is yes. But the subscription experience is closer to "I am approaching a usage limit" than "I have spent exactly this many tokens."
The practical point is that Claude Code is not a separate unlimited bucket.
If I use Claude for writing and research, and Claude Code for terminal work, all of it can draw on the same allowance. The number of messages I get varies with conversation length, attachments, model, and features. In Claude Code, the number of prompts I can send also varies with repository size and task complexity.
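One way to picture a shared allowance is a single budget that several surfaces draw from. This is a hypothetical model: the `SharedAllowance` class and its units are invented for illustration, and Anthropic does not expose limits this way. It only shows why heavy Claude Code use leaves less room for chat.

```python
from collections import defaultdict


class SharedAllowance:
    """Hypothetical model: one usage budget drawn down by several surfaces."""

    def __init__(self, budget: float) -> None:
        self.budget = budget
        self.spent_by_surface: defaultdict[str, float] = defaultdict(float)

    @property
    def remaining(self) -> float:
        # Every surface subtracts from the same pool.
        return self.budget - sum(self.spent_by_surface.values())

    def spend(self, surface: str, units: float) -> bool:
        """Record usage if it fits in what is left; otherwise refuse it."""
        if units > self.remaining:
            return False
        self.spent_by_surface[surface] += units
        return True


allowance = SharedAllowance(budget=100.0)
allowance.spend("Claude (web)", 20)
allowance.spend("Claude Desktop", 10)
allowance.spend("Claude Code", 60)
print(allowance.remaining)                 # 10.0
print(allowance.spend("Claude Code", 15))  # False: the shared pool is nearly empty
```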
Claude also distinguishes usage limits from length limits, and Claude Code adds a third factor:
- Usage limits: how much I can interact with Claude over time
- Length limits: how long a single conversation can become
- Claude Code workload: how much project context it reads, edits, and runs through tools
Automatic context management can make long conversations easier to continue, but longer conversations that trigger summarization can still consume more usage. Convenience is not free.
Gemini Is Also Entering the CLI Agent Race
Google is no longer just part of a chat-app comparison, either.
The Gemini CLI documentation describes it as an open-source AI agent that brings Gemini into the terminal. It uses a ReAct loop, built-in tools, and local or remote MCP servers for tasks such as fixing bugs, implementing features, and improving tests.
Google also documents that Gemini Code Assist agent mode and Gemini CLI share quotas. One prompt can result in multiple model requests, and the quotas page lists daily request limits for Standard and Enterprise editions.
Gemini Apps have compute-based usage limits as well. Prompt complexity, model and feature selection, and chat length can affect the limit. The limit refreshes every five hours until the weekly limit is reached.
My read is that this third entrant matters.
If Codex, Claude Code, and Gemini CLI all compete in the terminal and IDE workflow, the market will get more competitive. But I do not expect the result to be simple unlimited usage. More likely, the competition will make providers tune metering more carefully.
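The Gemini Apps rule of a limit that refreshes every five hours until a weekly limit is reached can be modeled as two nested counters. This is a minimal sketch under loud assumptions: the budgets are arbitrary units, each short window is assumed to open at the first use after the previous one expires, and `WindowedLimit` is a hypothetical class. Google documents the refresh behavior, not this implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WindowedLimit:
    """Hypothetical limiter: a short refreshing window nested inside a weekly cap."""
    window_budget: float
    weekly_budget: float
    window: timedelta = timedelta(hours=5)
    week: timedelta = timedelta(days=7)
    window_start: datetime | None = None
    week_start: datetime | None = None
    window_used: float = 0.0
    week_used: float = 0.0

    def try_use(self, at: datetime, units: float) -> bool:
        # Assumption: a new short window opens at the first use after the old one expires.
        if self.window_start is None or at >= self.window_start + self.window:
            self.window_start, self.window_used = at, 0.0
        # The weekly counter resets only after a full week has passed.
        if self.week_start is None or at >= self.week_start + self.week:
            self.week_start, self.week_used = at, 0.0
        # Refuse the request if it would overrun either limit.
        if (self.window_used + units > self.window_budget
                or self.week_used + units > self.weekly_budget):
            return False
        self.window_used += units
        self.week_used += units
        return True


limit = WindowedLimit(window_budget=40, weekly_budget=100)
t0 = datetime(2026, 1, 5, 9, 0)
results = [
    limit.try_use(t0, 40),                        # fills the first window
    limit.try_use(t0 + timedelta(hours=1), 10),   # window is full
    limit.try_use(t0 + timedelta(hours=5), 40),   # window refreshed; 80 of 100 used this week
    limit.try_use(t0 + timedelta(hours=10), 30),  # window refreshed, but the week would hit 110
    limit.try_use(t0 + timedelta(hours=10), 20),  # fits exactly: 100 of 100
]
print(results)  # [True, False, True, False, True]
```

The last two calls show why a refreshed short window does not guarantee room: the weekly cap still applies.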
Why Usage Disappears Faster Than It Feels
The frustrating moment is usually the same:
"I did not use it that much. Why is the limit already lower?"
With agentic AI, the number of visible prompts can be very different from the real internal work.
For example, "look through this project" can trigger file reads, structure analysis, command execution, result interpretation, and follow-up reasoning. From my side, it is one request. Internally, it may be many steps.
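The gap between one visible request and many internal steps is easy to underestimate, and it is the same effect behind Google's note that one prompt can become several model requests. Below is a simplified tool-loop model, assuming no prompt caching and invented token counts: each step rereads everything gathered so far, and each tool result is appended to the context. `agent_run_tokens` is a hypothetical helper, not any tool's real accounting.

```python
def agent_run_tokens(base_context: int, tool_results: list[int],
                     tokens_per_tool_call: int) -> tuple[int, int]:
    """Return (model calls, total input tokens) for one request handled as a tool loop."""
    context = base_context
    calls = 0
    input_total = 0
    for result_tokens in tool_results:
        calls += 1
        input_total += context                           # the model rereads the whole context
        context += tokens_per_tool_call + result_tokens  # its tool call and the tool output are appended
    calls += 1                                           # one last call writes the final answer
    input_total += context
    return calls, input_total


# "Look through this project": list files, read two files, run the tests.
calls, tokens = agent_run_tokens(base_context=3_000,
                                 tool_results=[4_000, 6_000, 2_000, 1_500],
                                 tokens_per_tool_call=200)
print(calls, tokens)  # 5 56500
```

One typed request became five model calls and roughly 56k input tokens, most of it context being reread.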
These are the patterns that tend to burn usage quickly.
| Pattern | Why it costs more | How I reduce it |
|---|---|---|
| Continuing one long chat | Old context keeps traveling. | Start a new conversation when the task changes. |
| Loading huge files | Unneeded code enters context. | Pick only the relevant files first. |
| Asking for a broad review | Exploration becomes wide. | Narrow the scope to a function, screen, or error. |
| Requesting long output | Output tokens grow quickly. | Define the output format upfront. |
| Running parallel agents | Several workers spend at once. | Use parallel work only when it clearly helps. |
The point is not to ask fewer questions. The point is to avoid vague large requests and split work into smaller, sharper ones.
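The "old context keeps traveling" row surprises people most, because input grows with every turn. A toy calculation, assuming no caching and a fixed 1,000 tokens added per turn (both invented numbers), compares one long chat with two fresh ones covering the same ten turns.

```python
def chat_input_tokens(tokens_added_per_turn: list[int]) -> int:
    """Total input tokens if every turn rereads the whole conversation so far (no caching)."""
    history = 0
    total = 0
    for added in tokens_added_per_turn:
        history += added  # this turn's new message joins the running history
        total += history  # and the model reads the full history to answer it
    return total


one_long_chat = chat_input_tokens([1_000] * 10)
two_fresh_chats = 2 * chat_input_tokens([1_000] * 5)
print(one_long_chat, two_fresh_chats)  # 55000 30000
```

Splitting only helps at a real task boundary. If the second chat needs the first one's history, whatever you carry over adds its own tokens.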
My Practical Usage Rules
Using AI efficiently is less about writing magical prompts and more about shaping the task.
Here is the workflow I want to keep using.
1. Ask for a plan first. Before edits or execution, get the order of work. If the plan is wrong, the execution will drift too.
2. Limit the files. Instead of sending the whole repository into the task, start with a small set of relevant files. If the right files are unclear, ask for a file inventory first.
3. Constrain the output. "Explain everything" is expensive. "Give me the top three risks in a table" is usually better.
4. Use tools only when they matter. Web search, browsers, MCP, and large file analysis are powerful, but they can be expensive. I want to turn them on when freshness or verification is actually needed.
5. Split drafting from final review. Drafting can often use a cheaper or faster mode. Final review is where the strongest model earns its cost.
6. Use checkpoints in long work. "Do everything" can waste a lot of usage in the wrong direction. "Stop here and report back" is easier to control.
This is not only a cost habit. It also improves quality.
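Rules 1 and 6 (plan first, then stop at checkpoints) can be sketched as a small runner that executes a plan step by step and pauses either at a checkpoint or before a step would overrun a budget. The step names, cost estimates, and `run_plan` itself are hypothetical; real costs only show up in your plan's usage meter, so the estimates are guesses.

```python
from typing import Callable


def run_plan(plan: list[str], estimate: Callable[[str], int], budget: int,
             checkpoint_every: int) -> tuple[list[str], int, str]:
    """Run plan steps in order, pausing at checkpoints or before exceeding the budget."""
    spent = 0
    done: list[str] = []
    for step in plan:
        cost = estimate(step)
        if spent + cost > budget:
            return done, spent, f"paused before {step!r}: budget"
        spent += cost
        done.append(step)  # a real runner would execute the step here
        if len(done) % checkpoint_every == 0 and len(done) < len(plan):
            return done, spent, "paused: checkpoint, report back"
    return done, spent, "finished"


# Hypothetical plan with guessed costs in arbitrary usage units.
costs = {"list relevant files": 2, "read parser.py": 8, "fix tokenizer bug": 10,
         "run tests": 6, "summarize diff": 3}
done, spent, status = run_plan(list(costs), costs.__getitem__, budget=40, checkpoint_every=3)
print(done, spent, status)
```

Here the run stops after three steps to report back, so a wrong plan costs 20 units instead of the full 29.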
The Buying Criteria Have Changed
AI subscriptions used to be judged mostly by model quality.
That is no longer enough.
These questions now matter just as much:
- How long can I work for the same monthly price?
- Are the limits predictable?
- Can I buy extra credits without changing the plan?
- Do web, app, CLI, and IDE usage share the same bucket?
- Can I control cost when running automations or parallel agents?
- Does the plan fit my actual workflow?
My view is that the AI subscription market will become more competitive, but also more metered.
For users, the right response is not to assume a subscription is close to unlimited. It is to understand the work patterns that burn usage.
[!CHECK] My takeaway
AI subscriptions are becoming less like simple monthly access and more like a way to allocate limited high-performance work time.
So when I choose or use a plan, I do not want to ask only "which model is best?"
The better question is:
"Is this task worth spending this much usage on?"