How I Cut Claude Tokens By 80%

If you're like me and you're always trying to drag your token usage down, here are the 20% of habits that save 80% of the tokens. Most of what you spend gets burned on things you never asked for. These are the fixes, in the order I'd apply them.

I also bundled this whole setup, the CLAUDE.md, the statusline, and the rest, into a free starter config you can grab here: free Claude Code starter config.

See what's already in your context

Every time you open Claude Code you start a new session, and that session isn't empty. Run /context and you'll see what's loaded before you even type: the system prompt, the system tools, your memory files, the skills being loaded, and the free space left.

This matters because that context gets added to every message you send in the session. If a pile of skills and memory files are loading that you don't need, that's thousands of tokens riding along on every single message. The rest of the tips are mostly about keeping this number small.
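
To get a feel for how that baseline adds up, here's a back-of-the-envelope sketch in Python. The figures are made up for illustration (a 15,000-token baseline versus a 6,000-token one, over 30 messages), and the model ignores caching and the growing conversation history, so read it as rough arithmetic and not a billing calculation.

```python
def baseline_overhead(baseline_tokens: int, messages: int) -> int:
    """Tokens spent re-sending the preloaded context across a session.

    Simplified model: the baseline (system prompt, tools, memory files,
    skills) is counted once per message. Caching and conversation
    history are deliberately ignored.
    """
    return baseline_tokens * messages


# Hypothetical figures, not measurements.
bloated = baseline_overhead(baseline_tokens=15_000, messages=30)
trimmed = baseline_overhead(baseline_tokens=6_000, messages=30)

print(f"bloated baseline: {bloated:,} tokens")
print(f"trimmed baseline: {trimmed:,} tokens")
print(f"saved: {bloated - trimmed:,} tokens")
```

Swap in the numbers /context shows you to see what your own baseline costs over a typical session.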

Keep CLAUDE.md under 200 lines

Open the CLAUDE.md in your project. A good rule: keep it under 200 lines. If it's creeping past 300 or 400, trim it.

The trick is to treat it like an index, not a manual. If a chunk of it explains how to do something, move that into a skill, and in CLAUDE.md just link to it: "to generate a report, use this skill." Same with documentation you only need once in a while. Put it in a readme, and in CLAUDE.md write one line saying what it is and when to read it. Then Claude only pulls that file into context when the task actually calls for it, instead of carrying it every message.
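
If you want to keep yourself honest about the 200-line rule, a tiny script can check it for you. This is a minimal sketch that assumes CLAUDE.md sits in the directory you run it from; the function name and messages are my own, not part of Claude Code.

```python
from pathlib import Path

LINE_LIMIT = 200  # the rule of thumb from this section


def check_line_count(text: str, limit: int = LINE_LIMIT) -> str:
    """Return a short verdict on whether the text stays under the line budget."""
    count = len(text.splitlines())
    if count < limit:
        return f"OK: {count} lines (under {limit})"
    return f"Trim it: {count} lines, aim for under {limit}"


if __name__ == "__main__":
    path = Path("CLAUDE.md")  # assumes you run this from the project root
    if path.exists():
        print(check_line_count(path.read_text(encoding="utf-8")))
    else:
        print("No CLAUDE.md found in the current directory")
```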

Plan with Opus, build with Sonnet

Run /model and you get three options: the default (Sonnet), the stronger one that costs more (Opus), and the lightest one (Haiku) for simple things. Picking one and leaving it isn't the move if you care about tokens.

What I do at the start of a task is select opusplan. It uses the better model, Opus, to plan what needs to happen, then drops to Sonnet to actually do the work. You get the good thinking where it counts and the cheaper model for the rest.

Start in plan mode

Don't just hand Claude a task and let it run in auto mode. Start in plan mode and iterate first. Hit Shift+Tab and you'll see the mode change at the bottom of the screen. Plan mode lets Claude show you the plan before it touches anything. You read it, confirm it's what you want, then turn plan mode off and let it run. Catching a wrong plan before it writes code saves you a whole round of tokens fixing the mess.

Use the Karpathy guidelines

There's a set of skills based on Andrej Karpathy's notes about the common problems when you're working with a coding agent. The problems and the fixes are written up in a repo, and the simplest way to use them is to let Claude set it up for you. Copy the link, open a session, and tell it to set this up in your project. When it's done, start a fresh session so that setup conversation doesn't sit in your context for the next task.

Don't reuse one session for different tasks

This is the one most people get backwards. If you used 20,000 tokens on task A and then start task B in the same session, task B is now carrying those 20,000 tokens on every new message. Some of it might be cached so you pay less, but you're still paying for context you don't need.

When you do want to carry something over, use /compact instead of dragging the whole session along. It compacts the context and keeps the session going for cheaper. You can even add custom instructions so it doesn't drop the part you care about, something like "don't forget how endpoint X works" if you're about to build a middleware for it.

Opening a brand new session isn't free either. It has to go searching through your files again to find where things live, and that search can burn more tokens than the old context you were trying to shed. So sometimes compacting and continuing beats starting over.
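
The trade-off between reusing, compacting, and starting fresh can be sketched with some simple arithmetic. Only the 20,000-token figure comes from the example above. The 10 messages, the 3,000-token compacted summary, and the 40,000 tokens of file searching are assumptions I picked for illustration. The sketch also ignores caching and the cost of the compaction step itself.

```python
def task_b_cost(carried_tokens: int, startup_tokens: int, messages: int) -> int:
    """Rough input-token cost of running task B under one strategy.

    carried_tokens: context that rides along on every message
    startup_tokens: one-off cost paid before the work starts
                    (e.g. re-reading files in a fresh session)
    """
    return startup_tokens + carried_tokens * messages


MESSAGES = 10  # hypothetical length of task B

strategies = {
    # Keep going in the same session: all 20,000 tokens ride along.
    "reuse session": task_b_cost(20_000, 0, MESSAGES),
    # /compact: assume the summary shrinks the carry-over to 3,000 tokens.
    "compact": task_b_cost(3_000, 0, MESSAGES),
    # Fresh session: nothing carried, but assume 40,000 tokens of file search.
    "fresh session": task_b_cost(0, 40_000, MESSAGES),
}

# Print the strategies from cheapest to most expensive.
for name, cost in sorted(strategies.items(), key=lambda item: item[1]):
    print(f"{name:>14}: {cost:,} tokens")
```

With these made-up numbers compacting wins, but a short task B or a small codebase can flip the result in favor of a fresh session, which is why it's worth thinking about each time.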

Set up Graphify

Every time you ask Claude something about your code, it searches through your files looking for what's relevant, and all those files end up in context. Start a new session and ask something similar, and it reads the same files all over again, sometimes the wrong ones. You pay for every one of them.

Graphify fixes this by building a graph of your codebase that Claude queries instead of rereading files. It looks up the node it needs and the connections around it rather than loading whole files that might not even be related. I did a full walkthrough of the setup here: How I Set Up Graphify In Claude Code.
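
To make the idea concrete, here's a toy version of "look up the node and the connections around it." This is not Graphify's API or data format, just a plain Python dictionary with hypothetical symbol names, showing why a graph lookup returns a handful of related names instead of whole files.

```python
# Toy code graph: each node is a symbol, edges point to what it depends on.
# All names here are hypothetical.
GRAPH = {
    "auth_middleware": ["verify_token", "load_user"],
    "verify_token": ["jwt_decode"],
    "load_user": ["db_query"],
    "jwt_decode": [],
    "db_query": [],
    "render_report": ["db_query"],
}


def neighborhood(graph: dict[str, list[str]], start: str, depth: int = 1) -> set[str]:
    """Return the start node plus everything reachable within `depth` hops."""
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen


print(sorted(neighborhood(GRAPH, "auth_middleware")))
# ['auth_middleware', 'load_user', 'verify_token']
```

A question about the middleware only touches three nodes here, and render_report never enters the picture.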

Use agent-browser for browser work

If your task touches the browser, the Chrome plugin most people reach for wasn't built for agents. Claude ends up taking a screenshot, analyzing the image, and burning a lot of tokens to do it. Playwright MCP is already better than the default plugin. But for agent work the best option right now is agent-browser from Vercel Labs, which is built for this and uses far fewer tokens than the others. I compared all three side by side here: 3 Ways To Give Claude Code A Browser.

Bonus: build a statusline

This one saves time more than tokens, but it's worth knowing. Type /statusline and tell it, in plain English, what you want shown at the bottom: the current model, the effort, a progress bar for the context, your five-hour limit and when it resets. It builds that line for you.

It's useful because without it, checking your context means running /context, waiting, scrolling up, and reading the number. Checking your usage means /usage. Checking the reset means opening another tab. With the statusline it's all sitting there, so you can see at a glance whether your context is still small enough to keep going or whether it's time to compact.
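
For a sense of what the context progress bar boils down to, here's a minimal sketch of the rendering logic. The function name, the bar style, and the 200,000-token limit in the example are my assumptions, and how a statusline script actually gets the token counts is not covered here.

```python
def context_bar(used_tokens: int, limit_tokens: int, width: int = 10) -> str:
    """Render context usage as a text progress bar with a percentage."""
    # Clamp so the bar never underflows or overflows.
    used = min(max(used_tokens, 0), limit_tokens)
    filled = used * width // limit_tokens
    percent = used * 100 // limit_tokens
    return f"[{'#' * filled}{'-' * (width - filled)}] {percent}%"


# Hypothetical numbers: 60,000 tokens used out of an assumed 200,000 limit.
print(context_bar(60_000, 200_000))
# [###-------] 30%
```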

Links

If your team is running up a Claude Code bill and you want it under control without slowing anyone down, get in touch. See our AI consulting service for how we work.