I Tested Headroom: Where the 95% Token Savings Is Real (And Where It Isn't)

I fed an AI a 400,000-token file and it only saw 69,000. Same information, 83% smaller, in a fraction of a second. The tool that did it is called Headroom, and in five months it went from nothing to more than 50,000 stars on GitHub. The promise on the repo is right there: 60 to 95% fewer tokens, same answers. So I tested it, and I'll be straight with you up front. One part genuinely blew me away. The part most people assume, that you bolt it onto Claude Code and cut your bill in half, is not how it works. Here's both.

What Headroom actually is

First, think about what fills up an AI's memory. It isn't your questions; those are tiny. It's the stuff the model pulls in to do the job: giant API responses, logs, search results, files, the output of every tool it runs. Walls of text and JSON. That's what eats the context window, slows everything down, and runs up the bill.

Headroom sits in the middle, right between your AI tool and the model. It catches all that raw data and compresses it before the model ever sees it. The model still gets what it needs to answer, it just gets it small. That's the whole idea. It's also open source, under an Apache license.

How it decides what to compress

The clever part is that it doesn't blindly zip everything. It looks at what kind of data it's dealing with.

If it's structured data, a big JSON list, say 500 rows from an API, it uses something they call Smart Crusher. Instead of keeping all 500 near-identical rows, it keeps the shape and the patterns, and, the important bit, it specifically keeps the errors, the outliers, the weird ones. That's where the 80 to 90% savings on structured data come from.
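
Here's a rough sketch of that idea in plain Python. To be clear, this is not Headroom's code. The function name `crush_rows`, the `status == "error"` check, and the z-score cutoff are all my own assumptions, picked only to show what "keep the shape, keep the weird ones" can look like.

```python
from statistics import mean, pstdev


def crush_rows(rows, value_key, keep_first=3, z_cutoff=3.0):
    """Summarize a list of similar dict rows, keeping errors and outliers.

    Hypothetical illustration only, not Headroom's Smart Crusher.
    """
    # Collect the numeric values we will use to spot outliers.
    values = [r[value_key] for r in rows
              if isinstance(r.get(value_key), (int, float))]
    mu = mean(values) if values else 0.0
    sigma = pstdev(values) if values else 0.0

    def is_outlier(row):
        v = row.get(value_key)
        if not isinstance(v, (int, float)) or sigma == 0:
            return False
        return abs(v - mu) / sigma > z_cutoff

    def is_error(row):
        # Assumed convention: rows flag failures with status == "error".
        return row.get("status") == "error"

    # Keep a few leading rows as a sample, plus every error and outlier.
    kept = [r for i, r in enumerate(rows)
            if i < keep_first or is_error(r) or is_outlier(r)]

    return {
        "total_rows": len(rows),
        "fields": sorted({k for r in rows for k in r}),
        "mean": round(mu, 2),
        "kept_rows": kept,
    }
```

Feed it 500 near-identical sensor rows with one spike and one error, and what comes back is a handful of rows plus a summary instead of the full list.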

If it's plain, messy text, it uses a small AI model to rewrite it shorter, same meaning, fewer words. That one is gentler, more like 30 to 50%, and it's lossy. It changes the actual wording. Hold onto that difference, structured data versus plain text, because it's the whole reason for the twist later.
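
A minimal sketch of that fork in the road, assuming the check is as simple as "does this parse as a JSON array of objects?". Headroom's real detection may well be more involved; `pick_engine` and the two labels it returns are names I made up for illustration.

```python
import json


def pick_engine(payload: str) -> str:
    """Return 'structured' for a JSON array of objects, else 'text'.

    Hypothetical routing rule, not Headroom's actual logic.
    """
    try:
        data = json.loads(payload)
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON at all, so treat it as plain text.
        return "text"
    # A non-empty list of dicts is the repetitive shape worth crushing.
    if isinstance(data, list) and data and all(isinstance(x, dict) for x in data):
        return "structured"
    return "text"
```

In this sketch a lone JSON object still goes down the text path, since there are no repeated rows to collapse.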

The library demo: 83% off a real 400,000-token file

This is the one that sold me. I used the plain Python library, just a function called compress. And I didn't use a toy example. I grabbed a real IoT telemetry file, the kind of giant JSON dump a real system spits out: sensor readings, timestamps, thousands of rows. 400,000 tokens in one file.

I wrapped it in a message, the way the AI would see it, and called compress:

File	Tokens before	Tokens after	Saved
Real IoT telemetry JSON	401,125	69,264	83%

It ran in a fraction of a second, with no AI model needed for this one: pure pattern-crushing. And it didn't just chop the end off. It kept the structure, kept the outliers, kept the errors, so an AI reading the compressed version still knows what's in there. On big, repetitive data, this absolutely works.
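
If you want to check the percentage yourself, the arithmetic is one line. `savings_pct` is just a helper name I'm using here, not part of Headroom.

```python
def savings_pct(tokens_before: int, tokens_after: int) -> float:
    """Percentage of tokens removed by compression."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return 100.0 * (tokens_before - tokens_after) / tokens_before


# The numbers from the table above.
print(f"{savings_pct(401_125, 69_264):.1f}% saved")  # 82.7% saved
```

Run the same calculation on your own before and after counts and you get your number, not mine.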

Two agents, one handoff: 86% smaller

Here's where it gets genuinely cool. Say you've got two AI agents working together, one does the research and hands its findings to a second agent to act on. Normally agent one dumps everything into agent two's context, and you've burned half the window before agent two even starts.

Headroom has a thing called Shared Context built for exactly this. Agent one drops in roughly 45,000 tokens of research, and agent two picks it up at roughly 6,000. That's about 86% smaller, no API key, instant. And if agent two ever needs the full thing, it just asks for the original on demand, so nothing is actually lost. The handoff is tiny by default, and the full data is one call away. For anyone building multi-agent stuff, that's a clean idea.
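
The pattern is easy to picture in code. This is a sketch under my own assumptions, not the Shared Context API: `HandoffStore`, `put`, and `get_original` are hypothetical names, and the summarizer is a stand-in you pass in yourself.

```python
import uuid


class HandoffStore:
    """Hand over a compact summary, keep the original retrievable by reference.

    Hypothetical illustration of the pattern, not Headroom's Shared Context.
    """

    def __init__(self):
        self._originals = {}

    def put(self, full_text, summarize):
        # Store the full payload and return only a small handoff record.
        ref = uuid.uuid4().hex
        self._originals[ref] = full_text
        return {"ref": ref, "summary": summarize(full_text)}

    def get_original(self, ref):
        # Agent two calls this only when the summary is not enough.
        return self._originals[ref]


store = HandoffStore()
research = "finding: sensor 12 drifts after reboot. " * 200
handoff = store.put(research, summarize=lambda text: text[:120])
# Agent two starts from handoff["summary"] and can still fetch everything.
full = store.get_original(handoff["ref"])
```

The design choice that matters is that the small version is the default and the large version costs an explicit call.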

Running it with Claude Code: one command

So how do you put this in front of Claude Code? One command:

headroom wrap claude

That's the whole thing. It quietly starts the compressor in the background and launches Claude Code routed straight through it. You use Claude exactly like normal, you wouldn't even know it's there. There's a live dashboard at localhost:8787/dashboard, and as you work, pull data, read a big file, you watch the "saved" counter climb.

But here's where I have to stop you. If you're now thinking "great, I'll get 86% off my Claude bill," that's exactly the part nobody tells you.

The honest part: 83% in the library, ~14% in Claude Code

Remember the difference between structured data and plain text? Here's why it matters.

That 83% I showed you was Smart Crusher, on a big structured JSON file. That's the library: your own scripts, your own agents, where you feed it big walls of structured data. But inside a normal Claude Code session, most of what flows through isn't neat JSON arrays, it's mixed text. So wrap leans on the other engine, the text rewriter, and that one is way gentler. On a real, data-heavy session my total savings came out around 14%, not 83.

Where you run it	What it uses	Real savings
The library, on structured JSON	Smart Crusher	~83%
`wrap claude`, live coding session	text rewriter (lossy)	~14%

And the Claude Code path is lossy. It's literally rewording your context to make it shorter, which on code or exact instructions you might not always want. So let me be clear, because the GitHub page won't be: the jaw-dropping numbers are for one specific job, compressing big structured data in your own pipelines. Bolting it onto Claude Code as a magic discount helps a little, around 14% in my test, and it changes your text to do it. Two different tools wearing the same name.
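
One way to see how a session lands so far below the headline number is a weighted average. The shares below are invented for illustration, not measured from my session. The point is only that the total is dominated by whatever makes up most of the traffic.

```python
def blended_savings(mix):
    """Overall savings rate for (share_of_tokens, savings_rate) pairs."""
    total_share = sum(share for share, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * rate for share, rate in mix)


# Hypothetical session mix (assumed numbers, not measurements):
hypothetical_mix = [
    (0.10, 0.83),  # structured JSON, crushed hard
    (0.15, 0.40),  # plain text, rewritten gently
    (0.75, 0.00),  # everything else, passed through untouched
]
print(f"{blended_savings(hypothetical_mix):.1%}")
```

Even with one slice compressing at 83%, a mix like this lands in the low teens overall.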

So should you use it?

If you're building your own stuff, agents, automations, scripts that pull in giant API responses or logs and shove them at an AI, then yes, absolutely. This is genuinely brilliant for that. I've got a daily script that scores dozens of articles and feeds them all to Claude at once, and that's exactly the kind of thing where 83% off is real money and real speed.

But if you just want a magic button that makes Claude Code cheaper while you chat, temper your expectations. In my session it was around 14%, it's lossy, and for most people it's not the thing they pictured. If cutting your Claude Code usage is the actual goal, the habits in "How I Cut Claude Tokens By 80%" move the needle a lot more. Know which one you need. That's the whole game.

The real lesson

Honestly, the lesson here is bigger than this one tool. A repo can have 50,000 stars and a screenshot of a huge number on it, and the only way to know what it does for you is to run it on your own work. That's it. You don't need permission, you don't need to be an engineer, you just try it and watch the real number, not the marketing one.
